llama.cpp

Author	SHA1	Message	Date
Concedo	e4c9aea840	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-06-26 10:35:47 +08:00
Georgi Gerganov	447ccbe8c3	readme : add new roadmap + manifesto	2023-06-25 16:08:12 +03:00
Georgi Gerganov	bd34cdde38	ggml : sync latest ggml (custom operators)	2023-06-25 14:25:08 +03:00
Concedo	d2034ced7b	Merge branch 'master' into concedo_experimental # Conflicts: # README.md # build.zig # flake.nix # tests/test-grad0.c # tests/test-sampling.cpp # tests/test-tokenizer-0.cpp	2023-06-25 17:01:15 +08:00
anon998	c2a08f87b8	fix server sampling: top k sampler first (#1977) Co-authored-by: anon <anon@example.org>	2023-06-25 10:48:36 +02:00
Georgi Gerganov	66a2555ba6	readme : add Azure CI discussion link	2023-06-25 09:07:03 +03:00
sjinzh	e65ca7e14a	zig : upgrade build system support (#1981) * upgrade zig build system support * zig : add new line at the end of the file --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-25 08:45:44 +03:00
Robyn	5ec8dd5a3c	#1869 Fix null reference errors when training from scratch with CUDA (#1907) * #1869 Fix null reference errors when training from scratch with CUDA build Calling ggml_compute_forward when node->src0 was null was causing train-text-from-scratch.exe to terminate unexpectedly. * ggml : do not dereference src0 if NULL --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-24 20:10:29 +02:00
Georgi Gerganov	65bdd52a86	tests : sync test-grad0 from ggml	2023-06-24 19:40:18 +03:00
Rowan Hart	fdd1860911	flake : fix ggml-metal.metal path and run nixfmt (#1974)	2023-06-24 14:07:08 +03:00
AN Long	c943d823c1	convert : fix invalid params in write_vocab_only (#1975)	2023-06-24 14:02:06 +03:00
slaren	f2c754e1c3	ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) * Improve ggml_graph_dump_dot, add ggml_format_name * add more automatic names to view ops * fix name of copies	2023-06-24 13:57:18 +03:00
Georgi Gerganov	11da1a85cd	readme : fix whitespaces	2023-06-24 13:38:18 +03:00
Alberto	235b610d65	readme : fixed termux instructions (#1973)	2023-06-24 13:32:13 +03:00
Alex Renda	b061ba9e2a	llama : fix top-p sampling to match the canonical definition (#1953) * Fix top-p sampling to match the standard definition (smallest set that has probability mass at least p, not largest set with probability mass less than p) * top-p: correct gt to gte * add test for correct top-p behavior	2023-06-24 13:15:01 +03:00
Didzis Gosko	527b6fba1d	llama : make model stateless and context stateful (llama_state) (#1797) * llama : make model stateless and context stateful * llama : minor cleanup * llama : update internal API declaration * Apply suggestions from code review fix style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Missing model memory release * Fix style * Add deprecated warning for public API function llama_init_from_file * Update public API use cases: move away from deprecated llama_init_from_file * Deprecate public API function llama_apply_lora_from_file --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-24 11:47:58 +03:00
Concedo	8342fe81b1	revert the wstring tokenization. coherency was affected	2023-06-24 12:58:49 +08:00
Concedo	6da38b0d40	up ver	2023-06-24 12:30:38 +08:00
Concedo	0485fa65a2	wstring convert for mpt	2023-06-24 11:43:42 +08:00
Concedo	6d718525c4	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-23 23:56:31 +08:00
Concedo	f7b096374d	fixed string too long CI issue	2023-06-23 23:56:22 +08:00
Concedo	490cf395f8	better alloc error	2023-06-23 22:51:51 +08:00
Concedo	ece453ed09	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # README.md	2023-06-23 22:46:54 +08:00
Concedo	f39a746089	bug fixes for openblas	2023-06-23 22:45:22 +08:00
Concedo	43c2891afa	option to not use scratch	2023-06-23 19:01:36 +08:00
Concedo	d5e4cf7ffe	handle ctx manip	2023-06-23 19:01:15 +08:00
Concedo	df9135e3a9	fixing memory bugs	2023-06-23 18:41:23 +08:00
eiery	d7b7484f74	Add OpenLLaMA instructions to the README (#1954) * add openllama to readme	2023-06-23 10:38:01 +02:00
Erik Scholz	7487137227	rework convert.py to read hyper-parameters from config.json (#1958) * Read hyper-parameters from HuggingFace-transformer config.json, if they exist, and fall back to guessing, like before otherwise. This allows converting open_llama 3B and other non-standard model designs.	2023-06-22 14:20:47 +02:00
Concedo	0eedccaf06	Merge branch 'master' into optimize_quants_upstream	2023-06-22 17:59:58 +08:00
Concedo	e6ddb15c3a	cleanup	2023-06-22 10:38:27 +08:00
Johannes Gäßler	bbca06e269	cmake: revert CUDA arch default to 52, 61 if f16 (#1959)	2023-06-21 23:49:25 +02:00
Rahul Vivek Nair	fb98254f99	Fix typo in README.md (#1961)	2023-06-21 23:48:43 +02:00
Concedo	1b71752a9f	Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX	2023-06-22 00:43:25 +08:00
Ycros	b1f00fa9cc	Fix hordeconfig max context setting, and add Makefile flags for cuda F16/KQuants per iter. (#252) * Fix hordeconfig maxcontext setting. * cuda: Bring DMMV_F16 and KQUANTS_ITER Makefile flags over from llama.	2023-06-21 23:01:46 +08:00
Concedo	dfdd20240c	gpt j use scratch buffers	2023-06-21 16:10:31 +08:00
Georgi Gerganov	049aa16b8c	readme : add link to p1	2023-06-20 19:05:54 +03:00
Concedo	266d47a4b9	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 22:46:35 +08:00
Concedo	da668e685f	fixing address spaces	2023-06-20 22:46:11 +08:00
Concedo	cce6e67f44	fixing address spaces	2023-06-20 22:45:16 +08:00
Concedo	1f1735f5ad	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 21:39:35 +08:00
Concedo	6b75fc48b9	fixed global const struct types	2023-06-20 21:38:48 +08:00
Xiake Sun	2322ec223a	Fix typo (#1949)	2023-06-20 15:42:40 +03:00
Concedo	537ff22ec9	fixed a bug with token timings, updated lite	2023-06-20 20:41:42 +08:00
Concedo	c5ae3f50a7	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 18:41:13 +08:00
Concedo	a6e8b0216d	remove old dot kernels and template	2023-06-20 18:37:48 +08:00
Concedo	93247a11cd	ported q2k and q5k speedups	2023-06-20 18:37:41 +08:00
Concedo	029bed6446	ported q3k speedup successfully	2023-06-20 18:37:26 +08:00
Concedo	d754915269	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 17:26:39 +08:00
Concedo	b4c532e862	Merge branch 'master' into concedo_experimental	2023-06-20 17:26:27 +08:00


1336 commits