llama.cpp

Author	SHA1	Message	Date
Kerfuffle	a5e7dbd614	llama : validate special token ids are in range when loading GGUF model (#3635) * Add validation for special token ids to llama.cpp Small optimization for llama_byte_to_token SPM mode * Fix BPE newline check, only I could break something so simple * Killll meeeeee * Account for GGUF_KEY_KEY only setting when the key exists * Minor code cleanups. * Fix convert.py error msg when added tokens are out of range * Make gguf SpecialVocab vocab size-aware Update conversion scripts accordingly * Avoid a string copy Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-22 21:14:56 +03:00
vvhg1	d3956aea53	main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) * infill tokens correction * serverinfill tokens correction * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * only rm when params.escape, rm space if possible which is added back or rm added space token * only rm when params.escape, rm space if possible which is added back or rm added space token * Revert "only rm when params.escape, rm space if possible which is added back or rm added space token" This reverts commit `63ba0b621f`. * fix interactive prompt escaping and fix server infill leading space handling * rm unnecessary bool check * process escapes for neg prompt and interactive consec prompts * removed unneccessary static string escape	2023-10-22 21:09:51 +03:00
Concedo	5f1f8a5a89	adjust	2023-10-22 21:53:54 +08:00
Concedo	ccf8334651	remove script (+8 squashed commit) Squashed commit: [bde2e3da] should be working [1cde82c0] update [bb6c8676] wip [66b698d1] wip colab [9953466a] wip colab [ae0bedea] json fix [0aac144f] wip on optimized colab [ec9f8e96] prepare colab binaries notebook	2023-10-22 21:38:38 +08:00
Concedo	fafe999ff9	update lite and colab (+1 squashed commits) Squashed commits: [06b6ca6d] updated lite and colab	2023-10-22 14:03:18 +08:00
Georgi Gerganov	22c69a2794	batched : add len CLI argument	2023-10-22 08:37:20 +03:00
Concedo	cff75061fe	fixed some old models failing due to tokenizer changes, update lite (+1 squashed commits) Squashed commits: [9dee81ec] fixed some old models failing due to tokenizer changes, update lite tooltip (+3 squashed commit) Squashed commit: [5ab95a79] fixes [a561d5e2] fixed some old models failing due to tokenizer changes [95e65daf] lite updates	2023-10-22 11:04:59 +08:00
Concedo	dd1d61ea6b	colab is fixed (+1 squashed commits) Squashed commits: [0b2a51f3] fix colab (+1 squashed commits) Squashed commits: [a6b832d0] fix colab (+1 squashed commits) Squashed commits: [8f88f210] updated colab (+1 squashed commits) Squashed commits: [75552e0d] try new colab	2023-10-21 10:08:32 +08:00
shibe2	465219b914	CLBlast: Add outer loops over src0 for broadcasting in mulmat Reduce repeated dequantization of the same data.	2023-10-20 22:30:52 +04:00
Georgi Gerganov	d1031cf49c	sampling : refactor init to use llama_sampling_params (#3696) * sampling : refactor init to use llama_sampling_params * llama : combine repetition, frequency and presence penalties in 1 call * examples : remove embd-input and gptneox-wip * sampling : rename penalty params + reduce size of "prev" vector * sampling : add llama_sampling_print helper * sampling : hide prev behind API and apply #3661 ggml-ci	2023-10-20 21:07:23 +03:00
Concedo	6119a2b5b2	revert lite change	2023-10-20 22:13:56 +08:00
Concedo	6fa681b692	fixed a race condition with SSE streaming	2023-10-20 22:01:09 +08:00
Concedo	5f5d5f1d86	quick fix	2023-10-20 19:43:56 +08:00
Qin Yue Chen	8cf19d60dc	gguf : support big endian platform (#3552) * check whether platform is 390x if yes->do not import immintrin.h * support s390x big endian * support --bigendian option for s390x 1. verified with baichuan7b-chat with float 16 on s390x 2. verified with baichuan7b-chat 3. verified with chinese-alpaca-2-13b-f16 * update format based on editor-config checker result * Update convert-baichuan-hf-to-gguf.py * 1. check in ggml.c if endianess is not match 2. update GGUF version 3. change get_pack_prefix to property 4. update information log * always use "GGUF" as beginng of GGUF file * Compare "GGUF" with file header char by char 1. Set GGUF_MAGIC to "GGUF" string instead of int value 2. Compare "GGUF" char by char to ensure its byte order 3. Move bytes swap code from convert.py to gguf.py write_tensor_data --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-20 14:19:40 +03:00
Concedo	012c53367d	minor lite fixes	2023-10-20 18:41:17 +08:00
Georgi Gerganov	a0edf73bda	server : fix uninitialized sampling context (close #3685)	2023-10-20 13:06:10 +03:00
Herman Semenov	f439e506e8	ggml : fix rope + llama minor optimizations (#3560) * Minor fixes and fixed memleak * Using const auto references in range-based loop C++17	2023-10-20 13:02:12 +03:00
Concedo	d3c7b7cc71	colab fix	2023-10-20 16:34:45 +08:00
Concedo	d5016fdc8f	updated lite bug	2023-10-20 16:03:06 +08:00
Concedo	ee93213218	updated lite	2023-10-20 15:44:52 +08:00
Concedo	cd3bb3ede2	update colab link	2023-10-20 13:49:34 +08:00
cebtenzzre	e78f3ef24a	convert : restore compat with old Falcon models (#3680)	2023-10-20 08:32:08 +03:00
Concedo	8947142c46	updated lite and colab	2023-10-20 11:35:44 +08:00
M. Yusuf Sarıgöz	f3b25e4043	multimodal : add BakLLaVA conversion support (#3682)	2023-10-19 19:40:41 +03:00
Concedo	8d31550d48	fix groupchat	2023-10-19 23:40:15 +08:00
Concedo	957e245285	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md	2023-10-19 23:32:52 +08:00
kalomaze	ddce116ec9	Fix for Top K disabling (#480) * Update gpttype_adapter.cpp * use n_vocab instead of 32000 for when top k is off	2023-10-19 23:20:44 +08:00
Concedo	8c6001de2a	updated lite	2023-10-19 23:18:14 +08:00
Concedo	fd770bb105	patch	2023-10-19 23:04:26 +08:00
Concedo	4382e51719	updated lite and default horde ctx amount	2023-10-19 22:49:59 +08:00
M. Yusuf Sarıgöz	60abea9798	llava : avoid segfault in case of non-existent mmproj file (#3674)	2023-10-19 16:59:11 +03:00
Georgi Gerganov	004797f6ac	readme : update hot topics	2023-10-18 21:44:43 +03:00
Georgi Gerganov	4e82b2ea3f	speculative : bug fixes	2023-10-18 18:49:40 +03:00
Georgi Gerganov	0e89203b51	speculative : add tree-based sampling example (#3624) * sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants	2023-10-18 16:21:57 +03:00
Jhen-Jie Hong	c67fe68e41	metal : implement q5_0 and q5_1 kernels (#3648) * metal : implement dequantize_q5_0 * metal : block_q_n_dot_y for block_q5_0 (broken) * metal : revert unnecessary change * metal : implement dequantize_q5_1 * metal : block_q_n_dot_y for q5_1 (broken) * metal : fix block_q_n_dot_y * minor : spaces / formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-18 15:21:48 +03:00
shibe2	1117d06607	opencl : fix element-wise multiplication (#3656)	2023-10-18 15:09:22 +03:00
Concedo	c1ca1de2ac	fixed support for old falcon models	2023-10-18 17:20:44 +08:00
Concedo	700951dbd4	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-10-18 16:33:09 +08:00
Concedo	53b7cdf8a3	Merge branch 'concedo' into concedo_experimental	2023-10-18 13:51:13 +08:00
slaren	cb33f43a2a	fix embeddings when using CUDA (#3657)	2023-10-17 22:24:50 +02:00
Georgi Gerganov	e1675d133c	llama : avoid fprintf in favor of LLAMA_LOG (#3538)	2023-10-17 22:34:26 +03:00
BarfingLemurs	8402566a7c	readme : update hot-topics & models, detail windows release in usage (#3615) * Update README.md * Update README.md * Update README.md * move "Running on Windows" section below "Prepare data and run" --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 21:13:21 +03:00
LostRuins	6e34d31c44	Update README.md (#479)	2023-10-18 01:24:14 +08:00
shibe2	40e5ce054f	CLBlast: Fix temporary buffer size for f16 conversion (wsize) Fix buffer overflow. Reduce the size to fit just one 2D slice. Assert sufficient size.	2023-10-17 21:02:30 +04:00
slaren	a5e8c1d8c7	train-text-from-scratch : fix assert failure in ggml-alloc (#3618)	2023-10-17 20:00:58 +03:00
Georgi Gerganov	e74c705e15	editorconfig : remove trailing spaces	2023-10-17 19:52:53 +03:00
coezbek	3ad1e3f1a1	server : documentation of JSON return value of /completion endpoint (#3632) * Added documentation of JSON return value of /completion endpoint * Update examples/server/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 19:51:02 +03:00
Georgi Gerganov	1142013da4	save-load-state : fix example + add ci test (#3655) * save-load-state : fix example (close #3606) * ci : add test for save-load-state example ggml-ci	2023-10-17 19:12:46 +03:00
ldwang	5fe268a4d9	readme : add Aquila2 links (#3610) Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-10-17 18:52:33 +03:00
staviq	1a159553f9	tokenizer : special token handling (#3538) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00


2457 commits