llama.cpp

Author	SHA1	Message	Date
Concedo	48544cd2ef	Revert "Revert "ggml : add ggml_soft_max_ext (#4256)"" This reverts commit `a8e66ef31c`.	2023-12-03 21:46:50 +08:00
Concedo	6570a2005b	token count includes ids	2023-12-03 15:44:53 +08:00
Concedo	0ca814e544	added minP preset	2023-12-03 11:18:03 +08:00
Concedo	c142c5634a	fixed segfault with clblast by reversing commit in issue https://github.com/ggerganov/llama.cpp/issues/4296	2023-12-03 00:56:00 +08:00
Concedo	a8e66ef31c	Revert "ggml : add ggml_soft_max_ext (#4256)" This reverts commit `ef47ec18da`.	2023-12-03 00:42:01 +08:00
Concedo	a829a1ee56	fix for janitorai	2023-12-02 23:58:41 +08:00
Concedo	12f66eaa1d	adjust fragmentation fix	2023-12-02 15:59:08 +08:00
Concedo	1c422f45cb	more printouts	2023-12-02 11:48:48 +08:00
Concedo	495bb3ab1e	Merge branch 'master' into concedo_experimental	2023-12-01 23:48:20 +08:00
Concedo	4f40c226a0	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # .gitignore # CMakeLists.txt # Makefile # README.md	2023-12-01 23:46:59 +08:00
Daniel Bevenius	8d6d9f033b	py : add requirements file for convert-hf-to-gguf.py (#4277) This commit adds a requirements file for the convert-hf-to-gguf.py script, and also add the torch and transformers packages to it. The motivation for this is that currently running convert-hf-to-gguf.py will produce the following error: ```console $ python3 -m venv venv $ source venv/bin/activate (venv) $ pip install -r requirements.txt Collecting numpy==1.24.4 Collecting sentencepiece==0.1.98 Collecting gguf>=0.1.0 Installing collected packages: sentencepiece, numpy, gguf Successfully installed gguf-0.5.1 numpy-1.24.4 sentencepiece-0.1.98 (venv) $ python convert-hf-to-gguf.py --help Traceback (most recent call last): File "llama.cpp/convert-hf-to-gguf.py", line 16, in <module> import torch ModuleNotFoundError: No module named 'torch' ``` With this commit, and using requirements-hf-to-gguf.txt instead of requirements.txt, the script can be run and shows the help output. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-12-01 11:41:56 +02:00
Georgi Gerganov	ef47ec18da	ggml : add ggml_soft_max_ext (#4256) * metal : implement soft_max_ext * cuda : implement soft_max_ext * ggml : implement soft_max_ext (CPU) * batched-bench : print threads ggml-ci * metal : simplify soft_max encoding ggml-ci * cuda : use 512 threads for soft_max instead of 32 * ggml : update soft max cpu * cuda : do warp-based block reduce * cuda : increase max block size to 1024 * cuda : fix warp reduction initialization of shared mem * metal : warp-based reduction for soft max kernel * metal : warp-based reduce for rms_norm * metal : simplify soft max kernel ggml-ci * alloc : fix build with debug	2023-12-01 10:51:24 +02:00
Ziad Ben Hadj-Alouane	1d144112c0	server : add --log-disable to disable logging to file (#4260) * * add --log-disable to disable logging to file in the server example * * typo fix	2023-12-01 00:25:49 +02:00
Ziad Ben Hadj-Alouane	f43f09366d	server : add single-client multi-prompt support (#4232) * * add multiprompt support * * cleanup * * more cleanup * * remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests * * remove all references to mutex_multitasks * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * * change to set --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-12-01 00:25:04 +02:00
WillCorticesAI	d2809a3ba2	make : fix Apple clang determination bug (#4272) Co-authored-by: Will Findley <findley@gmail.com>	2023-12-01 00:23:44 +02:00
Jared Van Bortel	15f5d96037	build : fix build info generation and cleanup Makefile (#3920) * cmake : fix joining of REAL_GIT_DIR * fix includes with help from include-what-you-use * make : remove unneeded deps and add test-rope target * fix C includes in C++ source files * Revert "fix includes with help from include-what-you-use" This reverts commit `635e9fadfd`.	2023-12-01 00:23:08 +02:00
John	33c9892af5	llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) * ShareGPT4 compatibility (vision encoder only loading) Load only a CLIP vision encoder (as supplied by ShareGPT finetunes) Corrects the argument parsing for --img_mean and --img_std (which were previously not parsed but attempted to access) Defines defaults for img_mean and img_std which are equal to the llava 1.5 CLIP encoder, so you do not have to provide them * Update convert-image-encoder-to-gguf.py	2023-11-30 23:11:14 +01:00
Andrew Godfrey	8efa0f6ebe	main : pass LOG_TEE callback to llama.cpp log (#4033) * main : Call llama_log_set to use LOG_TEE * tabs to spaces	2023-11-30 23:56:19 +02:00
vodkaslime	524907aa76	readme : fix (#4135) * fix: readme * chore: resolve comments * chore: resolve comments	2023-11-30 23:49:21 +02:00
Juraj Bednar	3bd2c7ce1b	docker : add finetune option (#4211)	2023-11-30 23:46:01 +02:00
Miwa / Ensan	bde629bb53	batched.swift : update README.md (#4214) docs: update how to run	2023-11-30 23:45:17 +02:00
Li Tan	f7f9e06212	cmake : fix the metal file foder path (#4217)	2023-11-30 23:44:11 +02:00
Dawid Wysocki	74daabae69	readme : fix typo (#4253) llama.cpp uses GitHub Actions, not Gitlab Actions.	2023-11-30 23:43:32 +02:00
Daniel Bevenius	b18c66ca6e	llama : fix alignment of general.name in print meta (#4254) * llama: fix alignment of general.name in print meta This commit fixes the alignment of the general.name field in the llm_load_print_meta function. Currently the output looks like this: ```console llm_load_print_meta: model ftype = mostly Q4_0 llm_load_print_meta: model params = 13.02 B llm_load_print_meta: model size = 6.86 GiB (4.53 BPW) llm_load_print_meta: general.name = LLaMA v2 ``` And with this commit it looks like this: ```console llm_load_print_meta: model ftype = mostly Q4_0 llm_load_print_meta: model params = 13.02 B llm_load_print_meta: model size = 6.86 GiB (4.53 BPW) llm_load_print_meta: general.name = LLaMA v2 ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * llama: fix alignment of special tokens Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-11-30 23:43:08 +02:00
slaren	f4d973cecb	convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258)	2023-11-30 23:42:23 +02:00
tarcey	954e22858c	llama : fix typical sampling (#4261) Typical sampling was broken because after copying new_candidates into canditates, the "sorted" bool is left at "true", but the new data is no longer sorted according to probability. Patch to set "sorted" to false. Test: Generating with temp=0.0001 (approx. argmax) should generate the same sequence at typical>=1.0 and typical=0.9999 (approx. disabled, but enters the typical sampling codepath).	2023-11-30 23:40:23 +02:00
rhjdvsgsgks	e2bd725f4b	py : fix oai proxy (#3972) * fix oai proxy fix generation not stoped while bot stop talking in chat mode fix possible `slot_id` not exist response for cors (and pre flight) * oai proxy: workaround for some client (such as Chatbox) * use stop as separator to replace hardcoded `\n`	2023-11-30 22:50:40 +02:00
Concedo	a195cdeec8	fixed chub ai imports (+1 squashed commits) Squashed commits: [cdb74264] fixed chub ai imports	2023-11-30 18:07:56 +08:00
Concedo	e9724cdc9d	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-11-30 14:31:53 +08:00
Concedo	a012342a77	updated docs, shifted kv extra space to be subtracted from user's ctx value instead of added on load.	2023-11-30 14:19:40 +08:00
Georgi Gerganov	1f5cd83275	examples : add readme files	2023-11-29 11:00:17 +02:00
Peter Sugihara	4fea3420ee	readme : add FreeChat (#4248)	2023-11-29 09:16:34 +02:00
Concedo	66ef4a20e2	refined multiuser mode	2023-11-29 14:29:45 +08:00
Concedo	b75152e3e9	added a proper quiet mode	2023-11-28 21:20:51 +08:00
Concedo	581021ab93	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # README.md # scripts/build-info.cmake	2023-11-28 20:57:56 +08:00
Concedo	ba5c33319b	Allocate a small amount of extra context for GGUF to deal with KV fragmentation causing issues in some scenarios.	2023-11-28 20:55:14 +08:00
Jared Van Bortel	64e64aa255	ggml : restore abort() in GGML_ASSERT (#4242)	2023-11-28 11:51:11 +02:00
Concedo	d2ef458b02	show more info about available APIs	2023-11-28 17:17:47 +08:00
Georgi Gerganov	8406b0924b	ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240) * ggml : use blas even if src0 is not F32 * llama : use n_threads_batch only when n_tokens >= 32 ggml-ci * llama : revert n_threads_batch logic ggml-ci	2023-11-28 10:32:03 +02:00
bandoti	b38a16dfcf	cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970) * Split CPP generation from build-info query * Remove blank lines * Add BUILD_SHARED_LIBS option	2023-11-27 21:25:42 +02:00
Kasumi	0dab8cd7cc	readme : add Amica to UI list (#4230)	2023-11-27 19:39:42 +02:00
Bailey Chittle	bb03290c17	examples : iOS example with swift ui (#4159) * copy to llama.cpp as subdir * attempt enabling metal, fails * ggml metal compiles! * Update README.md * initial conversion to new format, utf8 errors? * bug fixes, but now has an invalid memory access :( * added O3, now has insufficient memory access * begin sync with master * update to match latest code, new errors * fixed it! * fix for loop conditionals, increase result size * fix current workflow errors * attempt a llama.swiftui workflow * Update .github/workflows/build.yml Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-27 16:56:52 +02:00
Concedo	0e5f16de53	reduce max ctx to fit instead of crashing	2023-11-27 19:08:54 +08:00
Concedo	8acd7be734	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md	2023-11-27 14:06:14 +08:00
Concedo	ec1796bec1	updated lite	2023-11-27 14:04:53 +08:00
Jared Van Bortel	f3b269813f	ggml : fix -Warray-bounds warning with gcc (#4231)	2023-11-26 22:58:43 -05:00
Georgi Gerganov	3e73d31d9c	lookahead : support `-n -1` infinite generation	2023-11-26 21:52:23 +02:00
Georgi Gerganov	9656026b53	readme : update hot topics	2023-11-26 20:42:51 +02:00
Georgi Gerganov	922754a8d6	lookahead : add example for lookahead decoding (#4207) * lookahead : init * lookahead : generate and store n-grams * lookahead : use loop instead recursion to generate n-grams * lookahead : initial working implementation * lookahead : filter repeating n-grams * lookahead : use deterministic init * lookahead : add to Makefile * lookahead : fix a bug in the seq_id of the lookahead tokens * lookahead : add comments --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-11-26 20:33:07 +02:00
Concedo	2f51a6afd5	trigger quiet mode when selecting remotetunnel	2023-11-27 00:16:36 +08:00

2723 commits