llama.cpp/examples
Latest commit 8e39037b86 (Francis Couture-Harpin): llama : refactor session file management
* llama : saving and restoring state checks for overflow

The sizes of the buffers are now passed to the functions working
with them; otherwise, a truncated file could cause out-of-bounds reads.
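The pattern described here can be sketched with a minimal, self-contained reader (the names below are illustrative, not the actual llama.cpp helpers): every read is checked against the bytes remaining in the buffer, so a truncated session file fails loudly instead of reading past the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Illustrative reader: every read is validated against the remaining
// buffer size, so a truncated session file produces an error instead
// of an out-of-bounds read.
struct checked_reader {
    const uint8_t * ptr;
    size_t          remaining;

    void read_raw(void * dst, size_t n) {
        if (n > remaining) {
            throw std::runtime_error("session file truncated");
        }
        std::memcpy(dst, ptr, n);
        ptr       += n;
        remaining -= n;
    }
};
```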

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.
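A minimal sketch of the streaming idea (illustrative code, not the actual implementation): data is consumed through a small fixed-size chunk buffer rather than one allocation the size of the whole file, so peak memory stays bounded regardless of the session file's size.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Illustrative: feed a file to a consumer through a small reusable
// chunk buffer instead of loading the whole file into memory at once.
template <typename Consumer>
size_t stream_file(const char * path, Consumer && consume, size_t chunk_size = 64 * 1024) {
    std::ifstream f(path, std::ios::binary);
    std::vector<uint8_t> chunk(chunk_size);
    size_t total = 0;
    while (f) {
        f.read(reinterpret_cast<char *>(chunk.data()), (std::streamsize) chunk.size());
        const size_t got = (size_t) f.gcount();
        if (got == 0) {
            break;
        }
        consume(chunk.data(), got);  // peak memory stays at chunk_size
        total += got;
    }
    return total;
}
```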

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but it makes that function *much* easier
to keep up to date, and it brings its behavior in line with
llama_state_seq_get_size.
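The usage pattern this enables can be sketched with stand-in functions (the names below are hypothetical; in llama.cpp the corresponding entry points are `llama_state_get_size` and `llama_state_get_data`): query the exact serialized size, allocate exactly that much, then serialize and check that the two agree.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for a context whose serialized state has a data-dependent size.
struct fake_state {
    std::vector<int32_t> tokens;  // e.g. tokens currently in the KV cache
};

// Returns the exact number of bytes state_write() will produce,
// mirroring llama_state_get_size returning the actual (not maximum) size.
size_t state_size(const fake_state & s) {
    return sizeof(uint32_t) + s.tokens.size() * sizeof(int32_t);
}

// Serializes the state into dst and returns the number of bytes written;
// by construction this always equals state_size().
size_t state_write(const fake_state & s, uint8_t * dst) {
    uint8_t * p = dst;
    const uint32_t n = (uint32_t) s.tokens.size();
    std::memcpy(p, &n, sizeof(n));                        p += sizeof(n);
    std::memcpy(p, s.tokens.data(), n * sizeof(int32_t)); p += n * sizeof(int32_t);
    return (size_t) (p - dst);
}
```

Because the reported size is exact rather than an upper bound, the caller can allocate a buffer of precisely that size and verify the writer filled it completely, which is what keeps the size and write paths in sync.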

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.
Committed 2024-07-25 21:40:26 -04:00
| Name | Last commit | Date |
|------|-------------|------|
| baby-llama | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| batched | batched: fix n_predict parameter (#8527) | 2024-07-17 10:34:28 +03:00 |
| batched-bench | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| batched.swift | Detokenizer fixes (#8039) | 2024-07-05 19:01:35 +02:00 |
| benchmark | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| convert-llama2c-to-ggml | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| cvector-generator | cvector: better prompt handling, add "mean vector" method (#8069) | 2024-06-25 13:59:54 +02:00 |
| deprecation-warning | Deprecation warning to assist with migration to new binary names (#8283) | 2024-07-09 11:54:43 -04:00 |
| embedding | Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) | 2024-07-02 12:18:10 -04:00 |
| eval-callback | examples : sprintf -> snprintf (#8434) | 2024-07-12 10:46:14 +03:00 |
| export-lora | examples : Fix llama-export-lora example (#8607) | 2024-07-23 23:48:37 +02:00 |
| finetune | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| gbnf-validator | llama : move vocab, grammar and sampling into separate files (#8508) | 2024-07-23 13:10:17 +03:00 |
| gguf | gguf : handle null name during init (#8587) | 2024-07-20 17:15:42 +03:00 |
| gguf-hash | gguf-hash : update clib.json to point to original xxhash repo (#8491) | 2024-07-16 10:14:16 +03:00 |
| gguf-split | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| gritlm | llama : allow pooled embeddings on any model (#7477) | 2024-06-21 08:38:22 +03:00 |
| imatrix | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| infill | infill : assert prefix/suffix tokens + remove old space logic (#8351) | 2024-07-08 09:34:35 +03:00 |
| jeopardy | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| llama-bench | [CANN] Add Ascend NPU backend (#6035) | 2024-07-17 14:23:50 +03:00 |
| llama.android | examples: fix android example cannot be generated continuously (#8621) | 2024-07-22 09:54:42 +03:00 |
| llama.swiftui | llama.swiftui: fix end of generation bug (#8268) | 2024-07-20 16:09:37 +03:00 |
| llava | [CANN] Add Ascend NPU backend (#6035) | 2024-07-17 14:23:50 +03:00 |
| lookahead | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| lookup | lookup: fibonacci hashing, fix crashes (#8548) | 2024-07-17 23:35:44 +02:00 |
| main | main : print error on empty input (#8456) | 2024-07-12 14:48:04 +03:00 |
| main-cmake-pkg | Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) | 2024-07-02 12:18:10 -04:00 |
| parallel | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| passkey | passkey : add short intro to README.md [no-ci] (#8317) | 2024-07-05 09:14:24 +03:00 |
| perplexity | ppl : fix n_seq_max for perplexity (#8277) | 2024-07-03 20:33:31 +03:00 |
| quantize | llama : valign + remove unused ftype (#8502) | 2024-07-16 10:00:30 +03:00 |
| quantize-stats | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00 |
| retrieval | llama : allow pooled embeddings on any model (#7477) | 2024-06-21 08:38:22 +03:00 |
| rpc | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| save-load-state | llama : refactor session file management | 2024-07-25 21:40:26 -04:00 |
| server | server : fix URL.parse in the UI (#8646) | 2024-07-23 17:37:42 +03:00 |
| simple | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| speculative | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| sycl | Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) | 2024-07-02 12:18:10 -04:00 |
| tokenize | tokenize : add --no-parse-special option (#8423) | 2024-07-11 10:41:48 +03:00 |
| train-text-from-scratch | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| base-translate.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat-13B.bat | Create chat-13B.bat (#592) | 2023-03-29 20:21:09 +03:00 |
| chat-13B.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat-persistent.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat-vicuna.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| CMakeLists.txt | gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) | 2024-07-07 22:58:43 +10:00 |
| convert_legacy_llama.py | convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) | 2024-07-18 20:40:15 +10:00 |
| json_schema_pydantic_example.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| json_schema_to_grammar.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| llama.vim | llama.vim : added api key support (#5090) | 2024-01-23 08:51:27 +02:00 |
| llm.vim | llm.vim : stop generation at multiple linebreaks, bind to \<F2\> (#2879) | 2023-08-30 09:50:55 +03:00 |
| Miku.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| pydantic_models_to_grammar.py | pydantic : replace uses of `__annotations__` with get_type_hints (#8474) | 2024-07-14 19:51:21 -04:00 |
| pydantic_models_to_grammar_examples.py | examples : Rewrite pydantic_models_to_grammar_examples.py (#8493) | 2024-07-20 22:09:17 -04:00 |
| reason-act.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| regex_to_grammar.py | py : switch to snake_case (#8305) | 2024-07-05 07:53:33 +03:00 |
| server-llama2-13B.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| server_embd.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| ts-type-to-grammar.sh | JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) | 2024-04-12 19:43:38 +01:00 |