Commit graph

3471 commits

Author SHA1 Message Date
slaren
e702f2ff11 ggml : reduce hash table reset cost (#8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-07-27 21:38:41 +08:00
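For context on the `GGML_ASSERT(false)` -> `GGML_ABORT("fatal error")` change above, here is a minimal sketch (not the actual ggml implementation) of an abort helper that takes a printf-style format string; marking the helper `[[noreturn]]` is also what removes the unreachable-code warnings mentioned in the commit.

```cpp
// Minimal sketch, not the actual ggml implementation: abort with a
// printf-style message and file/line info instead of assert(false).
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

[[noreturn]] static void example_abort(const char * file, int line, const char * fmt, ...) {
    va_list args;
    va_start(args, fmt);
    fprintf(stderr, "%s:%d: ", file, line);
    vfprintf(stderr, fmt, args);
    fputc('\n', stderr);
    va_end(args);
    abort();
}

#define EXAMPLE_ABORT(...) example_abort(__FILE__, __LINE__, __VA_ARGS__)

int main() {
    int n_dims = 5;
    if (n_dims > 4) {
        // the format string carries context that a bare assert(false) could not
        EXAMPLE_ABORT("fatal error: unsupported n_dims = %d", n_dims);
    }
    return 0;
}
```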
Judd
a1cf044dd1 llama : fix order of parameters (#8706)
usage of `aclrtGetMemInfo` is correct:

https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html

Co-authored-by: Judd <foldl@boxvest.com>
2024-07-27 21:38:41 +08:00
Yaiko
3395a68a2d server : add Speech Recognition & Synthesis to UI (#8679)
* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-27 21:38:41 +08:00
Xuan Son Nguyen
fbc71e9312 examples : export-lora : fix issue with quantized base models (#8687) 2024-07-27 21:38:41 +08:00
DavidKorczynski
df106e9211 ggml: handle ggml_init failure to fix NULL pointer deref (#8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
2024-07-27 21:38:41 +08:00
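A short sketch of the defensive pattern the fix describes, assuming the public `ggml.h` API (`ggml_init`, `ggml_set_no_alloc`, `ggml_free`): bail out when `ggml_init` returns NULL instead of dereferencing the context later.

```cpp
// Sketch of checking the result of ggml_init before touching the context.
#include "ggml.h"
#include <cstdio>

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };

    struct ggml_context * ctx = ggml_init(params);
    if (ctx == NULL) {
        // bail out instead of dereferencing a NULL context later
        fprintf(stderr, "ggml_init failed\n");
        return 1;
    }

    ggml_set_no_alloc(ctx, true); // safe: ctx is known to be non-NULL here
    // ... build graph ...
    ggml_free(ctx);
    return 0;
}
```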
Georgi Gerganov
1353a813cc llama : fix build + fix fabs compile warnings (#8683)
ggml-ci
2024-07-27 21:38:41 +08:00
Andreas (Andi) Kunar
21905dd445 ggml : fix build on Windows with Snapdragon X (#8531)
* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-27 21:38:41 +08:00
Georgi Gerganov
fd905e8be1 tests : fix printfs (#8068) 2024-07-27 21:38:41 +08:00
Georgi Gerganov
ca87ca98a7 ggml : add and use ggml_cpu_has_llamafile() (#8664) 2024-07-27 21:37:37 +08:00
Xuan Son Nguyen
43d92892c3 examples : remove finetune and train-text-from-scratch (#8669)
* examples : remove finetune and train-text-from-scratch

* fix build

* update help message

* fix small typo for export-lora
2024-07-27 21:37:37 +08:00
Ujjawal Panchal
5a5d5d28f8 docs : Quantum -> Quantized (#8666)
* docfix: imatrix readme, quantum models -> quantized models.

* docfix: server readme: quantum models -> quantized models.
2024-07-27 21:37:37 +08:00
Fan Shupei
fc5b21bf10 llama: use sliding window for phi3 (#8627)
* use sliding window for phi3

* fix typo, "data_swa" -> "data"

* [convert_hf_to_gguf.py] add phi3 sliding window
2024-07-27 21:37:37 +08:00
MorganRO8
65e54b5db4 readme : update games list (#8673)
Added a link to a game I made that depends on llama
2024-07-27 21:37:37 +08:00
Joe Todd
0aeae29190 Build Llama SYCL Intel with static libs (#8668)
Ensure SYCL CI builds both static & dynamic libs for testing purposes

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-27 21:37:37 +08:00
Thorsten Sommer
791e3e0b27 readme : update UI list [no ci] (#8505) 2024-07-27 21:37:37 +08:00
Xuan Son Nguyen
dc7836c79e llama : fix llama_chat_format_single for mistral (#8657)
* fix `llama_chat_format_single` for mistral

* fix typo

* use printf
2024-07-27 21:37:37 +08:00
Joe Todd
146da8b6c6 Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667) 2024-07-27 21:37:37 +08:00
Xuan Son Nguyen
a5f1f44b6a add llama_lora_adapter_clear (#8653) 2024-07-27 21:37:37 +08:00
Xuan Son Nguyen
a17bcdfd4c examples : Fix llama-export-lora example (#8607)
* fix export-lora example

* add more logging

* reject merging subset

* better check

* typo
2024-07-27 21:37:37 +08:00
Vali Malinoiu
b7e8bada5e server : fix URL.parse in the UI (#8646) 2024-07-27 21:37:36 +08:00
Joe Todd
e2d7ec46fc sycl : Add support for non-release DPC++ & oneMKL (#8644)
* Update cmake to support nvidia hardware & open-source compiler

---------

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-27 21:37:36 +08:00
Georgi Gerganov
96d14f4b58 llama : move vocab, grammar and sampling into separate files (#8508)
* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up
2024-07-27 21:37:36 +08:00
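A minimal sketch of the "redirect external API to internal `_impl` APIs" step described above; the symbol names here are illustrative, not llama.cpp's actual ones. The public entry point keeps its signature and forwards to an internal function that now lives in its own module (llama-sampling, llama-grammar, llama-vocab).

```cpp
// Illustrative forwarding pattern: stable C entry point -> internal _impl.
#include <cstdio>

// internal module (e.g. llama-sampling.cpp)
static int llama_sample_greedy_impl(const float * logits, int n) {
    int best = 0;
    for (int i = 1; i < n; ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

// public API surface (e.g. llama.cpp) just forwards
extern "C" int llama_sample_greedy(const float * logits, int n) {
    return llama_sample_greedy_impl(logits, n);
}

int main() {
    const float logits[] = {0.1f, 2.5f, 0.7f};
    printf("best token = %d\n", llama_sample_greedy(logits, 3));
    return 0;
}
```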
0cc4m
4b93675489 Vulkan IQ4_NL Support (#8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-27 21:37:36 +08:00
Jeroen Mostert
f54ffa8f04 Allow all RDNA2 archs to use sdot4 intrinsic (#8629)
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-27 21:37:36 +08:00
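To illustrate the gating change (the macro names below are hypothetical, not the exact ggml HIP defines): the fast path is guarded by an architecture-family define covering all RDNA2 targets rather than `__gfx1030__` alone, with a portable fallback elsewhere.

```cpp
// Hypothetical sketch of family-level gating; EXAMPLE_RDNA2 stands in for the
// generic RDNA2 define mentioned in the commit message.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__) || \
    defined(__gfx1033__) || defined(__gfx1034__) || defined(__gfx1035__) || \
    defined(__gfx1036__)
#define EXAMPLE_RDNA2
#endif

#include <cstdint>

static int dot4_i8(int a, int b, int c) {
#if defined(EXAMPLE_RDNA2)
    // single hardware instruction on any RDNA2 part, not just gfx1030
    return __builtin_amdgcn_sdot4(a, b, c, false);
#else
    // portable fallback: signed dot product of the four packed int8 lanes
    int sum = c;
    for (int i = 0; i < 4; ++i) {
        sum += int(int8_t(a >> (8 * i))) * int(int8_t(b >> (8 * i)));
    }
    return sum;
#endif
}

int main() {
    return dot4_i8(0x01010101, 0x02020202, 0) == 8 ? 0 : 1;
}
```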
Georgi Gerganov
6676362327 contrib : clarify PR squashing + module names (#8630)
* contrib : clarify PR squashing

* contrib : fix typo + add list of modules
2024-07-27 21:37:36 +08:00
luoyu-intel
a9c03e4827 [SYCL] fix scratch size of softmax (#8642) 2024-07-27 21:37:36 +08:00
Keke Han
549e1c7e41 llama : fix codeshell support (#8599)
* llama : fix codeshell support

* llama : move codeshell after smollm below to respect the enum order
2024-07-27 21:37:36 +08:00
Jason Stillerman
c70ddd889f llama : add support for SmolLm pre-tokenizer (#8609)
* Adding SmolLM Pre Tokenizer

* Update convert_hf_to_gguf_update.py

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* handle regex

* removed .inp and .out ggufs

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-27 21:37:36 +08:00
Jiří Podivín
b4c8a9c8a6 *.py: Stylistic adjustments for python (#8233)
* Superfluous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var

Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-27 21:37:36 +08:00
Georgi Gerganov
525b48a49c llama : allow overrides for tokenizer flags (#8614)
ggml-ci
2024-07-27 21:37:36 +08:00
Georgi Gerganov
52f90c6d3c tests : re-enable tokenizer tests (#8611)
* models : remove duplicated gpt-2 vocab

* models : remove old stablelm vocab

* tests : re-enable MPT tokenizer tests

* tests : re-enable DeepSeek tokenizer tests

* cmake : sort

ggml-ci
2024-07-27 21:37:36 +08:00
Douglas Hanley
af37c4bd7f llama : add Mistral Nemo inference support (#8604) 2024-07-27 21:37:36 +08:00
Jan Boon
9ffb78d54f server : update doc to clarify n_keep when there is bos token (#8619) 2024-07-27 21:37:36 +08:00
Mark Zhuang
6220d93595 ggml: fix compile error for RISC-V (#8623) 2024-07-27 21:37:36 +08:00
devojony
6f19b8c09d examples: fix android example not generating continuously (#8621)
When generation ends, `completion_loop()` should return NULL, not an empty string
2024-07-27 21:37:36 +08:00
Georgi Gerganov
24404ef9f3 flake.lock: Update (#8610) 2024-07-27 21:37:36 +08:00
M-A
52a7238985 examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)
Changes:

- Move each example into its own function. This makes the code much
  easier to read and understand.
- Make it easy to run only one example by commenting out function calls
  in main().
- Make the output easy to parse by indenting the output for each example.
- Add shebang and +x bit to make it clear it's an executable.
- Make the host configurable via --host with a default 127.0.0.1:8080.
- Make the code look in the tools list to call the registered tool,
  instead of hardcoding the returned values. This makes the code more
  copy-pastable.
- Add error checking, so that the program exits 1 if the LLM didn't
  return the expected values. This is very useful for checking correctness.

Testing:

- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M and
  Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
  - I did not observe a failure even once in Mistral-7B-Instruct-v0.3.
  - Llama-3 failed about a third of the time in example_concurrent: it
    only returned one call instead of 3. Even for F16.

Potential follow ups:

- Do not fix the prompt encoding yet. Surprisingly, it mostly works even
  if the prompt encoding is not optimized for the model.
- Add chained answer and response.

Test only change.
2024-07-27 21:37:36 +08:00
compilade
82478f1934 gguf-py : fix some metadata name extraction edge cases (#8591)
* gguf-py : fix some metadata name extraction edge cases

* convert_lora : use the lora dir for the model card path

* gguf-py : more metadata edge cases fixes

Multiple finetune versions are now joined together,
and the removal of the basename annotation on trailing versions
is more robust.

* gguf-py : add more name metadata extraction tests

* convert_lora : fix default filename

The default filename was previously hardcoded.

* convert_hf : Model.fname_out can no longer be None

* gguf-py : do not use title case for naming convention

Some models use acronyms in lowercase,
which can't be title-cased like other words,
so it's best to simply use the same case
as in the original model name.

Note that the size label still has an uppercased suffix
to make it distinguishable from the context size of a finetune.
2024-07-27 21:37:36 +08:00
compilade
264c2830d8 convert_hf : fix Gemma v1 conversion (#8597)
* convert_hf : fix Gemma v1 conversion

* convert_hf : allow renaming tokens, but with a warning

* convert_hf : fix Gemma v1 not setting BOS and EOS tokens
2024-07-27 21:37:36 +08:00
Johannes Gäßler
6887f5f02a CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-27 21:37:36 +08:00
Georgi Gerganov
c3ca2aa58e gguf : handle null name during init (#8587) 2024-07-27 21:37:36 +08:00
Michael Coppola
4b895a57cf llama : add support for Tekken pre-tokenizer (#8579)
* llama : Added support for Tekken pre-tokenizer (#8577)

Removed unneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-27 21:37:36 +08:00
Huifeng Ou
8b6f28ab31 llama.swiftui: fix end of generation bug (#8268)
* fix continuing to generate blank lines after getting an EOT or EOS token from the LLM

* change variable name to is_done (variable name suggested by ggerganov)

* minor : fix trailing whitespace

* minor : add space

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-27 21:37:36 +08:00
Brian
3a4d206f7b gguf_dump.py: fix markdown kv array print (#8588)
* gguf_dump.py: fix markdown kv array print

* Update gguf-py/scripts/gguf_dump.py

Co-authored-by: compilade <git@compilade.net>

* gguf_dump.py: refactor kv array string handling

* gguf_dump.py: escape backticks inside of strings

* gguf_dump.py: inline code markdown escape handler added

>>> escape_markdown_inline_code("hello world")
'`hello world`'
>>> escape_markdown_inline_code("hello ` world")
'``hello ` world``'

* gguf_dump.py: handle edge case about backticks on start or end of a string

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-27 21:37:36 +08:00
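The escaping rule behind the doctest above (the actual change is in the Python script; this is just a C++ sketch of the same Markdown rule): wrap the value in a backtick run longer than any run it contains, and pad with spaces when it starts or ends with a backtick.

```cpp
// Sketch of Markdown inline-code escaping: choose a fence longer than any
// backtick run inside the value, pad when the value starts/ends with a backtick.
#include <algorithm>
#include <cstdio>
#include <string>

static std::string escape_markdown_inline_code(const std::string & s) {
    // longest run of backticks inside the value
    size_t longest = 0, run = 0;
    for (char c : s) {
        run = (c == '`') ? run + 1 : 0;
        longest = std::max(longest, run);
    }
    const std::string fence(longest + 1, '`');
    // pad with spaces if the value begins or ends with a backtick
    const bool pad = !s.empty() && (s.front() == '`' || s.back() == '`');
    const std::string inner = pad ? " " + s + " " : s;
    return fence + inner + fence;
}

int main() {
    printf("%s\n", escape_markdown_inline_code("hello world").c_str());   // `hello world`
    printf("%s\n", escape_markdown_inline_code("hello ` world").c_str()); // ``hello ` world``
    printf("%s\n", escape_markdown_inline_code("`ticked`").c_str());      // `` `ticked` ``
    return 0;
}
```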
slaren
384b0cd40e ggml : fix quant dot product with odd number of blocks (#8549)
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-27 21:37:36 +08:00
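As a rough illustration of the odd-block issue the series above fixes (this is not the actual quant kernels, just the loop shape): a dot product that consumes blocks two at a time needs a scalar epilogue for the leftover block when the block count is odd.

```cpp
// Illustrative loop shape only: paired "vector" body plus odd-block epilogue.
#include <cstdio>

static float dot_blocks(const float * a, const float * b, int n_blocks) {
    float sum = 0.0f;
    int i = 0;
    for (; i + 1 < n_blocks; i += 2) {  // body: two blocks per step
        sum += a[i] * b[i] + a[i + 1] * b[i + 1];
    }
    if (i < n_blocks) {                 // epilogue for the odd trailing block
        sum += a[i] * b[i];
    }
    return sum;
}

int main() {
    const float a[] = {1, 2, 3, 4, 5};
    const float b[] = {1, 1, 1, 1, 1};
    printf("%f\n", dot_blocks(a, b, 5)); // 15
    return 0;
}
```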
Brian
c70cd4fbd1 convert-*.py: remove add_name from ChatGLMModel class (#8590) 2024-07-27 21:37:36 +08:00
Georgi Gerganov
af831106a4 llama : bump max layers from 256 to 512 (#8530)
* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions
2024-07-27 21:37:36 +08:00
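A tiny sketch of the "replace asserts with exceptions" part of this change (names are illustrative, not llama.cpp's): a recoverable model-loading error throws and is caught by the caller instead of aborting the process.

```cpp
// Illustrative: throw on a bad model instead of asserting.
#include <cstdio>
#include <stdexcept>

static void check_n_layers(int n_layer, int max_layers) {
    if (n_layer > max_layers) {
        throw std::runtime_error("model has too many layers");
    }
}

int main() {
    try {
        check_n_layers(600, 512);
    } catch (const std::exception & e) {
        fprintf(stderr, "error loading model: %s\n", e.what());
        return 1;
    }
    return 0;
}
```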
Georgi Gerganov
093ee371b0 readme : fix server badge 2024-07-27 21:37:36 +08:00
Clint Herron
0ed406ebd2 ggml : add friendlier error message to fopen errors (#8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-07-27 21:37:36 +08:00
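A sketch of the friendlier fopen error reporting described above, assuming nothing about the exact llama.cpp message strings: include `strerror(errno)` so the user learns why the model file failed to open.

```cpp
// Report the errno description alongside the path when fopen fails.
#include <cerrno>
#include <cstdio>
#include <cstring>

static FILE * open_model(const char * path) {
    FILE * f = fopen(path, "rb");
    if (f == NULL) {
        // e.g. "failed to open 'model.gguf': No such file or directory"
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    }
    return f;
}

int main() {
    FILE * f = open_model("does-not-exist.gguf");
    if (f != NULL) {
        fclose(f);
    }
    return 0;
}
```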
Frank Mai
e5cf8b43ec fix: typo of chatglm4 chat tmpl (#8586)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-07-27 21:37:36 +08:00