llama.cpp

Author	SHA1	Message	Date
jaime-m-p	107923cdd2	Better leading space removal	2024-06-25 17:33:56 +02:00
jaime-m-p	9854a9cde9	Symetric params for llama_tokenize() and llama_detokenize()	2024-06-25 17:28:53 +02:00
jaime-m-p	4a28063b1f	Update brute force test: Detokenize special tokens. Replace errors with '\uFFFD' when detokenizing to 'utf-8'. More edge cases. Better detokenization results check.	2024-06-24 20:56:26 +02:00
jaime-m-p	95a0df5578	Bugfix: custom regexs splits undefined unicode codepoints	2024-06-24 20:47:28 +02:00
jaime-m-p	12e2c317c8	style: remove trailing whitespace	2024-06-24 20:39:54 +02:00
jaime-m-p	9eb0fca027	Do not remove space when decoding special tokens	2024-06-24 20:37:48 +02:00
jaime-m-p	44c8648461	Fix detokenizer(): UNKNOWN and CONTROL are 'special pieces'. Remove space after UNKNOWN and CONTROL. Refactor llama_token_to_piece().	2024-06-23 21:12:24 +02:00
jaime-m-p	38d54b3c39	tets: skip unicode surrogaes and undefined	2024-06-23 20:56:32 +02:00
jaime-m-p	0cf2989b6c	tests: gracefully exit threads Using exit() is throwing random exceptions	2024-06-23 20:54:46 +02:00
jaime-m-p	9af762c0ac	tests: unexpected vocab type as test fail instead of error Useful when automating tests: - If you don't know in advance the vocab type. - Differenciate other loading errors.	2024-06-23 20:49:02 +02:00
jaime-m-p	b452e826cb	Add tokenizer flag: clean_up_tokenization_spaces	2024-06-21 16:12:26 +02:00
jaime-m-p	6d233bc132	Remove previous space	2024-06-21 00:01:31 +02:00
jaime-m-p	0cc6593f10	Remove previous space	2024-06-21 00:00:35 +02:00
jaime-m-p	503b7531c7	Fix add_space_prefix, set false by default	2024-06-20 22:48:24 +02:00
jaime-m-p	064b35eaff	Update bruteforce random tests Add detokenizer checks New generator: ascii_lr_strip New generator: apostrophe Add more vocabs files	2024-06-20 21:41:37 +02:00
jaime-m-p	071bf42f23	Clean old known problematic codepoints	2024-06-20 19:25:32 +02:00
jaime-m-p	03dbcc89f6	minor: confusing hexadecimal codepoint	2024-06-20 19:20:37 +02:00
jaime-m-p	16a7503dcc	Fix tokenizer tests	2024-06-20 19:18:23 +02:00
jaime-m-p	40a66606a8	Using llama_tokenize() in tests	2024-06-20 19:14:02 +02:00
jaime-m-p	d779bab49c	Using llama_tokenize() in tests	2024-06-20 18:20:16 +02:00
jaime-m-p	eea8dfab6b	Add llama_detokenize()	2024-06-20 17:51:16 +02:00
Michael de Gans	2075a66a96	metal : fix `ggml_metal_supports_op` for BF16 (#8021) Currently the Metal backend does not support BF16. `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks if the first few sources types are BF16 and returns false if that's the case.	2024-06-20 08:32:01 +03:00
sasha0552	ba58993152	server : fix smart slot selection (#8020)	2024-06-20 09:57:10 +10:00
Michael de Gans	a7854743c5	un-ignore `build-info.cmake` and `build-info.sh` (#7996) * un-ignore `build-info.cmake` and `build-info.sh` I am assuming that ignoring them was unintentional. If they are ignored, some tools, like cargo, will consider the files inexistent, even if they're comitted, for the purpose of publishing. This leads to the build failing in such cases. * un-ignore `build-info.cpp.in` For the same reason as the previous two files. * Reorganize `.gitignore` * Add exceptions for files mentioned by @slaren I did leave .clang-tidy since it was explicitly ignored before. * Add comments for organization * Sort some lines for pretty * Test with `make` and `cmake` builds to ensure no build artifacts might be comitted * Remove `.clang-tidy` from `.gitignore` Per comment by @ggerganov * Remove `IDEWorkspaceChecks.plist` from root-level `.gitignore`	2024-06-19 22:10:42 +02:00
slaren	9c77ec1d74	ggml : synchronize threads using barriers (#7993)	2024-06-19 15:04:15 +02:00
Georgi Gerganov	a04a953cab	codecov : remove (#8004)	2024-06-19 13:04:36 +03:00
Meng, Hengyu	623494a478	[SYCL] refactor (#6408) * seperate lower precision GEMM from the main files * fix workgroup size hardcode	2024-06-19 09:11:51 +08:00
jaime-m-p	37bef89433	tokenizer : BPE fixes (#7530) * Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t	2024-06-18 18:40:52 +02:00
Sigbjørn Skjæret	91c188d6c2	Only use FIM middle token if it exists (#7648) * Only use FIM middle if it exists * Only use FIM middle if it exists	2024-06-18 22:19:45 +10:00
jojorne	84f6de17f6	Fix no gcc pragma on Windows (#7751)	2024-06-18 22:18:32 +10:00
Ulrich Drepper	61665277af	Allow compiling with CUDA without CUDA runtime installed (#7989) On hosts which are not prepared/dedicated to execute code using CUDA it is still possible to compile llama.cpp with CUDA support by just installing the development packages. Missing are the runtime libraries like /usr/lib64/libcuda.so* and currently the link step will fail. The development environment is prepared for such situations. There are stub libraries for all the CUDA libraries available in the $(CUDA_PATH)/lib64/stubs directory. Adding this directory to the end of the search path will not change anything for environments which currently work fine but will enable compiling llama.cpp also in case the runtime code is not available.	2024-06-18 14:00:14 +02:00
Frank Mai	b96f9afb0d	chore: clean useless beam search param (#7985) Signed-off-by: thxCode <thxcode0824@gmail.com>	2024-06-18 10:11:40 +03:00
Abheek Gulati	1193778105	readme : update UI list (#7943)	2024-06-18 09:57:41 +03:00
Georgi Gerganov	5326bcceeb	ggml : sync	2024-06-18 09:50:45 +03:00
Georgi Gerganov	e6ecc2be47	whisper : use ggml_backend_sched (whisper/2239) * whisper : use ggml_backend_sched (wip) * use sched in whisper_allocr * whisper : single backend in whisper_context * whisper : remove whisper_state->backends_used * whisper : remove whisper_context->backend * whisper : reset scheduler after init * whisper : fix external encoder (e.g. CoreML) * whisper : cleanup * whisper : handle null GPU buffer types + fix sycl --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-18 09:50:40 +03:00
Ștefan-Gabriel Muscalu	a94e6ff877	update: support Qwen2-57B-A14B (#7835) * update: convert-hf-to-gguf.py to support Qwen2-57B-A14B * fix: QWEN2MOE support for expert_feed_forward_length previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH n_ff_exp and n_ff_shared_exp are now properly calculated * update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM * fix: QWEN2MOE support for expert_feed_forward_length previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH n_ff_exp and n_ff_shexp are now properly calculated	2024-06-17 21:08:46 +02:00
Srihari-mcw	5b6da18750	Make updates to type cast based on compiler instead of OS (#7851)	2024-06-17 20:23:17 +02:00
Georgi Gerganov	7c26775adb	llama : disable FA if KV head size do not match (#7982)	2024-06-17 19:40:01 +03:00
Bryan Honof	b473e95084	Add Nix and Flox install instructions (#7899)	2024-06-17 09:37:55 -06:00
slaren	99052cd227	sched : offload_op also requires supports_op (#7977)	2024-06-17 16:51:42 +02:00
Frank Mai	c637fcd34d	fix: divide 0 exception in mamba (#7932) Signed-off-by: thxCode <thxcode0824@gmail.com>	2024-06-17 16:11:08 +02:00
Markus Tavenrath	6a2f0b3474	Implement non-mapped async IO for CUDA on Windows. (#7896) * Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive. * Free resources except for backend. * Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA. * Apply suggestions from code review Co-authored-by: slaren <slarengh@gmail.com> * Fix editorconfig and unused variable * Fix issues with Windows build --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-17 16:10:15 +02:00
Georgi Gerganov	21be9cab94	rpc : fix load/store misaligned addresses (#7948)	2024-06-17 11:09:20 +03:00
Brian	006167aaf6	gguf-dump.py: add --markdown dump output (#7853) * gguf-dump.py: add --markdown dump output * gguf-dump.py: Add toc * gguf-dump.py: use standard tensor name lookup. Also add tensor ID field * gguf-dump.py: Add tensor overview count * gguf-dump.py: fix array preview * gguf-dump.py: markdownTableWithAlignmentSupport() added * Add type hints and spacing Co-authored-by: compilade <git@compilade.net> * gguf-dump.py: prettyfy dimention * gguf-dump: right align element count * gguf-dump.py: element count autosizing * Apply suggestions from code review Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2024-06-17 15:25:20 +10:00
Neo Zhang	df68d4fa5d	[SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946) * Update README-sycl.md * Update README-sycl.md * Update README-sycl.md * Update README-sycl.md	2024-06-17 11:17:07 +08:00
Calvin Laurenson	43b35e38ba	Add support for sqrt on CUDA (#7953) * cuda sqrt support * enable cuda in pca * fix comments in pca * add test * add sqrt to ggml_backend_cuda_supports_op * fix test * new line * Use F32 sqrtf instead of F64 sqrt Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-06-17 00:23:04 +02:00
Georgi Gerganov	19b7a836f6	cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) * cuda : fix bounds check for src0 rows in MMVQ kernel * Update ggml-cuda/mmvq.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-06-16 20:32:49 +03:00
Hong Bo PENG	b5fcf8ef5c	ggml : fix and optimize ppc64le (ggml/849) * fix compile issues introduced by loongarch_asx * restore quant changes to merge * fix compile issues introduced by loongarch_asx * further optimize by using vec_msum & vec_sum4s on ppc64le	2024-06-16 20:32:49 +03:00
Daniel Bevenius	398105ff43	ggml : remove duplicate include of ggml-common.h (ggml/853) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-06-16 20:32:49 +03:00
Georgi Gerganov	bc6c457fa3	flake.lock: Update (#7951)	2024-06-16 09:16:21 -07:00

3208 commits