Commit graph

3208 commits

Author SHA1 Message Date
jaime-m-p
107923cdd2 Better leading space removal 2024-06-25 17:33:56 +02:00
jaime-m-p
9854a9cde9 Symetric params for llama_tokenize() and llama_detokenize() 2024-06-25 17:28:53 +02:00
jaime-m-p
4a28063b1f Update brute force test:
Detokenize special tokens.
Replace errors with '\uFFFD' when detokenizing to 'utf-8'.
More edge cases.
Better detokenization results check.
2024-06-24 20:56:26 +02:00
jaime-m-p
95a0df5578 Bugfix: custom regexs splits undefined unicode codepoints 2024-06-24 20:47:28 +02:00
jaime-m-p
12e2c317c8 style: remove trailing whitespace 2024-06-24 20:39:54 +02:00
jaime-m-p
9eb0fca027 Do not remove space when decoding special tokens 2024-06-24 20:37:48 +02:00
jaime-m-p
44c8648461 Fix detokenizer():
UNKNOWN and CONTROL are 'special pieces'.
Remove space after UNKNOWN and CONTROL.
Refactor llama_token_to_piece().
2024-06-23 21:12:24 +02:00
jaime-m-p
38d54b3c39 tets: skip unicode surrogaes and undefined 2024-06-23 20:56:32 +02:00
jaime-m-p
0cf2989b6c tests: gracefully exit threads
Using exit() is throwing random exceptions
2024-06-23 20:54:46 +02:00
jaime-m-p
9af762c0ac tests: unexpected vocab type as test fail instead of error
Useful when automating tests:
 - If you don't know in advance the vocab type.
 - Differenciate other loading errors.
2024-06-23 20:49:02 +02:00
jaime-m-p
b452e826cb Add tokenizer flag: clean_up_tokenization_spaces 2024-06-21 16:12:26 +02:00
jaime-m-p
6d233bc132 Remove previous space 2024-06-21 00:01:31 +02:00
jaime-m-p
0cc6593f10 Remove previous space 2024-06-21 00:00:35 +02:00
jaime-m-p
503b7531c7 Fix add_space_prefix, set false by default 2024-06-20 22:48:24 +02:00
jaime-m-p
064b35eaff Update bruteforce random tests
Add detokenizer checks
New generator: ascii_lr_strip
New generator: apostrophe
Add more vocabs files
2024-06-20 21:41:37 +02:00
jaime-m-p
071bf42f23 Clean old known problematic codepoints 2024-06-20 19:25:32 +02:00
jaime-m-p
03dbcc89f6 minor: confusing hexadecimal codepoint 2024-06-20 19:20:37 +02:00
jaime-m-p
16a7503dcc Fix tokenizer tests 2024-06-20 19:18:23 +02:00
jaime-m-p
40a66606a8 Using llama_tokenize() in tests 2024-06-20 19:14:02 +02:00
jaime-m-p
d779bab49c Using llama_tokenize() in tests 2024-06-20 18:20:16 +02:00
jaime-m-p
eea8dfab6b Add llama_detokenize() 2024-06-20 17:51:16 +02:00
Michael de Gans
2075a66a96
metal : fix ggml_metal_supports_op for BF16 (#8021)
Currently the Metal backend does not support BF16. `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks if the first few sources types are BF16 and returns false if that's the case.
2024-06-20 08:32:01 +03:00
sasha0552
ba58993152
server : fix smart slot selection (#8020) 2024-06-20 09:57:10 +10:00
Michael de Gans
a7854743c5
un-ignore build-info.cmake and build-info.sh (#7996)
* un-ignore `build-info.cmake` and `build-info.sh`

I am assuming that ignoring them was unintentional. If they are ignored, some tools, like cargo, will consider the files inexistent, even if they're comitted, for the purpose of publishing. This leads to the build failing in such cases.

* un-ignore `build-info.cpp.in`

For the same reason as the previous two files.

* Reorganize `.gitignore`

* Add exceptions for files mentioned by @slaren

I did leave .clang-tidy since it was explicitly ignored before.

* Add comments for organization
* Sort some lines for pretty
* Test with `make` and `cmake` builds to ensure no build artifacts might be comitted

* Remove `.clang-tidy` from `.gitignore`

Per comment by @ggerganov

* Remove `IDEWorkspaceChecks.plist` from root-level `.gitignore`
2024-06-19 22:10:42 +02:00
slaren
9c77ec1d74
ggml : synchronize threads using barriers (#7993) 2024-06-19 15:04:15 +02:00
Georgi Gerganov
a04a953cab
codecov : remove (#8004) 2024-06-19 13:04:36 +03:00
Meng, Hengyu
623494a478
[SYCL] refactor (#6408)
* seperate lower precision GEMM from the main files

* fix workgroup size hardcode
2024-06-19 09:11:51 +08:00
jaime-m-p
37bef89433
tokenizer : BPE fixes (#7530)
* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
2024-06-18 18:40:52 +02:00
Sigbjørn Skjæret
91c188d6c2
Only use FIM middle token if it exists (#7648)
* Only use FIM middle if it exists

* Only use FIM middle if it exists
2024-06-18 22:19:45 +10:00
jojorne
84f6de17f6
Fix no gcc pragma on Windows (#7751) 2024-06-18 22:18:32 +10:00
Ulrich Drepper
61665277af
Allow compiling with CUDA without CUDA runtime installed (#7989)
On hosts which are not prepared/dedicated to execute code using CUDA
it is still possible to compile llama.cpp with CUDA support by just
installing the development packages.  Missing are the runtime
libraries like /usr/lib64/libcuda.so* and currently the link step
will fail.

The development environment is prepared for such situations.  There
are stub libraries for all the CUDA libraries available in the
$(CUDA_PATH)/lib64/stubs directory.  Adding this directory to the end
of the search path will not change anything for environments which
currently work fine but will enable compiling llama.cpp also in case
the runtime code is not available.
2024-06-18 14:00:14 +02:00
Frank Mai
b96f9afb0d
chore: clean useless beam search param (#7985)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-06-18 10:11:40 +03:00
Abheek Gulati
1193778105
readme : update UI list (#7943) 2024-06-18 09:57:41 +03:00
Georgi Gerganov
5326bcceeb
ggml : sync 2024-06-18 09:50:45 +03:00
Georgi Gerganov
e6ecc2be47
whisper : use ggml_backend_sched (whisper/2239)
* whisper : use ggml_backend_sched (wip)

* use sched in whisper_allocr

* whisper : single backend in whisper_context

* whisper : remove whisper_state->backends_used

* whisper : remove whisper_context->backend

* whisper : reset scheduler after init

* whisper : fix external encoder (e.g. CoreML)

* whisper : cleanup

* whisper : handle null GPU buffer types + fix sycl

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-18 09:50:40 +03:00
Ștefan-Gabriel Muscalu
a94e6ff877
update: support Qwen2-57B-A14B (#7835)
* update: convert-hf-to-gguf.py to support Qwen2-57B-A14B

* fix: QWEN2MOE support for expert_feed_forward_length

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shared_exp are now properly calculated

* update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM

* fix: QWEN2MOE support for expert_feed_forward_length

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shexp are now properly calculated
2024-06-17 21:08:46 +02:00
Srihari-mcw
5b6da18750
Make updates to type cast based on compiler instead of OS (#7851) 2024-06-17 20:23:17 +02:00
Georgi Gerganov
7c26775adb
llama : disable FA if KV head size do not match (#7982) 2024-06-17 19:40:01 +03:00
Bryan Honof
b473e95084
Add Nix and Flox install instructions (#7899) 2024-06-17 09:37:55 -06:00
slaren
99052cd227
sched : offload_op also requires supports_op (#7977) 2024-06-17 16:51:42 +02:00
Frank Mai
c637fcd34d
fix: divide 0 exception in mamba (#7932)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-06-17 16:11:08 +02:00
Markus Tavenrath
6a2f0b3474
Implement non-mapped async IO for CUDA on Windows. (#7896)
* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive.

* Free resources except for backend.

* Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA.

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Fix editorconfig and unused variable

* Fix issues with Windows build

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-17 16:10:15 +02:00
Georgi Gerganov
21be9cab94
rpc : fix load/store misaligned addresses (#7948) 2024-06-17 11:09:20 +03:00
Brian
006167aaf6
gguf-dump.py: add --markdown dump output (#7853)
* gguf-dump.py: add --markdown dump output

* gguf-dump.py: Add toc

* gguf-dump.py: use standard tensor name lookup. Also add tensor ID field

* gguf-dump.py: Add tensor overview count

* gguf-dump.py: fix array preview

* gguf-dump.py: markdownTableWithAlignmentSupport() added

* Add type hints and spacing

Co-authored-by: compilade <git@compilade.net>

* gguf-dump.py: prettyfy dimention

* gguf-dump: right align element count

* gguf-dump.py: element count autosizing

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2024-06-17 15:25:20 +10:00
Neo Zhang
df68d4fa5d
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946)
* Update README-sycl.md

* Update README-sycl.md

* Update README-sycl.md

* Update README-sycl.md
2024-06-17 11:17:07 +08:00
Calvin Laurenson
43b35e38ba
Add support for sqrt on CUDA (#7953)
* cuda sqrt support

* enable cuda in pca

* fix comments in pca

* add test

* add sqrt to ggml_backend_cuda_supports_op

* fix test

* new line

* Use F32 sqrtf instead of F64 sqrt

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-17 00:23:04 +02:00
Georgi Gerganov
19b7a836f6
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-16 20:32:49 +03:00
Hong Bo PENG
b5fcf8ef5c
ggml : fix and optimize ppc64le (ggml/849)
* fix compile issues introduced by loongarch_asx

* restore quant changes to merge

* fix compile issues introduced by loongarch_asx

* further optimize by using vec_msum & vec_sum4s on ppc64le
2024-06-16 20:32:49 +03:00
Daniel Bevenius
398105ff43
ggml : remove duplicate include of ggml-common.h (ggml/853)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-06-16 20:32:49 +03:00
Georgi Gerganov
bc6c457fa3
flake.lock: Update (#7951) 2024-06-16 09:16:21 -07:00