Zihao Chen
da6c1f29f2
Merge pull request #10 from zihaoccc/cleanup7
big cleanup
2024-07-26 17:47:43 -07:00
Wenjing Yu
a53d266f7b
big cleanup
2024-07-26 17:47:11 -07:00
Zihao Chen
3ba087913f
Merge pull request #9 from zihaoccc/cleanup6
remove benchmark
2024-07-26 17:00:26 -07:00
Wenjing Yu
6ed7279adb
remove benchmark
2024-07-26 16:59:48 -07:00
Zihao Chen
0f2350dcb6
Merge pull request #8 from zihaoccc/cleanup5
remove batch-benched
2024-07-26 16:51:16 -07:00
Wenjing Yu
5545e8ff92
remove batch-benched
2024-07-26 16:50:28 -07:00
Zihao Chen
5630607f2b
Merge pull request #7 from zihaoccc/cleanup4
remove batched
2024-07-26 16:47:56 -07:00
Wenjing Yu
4b850f0ce4
remove batched
2024-07-26 16:47:33 -07:00
Zihao Chen
715540a77b
Merge pull request #6 from zihaoccc/cleanup3
remove tests 2
2024-07-26 16:43:23 -07:00
Wenjing Yu
bf621daa86
remove tests 2
2024-07-26 16:42:54 -07:00
Zihao Chen
f7c0f9f576
Merge pull request #5 from zihaoccc/cleanup2
remove tests
2024-07-26 16:39:41 -07:00
Wenjing Yu
4810ab1aa1
remove tests
2024-07-26 16:38:13 -07:00
Zihao Chen
0e5165b605
Merge pull request #4 from zihaoccc/cleanup1
remove ci
2024-07-26 16:35:22 -07:00
Wenjing Yu
b76557d7c6
remove ci
2024-07-26 16:34:50 -07:00
Zihao Chen
cd78f93710
Merge pull request #3 from zihaoccc/cleanup
Cleanup baby-llama
2024-07-25 16:10:46 -07:00
Wenjing Yu
7addbe3e9d
remove baby-llama
2024-07-25 16:10:05 -07:00
Zihao Chen
8fd767a557
Merge branch 'ggerganov:master' into master
2024-07-25 15:49:32 -07:00
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI ( #8679 )
* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Xuan Son Nguyen
41cd47caab
examples : export-lora : fix issue with quantized base models ( #8687 )
2024-07-25 23:49:39 +02:00
DavidKorczynski
49ce0ab6d4
ggml: handle ggml_init failure to fix NULL pointer deref ( #8692 )
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.
This fixes it by bailing out if no context is found.
2024-07-25 23:23:05 +02:00
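The defensive pattern behind this fix, as a minimal sketch (the buffer size is illustrative; `ggml_init`, `ggml_set_no_alloc`, and `ggml_free` are the public ggml API):

```cpp
#include "ggml.h"
#include <cstdio>

// Minimal sketch: check the context returned by ggml_init before using it,
// since ggml_init returns NULL when no unused context slot is available.
int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024, // illustrative 16 MiB arena
        /*.mem_buffer =*/ nullptr,          // let ggml allocate the buffer
        /*.no_alloc   =*/ false,
    };

    struct ggml_context * ctx = ggml_init(params);
    if (ctx == nullptr) { // bail out instead of dereferencing NULL later
        fprintf(stderr, "ggml_init failed\n");
        return 1;
    }

    ggml_set_no_alloc(ctx, true); // safe: ctx is known to be valid here
    ggml_free(ctx);
    return 0;
}
```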
Georgi Gerganov
4226a8d10e
llama : fix build + fix fabs compile warnings ( #8683 )
ggml-ci
2024-07-25 19:57:31 +03:00
Andreas (Andi) Kunar
bf5a81df37
ggml : fix build on Windows with Snapdragon X ( #8531 )
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit bf21397ae5.
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* Windows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
Georgi Gerganov
88954f7fbd
tests : fix printfs ( #8068 )
2024-07-25 18:58:04 +03:00
Chen Xi
ed67bcb24f
[SYCL] fix multi-gpu issue on sycl ( #8554 )
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
Georgi Gerganov
eddcb5238b
ggml : add and use ggml_cpu_has_llamafile() ( #8664 )
2024-07-25 12:37:42 +03:00
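A usage sketch, assuming the new probe follows the existing `ggml_cpu_has_*` convention of returning 0 or 1 for system-info reporting:

```cpp
#include "ggml.h"
#include <cstdio>

// Sketch: report compile-time features the way llama.cpp's system-info
// string does; the new probe says whether the llamafile SGEMM path is in.
int main() {
    printf("AVX2 = %d | LLAMAFILE = %d\n",
           ggml_cpu_has_avx2(), ggml_cpu_has_llamafile());
    return 0;
}
```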
Xuan Son Nguyen
be6d7c0791
examples : remove finetune and train-text-from-scratch ( #8669 )
* examples : remove finetune and train-text-from-scratch
* fix build
* update help message
* fix small typo for export-lora
2024-07-25 10:39:04 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized ( #8666 )
* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
Fan Shupei
8a4bad50a8
llama: use sliding window for phi3 ( #8627 )
* use sliding window for phi3
* fix typo, "data_swa" -> "data"
* [convert_hf_to_gguf.py] add phi3 sliding window
2024-07-25 10:21:09 +03:00
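What a sliding window means for the attention mask, as a self-contained sketch (illustrative sizes, not the llama.cpp kernel): token `i` may attend to token `j` only if `j` is causal and within the last `window` positions.

```cpp
#include <cstdio>

// Sketch: a causal mask restricted to a sliding window.
// allowed(i, j)  <=>  j <= i (causal)  and  i - j < window (recent enough).
int main() {
    const int n_ctx  = 8; // illustrative sequence length
    const int window = 4; // illustrative sliding-window size

    for (int i = 0; i < n_ctx; ++i) {
        for (int j = 0; j < n_ctx; ++j) {
            const bool allowed = (j <= i) && (i - j < window);
            printf("%c", allowed ? '1' : '.');
        }
        printf("\n");
    }
    return 0;
}
```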
MorganRO8
68504f0970
readme : update games list ( #8673 )
Added a link to a game I made that depends on llama.cpp
2024-07-24 19:48:00 +03:00
Joe Todd
f19bf99c01
Build Llama SYCL Intel with static libs ( #8668 )
Ensure SYCL CI builds both static & dynamic libs for testing purposes
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-24 14:36:00 +01:00
Thorsten Sommer
3a7ac5300a
readme : update UI list [no ci] ( #8505 )
2024-07-24 15:52:30 +03:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral ( #8657 )
* fix `llama_chat_format_single` for mistral
* fix typo
* use printf
2024-07-24 13:48:46 +02:00
Joe Todd
79167d9e49
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS ( #8667 )
2024-07-24 11:55:26 +01:00
Xuan Son Nguyen
b115105f05
add llama_lora_adapter_clear ( #8653 )
2024-07-24 11:25:19 +02:00
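A hot-swap sketch of what the new call enables, assuming the July-2024 `llama.h` adapter API (`llama_lora_adapter_init`, `llama_lora_adapter_set`); the adapter paths are placeholders:

```cpp
#include "llama.h"

// Sketch: stack two LoRA adapters on a context, then drop them all at once
// with the new llama_lora_adapter_clear instead of removing them one by one.
void swap_adapters(llama_context * ctx, llama_model * model) {
    llama_lora_adapter * a = llama_lora_adapter_init(model, "adapter-a.gguf"); // placeholder path
    llama_lora_adapter * b = llama_lora_adapter_init(model, "adapter-b.gguf"); // placeholder path

    llama_lora_adapter_set(ctx, a, 1.0f); // apply A at full scale
    llama_lora_adapter_set(ctx, b, 0.5f); // stack B at half scale

    llama_lora_adapter_clear(ctx);        // new: remove every adapter from ctx
}
```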
Xuan Son Nguyen
de280085e7
examples : Fix llama-export-lora example ( #8607 )
* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
2024-07-23 23:48:37 +02:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI ( #8646 )
2024-07-23 17:37:42 +03:00
Joe Todd
64cf50a0ed
sycl : Add support for non-release DPC++ & oneMKL ( #8644 )
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-23 14:58:37 +01:00
Georgi Gerganov
938943cdbf
llama : move vocab, grammar and sampling into separate files ( #8508 )
* llama : move sampling code into llama-sampling
ggml-ci
* llama : move grammar code into llama-grammar
ggml-ci
* cont
ggml-ci
* cont : pre-fetch rules
* cont
ggml-ci
* llama : deprecate llama_sample_grammar
* llama : move tokenizers into llama-vocab
ggml-ci
* make : update llama.cpp deps [no ci]
* llama : redirect external API to internal APIs
ggml-ci
* llama : suffix the internal APIs with "_impl"
ggml-ci
* llama : clean-up
2024-07-23 13:10:17 +03:00
0cc4m
751fcfc6c3
Vulkan IQ4_NL Support ( #8613 )
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
Jeroen Mostert
46e47417aa
Allow all RDNA2 archs to use sdot4 intrinsic ( #8629 )
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-23 10:50:40 +02:00
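The shape of the change, as a hedged HIP device-code sketch (the arch list and scalar fallback are illustrative; only `__builtin_amdgcn_sdot4` is the real intrinsic): gate the fast path on a generic RDNA2 define instead of gfx1030 alone.

```cpp
#include <cstdint>

// Sketch: a generic RDNA2 gate replaces the single-arch gfx1030 check,
// so every gfx103x part gets the dot-product fast path.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__) || \
    defined(__gfx1033__) || defined(__gfx1034__) || defined(__gfx1035__) || \
    defined(__gfx1036__)
#define RDNA2
#endif

static __device__ int dot4_s8(int a, int b, int c) {
#if defined(RDNA2) // was: #if defined(__gfx1030__)
    return __builtin_amdgcn_sdot4(a, b, c, false); // 4x int8 dot + accumulate
#else
    // Scalar fallback: multiply-accumulate four packed signed bytes.
    const int8_t * va = (const int8_t *) &a;
    const int8_t * vb = (const int8_t *) &b;
    for (int i = 0; i < 4; ++i) {
        c += va[i] * vb[i];
    }
    return c;
#endif
}
```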
Georgi Gerganov
e7e6487ba0
contrib : clarify PR squashing + module names ( #8630 )
* contrib : clarify PR squashing
* contrib : fix typo + add list of modules
2024-07-23 11:28:38 +03:00
luoyu-intel
063d99ad11
[SYCL] fix scratch size of softmax ( #8642 )
2024-07-23 15:43:28 +08:00
Keke Han
081fe431aa
llama : fix codeshell support ( #8599 )
* llama : fix codeshell support
* llama : move codeshell below smollm to respect the enum order
2024-07-22 19:43:43 +03:00
Jason Stillerman
d94c6e0ccb
llama : add support for SmolLm pre-tokenizer ( #8609 )
* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>
2024-07-22 17:43:01 +03:00
Jiří Podivín
566daa5a5b
*.py: Stylistic adjustments for python ( #8233 )
* Superfluous parens in conditionals were removed.
* Unused args in functions were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-22 23:44:53 +10:00
Georgi Gerganov
6f11a83e4e
llama : allow overrides for tokenizer flags ( #8614 )
ggml-ci
2024-07-22 13:33:22 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests ( #8611 )
* models : remove duplicated gpt-2 vocab
* models : remove old stablelm vocab
* tests : re-enable MPT tokenizer tests
* tests : re-enable DeepSeek tokenizer tests
* cmake : sort
ggml-ci
2024-07-22 13:32:49 +03:00
Douglas Hanley
50e05353e8
llama : add Mistral Nemo inference support ( #8604 )
2024-07-22 11:06:17 +03:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token ( #8619 )
2024-07-22 11:02:09 +03:00
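The point being documented, as a simplified sketch of the server's context shift (illustrative numbers, not verbatim server code): the first `n_keep` tokens survive a shift, so when the prompt starts with a BOS token, `n_keep` has to account for it.

```cpp
#include <cstdio>

// Simplified sketch of a context shift: keep the first n_keep tokens
// (BOS included, if one was added), discard half of what remains, and
// shift the rest down to make room.
int main() {
    const int n_past = 4096; // illustrative: the context is full
    const int n_keep = 64;   // must count the BOS token if one was added

    const int n_left    = n_past - n_keep;
    const int n_discard = n_left / 2;

    printf("keep [0, %d), discard [%d, %d), shift the remaining %d tokens down\n",
           n_keep, n_keep, n_keep + n_discard, n_left - n_discard);
    return 0;
}
```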
Mark Zhuang
04bab6b7da
ggml: fix compile error for RISC-V ( #8623 )
2024-07-22 10:56:45 +03:00