Commit graph

3487 commits

Author SHA1 Message Date
Wenjing Yu
a53d266f7b big cleanup 2024-07-26 17:47:11 -07:00
Zihao Chen
3ba087913f
Merge pull request #9 from zihaoccc/cleanup6
remove benchmark
2024-07-26 17:00:26 -07:00
Wenjing Yu
6ed7279adb remove benchmark 2024-07-26 16:59:48 -07:00
Zihao Chen
0f2350dcb6
Merge pull request #8 from zihaoccc/cleanup5
remove batch-benched
2024-07-26 16:51:16 -07:00
Wenjing Yu
5545e8ff92 remove batch-benched 2024-07-26 16:50:28 -07:00
Zihao Chen
5630607f2b
Merge pull request #7 from zihaoccc/cleanup4
remove batched
2024-07-26 16:47:56 -07:00
Wenjing Yu
4b850f0ce4 remove batched 2024-07-26 16:47:33 -07:00
Zihao Chen
715540a77b
Merge pull request #6 from zihaoccc/cleanup3
remove tests 2
2024-07-26 16:43:23 -07:00
Wenjing Yu
bf621daa86 remove tests 2 2024-07-26 16:42:54 -07:00
Zihao Chen
f7c0f9f576
Merge pull request #5 from zihaoccc/cleanup2
remove tests
2024-07-26 16:39:41 -07:00
Wenjing Yu
4810ab1aa1 remove tests 2024-07-26 16:38:13 -07:00
Zihao Chen
0e5165b605
Merge pull request #4 from zihaoccc/cleanup1
remove ci
2024-07-26 16:35:22 -07:00
Wenjing Yu
b76557d7c6 remove ci 2024-07-26 16:34:50 -07:00
Zihao Chen
cd78f93710
Merge pull request #3 from zihaoccc/cleanup
Cleanup baby-llama
2024-07-25 16:10:46 -07:00
Wenjing Yu
7addbe3e9d remove baby-llama 2024-07-25 16:10:05 -07:00
Zihao Chen
8fd767a557
Merge branch 'ggerganov:master' into master 2024-07-25 15:49:32 -07:00
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI (#8679)
* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Xuan Son Nguyen
41cd47caab
examples : export-lora : fix issue with quantized base models (#8687) 2024-07-25 23:49:39 +02:00
DavidKorczynski
49ce0ab6d4
ggml: handle ggml_init failure to fix NULL pointer deref (#8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.
2024-07-25 23:23:05 +02:00
Georgi Gerganov
4226a8d10e
llama : fix build + fix fabs compile warnings (#8683)
ggml-ci
2024-07-25 19:57:31 +03:00
Andreas (Andi) Kunar
bf5a81df37
ggml : fix build on Windows with Snapdragon X (#8531)
* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* WIndows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
Georgi Gerganov
88954f7fbd
tests : fix printfs (#8068) 2024-07-25 18:58:04 +03:00
Chen Xi
ed67bcb24f
[SYCL] fix multi-gpu issue on sycl (#8554)
---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
Georgi Gerganov
eddcb5238b
ggml : add and use ggml_cpu_has_llamafile() (#8664) 2024-07-25 12:37:42 +03:00
Xuan Son Nguyen
be6d7c0791
examples : remove finetune and train-text-from-scratch (#8669)
* examples : remove finetune and train-text-from-scratch

* fix build

* update help message

* fix small typo for export-lora
2024-07-25 10:39:04 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized (#8666)
Some checks failed
flake8 Lint / Lint (push) Has been cancelled
* docfix: imatrix readme, quantum models -> quantized models.

* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
Fan Shupei
8a4bad50a8
llama: use sliding window for phi3 (#8627)
* use sliding window for phi3

* fix typo, "data_swa" -> "data"

* [conver_hf_to_gguf.py] add phi3 sliding window
2024-07-25 10:21:09 +03:00
MorganRO8
68504f0970
readme : update games list (#8673)
Added link to game I made that depends on llama
2024-07-24 19:48:00 +03:00
Joe Todd
f19bf99c01
Build Llama SYCL Intel with static libs (#8668)
Ensure SYCL CI builds both static & dynamic libs for testing purposes

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-24 14:36:00 +01:00
Thorsten Sommer
3a7ac5300a
readme : update UI list [no ci] (#8505) 2024-07-24 15:52:30 +03:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral (#8657)
* fix `llama_chat_format_single` for mistral

* fix typo

* use printf
2024-07-24 13:48:46 +02:00
Joe Todd
79167d9e49
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667) 2024-07-24 11:55:26 +01:00
Xuan Son Nguyen
b115105f05
add llama_lora_adapter_clear (#8653) 2024-07-24 11:25:19 +02:00
Xuan Son Nguyen
de280085e7
examples : Fix llama-export-lora example (#8607)
* fix export-lora example

* add more logging

* reject merging subset

* better check

* typo
2024-07-23 23:48:37 +02:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI (#8646) 2024-07-23 17:37:42 +03:00
Joe Todd
64cf50a0ed
sycl : Add support for non-release DPC++ & oneMKL (#8644)
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-23 14:58:37 +01:00
Georgi Gerganov
938943cdbf
llama : move vocab, grammar and sampling into separate files (#8508)
* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up
2024-07-23 13:10:17 +03:00
0cc4m
751fcfc6c3
Vulkan IQ4_NL Support (#8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
Jeroen Mostert
46e47417aa
Allow all RDNA2 archs to use sdot4 intrinsic (#8629)
The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-23 10:50:40 +02:00
Georgi Gerganov
e7e6487ba0
contrib : clarify PR squashing + module names (#8630)
* contrib : clarify PR squashing

* contrib : fix typo + add list of modules
2024-07-23 11:28:38 +03:00
luoyu-intel
063d99ad11
[SYCL] fix scratch size of softmax (#8642) 2024-07-23 15:43:28 +08:00
Keke Han
081fe431aa
llama : fix codeshell support (#8599)
* llama : fix codeshell support

* llama : move codeshell after smollm below to respect the enum order
2024-07-22 19:43:43 +03:00
Jason Stillerman
d94c6e0ccb
llama : add support for SmolLm pre-tokenizer (#8609)
* Adding SmolLM Pre Tokenizer

* Update convert_hf_to_gguf_update.py

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* handle regex

* removed .inp and out .out ggufs

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-22 17:43:01 +03:00
Jiří Podivín
566daa5a5b
*.py: Stylistic adjustments for python (#8233)
* Superflous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var

Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-22 23:44:53 +10:00
Georgi Gerganov
6f11a83e4e
llama : allow overrides for tokenizer flags (#8614)
ggml-ci
2024-07-22 13:33:22 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests (#8611)
* models : remove duplicated gpt-2 vocab

* models : remove old stablelm vocab

* tests : re-enable MPT tokenizer tests

* tests : re-enable DeepSeek tokenizer tests

* cmake : sort

ggml-ci
2024-07-22 13:32:49 +03:00
Douglas Hanley
50e05353e8
llama : add Mistral Nemo inference support (#8604) 2024-07-22 11:06:17 +03:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token (#8619) 2024-07-22 11:02:09 +03:00
Mark Zhuang
04bab6b7da
ggml: fix compile error for RISC-V (#8623) 2024-07-22 10:56:45 +03:00
devojony
b7c11d36e6
examples: fix android example cannot be generated continuously (#8621)
When generation ends `completion_loop()` should return a NULL, not the empty string
2024-07-22 09:54:42 +03:00