Tony Wasserka
203b7f1531
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
...
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
2024-07-27 17:43:44 +03:00
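The fix above can be sketched as follows. This is an illustrative stand-in, not the actual ggml-vulkan code: the handle typedefs and `VK_NULL_HANDLE` below are stubs so the sketch is self-contained (in the real code they come from `<vulkan/vulkan.h>`, and cleanup calls `vkDestroyBuffer`/`vkFreeMemory`, which accept `VK_NULL_HANDLE` as a no-op).

```cpp
#include <cstddef>

// Stub stand-ins for the Vulkan handle types (assumption: real code uses
// the types from <vulkan/vulkan.h>).
using VkBuffer       = void *;
using VkDeviceMemory = void *;
static void * const VK_NULL_HANDLE = nullptr;

// Default-initializing every handle to VK_NULL_HANDLE means the cleanup
// path can run safely even when allocation failed part-way through
// (e.g. out of device memory in ggml_vk_create_buffer): destroying a
// VK_NULL_HANDLE is a defined no-op in Vulkan.
struct vk_buffer_struct {
    VkBuffer       buffer        = VK_NULL_HANDLE;
    VkDeviceMemory device_memory = VK_NULL_HANDLE;
    size_t         size          = 0;
};
```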
Borislav Stanimirov
d2b851bfa1
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)
2024-07-27 17:43:44 +03:00
Daniel Bevenius
c12b6e8ee7
ggml : remove unnecessary UNUSED macro call (ggml/880)
...
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-27 17:43:44 +03:00
Jeffrey Morgan
b5e95468b1
llama : add support for llama 3.1 rope scaling factors ( #8676 )
...
* Add llama 3.1 rope scaling factors to llama conversion and inference
This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* address comments
* address comments
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: compilade <git@compilade.net>
2024-07-27 15:03:45 +03:00
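The rope-factor computation described above can be sketched like this. The constants (scale factor 8, low/high frequency factors 1 and 4, original context length 8192, rope base 500000) are the published Llama 3.1 defaults and should be treated as assumptions here, not a copy of the conversion script:

```cpp
#include <cmath>
#include <vector>

// Per-frequency divisors for Llama 3.1-style rope scaling (sketch):
// high-frequency dims stay unscaled, low-frequency dims get the full
// scale factor, and a smooth interpolation bridges the two regimes.
std::vector<float> rope_factors(int head_dim, float base = 500000.0f) {
    const float scale_factor     = 8.0f;
    const float low_freq_factor  = 1.0f;
    const float high_freq_factor = 4.0f;
    const float old_ctx          = 8192.0f;
    const float pi               = 3.14159265358979323846f;

    const float low_freq_wavelen  = old_ctx / low_freq_factor;
    const float high_freq_wavelen = old_ctx / high_freq_factor;

    std::vector<float> factors;
    for (int i = 0; i < head_dim; i += 2) {
        const float freq    = 1.0f / std::pow(base, (float) i / head_dim);
        const float wavelen = 2.0f * pi / freq;
        if (wavelen < high_freq_wavelen) {
            factors.push_back(1.0f);          // high-frequency dims: unscaled
        } else if (wavelen > low_freq_wavelen) {
            factors.push_back(scale_factor);  // low-frequency dims: fully scaled
        } else {
            const float smooth = (old_ctx / wavelen - low_freq_factor)
                               / (high_freq_factor - low_freq_factor);
            factors.push_back(1.0f / ((1.0f - smooth) / scale_factor + smooth));
        }
    }
    return factors;
}
```

These divisors are what gets written as the rope-factors tensor at conversion time and applied by the rope operation at inference time.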
Georgi Gerganov
92090eca21
llama : add function for model-based max number of graph nodes ( #8622 )
...
* llama : model-based max number of graph nodes
ggml-ci
* llama : disable 405B max_nodes path due to lack of complaints
ggml-ci
2024-07-27 14:59:29 +03:00
Daniel Bevenius
9d03d085dd
common : add --no-warmup option for main/llama-cli ( #8712 )
...
This commit adds a --no-warmup option for llama-cli.
The motivation for this is that it can be convenient to skip the
warmup llama_decode call when debugging.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-27 13:45:02 +03:00
wangshuai09
bfb4c74981
cann: Fix Multi-NPU execution error ( #8710 )
...
* cann: fix multi-npu exec error
* cann: update comment for ggml_backend_cann_supports_buft
2024-07-27 16:36:44 +08:00
slaren
2b1f616b20
ggml : reduce hash table reset cost ( #8698 )
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
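The reset-cost idea above can be sketched as follows (names are illustrative, not the exact ggml API): track occupancy in a separate bitset, so resetting the hash set clears only the bitset instead of the whole keys array, leaving stale keys in place until their bit is set again.

```cpp
#include <cstdint>
#include <cstring>

// Hash set with an occupancy bitset (sketch). Stale entries in `keys`
// are harmless while their corresponding bit is 0.
struct hash_set {
    size_t     size;
    uint32_t * used;  // bitset: one bit per slot
    void **    keys;  // only meaningful where the bit is set
};

// O(size/32) words cleared instead of O(size) pointers.
static inline void hash_set_reset(hash_set * s) {
    memset(s->used, 0, ((s->size + 31) / 32) * sizeof(uint32_t));
}

static inline bool slot_used(const hash_set * s, size_t i) {
    return (s->used[i / 32] >> (i % 32)) & 1;
}

static inline void slot_set(hash_set * s, size_t i, void * key) {
    s->used[i / 32] |= 1u << (i % 32);
    s->keys[i] = key;
}
```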
caitianchi
65f7455cea
Modify 2 notes
2024-07-26 21:49:23 +08:00
caitianchi
f3d400dac0
remove uhd_image_embed
2024-07-26 21:15:03 +08:00
Judd
01245f5b16
llama : fix order of parameters ( #8706 )
...
The corrected parameter order matches the documented usage of `aclrtGetMemInfo`:
https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-26 11:38:12 +03:00
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI ( #8679 )
...
* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Xuan Son Nguyen
41cd47caab
examples : export-lora : fix issue with quantized base models ( #8687 )
2024-07-25 23:49:39 +02:00
DavidKorczynski
49ce0ab6d4
ggml: handle ggml_init failure to fix NULL pointer deref ( #8692 )
...
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref would happen later in the code during a call to `ggml_set_no_alloc`.
This fixes it by bailing out if no context is found.
2024-07-25 23:23:05 +02:00
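The failure mode and the bail-out pattern above can be illustrated with a minimal stand-in (hypothetical names; the sketch mimics ggml's fixed pool of contexts, where init fails once the pool is exhausted):

```cpp
// Minimal context pool: "init" returns nullptr when no unused context
// is available, mirroring how ggml_init can fail.
struct context { bool used = false; };
static context g_pool[2];

static context * pool_init() {
    for (auto & c : g_pool) {
        if (!c.used) { c.used = true; return &c; }
    }
    return nullptr;  // no unused context found
}

// The guarded pattern from the fix: check for nullptr before any use,
// instead of dereferencing the context later and crashing.
static bool safe_build() {
    context * ctx = pool_init();
    if (ctx == nullptr) {
        return false;  // bail out early
    }
    // ... would configure and use ctx here ...
    return true;
}
```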
Georgi Gerganov
4226a8d10e
llama : fix build + fix fabs compile warnings ( #8683 )
...
ggml-ci
2024-07-25 19:57:31 +03:00
Andreas (Andi) Kunar
bf5a81df37
ggml : fix build on Windows with Snapdragon X ( #8531 )
...
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit bf21397ae5.
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* WIndows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
Georgi Gerganov
88954f7fbd
tests : fix printfs ( #8068 )
2024-07-25 18:58:04 +03:00
Chen Xi
ed67bcb24f
[SYCL] fix multi-gpu issue on sycl ( #8554 )
...
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
Georgi Gerganov
eddcb5238b
ggml : add and use ggml_cpu_has_llamafile() ( #8664 )
2024-07-25 12:37:42 +03:00
Xuan Son Nguyen
be6d7c0791
examples : remove finetune and train-text-from-scratch ( #8669 )
...
* examples : remove finetune and train-text-from-scratch
* fix build
* update help message
* fix small typo for export-lora
2024-07-25 10:39:04 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized ( #8666 )
...
* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
caitianchi
72b962925b
delete minicpmv-wrapper in pr
2024-07-25 16:01:26 +08:00
caitianchi
107e1edb20
fix uhd code for review comment
2024-07-25 15:22:11 +08:00
Fan Shupei
8a4bad50a8
llama: use sliding window for phi3 ( #8627 )
...
* use sliding window for phi3
* fix typo, "data_swa" -> "data"
* [convert_hf_to_gguf.py] add phi3 sliding window
2024-07-25 10:21:09 +03:00
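The sliding-window masking referred to above boils down to a simple predicate (a sketch of the general technique; the exact boundary handling in the phi3 implementation is an assumption here):

```cpp
// Causal sliding-window attention mask: token i may attend to token j
// only if j is not in the future and lies within the window.
static bool swa_allowed(int i, int j, int window) {
    return j <= i && (i - j) < window;
}
```

Everything outside the window is masked out (set to -inf before the softmax), so attention cost stays bounded by the window size rather than the full context.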
MorganRO8
68504f0970
readme : update games list ( #8673 )
...
Added link to game I made that depends on llama
2024-07-24 19:48:00 +03:00
Joe Todd
f19bf99c01
Build Llama SYCL Intel with static libs ( #8668 )
...
Ensure SYCL CI builds both static & dynamic libs for testing purposes
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-24 14:36:00 +01:00
Thorsten Sommer
3a7ac5300a
readme : update UI list [no ci] ( #8505 )
2024-07-24 15:52:30 +03:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral ( #8657 )
...
* fix `llama_chat_format_single` for mistral
* fix typo
* use printf
2024-07-24 13:48:46 +02:00
Joe Todd
79167d9e49
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS ( #8667 )
2024-07-24 11:55:26 +01:00
Xuan Son Nguyen
b115105f05
add llama_lora_adapter_clear ( #8653 )
2024-07-24 11:25:19 +02:00
Xuan Son Nguyen
de280085e7
examples : Fix llama-export-lora example ( #8607 )
...
* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
2024-07-23 23:48:37 +02:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI ( #8646 )
2024-07-23 17:37:42 +03:00
Joe Todd
64cf50a0ed
sycl : Add support for non-release DPC++ & oneMKL ( #8644 )
...
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-23 14:58:37 +01:00
Georgi Gerganov
938943cdbf
llama : move vocab, grammar and sampling into separate files ( #8508 )
...
* llama : move sampling code into llama-sampling
ggml-ci
* llama : move grammar code into llama-grammar
ggml-ci
* cont
ggml-ci
* cont : pre-fetch rules
* cont
ggml-ci
* llama : deprecate llama_sample_grammar
* llama : move tokenizers into llama-vocab
ggml-ci
* make : update llama.cpp deps [no ci]
* llama : redirect external API to internal APIs
ggml-ci
* llama : suffix the internal APIs with "_impl"
ggml-ci
* llama : clean-up
2024-07-23 13:10:17 +03:00
0cc4m
751fcfc6c3
Vulkan IQ4_NL Support ( #8613 )
...
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
Jeroen Mostert
46e47417aa
Allow all RDNA2 archs to use sdot4 intrinsic ( #8629 )
...
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-23 10:50:40 +02:00
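For reference, what the intrinsic computes can be written portably as below (assumption: `__builtin_amdgcn_sdot4` is a packed signed 8-bit dot product with a 32-bit accumulator; this reference runs anywhere, unlike the intrinsic, which needs RDNA2 hardware):

```cpp
#include <cstdint>

// Portable reference for the sdot4 operation: treat a and b as four
// packed signed 8-bit lanes, multiply lane-wise, and accumulate into c.
static int32_t sdot4_ref(int32_t a, int32_t b, int32_t c) {
    int32_t sum = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = (int8_t) ((uint32_t) a >> (8 * i));
        const int8_t bi = (int8_t) ((uint32_t) b >> (8 * i));
        sum += (int32_t) ai * (int32_t) bi;
    }
    return sum;
}
```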
Georgi Gerganov
e7e6487ba0
contrib : clarify PR squashing + module names ( #8630 )
...
* contrib : clarify PR squashing
* contrib : fix typo + add list of modules
2024-07-23 11:28:38 +03:00
luoyu-intel
063d99ad11
[SYCL] fix scratch size of softmax ( #8642 )
2024-07-23 15:43:28 +08:00
caitianchi
6fd0937e9f
remove the extern "C", MINICPMV_API
2024-07-23 15:25:32 +08:00
caitianchi
fcde997126
remove load_image_size into clip_ctx
2024-07-23 15:24:43 +08:00
caitianchi
3642be9937
fix KEY_HAS_MINICPMV_PROJ
2024-07-23 14:55:55 +08:00
caitianchi
dad4abe1bc
add warn
2024-07-23 11:57:42 +08:00
caitianchi
62fa15bcd2
fix cmakefile
2024-07-23 11:52:34 +08:00
Keke Han
081fe431aa
llama : fix codeshell support ( #8599 )
...
* llama : fix codeshell support
* llama : move codeshell after smollm below to respect the enum order
2024-07-22 19:43:43 +03:00
Jason Stillerman
d94c6e0ccb
llama : add support for SmolLm pre-tokenizer ( #8609 )
...
* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and out .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>
2024-07-22 17:43:01 +03:00
caitianchi
4c755832fe
remove the directory entry on line 33 of the top-level CMakeLists.txt (the main one, not the example's)
2024-07-22 21:44:56 +08:00
Jiří Podivín
566daa5a5b
*.py: Stylistic adjustments for python ( #8233 )
...
* Superflous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-22 23:44:53 +10:00
caitianchi
be8b5b2f8d
fix code review
2024-07-22 21:34:21 +08:00
Georgi Gerganov
6f11a83e4e
llama : allow overrides for tokenizer flags ( #8614 )
...
ggml-ci
2024-07-22 13:33:22 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests ( #8611 )
...
* models : remove duplicated gpt-2 vocab
* models : remove old stablelm vocab
* tests : re-enable MPT tokenizer tests
* tests : re-enable DeepSeek tokenizer tests
* cmake : sort
ggml-ci
2024-07-22 13:32:49 +03:00