Commit graph

4467 commits

Author SHA1 Message Date
Eve
c9463641af Merge https://github.com/ggerganov/llama.cpp into vulkan 2025-01-08 21:59:37 -05:00
Eve
923e9a8377 q3_k use hmask simd from cpu avx version 2025-01-08 21:51:27 -05:00
Eve
fe71a8c4a1 q3_k optimizations 2025-01-08 21:51:27 -05:00
Eve
cc28742ca3 q2_k better dequant 2025-01-08 21:51:27 -05:00
Eve
91f1d9ce99 better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations 2025-01-08 21:51:27 -05:00
Eve
6f5d62b098 q5_k 2025-01-08 21:51:27 -05:00
Eve
cdf70cf27f better q4_k scales 2025-01-08 21:51:27 -05:00
Eve
b4ae7005e6 unpack should be u16, add vim swap to gitignore (about time) 2025-01-08 21:51:27 -05:00
Eve
173077180f Revert "try precalculating products of a and q2_k scales"
This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.
2025-01-08 21:51:27 -05:00
Eve
bdd98c74e2 try precalculating products of a and q2_k scales 2025-01-08 21:51:27 -05:00
hydai
8d59d91171
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai <z54981220@gmail.com>
2025-01-08 20:03:28 +00:00
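For context: before C++17, `static_assert` requires a message operand, so an assert written without one fails to compile on older standards. A minimal illustration (the condition shown is hypothetical, not the one from the patch):

```cpp
#include <cstdint>

// Pre-C++17, the two-argument form is mandatory; the fix supplies the
// message string one assert was missing. Condition here is illustrative.
static_assert(sizeof(std::uint64_t) == 8, "uint64_t must be 8 bytes");
```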
Vinesh Janarthanan
8a1d9c25fa
gguf-py : move scripts directory (#11116)
* Moved scripts dir and fixed pyproject.toml

* updated readme

* fixed README urls

* bump pypi gguf to v0.14.0

* retrigger ci

* empty commit - trigger ci
2025-01-08 20:54:58 +02:00
Eric Curtin
1bf839b1e8
Enhance user input handling for llama-run (#11138)
The main motivation for this change is that it was not handling
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF,
"/bye" command, and empty input cases. Introduce `get_user_input`
function to manage user input loop and handle different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-08 18:47:05 +00:00
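A minimal sketch of the input flow described above, using the function names from the commit message; the return-code convention and exact quit handling are assumptions, not the actual llama-run code:

```cpp
#include <iostream>
#include <string>

// Read one line, distinguishing EOF (ctrl-d), the "/bye" command, and
// empty input. The return codes are an assumption for this sketch.
static int read_user_input(std::string & user) {
    std::getline(std::cin, user);
    if (std::cin.eof() || user == "/bye") {
        return 1; // quit the chat loop
    }
    if (user.empty()) {
        return 2; // nothing typed: prompt again
    }
    return 0; // usable input
}

// Loop until we either get a usable line or are asked to quit.
static bool get_user_input(std::string & user) {
    for (;;) {
        switch (read_user_input(user)) {
            case 0:  return true;  // got input
            case 1:  return false; // EOF or /bye
            default: break;        // empty input: retry
        }
    }
}
```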
Xuan Son Nguyen
f7cd13301c
ci : use actions from ggml-org (#11140) 2025-01-08 16:09:20 +01:00
Xuan Son Nguyen
4d2b3d8804
lora : improve compat with mergekit-extract-lora (#11131)
* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints
2025-01-08 15:59:53 +01:00
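The `lora->get_scale` mentioned above most likely computes the conventional LoRA scaling factor; a sketch under that assumption (struct and field names hypothetical):

```cpp
#include <cstdint>

// Hypothetical reconstruction of the per-adapter scale: the usual
// LoRA formula scales by alpha/rank when alpha is set.
struct lora_weight_sketch {
    float   alpha; // adapter's LoRA alpha hyperparameter (0 if unset)
    int64_t rank;  // rank of the low-rank decomposition

    float get_scale(float adapter_scale) const {
        return alpha != 0.0f ? adapter_scale * alpha / (float) rank
                             : adapter_scale;
    }
};
```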
Georgi Gerganov
c07d437bbd
llama : avoid hardcoded QK_K (#11061)
ggml-ci
2025-01-08 16:19:36 +02:00
Georgi Gerganov
99a3755a3c
sync : ggml 2025-01-08 13:40:30 +02:00
Radoslav Gerganov
c792dcf488
ggml : allow loading backend with env variable (ggml/1059)
ref: #1058
2025-01-08 13:40:18 +02:00
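A sketch of what loading a backend from an environment variable can look like with the public registry API; the variable name below is an assumption, not necessarily the one introduced in ggml/1059:

```cpp
#include <cstdlib>

#include "ggml-backend.h"

int main() {
    // Hypothetical variable name; the patch defines the actual one.
    if (const char * path = std::getenv("GGML_BACKEND_PATH")) {
        // ggml_backend_load() dlopens the library and registers the backend.
        ggml_backend_reg_t reg = ggml_backend_load(path);
        (void) reg;
    }
    return 0;
}
```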
Xuan Son Nguyen
80ccf5d725
ci : pin dependency to specific version (#11137)
* ci : pin dependency to specific version

* will this fix ec?
2025-01-08 12:07:20 +01:00
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples

ggml-ci

* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
amritahs-ibm
8cef75c743
llamafile : ppc64le MMA INT8 implementation (#10912)
This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the quantised int8 data type.

It results in a 10% - 70% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-01-08 12:54:19 +02:00
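For readers unfamiliar with the POWER10 MMA builtins these kernels build on, a minimal sketch of the int8 accumulate pattern (GCC builtins, compiled with -mcpu=power10; this shows the builtin family, not the actual kernel):

```cpp
#include <altivec.h>

// Accumulate one 4x4 int32 tile from int8 inputs using the MMA unit.
// 'out' must have room for 16 int32 values (four 128-bit vectors).
void mma_i8_tile(vector unsigned char a, vector unsigned char b, int * out) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);            // zero the 512-bit accumulator
    __builtin_mma_xvi8ger4pp(&acc, a, b);     // int8 outer-product accumulate
    __builtin_mma_disassemble_acc(out, &acc); // spill accumulator to memory
}
```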
Georgi Gerganov
0d52a69e4b
ci : fix cmake option (#11125) 2025-01-08 11:29:34 +02:00
Mathieu Baudier
02f0430141
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
2025-01-08 09:18:13 +01:00
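The GLSL `GL_KHR_cooperative_matrix` feature is backed by the `VK_KHR_cooperative_matrix` device extension; a sketch of a runtime availability check (illustrative only; the patch's check happens at shader-generation/build time):

```cpp
#include <vulkan/vulkan.h>

#include <cstring>
#include <vector>

// Query the device's extension list for cooperative matrix support.
bool has_coopmat(VkPhysicalDevice dev) {
    uint32_t n = 0;
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &n, nullptr);
    std::vector<VkExtensionProperties> exts(n);
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &n, exts.data());
    for (const auto & e : exts) {
        if (std::strcmp(e.extensionName, "VK_KHR_cooperative_matrix") == 0) {
            return true;
        }
    }
    return false;
}
```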
ag2s20150909
bec2183f2c
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
* fix: Vulkan shader gen binary path when cross compiling
2025-01-08 09:17:29 +01:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
2025-01-07 18:01:58 +01:00
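With `gguf_get_data` gone, metadata goes through the typed reader API; a sketch of post-refactor usage, assuming the new standalone `gguf.h` header:

```cpp
#include "gguf.h"

#include <cstdint>

// Open a GGUF file without allocating tensor data and count its KV pairs.
void dump_kv_count(const char * fname) {
    gguf_context * ctx = gguf_init_from_file(fname, { /*no_alloc =*/ true, /*ctx =*/ nullptr });
    if (ctx == nullptr) {
        return;
    }
    const int64_t n_kv = gguf_get_n_kv(ctx); // typed getters replace raw data access
    (void) n_kv;
    gguf_free(ctx);
}
```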
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124) 2025-01-07 16:11:57 +01:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120) 2025-01-07 12:38:05 +01:00
Radoslav Gerganov
a4dd490069
rpc : code cleanup (#11107)
Remove duplicated macros, use GGML_LOG_ERROR for errors
2025-01-07 08:37:02 +02:00
Akarshan Biswas
c0d6f790d0
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087)
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6

* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"

This reverts commit f62dc45f31.

* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
2025-01-07 14:26:07 +08:00
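The deprecated/replacement pair in question, shown in a minimal SYCL kernel (illustrative; the wkv6 kernel itself is more involved). `data` is assumed to be a USM device pointer and `n` a multiple of the work-group size:

```cpp
#include <sycl/sycl.hpp>

void scale_usm(sycl::queue & q, float * data, size_t n) {
    q.submit([&](sycl::handler & h) {
        sycl::local_accessor<float, 1> tmp(sycl::range<1>(256), h);
        h.parallel_for(sycl::nd_range<1>(n, 256), [=](sycl::nd_item<1> it) {
            // old (deprecated in SYCL 2020): float * p = tmp.get_pointer();
            float * p = tmp.get_multi_ptr<sycl::access::decorated::no>().get();
            const size_t l = it.get_local_id(0);
            p[l] = data[it.get_global_id(0)] * 2.0f; // stage in local memory
            data[it.get_global_id(0)] = p[l];
        });
    });
}
```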
Eric Curtin
dc7cef9f37
llama-run : fix context size (#11094)
Set `n_ctx` equal to `n_batch` in the `Opt` class. The context size is
now a more reasonable 2048.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-06 23:45:28 +01:00
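The fix in sketch form, using the field names from the commit message (the real `Opt` class has more members):

```cpp
// llama-run option defaults: tie the context size to the batch size
// so it comes out at a more reasonable 2048.
struct Opt {
    int n_batch = 2048;
    int n_ctx   = n_batch; // context now matches batch size
};
```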
Georgi Gerganov
ecebbd292d
llama : remove unused headers (#11109)
ggml-ci
2025-01-06 17:52:35 +02:00
Xuan Son Nguyen
96be8c3264
github : add cmd line field to bug report (#11090)
* github : cmd line to bug report

* codeowners : (@ngxson) only watch dockerfile

* Apply suggestions from code review [no ci]

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* rm cmd in log output [no ci]

* rm 2 [no ci]

* no need backticks [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-06 16:34:49 +01:00
Georgi Gerganov
e6e7c75d94
server : fix extra BOS in infill endpoint (#11106)
* server : fix extra BOS in infill endpoint

ggml-ci

* server : update infill tests
2025-01-06 15:36:08 +02:00
Xuan Son Nguyen
09186fabbe
llama : remove check flash_attn with lora (#11104) 2025-01-06 13:41:12 +01:00
Asghar Ghorbani
96a1dc27c3
llama : prevent system info string accumulation across calls (#11101) 2025-01-06 13:21:46 +02:00
Daniel Bevenius
6369f867a4
llama : rename missed batch params/vars to ubatch (#10059)
This commit renames the `batch` parameter to `ubatch` in the
`llama_kv_cache_find_slot`, `llm_build_inp_embd`, and
`llm_build_mamba` functions.

The motivation for this is that this should have been done as part of
Commit 19d900a756 ("llama : rename batch
to ubatch (#9950)") but for some reason I missed these functions in
that commit and only noticed them now (sorry).
2025-01-06 11:28:17 +02:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names (#11063)
* llama : deprecate llama_free_model, add llama_model_free

ggml-ci

* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

ggml-ci
2025-01-06 10:55:18 +02:00
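The rename in practice; both old names stay available as deprecated entry points per the commit message, so migrating is mechanical:

```cpp
#include "llama.h"

int main(int argc, char ** argv) {
    (void) argc;
    llama_model_params params = llama_model_default_params();
    // old: llama_model * model = llama_load_model_from_file(argv[1], params);
    llama_model * model = llama_model_load_from_file(argv[1], params);
    // old: llama_free_model(model);
    llama_model_free(model);
    return 0;
}
```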
Georgi Gerganov
3e6e7a6bc2
tokenize : escape the prompt (#11058)
* tokenize : escape the prompt

* tokenize : update help
2025-01-06 10:54:25 +02:00
Georgi Gerganov
ae2f606bb5
mmap : fix fileno macro clash (#11076)
* mmap : fix fileno macro clash

ggml-ci

* cont

ggml-ci
2025-01-06 10:52:38 +02:00
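For background, the clash arises because some libcs define `fileno` as a function-like macro, so any identifier with that name followed by `(` gets rewritten by the preprocessor. A hypothetical reconstruction (the patch's actual identifier and rename may differ):

```cpp
#include <cstdio>

struct llama_file_sketch {
    FILE * fp;
    // int fileno() const;   // breaks where <stdio.h> defines fileno as a macro
    int file_id() const {    // macro-safe replacement name
        return fileno(fp);   // the libc function/macro still works here
    }
};
```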
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL (#11062)
ggml-ci
2025-01-06 10:52:15 +02:00
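Trivial but clarifying: the sentinel now has a name. The pattern the change standardizes:

```cpp
#include "llama.h"

// Compare against the named constant rather than a bare -1.
static bool token_is_null(llama_token tok) {
    return tok == LLAMA_TOKEN_NULL;
}
```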
Georgi Gerganov
5047dd3546
llama : use _impl suffix instead of _internal (#11060)
ggml-ci
2025-01-06 10:52:01 +02:00
Johannes Gäßler
46e3556e01
CUDA: add BF16 support (#11093)
* CUDA: add BF16 support
2025-01-06 02:33:52 +01:00
Eve
c01ccf8288 little stuff 2025-01-04 21:41:50 -05:00
Eve
d70a731639 q2_k 2025-01-04 21:00:45 -05:00
Eve
07d0d58bef q3_k 2025-01-04 20:50:59 -05:00
Eve
b0e4ccbeb9 revert it 2025-01-04 20:50:59 -05:00
Eve
21c6b805c9 q4_k test (slow) 2025-01-04 20:50:44 -05:00
0cc4m
b56f079e28
Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074)
* Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver

* Add (TM) to AMD name check
2025-01-04 21:09:59 +01:00
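A sketch of how a driver-specific opt-out like this can be detected at runtime via `VkPhysicalDeviceDriverProperties` (Vulkan >= 1.2); the actual per-device blacklist lives in the patch and is not reproduced here:

```cpp
#include <vulkan/vulkan.h>

bool coopmat_allowed(VkPhysicalDevice dev) {
    VkPhysicalDeviceDriverProperties drv = {};
    drv.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &drv;

    vkGetPhysicalDeviceProperties2(dev, &props2);
    // Skip cooperative matrix on the AMD proprietary driver.
    return drv.driverID != VK_DRIVER_ID_AMD_PROPRIETARY;
}
```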
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 (#11049)
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type

* vocab : add DeepSeek V3 pre-tokenizer regexes

* unicode : handle ACCENT_MARK and SYMBOL categories in regex

* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-04 21:06:11 +01:00
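On EXPERT_GATING_FUNC: DeepSeek V3 scores experts with a sigmoid plus a per-expert selection bias (the FFN_EXP_PROBS_B tensor) rather than a softmax. A sketch of the scoring math only, not the ggml graph code:

```cpp
#include <cmath>
#include <vector>

// Sigmoid gating: the bias shifts which experts are selected; the
// unbiased sigmoid outputs are what get normalized into routing weights.
std::vector<float> expert_selection_scores(const std::vector<float> & logits,
                                           const std::vector<float> & bias) {
    std::vector<float> scores(logits.size());
    for (size_t i = 0; i < logits.size(); ++i) {
        scores[i] = 1.0f / (1.0f + std::exp(-logits[i])) + bias[i];
    }
    return scores;
}
```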
Eve
6b06d16890 16 bit unpack 2025-01-04 14:29:28 -05:00