Commit graph

4826 commits

Author SHA1 Message Date
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
* Add apple arm to presets

* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix
2024-11-02 18:34:56 +02:00
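The slot-selection fix above concerns how the server picks a slot for an incoming request. As a purely illustrative aid (not the server's actual code), a least-recently-used choice among idle slots can be sketched as follows; `Slot`, `is_idle`, and `t_last_used` are hypothetical names.

```cpp
// Hypothetical sketch of LRU slot selection; not llama.cpp server code.
#include <cstdint>
#include <vector>

struct Slot {
    int     id          = -1;
    bool    is_idle     = true;  // slot is not currently processing a request
    int64_t t_last_used = 0;     // timestamp of last use; smaller == older
};

// Pick the idle slot that has been unused the longest; nullptr if none is idle.
static Slot * select_slot_lru(std::vector<Slot> & slots) {
    Slot * best = nullptr;
    for (auto & s : slots) {
        if (s.is_idle && (best == nullptr || s.t_last_used < best->t_last_used)) {
            best = &s;
        }
    }
    return best;
}
```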
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
ggml-ci
2024-11-02 18:34:00 +02:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings (#10136)
* llama : adjust default context size + print warnings

ggml-ci

* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt (#10129) 2024-11-02 13:08:53 +01:00
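The BOS fix above is about not re-inserting the beginning-of-sequence token on every chat turn. A hedged sketch of that pattern, assuming the C API of this period (`llama_get_kv_cache_used_cells`, `llama_tokenize`) and a hypothetical helper name; this is not the actual patch from #10129.

```cpp
// Sketch only: request BOS (add_special) only when the KV cache is still empty,
// i.e. when tokenizing the very first prompt of the conversation.
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

static std::vector<llama_token> tokenize_turn(llama_context * ctx,
                                              const llama_model * model,
                                              const std::string & text) {
    const bool is_first = llama_get_kv_cache_used_cells(ctx) == 0;

    std::vector<llama_token> tokens(text.size() + 8); // rough upper bound
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special =*/ is_first,
                                 /*parse_special =*/ true);
    tokens.resize(std::max(n, 0)); // a real caller would retry on n < 0
    return tokens;
}
```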
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional (#10110)
* convert-lora : make `--base` optional

* lint

* handle case where base_model_name_or_path is invalid

* do not include metadata from base model

* clarify unspecified --base

* add small comment [no ci]

* trigger ci
2024-11-02 12:53:17 +01:00
Diego Devesa
a6744e43e8
llama : add simple-chat example (#10124)
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-01 23:50:59 +01:00
Diego Devesa
e991e3127f
llama : use smart pointers for ggml resources (#10117) 2024-11-01 23:48:26 +01:00
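The smart-pointer commit above ties ggml resource lifetimes to RAII. A minimal sketch of the general idea, not the code from #10117: own a `ggml_context` through `std::unique_ptr` with a custom deleter so it is freed on every exit path.

```cpp
#include <cstddef>
#include <memory>
#include "ggml.h"

// Deleter that releases a ggml context via ggml_free().
struct ggml_context_deleter {
    void operator()(ggml_context * ctx) const { ggml_free(ctx); }
};
using ggml_context_ptr = std::unique_ptr<ggml_context, ggml_context_deleter>;

// Create a context whose memory is released automatically when the pointer
// goes out of scope, including on early returns.
static ggml_context_ptr make_ctx(size_t mem_size) {
    ggml_init_params params = {
        /*.mem_size   =*/ mem_size,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    return ggml_context_ptr(ggml_init(params));
}
```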
Shupei Fan
418f5eef26
vulkan : improve ggml_vk_create_buffer error handling (#9898) 2024-11-01 19:33:14 +01:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics 2024-11-01 17:31:51 +02:00
sasha0552
d865d1478c
server : fix smart selection of available slot (#10120)
* Fix smart selection of available slot

* minor fix

* replace vectors of tokens with shorthands
2024-11-01 14:33:14 +01:00
Georgi Gerganov
1804adb0cf
ggml : remove ggml_scratch (#10121)
ggml-ci
2024-11-01 12:58:45 +02:00
Georgi Gerganov
815fe72adc
sync : ggml 2024-11-01 10:28:24 +02:00
Georgi Gerganov
f221d56220
ggml : alloc ggml_contexts on the heap (whisper/2525) 2024-11-01 10:24:50 +02:00
Zhenwei Jin
e597e50794
build: fix build error in Windows env with OneAPI setup (#10107) 2024-11-01 11:09:59 +08:00
Diego Devesa
85679d37f3
llama : improve output buffer type selection (#10098) 2024-11-01 00:49:53 +01:00
Diego Devesa
1e9f94994e
quantize : fix --keep-split (#10114) 2024-11-01 00:45:34 +01:00
Diego Devesa
c02e5ab2a6
llama : fix buffer checks for mamba and rwk (#10111)
* llama : fix buffer checks for mamba and rwk

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
2024-10-31 22:54:23 +01:00
Zhenwei Jin
ab3d71f97f
loader: refactor tensor weights storage (#9935)
* loader: refactor tensor weights storage

* use sorted map, sort weights by layer

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-10-31 19:50:39 +01:00
ochafik
bc52c0a4f0 agent: add missing tool name in response! 2024-10-31 15:01:17 +00:00
ochafik
479c1520b1 tool-call: fix qwen template test 2024-10-31 14:49:59 +00:00
ochafik
fe967b61a1 Update README.md 2024-10-31 14:37:55 +00:00
ochafik
f5f74751b9 nits 2024-10-31 14:28:52 +00:00
ochafik
c4a8050120 Update README.md 2024-10-31 14:27:40 +00:00
ochafik
9477c54676 tool-call: functionary-small-v3.2 test now green 2024-10-31 14:11:34 +00:00
ochafik
b35aa4ae1c tool-call: add LLAMA_UPDATE_GOLDENS env for test-chat-template 2024-10-31 13:53:33 +00:00
ochafik
c773516d57 tool-call: don't use -fa w/ Mistral-Nemo (hard crashes?) 2024-10-31 13:53:11 +00:00
ochafik
f5b7825595 tool-call: code_interpreter & system + tool call support for all jinja templates! 2024-10-31 13:52:46 +00:00
ochafik
c395d4804f tool-call: behaviour-based detection of template features 2024-10-31 13:45:10 +00:00
Kevin Gibbons
0a683e8088
server : include scheme when printing URL (#10106) 2024-10-31 14:02:35 +01:00
Diego Devesa
dea5e86051
ggml : check tensor name lengths in gguf files (#10100) 2024-10-31 11:40:59 +01:00
Sergio López
1329c0a75e
kompute: add mul_mat_q4_k shader (#10097)
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-31 11:09:52 +02:00
ochafik
e8d9d711f6 Update tool_call.feature 2024-10-31 04:50:38 +00:00
ochafik
7d9c90f46b tool-call: nemo tweak (accept raw sql again) 2024-10-31 04:39:40 +00:00
ochafik
542853b34b tool-call: greedy sampling in server tests + tweak prompt 2024-10-31 04:38:22 +00:00
ochafik
be9de3ed8a Update llama-sampling.cpp 2024-10-31 03:58:15 +00:00
ochafik
61655b9cdd Merge remote-tracking branch 'origin/master' into tool-call 2024-10-31 01:45:07 +00:00
Olivier Chafik
e4d5449638 tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf 2024-10-30 21:40:15 +00:00
Sergio López
61408e7fad
kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-30 17:01:52 +01:00
Diego Devesa
b9e02e8184
ggml : fix memory leaks when loading invalid gguf files (#10094)
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-10-30 14:51:21 +01:00
ochafik
5227321dfd tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py 2024-10-30 12:40:22 +00:00
ochafik
35ac17f3f1 tool-call: fix missing initializer errors 2024-10-30 12:38:34 +00:00
Rich Dougherty
6763f713bb
readme : more lora detail in main example readme (#10064) 2024-10-30 13:22:39 +01:00
Rich Dougherty
79a2bc042d
convert : more detailed convert lora usage docs (#10065) 2024-10-30 13:22:21 +01:00
ochafik
3ebdb2b805 tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template 2024-10-30 10:07:10 +00:00
xctan
fc83a9e584
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-10-30 09:00:40 +02:00
Diego Devesa
c5b0f4b5d9
llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
Olivier Chafik
92c384a5e8 nits 2024-10-29 17:24:59 +00:00
Olivier Chafik
773ff91b7a tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing 2024-10-29 15:26:51 +00:00
Olivier Chafik
fa4c1119c9 tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call) 2024-10-29 15:25:37 +00:00