ochafik
238b9689e0
Update test_chat_completion.py
2024-12-30 04:59:13 +00:00
ochafik
389d79b6b4
Try to work around MSVC++ non-macro max resolution quirk
2024-12-30 04:50:20 +00:00
ochafik
ce48584f7d
No designated initializers yet
2024-12-30 04:19:33 +00:00
ochafik
06b5159560
Avoid print in get_hf_chat_template.py
2024-12-30 04:10:35 +00:00
ochafik
80138d9007
Add missing <optional> include
2024-12-30 04:10:20 +00:00
ochafik
e5113e8d74
Add --jinja and --chat-template-file flags
2024-12-30 03:50:51 +00:00
ochafik
abd274a48f
Copy minja from 58f0ca6dd7
2024-12-30 03:50:51 +00:00
Jeff Bolz
a813badbbd
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
2024-12-29 10:16:34 +01:00
Jeff Bolz
fdd2188912
vulkan: Use push constant offset to handle misaligned descriptors (#10987)
2024-12-29 09:35:11 +01:00
Isaac McFadyen
f865ea149d
server: added more docs for response_fields field (#10995)
2024-12-28 16:09:19 +01:00
Alexey Parfenov
16cdce7b68
server : fix token duplication when streaming with stop strings (#10997)
2024-12-28 16:08:54 +01:00
ochafik
523ebf8cba
Simplify tool call grammars when there's only 1 tool
2024-12-27 02:20:52 +00:00
ochafik
a2fe8a4922
Fix tool-call server tests
2024-12-27 02:15:43 +00:00
ochafik
0a5d527508
Update fetch_server_test_models.py
2024-12-27 00:58:59 +00:00
ochafik
0e87ae24cd
rm trailing spaces
2024-12-27 00:07:58 +00:00
ochafik
f645887e0c
Update minja.hpp 202aa2f3de
2024-12-26 21:36:34 +00:00
ochafik
f0bd69380b
Update test-tool-call.cpp
2024-12-26 21:26:25 +00:00
ochafik
e70ce3f613
Merge remote-tracking branch 'origin/master' into tool-call
2024-12-26 21:26:21 +00:00
Eve
d79d8f39b4
vulkan: multi-row k quants (#10846)
* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
2024-12-26 16:54:44 +01:00
Peter
d283d02bf2
examples, ggml : fix GCC compiler warnings (#10983)
Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct fields except the first)
2024-12-26 14:59:11 +01:00
Reza Kakhki
9ba399dfa7
server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)
* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 21:33:04 +01:00
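For context, a minimal Python sketch of the new option. The server address and payload layout here are assumptions: a local llama-server on port 8080 and the OpenAI-style /v1/embeddings schema (base64-encoded little-endian float32), which this commit mirrors.

import base64, json, struct, urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",  # assumed local llama-server address
    data=json.dumps({
        "input": "hello world",
        "encoding_format": "base64",  # the option added by this commit
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    payload = json.load(resp)

raw = base64.b64decode(payload["data"][0]["embedding"])
# Assumption: little-endian float32, matching the OpenAI encoding this mirrors.
embedding = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(embedding), embedding[:4])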
Djip007
2cd43f4900
ggml : more performance with llamafile tinyblas on x86_64 (#10714)
* more performance with llamafile tinyblas on x86_64:
- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71)
- reduce memory bandwidth with a simple, more cache-friendly tinyblas dispatch
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2
* remove unstable test
2024-12-24 18:54:49 +01:00
NeverLucky
09fe2e7613
server: allow filtering llama server response fields (#10940)
* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 17:39:49 +01:00
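A rough illustration of the feature (a sketch, not the definitive usage: the endpoint path and field names are assumptions based on the server's /completion API; see the follow-up docs commit #10995 above):

import json, urllib.request

req = urllib.request.Request(
    "http://localhost:8080/completion",  # assumed local llama-server address
    data=json.dumps({
        "prompt": "Once upon a time",
        "n_predict": 16,
        # Keep only these keys in the response; everything else is filtered out.
        "response_fields": ["content", "tokens_predicted"],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # expected: just the two requested fields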
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Diego Devesa
60cfa728e2
ggml : use wstring for backend search paths (#10960)
ggml-ci
2024-12-24 04:05:27 +01:00
Diego Devesa
3327bb0f8d
ggml : fix arm enabled features check (#10961)
2024-12-24 04:05:17 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path (#10962)
2024-12-23 20:25:52 +01:00
Xuan Son Nguyen
14b699ecde
server : fix missing model id in /model endpoint (#10957)
* server : fix missing model id in /model endpoint
* fix ci
2024-12-23 12:52:25 +01:00
Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion (#10917)
* server : add system_fingerprint to chat/completion
* update README
2024-12-23 12:02:44 +01:00
Radoslav Gerganov
86bf31cfe6
rpc-server : add support for the SYCL backend (#10934)
2024-12-23 10:39:30 +02:00
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b (#10893)
* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez
2024-12-23 01:35:44 +01:00
ymcki
6f0c9e034b
llama : support for Llama-3_1-Nemotron-51B (#10669)
* conflict resolution
* move comments after brackets to their own lines
2024-12-23 01:22:33 +01:00
Eric Curtin
dab76c92cc
llama-run : include temperature option (#10899)
This commit updates the `examples/run/README.md` file to include a new
option for setting the temperature and updates the `run.cpp` file to
parse this option.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-23 01:21:40 +01:00
yuri@FreeBSD
7024d59e6a
ggml : fix run-time on FreeBSD in get_executable_path() (#10948)
2024-12-23 01:20:11 +01:00
Rudi Servo
7c0e285858
devops : add docker-multi-stage builds (#10832)
2024-12-22 23:22:58 +01:00
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support (#10883)
* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring
2024-12-23 00:09:58 +02:00
Jeff Bolz
ebdee9478c
vulkan: build fixes for 32b (#10927)
* vulkan: build fixes for 32b
Should fix #10923
* vulkan: initialize some buffer/offset variables
2024-12-22 10:44:01 +01:00
Georgi Gerganov
5cd85b5e00
convert : add BertForMaskedLM (#10919)
2024-12-21 10:10:18 +02:00
Jeff Bolz
a91a41364b
vulkan: optimize coopmat2 dequant functions (#10855)
Change the code to do 16b loads when possible and extract the appropriate
component late, so the code is effectively decoding a pair of elements and
then selecting one. This can allow more commoning to happen in the compiler
when neighboring elements are loaded.
2024-12-21 08:04:45 +01:00
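The idea reads roughly like the Python sketch below (illustrative only: the real change is in GLSL shaders, and the exact quant layout here is an assumption): decode both halves of one wide load, then select the requested component late.

def dequant_pair_then_select(byte: int, which: int, scale: float) -> float:
    # Decode both packed 4-bit elements from the single loaded byte...
    lo = (byte & 0x0F) - 8         # low nibble, centered as in Q4_0
    hi = ((byte >> 4) & 0x0F) - 8  # high nibble
    # ...then select the requested component late. Keeping the wide load
    # shared lets the compiler common it across neighboring elements.
    return (lo if which == 0 else hi) * scale

print(dequant_pair_then_select(0x3A, 0, 0.5))  # low nibble 0xA: (10 - 8) * 0.5 = 1.0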
Adrien Gallouët
e34c5af43f
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874)
* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* ggml-cpu: format code
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-21 00:33:37 +01:00
Akarshan Biswas
eb5c3dc64b
SYCL: Migrate away from deprecated ggml_tensor->backend (#10840)
* Migrate to tensor->buffer for checking backend buffer type: 1
* SYCL: common.cpp try to migrate away from tensor->backend
* SYCL: fix assertions and add proper comments
* SYCL: remove extra space
* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function
* SYCL: Add pragma directive to suppress warning spam
* SYCL: Integrate debug logs with GGML_LOG and other fixes
* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"
This reverts commit 2607b7de0f.
Let's keep the current SYCL-specific logging mechanism for now
* SYCL: Use GGML_SYCL_DEBUG after reverting
* SYCL: reg_get_proc_address func, update to the current func signature
* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
2024-12-20 23:31:28 +08:00
Xuan Son Nguyen
0ca416c91a
server : (UI) fix copy to clipboard function (#10916)
2024-12-20 14:12:06 +01:00
Diego Devesa
21ae3b9be8
ggml : add test for SVE and disable when it fails (#10906)
2024-12-20 13:31:28 +01:00
Molly Sophia
0a11f8b7b5
convert : fix RWKV v6 model conversion (#10913)
* Enable --no-context-shift for llama-perplexity example
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV 6: Fix error in ggml_cuda_op_bin_bcast
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-20 11:44:58 +02:00
Georgi Gerganov
d408bb9268
clip : disable GPU support (#10896)
ggml-ci
2024-12-19 18:47:15 +02:00
Georgi Gerganov
5cab3e4aaa
llama : minor grammar refactor (#10897)
ggml-ci
2024-12-19 17:42:13 +02:00
Georgi Gerganov
36319dec5d
tts : small QoL for easy model fetch (#10903)
2024-12-19 17:35:15 +02:00
Xuan Son Nguyen
57bb2c40cd
server : fix logprobs, make it OAI-compatible (#10783)
* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token
2024-12-19 15:40:08 +01:00
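A minimal sketch of the OpenAI-compatible request shape this fix targets (the server address and model id are placeholders; the flags follow the OpenAI chat-completions schema the commit aligns with):

import json, urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed local llama-server
    data=json.dumps({
        "model": "local",  # placeholder model id
        "messages": [{"role": "user", "content": "Hi"}],
        "max_tokens": 4,
        "logprobs": True,   # OpenAI-style toggle
        "top_logprobs": 3,  # alternatives returned per generated token
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    choice = json.load(resp)["choices"][0]
# Each entry pairs a token with its log-probability and top alternatives.
print(choice["logprobs"]["content"][0])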
Adrien Gallouët
a3c33b1dce
ggml: fix arm build with gcc (#10895)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-19 14:20:41 +01:00
Sukriti Sharma
2fffc52b50
llama : fix Roberta embeddings (#10856)
* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
Branch: RobertaTokenizer
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fixes to position embeddings
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* map roberta-bpe to gpt-2
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix linting
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
2024-12-19 15:04:51 +02:00