Georgi Gerganov
9d0156bf0a
minor [no ci]
2025-01-03 10:17:41 +02:00
Georgi Gerganov
69dd1e859a
llama : quant (cont)
ggml-ci
2025-01-02 21:57:46 +02:00
Georgi Gerganov
e06d267ac6
llama : quant
ggml-ci
2025-01-02 21:40:16 +02:00
Georgi Gerganov
272cd0eaea
common : update lora
ggml-ci
2025-01-02 17:31:26 +02:00
Georgi Gerganov
8d117a518d
llama : model loader
ggml-ci
2025-01-02 16:56:37 +02:00
Georgi Gerganov
736e6922ce
llama : context (cont)
ggml-ci
2025-01-02 16:56:37 +02:00
Georgi Gerganov
4b39d7020d
minor
2025-01-02 16:56:37 +02:00
Georgi Gerganov
007064f5ec
llama : context
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
5bf9dc5783
cont
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
add3bfe068
llama : batch
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
5f794937d9
llama : impl
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
8ab668e122
llama : kv cache
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
55791c17f6
minor
2025-01-02 16:56:36 +02:00
Georgi Gerganov
2a3aa05ce9
rebase
ggml-ci
2025-01-02 16:56:35 +02:00
Georgi Gerganov
2ebe8fe60e
examples : fix
ggml-ci
2025-01-02 16:56:34 +02:00
Georgi Gerganov
30e0c88975
llama : adapter
ggml-ci
2025-01-02 16:55:42 +02:00
Georgi Gerganov
a25ff12f8e
llama : hparams
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
7a3065f368
llama : model
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
a2dc93ed20
llama : chat
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
6c22ce1097
llama : arch (cont)
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
e9c9209e01
ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
6b24e6eb97
llama : mmap
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
cf899ea0d3
llama : arch
2025-01-02 16:55:41 +02:00
Georgi Gerganov
844660ba5d
llama : control-vector -> adapter
2025-01-02 16:55:40 +02:00
Georgi Gerganov
498b68f97d
llama : scatter llama.cpp into multiple modules (wip)
2025-01-02 16:55:40 +02:00
Xuan Son Nguyen
0da5d86026
server : allow using LoRA adapters per-request ( #10994 )
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-02 15:05:18 +01:00
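A minimal sketch of what a per-request LoRA call might look like, assuming a llama-server running at localhost:8080 (hypothetical host/port), an adapter already loaded at startup with id 0, and the Python requests library; the "lora" field shape follows the PR notes above:

    import requests

    # Per-request LoRA (sketch): pick the adapter and scale for this one
    # request instead of using a single server-wide setting. The adapter
    # id assumes one adapter was loaded when the server started.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "Write a haiku about llamas.",
            "n_predict": 64,
            "lora": [{"id": 0, "scale": 0.5}],
        },
    )
    print(resp.json()["content"])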
Benson Wong
a45433ba20
readme : add llama-swap to infrastructure section ( #11032 )
* list llama-swap under tools in README
* readme: add llama-swap to Infrastructure
2025-01-02 09:14:54 +02:00
Srihari-mcw
0827b2c1da
ggml : fixes for AVXVNNI instruction set with MSVC and Clang ( #11027 )
* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-31 15:23:33 +01:00
Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection ( #11026 )
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
2024-12-31 15:22:01 +01:00
Xuan Son Nguyen
5896c65232
server : add OAI compat for /v1/completions ( #10974 )
* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs
2024-12-31 12:34:13 +01:00
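A sketch of the OpenAI-style request shape this endpoint accepts, assuming a local llama-server (hypothetical host/port); the response follows the legacy OpenAI completions layout (choices[0].text):

    import requests

    # OAI-compatible completions: prompt in, choices[0].text out,
    # rather than llama.cpp's native /completion schema.
    resp = requests.post(
        "http://localhost:8080/v1/completions",
        json={
            "model": "default",  # accepted for compatibility
            "prompt": "The capital of France is",
            "max_tokens": 16,
            "temperature": 0.0,
        },
    )
    print(resp.json()["choices"][0]["text"])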
ymcki
bc7b1f8632
convert : fix Llama-3_1-Nemotron-51B rope settings ( #11008 )
* conflict resolution
* move comments after brackets to their own lines
* DeciLMCausalModel now reads rope_theta from config.json properly
2024-12-31 13:04:48 +02:00
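A minimal sketch of the idea behind the fix, reading rope_theta from the model's config.json instead of assuming a constant; the fallback value here is hypothetical:

    import json

    # Read rope_theta from the HF config rather than hard-coding it;
    # 10000.0 is only an illustrative default, not the converter's.
    with open("config.json") as f:
        hparams = json.load(f)
    rope_theta = hparams.get("rope_theta", 10000.0)
    print(f"rope_theta = {rope_theta}")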
Peter
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON ( #11013 )
In common/common.cpp:
* Replace the stat() call used to check whether a file exists with std::filesystem::exists (error: unable to match the correct function signature)
* Only define PATH_MAX when it is not already defined in the WIN32 environment (warning: it is already defined in MSYS2)
In examples/run/run.cpp:
* Include the io.h header (error: cannot find function _get_osfhandle)
* Initialise OVERLAPPED with an empty struct initialiser (warning about uninitialised members)
* Add an initialiser for hFile (warning: it may be uninitialised)
* Cast the curl_off_t percentage value to long int in the generate_progress_prefix function (warning: curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning: the B_d variable may be used unassigned)
2024-12-31 01:46:06 +01:00
Jeff Bolz
716bd6dec3
vulkan: optimize mul_mat for small values of N ( #10991 )
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.
Share some code for reducing the result values to memory in mul_mat_vec_base.
2024-12-30 18:27:11 +01:00
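A scalar Python sketch of the loop structure described above (an analogue, not the Vulkan shader): one pass over each row of A accumulates NUM_COLS results, so the loads from B sit in the innermost loop:

    # Matrix-vector product generalized to a small number of columns.
    # Each A element is loaded once and reused across all NUM_COLS
    # accumulators; the B loads are innermost, as the commit notes,
    # because they should cache better.
    NUM_COLS = 2  # spec-constant analogue

    def mul_mat_small_n(A, B):  # A: M x K, B: K x NUM_COLS
        M, K = len(A), len(A[0])
        out = []
        for i in range(M):
            acc = [0.0] * NUM_COLS
            for k in range(K):
                a = A[i][k]
                for c in range(NUM_COLS):
                    acc[c] += a * B[k][c]
            out.append(acc)
        return out

    print(mul_mat_small_n([[1, 2], [3, 4]], [[1, 0], [0, 1]]))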
ag2s20150909
c250ecb315
android : fix llama_batch free ( #11014 )
2024-12-30 14:35:13 +02:00
Jeff Bolz
a813badbbd
vulkan: im2col and matmul optimizations for stable diffusion ( #10942 )
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
2024-12-29 10:16:34 +01:00
Jeff Bolz
fdd2188912
vulkan: Use push constant offset to handle misaligned descriptors ( #10987 )
2024-12-29 09:35:11 +01:00
Isaac McFadyen
f865ea149d
server: added more docs for response_fields field ( #10995 )
2024-12-28 16:09:19 +01:00
Alexey Parfenov
16cdce7b68
server : fix token duplication when streaming with stop strings ( #10997 )
2024-12-28 16:08:54 +01:00
Eve
d79d8f39b4
vulkan: multi-row k quants ( #10846 )
* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
2024-12-26 16:54:44 +01:00
Peter
d283d02bf2
examples, ggml : fix GCC compiler warnings ( #10983 )
Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct field except first)
2024-12-26 14:59:11 +01:00
Reza Kakhki
9ba399dfa7
server : add support for "encoding_format": "base64" to the */embeddings endpoints ( #10967 )
* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 21:33:04 +01:00
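A sketch of requesting and decoding base64 embeddings, assuming a local llama-server started with embeddings enabled (hypothetical host/port) and that the payload packs little-endian float32 values, as in the OpenAI convention:

    import base64
    import struct

    import requests

    # Ask for base64 instead of a JSON float array, then unpack the
    # raw bytes back into floats (assumes little-endian float32).
    resp = requests.post(
        "http://localhost:8080/v1/embeddings",
        json={"input": "hello world", "encoding_format": "base64"},
    )
    raw = base64.b64decode(resp.json()["data"][0]["embedding"])
    embedding = struct.unpack(f"<{len(raw) // 4}f", raw)
    print(len(embedding), embedding[:4])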
Djip007
2cd43f4900
ggml : more performance with llamafile tinyblas on x86_64 ( #10714 )
* more performance with llamafile tinyblas on x86_64.
- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth
simpler tinyblas dispatch, more cache friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2
* remove unstable test
2024-12-24 18:54:49 +01:00
NeverLucky
09fe2e7613
server: allow filtering llama server response fields ( #10940 )
* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 17:39:49 +01:00
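A sketch of the filtering, assuming a local llama-server (hypothetical host/port); the listed keys are standard /completion reply fields, and only they should come back:

    import requests

    # response_fields trims the reply to just the named keys, which
    # keeps payloads small when only the text is needed.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "Name three llama facts.",
            "n_predict": 48,
            "response_fields": ["content", "tokens_predicted"],
        },
    )
    print(resp.json())  # expect only "content" and "tokens_predicted"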
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS ( #10930 )
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Diego Devesa
60cfa728e2
ggml : use wstring for backend search paths ( #10960 )
ggml-ci
2024-12-24 04:05:27 +01:00
Diego Devesa
3327bb0f8d
ggml : fix arm enabled features check ( #10961 )
2024-12-24 04:05:17 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path ( #10962 )
2024-12-23 20:25:52 +01:00
Xuan Son Nguyen
14b699ecde
server : fix missing model id in /model endpoint ( #10957 )
* server : fix missing model id in /model endpoint
* fix ci
2024-12-23 12:52:25 +01:00
Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion ( #10917 )
* server : add system_fingerprint to chat/completion
* update README
2024-12-23 12:02:44 +01:00
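A sketch of reading the new field from an OpenAI-compatible chat response, assuming a local llama-server (hypothetical host/port):

    import requests

    # system_fingerprint identifies the serving build/configuration,
    # which is useful when logging results for reproducibility.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "default",
            "messages": [{"role": "user", "content": "Say hi."}],
        },
    )
    body = resp.json()
    print(body.get("system_fingerprint"))
    print(body["choices"][0]["message"]["content"])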
Radoslav Gerganov
86bf31cfe6
rpc-server : add support for the SYCL backend ( #10934 )
2024-12-23 10:39:30 +02:00