compilade
511636df0c
ci : reduce severity of unused Pyright ignore comments (#9697)
2024-09-30 14:13:16 -04:00
vb
08a43d05b6
py : update transformers version (#9694)
...
* update transformers version.
* update hfh version.
2024-09-30 18:03:47 +03:00
Georgi Gerganov
ace4f4be37
flake.lock: Update (#9680)
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/c04d5652cfa9742b1d519688f65d1bbccea9eb7e?narHash=sha256-PmUr/2GQGvFTIJ6/Tvsins7Q43KTMvMFhvG6oaYK+Wk=' (2024-09-19)
→ 'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J+PeFKSDV+pHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI=' (2024-09-26)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-30 07:48:49 -07:00
Ruchira Hasaranga
8277a817f1
console : utf-8 fix for windows stdin (#9690)
...
* utf-8 fix for windows stdin
* Update common/console.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-30 11:23:42 +03:00
ochafik
d9451fd647
antiprompts : avoid c++20 struct initializers in test
2024-09-30 04:08:55 +01:00
ochafik
0fc5ad7ae1
minja : avoid c++20 struct initializers in test
2024-09-30 03:51:48 +01:00
ochafik
277f38536c
minja : attempt to handle windows' crlf
2024-09-30 03:45:50 +01:00
Georgi Gerganov
c919d5db39
ggml : define missing HWCAP flags (#9684)
...
ggml-ci
Co-authored-by: Willy Tarreau <w@1wt.eu>
2024-09-29 21:18:23 +03:00
Georgi Gerganov
d0b1d663e4
sync : ggml
2024-09-29 21:16:07 +03:00
Johannes Gäßler
aaa4099925
CUDA: remove bad assert (ggml/972)
2024-09-29 21:15:37 +03:00
Jeff Bolz
641002fba8
vulkan : multithread pipeline creation (ggml/963)
2024-09-29 21:15:37 +03:00
Jeff Bolz
0de8b203f1
vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961)
2024-09-29 21:15:37 +03:00
Salvatore Mesoraca
544f409b4b
vulkan : argsort barriers must be under uniform control flow (ggml/951)
...
A return before a barrier that only some threads in a workgroup
reach leads to UB.
While the old code happens to work on some devices,
it fails on others (e.g. "smaller" GPUs).
BTW, I think it would be better to set specialization constants
when the graph is built, in that way the local workgroup
could be sized appropriately.
But it would take a lot of work.
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-29 21:15:37 +03:00
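The rule this commit enforces can be illustrated with a GLSL-style sketch (illustrative pseudocode, not the actual argsort shader; `n` and `do_work` are placeholders):

```glsl
// BAD: invocations with id >= n return early and never reach the barrier,
// so the barrier is not under uniform control flow -> undefined behaviour.
if (gl_LocalInvocationID.x >= n) {
    return;
}
do_work();
barrier();

// GOOD: guard the work instead of returning early, so that every
// invocation in the workgroup executes the barrier.
if (gl_LocalInvocationID.x < n) {
    do_work();
}
barrier();
```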
Georgi Gerganov
6084bfb261
ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969)
2024-09-29 21:15:35 +03:00
matiaslin
faac0bae26
common : ensure llama_batch size does not exceed max size (#9668)
...
A crash was observed when the number of tokens added to a batch exceeds
llama_batch size. An assertion in llama_batch_add was added to protect
against llama_batch size overflow.
2024-09-29 15:25:00 +03:00
nopperl
f99d3f8367
py : add model class for Chameleon conversion (#9683)
2024-09-29 15:02:06 +03:00
Georgi Gerganov
589b48d41e
contrib : add Resources section (#9675)
2024-09-29 14:38:18 +03:00
ochafik
9ac4b04aa2
tool-call : add fs_list_files to common, w/ win32 impl for msys2 build
2024-09-29 00:42:52 +01:00
ochafik
cb7912ee74
chat-template : add phi-3.5-vision-instruct
2024-09-29 00:33:19 +01:00
ochafik
8738d94bbd
minja : qualify std::nullptr_t type for msys2 build
2024-09-29 00:18:22 +01:00
ochafik
c87c12168a
tool-call : fix memory leak in test
2024-09-28 23:44:28 +01:00
ochafik
22493c8e9e
tests : fix test-chat-template run from build
2024-09-28 23:31:23 +01:00
ochafik
ad6719e2a7
tests : fix typo
2024-09-28 23:26:19 +01:00
ochafik
a072f30a8d
tests : attempt to find assets for tests run from build subfolder
2024-09-28 23:15:36 +01:00
ochafik
bc3e0c0830
tool-call : Qwen 2.5 Instruct also requires object arguments
2024-09-28 23:05:35 +01:00
ochafik
b10ef04d8d
chat-template : tweak --chat-template error message when --jinja is set
2024-09-28 22:36:38 +01:00
ochafik
dbda025f87
tool-call : test messages -> template -> grammar -> tool call parser
2024-09-28 22:32:47 +01:00
ochafik
0ae1112faa
agent : try to fix pyright lint
2024-09-28 20:10:08 +01:00
ochafik
1b32ac129f
chat-template : fix test-arg
2024-09-28 20:06:10 +01:00
ochafik
9358d1f62c
minja : fix gcc8 build of test
2024-09-28 19:50:08 +01:00
ochafik
e6be59c2a0
antiprompts : fix gcc8 build (avoid recursive struct)
2024-09-28 19:39:52 +01:00
ochafik
ef2a020276
tool-call : make agent async
2024-09-28 19:11:09 +01:00
ochafik
05bbba9f8a
tool-call : only match json eagerly for Llama 3.2
2024-09-28 19:05:10 +01:00
ochafik
6e0053a81b
chat-template : enumerate files w/ C API rather than private std::__fs::filesystem
2024-09-28 18:47:11 +01:00
ochafik
c657857e21
tool-call : cleanup tools.py
2024-09-28 18:33:40 +01:00
ochafik
55cf337560
tool-call : better error reporting for server tests
2024-09-28 18:33:40 +01:00
ochafik
7cef90cf9c
tool-call : more eager function call parsing for Functionary & Llama (give a chance to 3B model)
2024-09-28 18:33:40 +01:00
ochafik
8b2cf3509f
tool-call : fix grammar trigger crash
2024-09-28 18:30:01 +01:00
ochafik
d983516f40
tool-call : let the tool call handler expand chat template, moving builtin_tools down as extra_context
2024-09-28 17:46:36 +01:00
ochafik
0c85bc7a8f
tool-call : test tool call style detection
2024-09-28 17:43:09 +01:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support (#9510)
...
* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
slaren
1b2f992cd2
test-backend-ops : use flops for some performance tests (#9657)
...
* test-backend-ops : use flops for some performance tests
- parallelize tensor quantization
- use a different set of cases for performance and correctness tests
- run each test for at least one second
2024-09-28 14:32:46 +02:00
Georgi Gerganov
739842703e
llama : add comment about thread-safety [no ci] (#9449)
2024-09-28 15:13:42 +03:00
Zhenwei Jin
6102037bbb
vocab : refactor tokenizer to reduce init overhead (#9449)
...
* refactor tokenizer
* llama : make llm_tokenizer more private
ggml-ci
* refactor tokenizer
* refactor tokenizer
* llama : make llm_tokenizer more private
ggml-ci
* remove unused files
* remove unused fields to avoid unused-field build error
* avoid symbol link error
* Update src/llama.cpp
* Update src/llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-28 15:10:58 +03:00
nopperl
9a913110cf
llama : add support for Chameleon (#8543)
...
* convert chameleon hf to gguf
* add chameleon tokenizer tests
* fix lint
* implement chameleon graph
* add swin norm param
* return qk norm weights and biases to original format
* implement swin norm
* suppress image token output
* rem tabs
* add comment to conversion
* fix ci
* check for k norm separately
* adapt to new lora implementation
* fix layer input for swin norm
* move swin_norm in gguf writer
* add comment regarding special token regex in chameleon pre-tokenizer
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix punctuation regex in chameleon pre-tokenizer (@compilade)
Co-authored-by: compilade <git@compilade.net>
* fix lint
* trigger ci
---------
Co-authored-by: compilade <git@compilade.net>
2024-09-28 15:08:43 +03:00
Aarni Koskela
43bcdd9703
readme : add tool (#9655)
2024-09-28 15:07:14 +03:00
Dan Johansson
6a0f779484
ggml : add run-time detection of neon, i8mm and sve (#9331)
...
* ggml: Added run-time detection of neon, i8mm and sve
Adds run-time detection of the Arm instructions set features
neon, i8mm and sve for Linux and Apple build targets.
* ggml: Extend feature detection to include non-aarch64 Arm arch
* ggml: Move definition of ggml_arm_arch_features to the global data section
2024-09-28 15:06:16 +03:00
Markus Tavenrath
89f9944981
Enable use of the rebar feature to upload buffers to the device. (#9251)
2024-09-28 12:05:05 +02:00
ochafik
887951beb0
minja : generate chat goldens w/ fixed date to support Llama-3.2-3B-Instruct (uses strftime_now)
2024-09-27 19:52:15 +01:00
ochafik
701b664551
minja : add indent filter to support command-r-plus's chat templates
2024-09-27 19:00:14 +01:00