Pierrick HYMBERT
2cdd21e26b
server: tests: increase timeout for completion
2024-03-02 20:32:20 +01:00
Pierrick HYMBERT
c1f66f05f5
server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1
2024-03-02 20:19:21 +01:00
Pierrick HYMBERT
1aa5ad9150
server: tests: fix regex content
2024-03-02 19:30:19 +01:00
Pierrick HYMBERT
830d0efbd2
server: tests: fail CI workflow on first scenario failure
2024-03-02 19:30:13 +01:00
Pierrick HYMBERT
763ae0a1fd
Merge remote-tracking branch 'origin/tests/server/passkey' into tests/server/passkey
...
# Conflicts:
# .github/workflows/server.yml
2024-03-02 19:13:33 +01:00
Pierrick HYMBERT
61b97915b0
server: metrics: fix when no prompt processed
2024-03-02 19:11:53 +01:00
Pierrick HYMBERT
9fcfa63a11
server: tests: schedule slow tests on master
2024-03-02 19:00:29 +01:00
Pierrick HYMBERT
9ab72d7ade
server: tests: schedule slow tests on master
2024-03-02 18:58:21 +01:00
Pierrick HYMBERT
178b0c693d
server: tests: fix regex content matching
2024-03-02 18:57:57 +01:00
Pierrick HYMBERT
407cc609d3
server: tests: fix passkey, add doc, fix regex content matching, fix timeout
2024-03-02 18:53:01 +01:00
Pierrick HYMBERT
8abf8d3a08
server: tests: fix server timeout
2024-03-02 15:51:27 +01:00
Pierrick HYMBERT
a80533e276
server: tests - passkey - limit the number of max tokens to predict
2024-03-02 14:42:11 +01:00
Pierrick HYMBERT
f8773f759e
server: tests - passkey - limit the number of max tokens to predict
2024-03-02 14:38:08 +01:00
Pierrick HYMBERT
cf4c86ee20
server: tests - passkey - first good working value of nga
2024-03-02 14:31:27 +01:00
Pierrick HYMBERT
ed60b97434
server: tests - fix passkey not using pre/suffix
2024-03-02 14:25:10 +01:00
Pierrick HYMBERT
3b8242a188
server: tests - missing EOL at EOF
2024-03-02 14:13:49 +01:00
Pierrick HYMBERT
af82fb4ad7
server: revert change on slot n_ctx
2024-03-02 14:12:12 +01:00
Pierrick HYMBERT
2495f7273a
server: logs: do not truncate log values
2024-03-02 14:01:06 +01:00
Pierrick HYMBERT
616d7e9a9b
server: do not truncate prompt tokens if self-extend through group attention is enabled
2024-03-02 13:52:52 +01:00
Pierrick HYMBERT
60113da241
server: tests: add group attention params
2024-03-02 13:50:28 +01:00
Pierrick HYMBERT
ab5b06b2cf
server: logs: do not truncate log values
2024-03-02 13:49:18 +01:00
Pierrick HYMBERT
18e739d61d
server: tests: add passkey test
2024-03-02 13:10:18 +01:00
Pierrick HYMBERT
319ded7dde
server: tests: download model from HF, add batch size
2024-03-02 13:01:57 +01:00
Pierrick HYMBERT
1780d9601d
server: tests: add debug field in context before scenario
2024-03-02 12:50:55 +01:00
Pierrick HYMBERT
0f774a81cd
server: /v1/models add some metadata
2024-03-02 12:07:22 +01:00
Pierrick HYMBERT
73a7e42692
server: tests: add models endpoint scenario
2024-03-02 07:37:49 +01:00
Tushar
cb5e8f7fc4
build(nix): Introduce flake.formatter for `nix fmt` ( #5687 )
...
* build(nix): Introduce flake.formatter for `nix fmt`
* chore: Switch to pkgs.nixfmt-rfc-style
2024-03-01 15:18:26 -08:00
nold
da3b9ba2b7
convert-hf-to-gguf : require einops for InternLM2ForCausalLM ( #5792 )
2024-03-01 16:51:12 -05:00
Sourab Mangrulkar
c29af7e225
llama : add StarCoder2 support ( #5795 )
...
* Add support for starcoder2
* handle rope type
* skip rope freq and rotary embeddings from being serialized
* resolve comments
* Update llama.cpp
* remove redundant changes
* handle `rope-theta`
* llama : change starcoder2 rope type
* address comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-01 21:30:46 +02:00
Georgi Gerganov
38d16b1426
server : remove api_like_OAI.py proxy script ( #5808 )
2024-03-01 20:00:58 +02:00
ddpasa
c2224f003b
ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken ( #5813 )
2024-03-01 18:00:00 +01:00
kunal-vaishnavi
e743386728
gemma : fix bfloat16 -> float16 conversion issue ( #5810 )
2024-03-01 16:08:08 +02:00
Miwa / Ensan
f49a535686
common : fix flag `--logits-all` to `--all-logits` ( #5805 )
2024-03-01 15:48:56 +02:00
Pierrick Hymbert
3ab8b3a92e
llama : cleanup unused mmq flags ( #5772 )
...
* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q
* remove: mul_mat_q in compare llama bench and usage
* update llama-bench
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Douglas Hanley
9600d59e01
unicode : switch to multimap based nfd_map ( #5799 )
...
* switch to multimap based nfd_map due to compile time issues
* simplify multimap keys
* dont construct new locale every time
2024-03-01 11:15:36 +02:00
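The multimap switch described above can be sketched as follows. This is an illustrative standalone version, not the actual llama.cpp table or API: one (source, output) pair per output codepoint keeps the static table flat, avoiding the deeply nested initializers that caused the compile-time issues.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Under NFD, one codepoint may decompose into several codepoints. Storing
// one (source, output) pair per output codepoint in a flat multimap avoids
// nested container initializers. Entries here are real NFD decompositions
// chosen for illustration, not the project's full table.
static const std::multimap<uint32_t, uint32_t> nfd_map = {
    {0x00C0, 0x0041}, {0x00C0, 0x0300}, // À -> A + combining grave accent
    {0x00E9, 0x0065}, {0x00E9, 0x0301}, // é -> e + combining acute accent
};

static std::vector<uint32_t> nfd_decompose(uint32_t cp) {
    std::vector<uint32_t> out;
    auto range = nfd_map.equal_range(cp);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second); // equivalent keys keep insertion order
    }
    if (out.empty()) {
        out.push_back(cp); // no entry: the codepoint maps to itself
    }
    return out;
}
```

Equivalent keys in a `std::multimap` are kept in insertion order, so the decomposition sequence round-trips correctly through `equal_range`.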
Pierrick Hymbert
5cb02b4a01
server: allow overriding the server thread pool size with --threads-http ( #5794 )
2024-03-01 10:08:08 +01:00
Eve
6ea0f010ff
ci : add Ubuntu 22 Vulkan CI run ( #5789 )
2024-03-01 10:54:53 +02:00
Georgi Gerganov
f105471ef6
server : fix newlines in help ( #5785 )
2024-03-01 09:59:43 +02:00
AidanBeltonS
38d1521608
[SYCL] Use batched mul_mat pathway ( #5591 )
...
* Use batched mul_mat pathway
* rm extra line
* Explicitly state scaled data type
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-03-01 13:06:47 +05:30
Xuan Son Nguyen
052051d8ae
Server: normalize naming ( #5779 )
...
* server: normalize naming
* fix spacing
2024-02-29 21:42:11 +01:00
Marcus Dunn
d5ab29757e
llama : constified llama_set_state_data's src ( #5774 )
2024-02-29 10:17:23 +02:00
Georgi Gerganov
87c91c0766
ci : reduce 3b ppl chunks to 1 to avoid timeout ( #5771 )
...
ggml-ci
2024-02-28 21:44:21 +02:00
Eve
317709b2a8
make portability_enumeration_ext Apple only ( #5757 )
2024-02-28 20:33:37 +01:00
Georgi Gerganov
08c5ee87e4
llama : remove deprecated API ( #5770 )
...
ggml-ci
2024-02-28 18:43:38 +02:00
Georgi Gerganov
78aacf3634
awq-py : remove ( #5768 )
2024-02-28 17:36:53 +02:00
Georgi Gerganov
8c0e8f4e73
sync : ggml
2024-02-28 11:17:32 +02:00
slaren
2774b0c974
add google magika inference example (ggml/748)
...
* add magika inference example
* ggml : fix unaligned accesses in custom ops
* ggml : fix FP32 GELU for values that exceed the FP16 range
* use ggml_pool_1d
* add README
* Update README.md
* pad inputs if the files are too small
* cleanup
ggml-ci
2024-02-28 11:17:06 +02:00
UEXTM.com
5f70671856
Introduce backend GUIDs (ggml/743)
...
* Introduce backend GUIDs
Initial proposed implementation of backend GUIDs
(Discussed in https://github.com/ggerganov/ggml/pull/741 )
Hardcoded CPU backend GUID (for now)
Change ggml_backend_is_cpu logic to use GUID
* Remove redundant functions
Remove redundant functions `ggml_backend_i::get_name` and `ggml_backend_guid` which are not desired for future expansion
* Add spaces to match style
Co-authored-by: slaren <slarengh@gmail.com>
* Fix brace style to match
Co-authored-by: slaren <slarengh@gmail.com>
* Add void to () in function signature
Co-authored-by: slaren <slarengh@gmail.com>
* Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid
* add guids to all backends
ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-02-28 11:17:05 +02:00
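The GUID mechanism from ggml/743 can be sketched roughly like this; the type and function names below are illustrative, not the actual ggml API. The idea is that each backend carries a fixed identifier, and a check like `ggml_backend_is_cpu` becomes a GUID comparison instead of a name check:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative 16-byte backend GUID, mirroring the idea in the commit above.
struct backend_guid {
    uint8_t bytes[16];
};

static bool guid_equal(const backend_guid & a, const backend_guid & b) {
    return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}

// The CPU GUID lives as a local static in its accessor, matching the
// CPU_GUID placement described in the commit body. The value is made up.
static const backend_guid & cpu_guid() {
    static const backend_guid guid = {{0xde, 0xad, 0xbe, 0xef, 0x00, 0x01,
                                       0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                                       0x08, 0x09, 0x0a, 0x0b}};
    return guid;
}

static bool backend_is_cpu(const backend_guid & g) {
    return guid_equal(g, cpu_guid());
}
```

Comparing raw bytes keeps the check cheap and independent of any human-readable backend name.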
Xuan Son Nguyen
a693bea1e6
server : hit Ctrl+C twice to exit ( #5734 )
...
* server: twice ctrl+C to exit
* std::atomic_flag
* sigint: message
* sigint: stderr
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-28 10:55:37 +02:00
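The double-Ctrl+C behaviour from #5734 can be sketched with the `std::atomic_flag` the commit notes mention. This is a simplified standalone version, not the server's actual handler: the first SIGINT warns on stderr and requests a graceful shutdown, the second forces an exit.

```cpp
#include <atomic>
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Set by the first SIGINT; still being set on the next SIGINT means the
// user pressed Ctrl+C twice and wants to exit immediately.
static std::atomic_flag g_interrupted = ATOMIC_FLAG_INIT;

// Returns true when the process should terminate now (second Ctrl+C).
static bool handle_sigint_once() {
    if (g_interrupted.test_and_set()) {
        return true; // second Ctrl+C: force exit
    }
    // first Ctrl+C: warn on stderr, let the server shut down gracefully
    std::fprintf(stderr, "\nreceived SIGINT, press Ctrl+C again to force exit\n");
    return false;
}

static void sigint_handler(int /*sig*/) {
    if (handle_sigint_once()) {
        std::exit(130); // conventional exit status for SIGINT
    }
}
```

Installed with `std::signal(SIGINT, sigint_handler)`; `std::atomic_flag` is a natural fit here because its operations are always lock-free, which is what makes them usable from a signal handler.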
compilade
adcb12a9ba
llama : fix non-quantization of expert gating tensors ( #5754 )
...
This reverts a single line from #5475
2024-02-28 10:52:56 +02:00