Commit graph

2337 commits

Author SHA1 Message Date
Pierrick HYMBERT
a6ea72541f server: tests: keep only the PHI-2 test 2024-03-02 20:53:00 +01:00
Pierrick HYMBERT
2cdd21e26b server: tests: increase timeout for completion 2024-03-02 20:32:20 +01:00
Pierrick HYMBERT
c1f66f05f5 server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1 2024-03-02 20:19:21 +01:00
Pierrick HYMBERT
1aa5ad9150 server: tests: fix re content 2024-03-02 19:30:19 +01:00
Pierrick HYMBERT
830d0efbd2 server: tests: fail CI workflow when the first scenario fails 2024-03-02 19:30:13 +01:00
Pierrick HYMBERT
763ae0a1fd Merge remote-tracking branch 'origin/tests/server/passkey' into tests/server/passkey
# Conflicts:
#	.github/workflows/server.yml
2024-03-02 19:13:33 +01:00
Pierrick HYMBERT
61b97915b0 server: metrics: fix when no prompt processed 2024-03-02 19:11:53 +01:00
Pierrick HYMBERT
9fcfa63a11 server: tests: schedule slow tests on master 2024-03-02 19:00:29 +01:00
Pierrick HYMBERT
9ab72d7ade server: tests: schedule slow tests on master 2024-03-02 18:58:21 +01:00
Pierrick HYMBERT
178b0c693d server: tests: fix regex content matching 2024-03-02 18:57:57 +01:00
Pierrick HYMBERT
407cc609d3 server: tests: fix passkey, add doc, fix regex content matching, fix timeout 2024-03-02 18:53:01 +01:00
Pierrick HYMBERT
8abf8d3a08 server: tests: fix server timeout 2024-03-02 15:51:27 +01:00
Pierrick HYMBERT
a80533e276 server: tests - passkey - limit the number of max tokens to predict 2024-03-02 14:42:11 +01:00
Pierrick HYMBERT
f8773f759e server: tests - passkey - limit the number of max tokens to predict 2024-03-02 14:38:08 +01:00
Pierrick HYMBERT
cf4c86ee20 server: tests - passkey - first good working value of nga 2024-03-02 14:31:27 +01:00
Pierrick HYMBERT
ed60b97434 server: tests - fix passkey not using pre/suffix 2024-03-02 14:25:10 +01:00
Pierrick HYMBERT
3b8242a188 server: tests - missing EOL at EOF 2024-03-02 14:13:49 +01:00
Pierrick HYMBERT
af82fb4ad7 server: revert change on slot n_ctx 2024-03-02 14:12:12 +01:00
Pierrick HYMBERT
2495f7273a server: logs: do not truncate log values 2024-03-02 14:01:06 +01:00
Pierrick HYMBERT
616d7e9a9b server: do not truncate prompt tokens if self-extend through group attention is enabled 2024-03-02 13:52:52 +01:00
Pierrick HYMBERT
60113da241 server: tests: add group attention params 2024-03-02 13:50:28 +01:00
Pierrick HYMBERT
ab5b06b2cf server: logs: do not truncate log values 2024-03-02 13:49:18 +01:00
Pierrick HYMBERT
18e739d61d server: tests: add passkey test 2024-03-02 13:10:18 +01:00
Pierrick HYMBERT
319ded7dde server: tests: download model from HF, add batch size 2024-03-02 13:01:57 +01:00
Pierrick HYMBERT
1780d9601d server: tests: add debug field in context before scenario 2024-03-02 12:50:55 +01:00
Pierrick HYMBERT
0f774a81cd server: /v1/models add some metadata 2024-03-02 12:07:22 +01:00
Pierrick HYMBERT
73a7e42692 server: tests: add models endpoint scenario 2024-03-02 07:37:49 +01:00
Tushar
cb5e8f7fc4
build(nix): Introduce flake.formatter for nix fmt (#5687)
* build(nix): Introduce flake.formatter for `nix fmt`
* chore: Switch to pkgs.nixfmt-rfc-style
2024-03-01 15:18:26 -08:00
nold
da3b9ba2b7
convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) 2024-03-01 16:51:12 -05:00
Sourab Mangrulkar
c29af7e225
llama : add StarCoder2 support (#5795)
* Add support for starcoder2

* handle rope type

* skip rope freq and rotary embeddings from being serialized

* resolve comments

* Update llama.cpp

* remove redundant changes

* handle `rope-theta`

* llama : change starcoder2 rope type

* address comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-01 21:30:46 +02:00
Georgi Gerganov
38d16b1426
server : remove api_like_OAI.py proxy script (#5808) 2024-03-01 20:00:58 +02:00
ddpasa
c2224f003b
ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) 2024-03-01 18:00:00 +01:00
kunal-vaishnavi
e743386728
gemma : fix bfloat16 -> float16 conversion issue (#5810) 2024-03-01 16:08:08 +02:00
Miwa / Ensan
f49a535686
common : fix flag --logits-all to --all-logits (#5805) 2024-03-01 15:48:56 +02:00
Pierrick Hymbert
3ab8b3a92e
llama : cleanup unused mmq flags (#5772)
* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q

* remove: mul_mat_q in compare llama bench and usage

* update llama-bench

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Douglas Hanley
9600d59e01
unicode : switch to multimap based nfd_map (#5799)
* switch to multimap based nfd_map due to compile time issues

* simplify multimap keys

* dont construct new locale every time
2024-03-01 11:15:36 +02:00
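
The multimap switch above can be sketched briefly: a codepoint may decompose into several codepoints under NFD, so `std::multimap::equal_range` returns the whole decomposition for a key. The `nfd_map` name comes from the commit message; the sample data and the lookup helper below are illustrative assumptions, not the actual `unicode.cpp` tables.

```cpp
// Minimal sketch of an NFD lookup over a std::multimap (illustrative data,
// not the real Unicode decomposition tables).
#include <cstdint>
#include <map>
#include <vector>

// One entry per decomposed codepoint, so a key can map to several values.
static const std::multimap<uint32_t, uint32_t> nfd_map = {
    { 0x00E9, 0x0065 }, { 0x00E9, 0x0301 }, // 'é' -> 'e' + combining acute accent
};

// Return the NFD decomposition of cp, or cp itself if none is recorded.
static std::vector<uint32_t> unicode_nfd(uint32_t cp) {
    const auto range = nfd_map.equal_range(cp);
    if (range.first == range.second) {
        return { cp };
    }
    std::vector<uint32_t> out;
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```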
Pierrick Hymbert
5cb02b4a01
server: allow to override threads server pool with --threads-http (#5794) 2024-03-01 10:08:08 +01:00
Eve
6ea0f010ff
ci : add Ubuntu 22 Vulkan CI run (#5789) 2024-03-01 10:54:53 +02:00
Georgi Gerganov
f105471ef6
server : fix newlines in help (#5785) 2024-03-01 09:59:43 +02:00
AidanBeltonS
38d1521608
[SYCL] Use batched mul_mat pathway (#5591)
* Use batched mul_mat pathway

* rm extra line

* Explicitly state scaled data type

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-03-01 13:06:47 +05:30
Xuan Son Nguyen
052051d8ae
Server: normalize naming (#5779)
* server: normalize naming

* fix spacing
2024-02-29 21:42:11 +01:00
Marcus Dunn
d5ab29757e
llama : constified llama_set_state_data's src (#5774) 2024-02-29 10:17:23 +02:00
Georgi Gerganov
87c91c0766
ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771)
ggml-ci
2024-02-28 21:44:21 +02:00
Eve
317709b2a8
make portability_enumeration_ext apple only (#5757) 2024-02-28 20:33:37 +01:00
Georgi Gerganov
08c5ee87e4
llama : remove deprecated API (#5770)
ggml-ci
2024-02-28 18:43:38 +02:00
Georgi Gerganov
78aacf3634
awq-py : remove (#5768) 2024-02-28 17:36:53 +02:00
Georgi Gerganov
8c0e8f4e73
sync : ggml 2024-02-28 11:17:32 +02:00
slaren
2774b0c974
add google magika inference example (ggml/748)
* add magika inference example

* ggml : fix unaligned accesses in custom ops

* ggml : fix FP32 GELU for values that exceed the FP16 range

* use ggml_pool_1d

* add README

* Update README.md

* pad inputs if the files are too small

* cleanup

ggml-ci
2024-02-28 11:17:06 +02:00
UEXTM.com
5f70671856
Introduce backend GUIDs (ggml/743)
* Introduce backend GUIDs

Initial proposed implementation of backend GUIDs
(Discussed in https://github.com/ggerganov/ggml/pull/741)

Hardcoded CPU backend GUID (for now)
Change ggml_backend_is_cpu logic to use GUID

* Remove redundant functions

Remove redundant functions `ggml_backend_i::get_name` and `ggml_backend_guid` which are not desired for future expansion

* Add spaces to match style

Co-authored-by: slaren <slarengh@gmail.com>

* Fix brace style to match

Co-authored-by: slaren <slarengh@gmail.com>

* Add void to () in function signature

Co-authored-by: slaren <slarengh@gmail.com>

* Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid

* add guids to all backends

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-28 11:17:05 +02:00
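
The GUID mechanism above amounts to identifying a backend by a fixed 16-byte value instead of its name string. `ggml_backend_is_cpu` and `ggml_backend_cpu_guid` are named in the commit notes; the types and the comparison helper in this sketch are simplified assumptions, not the actual ggml definitions.

```cpp
// Simplified sketch of GUID-based backend identity (assumed types; the real
// ggml structs differ).
#include <cstdint>
#include <cstring>

typedef uint8_t backend_guid[16];

struct backend {
    const uint8_t * guid; // points at the backend's static GUID
};

// The CPU GUID lives in a local static, as the commit describes for
// ggml_backend_cpu_guid; the value here is an arbitrary example.
static const uint8_t * backend_cpu_guid(void) {
    static const backend_guid guid = {
        0x6a, 0x00, 0xd4, 0xef, 0x12, 0x34, 0x56, 0x78,
        0x9a, 0xbc, 0xde, 0xf0, 0x11, 0x22, 0x33, 0x44,
    };
    return guid;
}

// The identity check compares GUID bytes rather than a backend name string.
static bool backend_is_cpu(const backend & b) {
    return std::memcmp(b.guid, backend_cpu_guid(), sizeof(backend_guid)) == 0;
}
```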
Xuan Son Nguyen
a693bea1e6
server : hit Ctrl+C twice to exit (#5734)
* server: twice ctrl+C to exit

* std::atomic_flag

* sigint: message

* sigint: stderr

* Update examples/server/server.cpp

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-28 10:55:37 +02:00
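
The two-step Ctrl+C handling above can be illustrated with the `std::atomic_flag` named in the commit notes: the first SIGINT prints a hint to stderr and requests a graceful shutdown, the second exits immediately. This is a self-contained sketch of the pattern, not the code in `examples/server/server.cpp`.

```cpp
// Sketch of "hit Ctrl+C twice to exit" using std::atomic_flag.
#include <atomic>
#include <chrono>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <thread>

static std::atomic_flag  sigint_seen = ATOMIC_FLAG_INIT;
static std::atomic<bool> shutdown_requested{false};

static void sigint_handler(int /*signum*/) {
    if (sigint_seen.test_and_set()) {
        // Second Ctrl+C: stop waiting for a graceful shutdown.
        std::_Exit(130);
    }
    // First Ctrl+C: hint on stderr and ask the main loop to wind down.
    // (fputs keeps the sketch short; a strictly async-signal-safe version
    // would use write() instead.)
    std::fputs("\nreceived SIGINT, press Ctrl+C again to force exit\n", stderr);
    shutdown_requested.store(true);
}

int main() {
    std::signal(SIGINT, sigint_handler);
    while (!shutdown_requested.load()) {
        // ... serve requests; sleep stands in for real work ...
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    // graceful cleanup would run here
    return 0;
}
```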