Pierrick HYMBERT
2cdd21e26b
server: tests: increase timeout for completion
2024-03-02 20:32:20 +01:00
Pierrick HYMBERT
c1f66f05f5
server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1
2024-03-02 20:19:21 +01:00
Pierrick HYMBERT
1aa5ad9150
server: tests: fix regex content
2024-03-02 19:30:19 +01:00
Pierrick HYMBERT
830d0efbd2
server: tests: fail CI workflow on first scenario failure
2024-03-02 19:30:13 +01:00
Pierrick HYMBERT
763ae0a1fd
Merge remote-tracking branch 'origin/tests/server/passkey' into tests/server/passkey
...
# Conflicts:
# .github/workflows/server.yml
2024-03-02 19:13:33 +01:00
Pierrick HYMBERT
61b97915b0
server: metrics: fix when no prompt processed
2024-03-02 19:11:53 +01:00
Pierrick HYMBERT
9fcfa63a11
server: tests: schedule slow tests on master
2024-03-02 19:00:29 +01:00
Pierrick HYMBERT
9ab72d7ade
server: tests: schedule slow tests on master
2024-03-02 18:58:21 +01:00
Pierrick HYMBERT
178b0c693d
server: tests: fix regex content matching
2024-03-02 18:57:57 +01:00
Pierrick HYMBERT
407cc609d3
server: tests: fix passkey, add doc, fix regex content matching, fix timeout
2024-03-02 18:53:01 +01:00
Pierrick HYMBERT
8abf8d3a08
server: tests: fix server timeout
2024-03-02 15:51:27 +01:00
Pierrick HYMBERT
a80533e276
server: tests - passkey - limit the number of max tokens to predict
2024-03-02 14:42:11 +01:00
Pierrick HYMBERT
f8773f759e
server: tests - passkey - limit the number of max tokens to predict
2024-03-02 14:38:08 +01:00
Pierrick HYMBERT
cf4c86ee20
server: tests - passkey - first good working value of nga
2024-03-02 14:31:27 +01:00
Pierrick HYMBERT
ed60b97434
server: tests - fix passkey not using pre/suffix
2024-03-02 14:25:10 +01:00
Pierrick HYMBERT
3b8242a188
server: tests - missing EOL at EOF
2024-03-02 14:13:49 +01:00
Pierrick HYMBERT
af82fb4ad7
server: revert change on slot n_ctx
2024-03-02 14:12:12 +01:00
Pierrick HYMBERT
2495f7273a
server: logs: do not truncate log values
2024-03-02 14:01:06 +01:00
Pierrick HYMBERT
616d7e9a9b
server: do not truncate prompt tokens if self-extend through group attention is enabled
2024-03-02 13:52:52 +01:00
Pierrick HYMBERT
60113da241
server: tests: add group attention params
2024-03-02 13:50:28 +01:00
Pierrick HYMBERT
ab5b06b2cf
server: logs: do not truncate log values
2024-03-02 13:49:18 +01:00
Pierrick HYMBERT
18e739d61d
server: tests: add passkey test
2024-03-02 13:10:18 +01:00
Pierrick HYMBERT
319ded7dde
server: tests: download model from HF, add batch size
2024-03-02 13:01:57 +01:00
Pierrick HYMBERT
1780d9601d
server: tests: add debug field in context before scenario
2024-03-02 12:50:55 +01:00
Pierrick HYMBERT
0f774a81cd
server: /v1/models add some metadata
2024-03-02 12:07:22 +01:00
Pierrick HYMBERT
73a7e42692
server: tests: add models endpoint scenario
2024-03-02 07:37:49 +01:00
Tushar
cb5e8f7fc4
build(nix): Introduce flake.formatter for `nix fmt` ( #5687 )
...
* build(nix): Introduce flake.formatter for `nix fmt`
* chore: Switch to pkgs.nixfmt-rfc-style
2024-03-01 15:18:26 -08:00
nold
da3b9ba2b7
convert-hf-to-gguf : require einops for InternLM2ForCausalLM ( #5792 )
2024-03-01 16:51:12 -05:00
Sourab Mangrulkar
c29af7e225
llama : add StarCoder2 support ( #5795 )
...
* Add support for starcoder2
* handle rope type
* skip rope freq and rotary embeddings from being serialized
* resolve comments
* Update llama.cpp
* remove redundant changes
* handle `rope-theta`
* llama : change starcoder2 rope type
* address comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-01 21:30:46 +02:00
Georgi Gerganov
38d16b1426
server : remove api_like_OAI.py proxy script ( #5808 )
2024-03-01 20:00:58 +02:00
ddpasa
c2224f003b
ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken ( #5813 )
2024-03-01 18:00:00 +01:00
kunal-vaishnavi
e743386728
gemma : fix bfloat16 -> float16 conversion issue ( #5810 )
2024-03-01 16:08:08 +02:00
Miwa / Ensan
f49a535686
common : fix flag `--logits-all` to `--all-logits` ( #5805 )
2024-03-01 15:48:56 +02:00
Pierrick Hymbert
3ab8b3a92e
llama : cleanup unused mmq flags ( #5772 )
...
* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q
* remove: mul_mat_q in compare llama bench and usage
* update llama-bench
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Douglas Hanley
9600d59e01
unicode : switch to multimap based nfd_map ( #5799 )
...
* switch to multimap based nfd_map due to compile time issues
* simplify multimap keys
* dont construct new locale every time
2024-03-01 11:15:36 +02:00
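The multimap switch described above can be sketched as follows. This is an illustrative standalone version, not the actual llama.cpp table or API: one (source, output) pair per output codepoint keeps the static table flat, avoiding the deeply nested initializers that caused the compile-time issues.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Under NFD, one codepoint may decompose into several codepoints. Storing
// one (source, output) pair per output codepoint in a flat multimap avoids
// nested container initializers. Entries here are real NFD decompositions
// chosen for illustration, not the project's full table.
static const std::multimap<uint32_t, uint32_t> nfd_map = {
    {0x00C0, 0x0041}, {0x00C0, 0x0300}, // À -> A + combining grave accent
    {0x00E9, 0x0065}, {0x00E9, 0x0301}, // é -> e + combining acute accent
};

static std::vector<uint32_t> nfd_decompose(uint32_t cp) {
    std::vector<uint32_t> out;
    auto range = nfd_map.equal_range(cp);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second); // equivalent keys keep insertion order
    }
    if (out.empty()) {
        out.push_back(cp); // no entry: the codepoint maps to itself
    }
    return out;
}
```

Equivalent keys in a `std::multimap` are kept in insertion order, so the decomposition sequence round-trips correctly through `equal_range`.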
Pierrick Hymbert
5cb02b4a01
server: allow overriding the server thread pool size with --threads-http ( #5794 )
2024-03-01 10:08:08 +01:00
Eve
6ea0f010ff
ci : add Ubuntu 22 Vulkan CI run ( #5789 )
2024-03-01 10:54:53 +02:00
Georgi Gerganov
f105471ef6
server : fix newlines in help ( #5785 )
2024-03-01 09:59:43 +02:00
AidanBeltonS
38d1521608
[SYCL] Use batched mul_mat pathway ( #5591 )
...
* Use batched mul_mat pathway
* rm extra line
* Explicitly state scaled data type
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-03-01 13:06:47 +05:30
Xuan Son Nguyen
052051d8ae
Server: normalize naming ( #5779 )
...
* server: normalize naming
* fix spacing
2024-02-29 21:42:11 +01:00
Marcus Dunn
d5ab29757e
llama : constified llama_set_state_data's src ( #5774 )
2024-02-29 10:17:23 +02:00
Georgi Gerganov
87c91c0766
ci : reduce 3b ppl chunks to 1 to avoid timeout ( #5771 )
...
ggml-ci
2024-02-28 21:44:21 +02:00
Eve
317709b2a8
make portability_enumeration_ext Apple only ( #5757 )
2024-02-28 20:33:37 +01:00
Georgi Gerganov
08c5ee87e4
llama : remove deprecated API ( #5770 )
...
ggml-ci
2024-02-28 18:43:38 +02:00
Georgi Gerganov
78aacf3634
awq-py : remove ( #5768 )
2024-02-28 17:36:53 +02:00
Georgi Gerganov
8c0e8f4e73
sync : ggml
2024-02-28 11:17:32 +02:00
slaren
2774b0c974
add google magika inference example (ggml/748)
...
* add magika inference example
* ggml : fix unaligned accesses in custom ops
* ggml : fix FP32 GELU for values that exceed the FP16 range
* use ggml_pool_1d
* add README
* Update README.md
* pad inputs if the files are too small
* cleanup
ggml-ci
2024-02-28 11:17:06 +02:00
UEXTM.com
5f70671856
Introduce backend GUIDs (ggml/743)
...
* Introduce backend GUIDs
Initial proposed implementation of backend GUIDs
(Discussed in https://github.com/ggerganov/ggml/pull/741 )
Hardcoded CPU backend GUID (for now)
Change ggml_backend_is_cpu logic to use GUID
* Remove redundant functions
Remove redundant functions `ggml_backend_i::get_name` and `ggml_backend_guid` which are not desired for future expansion
* Add spaces to match style
Co-authored-by: slaren <slarengh@gmail.com>
* Fix brace style to match
Co-authored-by: slaren <slarengh@gmail.com>
* Add void to () in function signature
Co-authored-by: slaren <slarengh@gmail.com>
* Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid
* add guids to all backends
ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-02-28 11:17:05 +02:00
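The GUID mechanism from ggml/743 can be sketched roughly like this; the type and function names below are illustrative, not the actual ggml API. The idea is that each backend carries a fixed identifier, and a check like `ggml_backend_is_cpu` becomes a GUID comparison instead of a name check:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative 16-byte backend GUID, mirroring the idea in the commit above.
struct backend_guid {
    uint8_t bytes[16];
};

static bool guid_equal(const backend_guid & a, const backend_guid & b) {
    return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}

// The CPU GUID lives as a local static in its accessor, matching the
// CPU_GUID placement described in the commit body. The value is made up.
static const backend_guid & cpu_guid() {
    static const backend_guid guid = {{0xde, 0xad, 0xbe, 0xef, 0x00, 0x01,
                                       0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                                       0x08, 0x09, 0x0a, 0x0b}};
    return guid;
}

static bool backend_is_cpu(const backend_guid & g) {
    return guid_equal(g, cpu_guid());
}
```

Comparing raw bytes keeps the check cheap and independent of any human-readable backend name.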
Xuan Son Nguyen
a693bea1e6
server : hit Ctrl+C twice to exit ( #5734 )
...
* server: twice ctrl+C to exit
* std::atomic_flag
* sigint: message
* sigint: stderr
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-28 10:55:37 +02:00
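The double-Ctrl+C behaviour from #5734 can be sketched with the `std::atomic_flag` the commit notes mention. This is a simplified standalone version, not the server's actual handler: the first SIGINT warns on stderr and requests a graceful shutdown, the second forces an exit.

```cpp
#include <atomic>
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Set by the first SIGINT; still being set on the next SIGINT means the
// user pressed Ctrl+C twice and wants to exit immediately.
static std::atomic_flag g_interrupted = ATOMIC_FLAG_INIT;

// Returns true when the process should terminate now (second Ctrl+C).
static bool handle_sigint_once() {
    if (g_interrupted.test_and_set()) {
        return true; // second Ctrl+C: force exit
    }
    // first Ctrl+C: warn on stderr, let the server shut down gracefully
    std::fprintf(stderr, "\nreceived SIGINT, press Ctrl+C again to force exit\n");
    return false;
}

static void sigint_handler(int /*sig*/) {
    if (handle_sigint_once()) {
        std::exit(130); // conventional exit status for SIGINT
    }
}
```

Installed with `std::signal(SIGINT, sigint_handler)`; `std::atomic_flag` is a natural fit here because its operations are always lock-free, which is what makes them usable from a signal handler.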
compilade
adcb12a9ba
llama : fix non-quantization of expert gating tensors ( #5754 )
...
This reverts a single line from #5475
2024-02-28 10:52:56 +02:00