llama.cpp

Author	SHA1	Message	Date
Pierrick HYMBERT	305ac3b61b	llama: dbrx: quantize fix n_attention_wv tensor name	2024-04-07 05:01:33 +02:00
Neo Zhang Jianyu	d4f220a5cc	support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521 )	2024-04-07 10:55:59 +08:00
Pierrick HYMBERT	06a59abf0a	model: dbrx: convert add n_ff	2024-04-07 03:17:24 +02:00
Pierrick HYMBERT	52c403355f	llama: increase maximum experts allowed	2024-04-07 03:16:33 +02:00
Pierrick HYMBERT	7e7cd53ca6	llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS	2024-04-06 23:55:37 +02:00
Pierrick HYMBERT	69856297b9	Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx	2024-04-06 23:53:11 +02:00
Pierrick HYMBERT	4f12a580d9	llama: dbrx: remove not existing condition on empty output layer	2024-04-06 23:35:23 +02:00
Pierrick HYMBERT	fe8089871e	model: dbrx: fix missing embedding tensor, mix with output layer	2024-04-06 23:27:29 +02:00
Pierrick HYMBERT	9c7dedb0f3	llama: dbrx: no attention output layer	2024-04-06 22:25:37 +02:00
Pierrick HYMBERT	76f266beef	scripts: get-wikitext-2 add unzip	2024-04-06 21:10:19 +02:00
Pierrick HYMBERT	03da419fc0	llama: dbrx: remove wrong attn output layer in model arch	2024-04-06 20:43:46 +02:00
Pierrick HYMBERT	916b91852b	convert: dbrx: fix remove wrong ATTN_OUT_NORM tensor, add output layer mapping	2024-04-06 20:30:30 +02:00
Pierrick HYMBERT	c8e6f903e0	doc: dbrx: add the model as supported	2024-04-06 20:09:01 +02:00
Pierrick HYMBERT	0a35f5881b	convert: dbrx: fix mixed up and down expert tensors llama: dbrx: review graph	2024-04-06 19:56:37 +02:00
Pierrick HYMBERT	e3c1e8127c	convert: dbrx: fix mixed up and down expert tensors	2024-04-06 19:21:43 +02:00
Pierrick HYMBERT	a7f9a3eafc	dbrx: minor	2024-04-06 19:09:04 +02:00
Georgi Gerganov	54ea0698fb	sync : ggml	2024-04-06 18:27:46 +03:00
Daniel Bevenius	b66aec675c	backend : fix typo in scheduler documentation (ggml/781) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-06 17:42:26 +03:00
Clint Herron	57dd02c44b	Tests: Added integration tests for GBNF parser (#6472 ) * Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements. * Fixing whitespace errors and cleaning error message alert to be clearer. * Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API. * Comment cleanup. * Reorganizing tests for readability. * Cleaning up debug message to make a bit more sense.	2024-04-06 10:31:33 -04:00
Pierrick HYMBERT	e4f8ee4f48	llama: support dbrx fix norm type	2024-04-06 16:14:58 +02:00
Pierrick HYMBERT	09210334bf	model: dbrx fix python linter in convert-hf-to-gguf.py	2024-04-06 16:00:32 +02:00
Pierrick HYMBERT	c0beb3cf7e	llama: add label for model 132B	2024-04-06 15:58:17 +02:00
Pierrick HYMBERT	3937100adb	model: dbrx, trust remote code	2024-04-06 15:57:57 +02:00
Pierrick HYMBERT	3e3d2d127c	gguf-py: remove wrong clip -> clamp	2024-04-06 15:46:47 +02:00
Pierrick HYMBERT	ed582c1dde	llama: support dbrx #6344	2024-04-06 15:22:55 +02:00
Pierrick HYMBERT	1d8de31565	model: dbrx convert to gguf #6344	2024-04-06 14:17:24 +02:00
Pierrick Hymbert	75cd4c7729	ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495 ) * ci: bench: support sse and fix prompt processing time server: add tokens usage in stream mode * ci: bench: README.md EOL * ci: bench: remove total pp and tg as it is not accurate * ci: bench: fix case when there is no token generated * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics * ci: bench: fix finish reason rate	2024-04-06 05:40:47 +02:00
Brian	a8bd14d557	gguf.py : add licence and version to gguf writer (#6504 )	2024-04-05 21:41:38 +03:00
Hoang Nguyen	d0f5deebf8	readme : update UI list (#6503 ) * Add MindMac to UI list * Update proprietary description Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-04-05 21:39:43 +03:00
Ting Sun	87e21bbacd	bench : make n_batch and n_ubatch configurable in Batched bench (#6500 ) * bench: make n_batch and n_ubatch configurable * bench: update doc for batched bench	2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI	1b496a745c	[SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464 ) * moved INTEL_MKL guard from gemm_impl to gemm (wrapper) * Update ggml-sycl.cpp Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> --------- Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>	2024-04-05 19:05:06 +05:30
alexpinel	a307375c02	readme : add Dot to UI list (#6487 )	2024-04-04 13:22:50 -04:00
Jun Jie	b660a5729e	readme : fix typo (#6481 )	2024-04-04 13:16:37 -04:00
Ed Lepedus	0a1d889e27	server: add cURL support to server Dockerfiles (#6474 ) * server: add cURL support to `full.Dockerfile` * server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile` * server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile` * server: add cURL support to `server-intel.Dockerfile` * server: add cURL support to `server-vulkan.Dockerfile` * fix typo in `server-vulkan.Dockerfile` Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-04 18:31:22 +02:00
Minsoo Cheong	7dda1b727e	ci: exempt master branch workflows from getting cancelled (#6486 ) * ci: exempt master branch workflows from getting cancelled * apply to bench.yml	2024-04-04 18:30:53 +02:00
Ewout ter Hoeven	c666ba26c3	build CI: Name artifacts (#6482 ) Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).	2024-04-04 17:08:55 +02:00
Shakhar Dasgupta	2e66913e5f	server: allow penalizing repetition of newlines on server webpage (#6431 )	2024-04-04 17:03:00 +02:00
Pierrick Hymbert	8120efee1d	ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478 )	2024-04-04 16:59:04 +02:00
limitedAtonement	a74401f0e5	Correct README link (#6458 ) README is called README.md.	2024-04-04 16:30:02 +02:00
Pierrick Hymbert	7a2c92637a	ci: bench: add more ftype, fix triggers and bot comment (#6466 ) * ci: bench: change trigger path to not spawn on each PR * ci: bench: add more file type for phi-2: q8_0 and f16. - do not show the comment by default * ci: bench: add seed parameter in k6 script * ci: bench: artefact name perf job * Add iteration in the commit status, reduce again the autocomment * ci: bench: add per slot metric in the commit status * Fix trailing spaces	2024-04-04 12:57:58 +03:00
Daniel Bevenius	4bcd6b959c	common: remove duplicate check for curl (#6471 ) This commit removes one of the two identical checks for curl being NULL in llama_load_model_from_url. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-04 09:49:21 +02:00
Clint Herron	9b84ae1806	examples : add GBNF validator program (#5948 ) * Revising GBNF validator program to be much simpler. * Changing from streams to using cstdio * Adding final newline character.	2024-04-04 10:44:28 +03:00
Georgi Gerganov	4399f13fb9	server : remove obsolete --memory-f32 option	2024-04-04 09:34:58 +03:00
Xiao-Yong Jin	1a43c7254e	server : add option to disable KV offload (#6468 )	2024-04-04 09:33:48 +03:00
Clint Herron	72d73af651	convert : fix for lint error complaining of bare except (#6470 )	2024-04-04 09:32:53 +03:00
Fattire	5fb1574c81	A few small fixes to server's README docs (#6428 ) * Typo fix to server's README.md Fix minor typo ("tonen") in server README. * server readme grammar/style fixes. Quickly went through this file to look for inconsistencies in presentation of defaults, flag options, and looked for typos and grammar issues. Not perfect, but hopefully improved. * Update README.md Remove an extra space before newline.	2024-04-03 22:22:57 +02:00
JH23X	60cdf40cc3	server : handle exception on wrong type in request (#6452 ) Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>	2024-04-03 21:09:52 +03:00
bryanSwk	bb43cf7e9d	llama : add SEA-LION support (#6448 ) * initial commit for sealion support * add sealion support * minor fix * q/k ln and pos_embd only if required * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * minor : clear whitespaces --------- Co-authored-by: bryan <bryansiow@aisingapore.org> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-03 21:05:10 +03:00
Ewout ter Hoeven	9f62c0173d	ci : update checkout, setup-python and upload-artifact to latest (#6456 ) * CI: Update actions/checkout to v4 * CI: Update actions/setup-python to v5 * CI: Update actions/upload-artifact to v4	2024-04-03 21:01:13 +03:00
Ed Lepedus	5d4f12e462	server: add cURL support to `server.Dockerfile` (#6461 )	2024-04-03 19:56:37 +02:00

2692 commits