Commit graph

3084 commits

Author SHA1 Message Date
Yazan Agha-Schrader
80888e93cc renaming to ensure consistency 2024-05-31 06:17:40 +02:00
Yazan Agha-Schrader
d9742fbf4e fix wrong link to old ui 2024-05-31 05:37:55 +02:00
Yazan Agha-Schrader
bb9542b54f include new ui in cpp 2024-05-31 05:37:55 +02:00
Yazan Agha-Schrader
0d75e07bd9 Merge branch 'ggerganov:master' into server-ui-pr 2024-05-30 08:28:26 +02:00
Meng, Hengyu
3854c9d07f
[SYCL] fix intel docker (#7630)
* Update main-intel.Dockerfile

* workaround for https://github.com/intel/oneapi-containers/issues/70

* reset intel docker in CI

* add missed fix in server
2024-05-30 16:19:08 +10:00
Yazan Agha-Schrader
505d0a3346 move new ui to "/public" due to otherwise problematic CORS behaviour 2024-05-30 04:00:56 +02:00
Yazan Agha-Schrader
8b937a1a71 add a button to the new ui 2024-05-30 03:59:28 +02:00
Galunid
eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) 2024-05-30 02:10:40 +02:00
Yazan Agha-Schrader
734be4dcc9 Merge branch 'master' into server-ui-pr 2024-05-30 01:47:22 +02:00
Yazan Agha-Schrader
d55081767c fix css path 2024-05-30 01:17:47 +02:00
Yazan Agha-Schrader
89b1b38144 move files, clean code 2024-05-30 01:13:10 +02:00
Yazan Agha-Schrader
63de7201fa set default prompt to empty 2024-05-29 22:34:15 +02:00
Yazan Agha-Schrader
dcdc11a5c4 add cmd-r prompt and reduce redundancy 2024-05-29 22:24:24 +02:00
Yazan Agha-Schrader
87bcbbb6c2 fix toggle state in localStorage 2024-05-29 22:23:40 +02:00
Georgi Gerganov
55d62262a9 metal : remove invalid asserts (#7617) 2024-05-29 22:21:20 +03:00
Yazan Agha-Schrader
c2badb4697 add hacky llama2 prompt solution, reduce redundancy in promptFormats.js 2024-05-29 20:03:20 +02:00
Georgi Gerganov
975ec63ff2 metal : add missing asserts (#7617) 2024-05-29 20:45:25 +03:00
Georgi Gerganov
fb76ec31a9
ggml : fix YARN + add tests + add asserts (#7617)
* tests : add rope tests

ggml-ci

* ggml : fixes (hopefully)

ggml-ci

* tests : add non-cont tests

ggml-ci

* cuda : add asserts for rope/norm + fix DS2

ggml-ci

* ggml : assert contiguousness

* tests : reduce RoPE tests

ggml-ci
2024-05-29 20:17:31 +03:00
Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support (#7610)
* tests : add non-cont concat tests

* cuda : non-cont concat support

ggml-ci
2024-05-29 15:38:26 +03:00
Yazan Agha-Schrader
1c24ab6e20 move prompt style 2024-05-29 14:09:19 +02:00
Radoslav Gerganov
210d99173d llama-bench : add support for the RPC backend (#7435) 2024-05-29 14:45:44 +03:00
slaren
87bdf2a199
ggml : use atomic_flag for critical section (#7598)
* ggml : use atomic_flag for critical section

* add windows shims
2024-05-29 13:36:39 +02:00
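The commit above swaps ggml's critical section onto an atomic_flag spinlock. A minimal sketch of the idiom it refers to (illustrative C++, not the actual ggml code, which is C with Windows shims):

```cpp
#include <atomic>

// std::atomic_flag is the one atomic type guaranteed lock-free.
// test_and_set atomically raises the flag and returns its previous
// value, so the loop spins while another thread holds the lock.
static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;

static void critical_section_start() {
    while (g_lock.test_and_set(std::memory_order_acquire)) {
        // busy-wait until the holder clears the flag
    }
}

static void critical_section_end() {
    g_lock.clear(std::memory_order_release);
}
```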
Yazan Agha-Schrader
f2ef89415c do not separate with new line or comma 2024-05-29 13:36:07 +02:00
Yazan Agha-Schrader
39a163f76e add missing char 2024-05-29 13:32:33 +02:00
Georgi Gerganov
00281b7be3 scripts : remove mpi remnants 2024-05-29 14:31:18 +03:00
Georgi Gerganov
2ab977282b sync : ggml 2024-05-29 14:29:52 +03:00
Georgi Gerganov
72de268bec
ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggml-ci
2024-05-29 14:29:33 +03:00
Yazan Agha-Schrader
513406ab60 add more common stop tokens 2024-05-29 13:29:00 +02:00
Yazan Agha-Schrader
80b6143f78 more prompt format fixes 2024-05-29 13:19:22 +02:00
Yazan Agha-Schrader
ca565f4ed6 fix llama3 prompt template 2024-05-29 12:08:39 +02:00
Yazan Agha-Schrader
9fa0aa53f5 fix chatml & add llama3 format 2024-05-29 11:26:34 +02:00
Yazan Agha-Schrader
5fa255edfb add user message suffix 2024-05-29 10:28:07 +02:00
Yazan Agha-Schrader
eac8d739a5 update forgotten css theme 2024-05-29 08:54:04 +02:00
Akarshan Biswas
0e8d8bfd6c Add Arc A750 and Arch Linux to readme-sycl.md as verified GPU model and Linux distro (#7605) 2024-05-29 16:53:47 +10:00
Yazan Agha-Schrader
aa493e022d add css class 2024-05-29 08:45:20 +02:00
Yazan Agha-Schrader
9bb074e1f6 add phi3 to dropdown 2024-05-29 06:28:27 +02:00
Yazan Agha-Schrader
be675948d4 add phi-3 prompt template 2024-05-29 05:28:52 +02:00
zhouwg
504f0c340f ggml : fix typo in ggml.c (#7603) 2024-05-29 04:09:31 +02:00
Meng, Hengyu
b864b50ce5
[SYCL] Align GEMM dispatch (#7566)
* align GEMM dispatch
2024-05-29 07:00:24 +08:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes (#7500)
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no matching found.
2024-05-28 21:46:34 +02:00
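The "discard all tokens when no matching found" rule is standard WordPiece (WPM) behaviour: pieces are matched greedily, longest first, and if any remainder of a word fails to match, the whole word collapses to the unknown token. A hedged sketch of that core loop (illustrative only; the vocabulary lookup and token names are assumptions, not the llama.cpp implementation):

```cpp
#include <string>
#include <vector>
#include <unordered_set>

// Greedy longest-match-first WordPiece over one whitespace-split word.
// If any suffix fails to match, every piece of the word is discarded
// in favor of the unknown token.
std::vector<std::string> wpm_tokenize_word(
        const std::string & word,
        const std::unordered_set<std::string> & vocab) {
    std::vector<std::string> pieces;
    size_t start = 0;
    while (start < word.size()) {
        size_t end = word.size();
        std::string match;
        for (; end > start; --end) {
            // continuation pieces carry the "##" prefix
            std::string piece = (start == 0 ? "" : "##") + word.substr(start, end - start);
            if (vocab.count(piece)) { match = piece; break; }
        }
        if (match.empty()) {
            return {"[UNK]"};  // no piece matched: discard all pieces for this word
        }
        pieces.push_back(match);
        start = end;
    }
    return pieces;
}
```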
Georgi Gerganov
6bd12ce409 sycl : fix assert (#7563) 2024-05-28 22:22:50 +03:00
Giuseppe Scrivano
5442939fcc
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116.
Still needs further changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

Propagate the add_space_prefix setting from the HF model
configuration to the GGUF file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small Granite models

It works only for the small models, 3b and 8b.

The convert-hf-to-gguf.py script uses the vocabulary size of the
Granite models to detect Granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
2024-05-28 21:49:49 +03:00
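The optional MLP bias described above comes down to applying a bias tensor only when the checkpoint provides one. A rough sketch of the pattern against ggml's public API (the helper name and argument layout are illustrative assumptions):

```cpp
#include "ggml.h"

// Apply a projection with an optional bias: Granite-style checkpoints
// ship ffn biases, most LLaMA checkpoints do not.
static ggml_tensor * ffn_proj(ggml_context * ctx,
                              ggml_tensor * w,    // weight matrix
                              ggml_tensor * b,    // bias, may be null
                              ggml_tensor * cur) {
    cur = ggml_mul_mat(ctx, w, cur);
    if (b != nullptr) {
        cur = ggml_add(ctx, cur, b);  // only when the model defines the bias
    }
    return cur;
}
```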
k.h.lai
56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) 2024-05-28 19:25:08 +02:00
Radoslav Gerganov
2b737caae1
rpc : resource management rework (#7562)
* rpc : resource management rework

* address review comments
2024-05-28 18:13:36 +03:00
fairydreaming
ee3dff6b8e
Add support for DeepseekV2ForCausalLM (#7519)
* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)

* llama : add inference support for LLM_ARCH_DEEPSEEK2

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-28 17:07:05 +02:00
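The two new llm_build_moe_ffn() arguments reduce to an optional rescaling of the selected experts' routing weights. A hedged sketch of that step (the helper is hypothetical and not the real function signature; only ggml_scale is a real API call):

```cpp
#include "ggml.h"

// scale_w toggles the scaling, w_scale is the factor applied to the
// routing weights of the selected experts (as DeepSeek-V2 requires).
static ggml_tensor * maybe_scale_expert_weights(
        ggml_context * ctx, ggml_tensor * weights,
        bool scale_w, float w_scale) {
    if (scale_w) {
        weights = ggml_scale(ctx, weights, w_scale);
    }
    return weights;
}
```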
Georgi Gerganov
edc29433fa tests : fix test-tokenizer-0.sh 2024-05-28 15:04:09 +03:00
Georgi Gerganov
8b99e2aa66 llama : handle unknown utf8 bytes (#7588) 2024-05-28 13:55:35 +03:00
Brian
271ff3fc44
github: add refactor to issue template (#7561)
* github: add refactor issue template [no ci]

* Update 07-refactor.yml
2024-05-28 20:27:27 +10:00
Neo Zhang
e2b065071c
[SYCL] fix ggml_sycl_mul_mat_id() to match the API change (#7436)
* fix mul_mat_id to match the API change

* rm comment

* rm unused or duplicated code, rename as per review comments
2024-05-28 10:53:37 +01:00
Georgi Gerganov
0548a4187f
ggml : generalize GGML_OP_CONCAT (#7563)
* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
2024-05-28 11:04:19 +03:00
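The generalization above means callers choose the concatenation dimension instead of the previously hard-coded dim 2. A hedged usage sketch, assuming the generalized ggml_concat takes the dimension as its final argument:

```cpp
#include "ggml.h"

// a and b must match on every dimension except `dim`.
// Concatenate along dim 0 (rows) instead of the previously fixed dim 2.
static ggml_tensor * build_concat(ggml_context * ctx,
                                  ggml_tensor * a, ggml_tensor * b) {
    const int dim = 0;  // any of 0..3 after the generalization
    return ggml_concat(ctx, a, b, dim);
}
```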