* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation, noticeably reducing VRAM use
Clean up the CPU-assist code, replacing it with the ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback for integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue
Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
* Support converting XVERSE models to GGUF format.
* 1. Convert XVERSE models to GGUF;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add an XVERSE entry to the Supported models list in README.md.
* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter
* llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers.
* Fix format issues
- Remove duplicate setting of kqv_out in llm_build_kv
* Update llama.cpp
---------
Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
* llama: remove redundant reshape in build_kv_store
This commit removes the reshape of the V matrix in build_kv_store.
The motivation is that the V matrix already has the shape:
```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_MUL_MAT, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x0, view_offs = 0, data = 0x0,
name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
And after reshaping this tensor we get:
```console
(gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_RESHAPE, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0,
name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
I noticed that the `src` and `view_src` fields differ, but the
dimensions are identical. Given the code comment, the reshape call
appears unnecessary, and the output above supports removing it.
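For context, a minimal sketch of the change (the exact surrounding code in build_kv_store may differ; `ggml_reshape_2d`, `ggml_cpy`, `ggml_build_forward_expand`, and `GGML_ASSERT` are real ggml APIs, while `v_cache_view` here stands in for the destination view into the KV cache):
```c
// Before: v_cur already has shape (n_embd_v_gqa, n_tokens), so the
// reshape returns a view with identical ne/nb and merely adds a
// GGML_OP_RESHAPE node to the graph.
struct ggml_tensor * v_cur_r = ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens);
ggml_build_forward_expand(graph, ggml_cpy(ctx, v_cur_r, v_cache_view));

// After: copy v_cur directly; the follow-up "llama : add assert" commit
// below guards the assumption that the shape already matches.
GGML_ASSERT(v_cur->ne[0] == n_embd_v_gqa && v_cur->ne[1] == n_tokens);
ggml_build_forward_expand(graph, ggml_cpy(ctx, v_cur, v_cache_view));
```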
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* llama : add assert
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Allow conversion of Mistral HF models
* Homogenize Llama, Mistral, Mixtral under the same entry.
* Fix tokenizer, permute tensors
* Use sentencepiece tokenizer, or fall back to hfft.
* convert-hf : small fix for mypy
* convert-hf : fix duplicated block_count
* convert-hf : add vocab size to metadata
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
- The generic /usr/bin/env shebangs are good enough
- Python deps are provisioned in the devShells
- We need to be able to leave Python out, at least on Windows (it currently breaks eval)