llama.cpp

Author	SHA1	Message	Date
Guspan Tanadi	7919256c57	readme : reference examples relative links (#11505 )	2025-01-30 06:58:02 +01:00
Daniel Bevenius	e0449763a4	server : update json snippets in README.md [no ci] (#11492 ) This commit updates some of the JSON snippets in the README.md file and removes the `json` language tag from the code blocks. The motivation for this change is that invalid JSON in a code snippet is highlighted in red, which can make it somewhat difficult to read and can be a little distracting.	2025-01-30 05:48:14 +01:00
Nigel Bosch	eb7cf15a80	server : add /apply-template endpoint for additional use cases of Minja functionality (#11489 ) * add /apply-template endpoint to server * remove unnecessary line * add /apply-template documentation * return only "prompt" field in /apply-template * use suggested idea instead of my overly verbose way	2025-01-29 19:45:44 +01:00
Rémy Oudompheng	66ee4f297c	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360 ) * vulkan: initial support for IQ3_S * vulkan: initial support for IQ3_XXS * vulkan: initial support for IQ2_XXS * vulkan: initial support for IQ2_XS * vulkan: optimize Q3_K by removing branches * vulkan: implement dequantize variants for coopmat2 * vulkan: initial support for IQ2_S * vulkan: vertically realign code * port failing dequant callbacks from mul_mm * Fix array length mismatches * vulkan: avoid using workgroup size before it is referenced * tests: increase timeout for Vulkan llvmpipe backend --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-01-29 18:29:39 +01:00
Daniel Bevenius	e51c47b401	server : update auto gen files comments [no ci] (#11484 ) * server : update auto gen files comments This commit updates the 'auto generated files' comments in server.cpp and removes `deps.sh` from the comment. The motivation for this change is that `deps.sh` was removed in Commit `91c36c269b` ("server : (web ui) Various improvements, now use vite as bundler (#10599)"). * squash! server : update auto gen files comments [no ci] Move comments about file generation to README.md. * squash! server : update auto gen files comments [no ci] Remove the comments in server.cpp that mention that information can be found in the README.md file.	2025-01-29 16:34:18 +01:00
Jeff Bolz	2711d0215f	vulkan: Catch pipeline creation failure and print an error message (#11436 ) * vulkan: Catch pipeline creation failure and print an error message Also, fix some warnings from my on-demand compile change. * vulkan: fix pipeline creation logging	2025-01-29 09:26:50 -06:00
Georgi Gerganov	c30e34cdba	Merge branch 'master' into gg/llama-kv-cache ggml-ci	2025-01-29 15:01:26 +02:00
Georgi Gerganov	918885697e	llama : resolve rwkv conflict ggml-ci	2025-01-29 14:45:04 +02:00
Eric Curtin	f0d4b29edf	Parse https://ollama.com/library/ syntax (#11480 ) People search for ollama models using the web ui, this change allows one to copy the url from the browser and for it to be compatible with llama-run. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-29 11:23:10 +00:00
Georgi Gerganov	815857791d	sync : ggml	2025-01-29 11:25:29 +02:00
William Tambellini	1a0e87d291	ggml : add option to not print stack on abort (ggml/1081) * Add option to not print stack on abort Add option/envvar to disable stack printing on abort. Also link some unittests with Threads to fix link errors on ubuntu/g++11. * Update ggml/src/ggml.c --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-29 11:24:53 +02:00
issixx	d2e518e9b4	ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) some threads kept looping and failed to terminate properly after an abort during CPU execution. Co-authored-by: issi <issi@gmail.com>	2025-01-29 11:24:51 +02:00
Daniel Bevenius	b636228c0a	embedding : enable --no-warmup option (#11475 ) This commit enables the `--no-warmup` option for the llama-embeddings. The motivation for this change is to allow the user to disable the warmup when running the program.	2025-01-29 10:38:54 +02:00
Molly Sophia	325afb370a	llama: fix missing k_cache store for rwkv6qwen2 (#11445 ) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-01-29 12:07:21 +08:00
Emreerdog	794fe23f29	cmake: add hints for locating ggml on Windows using Llama find-package (#11466 )	2025-01-28 19:22:06 -04:00
peidaqi	cf8cc856d7	server : Fixed wrong function name in llamacpp server unit test (#11473 ) The test_completion_stream_with_openai_library() function is actually with stream=False by default, and test_completion_with_openai_library() with stream=True	2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen	d0c08040b6	ci : fix build CPU arm64 (#11472 ) * ci : fix build CPU arm64 * failed, trying ubuntu 22 * vulkan: ubuntu 24 * vulkan : jammy --> noble	2025-01-29 00:02:56 +01:00
uvos	be5ef7963f	HIP: Suppress transformation warning in softmax.cu Loops with bounds not known at compile time cannot be unrolled. When ncols_template == 0, the bounds of the loop are not constexpr, thus LLVM can't unroll the loops here.	2025-01-28 23:06:32 +01:00
Nikita Sarychev	cae9fb4361	HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080 ) This disables the workaround on fixed rocblas versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.	2025-01-28 16:42:20 +01:00
Eric Curtin	7fee2889e6	Add github protocol pulling and http:// (#11465 ) As pulling protocols to llama-run Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-28 14:45:41 +00:00
Nuno	d7d1eccacc	docker: allow installing pip packages system-wide (#11437 ) Signed-off-by: rare-magma <rare-magma@posteo.eu>	2025-01-28 14:17:25 +00:00
someone13574	4bf3119d61	cmake : don't fail on `GGML_CPU=OFF` (#11457 )	2025-01-28 15:15:34 +01:00
Nuno	f643120bad	docker: add perplexity and bench commands to full image (#11438 ) Signed-off-by: rare-magma <rare-magma@posteo.eu>	2025-01-28 10:42:32 +00:00
Akarshan Biswas	6e84b0ab8e	SYCL : SOFTMAX F16 mask support and other fixes (#11261 ) Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021. To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-01-28 09:56:58 +00:00
Michael Engel	2b8525d5c8	Handle missing model in CLI parameters for llama-run (#11399 ) The HTTP client in llama-run only prints an error in case the download of a resource failed. If the model name in the CLI parameter list is missing, this causes the application to crash. In order to prevent this, a check for the required model parameter has been added and errors for resource downloads get propagated to the caller. Signed-off-by: Michael Engel <mengel@redhat.com>	2025-01-28 08:32:40 +00:00
Eric Curtin	a4417ddda9	Add new hf protocol for ollama (#11449 ) https://huggingface.co/docs/hub/en/ollama Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-27 19:36:10 +01:00
Haus1	d6d24cd9ed	AMD: parse the architecture as supplied by gcnArchName (#11244 ) The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.	2025-01-27 14:58:17 +01:00
lexasub	a5203b4465	llama : minor fixes to speed up llama model loading (#11448 ) * impl::load: change the bpe_ranks map to an unordered map to reduce impl::load time by 30% * llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to halve the running time of init_mappings * Update src/llama-vocab.cpp --------- Co-authored-by: lexasub <empty@empty.ru> Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-27 14:42:09 +01:00
Georgi Gerganov	e665b57fa2	Merge branch 'master' into gg/llama-kv-cache ggml-ci	2025-01-27 14:09:22 +02:00
Johannes Gäßler	df984e0147	llama: refactor llama_decode_impl (#11381 )	2025-01-27 12:07:12 +01:00
Ihar Hrachyshka	acd38efee3	metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441 ) This fixes segmentation fault error when running tests when no metal devices are available (for example, when not linked with Core Graphics framework or otherwise).	2025-01-27 09:41:59 +02:00
Xuan Son Nguyen	caf773f249	docker : fix ARM build and Vulkan build (#11434 ) * ci : do not fail-fast for docker * build arm64/amd64 separately * fix pip * no fast fail * vulkan: try jammy	2025-01-26 22:45:32 +01:00
Georgi Gerganov	a0c500b4dc	context : prepare for abstraction ggml-ci	2025-01-26 20:16:22 +02:00
Georgi Gerganov	99422dfa3f	context : introduce llama_batch_manager ggml-ci	2025-01-26 20:16:22 +02:00
Georgi Gerganov	cb8f2095c6	wip	2025-01-26 20:16:22 +02:00
Georgi Gerganov	133ad6a723	context : initial need_reserve logic ggml-ci	2025-01-26 20:16:22 +02:00
Georgi Gerganov	c75ba6851e	context : move adapter code in the implementation [no ci]	2025-01-26 20:16:22 +02:00
Georgi Gerganov	f0713498fd	context : add get_ctx_padding() ggml-ci	2025-01-26 20:16:22 +02:00
Georgi Gerganov	b4ec1d4429	cont : move kv_self update to llama_context ggml-ci	2025-01-26 20:16:21 +02:00
Georgi Gerganov	f2524c0e41	llama : remove references to llama_kv_cache (wip) Intermediate step necessary to abstract the `llama_context` and `llama_kv_cache`. ggml-ci	2025-01-26 20:16:21 +02:00
Georgi Gerganov	ae274f9747	llama : fix names [no ci]	2025-01-26 20:16:21 +02:00
Georgi Gerganov	a19f671fe0	context : minor ggml-ci	2025-01-26 20:16:21 +02:00
Georgi Gerganov	17b363afd3	llama : update llama_kv_self API ggml-ci	2025-01-26 20:16:20 +02:00
Georgi Gerganov	fd05ab87aa	kv_cache : move state read/write to llama_kv_cache ggml-ci	2025-01-26 20:14:36 +02:00
Georgi Gerganov	4cd1b6fa4c	context : prepare kv_cache_read/write to be moved to kv_cache ggml-ci	2025-01-26 20:14:36 +02:00
Georgi Gerganov	73a14eccc9	kv_cache : minor	2025-01-26 20:14:36 +02:00
Georgi Gerganov	fef90cb3d7	kv_cache : fix ggml-ci	2025-01-26 20:14:36 +02:00
Georgi Gerganov	4d7bd03e65	kv_cache : functions -> members ggml-ci	2025-01-26 20:14:36 +02:00
Georgi Gerganov	e4550fbafc	llama : cont ggml-ci	2025-01-26 20:14:35 +02:00
Georgi Gerganov	f78b396ee7	llama : add struct llama_kv_cache (wip) [no ci]	2025-01-26 20:12:06 +02:00

4712 commits