llama.cpp

Author	SHA1	Message	Date
slaren	5fb5e24811	llama : minor sampling refactor (2) (#9386)	2024-09-09 17:10:46 +02:00
Georgi Gerganov	38ca6f644b	readme : update hot topics	2024-09-09 15:51:37 +03:00
Johannes Gäßler	8e6e2fbe14	CUDA: fix variable name conflict for Windows build (#9382)	2024-09-09 14:22:53 +02:00
Antonis Makropoulos	5ed087573e	readme : add LLMUnity to UI projects (#9381) * add LLMUnity to UI projects * add newline to examples/rpc/README.md to fix editorconfig-checker unit test	2024-09-09 14:21:38 +03:00
Radoslav Gerganov	54f376d0b9	rpc : update README [no ci] (#9320) Update README with instructions how to offload model layers to both local and remote devices	2024-09-09 11:04:39 +03:00
Dan Johansson	b2e89a3274	Arm AArch64: Documentation updates (#9321) * Arm AArch64: Documentation updates * Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels * Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats * Add newline to the end of docs/build.md	2024-09-09 10:02:45 +03:00
Markus Tavenrath	daa9623ab0	Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118) * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. * fix compile issues * Fix issues where the last submit wasn't executed or handled properly. * remove trailing whitespace * Repair GGML_VULKAN_CHECK_RESULTS * Increase submit counter only if actual work has been submitted and increase submit count to 100. * Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.	2024-09-08 21:43:48 +02:00
Georgi Gerganov	e079bffb66	cuda : fix FA Q src index (1 -> 0) (#9374)	2024-09-08 22:01:02 +03:00
Xuan Son Nguyen	3f7ccfd649	common : bring back missing args, add env var duplication check (#9375) * common : bring back missing args * move duplication check to test-arg-parser * add check for duplicated env var * correct default values	2024-09-08 18:08:55 +02:00
slaren	a249843d89	common : restore --n-gpu-layers (#9371)	2024-09-08 16:44:42 +02:00
slaren	19f4a7b296	llama : refactor samplers internal implementation (#9370)	2024-09-08 15:52:07 +02:00
Neo Zhang Jianyu	2a358fb0c4	[SYCL] add check malloc result on device (#9346) * add check malloc result on device * update for review comments, check all malloc_device() result --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-08 19:05:29 +08:00
slaren	eae597182c	llama : sanitize tokens in the upper bound (#9359)	2024-09-08 12:41:51 +02:00
Xuan Son Nguyen	00b02bb249	imatrix : fix arg parser for imatrix (#9366) * imatrix : fix arg parser * beautify printing first arg	2024-09-08 12:12:17 +02:00
Georgi Gerganov	a876861455	metal : update support condition for im2col + fix warning (#0)	2024-09-08 11:05:55 +03:00
Georgi Gerganov	385decbd63	sync : ggml	2024-09-08 11:05:55 +03:00
Georgi Gerganov	60a3107ccd	scripts : option to increase git patch context	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	406c1a32a1	vulkan: add dryrun support to sin and cos ops (ggml/947) sin and cos failed test-backend-ops because they tried to dereference a context pointer that is null on dry runs. This commit prevents that segfault. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	9cb9260861	vulkan: correctly report support for OP_CONT (ggml/946) test-backend-ops fails because ggml_cont aborts when invoked passing an unsupported type. This commit makes ggml_cont tests pass Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Johannes Gäßler	202084d31d	tests: add gradient tests for all backends (ggml/932) * tests: add gradient checking to test-backend-ops * remove old comment * reorder includes * adjust SIN/COS parameters * add documentation, use supports_op if possible	2024-09-08 11:05:55 +03:00
Johannes Gäßler	dbbebcab33	ggml: fix ggml_graph_cpy undefined behavior (ggml/943)	2024-09-08 11:05:55 +03:00
Georgi Gerganov	ba1cf846ed	cann : fix doxy (ggml/0)	2024-09-08 11:05:55 +03:00
Mengqing Cao	d2d3200b38	cann : add Ascend NPU support (whisper/2336) * enable Ascend NPU in src/whisper.cpp * sync test-backend-ops with llama.cpp	2024-09-08 11:05:55 +03:00
Georgi Gerganov	51d964a4ef	cuda : mark BF16 CONT as unsupported	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	efe6a83e30	ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) * ggml_cont: fix issue with transposed tensors when one dimension is 1 when using multiple threads, it is not enough to check for the tensors to be contiguous for ggml_compute_forward_dup_same_cont to work correctly. The tensors strides also need to match. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Add ggml_cont tests Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Remove dead code it isn't possible to reach this code because all these functions are invoked by ggml_compute_forward_dup if and only if src0->type != dst->type Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Make ggml_compute_forward_dup_same_cont work with contiguous tensors Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --------- Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-08 11:05:55 +03:00
Kevin Gibbons	fbb7fcffbc	llama : set attrs of mislabelled EOT/EOM tokens (#9348)	2024-09-08 08:51:00 +03:00
Georgi Gerganov	a5b5d9a101	llama.android : fix build (#9350)	2024-09-08 00:33:50 +03:00
Georgi Gerganov	f12295b8a9	llama : fix empty ring buffer push (#9358)	2024-09-08 00:33:33 +03:00
Georgi Gerganov	faf69d4237	llama : sanitize invalid tokens (#9357) * common : do not add null tokens during warmup ggml-ci * llama : check that the input tokens are valid ggml-ci * tests : fix batch size of bert model ggml-ci	2024-09-08 00:33:13 +03:00
Eve	e536426ded	llamafile : disable sgemm for batch-size 1 (#9330)	2024-09-07 22:02:26 +03:00
Xuan Son Nguyen	1b9ae5189c	common : refactor arg parser (#9308) * (wip) argparser v3 * migrated * add test * handle env * fix linux build * add export-docs example * fix build (2) * skip build test-arg-parser on windows * update server docs * bring back missing --alias * bring back --n-predict * clarify test-arg-parser * small correction * add comments * fix args with 2 values * refine example-specific args * no more lamba capture Co-authored-by: slaren@users.noreply.github.com * params.sparams * optimize more * export-docs --> gen-docs	2024-09-07 20:43:51 +02:00
slaren	e32d0816ed	ggml : always check bounds on get_rows operations (#9354)	2024-09-07 20:23:07 +02:00
Georgi Gerganov	df270ef745	llama : refactor sampling v2 (#9294) - Add `struct llama_sampler` and `struct llama_sampler_i` - Add `llama_sampler_` API - Add `llama_sampler_chain_` API for chaining multiple samplers - Remove `LLAMA_API_INTERNAL` - Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`	2024-09-07 15:16:19 +03:00
Xuan Son Nguyen	947538acb8	ggml : fix missing `cpu_set_t` on emscripten (#9336) * ggml : fix missing cpu_set_t on emscripten * better version * bring back android part	2024-09-07 12:01:34 +02:00
slaren	6c89eb0b47	ci : disable rocm image creation (#9340)	2024-09-07 10:48:54 +03:00
Xuan Son Nguyen	9b2c24c099	server : simplify state machine for slot (#9283) * server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-06 23:21:29 +02:00
Aarni Koskela	134bc38ecf	llama-bench : log benchmark progress (#9287) * llama-bench : add optional progress messages	2024-09-06 23:03:01 +02:00
Aarni Koskela	815b1fb20a	batched-bench : add `--output-format jsonl` option (#9293) `--output-format` is modeled after `llama-bench`'s options	2024-09-06 17:59:58 +02:00
Changyeon Kim	409dc4f8bb	ggml : fix build break for the vulkan-debug (#9265) - windows build : Ok. - linux build : Ok. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>	2024-09-06 15:54:50 +03:00
Xuan Son Nguyen	4a1411b4f1	server : fix missing lock (#9334)	2024-09-06 14:06:04 +02:00
Feng Jiang	424e3a52fe	llama/kompute: Add multi-GPU support Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>	2024-09-06 16:27:27 +08:00
Feng Jiang	56c5f988eb	ggml/kompute: Introduce ggml_backend_kompute_get_device_memory() Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>	2024-09-06 16:27:27 +08:00
Feng Jiang	97efd5047a	ggml/kompute: Introduce ggml_backend_kompute_get_device_count() Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>	2024-09-06 16:27:27 +08:00
Feng Jiang	cc9514f941	ggml/kompute: Remove unused ggml_backend_kompute_device_{ref, unref}() Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>	2024-09-06 16:27:27 +08:00
Cong Liu	f57f8cb3da	ggml/kompute: Reimplement kompute_manager Signed-off-by: Cong Liu <liucong@kylinos.cn>	2024-09-06 16:27:12 +08:00
Markus Tavenrath	8ebe8ddebd	Improve Vulkan shader build system (#9239) * Improve Vulkan shader builds system - Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility. - Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools * remove not required self dependency	2024-09-06 08:56:17 +02:00
Cong Liu	3676778e82	ggml/kompute: Implement ggml_backend_i.offload_op interface Signed-off-by: Cong Liu <liucong@kylinos.cn>	2024-09-06 10:57:00 +08:00
Weishi Li	d94ad56f87	ggml/kompute: Use the kp::Manager in ggml_backend_kompute_context instead of global Signed-off-by: Weishi Li <liweishi@kylinos.cn>	2024-09-06 10:57:00 +08:00
Weishi Li	74ba8516ce	ggml/kompute: Move butf into struct ggml_backend_kompute_context Signed-off-by: Weishi Li <liweishi@kylinos.cn>	2024-09-06 10:57:00 +08:00
Ming Xie	e914ac7c68	ggml/kompute: Introducing struct ggml_backend_kompute_buffer_context Signed-off-by: Ming Xie <xieming@kylinos.cn>	2024-09-06 10:57:00 +08:00

3873 commits