llama.cpp

Author	SHA1	Message	Date
FirstTimeEZ	9d1ab28aed	Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion	2024-11-18 11:56:39 +13:00
FirstTimeEZ	b75785d5b4	vulkan: change an assertion and minify others	2024-11-18 11:56:33 +13:00
FirstTimeEZ	c100e21a78	Merge branch 'ggerganov:master' into vulkan-assertion	2024-11-18 11:24:17 +13:00
Johannes Gäßler	76e9e58b78	CUDA: fix MMV kernel being used for FP16 src1 (#10357 )	2024-11-17 23:20:42 +01:00
FirstTimeEZ	2ed70d8c8d	vulkan: change an assertion and minify others	2024-11-18 11:19:19 +13:00
FirstTimeEZ	b5d5af4cdb	Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion	2024-11-18 11:08:28 +13:00
FirstTimeEZ	4629b76d75	vulkan: change an assertion and minify others	2024-11-18 11:06:38 +13:00
FirstTimeEZ	a1e88f0bcc	Merge branch 'ggerganov:master' into vulkan-assertion	2024-11-18 01:04:05 +13:00
Johannes Gäßler	ce2e59ba10	CMake: fix typo in comment [no ci] (#10360 )	2024-11-17 12:59:38 +01:00
Diego Devesa	be5caccef9	llama : only use default buffer types for the KV cache (#10358 )	2024-11-17 12:25:45 +01:00
Georgi Gerganov	20a780c7b6	gitignore : ignore local run scripts [no ci]	2024-11-17 13:12:22 +02:00
FirstTimeEZ	b7904dd728	Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion	2024-11-17 23:56:45 +13:00
FirstTimeEZ	855a685cc0	vulkan-assertions	2024-11-17 23:56:38 +13:00
FirstTimeEZ	7345c2cccb	Merge branch 'ggerganov:master' into vulkan-assertion	2024-11-17 23:54:25 +13:00
FirstTimeEZ	3db18a765f	Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion	2024-11-17 23:52:36 +13:00
FirstTimeEZ	281d629380	vulkan: less assertions	2024-11-17 23:52:30 +13:00
Georgi Gerganov	cf32a9b93a	metal : refactor kernel args into structs (#10238 ) * metal : add kernel arg structs (wip) * metal : fattn args ggml-ci * metal : cont + avoid potential int overflow [no ci] * metal : mul mat struct (wip) * cont : mul mat vec * cont : pass by reference * cont : args is first argument * cont : use char ptr * cont : shmem style * cont : thread counters style * cont : mul mm id ggml-ci * cont : int safety + register optimizations ggml-ci * metal : GGML_OP_CONCAT ggml-ci * metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV * metal : GGML_OP_REPEAT * metal : GGML_OP_CPY * metal : GGML_OP_RMS_NORM * metal : GGML_OP_NORM * metal : add TODOs for rest of ops * ggml : add ggml-metal-impl.h ggml-ci	2024-11-17 11:23:01 +02:00
FirstTimeEZ	a43178299c	ggml : fix undefined reference to 'getcpu' (#10354 ) https://github.com/ggerganov/llama.cpp/issues/10352	2024-11-17 10:39:22 +02:00
Johannes Gäßler	c3ea58aca4	CUDA: remove DMMV, consolidate F16 mult mat vec (#10318 )	2024-11-17 09:09:55 +01:00
Johannes Gäßler	467576b6cc	CMake: default to -arch=native for CUDA build (#10320 )	2024-11-17 09:06:34 +01:00
Diego Devesa	eda7e1d4f5	ggml : fix possible buffer use after free in sched reserve (#9930 )	2024-11-17 08:31:17 +02:00
Georgi Gerganov	24203e9dd7	ggml : inttypes.h -> cinttypes (#0 ) ggml-ci	2024-11-17 08:30:29 +02:00
Georgi Gerganov	5d9e59979c	ggml : adapt AMX to tensor->grad removal (#0 ) ggml-ci	2024-11-17 08:30:29 +02:00
Georgi Gerganov	a4200cafad	make : add ggml-opt (#0 ) ggml-ci	2024-11-17 08:30:29 +02:00
Georgi Gerganov	84274a10c3	tests : remove test-grad0	2024-11-17 08:30:29 +02:00
Georgi Gerganov	68fcb4759c	ggml : fix compile warnings (#0 ) ggml-ci	2024-11-17 08:30:29 +02:00
Johannes Gäßler	8a43e940ab	ggml: new optimization interface (ggml/988)	2024-11-17 08:30:29 +02:00
Georgi Gerganov	5c9a8b22b1	scripts : update sync	2024-11-17 08:30:29 +02:00
FirstTimeEZ	0fff7fd798	docs : vulkan build instructions to use git bash mingw64 (#10303 )	2024-11-17 00:29:18 +01:00
Johannes Gäßler	4e54be0ec6	llama/ex: remove --logdir argument (#10339 )	2024-11-16 23:00:41 +01:00
FirstTimeEZ	7a4ac544f6	Merge branch 'ggerganov:master' into vulkan-assertion	2024-11-17 10:50:46 +13:00
Georgi Gerganov	db4cfd5dbc	llamafile : fix include path (#0 ) ggml-ci	2024-11-16 20:36:26 +02:00
Georgi Gerganov	8ee0d09ae6	make : auto-determine dependencies (#0 )	2024-11-16 20:36:26 +02:00
MaggotHATE	bcdb7a2386	server: (web UI) Add samplers sequence customization (#10255 ) * Samplers sequence: simplified and input field. * Removed unused function * Modify and use `settings-modal-short-input` * rename "name" --> "label" --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-16 14:26:54 +01:00
Georgi Gerganov	f245cc28d4	scripts : fix missing key in compare-llama-bench.py (#10332 )	2024-11-16 10:32:50 +02:00
Jeff Bolz	772703c8ff	vulkan: Optimize some mat-vec mul quant shaders (#10296 ) Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops. Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before. Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.	2024-11-16 07:26:57 +01:00
FirstTimeEZ	d283dbcd37	vulkan: change an assertion. src1 can go down the first pipeline as nullptr and src0 only needs to be checked once; this means the assertion is only required to check if the type is GGML_TYPE_F16 and can usually be skipped	2024-11-16 17:17:10 +13:00
FirstTimeEZ	dd3a6ce9f8	vulkan : add cmake preset debug/release (#10306 )	2024-11-16 02:59:33 +01:00
Dan Johansson	1e58ee1318	ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324 )	2024-11-16 01:53:37 +01:00
FirstTimeEZ	89e4caaaf0	llama : save number of parameters and the size in llama_model (#10286 ) fixes #10285	2024-11-16 01:42:13 +01:00
Srihari-mcw	74d73dc85c	Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314 )	2024-11-15 22:27:00 +01:00
Johannes Gäßler	4047be74da	scripts: update compare-llama-bench.py (#10319 )	2024-11-15 21:19:03 +01:00
slaren	883d206fbd	ggml : fix some build issues	2024-11-15 21:45:32 +02:00
Georgi Gerganov	09ecbcb596	cmake : fix ppc64 check (whisper/0) ggml-ci	2024-11-15 15:44:06 +02:00
thewh1teagle	3225008973	ggml : vulkan logs (whisper/2547)	2024-11-15 15:44:06 +02:00
Georgi Gerganov	cbf5541a82	sync : ggml	2024-11-15 15:44:06 +02:00
Eve	18429220bd	AVX BF16 and single scale quant optimizations (#10212 ) * use 128 bit loads (i've tried 256->128 to death and its slower) * double accumulator * avx bf16 vec dot * +3% q4_0 inference * +7% tg +5% pp compared to master * slower f16c version, kep for reference * 256b version, also slow. i tried :) * revert f16 * faster with madd * split to functions * Q8_0 and IQ4_NL, 5-7% faster * fix potential overflow (performance reduced) * 16 bit add for q4_0 only * merge	2024-11-15 12:47:58 +01:00
R0CKSTAR	f0204a0ec7	ci: build test musa with cmake (#10298 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 12:47:25 +01:00
Romain Biessy	57f8355b29	sycl: Update Intel docker images to use DPC++ 2025.0 (#10305 )	2024-11-15 13:10:45 +02:00
Xuan Son Nguyen	9901068ac7	server : (web UI) add copy button for code block, fix api key (#10242 ) * server : (web ui) add copy btn for code blocks * fix problem with api key * use settings-modal-short-input component * always show copy btn for code snippet	2024-11-15 10:48:49 +01:00


4134 commits