llama.cpp

Author	SHA1	Message	Date
slaren	ac145fd2e3	ggml : fix mul_mat_id work size	2024-01-08 03:51:15 +01:00
slaren	5e879c9977	llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)	2024-01-07 23:26:49 +01:00
Lars Grammel	b7e7982953	readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814)	2024-01-07 22:24:11 +02:00
slaren	87c8207a04	Merge remote-tracking branch 'origin/master' into sl/backend-sched	2024-01-07 17:59:26 +01:00
slaren	226460cc0d	llama-bench : add no-kv-offload parameter (#4812)	2024-01-07 17:59:01 +01:00
Johannes Gäßler	d5a410e855	CUDA: fixed redundant value dequantization (#4809)	2024-01-07 17:24:08 +01:00
slaren	7c16cf106d	test-backend-ops : check buffer allocation failures	2024-01-07 13:50:02 +01:00
Georgi Gerganov	9dede37d81	llama : remove unused vars (#4796)	2024-01-07 14:29:36 +02:00
Georgi Gerganov	f77c72f371	ggml : fix null backend dereference (#4807) * ggml : fix null backend dereference * ggml : also check ggml_backend_is_cpu	2024-01-07 12:06:57 +01:00
Georgi Gerganov	3c36213df8	llama : remove redundant GQA check (#4796)	2024-01-07 11:21:53 +02:00
Alex Azarov	72d8407b36	llama.swiftui : use llama.cpp as SPM package (#4804)	2024-01-07 10:20:50 +02:00
Georgi Gerganov	d117d4dc5d	llama : print tensor meta for debugging	2024-01-07 09:51:12 +02:00
Alex Azarov	3418c03ecc	llama.swiftui : add visionOS target (#4805)	2024-01-07 09:46:55 +02:00
Konstantin Zhuravlyov	63ee677efd	ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787)	2024-01-07 08:52:42 +02:00
Georgi Gerganov	67984921a7	server : fix n_predict check (#4798)	2024-01-07 08:45:26 +02:00
slaren	72b74f364b	cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)	2024-01-07 00:33:51 +01:00
slaren	2f2c36799d	cuda : add ggml-backend split buffer support	2024-01-07 00:09:26 +01:00
Daniel Illescas Romero	c75ca5d96f	llama.swiftui : use correct pointer for llama_token_eos (#4797)	2024-01-06 17:12:59 +02:00
Georgi Gerganov	96e80dabc6	examples : improve base-translate.sh script (#4783)	2024-01-06 11:40:24 +02:00
slaren	ece0b0d855	improve graph splitting, partial fix for --no-kv-offload	2024-01-06 05:17:15 +01:00
slaren	d107459321	ggml-backend : increase GGML_MAX_BACKENDS	2024-01-06 01:02:24 +01:00
slaren	863ef45539	llama : check for null tensor_split	2024-01-06 01:02:24 +01:00
Georgi Gerganov	1fa7ee2e51	batched-bench : add tensor_split param	2024-01-06 01:02:24 +01:00
slaren	a1ab35c682	fix unmap after loading	2024-01-06 01:02:24 +01:00
slaren	6483328fa9	ggml-backend : add names to buffers	2024-01-06 01:02:24 +01:00
slaren	33f0761e9b	llama : ggml-backend integration	2024-01-06 01:02:24 +01:00
a-n-n-a-l-e-e	eec22a1c63	cmake : check for openblas64 (#4134) openblas v0.3.22 64-bit pkg-config file is named openblas64.pc https://github.com/OpenMathLib/OpenBLAS/issues/3790	2024-01-05 18:04:40 +02:00
Ikko Eltociear Ashimine	be36bb946a	flake.nix : fix typo (#4700) betwen -> between	2024-01-05 18:02:44 +02:00
Georgi Gerganov	91d38876df	metal : switch back to default.metallib (ggml/681) ggml-ci	2024-01-05 18:02:06 +02:00
Georgi Gerganov	d061bf9405	ggml : fix q2_k bpw in comments (ggml/680)	2024-01-05 18:02:06 +02:00
Finn Voorhees	1bf681f90e	ggml : add error handling to graph_compute (whisper/1714)	2024-01-05 18:02:06 +02:00
Georgi Gerganov	c1d7cb28d3	ggml : do not sched_yield when calling BLAS (#4761) * ggml : do not sched_yield when calling BLAS ggml-ci * ggml : fix do_yield logic ggml-ci * ggml : simplify do_yield logic ggml-ci	2024-01-05 15:18:21 +02:00
Georgi Gerganov	3681f22443	examples : add few-shot translation example (#4783)	2024-01-05 15:11:10 +02:00
Daniel Bevenius	b3a7c20b5c	finetune : remove unused includes (#4756) This commit removes unused includes from finetune.cpp. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-04 21:45:37 +02:00
Georgi Gerganov	012cf349ae	server : send token probs for "stream == false" (#4714)	2024-01-04 19:56:33 +02:00
Johannes Gäßler	a91928014f	Print backend name on test-backend-ops failure (#4751)	2024-01-04 09:43:23 +01:00
singularity	3c0b585561	llama.swiftui : support loading custom model from file picker (#4767) * swiftui: support load model from file picker * swiftui: remove trailing whitespace	2024-01-04 10:22:38 +02:00
Michael Coppola	e5804313a1	server : fix options in README.md (#4765) * fix examples/server/README.md * minor : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-04 10:17:09 +02:00
Georgi Gerganov	dc891b7f7a	ggml : include stdlib.h before intrin.h (#4736)	2024-01-04 10:12:26 +02:00
singularity	46cea79e1f	llama.swiftui : fix build of ggml.metallib (#4754) * metal: fix metal backend init failure in swiftui * metal: build ggml.metallib instead of copy src * llama.swift : remove debug flags from metallib build --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-04 09:58:16 +02:00
Daniel Bevenius	cb1e2818e0	train : fix typo in overlapping-samples help msg (#4758) This commit fixes a typo in the help message for the --overlapping-samples option. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-03 19:53:40 +02:00
Ashraful Islam	ece9a45e8f	swift : update Package.swift to use ggml as dependency (#4691) * updates the package.swift to use ggml as dependency * changes the ggml package url src to ggerganov	2024-01-03 19:30:02 +02:00
Georgi Gerganov	7bed7eba35	cuda : simplify expression Co-authored-by: slaren <slarengh@gmail.com>	2024-01-03 14:38:38 +02:00
Georgi Gerganov	d55356d3ba	cuda : mark I16 and I32 ops as unsupported ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	75e3fd8581	sync : ggml ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	289313716f	metal : add kernel_get_rows_i32 ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	ab62fc3e55	scripts : fix sync order + metal sed	2024-01-03 14:38:38 +02:00
Guillaume Wenzek	5f66ebca9c	ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) * add more int ops * ggml_compute_forward_dup_bytes * add tests * PR comments * tests : minor indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-03 14:38:38 +02:00
Justin Parker	f2eb19bd8b	server : throw an error when `slot unavailable` (#4741)	2024-01-03 10:43:19 +02:00
Georgi Gerganov	f3f62f0d83	metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) * ggml : disable fast-math for Metal (cmake build only) ggml-ci * metal : fix Metal API debug warnings * cmake : add -fno-inline for Metal build (#4545) * metal : fix API debug warnings * metal : fix compile warnings * metal : use uint64_t for strides * cmake : rename option to LLAMA_METAL_SHADER_DEBUG * metal : fix mat-vec Q8_0 kernel for BS > 1 * metal : normalize mat-vec kernel signatures * cmake : respect LLAMA_QKK_64 option * metal : fix mat-vec Q4_K kernel for QK_K == 64 * metal : optimizing ggml_mul_mat_id (wip) * metal : minor fix * metal : opt mul_mm_id	2024-01-02 21:07:47 +02:00

1851 commits