Commit graph

3964 commits

Author / SHA1 / Message / Date
MaggotHATE
28d2cff729 Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc 2024-10-15 09:46:14 +05:00
MaggotHATE
2be814aa69 Fixed tests and outdated README 2024-10-15 09:46:04 +05:00
MaggotHATE
17ad143ead Merge branch 'ggerganov:master' into master 2024-10-14 18:36:52 +05:00
MaggotHATE
3613a6d27b Renamed random distribution 2024-10-14 18:36:03 +05:00
MaggotHATE
436a9919e3 Simplified algorithm since threshold_max is removed 2024-10-14 16:10:13 +05:00
VoidIsVoid
a89f75e1b7 server : handle "logprobs" field with false value (#9871)
Co-authored-by: Gimling <huangjl@ruyi.ai>
2024-10-14 10:04:36 +03:00
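
A minimal sketch of the behavior this fix implies, assuming the server's nlohmann::json request parsing: a JSON `"logprobs": false` should act like an absent field instead of being rejected or misread. The `parse_logprobs` helper is hypothetical, not the server's actual function.

```cpp
// Minimal sketch, assuming nlohmann::json request parsing; parse_logprobs
// is a hypothetical helper, not the actual server code.
#include <nlohmann/json.hpp>
#include <cstdint>

using json = nlohmann::json;

static int32_t parse_logprobs(const json & body) {
    if (!body.contains("logprobs") || body["logprobs"].is_null()) {
        return 0; // absent or null: logprobs disabled
    }
    if (body["logprobs"].is_boolean()) {
        // the fix: false is a valid way of saying "no logprobs"
        return body["logprobs"].get<bool>() ? 1 : 0;
    }
    return body["logprobs"].get<int32_t>(); // numeric count, as before
}
```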
MaggotHATE
dfef2c4c37 Merge branch 'ggerganov:master' into master 2024-10-14 11:44:50 +05:00
MaggotHATE
a3e652296a Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc 2024-10-14 11:44:00 +05:00
MaggotHATE
44bbd6337a Quick fixes per review comments 2024-10-14 11:43:45 +05:00
agray3
13dca2a54a Vectorize load instructions in dmmv f16 CUDA kernel (#9816)
* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-14 02:49:08 +02:00
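
The speedup above comes from issuing wider load instructions. An illustrative CUDA sketch of the general technique (not the actual dmmv.cu diff): one `half2` read replaces two scalar f16 loads with a single 32-bit transaction, which helps memory-bound kernels most on HBM parts.

```cuda
// Sketch of vectorized f16 loads, not the dmmv.cu change itself.
// Assumes n is even and x is 4-byte aligned (i is always even here).
#include <cuda_fp16.h>

__global__ void scale_f16(const half * x, half * y, int n, float scale) {
    const int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (i + 1 >= n) return;

    // vectorized load: two f16 values in one 32-bit transaction
    const half2 v = *reinterpret_cast<const half2 *>(x + i);

    y[i + 0] = __float2half(__half2float(__low2half(v))  * scale);
    y[i + 1] = __float2half(__half2float(__high2half(v)) * scale);
}
```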
Georgi Gerganov
d4c19c0f5c server : accept extra_context for the infill endpoint (#9874)
* server : accept extra_context for the infill endpoint

ggml-ci

* server : update readme [no ci]

* server : use repo-level FIM pattern if possible

ggml-ci
2024-10-13 21:31:35 +03:00
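
A hedged sketch of what a request carrying extra context chunks might look like. The field names "extra_context", "filename", and "text" are assumptions inferred from the commit title, not a verified schema; the body is built with nlohmann::json, which the server examples use.

```cpp
// Hypothetical /infill request body; field names are assumptions.
#include <nlohmann/json.hpp>

using json = nlohmann::json;

json make_infill_request() {
    return {
        {"input_prefix", "int main() {\n    "},
        {"input_suffix", "\n    return 0;\n}\n"},
        {"extra_context", json::array({
            {{"filename", "utils.h"},
             {"text",     "int add(int a, int b);\n"}}
        })}
    };
}
```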
Georgi Gerganov
c7181bd294 server : reuse cached context chunks (#9866)
ggml-ci
2024-10-13 18:52:48 +03:00
MaggotHATE
ea62e65fe9 Merge branch 'ggerganov:master' into master 2024-10-13 13:45:40 +05:00
Georgi Gerganov
92be9f1216 flake.lock: Update (#9870)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6++BNjsb149fGZd1T4+KBg=' (2024-10-04)
  → 'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg+XZeHgxW5hQA9fIKHsKCdOIUycTryeVw=' (2024-10-09)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-10-12 20:11:26 -07:00
MaggotHATE
cca842fbd3 Fixed arg after update 2024-10-12 18:46:13 +05:00
MaggotHATE
ea85a51af1 Merge branch 'ggerganov:master' into master 2024-10-12 18:38:06 +05:00
MaggotHATE
68557eb7a0 Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc 2024-10-12 18:36:14 +05:00
MaggotHATE
9c43a01c5d Removed xtc_threshold_max 2024-10-12 18:35:56 +05:00
Georgi Gerganov
edc265661c server : add option to time limit the generation phase (#9865)
ggml-ci
2024-10-12 16:14:27 +03:00
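
Conceptually, time-limiting the generation phase means checking a wall-clock budget inside the decode loop. A generic sketch of the idea (not the server's code; the request parameter the commit introduces is not reproduced here):

```cpp
// Generic time-budget check inside a decode loop; decode_one_token is a
// hypothetical stand-in for producing one token (true = not EOS).
#include <chrono>
#include <cstdint>
#include <cstdio>

static bool decode_one_token() { return true; }

void generate_with_budget(int n_predict, int64_t budget_ms) {
    const auto t_start = std::chrono::steady_clock::now();
    for (int n = 0; n < n_predict; ++n) {
        if (!decode_one_token()) break; // EOS
        const int64_t elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - t_start).count();
        if (budget_ms > 0 && elapsed_ms >= budget_ms) {
            std::printf("generation stopped after %lld ms\n", (long long) elapsed_ms);
            break; // time budget exhausted
        }
    }
}
```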
Georgi Gerganov
1bde94dd02 server : remove self-extend features (#9860)
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
2024-10-12 16:06:31 +03:00
Georgi Gerganov
95c76e8e92 server : remove legacy system_prompt feature (#9857)
* server : remove legacy system_prompt feature

ggml-ci

* readme : update [no ci]

* server : fix non-transformer logic + remove response from /props
2024-10-12 14:51:54 +03:00
Georgi Gerganov
11ac9800af llama : improve infill support and special token detection (#9798)
* llama : improve infill support

ggml-ci

* llama : add more FIM token strings

ggml-ci

* server : update prompt on slot restore (#9800)

* gguf : deprecate old FIM token KVs
2024-10-12 08:21:51 +03:00
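
For background: infill builds on the standard fill-in-the-middle (FIM) prompt layout, with prefix and suffix sentinel tokens around the surrounding code and a middle sentinel marking where the model generates. A self-contained sketch, with the sentinel token IDs passed in rather than read from a model vocabulary:

```cpp
// Standard FIM prompt layout; sentinel token IDs are supplied by the caller.
#include <cstdint>
#include <vector>

typedef int32_t llama_token;

std::vector<llama_token> build_fim_prompt(
        const std::vector<llama_token> & prefix,
        const std::vector<llama_token> & suffix,
        llama_token tok_pre, llama_token tok_suf, llama_token tok_mid) {
    std::vector<llama_token> out;
    out.push_back(tok_pre);                              // <FIM_PRE>
    out.insert(out.end(), prefix.begin(), prefix.end()); // code before the cursor
    out.push_back(tok_suf);                              // <FIM_SUF>
    out.insert(out.end(), suffix.begin(), suffix.end()); // code after the cursor
    out.push_back(tok_mid);                              // <FIM_MID>: model fills in here
    return out;
}
```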
R0CKSTAR
943d20b411 musa : update doc (#9856)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
MaggotHATE
dfe587a5f3 Merge branch 'ggerganov:master' into master 2024-10-12 00:41:34 +05:00
Diego Devesa
96776405a1 ggml : move more prints to the ggml log system (#9839)
* ggml : move more prints to the ggml log system

* show BLAS OpenMP warnings in all builds using debug print
2024-10-11 15:34:45 +02:00
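
Why this matters downstream: once prints go through the ggml log system, an application can filter them with a single callback. A sketch using llama_log_set from llama.h; treat the exact enum spellings as subject to the revision in use.

```cpp
// Install a log callback that surfaces only warnings and errors.
#include "llama.h"
#include <cstdio>

static void my_log(enum ggml_log_level level, const char * text, void * /*user_data*/) {
    if (level == GGML_LOG_LEVEL_ERROR || level == GGML_LOG_LEVEL_WARN) {
        std::fputs(text, stderr);
    }
}

int main() {
    llama_log_set(my_log, nullptr);
    // ... normal usage; BLAS/OpenMP debug warnings now flow through my_log ...
}
```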
MaggotHATE
acada1a5e7 Made algorithm safer and more readable 2024-10-11 15:36:25 +05:00
MaggotHATE
3968369071 Fixed labels in old server UI 2024-10-11 11:53:19 +05:00
MaggotHATE
882a603bda Merge branch 'master' into master 2024-10-11 11:26:05 +05:00
Diego Devesa
7eee341bee common : use common_ prefix for common library functions (#9805)
* common : use common_ prefix for common library functions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-10-10 22:57:42 +02:00
Diego Devesa
0e9f760eb1 rpc : add backend registry / device interfaces (#9812)
* rpc : add backend registry / device interfaces

* llama : add llama_supports_rpc API

* ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server
2024-10-10 20:14:55 +02:00
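
A hedged sketch of how these pieces fit together. llama_supports_rpc is named by the commit itself; the exact ggml_backend_rpc_start_server parameters are not shown here, so that call is left as a comment rather than guessed.

```cpp
// Probe RPC availability before trying to host or attach to a remote backend.
#include "llama.h"
#include <cstdio>

int main() {
    if (!llama_supports_rpc()) {
        std::fprintf(stderr, "this build was compiled without the RPC backend\n");
        return 1;
    }
    // A host process would create a local backend and expose it over the
    // network, roughly: ggml_backend_rpc_start_server(backend, "0.0.0.0:50052", ...);
    return 0;
}
```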
R0CKSTAR
cf8e0a3bb9 musa: add docker image support (#9685)
* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-10 20:10:37 +02:00
MaggotHATE
72db625bd4 Added XTC to server UIs 2024-10-10 22:59:23 +05:00
Diego Devesa
c7499c557c examples : do not use common library in simple example (#9803)
* examples : do not use common library in simple example

* add command line parser, simplify code
2024-10-10 19:50:49 +02:00
MaggotHATE
f7a383ffb3 Initial server support 2024-10-10 21:48:49 +05:00
MaggotHATE
2107882cf5 Renamed parameters, fixed info and defaults
* probability is at 0 by default, but XTC is included in sampling queue
* threshold higher than 0.5 switches XTC off
2024-10-10 19:35:28 +05:00
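
A hedged sketch of the XTC ("exclude top choices") step as commonly described, not necessarily this branch's exact code: with probability `prob`, every candidate at or above `threshold` is removed except the least likely of them. It also makes the second note above concrete: probabilities sum to 1, so at most one token can exceed 0.5, and XTC needs at least two qualifying candidates before it removes anything, which is why a threshold above 0.5 switches it off.

```cpp
// XTC sampling step, sketched; struct and parameter names are illustrative.
#include <algorithm>
#include <random>
#include <vector>

struct candidate { int id; float p; };

void apply_xtc(std::vector<candidate> & cands, float threshold, float prob, std::mt19937 & rng) {
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    if (prob <= 0.0f || dist(rng) >= prob) {
        return; // XTC not triggered on this sampling step
    }
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.p > b.p; });

    size_t n_above = 0;
    while (n_above < cands.size() && cands[n_above].p >= threshold) {
        n_above++;
    }
    if (n_above >= 2) {
        // drop all qualifying tokens except the last (least probable) one
        cands.erase(cands.begin(), cands.begin() + (n_above - 1));
    }
}
```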
MaggotHATE
ba29d31fb7 Merge branch 'ggerganov:master' into master 2024-10-10 11:42:50 +05:00
Diego Devesa
c81f3bbb05
cmake : do not build common library by default when standalone (#9804) 2024-10-09 18:49:52 +02:00
Georgi Gerganov
e7022064ab perplexity : fix integer overflow (#9783)
* perplexity : fix integer overflow

ggml-ci

* perplexity : keep n_vocab as int and make appropriate casts

ggml-ci
2024-10-09 17:00:18 +03:00
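
The overflow class here is the usual one in perplexity-style code: a 32-bit product of a token index and n_vocab used as a buffer offset wraps once it exceeds INT_MAX. An illustrative sketch of the pattern and the cast-based fix the commit body describes (not the actual perplexity.cpp diff):

```cpp
// Widen the index computation while keeping n_vocab itself an int.
#include <cstdint>

float get_logit(const float * logits, int i_token, int n_vocab, int j) {
    // bad:  logits[i_token * n_vocab + j]   (32-bit multiply, can overflow)
    // good: cast one operand so the whole index is computed in 64 bits
    return logits[(int64_t) i_token * n_vocab + j];
}
```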
MaggotHATE
37e02e34a1 Added XTC to README 2024-10-09 14:08:02 +05:00
MaggotHATE
ed535bb2ae Merge branch 'ggerganov:master' into master 2024-10-09 14:00:55 +05:00
Georgi Gerganov
3dc48fe75a examples : remove llama.vim
An updated version will be added in #9787
2024-10-09 10:55:42 +03:00
MaggotHATE
d0b1053897 Fixed incorrect min_keep check 2024-10-09 00:59:46 +05:00
MaggotHATE
6feb6b399c Update dump info in common 2024-10-08 21:15:37 +05:00
MaggotHATE
c19fb26042 Merged back lost commits in common and arg 2024-10-08 21:11:35 +05:00
MaggotHATE
09bc6d507c Updated info in common and args 2024-10-08 20:57:36 +05:00
MaggotHATE
81a0c2603c Simplified algorithm and more tests 2024-10-08 18:38:43 +05:00
MaggotHATE
8110f783d1 Merge branch 'ggerganov:master' into master 2024-10-08 18:36:38 +05:00
Diego Devesa
dca1d4b58a ggml : fix BLAS with unsupported types (#9775)
* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
2024-10-08 14:21:43 +02:00
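
A sketch of the guard the first bullet describes, using the renamed, pointer-returning ggml_get_type_traits named in the commit body; the struct and field spellings are assumed to match ggml's type traits.

```cpp
// BLAS operates on floats, so a type is only usable on the BLAS path if it
// is already f32 or can be converted to f32 via to_float.
#include "ggml.h"

bool can_use_blas_for(enum ggml_type type) {
    const struct ggml_type_traits * traits = ggml_get_type_traits(type);
    return type == GGML_TYPE_F32 || traits->to_float != nullptr;
}
```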
Xuan Son Nguyen
458367a906 server : better security control for public deployments (#9776)
* server : more explicit endpoint access settings

* protect /props endpoint

* fix tests

* update server docs

* fix typo

* fix tests
2024-10-08 13:27:04 +02:00
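
A generic sketch in the spirit of this change, with hypothetical names throughout (the real server flags and checks are not reproduced here): sensitive endpoints such as /props are served only to authenticated callers once an API key is configured.

```cpp
// Hypothetical per-endpoint access control; names are illustrative only.
#include <set>
#include <string>

struct server_access {
    std::string api_key;                    // empty = authentication disabled
    std::set<std::string> public_endpoints; // reachable without a key
};

bool endpoint_allowed(const server_access & cfg,
                      const std::string & path,
                      const std::string & presented_key) {
    if (cfg.public_endpoints.count(path) > 0) return true;
    if (cfg.api_key.empty())                  return true;
    return presented_key == cfg.api_key;      // e.g. gate /props behind the key
}
```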
standby24x7
fa42aa6d89 scripts : fix spelling typo in messages and comments (#9782)
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2024-10-08 09:19:53 +03:00