llama.cpp

Author	SHA1	Message	Date
root	6d34ad7f3c	Merge branch 'master' of https://github.com/bmtwl/llama.cpp	2024-02-08 22:21:33 +00:00
bmwl	99a203d02f	Update ggml.h Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2024-02-08 14:21:16 -08:00
root	16b91d138e	Merge branch 'master' of https://github.com/bmtwl/llama.cpp	2024-02-08 22:00:47 +00:00
root	e107c4cd54	fixed ggml_init_numa variable	2024-02-08 22:00:35 +00:00
bmwl	fecd66ac06	Merge branch 'ggerganov:master' into master	2024-02-08 13:42:06 -08:00
root	c2c31660a5	add missing enum ggml_numa_strategies declaration	2024-02-08 21:41:36 +00:00
Johannes Gäßler	8e6a9d2de0	CUDA: more warps for mmvq on NVIDIA (#5394)	2024-02-08 21:56:40 +01:00
slaren	41f308f58e	llama : do not print "offloading layers" message in CPU-only builds (#5416)	2024-02-08 21:33:03 +01:00
root	314174ddc5	add missing enum ggml_numa_strategies declaration and revert sync problem with master	2024-02-08 19:55:47 +00:00
root	7bbe511b8e	Revert bad merge with dynatemp flags	2024-02-08 19:04:02 +00:00
root	b65c863947	Remote enum llama_numa_strategies	2024-02-08 18:07:40 +00:00
bmwl	90668fb596	Merge branch 'ggerganov:master' into master	2024-02-08 09:17:23 -08:00
Abhilash Majumder	6e99f2a04f	Fix f16_sycl cpy call from Arc (#5411) * fix f16_sycl cpy call * rm old logic * add fp16 build CI * use macro * format fix	2024-02-08 22:39:10 +05:30
bmwl	18fb9a5382	Merge branch 'ggerganov:master' into master	2024-02-08 08:39:54 -08:00
root	12c23b60c6	Fixed lingering init_llama_backend() bool calls in tests and examples	2024-02-08 16:28:49 +00:00
Daniel Bevenius	ff4ff05c5f	llava : add missing .py, and fix paths in README.md (#5414) This commit adds the missing .py extension to the convert-image-encoder-to-gguf script. It also fixes the paths for the `model` and `mmproj` options in the example llava-cli command. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-08 16:20:03 +02:00
Johannes Gäßler	b7b74cef36	fix trailing whitespace (#5407)	2024-02-08 11:36:54 +01:00
runfuture	4aa43fab56	llama : fix MiniCPM (#5392) * fix bug for norm_rms_eps missing * to align with the same order as convert.py for model write * fix: undo HF models permute tensor * update for flake8 lint	2024-02-08 12:36:19 +02:00
Daniel Bevenius	a6e514a85f	llava: fix typo/formatting in README.md (#5405) This commit fixes a typo in the README.md file for the llava example which is causing the formatting to look a little off: Clone llava-v15-7b`` and clip-vit-large-patch14-336`` locally Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-08 09:58:19 +01:00
Johannes Gäßler	26d4efd11e	sampling: fix top_k <= 0 (#5388) * sampling: fix top_k <= 0 * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-08 09:46:30 +01:00
Georgi Gerganov	8504d2d0da	tests : .gitignore obj files	2024-02-08 09:46:47 +02:00
bmwl	f156112f56	Merge branch 'ggerganov:master' into master	2024-02-07 17:38:53 -08:00
root	783b7ca02d	Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static	2024-02-07 22:28:29 +00:00
root	d47f232fc1	Removing last bit of MIRROR_MODE code for this PR	2024-02-07 22:02:21 +00:00
root	61c37ba93c	Removing MIRROR_MODE code for this PR	2024-02-07 21:46:19 +00:00
Michael Podvitskiy	c4fbb6717c	CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-02-07 16:39:23 -05:00
root	3eccea1b63	Syncing to pr	2024-02-07 21:36:39 +00:00
Ebey Abraham	8c933b70c2	fix typo in readme (#5399) Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>	2024-02-07 22:11:30 +01:00
root	c43808c625	Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode note being implemented yet	2024-02-07 19:49:07 +00:00
Kamil Tomšík	b906596bb7	Add Ava in the list of llama.cpp UIs (#4362)	2024-02-07 13:44:52 -05:00
Johannes Gäßler	aa7ab99be2	CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386)	2024-02-07 12:40:26 +01:00
Neo Zhang Jianyu	10afa6f1d1	[SYCL] update install make by w64devkit (#5297)	2024-02-07 18:16:55 +08:00
Xiao-Yong Jin	0ef46da632	llava-cli : always tokenize special tokens (#5382) * llava-cli: tokenize special tokens in prompt * llava-cli: use the escape CLI argument, remove incomplete separate escaping process	2024-02-07 10:17:25 +02:00
0cc4m	ee1628bdfe	Basic Vulkan Multi-GPU implementation (#5321) * Initial Vulkan multi-gpu implementation Move most global variables into backend context * Add names to backend device functions * Add further missing cleanup code * Reduce code duplication in tensor split layer assignment * generalize LLAMA_SPLIT_LAYER for all backends, do not expose device count and memory in llama.h * Only do device info print in the beginning and initialize one backend for cpu assist Add missing cleanup code * Rework backend memory management to make sure devices and buffers get properly allocated and freed * Rename cpu assist free function --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-02-07 07:54:50 +01:00
Eve	ed0bf32290	readme : modernize (#5379) * first cleanup, update everything to Llama 2 and remove outdated content * Delete SHA256SUMS * make build instructions generic * recommend Q4_K_M quantization method * Update README.md	2024-02-07 08:21:30 +02:00
Ben Williams	9a697d842b	readme : update ui list (#5354)	2024-02-07 08:16:48 +02:00
runfuture	316c7faf77	llama : add MiniCPM support (#5346) * support minicpm arch. * fix tab/space typo. * convert minicpm model via convert-hf-gguf.py * try to make tokenizer work * fix bug for quantize minicpm * fix for flake8 lint * remove convert-minicpm.py * fix for editorconfig * correct minicpm model type (size) * constants expanded for minicpm * Minor change of the constant names for minicpm	2024-02-07 08:15:56 +02:00
Justin Parker	f3e2b4fa3f	server : update `/props` with "total_slots" value (#5373) * include total "num_slots" in default_generation_settings_for_props * cleanup total_slots return value in /props endpoint * update /props endpoint docs with total_slots * remove num_slots from default_generation_settings_for_props * update /props endpoint section	2024-02-07 08:15:19 +02:00
Sang-Kil Park	f68664ac24	convert : fix TypeError on GPT-2 vocab.json (#5288)	2024-02-06 23:28:00 -05:00
root	12789eb308	Reverting Makefile	2024-02-06 22:45:21 +00:00
root	7aa974de5e	Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h	2024-02-06 22:43:13 +00:00
root	60b80b0e8a	removed trailing whitespace	2024-02-06 22:27:38 +00:00
root	a69d6e2b91	Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables	2024-02-06 22:23:34 +00:00
Alexey Parfenov	213d1439fa	server : remove model.json endpoint (#5371)	2024-02-06 20:08:38 +02:00
Johannes Gäßler	17c97fb062	CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)	2024-02-06 19:43:06 +02:00
Kawrakow	b08f22c882	Update README.md (#5366) Add some links to quantization related PRs	2024-02-06 19:00:16 +02:00
Kawrakow	f57fadc009	Slight quantization improvement for Q4_K and Q5_K (#5361) * Q4_K: slightly better quantization * Q5_K: slightly better quantization --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-06 17:28:02 +02:00
BarfingLemurs	2e9c0bd6b3	readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362)	2024-02-06 16:06:48 +02:00
Johannes Gäßler	2c516611f1	CUDA: mul_mat_vec_q for batch sizes > 1 (#5351)	2024-02-06 14:44:06 +01:00
Justin Parker	8a79c591de	server : include total "num_slots" in props endpoint (#5349)	2024-02-06 11:20:59 +02:00

2130 commits