Olivier Chafik
08da184147
add hot topic notice to README.md
2024-06-12 11:27:01 +01:00
Olivier Chafik
ceb2859eef
Merge remote-tracking branch 'origin/master' into bins
2024-06-12 10:43:17 +01:00
Olivier Chafik
be66f9e605
Revert "update llama-rpc-server bin name + doc"
...
This reverts commit e474ef1df4.
2024-06-12 10:40:49 +01:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 ( #7894 )
...
In addition, this reverts a workaround we had to apply for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 package to install for Vulkan ( #7794 ) [no ci]
...
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation ( #7818 )
...
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template ( #7880 ) [no ci]
...
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Olivier Chafik
e474ef1df4
update llama-rpc-server bin name + doc
2024-06-11 14:42:03 +01:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image ( #7861 )
...
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
...
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik
166397f1e4
update grammar/README.md w/ new llama-* names
2024-06-11 02:35:30 +01:00
ochafik
2a9c4cd7ba
Merge remote-tracking branch 'origin/master' into bins
2024-06-11 02:35:01 +01:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik
8cf8c129d4
Update apps.nix
2024-06-11 00:18:47 +01:00
HanClinto
1f5ec2c0b4
Updating two small main references missed earlier in the finetune docs.
2024-06-10 16:12:50 -07:00
Olivier Chafik
82df7f9f0e
Merge pull request #1 from HanClinto/bins-rename-nits
...
Nits found in binary renames
2024-06-10 23:58:12 +01:00
HanClinto
70de0debab
Updating documentation references for lookup-merge and export-lora
2024-06-10 15:32:21 -07:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
HanClinto
72660c357c
Updating run-with-preset.py to use new binary names.
...
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2
Updating a few lingering doc references for rename of main to llama-cli
2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2
Updating docs for eval-callback binary to use new llama- prefix.
2024-06-10 14:44:46 -07:00
ochafik
0be5f399c4
add two missing llama- prefixes
2024-06-10 22:00:28 +01:00
Olivier Chafik
f9cfd04bd4
address gbnf-validator unused fread warning (switched to C++ / ifstream)
2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4
rename: llama-cli-cmake-pkg(.exe)
2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee
fix test-eval-callback
2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812
more llama-cli(.exe)
2024-06-10 16:08:06 +01:00
Olivier Chafik
051633ed2d
update dockerfile refs
2024-06-10 16:05:11 +01:00
Olivier Chafik
1cc651446d
rename(make): llama-baby-llama
2024-06-10 16:03:18 +01:00
Olivier Chafik
0fcf2c328e
rename dockerfile w/ llama-cli
2024-06-10 15:44:49 +01:00
Olivier Chafik
0bb2a3f233
fix some missing -cli suffixes
2024-06-10 15:42:20 +01:00
Olivier Chafik
daeaeb1222
Merge remote-tracking branch 'origin/master' into bins
2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c
rename llama|main -> llama-cli; consistent RPM bin prefixes
2024-06-10 15:34:14 +01:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It explains to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
The `--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075 , to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, e.g. when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
This commit matches both the prefix and the suffix of the model part names, which should fix the problem without breaking any previously-supported upstream models. According to a report by @teleprint-me there is still some persistent problem, but this shall do in the meantime.
2024-06-09 12:47:25 +10:00