llama.cpp

Author	SHA1	Message	Date
Olivier Chafik	48e5009e64	rename gguf-split & quantize bins refs in **/tests.sh	2024-06-13 00:31:04 +01:00
Olivier Chafik	08da184147	add hot topic notice to README.md	2024-06-12 11:27:01 +01:00
Olivier Chafik	ceb2859eef	Merge remote-tracking branch 'origin/master' into bins	2024-06-12 10:43:17 +01:00
Olivier Chafik	be66f9e605	Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`.	2024-06-12 10:40:49 +01:00
Meng, Hengyu	dcf752707d	update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894 ) In addition this reverts a workaround we had to do to workaround the upstream issue with expired intel GPG package keys in 2024.0.1-devel-ubuntu22.04	2024-06-12 19:05:35 +10:00
Patrice Ferlet	f2b5764beb	Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794 ) [no ci] Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support	2024-06-12 11:18:16 +10:00
k.h.lai	73bac2b11d	vulkan: select only one device for single gpu with multiple drivers (#7582 )	2024-06-11 21:26:05 +02:00
0cc4m	ef52d1d16a	Update Vulkan RoPE implementation (#7818 ) * Update Vulkan RoPE implementation * Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception Minor fixes * Fix segfault when running out of VRAM Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-11 21:20:29 +02:00
Deven Mistry	14f83526cd	fix broken link in pr template (#7880 ) [no ci] * fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-12 02:18:58 +10:00
Brian	6fe42d073f	github: move PR template to .github/ root (#7868 )	2024-06-11 17:43:41 +03:00
Olivier Chafik	e474ef1df4	update llama-rpc-server bin name + doc	2024-06-11 14:42:03 +01:00
Johannes Gäßler	148995e5e5	llama-bench: more compact markdown tables (#7879 )	2024-06-11 14:45:40 +02:00
Georgi Gerganov	4bfe50f741	tests : check the Python version (#7872 ) ggml-ci	2024-06-11 10:10:20 +03:00
Johannes Gäßler	bdcb8f4222	CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860 )	2024-06-11 08:26:07 +02:00
slaren	c2ce6c47e4	fix CUDA CI by using a windows-2019 image (#7861 ) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
Olivier Chafik	ee3a086fdf	Merge pull request #2 from HanClinto/bins-nits-2 Bins nits again	2024-06-11 02:36:25 +01:00
ochafik	166397f1e4	update grammar/README.md w/ new llama-* names	2024-06-11 02:35:30 +01:00
ochafik	2a9c4cd7ba	Merge remote-tracking branch 'origin/master' into bins	2024-06-11 02:35:01 +01:00
Olivier Chafik	b61eb9644d	json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866 )	2024-06-11 02:22:57 +01:00
Olivier Chafik	396b18dfec	`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841 ) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme	2024-06-11 01:00:30 +01:00
ochafik	8cf8c129d4	Update apps.nix	2024-06-11 00:18:47 +01:00
HanClinto	1f5ec2c0b4	Updating two small `main` references missed earlier in the finetune docs.	2024-06-10 16:12:50 -07:00
Olivier Chafik	82df7f9f0e	Merge pull request #1 from HanClinto/bins-rename-nits Nits found in binary renames	2024-06-10 23:58:12 +01:00
HanClinto	70de0debab	Updating documentation references for lookup-merge and export-lora	2024-06-10 15:32:21 -07:00
Jared Van Bortel	864a99e7a0	cmake : fix CMake requirement for CUDA (#7821 )	2024-06-10 18:32:10 -04:00
HanClinto	72660c357c	Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename.	2024-06-10 15:23:32 -07:00
HanClinto	2fd66b2ce2	Updating a few lingering doc references for rename of main to llama-cli	2024-06-10 14:53:23 -07:00
HanClinto	e7e03733b2	Updating docs for eval-callback binary to use new `llama-` prefix.	2024-06-10 14:44:46 -07:00
ochafik	0be5f399c4	add two missing llama- prefixes	2024-06-10 22:00:28 +01:00
Olivier Chafik	f9cfd04bd4	address gbnf-validator unused fread warning (switched to C++ / ifstream)	2024-06-10 17:38:36 +01:00
Olivier Chafik	b8436395b4	rename: llama-cli-cmake-pkg(.exe)	2024-06-10 16:23:45 +01:00
Olivier Chafik	4881a94bee	fix test-eval-callback	2024-06-10 16:21:14 +01:00
Olivier Chafik	b8cb44e812	more llama-cli(.exe)	2024-06-10 16:08:06 +01:00
Olivier Chafik	051633ed2d	update dockerfile refs	2024-06-10 16:05:11 +01:00
Olivier Chafik	1cc651446d	rename(make): llama-baby-llama	2024-06-10 16:03:18 +01:00
Olivier Chafik	0fcf2c328e	rename dockerfile w/ llama-cli	2024-06-10 15:44:49 +01:00
Olivier Chafik	0bb2a3f233	fix some missing -cli suffixes	2024-06-10 15:42:20 +01:00
Olivier Chafik	daeaeb1222	Merge remote-tracking branch 'origin/master' into bins	2024-06-10 15:38:41 +01:00
Olivier Chafik	5265c15d4c	rename llama\|main -> llama-cli; consistent RPM bin prefixes	2024-06-10 15:34:14 +01:00
slaren	fd5ea0f897	ci : try win-2019 on server windows test (#7854 )	2024-06-10 15:18:41 +03:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846 )	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847 )	2024-06-10 14:59:55 +03:00
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676 ) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00
Ben Ashbaugh	af4ae502dd	use the correct SYCL context for host USM allocations (#7777 ) Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>	2024-06-10 10:21:31 +01:00
Georgi Gerganov	10ceba354a	flake.lock: Update (#7838 ) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-09 16:04:50 -07:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833 )	2024-06-09 20:19:35 +03:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700 ) This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830 )	2024-06-09 20:50:35 +10:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824 )	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693 ) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00

3187 commits