llama.cpp

Author	SHA1	Message	Date
ngxson	a2a5f1bfbd	better error handling	2024-06-12 16:01:00 +02:00
ngxson	679f5137f8	move param parser to common	2024-06-12 15:58:20 +02:00
ngxson	f54cb8e307	reuse allocr	2024-06-12 12:53:17 +02:00
ngxson	8ee0c96688	fix compile warn	2024-06-12 12:50:29 +02:00
ngxson	e683b9af60	attemp to fix compile problem on mac	2024-06-12 12:49:01 +02:00
ngxson	7297817d13	use ggml_backend_tensor_copy	2024-06-12 11:41:37 +02:00
ngxson	e9cb3b336d	fix .editorconfig	2024-06-11 22:09:14 +02:00
ngxson	5ffba9ecc3	add readme	2024-06-11 19:35:17 +02:00
ngxson	04c91d29ff	use ggml_format_name	2024-06-11 19:14:04 +02:00
ngxson	54f77e2467	add to makefile all targets	2024-06-11 19:03:13 +02:00
ngxson	85db22dd20	Merge branch 'master' into xsn/control-vector-generator	2024-06-11 19:00:19 +02:00
Deven Mistry	14f83526cd	fix broken link in pr template (#7880 ) [no ci] * fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-12 02:18:58 +10:00
Brian	6fe42d073f	github: move PR template to .github/ root (#7868 )	2024-06-11 17:43:41 +03:00
ngxson	da6babdf0a	fix macos build	2024-06-11 15:47:35 +02:00
ngxson	3223133cf5	default n_pca_batch to 20	2024-06-11 15:05:06 +02:00
Johannes Gäßler	148995e5e5	llama-bench: more compact markdown tables (#7879 )	2024-06-11 14:45:40 +02:00
ngxson	d41c719980	bring back n_completions	2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng	446da906d9	fix n_completions	2024-06-11 08:22:38 -04:00
ngxson	163916864c	remember to copy back the last_eigenvector	2024-06-11 12:40:07 +02:00
ngxson	1a088fb0a5	working version	2024-06-11 12:37:05 +02:00
ngxson	9e39571fc2	add n_batch for pca	2024-06-11 11:45:16 +02:00
Georgi Gerganov	4bfe50f741	tests : check the Python version (#7872 ) ggml-ci	2024-06-11 10:10:20 +03:00
Johannes Gäßler	bdcb8f4222	CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860 )	2024-06-11 08:26:07 +02:00
slaren	c2ce6c47e4	fix CUDA CI by using a windows-2019 image (#7861 ) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
Olivier Chafik	b61eb9644d	json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866 )	2024-06-11 02:22:57 +01:00
Olivier Chafik	396b18dfec	`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841 ) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme	2024-06-11 01:00:30 +01:00
ngxson	6a5adf3d7c	fix shape of v_diff_original	2024-06-11 01:33:16 +02:00
ngxson	c241b500a1	clean up PCA ggml implementation	2024-06-11 01:13:10 +02:00
Jared Van Bortel	864a99e7a0	cmake : fix CMake requirement for CUDA (#7821 )	2024-06-10 18:32:10 -04:00
slaren	fd5ea0f897	ci : try win-2019 on server windows test (#7854 )	2024-06-10 15:18:41 +03:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846 )	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847 )	2024-06-10 14:59:55 +03:00
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676 ) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00
Ben Ashbaugh	af4ae502dd	use the correct SYCL context for host USM allocations (#7777 ) Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>	2024-06-10 10:21:31 +01:00
Georgi Gerganov	10ceba354a	flake.lock: Update (#7838 ) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-09 16:04:50 -07:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833 )	2024-06-09 20:19:35 +03:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700 ) This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830 )	2024-06-09 20:50:35 +10:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824 )	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693 ) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687 ) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously-supported upstream models. But according to report by @teleprint-me there is still some persistent problem, but shall do in the meantime.	2024-06-09 12:47:25 +10:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827 ) Main changes of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition use_temp_file is now opt-in instead of opt-out defaulting to False. Also GGUFWriter now does not require output file name until when actually writing to it. And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682 )" (#7808 ) This reverts commit `9422c5e34b`.	2024-06-09 01:43:39 +02:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826 ) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728 ) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806 ) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
ngxson	a710df749c	(wip) refactor	2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803 )	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784 ) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745 )	2024-06-07 11:15:49 +02:00

3181 commits