llama.cpp

Author	SHA1	Message	Date
Eddie-Wang1120	fcf2da4621	add dequantize	2024-06-19 21:48:04 +08:00
Eddie-Wang1120	89c7e4c1dd	remove block scale	2024-06-18 23:33:58 +08:00
Eddie-Wang1120	4edc958fec	fix code	2024-06-18 22:16:16 +08:00
Eddie-Wang1120	a03eff318c	i2s->q22	2024-06-17 20:33:09 +08:00
Eddie-Wang	569a03ed97	finish i2_s/i8_s vec_dot x86 simd	2024-06-15 14:01:26 +00:00
Eddie-Wang1120	95dced07e4	i2_s to absmax	2024-06-15 10:10:40 +08:00
Eddie-Wang1120	7a8961fff5	delete redundant	2024-06-14 12:30:27 +08:00
Eddie-Wang1120	5e5eee7b44	fix whitespace	2024-06-12 16:25:46 +08:00
Eddie-Wang1120	f395dd9ca0	change table name	2024-06-12 14:28:24 +08:00
Eddie-Wang	c0cd08d45e	Merge branch 'ggerganov:master' into bitnet	2024-06-12 14:12:27 +08:00
Patrice Ferlet	f2b5764beb	Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support	2024-06-12 11:18:16 +10:00
k.h.lai	73bac2b11d	vulkan: select only one device for single gpu with multiple drivers (#7582)	2024-06-11 21:26:05 +02:00
0cc4m	ef52d1d16a	Update Vulkan RoPE implementation (#7818) * Update Vulkan RoPE implementation * Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception Minor fixes * Fix segfault when running out of VRAM Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-11 21:20:29 +02:00
Deven Mistry	14f83526cd	fix broken link in pr template (#7880) [no ci] * fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-12 02:18:58 +10:00
Brian	6fe42d073f	github: move PR template to .github/ root (#7868)	2024-06-11 17:43:41 +03:00
Johannes Gäßler	148995e5e5	llama-bench: more compact markdown tables (#7879)	2024-06-11 14:45:40 +02:00
Georgi Gerganov	4bfe50f741	tests : check the Python version (#7872) ggml-ci	2024-06-11 10:10:20 +03:00
Johannes Gäßler	bdcb8f4222	CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)	2024-06-11 08:26:07 +02:00
slaren	c2ce6c47e4	fix CUDA CI by using a windows-2019 image (#7861) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
Eddie-Wang	2322e9db9a	Merge branch 'ggerganov:master' into bitnet	2024-06-11 10:50:12 +08:00
Eddie-Wang1120	de1d5073e4	remove unused	2024-06-11 10:23:20 +08:00
Olivier Chafik	b61eb9644d	json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)	2024-06-11 02:22:57 +01:00
Olivier Chafik	396b18dfec	`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme	2024-06-11 01:00:30 +01:00
Jared Van Bortel	864a99e7a0	cmake : fix CMake requirement for CUDA (#7821)	2024-06-10 18:32:10 -04:00
slaren	fd5ea0f897	ci : try win-2019 on server windows test (#7854)	2024-06-10 15:18:41 +03:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846)	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847)	2024-06-10 14:59:55 +03:00
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00
Ben Ashbaugh	af4ae502dd	use the correct SYCL context for host USM allocations (#7777) Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>	2024-06-10 10:21:31 +01:00
Eddie-Wang	c0fd4df883	fix merge	2024-06-10 03:07:38 +00:00
Eddie-Wang	841c903ff9	Merge branch 'ggerganov:master' into bitnet	2024-06-10 10:51:47 +08:00
Eddie-Wang	abd798d70f	fix code	2024-06-10 02:50:14 +00:00
Georgi Gerganov	10ceba354a	flake.lock: Update (#7838) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-09 16:04:50 -07:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833)	2024-06-09 20:19:35 +03:00
Eddie-Wang1120	65ac3a3627	fix	2024-06-10 00:06:09 +08:00
Eddie-Wang1120	344467f2b8	fix code	2024-06-10 00:00:52 +08:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
Eddie-Wang1120	97d22be58c	fix codestyle	2024-06-09 21:22:50 +08:00
root	3a0f8b0697	clean code 2	2024-06-09 21:15:02 +08:00
root	1c5a8b7fec	clean code	2024-06-09 20:22:03 +08:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830)	2024-06-09 20:50:35 +10:00
root	dbee0a86c1	move i2 to quantize	2024-06-09 18:20:32 +08:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824)	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously-supported upstream models. But according to report by @teleprint-me there is still some persistent problem, but shall do in the meantime.	2024-06-09 12:47:25 +10:00
Eddie-Wang	ca09085593	move i2s to quantize v1	2024-06-09 02:43:38 +00:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) Main changes of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition use_temp_file is now opt-in instead of opt-out defaulting to False. Also GGUFWriter now does not require output file name until when actually writing to it. And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) This reverts commit `9422c5e34b`.	2024-06-09 01:43:39 +02:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
Eddie-Wang	4e1ab50628	finish bitnet i2 e2e	2024-06-08 12:44:13 +00:00

3164 commits