Eddie-Wang1120
fcf2da4621
add dequantize
2024-06-19 21:48:04 +08:00
Eddie-Wang1120
89c7e4c1dd
remove block scale
2024-06-18 23:33:58 +08:00
Eddie-Wang1120
4edc958fec
fix code
2024-06-18 22:16:16 +08:00
Eddie-Wang1120
a03eff318c
i2s->q22
2024-06-17 20:33:09 +08:00
Eddie-Wang
569a03ed97
finish i2_s/i8_s vec_dot x86 simd
2024-06-15 14:01:26 +00:00
Eddie-Wang1120
95dced07e4
i2_s to absmax
2024-06-15 10:10:40 +08:00
Eddie-Wang1120
7a8961fff5
delete redundant
2024-06-14 12:30:27 +08:00
Eddie-Wang1120
5e5eee7b44
fix whitespace
2024-06-12 16:25:46 +08:00
Eddie-Wang1120
f395dd9ca0
change table name
2024-06-12 14:28:24 +08:00
Eddie-Wang
c0cd08d45e
Merge branch 'ggerganov:master' into bitnet
2024-06-12 14:12:27 +08:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 package to install for Vulkan ( #7794 ) [no ci]
...
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation ( #7818 )
...
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template ( #7880 ) [no ci]
...
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image ( #7861 )
...
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Eddie-Wang
2322e9db9a
Merge branch 'ggerganov:master' into bitnet
2024-06-11 10:50:12 +08:00
Eddie-Wang1120
de1d5073e4
remove unused
2024-06-11 10:23:20 +08:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Eddie-Wang
c0fd4df883
fix merge
2024-06-10 03:07:38 +00:00
Eddie-Wang
841c903ff9
Merge branch 'ggerganov:master' into bitnet
2024-06-10 10:51:47 +08:00
Eddie-Wang
abd798d70f
fix code
2024-06-10 02:50:14 +00:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Eddie-Wang1120
65ac3a3627
fix
2024-06-10 00:06:09 +08:00
Eddie-Wang1120
344467f2b8
fix code
2024-06-10 00:00:52 +08:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
Eddie-Wang1120
97d22be58c
fix codestyle
2024-06-09 21:22:50 +08:00
root
3a0f8b0697
clean code 2
2024-06-09 21:15:02 +08:00
root
1c5a8b7fec
clean code
2024-06-09 20:22:03 +08:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
root
dbee0a86c1
move i2 to quantize
2024-06-09 18:20:32 +08:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075 , to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me some problem still persists, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
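The prefix-and-suffix matching described in the convert-hf commit above ( #7687 ) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function name `match_part_names`, its default arguments, and the `config.json` sample entry are hypothetical; only the `.safetensors` file names come from the commit message.

```python
def match_part_names(filenames, prefix="model", suffix=".safetensors"):
    """Return the sorted names that start with `prefix` AND end with `suffix`.

    Hypothetical helper for illustration; not the convert-hf implementation.
    """
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


files = [
    "consolidated.safetensors",          # extra file that must be ignored
    "model-00001-of-00001.safetensors",  # single-part model
    "config.json",                       # assumed unrelated file
]
parts = match_part_names(files)
```

Matching on the suffix alone would also pick up `consolidated.safetensors`; requiring the prefix as well excludes it while still accepting a plain `model.safetensors`.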
Eddie-Wang
ca09085593
move i2s to quantize v1
2024-06-09 02:43:38 +00:00
compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter ( #7827 )
...
The main change in this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.
In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.
Also, GGUFWriter no longer requires the output file name until it actually writes to the file.
And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
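The idea in the gguf-py commit above ( #7827 ), collecting metadata first and supplying the output path only at write time, can be sketched with a toy class. Everything here is an assumption made for illustration: `SketchWriter` is hypothetical, it is not the GGUFWriter API, and it writes JSON rather than the GGUF format.

```python
import json
import os
import tempfile


class SketchWriter:
    """Hypothetical writer: metadata is collected first, the path is needed only on write."""

    def __init__(self):
        self.kv = {}  # metadata collected so far; nothing is written yet

    def add_key_value(self, key, value):
        # A single call records both the key and its value.
        if key in self.kv:
            raise ValueError(f"duplicate key: {key}")
        self.kv[key] = value

    def write_to_file(self, path):
        # The output path is supplied only here, at write time.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.kv, f)


writer = SketchWriter()
writer.add_key_value("general.name", "example")  # illustrative key and value
out_path = os.path.join(tempfile.mkdtemp(), "meta.json")
writer.write_to_file(out_path)
```

Because no file is opened in the constructor, a caller can build up the metadata before deciding where, or whether, to write it.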
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend ( #7682 )" ( #7808 )
...
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location ( #7826 )
...
* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Eddie-Wang
4e1ab50628
finish bitnet i2 e2e
2024-06-08 12:44:13 +00:00