Olivier Chafik
08da184147
add hot topic notice to README.md
2024-06-12 11:27:01 +01:00
Olivier Chafik
ceb2859eef
Merge remote-tracking branch 'origin/master' into bins
2024-06-12 10:43:17 +01:00
Olivier Chafik
be66f9e605
Revert "update llama-rpc-server bin name + doc"
...
This reverts commit e474ef1df4.
2024-06-12 10:40:49 +01:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 ( #7894 )
...
In addition, this reverts a workaround we had to apply for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 package to install for Vulkan ( #7794 ) [no ci]
...
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation ( #7818 )
...
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template ( #7880 ) [no ci]
...
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Olivier Chafik
e474ef1df4
update llama-rpc-server bin name + doc
2024-06-11 14:42:03 +01:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image ( #7861 )
...
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
...
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik
166397f1e4
update grammar/README.md w/ new llama-* names
2024-06-11 02:35:30 +01:00
ochafik
2a9c4cd7ba
Merge remote-tracking branch 'origin/master' into bins
2024-06-11 02:35:01 +01:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik
8cf8c129d4
Update apps.nix
2024-06-11 00:18:47 +01:00
HanClinto
1f5ec2c0b4
Updating two small main references missed earlier in the finetune docs.
2024-06-10 16:12:50 -07:00
Olivier Chafik
82df7f9f0e
Merge pull request #1 from HanClinto/bins-rename-nits
...
Nits found in binary renames
2024-06-10 23:58:12 +01:00
HanClinto
70de0debab
Updating documentation references for lookup-merge and export-lora
2024-06-10 15:32:21 -07:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
HanClinto
72660c357c
Updating run-with-preset.py to use new binary names.
...
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2
Updating a few lingering doc references for rename of main to llama-cli
2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2
Updating docs for eval-callback binary to use new llama- prefix.
2024-06-10 14:44:46 -07:00
ochafik
0be5f399c4
add two missing llama- prefixes
2024-06-10 22:00:28 +01:00
Olivier Chafik
f9cfd04bd4
address gbnf-validator unused fread warning (switched to C++ / ifstream)
2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4
rename: llama-cli-cmake-pkg(.exe)
2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee
fix test-eval-callback
2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812
more llama-cli(.exe)
2024-06-10 16:08:06 +01:00
Olivier Chafik
051633ed2d
update dockerfile refs
2024-06-10 16:05:11 +01:00
Olivier Chafik
1cc651446d
rename(make): llama-baby-llama
2024-06-10 16:03:18 +01:00
Olivier Chafik
0fcf2c328e
rename dockerfile w/ llama-cli
2024-06-10 15:44:49 +01:00
Olivier Chafik
0bb2a3f233
fix some missing -cli suffixes
2024-06-10 15:42:20 +01:00
Olivier Chafik
daeaeb1222
Merge remote-tracking branch 'origin/master' into bins
2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c
rename llama|main -> llama-cli; consistent RPM bin prefixes
2024-06-10 15:34:14 +01:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It explains to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
The `--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075 , to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, e.g. when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
This commit matches both the prefix and the suffix of the model part names, which should fix the problem without breaking any previously-supported upstream models. According to a report by @teleprint-me there is still some persistent problem, but this shall do in the meantime.
2024-06-09 12:47:25 +10:00