Commit graph

3187 commits

Author SHA1 Message Date
Olivier Chafik
48e5009e64 rename gguf-split & quantize bins refs in **/tests.sh 2024-06-13 00:31:04 +01:00
Olivier Chafik
08da184147 add hot topic notice to README.md 2024-06-12 11:27:01 +01:00
Olivier Chafik
ceb2859eef Merge remote-tracking branch 'origin/master' into bins 2024-06-12 10:43:17 +01:00
Olivier Chafik
be66f9e605 Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
2024-06-12 10:40:49 +01:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
In addition, this reverts a workaround we had to apply to work around the upstream issue with expired intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci]
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers (#7582) 2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation (#7818)
* Update Vulkan RoPE implementation

* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception

Minor fixes

* Fix segfault when running out of VRAM

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template (#7880) [no ci]
* fix broken link in pr template

* Update pull_request_template.md [no ci]

---------

Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root (#7868) 2024-06-11 17:43:41 +03:00
Olivier Chafik
e474ef1df4 update llama-rpc-server bin name + doc 2024-06-11 14:42:03 +01:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables (#7879) 2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version (#7872)
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) 2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler

* trigger when build.yml changes

* another test

* try exllama/bdashore3 method

* install vs build tools before cuda toolkit

* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik
166397f1e4 update grammar/README.md w/ new llama-* names 2024-06-11 02:35:30 +01:00
ochafik
2a9c4cd7ba Merge remote-tracking branch 'origin/master' into bins 2024-06-11 02:35:01 +01:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik
8cf8c129d4 Update apps.nix 2024-06-11 00:18:47 +01:00
HanClinto
1f5ec2c0b4 Updating two small main references missed earlier in the finetune docs. 2024-06-10 16:12:50 -07:00
Olivier Chafik
82df7f9f0e
Merge pull request #1 from HanClinto/bins-rename-nits
Nits found in binary renames
2024-06-10 23:58:12 +01:00
HanClinto
70de0debab Updating documentation references for lookup-merge and export-lora 2024-06-10 15:32:21 -07:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA (#7821) 2024-06-10 18:32:10 -04:00
HanClinto
72660c357c Updating run-with-preset.py to use new binary names.
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2 Updating a few lingering doc references for rename of main to llama-cli 2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2 Updating docs for eval-callback binary to use new llama- prefix. 2024-06-10 14:44:46 -07:00
ochafik
0be5f399c4 add two missing llama- prefixes 2024-06-10 22:00:28 +01:00
Olivier Chafik
f9cfd04bd4 address gbnf-validator unused fread warning (switched to C++ / ifstream) 2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4 rename: llama-cli-cmake-pkg(.exe) 2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee fix test-eval-callback 2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812 more llama-cli(.exe) 2024-06-10 16:08:06 +01:00
Olivier Chafik
051633ed2d update dockerfile refs 2024-06-10 16:05:11 +01:00
Olivier Chafik
1cc651446d rename(make): llama-baby-llama 2024-06-10 16:03:18 +01:00
Olivier Chafik
0fcf2c328e rename dockerfile w/ llama-cli 2024-06-10 15:44:49 +01:00
Olivier Chafik
0bb2a3f233 fix some missing -cli suffixes 2024-06-10 15:42:20 +01:00
Olivier Chafik
daeaeb1222 Merge remote-tracking branch 'origin/master' into bins 2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c rename llama|main -> llama-cli; consistent RPM bin prefixes 2024-06-10 15:34:14 +01:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test (#7854) 2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations (#7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update (#7838)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
  → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions.

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00