Olivier Chafik
73d4a4ae03
Merge branch 'bins' of https://github.com/ochafik/llama.cpp into bins
2024-06-13 00:31:22 +01:00
Olivier Chafik
48e5009e64
rename gguf-split & quantize bins refs in **/tests.sh
2024-06-13 00:31:04 +01:00
Olivier Chafik
19102415ea
Update README.md
2024-06-12 11:32:55 +01:00
Olivier Chafik
ecdde745ba
Update README.md
2024-06-12 11:29:31 +01:00
Olivier Chafik
08da184147
add hot topic notice to README.md
2024-06-12 11:27:01 +01:00
Olivier Chafik
ceb2859eef
Merge remote-tracking branch 'origin/master' into bins
2024-06-12 10:43:17 +01:00
Olivier Chafik
be66f9e605
Revert "update llama-rpc-server bin name + doc"
...
This reverts commit e474ef1df4.
2024-06-12 10:40:49 +01:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 ( #7894 )
...
In addition, this reverts a workaround we had to apply for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 package to install for Vulkan ( #7794 ) [no ci]
...
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation ( #7818 )
...
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template ( #7880 ) [no ci]
...
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Olivier Chafik
e474ef1df4
update llama-rpc-server bin name + doc
2024-06-11 14:42:03 +01:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image ( #7861 )
...
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
...
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik
166397f1e4
update grammar/README.md w/ new llama-* names
2024-06-11 02:35:30 +01:00
ochafik
2a9c4cd7ba
Merge remote-tracking branch 'origin/master' into bins
2024-06-11 02:35:01 +01:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik
8cf8c129d4
Update apps.nix
2024-06-11 00:18:47 +01:00
HanClinto
1f5ec2c0b4
Updating two small `main` references missed earlier in the finetune docs.
2024-06-10 16:12:50 -07:00
Olivier Chafik
82df7f9f0e
Merge pull request #1 from HanClinto/bins-rename-nits
...
Nits found in binary renames
2024-06-10 23:58:12 +01:00
HanClinto
70de0debab
Updating documentation references for lookup-merge and export-lora
2024-06-10 15:32:21 -07:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
HanClinto
72660c357c
Updating `run-with-preset.py` to use new binary names.
...
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2
Updating a few lingering doc references for rename of main to llama-cli
2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2
Updating docs for eval-callback binary to use new `llama-` prefix.
2024-06-10 14:44:46 -07:00
ochafik
0be5f399c4
add two missing llama- prefixes
2024-06-10 22:00:28 +01:00
Olivier Chafik
f9cfd04bd4
address gbnf-validator unused fread warning (switched to C++ / ifstream)
2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4
rename: llama-cli-cmake-pkg(.exe)
2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee
fix test-eval-callback
2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812
more llama-cli(.exe)
2024-06-10 16:08:06 +01:00
Olivier Chafik
051633ed2d
update dockerfile refs
2024-06-10 16:05:11 +01:00
Olivier Chafik
1cc651446d
rename(make): llama-baby-llama
2024-06-10 16:03:18 +01:00
Olivier Chafik
0fcf2c328e
rename dockerfile w/ llama-cli
2024-06-10 15:44:49 +01:00
Olivier Chafik
0bb2a3f233
fix some missing -cli suffixes
2024-06-10 15:42:20 +01:00
Olivier Chafik
daeaeb1222
Merge remote-tracking branch 'origin/master' into bins
2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c
rename llama|main -> llama-cli; consistent RPM bin prefixes
2024-06-10 15:34:14 +01:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00