jaime-m-p
b7ee8270ae
Replace char32_t with uint32_t
2024-06-17 20:19:03 +02:00
jaime-m-p
903e47f956
Fix merge: renamed and deleted files
2024-06-15 04:03:23 +02:00
jaime-m-p
e28d0e41ea
Merge branch 'master' into tokenizer-bpe-fixes
2024-06-15 03:32:42 +02:00
jaime-m-p
4af5478f60
Better unicode data generation
2024-06-14 23:50:14 +02:00
jaime-m-p
8cda5af9fe
Update brute force random test
2024-06-14 20:14:29 +02:00
jaime-m-p
0575023923
Skip missing byte tokens (falcon)
2024-06-14 20:12:39 +02:00
jaime-m-p
4ff15d4fda
Fix unicode whitespaces (deepseek-llm)
2024-06-14 20:00:15 +02:00
olexiyb
f8ec8877b7
ci : fix macos x86 build ( #7940 )
To keep the old `macos-latest` (x86) environment we should now pin to `macos-12`
Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975
2024-06-14 20:28:34 +03:00
Johannes Gäßler
76d66ee0be
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores ( #7921 )
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
* try CI fix
* try CI fix
* try CI fix
* fix data race
* revert q2_K precision-related changes
2024-06-14 18:41:49 +02:00
Georgi Gerganov
66ef1ceedf
metal : utilize max shared memory for mul_mat_id ( #7935 )
2024-06-14 17:14:09 +03:00
Radoslav Gerganov
e65bbf606c
llama-bench : fix RPC indication ( #7936 )
Show "<backend_name>+RPC" when RPC offloading is used
2024-06-14 16:47:41 +03:00
Sigbjørn Skjæret
6fcd1331ef
llama : more checks before assuming FIM tokens ( #7644 )
* More checks before assuming FIM tokens for Llama arch
* extensive token check
2024-06-14 13:20:04 +03:00
Elaine
41b9260f18
convert : add Poro-34B-chat tokenizer support ( #7713 )
* support for Poro chat pre-tokenizer
* add support for Poro pre-tokenizer
* Update convert-hf-to-gguf-update.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Change Poro-34B-chat to poro-chat
* Change Poro-34B-chat to poro-chat
* Update convert-hf-to-gguf-update.py
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-14 13:16:49 +03:00
jaime-m-p
07530a8dce
Fix unicode whitespaces (deepseek-coder)
2024-06-13 20:43:42 +02:00
jaime-m-p
974d40b513
Fix 'jina-v2' per token attributes
2024-06-13 20:40:56 +02:00
jaime-m-p
f58de3174e
update brute force random test
2024-06-13 20:39:55 +02:00
Radoslav Gerganov
172c825684
rpc : fix ggml_backend_rpc_supports_buft() ( #7918 )
2024-06-13 15:18:44 +03:00
Galunid
a55eb1bf0f
readme : Remove outdated instructions from README.md ( #7914 ) [no ci]
2024-06-13 09:42:41 +02:00
slaren
f578b86b21
move BLAS to a separate backend ( #6210 )
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse the same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-13 03:11:35 +02:00
Olivier Chafik
1c641e6aac
build : rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... ( #7809 )
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
jaime-m-p
75840fe6a6
Fix merge: 'smaug'
2024-06-12 23:01:10 +02:00
jaime-m-p
c863752ca7
Generalize 'jina-v2' per token attributes
2024-06-12 23:00:04 +02:00
Johannes Gäßler
963552903f
CUDA: fix broken oob check for FA vec f32 kernel ( #7904 )
2024-06-12 17:41:51 +02:00
Georgi Gerganov
a9cae48003
tests : add non-cont unary tests ( #7857 )
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
2024-06-12 16:00:22 +03:00
Georgi Gerganov
bfaa676b08
ggml : improve ggml_is_contiguous logic ( #7856 )
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
2024-06-12 15:24:20 +03:00
Georgi Gerganov
704a35b183
server : restore numeric prompts ( #7883 )
2024-06-12 14:42:29 +03:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 ( #7894 )
This also reverts a workaround we had applied for the upstream issue of expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 packages to install for Vulkan ( #7794 ) [no ci]
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
jaime-m-p
d67de1a364
Merge commit '148995e5' into tokenizer-bpe-fixes
2024-06-12 00:24:20 +02:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation ( #7818 )
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template ( #7880 ) [no ci]
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image ( #7861 )
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json : document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc-only changes and squash merges [no ci] ( #7700 )
This commit adds pull_request_template.md and CONTRIBUTING.md. It explains to contributors the need to rate PR complexity, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00