ngxson
2f055584cf
Merge branch 'master' into xsn/control-vector-generator
2024-06-13 14:33:45 +02:00
ngxson
ca86d4fd33
escape prompt by default
2024-06-13 13:29:58 +02:00
ngxson
25fb0a6e61
beautify help msg
2024-06-13 13:29:46 +02:00
Galunid
a55eb1bf0f
readme : Remove outdated instructions from README.md (#7914) [no ci]
2024-06-13 09:42:41 +02:00
slaren
f578b86b21
move BLAS to a separate backend (#6210)
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-13 03:11:35 +02:00
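For illustration, a minimal C sketch of the fallback described in the commit body above. ggml_backend_supports_op and ggml_backend_tensor_copy are real ggml-backend calls; the helper around them is hypothetical and not the actual ggml_backend_sched logic:

```c
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical helper: run the op on the backend holding the weight when it
// supports the op; otherwise fall back and pay for a cross-backend weight copy.
static ggml_backend_t pick_backend(ggml_backend_t weight_backend,
                                   ggml_backend_t fallback_backend,
                                   const struct ggml_tensor * op) {
    if (ggml_backend_supports_op(weight_backend, op)) {
        return weight_backend; // cheap path: the weight already lives here
    }
    // costly path noted in the commit: the weight must be moved to a backend
    // that can run the op, e.g. with the real API call
    //   ggml_backend_tensor_copy(weight_src, weight_dst);
    return fallback_backend;
}
```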
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Johannes Gäßler
963552903f
CUDA: fix broken oob check for FA vec f32 kernel (#7904)
2024-06-12 17:41:51 +02:00
ngxson
334dbaed3f
shorten help msg
2024-06-12 17:13:19 +02:00
ngxson
c59bfa6368
add print_usage
2024-06-12 17:12:02 +02:00
ngxson
b22c8459ff
clean up a bit
2024-06-12 16:08:27 +02:00
ngxson
a2a5f1bfbd
better error handling
2024-06-12 16:01:00 +02:00
ngxson
679f5137f8
move param parser to common
2024-06-12 15:58:20 +02:00
Georgi Gerganov
a9cae48003
tests : add non-cont unary tests (#7857)
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
2024-06-12 16:00:22 +03:00
Georgi Gerganov
bfaa676b08
ggml : improve ggml_is_contiguous logic (#7856)
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
2024-06-12 15:24:20 +03:00
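For context on what this logic checks: a ggml tensor is contiguous when its byte strides nb[] describe a tightly packed layout of its dimensions ne[]. A hedged sketch of that predicate for non-quantized types (the real ggml_is_contiguous also divides ne[0] by ggml_blck_size for quantized types):

```c
#include <stdbool.h>
#include "ggml.h"

// Illustrative only: packed-layout check for a non-quantized ggml tensor.
static bool is_contiguous_sketch(const struct ggml_tensor * t) {
    return t->nb[0] == ggml_type_size(t->type)          &&
           t->nb[1] == t->nb[0] * (size_t) t->ne[0]     &&
           t->nb[2] == t->nb[1] * (size_t) t->ne[1]     &&
           t->nb[3] == t->nb[2] * (size_t) t->ne[2];
}
```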
Georgi Gerganov
704a35b183
server : restore numeric prompts (#7883)
2024-06-12 14:42:29 +03:00
ngxson
f54cb8e307
reuse allocr
2024-06-12 12:53:17 +02:00
ngxson
8ee0c96688
fix compile warn
2024-06-12 12:50:29 +02:00
ngxson
e683b9af60
attempt to fix compile problem on mac
2024-06-12 12:49:01 +02:00
ngxson
7297817d13
use ggml_backend_tensor_copy
2024-06-12 11:41:37 +02:00
Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
In addition, this reverts a workaround we had to apply for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 package to install for Vulkan (#7794) [no ci]
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
ngxson
e9cb3b336d
fix .editorconfig
2024-06-11 22:09:14 +02:00
k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers (#7582)
2024-06-11 21:26:05 +02:00
0cc4m
ef52d1d16a
Update Vulkan RoPE implementation (#7818)
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
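A hedged sketch of the alloc_buffer contract mentioned above: failure is reported as a null pointer so the caller can recover from VRAM exhaustion instead of unwinding through an exception. ggml_backend_buft_alloc_buffer is the real API; the wrapper is illustrative:

```c
#include <stdio.h>
#include "ggml-backend.h"

// Illustrative: allocation failure surfaces as NULL, not an exception,
// so the caller can fall back (smaller batch, different backend, ...).
static ggml_backend_buffer_t try_alloc(ggml_backend_buffer_type_t buft, size_t size) {
    ggml_backend_buffer_t buf = ggml_backend_buft_alloc_buffer(buft, size);
    if (buf == NULL) {
        fprintf(stderr, "buffer allocation of %zu bytes failed\n", size);
    }
    return buf;
}
```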
ngxson
5ffba9ecc3
add readme
2024-06-11 19:35:17 +02:00
ngxson
04c91d29ff
use ggml_format_name
2024-06-11 19:14:04 +02:00
ngxson
54f77e2467
add to makefile all targets
2024-06-11 19:03:13 +02:00
ngxson
85db22dd20
Merge branch 'master' into xsn/control-vector-generator
2024-06-11 19:00:19 +02:00
Deven Mistry
14f83526cd
fix broken link in pr template (#7880) [no ci]
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
6fe42d073f
github: move PR template to .github/ root (#7868)
2024-06-11 17:43:41 +03:00
ngxson
da6babdf0a
fix macos build
2024-06-11 15:47:35 +02:00
ngxson
3223133cf5
default n_pca_batch to 20
2024-06-11 15:05:06 +02:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables (#7879)
2024-06-11 14:45:40 +02:00
ngxson
d41c719980
bring back n_completions
2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng
446da906d9
fix n_completions
2024-06-11 08:22:38 -04:00
ngxson
163916864c
remember to copy back the last_eigenvector
2024-06-11 12:40:07 +02:00
ngxson
1a088fb0a5
working version
2024-06-11 12:37:05 +02:00
ngxson
9e39571fc2
add n_batch for pca
2024-06-11 11:45:16 +02:00
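The PCA commits on this branch (n_pca_batch, last_eigenvector) revolve around extracting a dominant eigenvector. A hedged plain-C sketch of power iteration, the standard technique for that job; the names here are illustrative and not the branch's actual ggml-based implementation:

```c
#include <math.h>
#include <stdlib.h>

// Power iteration: repeatedly apply the matrix and renormalize; v converges
// to the eigenvector of A (n x n, row-major) with the largest eigenvalue.
// v must start nonzero (e.g. random init).
static void power_iteration(const float * A, float * v, int n, int iters) {
    float * tmp = malloc((size_t) n * sizeof(float));
    for (int it = 0; it < iters; ++it) {
        for (int i = 0; i < n; ++i) {          // tmp = A * v
            float acc = 0.0f;
            for (int j = 0; j < n; ++j) {
                acc += A[(size_t) i * n + j] * v[j];
            }
            tmp[i] = acc;
        }
        float norm = 0.0f;                     // renormalize ...
        for (int i = 0; i < n; ++i) norm += tmp[i] * tmp[i];
        norm = sqrtf(norm) + 1e-12f;           // guard against a zero vector
        for (int i = 0; i < n; ++i) {          // ... and copy back into v
            v[i] = tmp[i] / norm;              // cf. "copy back the last_eigenvector"
        }
    }
    free(tmp);
}
```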
Georgi Gerganov
4bfe50f741
tests : check the Python version (#7872)
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
2024-06-11 08:26:07 +02:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ngxson
6a5adf3d7c
fix shape of v_diff_original
2024-06-11 01:33:16 +02:00
ngxson
c241b500a1
clean up PCA ggml implementation
2024-06-11 01:13:10 +02:00
Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA (#7821)
2024-06-10 18:32:10 -04:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test (#7854)
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants (#7846)
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847)
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
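On the __builtin_assume -> GGML_CUDA_ASSUME change above: wrapping the builtin in a macro lets it compile away where it is unavailable. A hedged sketch of such a wrapper; the actual guard condition in ggml's CUDA headers may differ:

```c
// Illustrative portability wrapper; the guard condition is an assumption.
#if defined(__CUDACC__) && CUDART_VERSION >= 11010
#define GGML_CUDA_ASSUME(x) __builtin_assume(x)
#else
#define GGML_CUDA_ASSUME(x) // no-op where the builtin is unsupported
#endif
```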