llama.cpp

Author	SHA1	Message	Date
ngxson	2f055584cf	Merge branch 'master' into xsn/control-vector-generator	2024-06-13 14:33:45 +02:00
ngxson	ca86d4fd33	escape prompt by default	2024-06-13 13:29:58 +02:00
ngxson	25fb0a6e61	beautify help msg	2024-06-13 13:29:46 +02:00
Galunid	a55eb1bf0f	readme : Remove outdated instructions from README.md (#7914) [no ci]	2024-06-13 09:42:41 +02:00
slaren	f578b86b21	move BLAS to a separate backend (#6210) * move BLAS to a separate backend * rename GGML_USE_OPENBLAS to GGML_USE_BLAS * alloc : reuse same buffer when the same buffer type if used multiple times * set number of threads automatically for openblas and blis * sched : print assignments when GGML_SCHED_DEBUG env variable is set * sched : allow ops with weights on an incompatible buffer type This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-13 03:11:35 +02:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Johannes Gäßler	963552903f	CUDA: fix broken oob check for FA vec f32 kernel (#7904)	2024-06-12 17:41:51 +02:00
ngxson	334dbaed3f	shorten help msg	2024-06-12 17:13:19 +02:00
ngxson	c59bfa6368	add print_usage	2024-06-12 17:12:02 +02:00
ngxson	b22c8459ff	clean up a bit	2024-06-12 16:08:27 +02:00
ngxson	a2a5f1bfbd	better error handling	2024-06-12 16:01:00 +02:00
ngxson	679f5137f8	move param parser to common	2024-06-12 15:58:20 +02:00
Georgi Gerganov	a9cae48003	tests : add non-cont unary tests (#7857) * tests : add non-cont unary tests * ggml : update unary asserts and "supports_op" ggml-ci	2024-06-12 16:00:22 +03:00
Georgi Gerganov	bfaa676b08	ggml : improve ggml_is_contiguous logic (#7856) * ggml : improve ggml_is_contiguous logic ggml-ci * ggml : support more contiguous cases ggml-ci	2024-06-12 15:24:20 +03:00
Georgi Gerganov	704a35b183	server : restore numeric prompts (#7883)	2024-06-12 14:42:29 +03:00
ngxson	f54cb8e307	reuse allocr	2024-06-12 12:53:17 +02:00
ngxson	8ee0c96688	fix compile warn	2024-06-12 12:50:29 +02:00
ngxson	e683b9af60	attemp to fix compile problem on mac	2024-06-12 12:49:01 +02:00
ngxson	7297817d13	use ggml_backend_tensor_copy	2024-06-12 11:41:37 +02:00
Meng, Hengyu	dcf752707d	update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) In addition this reverts a workaround we had to do to workaround the upstream issue with expired intel GPG package keys in 2024.0.1-devel-ubuntu22.04	2024-06-12 19:05:35 +10:00
Patrice Ferlet	f2b5764beb	Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support	2024-06-12 11:18:16 +10:00
ngxson	e9cb3b336d	fix .editorconfig	2024-06-11 22:09:14 +02:00
k.h.lai	73bac2b11d	vulkan: select only one device for single gpu with multiple drivers (#7582)	2024-06-11 21:26:05 +02:00
0cc4m	ef52d1d16a	Update Vulkan RoPE implementation (#7818) * Update Vulkan RoPE implementation * Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception Minor fixes * Fix segfault when running out of VRAM Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-11 21:20:29 +02:00
ngxson	5ffba9ecc3	add readme	2024-06-11 19:35:17 +02:00
ngxson	04c91d29ff	use ggml_format_name	2024-06-11 19:14:04 +02:00
ngxson	54f77e2467	add to makefile all targets	2024-06-11 19:03:13 +02:00
ngxson	85db22dd20	Merge branch 'master' into xsn/control-vector-generator	2024-06-11 19:00:19 +02:00
Deven Mistry	14f83526cd	fix broken link in pr template (#7880) [no ci] * fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-12 02:18:58 +10:00
Brian	6fe42d073f	github: move PR template to .github/ root (#7868)	2024-06-11 17:43:41 +03:00
ngxson	da6babdf0a	fix macos build	2024-06-11 15:47:35 +02:00
ngxson	3223133cf5	default n_pca_batch to 20	2024-06-11 15:05:06 +02:00
Johannes Gäßler	148995e5e5	llama-bench: more compact markdown tables (#7879)	2024-06-11 14:45:40 +02:00
ngxson	d41c719980	bring back n_completions	2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng	446da906d9	fix n_completions	2024-06-11 08:22:38 -04:00
ngxson	163916864c	remember to copy back the last_eigenvector	2024-06-11 12:40:07 +02:00
ngxson	1a088fb0a5	working version	2024-06-11 12:37:05 +02:00
ngxson	9e39571fc2	add n_batch for pca	2024-06-11 11:45:16 +02:00
Georgi Gerganov	4bfe50f741	tests : check the Python version (#7872) ggml-ci	2024-06-11 10:10:20 +03:00
Johannes Gäßler	bdcb8f4222	CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)	2024-06-11 08:26:07 +02:00
slaren	c2ce6c47e4	fix CUDA CI by using a windows-2019 image (#7861) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
Olivier Chafik	b61eb9644d	json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)	2024-06-11 02:22:57 +01:00
Olivier Chafik	396b18dfec	`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme	2024-06-11 01:00:30 +01:00
ngxson	6a5adf3d7c	fix shape of v_diff_original	2024-06-11 01:33:16 +02:00
ngxson	c241b500a1	clean up PCA ggml implementation	2024-06-11 01:13:10 +02:00
Jared Van Bortel	864a99e7a0	cmake : fix CMake requirement for CUDA (#7821)	2024-06-10 18:32:10 -04:00
slaren	fd5ea0f897	ci : try win-2019 on server windows test (#7854)	2024-06-10 15:18:41 +03:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846)	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847)	2024-06-10 14:59:55 +03:00
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00

3198 commits