llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	f861ff916d	gitignore : server-parallel	2023-10-08 13:54:54 +03:00
Georgi Gerganov	2f7f634143	Merge branch 'master' into HEAD	2023-10-08 13:31:33 +03:00
Johannes Rudolph	a1202a31ed	k-quants : fix comments about block sizing (#3499)	2023-10-08 13:21:19 +03:00
Georgi Gerganov	94e502dfb7	ci : enable on obj-c changes + fix metal build (#3540)	2023-10-08 11:24:50 +03:00
Luo Tian	7d8b24932f	zig : fix build by introducing train.cpp (#3539)	2023-10-08 11:24:01 +03:00
Georgi Gerganov	b0ec5218c3	metal : support MTLGPUFamily < Apple7, formatting, style (#3524) * metal : improve decoding speed for batches of 2-16 * metal : rename kernels mul_mat_ to mul_mv_ * metal : indentations * minor * metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7	2023-10-08 10:01:53 +03:00
Kerfuffle	63d3b06a43	llama : fix missing break in Persimmon arch case statements (#3535)	2023-10-08 08:22:17 +03:00
Kerfuffle	a16e89cec8	Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)	2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz	4d03833211	gguf.py : fix CI for publishing GGUF package (#3532) * Fix CI for publishing GGUF package * Bump version * fix * bump version * bump version * bump version	2023-10-07 22:14:10 +03:00
Tom C	c47066d833	py : change version of numpy requirement to 1.24.4 (#3515) Co-authored-by: Lyjia <me@lyjia.us>	2023-10-07 12:56:15 +03:00
cebtenzzre	f1782c68de	quantize : fail fast on write errors (#3521)	2023-10-07 11:41:52 +03:00
Jhen-Jie Hong	c26765a0a1	metal : support default.metallib load & reuse code for swift package (#3522) * metal : support load default.metallib & reuse code for swift package * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT	2023-10-07 11:40:27 +03:00
Phillip Kravtsov	0e797c2fc5	llm : support Adept Persimmon 8B (#3410) * Produces garbage output * wip: correct tensors up to RoPE * correct tensors thru RoPE * Correct outputs through masked & softmax'd KQ * fp32 works * Rename adept->persimmon * Produces correct outputs * clean up convert scripts * remove printing logic from ggml.c * remove prints from llama.cpp & fix merge * trivial cleanups * Add offload funcs * update conversion script to directly take adept artifacts rather than .saftensors file * Fix norm eps bug * Support sqr and concat on metal, persimmon-8b-q4 runs correctly * Small changes from review * Formatting changes * Minor changes to conversion script * Remove old script * Fix editorconfig formatting * Fix build * add overlooked offload code ggml-ci	2023-10-07 10:12:43 +03:00
goerch	3a716b4dae	Fix for #3454 (#3455) Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion	2023-10-07 06:57:01 +02:00
FSSRepo	a8435c3e32	improved token gen logic and limits	2023-10-06 18:22:07 -04:00
BarfingLemurs	1faaae8c2b	readme : update models, cuda + ppl instructions (#3510)	2023-10-06 22:13:36 +03:00
Mihai	cb13d73a72	server : docs fix default values and add n_probs (#3506)	2023-10-06 21:39:33 +03:00
FSSRepo	c1ac53fbdb	improve README + more questions	2023-10-06 14:18:03 -04:00
Kerfuffle	9ca79d5cbb	kv cache slot search improvements (#3493) * kv cache slot search improvements * Use n_ctx in kv find slot for consistency * Ensure kv cache head points to a valid slot in llama_decode internal * Add some comments to prevent dumb people (like me) from getting confused.	2023-10-06 10:10:13 -06:00
FSSRepo	2fdc181dcb	example added to makefile	2023-10-06 11:46:51 -04:00
FSSRepo	6a5d6733fc	log sys - build info + rnd seed	2023-10-06 11:25:58 -04:00
FSSRepo	f0c646f023	fix makefile server build	2023-10-06 10:31:14 -04:00
FSSRepo	cdceda30c9	added cors middleware	2023-10-06 10:02:37 -04:00
FSSRepo	c71d933d5b	ci: wrong indent style fixed	2023-10-06 09:53:36 -04:00
FSSRepo	c12e18f2f1	httplib.h json.hpp -> common lib	2023-10-06 09:40:08 -04:00
Georgi Gerganov	0c731ca403	prompts : fix editorconfig checks after #3416	2023-10-06 16:36:32 +03:00
pudepiedj	a8777ad84e	parallel : add option to load external prompt file (#3416) * Enable external file and add datestamp * Add name of external file at end * Upload ToK2024 * Delete ToK2024.txt * Experiments with jeopardy * Move ParallelQuestions to /proimpts and rename * Interim commit * Interim commit * Final revision * Remove trailing whitespace * remove cmake_all.sh * Remove cmake_all.sh * Changed .gitignore * Improved reporting and new question files. * Corrected typo * More LLM questions * Update LLM-questions.txt * Yet more LLM-questions * Remove jeopardy results file * Reinstate original jeopardy.sh * Update examples/parallel/parallel.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-06 16:16:38 +03:00
Steward Garcia	bb093eb295	Merge pull request #4 from ggerganov/server-parallel server-parallel : add "--reverse-prompt" + compiler warning fixes	2023-10-06 08:54:35 -04:00
Jhen-Jie Hong	97af49fa39	server : reuse llama_sample_token common util (#3494) * server : reuse llama_sample_token common function * common : use n_probs for temperature sampling	2023-10-06 15:44:24 +03:00
Georgi Gerganov	5ab6c2132a	server-parallel : add "--reverse-prompt" + compiler warning fixes	2023-10-06 14:32:19 +03:00
l3utterfly	16820a5a0d	llama : correct hparams comparison (#3446) * fixed floating point comparison issues * updated implementation for hparam comparison to handle inf and NaN * fixed code review comments * minor simplification * rename is_float_eq -> is_float_close --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-10-06 13:47:59 +03:00
Jhen-Jie Hong	04b2f4386e	ci : fix xcodebuild destinations (#3491) * ci : fix xcodebuild destinations * ci : add .swift to paths	2023-10-06 13:36:43 +03:00
FSSRepo	afc09db51c	fix json format README	2023-10-05 15:23:58 -04:00
FSSRepo	eb75395b5c	remove trail whitespace	2023-10-05 15:18:47 -04:00
FSSRepo	a7a6ceb7ae	server handling multiple clients with cam	2023-10-05 15:12:39 -04:00
cebtenzzre	48edda30ee	convert : update Falcon script for new HF config (#3448) Also adds Falcon-180B support. Closes #3049 Co-authored-by: jb <jonathan.t.barnard@gmail.com>	2023-10-05 15:00:34 -04:00
Kenvix ⭐	45eba9369f	build : use std::make_tuple() for compatibility with older GCC versions (#3488)	2023-10-05 20:16:39 +03:00
staviq	acec9eaaa9	common : process escape sequences in reverse prompts (#3461)	2023-10-05 19:17:29 +03:00
shibe2	e2583cbc29	CLBlast: Fix handling of on-device tensor data Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.	2023-10-05 18:25:23 +04:00
Jhen-Jie Hong	e8b8d32e86	server : fix incorrect num_tokens_predicted (#3480)	2023-10-05 17:02:55 +03:00
Jhen-Jie Hong	8f3a642ec1	swift : disable ACCELERATE_NEW_LAPACK (#3481)	2023-10-05 17:00:07 +03:00
Jhen-Jie Hong	0745384449	ci : add swift build via xcodebuild (#3482)	2023-10-05 16:56:21 +03:00
Kerfuffle	019ba1dcd0	convert : fix Baichuan2 models by using vocab size in config.json (#3299) Use local GGUF package when possible in Baichuan converter	2023-10-04 17:20:28 +03:00
Georgi Gerganov	beabc8cfb0	readme : add project status link	2023-10-04 16:50:44 +03:00
Georgi Gerganov	0d152b37fe	ggml : fix build after #3329	2023-10-04 16:25:41 +03:00
ds5t5	f8c90cdbaa	llm : add Refact model (#3329) * add refact model * resolve comments * rebase to the latest * solve alibi cpu error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-04 16:23:39 +03:00
Georgi Gerganov	f93af02488	sync : ggml (conv 1d + 2d updates, UB fixes) (#3468) * sync : ggml (conv 1d + 2d updates) ggml-ci * ggml : fix UB in q5_0 and q5_1 quantize code ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml-ci * tests : fix UB in test-quantize-perf	2023-10-04 15:29:58 +03:00
Merrick Christensen	f72f8f22c9	finetune : readme fix typo (#3465) Fix small typo	2023-10-04 09:33:13 +03:00
Tameem	79f34abddb	ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453) * Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v. The RVV intrinsics is added for the following quantize row functions quantize_row_q8_0 quantize_row_q8_1 The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1 ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> * Added RVV intrinsics support for k_quants This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64 ggml_vec_dot_q2_K_q8_K ggml_vec_dot_q3_K_q8_K ggml_vec_dot_q4_K_q8_K ggml_vec_dot_q5_K_q8_K ggml_vec_dot_q6_K_q8_K Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>	2023-10-03 21:38:19 +03:00
h-h-h-h	8186242b6d	main : consistent prefix/suffix coloring (#3425) * Typo * No `--in-prefix` coloring The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.	2023-10-03 21:16:15 +03:00

1365 commits