crasm
e86b8cd93a
Remove shellcheck installation step from workflow
2023-12-21 04:29:05 -05:00
crasm
c9a6de8f8a
Add check-requirements.sh script and GitHub workflow
2023-12-21 04:16:41 -05:00
crasm
b853df4207
Add convert-persimmon-to-gguf.py to new requirements.txt scheme
2023-12-20 03:32:22 -05:00
crasm
ba46057b11
Merge remote-tracking branch 'upstream/master' into cancel-model-load
2023-12-20 00:15:09 -05:00
crasm
ca122dc9e0
Add comment
2023-12-20 00:14:56 -05:00
crasm
a0eab1ea19
Make per-python-script requirements work alone
...
This doesn't break the main requirements.txt.
2023-12-20 00:10:31 -05:00
crasm
267cfa408b
Merge commit 'c50e400163' into cancel-model-load
2023-12-20 00:04:20 -05:00
crasm
293d16fd40
Restructure requirements.txt
...
Top-level now imports the specific additional requirements for each
python file. Using `pip install -r requirements.txt` will fail if
versions become mismatched in the per-file requirements.
2023-12-20 00:00:08 -05:00
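The scheme described above relies on pip's nested `-r` includes: the top-level requirements.txt pulls in one requirements file per Python script, and each per-script file can also be installed on its own. A minimal sketch of the layout, with illustrative file and package names (the exact names here are assumptions):

```
# requirements.txt (top level) — one include per Python script
-r ./requirements/requirements-convert.txt
-r ./requirements/requirements-convert-persimmon-to-gguf.txt

# requirements/requirements-convert-persimmon-to-gguf.txt — also installable alone
-r ./requirements-convert.txt
torch  # illustrative extra dependency for this script
```

If two per-script files pin incompatible versions of the same package, installing the top-level file surfaces the conflict immediately, which is the failure mode the commit message refers to.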
crasm
9a056ed708
Remove venv before creation
2023-12-19 20:56:22 -05:00
crasm
9809314bbf
Disable test-model-load-cancel in make
2023-12-19 17:46:36 -05:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER (#4535)
...
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
crasm
1e79625910
update requirements.txt
2023-12-19 02:42:07 -05:00
crasm
121b04d121
ci : restrict .github/workflows/build.yml ctest to -L main
2023-12-19 02:20:01 -05:00
crasm
f80ff4dc6a
ci : get ci/run.sh working with test-model-load-cancel
2023-12-19 02:18:50 -05:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build (#4528)
...
regression of #4490
Adds defines for two new datatypes: cublasComputeType_t and cudaDataType_t.
Currently using the deprecated hipblasDatatype_t, since the newer ones are very recent.
2023-12-18 22:33:45 +01:00
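A minimal sketch of the compatibility shim this fix describes, assuming it sits in the HIP-specific define block of ggml-cuda.cu: the two CUDA type names introduced by #4490 are aliased to the deprecated HIP type until the newer compute types are broadly available.

```cpp
// Map the CUDA datatype names onto a HIP equivalent. hipblasDatatype_t is
// deprecated, but its replacements are too recent to require, so both CUDA
// names alias it for now.
#if defined(GGML_USE_HIPBLAS)
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```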
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16
2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models
2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
...
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
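The ggml_mul_mat_set_prec call added in this PR lets a single matmul node opt into F32 accumulation, which is what the "scale Q instead of KQ" precision work builds on. A minimal usage sketch, with ctx, k, and q as placeholder tensors supplied by the caller:

```cpp
#include "ggml.h"

// Build a K*Q matmul node and request F32 precision for it: for phi-2,
// F16 accumulation of this op loses too much precision.
static struct ggml_tensor * mul_mat_f32(struct ggml_context * ctx,
                                        struct ggml_tensor * k,
                                        struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);
    return kq;
}
```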
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always returns true (#4519)
2023-12-18 15:14:58 +02:00
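A sketch of the bug pattern behind this fix, with llama.cpp's override machinery reduced to stand-in types (the names and surrounding code here are simplified assumptions; only the return-value logic reflects the report):

```cpp
#include <cstdio>

// Stand-ins for the KV-override machinery (illustrative only).
enum kv_type { KV_OVERRIDE_BOOL };
struct kv_override { kv_type tag; bool bool_value; };

static bool validate_override(kv_type t, const kv_override * ov) {
    return ov != nullptr && ov->tag == t;
}

// Before the fix, the no-override path also returned true, so every bool
// key appeared to have been overridden; returning false fixes that.
static bool try_override(bool & target, const kv_override * ov) {
    if (validate_override(KV_OVERRIDE_BOOL, ov)) {
        target = ov->bool_value;
        return true;
    }
    return false; // was unconditionally `return true`
}

int main() {
    bool flag = false;
    std::printf("overridden: %d\n", try_override(flag, nullptr)); // prints 0
}
```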
crasm
aed3cf838c
Attempt at writing ctest_with_model
2023-12-18 04:45:39 -05:00
crasm
4b63355f45
ci : ctest uses -L main
2023-12-18 04:23:58 -05:00
crasm
fd9d247dd2
Label all ctest tests
2023-12-18 04:23:20 -05:00
crasm
6bba3410fa
Simplify .gitignore for tests, clang-tidy fixes
2023-12-17 22:33:38 -05:00
crasm
fe6a6fb6d1
Revert "Revert "Fail test if model file is missing""
...
This reverts commit 2796953257.
2023-12-17 22:24:17 -05:00
crasm
068e7c408f
Add test-model-load-cancel to Makefile
2023-12-17 22:22:42 -05:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API (#4516)
2023-12-17 19:39:02 -05:00
crasm
2796953257
Revert "Fail test if model file is missing"
...
This reverts commit 32ebd525bf.
2023-12-17 14:37:01 -05:00
crasm
cb8a4be5d0
Merge branch 'cancel-model-load' of github.com:crasm/llama.cpp into cancel-model-load
2023-12-17 14:31:49 -05:00
crasm
32ebd525bf
Fail test if model file is missing
2023-12-17 14:31:03 -05:00
Georgi Gerganov
1160de38f6
Update llama.cpp
...
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-12-17 21:25:19 +02:00
Georgi Gerganov
b1306c4394
readme : update hot topics
2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality (#4483)
...
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs (#4489)
2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location (#4485)
...
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done (#4486)
2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off (#3792)
2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored (#4494)
...
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset (#4501)
2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K (#4500)
2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506)
2023-12-17 11:57:33 +01:00
slaren
c6c4fc081c
lora : add support for non-llama models (#3333)
...
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
2023-12-16 18:58:46 +01:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits (#4274)
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example (#4441)
...
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) (#4480)
...
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
2023-12-15 12:45:50 +01:00
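The grouping described in the last bullet can be pictured as a one-pass bucketing step: record, for each matrix, the indices of all rows routed to it, so every matrix then processes its rows as one contiguous batch. A sketch under those assumptions (in ggml the groups live in wdata and are built once during GGML_TASK_INIT):

```cpp
#include <cstdint>
#include <vector>

// Bucket row indices by the matrix id selected for each row. Assumes every
// id in row_matrix_id is in [0, n_matrices).
static std::vector<std::vector<int64_t>> group_rows_by_matrix(
        const std::vector<int> & row_matrix_id, int n_matrices) {
    std::vector<std::vector<int64_t>> groups(n_matrices);
    for (int64_t row = 0; row < (int64_t) row_matrix_id.size(); ++row) {
        groups[row_matrix_id[row]].push_back(row);
    }
    return groups;
}
```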
crasm
4b1f70cb03
Fix bool return in llama_model_load, remove std::ignore use
2023-12-14 16:29:05 -05:00
slaren
6744dbe924
ggml : use ggml_row_size where possible (#4472)
...
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor (#4469)
...
ggml-ci
2023-12-14 16:52:08 +01:00
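With the field removed, the dimension count is derived on demand from the ne[] extents. A minimal sketch of that derivation, mirroring what a helper like ggml's ggml_n_dims can do:

```cpp
#include "ggml.h"

// Report the highest axis with extent > 1; scalars and 1-D tensors both
// count as one dimension.
static int tensor_n_dims(const struct ggml_tensor * t) {
    for (int i = GGML_MAX_DIMS - 1; i >= 1; --i) {
        if (t->ne[i] > 1) {
            return i + 1;
        }
    }
    return 1;
}
```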
wonjun Jang
c50e400163
py : add protobuf dependency (#4466)
2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) (#4461)
...
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
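The fix replaces a float bytes-per-element computation (the deprecated ggml_type_sizef) with integer arithmetic: for quantized types the float path rounds, and multiplying the rounded value by large element counts under- or over-estimates buffer sizes. A minimal sketch of the new pattern, with n_embd as a placeholder element count:

```cpp
#include "ggml.h"

// Exact integer byte count for one row of n_embd elements.
// Old, imprecise pattern: (size_t) (ggml_type_sizef(type) * n_embd)
static size_t row_bytes(enum ggml_type type, int64_t n_embd) {
    return ggml_row_size(type, n_embd);
}
```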
crasm
3425e62745
llama : Add test for model load cancellation
2023-12-14 04:47:54 -05:00