Commit graph

2124 commits

Author SHA1 Message Date
Jared Van Bortel
57cecad175 main : remove ggml-kompute.h #include 2024-01-26 16:37:33 -05:00
Jared Van Bortel
91324851a3 ci : initial attempt at testing Kompute backend 2024-01-26 16:36:31 -05:00
Jared Van Bortel
297fde5f58 editorconfig-checker : exclude .gitmodules 2024-01-26 15:48:35 -05:00
Jared Van Bortel
454baebacc op_mul_mat_mat_f32.comp : fix missing final newline 2024-01-26 15:44:13 -05:00
Jared Van Bortel
cdab4043b3 kompute : fix #includes 2024-01-26 15:10:54 -05:00
Jared Van Bortel
2ff2d16131 ggml-kompute.h : remove anything that doesn't need to be public
The remaining functions are either used by llama.cpp or GPT4All.
2024-01-26 15:08:46 -05:00
Jared Van Bortel
6af02b19d1 kompute : init device automatically and remove an unnecessary free 2024-01-26 14:45:52 -05:00
slaren
8ca33dec7d test-backend-ops : check all the ops in the test for support in the backends 2024-01-26 20:01:36 +01:00
Jared Van Bortel
2512799cfe test-backend-ops : comment out Llama and Falcon tests 2024-01-26 13:55:10 -05:00
Jared Van Bortel
aea84989f7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/nomic-vulkan 2024-01-26 13:46:49 -05:00
Jared Van Bortel
e6ce5f21a1 llama : revert unintended whitespace change 2024-01-26 13:10:49 -05:00
slaren
62fead3ea0 cuda : fix tensor size calculation for non-split buffer (#5145) 2024-01-26 18:59:43 +01:00
Jared Van Bortel
61a5cf88dc kompute : remove unnecessary use_mmap=false 2024-01-26 12:58:50 -05:00
slaren
15b4538ff2 ggml-alloc : add 10% margin to the buffer sizes (#5149) 2024-01-26 19:18:26 +02:00
snadampal
7032f4f634 ggml : update softmax n_task calculation (#5126)
updated the n_task calculation to use the maximum number of
threads possible. This improved prompt eval performance by
around 5% for DOT kernels and around 10% for MMLA kernels
on AWS Graviton3.
2024-01-26 19:17:59 +02:00
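The gist of an n_task change like this can be sketched as follows: when an op's task count equals the full thread count, every thread takes a slice of the rows instead of some threads sitting idle. This is an illustrative sketch of the partitioning idea only, not ggml's actual scheduling code, and `rows_for_thread` is a hypothetical helper name.

```python
def rows_for_thread(n_rows, n_threads, ith):
    """Illustrative sketch: with n_tasks == n_threads, thread ith
    processes the half-open row range [start, end)."""
    per_thread = (n_rows + n_threads - 1) // n_threads  # ceiling division
    start = ith * per_thread
    end = min(start + per_thread, n_rows)
    return start, end
```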
Georgi Gerganov
5f1925a8ce scripts : move run-with-preset.py from root to scripts folder 2024-01-26 17:09:44 +02:00
Georgi Gerganov
3b7c914de2 tests : gitignore test-c.o 2024-01-26 14:48:15 +02:00
Xuan Son Nguyen
48c857aa10 server : refactored the task processing logic (#5065)
* server: add llama_server_queue struct

* server: add llama_server_response_event

* server: add comments

* server: move all mutexes away from server.cpp

* server: correct multitask response

* server: only add back deferred tasks when one slot is available

* server: fix a race condition caused by "request_completion"
2024-01-26 14:42:20 +02:00
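The queue-plus-deferral pattern described in those bullets can be sketched as follows: tasks are posted and popped under a single lock, and deferred tasks are moved back onto the queue only when a slot frees up. This is a minimal illustration of the idea, not the actual `llama_server_queue` code in server.cpp; all names are hypothetical.

```python
import threading
from collections import deque

class ServerQueue:
    """Minimal sketch of a server task queue with deferral:
    one lock guards both queues, and deferred tasks are re-queued
    only when a processing slot becomes available."""
    def __init__(self):
        self._cv = threading.Condition()
        self._tasks = deque()
        self._deferred = deque()

    def post(self, task):
        with self._cv:
            self._tasks.append(task)
            self._cv.notify()

    def defer(self, task):
        # park a task that no slot can serve right now
        with self._cv:
            self._deferred.append(task)

    def on_slot_freed(self):
        # move deferred tasks back only once a slot is available
        with self._cv:
            while self._deferred:
                self._tasks.append(self._deferred.popleft())
            self._cv.notify()

    def pop(self):
        # block until a task is available, then return it
        with self._cv:
            while not self._tasks:
                self._cv.wait()
            return self._tasks.popleft()
```

Keeping every queue mutation behind one condition variable is what lets the mutexes move out of the request handlers, which is the race-condition fix the bullets above describe.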
crasm
413e7b0559 ci : add model tests + script wrapper (#4586)
* scripts : add lib.sh and lib_test.sh

* scripts : stub out new ci-run.sh script

* scripts : switch to PascalCase for functions

This looks a little odd at first, but I find it a very useful
convention for knowing whether a command is part of our code or a builtin.

* scripts : add some fancy conversion from snake_case to PascalCase

* Add venv to ci/run.sh

* Revert scripts work

* scripts : add wrapper script for local use of ci/run.sh

* Simplify .gitignore for tests, clang-tidy fixes

* Label all ctest tests

* ci : ctest uses -L main

* Attempt at writing ctest_with_model

* Update test-model-load-cancel

* ci : add ctest_with_model for debug and release

ggml-ci

* Fix gg_get_model function

ggml-ci

* got stuck on CMake

* Add get_model.cpp to tests/CMakeLists.txt

ggml-ci

* Fix README.md output for ctest_with_model

ggml-ci

* workflows : use `-L main` for all ctest

ggml-ci

* Fixes

* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show a warning rather than failing if the model file variable is not set

* scripts : update usage text for ci-run.sh
2024-01-26 14:18:00 +02:00
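The snake_case-to-PascalCase conversion mentioned above can be sketched in a couple of lines; this is a generic Python illustration of the transformation, not the actual shell implementation in the CI scripts, and `to_pascal_case` is a hypothetical name.

```python
def to_pascal_case(name):
    # "gg_get_model" -> "GgGetModel": capitalize each underscore-separated part
    return "".join(part.capitalize() for part in name.split("_") if part)
```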
Paul Tsochantaris
6dd3c28c9c metal : remove unused n_buffers and buffers (#5129) 2024-01-26 14:16:07 +02:00
Riceball LEE
38b431de23 gguf : fix "general.alignment" type in gguf_reader.py (#5136) 2024-01-26 11:10:28 +02:00
Georgi Gerganov
aad0b01d73 readme : update hot topics 2024-01-26 10:52:33 +02:00
Kawrakow
1182cf4d4f Another bucket sort (#5109)
* Initial bucket sort

* Bucket sort: slightly better version

* Bucket sort: another minor improvement

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-26 09:14:39 +02:00
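Bucket sort itself can be sketched in a few lines: values are binned by linear interpolation between the minimum and maximum, then each small bucket is sorted and the buckets are concatenated. This is a generic illustration of the technique, not the PR's actual implementation; `bucket_sort_desc` is a hypothetical name.

```python
def bucket_sort_desc(values, n_buckets=64):
    """Sketch of bucket sort for floats, descending order."""
    if len(values) <= 1:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all equal, nothing to sort
    buckets = [[] for _ in range(n_buckets)]
    scale = (n_buckets - 1) / (hi - lo)
    for v in values:
        buckets[int((v - lo) * scale)].append(v)
    out = []
    for b in reversed(buckets):  # largest bucket values first
        out.extend(sorted(b, reverse=True))
    return out
```

For near-uniformly distributed inputs each bucket stays small, so the total cost is close to linear, which is why this beats a full comparison sort for large candidate lists.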
Jared Van Bortel
91654ff042 kompute : fix a -Wstrict-aliasing warning 2024-01-25 17:03:06 -05:00
Jared Van Bortel
bc287047fb kompute : remove unused immintrin.h #include 2024-01-25 16:07:46 -05:00
Jared Van Bortel
3915194232 test-backend-ops : make Falcon test faster with a smaller model 2024-01-25 15:56:42 -05:00
Jared Van Bortel
3fbf0529ef kompute : mark last few failing ops as unsupported 2024-01-25 15:47:43 -05:00
Jared Van Bortel
445a3734b7 kompute : fix basic Q6_K get_rows, 26 -> 24 failures 2024-01-25 15:38:39 -05:00
Jared Van Bortel
de9fba0d39 kompute : fix basic f16 get_rows, 28 -> 26 failures 2024-01-25 15:22:26 -05:00
XiaotaoChen
fe54033b69 readme : add MobileVLM 1.7B/3B to the supported models list (#5107)
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25 22:14:32 +02:00
l3utterfly
5eaf9964fc llama : dynamic temperature sampling (#4972)
* implemented dynamic temperature sampling from koboldcpp

* removed trailing whitespace

* removed unused temp parameter in llama_sample_entropy

* exposed exponent_val in dynamic temp sampler

* added debug check for printf statements

* use nullptr in llama_sample_softmax call during llama_sample_entropy

this avoids counting the time taken stats twice

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* return earlier if there is only 1 candidate (i.e. max_entropy == 0)

* reformat 't' case in llama_sample_queue

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* check for one or zero candidates case in llama_sample_entropy

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-01-25 22:06:22 +02:00
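The entropy-based idea behind this PR can be sketched as follows: the temperature is interpolated between a minimum and a maximum according to the normalized entropy of the candidate distribution, raised to an exponent (the `exponent_val` the bullets mention). This is a sketch of the concept with hypothetical parameter names, not the actual `llama_sample_entropy` implementation; note the early return for one or zero candidates, mirroring the edge cases handled above.

```python
import math

def dynamic_temperature(probs, min_temp, max_temp, exponent_val=1.0):
    """Sketch of entropy-based dynamic temperature: a peaked
    distribution (low entropy) gets a low temperature, a flat one
    (high entropy) gets a high temperature."""
    # one or zero candidates: entropy is zero, nothing to scale
    if len(probs) <= 1:
        return min_temp
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    norm = entropy / max_entropy
    return min_temp + (max_temp - min_temp) * norm ** exponent_val
```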
Jared Van Bortel
11b305082b test-backend-ops : restore softmax tests 2024-01-25 15:05:55 -05:00
Jared Van Bortel
38d1f0c7a0 kompute : fix op_gelu -> Falcon is working on AMDVLK 2024-01-25 15:01:46 -05:00
Jared Van Bortel
6fc99a6e66 test-backend-ops : test larger GELU range 2024-01-25 15:01:46 -05:00
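For context on the op_gelu fixes above: GELU is commonly computed via a tanh approximation (the variant ggml is known to use, evaluated from a lookup table for f16). A plain-float sketch with the standard coefficients:

```python
import math

def gelu(x):
    # tanh approximation of GELU with the usual 0.044715 coefficient
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```

Testing a larger input range matters here because the approximation saturates: for large |x| the tanh term pins the output to x (positive side) or 0 (negative side), and backend bugs can hide in those tails.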
Jared Van Bortel
d292f4f204 examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
Jared Van Bortel
1849b85473 test-backend-ops : add Falcon test 2024-01-25 13:55:49 -05:00
Valentin Konovalov
256d1bb0dd android : use release cmake build type by default (#5123) 2024-01-25 19:05:51 +02:00
Jared Van Bortel
f5ac635473 kompute : fix q8_0 mmv, 41 -> 28 failures 2024-01-25 11:27:11 -05:00
Jared Van Bortel
987335ea0a kompute : fix algorithm names 2024-01-25 11:09:18 -05:00
Kawrakow
faa3526a1e Fix Q3_K_XS for MoE models (#5113)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25 17:58:53 +02:00
Georgi Gerganov
ddc5a5033f metal : show compile log messages 2024-01-25 11:26:17 +02:00
Jared Van Bortel
ec68a9657f test-backend-ops : increase max_nmse_err so Llama passes 2024-01-24 17:31:34 -05:00
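The threshold being raised here, max_nmse_err, is a normalized mean squared error between a backend's output and the reference output. A minimal sketch of such a metric (illustrative, not test-backend-ops' exact code):

```python
def nmse(reference, actual):
    # sum of squared differences, normalized by the reference's energy
    num = sum((r - a) ** 2 for r, a in zip(reference, actual))
    den = sum(r ** 2 for r in reference)
    return num / den
```

Normalizing by the reference magnitude makes one tolerance usable across tensors of very different scales, which is why a single per-op threshold can cover a full Llama eval.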
Engininja2
cd4fddb29f cuda : fix 2-bit quants on amd hip (#5105)
* cuda : fix 2-bit quants on amd hip

* use __low2float intrinsic function for new quants
2024-01-24 23:18:15 +01:00
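`__low2float` converts the low half of a packed `half2` to a float. The bit-level effect can be sketched in Python (illustrative only; the real thing is a CUDA/HIP device intrinsic, and this sketch ignores the high 16 bits just as the intrinsic does):

```python
import struct

def low2float(packed):
    # decode the low 16 bits of a 32-bit packed half2 as an fp16 value
    half_bits = packed & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]
```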
Jared Van Bortel
ebb5f7e968 test-backend-ops : test llama with different batch sizes 2024-01-24 16:55:44 -05:00
Jared Van Bortel
df687b10ab kompute : support mask parameter of softmax 2024-01-24 16:51:27 -05:00
Jared Van Bortel
8bd38fe32d test-backend-ops : test mask parameter of ggml_soft_max_ext 2024-01-24 16:28:41 -05:00
Jared Van Bortel
308f279622 kompute : support scale parameter of softmax 2024-01-24 16:17:37 -05:00
Jared Van Bortel
1450966071 test-backend-ops : test scale parameter of ggml_soft_max_ext 2024-01-24 16:17:37 -05:00
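The extended softmax being tested in these commits applies a scale and an additive mask before the exponentiation, i.e. roughly softmax(x * scale + mask). A simplified sketch of that semantics (names and details are illustrative, not ggml's exact `ggml_soft_max_ext` code):

```python
import math

def soft_max_ext(x, mask=None, scale=1.0):
    """Sketch: softmax over (x * scale + mask), one row at a time."""
    z = [xi * scale + (mask[i] if mask is not None else 0.0)
         for i, xi in enumerate(x)]
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]
```

Folding the scale and mask into the softmax is what lets attention use one fused op, with -inf mask entries zeroing out masked positions.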
Jared Van Bortel
2852902eda test-backend-ops : add llama test 2024-01-24 16:17:29 -05:00
Jared Van Bortel
2b0f642fec fix f16 mmv, 49 -> 41 failures 2024-01-24 13:43:49 -05:00