llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	3b7c914de2	tests : gitignore test-c.o	2024-01-26 14:48:15 +02:00
Xuan Son Nguyen	48c857aa10	server : refactored the task processing logic (#5065) * server: add llama_server_queue struct * server: add llama_server_response_event * server: add comments * server: move all mutexes away from server.cpp * server: correct multitask response * server: only add back deferred tasks when one slot is available * server: fix a race condition caused by "request_completion"	2024-01-26 14:42:20 +02:00
crasm	413e7b0559	ci : add model tests + script wrapper (#4586) * scripts : add lib.sh and lib_test.sh * scripts : stub out new ci-run.sh script * scripts : switch to PascalCase for functions This looks a little odd at first, but I find it very useful as a convention to know if a command is part of our code vs a builtin. * scripts : add some fancy conversion from snake_case to PascalCase * Add venv to ci/run.sh * Revert scripts work * scripts : add wrapper script for local use of ci/run.sh * Simplify .gitignore for tests, clang-tidy fixes * Label all ctest tests * ci : ctest uses -L main * Attempt at writing ctest_with_model * Update test-model-load-cancel * ci : add ctest_with_model for debug and release ggml-ci * Fix gg_get_model function ggml-ci * got stuck on CMake * Add get_model.cpp to tests/CMakeLists.txt ggml-ci * Fix README.md output for ctest_with_model ggml-ci * workflows : use `-L main` for all ctest ggml-ci * Fixes * GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE * Always show warning rather than failing if model file variable is not set * scripts : update usage text for ci-run.sh	2024-01-26 14:18:00 +02:00
Paul Tsochantaris	6dd3c28c9c	metal : remove unused `n_buffers` and `buffers` (#5129)	2024-01-26 14:16:07 +02:00
Riceball LEE	38b431de23	gguf : fix "general.alignment" type in gguf_reader.py (#5136)	2024-01-26 11:10:28 +02:00
Georgi Gerganov	aad0b01d73	readme : update hot topics	2024-01-26 10:52:33 +02:00
Kawrakow	1182cf4d4f	Another bucket sort (#5109) * Initial bucket sort * Bucket sort: slightly better version * Bucket sort: another minor improvement --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-26 09:14:39 +02:00
Jared Van Bortel	91654ff042	kompute : fix a -Wstrict-aliasing warning	2024-01-25 17:03:06 -05:00
Jared Van Bortel	bc287047fb	kompute : remove unused immintrin.h #include	2024-01-25 16:07:46 -05:00
Jared Van Bortel	3915194232	test-backend-ops : make Falcon test faster with a smaller model	2024-01-25 15:56:42 -05:00
Jared Van Bortel	3fbf0529ef	kompute : mark last few failing ops as unsupported	2024-01-25 15:47:43 -05:00
Jared Van Bortel	445a3734b7	kompute : fix basic Q6_K get_rows, 26 -> 24 failures	2024-01-25 15:38:39 -05:00
Jared Van Bortel	de9fba0d39	kompute : fix basic f16 get_rows, 28 -> 26 failures	2024-01-25 15:22:26 -05:00
XiaotaoChen	fe54033b69	readme : add MobileVLM 1.7B/3B to the supported models list (#5107) Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>	2024-01-25 22:14:32 +02:00
l3utterfly	5eaf9964fc	llama : dynamic temperature sampling (#4972) * implemented dynamic temperature sampling from koboldcpp * removed trailing whitespace * removed unused temp parameter in llama_sample_entropy * exposed exponent_val in dynamic temp sampler * added debug check for printf statements * use nullptr in llama_sample_softmax call during llama_sample_entropy this avoids counting the time taken stats twice Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * return earlier if there is only 1 candidate (i.e. max_entropy == 0) * reformat 't' case in llama_sample_queue Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * check for one or zero candidates case in llama_sample_entropy --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2024-01-25 22:06:22 +02:00
Jared Van Bortel	11b305082b	test-backend-ops : restore softmax tests	2024-01-25 15:05:55 -05:00
Jared Van Bortel	38d1f0c7a0	kompute : fix op_gelu -> Falcon is working on AMDVLK	2024-01-25 15:01:46 -05:00
Jared Van Bortel	6fc99a6e66	test-backend-ops : test larger GELU range	2024-01-25 15:01:46 -05:00
Jared Van Bortel	d292f4f204	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
Jared Van Bortel	1849b85473	test-backend-ops : add Falcon test	2024-01-25 13:55:49 -05:00
Valentin Konovalov	256d1bb0dd	android : use release cmake build type by default (#5123)	2024-01-25 19:05:51 +02:00
Jared Van Bortel	f5ac635473	kompute : fix q8_0 mmv, 41 -> 28 failures	2024-01-25 11:27:11 -05:00
Jared Van Bortel	987335ea0a	kompute : fix algorithm names	2024-01-25 11:09:18 -05:00
Kawrakow	faa3526a1e	Fix Q3_K_XS for MoE models (#5113) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-25 17:58:53 +02:00
Georgi Gerganov	ddc5a5033f	metal : show compile log messages	2024-01-25 11:26:17 +02:00
Jared Van Bortel	ec68a9657f	test-backend-ops : increase max_nmse_err so Llama passes	2024-01-24 17:31:34 -05:00
Engininja2	cd4fddb29f	cuda : fix 2-bit quants on amd hip (#5105) * cuda : fix 2-bit quants on amd hip * use __low2float intrinsic function for new quants	2024-01-24 23:18:15 +01:00
Jared Van Bortel	ebb5f7e968	test-backend-ops : test llama with different batch sizes	2024-01-24 16:55:44 -05:00
Jared Van Bortel	df687b10ab	kompute : support mask parameter of softmax	2024-01-24 16:51:27 -05:00
Jared Van Bortel	8bd38fe32d	test-backend-ops : test mask parameter of ggml_soft_max_ext	2024-01-24 16:28:41 -05:00
Jared Van Bortel	308f279622	kompute : support scale parameter of softmax	2024-01-24 16:17:37 -05:00
Jared Van Bortel	1450966071	test-backend-ops : test scale parameter of ggml_soft_max_ext	2024-01-24 16:17:37 -05:00
Jared Van Bortel	2852902eda	test-backend-ops : add llama test	2024-01-24 16:17:29 -05:00
Jared Van Bortel	2b0f642fec	fix f16 mmv, 49 -> 41 failures	2024-01-24 13:43:49 -05:00
Jared Van Bortel	1a14099c43	fix q4_0/q4_1 mmv, 65 -> 49 failures	2024-01-24 13:43:48 -05:00
Jared Van Bortel	0787b80db8	kompute : remove broken mulrow kernel -> 1 less test failure	2024-01-24 13:43:48 -05:00
Jared Van Bortel	2755ae3d10	kompute : fix more dispatch ambiguity -> 12 less failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	08e23fd78c	kompute : fix op_mul kernel -> 13 less test failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	0899adf86e	kompute : fix get_rows dispatch -> 4 less failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	cb9ceff966	minor cleanup	2024-01-24 13:43:46 -05:00
Georgi Gerganov	33e8d6abe1	kompute : fix ggml_add kernel (#5027)	2024-01-24 13:43:46 -05:00
Jared Van Bortel	2f6a279e29	fix supported ops for kompute backend	2024-01-24 13:43:45 -05:00
Jared Van Bortel	07530731ba	never try to evaluate an empty command buffer This fixes the immediate crashes with test-backend-ops - when evaluating individual no-ops like OP_VIEW, it tries to submit an empty command buffer, which crashes RADV and hangs AMDVLK.	2024-01-24 13:43:45 -05:00
Jared Van Bortel	729e1a4cc1	sync op_rope_f16 with recent op_rope_f32 changes	2024-01-24 13:43:45 -05:00
Jared Van Bortel	e9d5223da3	actually fix this assertion	2024-01-24 13:43:44 -05:00
Jared Van Bortel	9431026a84	clean up old backend code	2024-01-24 13:43:44 -05:00
Georgi Gerganov	d6bd471693	kompute : fix rope_f32 and scale ops (#5008)	2024-01-24 13:43:44 -05:00
Jared Van Bortel	76474a7c0d	kompute : ignore exceptions in ggml_vk_available_devices (#12) Signed-off-by: Jared Van Bortel <jared@nomic.ai>	2024-01-24 13:43:43 -05:00
Jared Van Bortel	cad72e1252	add sanity check and fix kompute teardown order	2024-01-24 13:43:43 -05:00
Jared Van Bortel	070919dbf7	attempt to get test-backend-ops working	2024-01-24 13:43:43 -05:00

2158 commits