Commit graph

4826 commits

Author SHA1 Message Date
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
* Add apple arm to presets

* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix
2024-11-02 18:34:56 +02:00
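The slot-selection fix above concerns how the server picks a slot for an incoming request. As a purely illustrative aid (not the server's actual code), a least-recently-used choice among idle slots can be sketched as follows; `Slot`, `is_idle`, and `t_last_used` are hypothetical names.

```cpp
// Hypothetical sketch of LRU slot selection; not llama.cpp server code.
#include <cstdint>
#include <vector>

struct Slot {
    int     id          = -1;
    bool    is_idle     = true;  // slot is not currently processing a request
    int64_t t_last_used = 0;     // timestamp of last use; smaller == older
};

// Pick the idle slot that has been unused the longest; nullptr if none is idle.
static Slot * select_slot_lru(std::vector<Slot> & slots) {
    Slot * best = nullptr;
    for (auto & s : slots) {
        if (s.is_idle && (best == nullptr || s.t_last_used < best->t_last_used)) {
            best = &s;
        }
    }
    return best;
}
```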
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
ggml-ci
2024-11-02 18:34:00 +02:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings (#10136)
* llama : adjust default context size + print warnings

ggml-ci

* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt (#10129) 2024-11-02 13:08:53 +01:00
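The BOS fix above is about not re-inserting the beginning-of-sequence token on every chat turn. A hedged sketch of that pattern, assuming the C API of this period (`llama_get_kv_cache_used_cells`, `llama_tokenize`) and a hypothetical helper name; this is not the actual patch from #10129.

```cpp
// Sketch only: request BOS (add_special) only when the KV cache is still empty,
// i.e. when tokenizing the very first prompt of the conversation.
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

static std::vector<llama_token> tokenize_turn(llama_context * ctx,
                                              const llama_model * model,
                                              const std::string & text) {
    const bool is_first = llama_get_kv_cache_used_cells(ctx) == 0;

    std::vector<llama_token> tokens(text.size() + 8); // rough upper bound
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special =*/ is_first,
                                 /*parse_special =*/ true);
    tokens.resize(std::max(n, 0)); // a real caller would retry on n < 0
    return tokens;
}
```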
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional (#10110)
* convert-lora : make `--base` optional

* lint

* handle case where base_model_name_or_path is invalid

* do not include metadata from base model

* clarify unspecified --base

* add small comment [no ci]

* trigger ci
2024-11-02 12:53:17 +01:00
Diego Devesa
a6744e43e8
llama : add simple-chat example (#10124)
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-01 23:50:59 +01:00
Diego Devesa
e991e3127f
llama : use smart pointers for ggml resources (#10117) 2024-11-01 23:48:26 +01:00
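The smart-pointer commit above ties ggml resource lifetimes to RAII. A minimal sketch of the general idea, not the code from #10117: own a `ggml_context` through `std::unique_ptr` with a custom deleter so it is freed on every exit path.

```cpp
#include <cstddef>
#include <memory>
#include "ggml.h"

// Deleter that releases a ggml context via ggml_free().
struct ggml_context_deleter {
    void operator()(ggml_context * ctx) const { ggml_free(ctx); }
};
using ggml_context_ptr = std::unique_ptr<ggml_context, ggml_context_deleter>;

// Create a context whose memory is released automatically when the pointer
// goes out of scope, including on early returns.
static ggml_context_ptr make_ctx(size_t mem_size) {
    ggml_init_params params = {
        /*.mem_size   =*/ mem_size,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    return ggml_context_ptr(ggml_init(params));
}
```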
Shupei Fan
418f5eef26
vulkan : improve ggml_vk_create_buffer error handling (#9898) 2024-11-01 19:33:14 +01:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics 2024-11-01 17:31:51 +02:00
sasha0552
d865d1478c
server : fix smart selection of available slot (#10120)
* Fix smart selection of available slot

* minor fix

* replace vectors of tokens with shorthands
2024-11-01 14:33:14 +01:00
Georgi Gerganov
1804adb0cf
ggml : remove ggml_scratch (#10121)
ggml-ci
2024-11-01 12:58:45 +02:00
Georgi Gerganov
815fe72adc
sync : ggml 2024-11-01 10:28:24 +02:00
Georgi Gerganov
f221d56220
ggml : alloc ggml_contexts on the heap (whisper/2525) 2024-11-01 10:24:50 +02:00
Zhenwei Jin
e597e50794
build: fix build error in Windows env with OneAPI setup (#10107) 2024-11-01 11:09:59 +08:00
Diego Devesa
85679d37f3
llama : improve output buffer type selection (#10098) 2024-11-01 00:49:53 +01:00
Diego Devesa
1e9f94994e
quantize : fix --keep-split (#10114) 2024-11-01 00:45:34 +01:00
Diego Devesa
c02e5ab2a6
llama : fix buffer checks for mamba and rwk (#10111)
* llama : fix buffer checks for mamba and rwk

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
2024-10-31 22:54:23 +01:00
Zhenwei Jin
ab3d71f97f
loader: refactor tensor weights storage (#9935)
* loader: refactor tensor weights storage

* use sorted map, sort weights by layer

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-10-31 19:50:39 +01:00
ochafik
bc52c0a4f0 agent: add missing tool name in response! 2024-10-31 15:01:17 +00:00
ochafik
479c1520b1 tool-call: fix qwen template test 2024-10-31 14:49:59 +00:00
ochafik
fe967b61a1 Update README.md 2024-10-31 14:37:55 +00:00
ochafik
f5f74751b9 nits 2024-10-31 14:28:52 +00:00
ochafik
c4a8050120 Update README.md 2024-10-31 14:27:40 +00:00
ochafik
9477c54676 tool-call: functionary-small-v3.2 test now green 2024-10-31 14:11:34 +00:00
ochafik
b35aa4ae1c tool-call: add LLAMA_UPDATE_GOLDENS env for test-chat-template 2024-10-31 13:53:33 +00:00
ochafik
c773516d57 tool-call: don't use -fa w/ Mistral-Nemo (hard crashes?) 2024-10-31 13:53:11 +00:00
ochafik
f5b7825595 tool-call: code_interpreter & system + tool call support for all jinja templates! 2024-10-31 13:52:46 +00:00
ochafik
c395d4804f tool-call: behaviour-based detection of template features 2024-10-31 13:45:10 +00:00
Kevin Gibbons
0a683e8088
server : include scheme when printing URL (#10106) 2024-10-31 14:02:35 +01:00
Diego Devesa
dea5e86051
ggml : check tensor name lengths in gguf files (#10100) 2024-10-31 11:40:59 +01:00
Sergio López
1329c0a75e
kompute: add mul_mat_q4_k shader (#10097)
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-31 11:09:52 +02:00
ochafik
e8d9d711f6 Update tool_call.feature 2024-10-31 04:50:38 +00:00
ochafik
7d9c90f46b tool-call: nemo tweak (accept raw sql again) 2024-10-31 04:39:40 +00:00
ochafik
542853b34b tool-call: greedy sampling in server tests + tweak prompt 2024-10-31 04:38:22 +00:00
ochafik
be9de3ed8a Update llama-sampling.cpp 2024-10-31 03:58:15 +00:00
ochafik
61655b9cdd Merge remote-tracking branch 'origin/master' into tool-call 2024-10-31 01:45:07 +00:00
Olivier Chafik
e4d5449638 tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf 2024-10-30 21:40:15 +00:00
Sergio López
61408e7fad
kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-30 17:01:52 +01:00
Diego Devesa
b9e02e8184
ggml : fix memory leaks when loading invalid gguf files (#10094)
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-10-30 14:51:21 +01:00
ochafik
5227321dfd tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py 2024-10-30 12:40:22 +00:00
ochafik
35ac17f3f1 tool-call: fix missing initializer errors 2024-10-30 12:38:34 +00:00
Rich Dougherty
6763f713bb
readme : more lora detail in main example readme (#10064) 2024-10-30 13:22:39 +01:00
Rich Dougherty
79a2bc042d
convert : more detailed convert lora usage docs (#10065) 2024-10-30 13:22:21 +01:00
ochafik
3ebdb2b805 tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template 2024-10-30 10:07:10 +00:00
xctan
fc83a9e584
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-10-30 09:00:40 +02:00
Diego Devesa
c5b0f4b5d9
llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
Olivier Chafik
92c384a5e8 nits 2024-10-29 17:24:59 +00:00
Olivier Chafik
773ff91b7a tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing 2024-10-29 15:26:51 +00:00
Olivier Chafik
fa4c1119c9 tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call) 2024-10-29 15:25:37 +00:00