Akarshan Biswas
1ccfaaedbb
Add sum to backend hpp
2025-02-05 09:02:03 +05:30
Akarshan Biswas
d31c62d758
norm: add try catch sycl exception
2025-02-05 09:02:03 +05:30
Akarshan Biswas
5c05a3eedc
Move sum and sum rows to a separate file
2025-02-05 09:02:03 +05:30
Akarshan Biswas
eb466d733a
pool2d: move to a separate file
2025-02-05 09:02:03 +05:30
Akarshan Biswas
4db56d6ed2
im2col: add try catch block and move wrapper function from ggml-sycl.cpp
2025-02-05 09:02:02 +05:30
Akarshan Biswas
ba79258a2b
Add spaces to end of files
2025-02-05 09:02:02 +05:30
Akarshan Biswas
ddc5e428f2
clamp: move to a separate file
2025-02-05 09:02:02 +05:30
Akarshan Biswas
0c319bf721
DUP: move to cpy.cpp, set debug logs and adjust include
2025-02-05 09:02:02 +05:30
Akarshan Biswas
927925ffe2
scale: move to a separate file
2025-02-05 09:02:02 +05:30
Akarshan Biswas
7f2d24fdca
rope: add try catch sycl exception and debug log
2025-02-05 09:02:01 +05:30
Akarshan Biswas
8e86732cf2
diagmask: move to a separate file
2025-02-05 09:02:01 +05:30
Akarshan Biswas
98f5fd2fd1
getrows: move to a separate file
2025-02-05 09:02:01 +05:30
Akarshan Biswas
04d8b038b8
Add back split buffer type checks
2025-02-05 09:02:01 +05:30
Akarshan Biswas
7d8d689d39
eltwise: add back split buffer type checks
2025-02-05 09:02:01 +05:30
Akarshan Biswas
ecacff3f6e
CPY: move to a separate file
2025-02-05 09:02:00 +05:30
Akarshan Biswas
a16b6b7681
eltwise: sort includes
2025-02-05 09:02:00 +05:30
Akarshan Biswas
aaf9ed070d
Add spaces
2025-02-05 09:02:00 +05:30
Akarshan Biswas
3a346592b8
argsort: add a space at the end of file
2025-02-05 09:02:00 +05:30
Akarshan Biswas
51bedb847e
argmax: move missing function to file and fix function name
2025-02-05 09:02:00 +05:30
Akarshan Biswas
a153f1972d
ggml_sycl_compute_forward: fixup function calling names and remove comments
2025-02-05 09:01:59 +05:30
Akarshan Biswas
5288bd5896
Argsort: move to a separate file
2025-02-05 09:01:59 +05:30
Akarshan Biswas
95a09ab505
ARGMAX: move to a separate file
2025-02-05 09:01:59 +05:30
Akarshan Biswas
fa7c4d86f3
Fix GGML_SYCL_DEBUG in kernels in other files
2025-02-05 09:01:59 +05:30
Akarshan Biswas
e1326a7897
binbcast: add try catch sycl::exception
2025-02-05 09:01:59 +05:30
Akarshan Biswas
108be39dfe
binbcast: move to a separate file
2025-02-05 09:01:58 +05:30
Akarshan Biswas
957c11b2cf
binbcast: use void pointer to prevent intermediate type conversions
2025-02-05 09:01:58 +05:30
Akarshan Biswas
2d72bd94b0
SYCL: remove ggml_sycl_op_flatten function
2025-02-05 09:01:58 +05:30
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old/vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler
* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured now is fair to add it.
It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.
It also integrates model presets and choosing the largest quant given the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from hugging face.
So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage
* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648)
2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update
2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git as a dependency optional, and is useful in the case where
ggml is built not from git, but from a tarball, or a distribution source
package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 12:59:15 +02:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues
2025-02-04 09:31:01 +02:00
Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Xuan-Son Nguyen
b3451785ac
server : (webui) revert hacky solution from #11626 (#11634)
2025-02-04 00:10:52 +01:00
Woof Dog
1d1e6a90bc
server : (webui) allow typing and submitting during llm response (#11626)
2025-02-03 23:16:27 +01:00
Daniel Bevenius
5598f475be
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
code.
The motivation for this is that when using a debug build the server
would crash when an exception was thrown and terminate the server
process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set
cpp_httplib will not call the exception handler, which would normally
return a 500 error to the client. This caused tests to fail when using
a debug build.
Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
2025-02-03 16:45:38 +01:00
Georgi Gerganov
8ec05832fa
sync : ggml
2025-02-03 14:57:08 +02:00
Johannes Gäßler
21c84b5d2d
CUDA: fix Volta FlashAttention logic (#11615)
2025-02-03 14:25:56 +02:00
mashdragon
d92cb67e37
server : (webui) Fix Shift+Enter handling (#11609)
* Fix Shift+Enter handling
`exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway
* build index.html.gz
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-03 10:42:55 +01:00
Johannes Gäßler
6eecde3cc8
HIP: fix flash_attn_stream_k_fixup warning (#11604)
2025-02-02 23:48:29 +01:00
uvos
396856b400
CUDA/HIP: add support for selectable warp size to mmv (#11519)
CUDA/HIP: add support for selectable warp size to mmv
2025-02-02 22:40:09 +01:00
uvos
4d0598e144
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601)
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
2025-02-02 22:08:05 +01:00