Georgi Gerganov
9d0156bf0a
minor [no ci]
2025-01-03 10:17:41 +02:00
Georgi Gerganov
69dd1e859a
llama : quant (cont)
ggml-ci
2025-01-02 21:57:46 +02:00
Georgi Gerganov
e06d267ac6
llama : quant
ggml-ci
2025-01-02 21:40:16 +02:00
Georgi Gerganov
272cd0eaea
common : update lora
ggml-ci
2025-01-02 17:31:26 +02:00
Georgi Gerganov
8d117a518d
llama : model loader
ggml-ci
2025-01-02 16:56:37 +02:00
Georgi Gerganov
736e6922ce
llama : context (cont)
ggml-ci
2025-01-02 16:56:37 +02:00
Georgi Gerganov
4b39d7020d
minor
2025-01-02 16:56:37 +02:00
Georgi Gerganov
007064f5ec
llama : context
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
5bf9dc5783
cont
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
add3bfe068
llama : batch
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
5f794937d9
llama : impl
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
8ab668e122
llama : kv cache
ggml-ci
2025-01-02 16:56:36 +02:00
Georgi Gerganov
55791c17f6
minor
2025-01-02 16:56:36 +02:00
Georgi Gerganov
2a3aa05ce9
rebase
ggml-ci
2025-01-02 16:56:35 +02:00
Georgi Gerganov
2ebe8fe60e
examples : fix
ggml-ci
2025-01-02 16:56:34 +02:00
Georgi Gerganov
30e0c88975
llama : adapter
ggml-ci
2025-01-02 16:55:42 +02:00
Georgi Gerganov
a25ff12f8e
llama : hparams
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
7a3065f368
llama : model
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
a2dc93ed20
llama : chat
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
6c22ce1097
llama : arch (cont)
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
e9c9209e01
ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
6b24e6eb97
llama : mmap
ggml-ci
2025-01-02 16:55:41 +02:00
Georgi Gerganov
cf899ea0d3
llama : arch
2025-01-02 16:55:41 +02:00
Georgi Gerganov
844660ba5d
llama : control-vector -> adapter
2025-01-02 16:55:40 +02:00
Georgi Gerganov
498b68f97d
llama : scatter llama.cpp into multiple modules (wip)
2025-01-02 16:55:40 +02:00
Xuan Son Nguyen
0da5d86026
server : allow using LoRA adapters per-request ( #10994 )
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-02 15:05:18 +01:00
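A minimal sketch of what a per-request LoRA call might look like, assuming a llama-server running at localhost:8080 (hypothetical host/port), an adapter already loaded at startup with id 0, and the Python requests library; the "lora" field shape follows the PR notes above:

    import requests

    # Per-request LoRA (sketch): pick the adapter and scale for this one
    # request instead of using a single server-wide setting. The adapter
    # id assumes one adapter was loaded when the server started.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "Write a haiku about llamas.",
            "n_predict": 64,
            "lora": [{"id": 0, "scale": 0.5}],
        },
    )
    print(resp.json()["content"])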
Benson Wong
a45433ba20
readme : add llama-swap to infrastructure section ( #11032 )
* list llama-swap under tools in README
* readme: add llama-swap to Infrastructure
2025-01-02 09:14:54 +02:00
Srihari-mcw
0827b2c1da
ggml : fixes for AVXVNNI instruction set with MSVC and Clang ( #11027 )
* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-31 15:23:33 +01:00
Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection ( #11026 )
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
2024-12-31 15:22:01 +01:00
Xuan Son Nguyen
5896c65232
server : add OAI compat for /v1/completions ( #10974 )
* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs
2024-12-31 12:34:13 +01:00
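A sketch of the OpenAI-style request shape this endpoint accepts, assuming a local llama-server (hypothetical host/port); the response follows the legacy OpenAI completions layout (choices[0].text):

    import requests

    # OAI-compatible completions: prompt in, choices[0].text out,
    # rather than llama.cpp's native /completion schema.
    resp = requests.post(
        "http://localhost:8080/v1/completions",
        json={
            "model": "default",  # accepted for compatibility
            "prompt": "The capital of France is",
            "max_tokens": 16,
            "temperature": 0.0,
        },
    )
    print(resp.json()["choices"][0]["text"])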
ymcki
bc7b1f8632
convert : fix Llama-3_1-Nemotron-51B rope settings ( #11008 )
* conflict resolution
* move comments after brackets to their own lines
* DeciLMCausalModel now reads rope_theta from config.json properly
2024-12-31 13:04:48 +02:00
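A minimal sketch of the idea behind the fix, reading rope_theta from the model's config.json instead of assuming a constant; the fallback value here is hypothetical:

    import json

    # Read rope_theta from the HF config rather than hard-coding it;
    # 10000.0 is only an illustrative default, not the converter's.
    with open("config.json") as f:
        hparams = json.load(f)
    rope_theta = hparams.get("rope_theta", 10000.0)
    print(f"rope_theta = {rope_theta}")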
Peter
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON ( #11013 )
In common/common.cpp:
* Replace the stat() call used to check whether a file exists with std::filesystem::exists (error: unable to match the correct function signature)
* Only define PATH_MAX when it is not already defined in the WIN32 environment (warning: it is already defined in MSYS2)
In examples/run/run.cpp:
* Include the io.h header (error: cannot find function _get_osfhandle)
* Initialise OVERLAPPED with an empty struct initialiser (warning about uninitialised members)
* Add an initialiser for hFile (warning: it may be uninitialised)
* Cast the curl_off_t percentage value to long int in the generate_progress_prefix function (warning: curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning: the B_d variable may be used unassigned)
2024-12-31 01:46:06 +01:00
Jeff Bolz
716bd6dec3
vulkan: optimize mul_mat for small values of N ( #10991 )
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.
Share some code for reducing the result values to memory in mul_mat_vec_base.
2024-12-30 18:27:11 +01:00
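A scalar Python sketch of the loop structure described above (an analogue, not the Vulkan shader): one pass over each row of A accumulates NUM_COLS results, so the loads from B sit in the innermost loop:

    # Matrix-vector product generalized to a small number of columns.
    # Each A element is loaded once and reused across all NUM_COLS
    # accumulators; the B loads are innermost, as the commit notes,
    # because they should cache better.
    NUM_COLS = 2  # spec-constant analogue

    def mul_mat_small_n(A, B):  # A: M x K, B: K x NUM_COLS
        M, K = len(A), len(A[0])
        out = []
        for i in range(M):
            acc = [0.0] * NUM_COLS
            for k in range(K):
                a = A[i][k]
                for c in range(NUM_COLS):
                    acc[c] += a * B[k][c]
            out.append(acc)
        return out

    print(mul_mat_small_n([[1, 2], [3, 4]], [[1, 0], [0, 1]]))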
ag2s20150909
c250ecb315
android : fix llama_batch free ( #11014 )
2024-12-30 14:35:13 +02:00
Jeff Bolz
a813badbbd
vulkan: im2col and matmul optimizations for stable diffusion ( #10942 )
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
2024-12-29 10:16:34 +01:00
Jeff Bolz
fdd2188912
vulkan: Use push constant offset to handle misaligned descriptors ( #10987 )
2024-12-29 09:35:11 +01:00
Isaac McFadyen
f865ea149d
server: added more docs for response_fields field ( #10995 )
2024-12-28 16:09:19 +01:00
Alexey Parfenov
16cdce7b68
server : fix token duplication when streaming with stop strings ( #10997 )
2024-12-28 16:08:54 +01:00
Eve
d79d8f39b4
vulkan: multi-row k quants ( #10846 )
* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
2024-12-26 16:54:44 +01:00
Peter
d283d02bf2
examples, ggml : fix GCC compiler warnings ( #10983 )
Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct field except first)
2024-12-26 14:59:11 +01:00
Reza Kakhki
9ba399dfa7
server : add support for "encoding_format": "base64" to the */embeddings endpoints ( #10967 )
* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 21:33:04 +01:00
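A sketch of requesting and decoding base64 embeddings, assuming a local llama-server started with embeddings enabled (hypothetical host/port) and that the payload packs little-endian float32 values, as in the OpenAI convention:

    import base64
    import struct

    import requests

    # Ask for base64 instead of a JSON float array, then unpack the
    # raw bytes back into floats (assumes little-endian float32).
    resp = requests.post(
        "http://localhost:8080/v1/embeddings",
        json={"input": "hello world", "encoding_format": "base64"},
    )
    raw = base64.b64decode(resp.json()["data"][0]["embedding"])
    embedding = struct.unpack(f"<{len(raw) // 4}f", raw)
    print(len(embedding), embedding[:4])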
Djip007
2cd43f4900
ggml : more performance with llamafile tinyblas on x86_64 ( #10714 )
* more performance with llamafile tinyblas on x86_64.
- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth
simpler tinyblas dispatch, more cache friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2
* remove unstable test
2024-12-24 18:54:49 +01:00
NeverLucky
09fe2e7613
server: allow filtering llama server response fields ( #10940 )
* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 17:39:49 +01:00
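A sketch of the filtering, assuming a local llama-server (hypothetical host/port); the listed keys are standard /completion reply fields, and only they should come back:

    import requests

    # response_fields trims the reply to just the named keys, which
    # keeps payloads small when only the text is needed.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "Name three llama facts.",
            "n_predict": 48,
            "response_fields": ["content", "tokens_predicted"],
        },
    )
    print(resp.json())  # expect only "content" and "tokens_predicted"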
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS ( #10930 )
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Diego Devesa
60cfa728e2
ggml : use wstring for backend search paths ( #10960 )
ggml-ci
2024-12-24 04:05:27 +01:00
Diego Devesa
3327bb0f8d
ggml : fix arm enabled features check ( #10961 )
2024-12-24 04:05:17 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path ( #10962 )
2024-12-23 20:25:52 +01:00
Xuan Son Nguyen
14b699ecde
server : fix missing model id in /model endpoint ( #10957 )
* server : fix missing model id in /model endpoint
* fix ci
2024-12-23 12:52:25 +01:00
Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion ( #10917 )
* server : add system_fingerprint to chat/completion
* update README
2024-12-23 12:02:44 +01:00
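A sketch of reading the new field from an OpenAI-compatible chat response, assuming a local llama-server (hypothetical host/port):

    import requests

    # system_fingerprint identifies the serving build/configuration,
    # which is useful when logging results for reproducibility.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "default",
            "messages": [{"role": "user", "content": "Say hi."}],
        },
    )
    body = resp.json()
    print(body.get("system_fingerprint"))
    print(body["choices"][0]["message"]["content"])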
Radoslav Gerganov
86bf31cfe6
rpc-server : add support for the SYCL backend ( #10934 )
2024-12-23 10:39:30 +02:00