Zack Li
df5841b6b8
Merge pull request #13 from NexaAI/weili/master-release
add omni-vlm-v2 implementations (C++ & Python)
2024-11-07 00:48:21 -08:00
李为
3dfac7817f
add returned string type (const char*) for nexa-omni-audio
2024-11-07 16:13:53 +08:00
Zack Li
20b9f02cee
Merge pull request #12 from NexaAI/weili/master-release
add returned string type (const char*) for nexa-omni-audio
2024-11-06 19:28:46 -08:00
李为
5edadffd88
add returned string type (const char*) for nexa-omni-audio
2024-11-07 11:19:50 +08:00
Zack Li
6a4cf0b983
Merge pull request #11 from NexaAI/weili/master-release
add returned string (const char*) for qwen2 audio
2024-11-05 23:27:47 -08:00
李为
b24a409e22
add returned string (const char*) for qwen2 audio
2024-11-06 15:24:26 +08:00
Zack Li
5574bda471
Merge pull request #10 from NexaAI/weili/master-release
add returned string (pure c const char* type) for omni-vlm inference api
2024-11-05 19:41:03 -08:00
李为
22da7bc379
add returned string (pure c const char* type) for omni-vlm inference api
2024-11-06 11:20:36 +08:00
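The const char* commits above share one pattern: returning the generated text as a pure-C string so the Python bindings can consume it directly. A minimal sketch of that pattern, with illustrative names rather than the actual NexaAI entry points:

```cpp
#include <string>

// Result buffer kept alive across the C boundary; the caller (e.g. a
// ctypes binding) must copy the string before the next call overwrites it.
static std::string g_last_output;

// Illustrative stand-in for the actual model inference.
static std::string run_inference(const std::string & prompt) {
    return "generated text for: " + prompt;
}

extern "C" const char * omni_vlm_inference(const char * prompt) {
    g_last_output = run_inference(prompt ? prompt : "");
    return g_last_output.c_str();
}
```

On the Python side this maps naturally to ctypes, e.g. `lib.omni_vlm_inference.restype = ctypes.c_char_p`.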
Zack Li
983b4625ef
Merge pull request #8 from NexaAI/weili/master-release
add omni-vlm examples (C++ & Python)
2024-11-04 22:39:36 -08:00
Zack Li
91b3cafbb5
Merge pull request #6 from NexaAI/master-release-audio-lm
Remove C++20 code and support Microsoft Visual Studio compilation
2024-11-04 21:59:26 -08:00
Zack Zhiyuan Li
05853eb861
remove C++20 syntax
2024-11-04 23:03:49 +00:00
Zack Zhiyuan Li
d42e0371f8
remove C++20 style
2024-11-04 22:50:33 +00:00
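These two "remove C++20" commits, together with the later "update to C++17 for compilation" commit below, amount to hand-replacing small C++20 conveniences so the code builds with `-std=c++17`. One typical rewrite, as a sketch (the exact call sites are not visible in this log):

```cpp
#include <string>

// C++20 allows s.starts_with(prefix); under C++17 the usual
// replacement is an rfind anchored at position 0:
static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}
```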
Zack Zhiyuan Li
1419681089
disable <cxxabi.h> for _MSC_VER
2024-11-04 05:45:52 +00:00
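`<cxxabi.h>` is a GCC/Clang (Itanium ABI) header that MSVC does not ship, so any demangling path has to be compiled out under `_MSC_VER`. A sketch of the usual guard, assuming the header was used for type-name demangling:

```cpp
#include <cstdlib>
#include <string>

#if !defined(_MSC_VER)
#include <cxxabi.h>
#endif

// Best-effort demangling; on MSVC, return the raw name unchanged.
static std::string demangle(const char * name) {
#if !defined(_MSC_VER)
    int status = 0;
    char * out = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    std::string result = (status == 0 && out) ? out : name;
    std::free(out);
    return result;
#else
    return name;
#endif
}
```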
Zack Zhiyuan Li
6f1ed6e5cb
Adding #include <io.h> & <fcntl.h>
2024-11-04 04:54:51 +00:00
Zack Zhiyuan Li
a4747b2edb
fix error on Windows: qwen2-audio/whisper.cpp:9935:38: error: '_O_BINARY' was not declared in this scope
2024-11-04 04:40:41 +00:00
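These two Windows commits go together: `_O_BINARY` and `_setmode` are declared in `<fcntl.h>` and `<io.h>`, which MSVC does not pull in transitively, hence the missing-declaration error at the whisper.cpp call site. The usual pattern looks roughly like this:

```cpp
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode, _fileno
#include <cstdio>   // stdin

// Switch stdin to binary mode so audio bytes are not CRLF-translated
// by the CRT's default text mode.
static void set_stdin_binary() {
    _setmode(_fileno(stdin), _O_BINARY);
}
#endif
```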
Zack Zhiyuan Li
995baefeed
Disable cxxabi.h dependency on Windows
2024-11-04 03:48:20 +00:00
李为
d277c674ae
add omni-vlm examples (C++ & Python)
2024-11-04 09:56:33 +08:00
Zack Zhiyuan Li
4bdc70aaac
update to C++17 for compilation
2024-11-03 22:07:07 +00:00
Zack Zhiyuan Li
9e67ef75b4
remove unnecessary build and rename shared lib
2024-11-03 21:29:09 +00:00
Zack Zhiyuan Li
f0d1c4fa1c
enable qwen2-audio to work E2E
2024-11-03 18:33:32 +00:00
Zack Zhiyuan Li
c7b912bdca
support omni-audio
2024-11-03 17:58:08 +00:00
Zack Zhiyuan Li
4a29bca867
update vulkan target name
2024-10-23 20:54:39 +00:00
Zack Li
3a3552632a
update README after renaming GGML
2024-09-10 20:53:14 +00:00
Zack Li
5f81588780
support ggml
2024-09-10 20:50:54 +00:00
Georgi Gerganov
1d1ccce676
flake.lock: Update ( #9162 )
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/c3aa7b8938b17aebd2deecf7be0636000d62a2b9?narHash=sha256-med8%2B5DSWa2UnOqtdICndjDAEjxr5D7zaIiK4pn0Q7c%3D' (2024-08-14)
→ 'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-08-28 21:28:14 -07:00
slaren
9fe94ccac9
docker : build images only once ( #9225 )
2024-08-28 17:28:00 +02:00
slaren
66b039a501
docker : update CUDA images ( #9213 )
2024-08-28 13:20:36 +02:00
Georgi Gerganov
20f1789dfb
vulkan : fix build ( #0 )
ggml-ci
2024-08-27 22:41:27 +03:00
Georgi Gerganov
231cff5f6f
sync : ggml
2024-08-27 22:41:27 +03:00
Xie Yanbo
3246fe84d7
Fix minicpm example directory ( #9111 )
2024-08-27 14:33:08 +02:00
compilade
78eb487bb0
llama : fix qs.n_attention_wv for DeepSeek-V2 ( #9156 )
2024-08-27 13:09:23 +03:00
Xuan Son Nguyen
a77feb5d71
server : add some missing env variables ( #9116 )
* server : add some missing env variables
* add LLAMA_ARG_HOST to server dockerfile
* also add LLAMA_ARG_CONT_BATCHING
2024-08-27 11:07:01 +02:00
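For context, llama-server's argument parser lets environment variables stand in for CLI flags; `LLAMA_ARG_HOST` and `LLAMA_ARG_CONT_BATCHING` are two such fallbacks. A minimal sketch of the lookup pattern (the helper name is illustrative, not the actual parser code):

```cpp
#include <cstdlib>
#include <string>

// Use the CLI value if given, else the environment variable, else a default.
static std::string arg_or_env(const char * cli, const char * env_name, const char * def) {
    if (cli && *cli) return cli;
    if (const char * env = std::getenv(env_name)) return env;
    return def;
}

// e.g. inside a Docker image with no flags passed:
//   std::string host = arg_or_env(nullptr, "LLAMA_ARG_HOST", "127.0.0.1");
```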
CausalLM
2e59d61c1b
llama : fix ChatGLM4 wrong shape ( #9194 )
This should fix THUDM/glm-4-9b-chat-1m and CausalLM/miniG
2024-08-27 09:58:22 +03:00
Carsten Kragelund Jørgensen
75e1dbbaab
llama : fix llama3.1 rope_freqs not respecting custom head_dim ( #9141 )
* fix: llama3.1 rope_freqs not respecting custom head_dim
* fix: use potential head_dim for Exaone
2024-08-27 09:53:40 +03:00
arch-btw
ad76569f8e
common : Update stb_image.h to latest version ( #9161 )
* Update stb_image.h to latest version
Fixes https://github.com/ggerganov/llama.cpp/issues/7431
* Update .ecrc
2024-08-27 08:58:50 +03:00
slaren
7d787ed96c
ggml : do not crash when quantizing q4_x_x with an imatrix ( #9192 )
2024-08-26 19:44:43 +02:00
Georgi Gerganov
06658ad7c3
metal : separate scale and mask from QKT in FA kernel ( #9189 )
* metal : separate scale and mask from QKT in FA kernel
* metal : ne01 check no longer necessary
* metal : keep data in local memory
2024-08-26 18:31:02 +03:00
Georgi Gerganov
fc18425b6a
ggml : add SSM Metal kernels ( #8546 )
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
2024-08-26 17:55:36 +03:00
Georgi Gerganov
879275ac98
tests : fix compile warnings for unreachable code ( #9185 )
ggml-ci
2024-08-26 16:30:25 +03:00
Georgi Gerganov
7a3df798fc
ci : add VULKAN support to ggml-ci ( #9055 )
2024-08-26 12:19:39 +03:00
Georgi Gerganov
e5edb210cd
server : update deps ( #9183 )
2024-08-26 12:16:57 +03:00
slaren
0c41e03ceb
metal : gemma2 flash attention support ( #9159 )
2024-08-26 11:08:59 +02:00
slaren
f12ceaca0c
ggml-ci : try to improve build time ( #9160 )
2024-08-26 11:03:30 +02:00
Justine Tunney
436787f170
llama : fix time complexity of string replacement ( #9163 )
This change fixes a bug where replacing text in a very long string could
cause llama.cpp to hang indefinitely. This is because the algorithm used
was quadratic, due to memmove() when s.replace() is called in a loop. It
seems most search results and LLM responses actually provide the O(n**2)
algorithm, which is a great tragedy. Using a builder string fixes things.
2024-08-26 09:09:53 +03:00
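The commit body above explains the bug well: calling `replace()` in a loop memmoves the tail of the string on every iteration. A side-by-side sketch of the quadratic pattern and the builder-string fix (illustrative, not the verbatim patch):

```cpp
#include <string>

// Quadratic: each replace() may memmove() the tail of s, so n
// replacements in a string of length m cost O(n * m).
static void replace_all_slow(std::string & s, const std::string & from, const std::string & to) {
    if (from.empty()) return;
    size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.length(), to);
        pos += to.length();
    }
}

// Linear: append into a builder string instead of editing in place.
static void replace_all_fast(std::string & s, const std::string & from, const std::string & to) {
    if (from.empty()) return;
    std::string builder;
    builder.reserve(s.length());
    size_t pos  = 0;
    size_t last = 0;
    while ((pos = s.find(from, last)) != std::string::npos) {
        builder.append(s, last, pos - last);
        builder += to;
        last = pos + from.length();
    }
    builder.append(s, last, std::string::npos);
    s = std::move(builder);
}
```

The builder version touches each character a constant number of times, which is what removes the hang on very long strings.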
Herman Semenov
93bc3839f9
common : fix --n-gpu-layers-draft argument not being found ( #9175 )
2024-08-26 00:54:37 +02:00
Johannes Gäßler
f91fc5639b
CUDA: fix Gemma 2 numerical issues for FA ( #9166 )
2024-08-25 22:11:48 +02:00
Johannes Gäßler
e11bd856d5
CPU/CUDA: Gemma 2 FlashAttention support ( #8542 )
* CPU/CUDA: Gemma 2 FlashAttention support
* apply logit_softcap to scale in kernel
* disable logit softcapping tests on Metal
* remove metal check
2024-08-24 21:34:59 +02:00
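The reason Gemma 2 needed dedicated FlashAttention work is its logit soft-capping: attention logits are squashed through a tanh before softmax, a step a fused FA kernel must apply between QK^T and the softmax (per the commit notes above, it can be folded into the scale). A scalar sketch of the transform, assuming Gemma 2's published attn_logit_softcapping value of 50.0:

```cpp
#include <cmath>

// Gemma 2 soft-capping: bound each attention logit to (-cap, +cap)
// before softmax; cap is the model's attn_logit_softcapping value.
static float softcap(float logit, float cap /* 50.0f for Gemma 2 */) {
    return cap * std::tanh(logit / cap);
}
```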
João Dinis Ferreira
8f824ffe8e
quantize : fix typo in usage help of quantize.cpp ( #9145 )
2024-08-24 09:22:45 +03:00
Xuan Son Nguyen
3ba780e2a8
lora : fix llama conversion script with ROPE_FREQS ( #9117 )
2024-08-23 12:58:53 +02:00
piDack
a07c32ea54
llama : use F32 precision in GLM4 attention and no FA ( #9130 )
2024-08-23 10:27:17 +03:00