Commit graph

4244 commits

Author SHA1 Message Date
HimariO  b24ab8634f  add m-rope testcase to test-backend-ops  2024-12-09 22:13:11 +08:00
HimariO  3ba7664de9  minor updates on debug util, bug fixs  2024-12-09 22:12:30 +08:00
HimariO  12f17f754d  rename mrope related function, params  2024-12-08 01:32:19 +08:00
HimariO  ac2089c378  add mrope unit test, fix few compiler warnings  2024-12-08 00:47:48 +08:00
HimariO  6c39aa38f5  add makefile entry, update speical image padding token  2024-12-07 21:59:54 +08:00
HimariO  cbd08b4204  resolve linter, test errors  2024-11-29 22:18:15 +08:00
HimariO  fac034530f  update to keep up stream changes  2024-11-29 17:55:18 +08:00
HimariO  07553cfb0f  update llama_hparams  2024-11-29 17:53:45 +08:00
HimariO  241bb45714  fix rope op mode switching, out dated func args  2024-11-29 17:53:45 +08:00
HimariO  f1fa60f84c  add GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION  2024-11-29 17:53:45 +08:00
HimariO  201f7043c3  add fp16 support for qwen2vl and m-rope  2024-11-29 17:53:44 +08:00
HimariO  3237bb4614  add fp32 mrope, vision rope kernel  2024-11-29 17:53:44 +08:00
HimariO  0882f57612  cuda-gdb cmake preset  2024-11-29 17:53:44 +08:00
HimariO  53480d2bdb  replace variable size array with vector  2024-11-29 17:52:47 +08:00
HimariO  3d19dd44b6  add arg parser to qwen2vl_surgery  2024-11-29 17:52:47 +08:00
HimariO  023f0076e0  correcting vision-rope behavior, add the missing last layer back to ViT  2024-11-29 17:52:47 +08:00
HimariO  bcd49f5984  [WIP] create inference workflow, gguf convert script but fix  2024-11-29 17:52:47 +08:00
HimariO  7e9fc7202e  make batch and clip utils compatible with qwen2vl  2024-11-29 17:52:47 +08:00
HimariO  c13edfed59  [WIP] qwen2vl vision model  2024-11-29 17:52:47 +08:00
HimariO  3c3691e10f  update 5D tensor op workaround  2024-11-29 17:52:47 +08:00
HimariO  f661483ea7  update qwen2vl cli tool  2024-11-29 17:52:47 +08:00
HimariO  9d389a051b  Add vl-rope/2d-rope support for qwen2vl ViT  2024-11-29 17:52:47 +08:00
HimariO  35411963d2  Verify m-rope output  2024-11-29 17:52:46 +08:00
HimariO  b24bd89e77  [WIP] add qwen2vl arch  2024-11-29 17:52:46 +08:00
HimariO  7c6f793492  Add Qwen2VL cli entrypoint  2024-11-29 17:52:46 +08:00
HimariO  c17546fffa  Barebone Qwen2VL LLM convertor  2024-11-29 17:52:46 +08:00
Chenguang Li  938f608742  CANN: RoPE operator optimization (#10563)  2024-11-29 14:46:55 +08:00
    * [cann] RoPE operator optimization
    * [CANN] Code Formatting
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Jeff Bolz  f095a649ec  vulkan: get the first command buffer submitted sooner (#10499)  2024-11-29 07:18:02 +01:00
    This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space.
    With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.
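The ramp-up described in the commit message above can be sketched as follows. This is a hypothetical illustration, not the actual Vulkan backend code: the function name `next_submit_count`, the starting batch size of 20, and the doubling schedule are all assumptions; only the steady-state 100 nodes/submit comes from the commit message.

```cpp
#include <algorithm>

// Hypothetical sketch: start with a small batch of graph nodes before the
// first submit, then grow the batch toward the steady-state 100 nodes/submit
// so the GPU starts working sooner.
int next_submit_count(int nodes_submitted_so_far) {
    const int initial = 20;   // assumed starting batch size
    const int steady  = 100;  // steady-state nodes/submit from the commit message
    int count = initial;
    int done  = 0;
    // Double the batch after each completed submit until reaching steady state.
    while (done < nodes_submitted_so_far) {
        done += count;
        count = std::min(count * 2, steady);
    }
    return count;
}
```

The idea is simply to trade a little per-submit overhead early on for lower latency to first GPU work.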
Ting Lou  678d7994f4  llava: return false instead of exit (#10546)  2024-11-29 01:09:46 +01:00
Georgi Gerganov  dc22344088  ggml : remove redundant copyright notice + update authors  2024-11-28 20:46:40 +02:00
Georgi Gerganov  4c0a95b107  llama : add missing model types  2024-11-28 20:45:07 +02:00
Xuan Son Nguyen  6c59567689  server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)  2024-11-28 19:17:49 +01:00
    * server : (tests) don't use thread for capturing stdout/stderr
    * test: bump openai to 1.55.2
    * bump openai to 1.55.3
Johannes Gäßler  890719311b  common: fix warning message when no GPU found (#10564)  2024-11-28 18:15:25 +01:00
Random Fly  7281cf13ad  docs: fix outdated usage of llama-simple (#10565)  2024-11-28 16:03:11 +01:00
Diego Devesa  e90688edd0  ci : fix tag name in cuda and hip releases (#10566)  2024-11-28 15:58:54 +01:00
Georgi Gerganov  76b27d29c2  ggml : fix row condition for i8mm kernels (#10561)  2024-11-28 14:56:37 +02:00
    ggml-ci
Georgi Gerganov  eea986f215  cmake : fix ARM feature detection (#10543)  2024-11-28 14:56:23 +02:00
    ggml-ci
Shupei Fan  c202cef168  ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)  2024-11-28 13:52:03 +01:00
    * ggml-cpu: support IQ4_NL_4_4 by runtime repack
    * ggml-cpu: add __ARM_FEATURE_DOTPROD guard
Sergio López  2025fa67e9  kompute : improve backend to pass test_backend_ops (#10542)  2024-11-28 12:51:38 +01:00
    * kompute: op_unary: reject unsupported parameters
    * kompute: softmax: implement ALiBi support
    * kompute: rope: implement neox and phi3 support
    * kompute: op_mul_mat_q4_k permutted support
    * kompute: op_mul_mat_[q4_0|q4_1|q8_0] permutted support
    * kompute: op_mul_mat_f16 permutted support
    * kompute: op_mul_mat_q6_k permutted support
    Signed-off-by: Sergio Lopez <slp@redhat.com>
Ruixin Huang  c6bc73951e  CANN: Update cann.md to display correctly in CLion (#10538)  2024-11-28 15:27:11 +08:00
leo-pony  605fa66c50  CANN: Fix SOC_TYPE compile bug (#10519)  2024-11-28 15:25:24 +08:00
    * CANN: Fix the bug build fail on Ascend310P under two cases: 1) Manual specify SOC_TYPE 2) Under some unusual compile environment
    * Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU.
    * fix CANN compile fail bug: the assert in ascend kernel function doesn't supportted on some CANN version
Chenguang Li  b7420131bf  CANN: ROPE operator optimization (#10540)  2024-11-28 14:24:46 +08:00
    * [cann] ROPE operator optimization
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Xuan Son Nguyen  9f912511bc  common : fix duplicated file name with hf_repo and hf_file (#10550)  2024-11-27 22:30:52 +01:00
uvos  3ad5451f3b  Add some minimal optimizations for CDNA (#10498)  2024-11-27 17:10:08 +01:00
    * Add some minimal optimizations for CDNA
    * ggml_cuda: set launch bounds also for GCN as it helps there too
Diego Devesa  46c69e0e75  ci : faster CUDA toolkit installation method and use ccache (#10537)  2024-11-27 11:03:25 +01:00
    * ci : faster CUDA toolkit installation method and use ccache
    * remove fetch-depth
    * only pack CUDA runtime on master
Georgi Gerganov  9e2301f4a4  metal : fix group_norm support condition (#0)  2024-11-27 11:22:14 +02:00
Georgi Gerganov  fee824a1a1  sync : ggml  2024-11-27 11:10:42 +02:00
Frankie Robertson  9150f8fef9  Do not include arm_neon.h when compiling CUDA code (ggml/1028)  2024-11-27 11:10:27 +02:00
Jeff Bolz  c31ed2abfc  vulkan: define all quant data structures in types.comp (#10440)  2024-11-27 08:32:54 +01:00
Jeff Bolz  5b3466bedf  vulkan: Handle GPUs with less shared memory (#10468)  2024-11-27 08:30:27 +01:00
    There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. #10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.
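The fallback policy described in that last commit can be sketched as follows. This is a hypothetical illustration, not the actual Vulkan backend code: the struct, the function name `choose_tile`, and the concrete tile sizes 128/64 are assumptions; only the 32KB tile-size threshold and the 16KB CPU fallback for mul_mat_id come from the commit message.

```cpp
// Hypothetical sketch: pick a matmul tile size that fits the device's shared
// memory, and route mul_mat_id to the CPU when shared memory is too small.
struct TileChoice {
    int  tile;               // chosen tile dimension (assumed values)
    bool mul_mat_id_on_gpu;  // false -> fall back to CPU
};

TileChoice choose_tile(int shared_mem_bytes) {
    TileChoice c;
    // Large tiles only when more than 32KB of shared memory is available.
    c.tile = shared_mem_bytes > 32 * 1024 ? 128 : 64;
    // With only 16KB, mul_mat_id falls back to the CPU.
    c.mul_mat_id_on_gpu = shared_mem_bytes > 16 * 1024;
    return c;
}
```

The design choice is to degrade gracefully: shrink the tile first, and only move the operation off the GPU when even the small configuration cannot fit.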