llama.cpp

Author	SHA1	Message	Date
hongruichen	1f9d2a7e22	refactoring: improve tensor print	2024-07-28 22:05:51 +08:00
hongruichen	e33b5c9837	refactoring: print the name of unsupport op	2024-07-27 13:49:49 +08:00
hongruichen	8ab1f15fe3	refactoring: remove internal functions, use op table directly	2024-07-27 13:43:07 +08:00
hongruichen	e0c9b34016	feat: check if dims equal for add looks qnn add can only applied to matrix with equal dimensions	2024-07-27 13:38:12 +08:00
hongruichen	5da73f8085	refactoring: move forward and supports_op into ops file	2024-07-27 13:24:57 +08:00
hongruichen	867c91bfaf	feat: add error string for QnnOpPackage_Error_t	2024-07-27 13:24:57 +08:00
hongruichen	ccfec70106	refactoring: remove unused get_rpcmem_from_memhandle func	2024-07-27 13:24:57 +08:00
hongruichen	2c73791d62	refactoring: remove dup code	2024-07-27 10:48:09 +08:00
hongruichen	18aa6654d5	refactoring: opt graph key gen	2024-07-27 10:39:07 +08:00
hongruichen	be9a8c73a0	fix: suppress warning	2024-07-26 23:07:25 +08:00
hongruichen	47735cb589	fix: try fix error in 2nd run by appending dimension into graph key	2024-07-26 23:04:53 +08:00
hongruichen	ee305cc171	refactoring: split qnn rpc buffer into dedicated class	2024-07-26 22:52:23 +08:00
hongruichen	f843e5aaf5	fix: 1.free up rpc memory at destruct 2. unbind tesnsor	2024-07-25 23:45:04 +08:00
hongruichen	706793f078	fix: back to qnn tensor v1 to fix the create tensor error	2024-07-22 23:08:38 +08:00
hongruichen	3b47056c97	refactoring: change the tensor binding mode between qnn tensor and ggml tensor	2024-07-22 23:08:38 +08:00
hongruichen	b173c4e061	feat: update tensor name when bind to graph	2024-07-20 17:31:40 +08:00
hongruichen	5f3b1ae3b0	fix: try fix graph cache with append the tensors name	2024-07-20 16:39:06 +08:00
hongruichen	51f95d6980	fix: dimension could be wrong for tensor liked 1x1x8	2024-07-20 16:11:35 +08:00
hongruichen	27299463ae	fix: try fix tensor type error	2024-07-20 15:13:10 +08:00
hongruichen	28a00e5e6c	fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG	2024-07-20 14:11:58 +08:00
hongruichen	1679dcf47e	fix: check all dimentions in `can offload`	2024-07-20 13:29:01 +08:00
hongruichen	b1b5cc10b1	add function to convert qnn error into string	2024-07-19 22:51:17 +08:00
hongruichen	a607995f95	Reapply "tried fix the add node error 6005" This reverts commit `f45fbec8f4`.	2024-07-19 15:35:55 +08:00
hongruichen	0153a23d3f	fix support ops This reverts commit `f45fbec8f4`.	2024-07-19 15:31:29 +08:00
hongruichen	f45fbec8f4	Revert "tried fix the add node error 6005" This reverts commit `ce3d09e5f2`.	2024-07-19 12:59:38 +08:00
hongruichen	ce3d09e5f2	tried fix the add node error 6005	2024-07-19 12:59:21 +08:00
hongruichen	665f823748	fix op checker	2024-07-18 22:26:53 +08:00
hongruichen	15f5cc450c	bug: fix allocation size overflow at log	2024-07-18 19:44:05 +08:00
hongruichen	d82b3a0bdb	feat: add GGML_UNARY_OP_GELU	2024-07-18 11:15:48 +08:00
hongruichen	ce199b2de7	refactoring: downgrade some log to debug level	2024-07-17 23:49:47 +08:00
hongruichen	c76fc9aa2f	fix warnings	2024-07-17 23:32:13 +08:00
hongruichen	6457a68bd7	disable qnn profiling in release build	2024-07-17 23:24:29 +08:00
hongruichen	b7d781ec81	remove qnn dedicated unit tests since we're now using the `test-backend-ops` to cross-validate backend ops	2024-07-17 23:08:16 +08:00
hongruichen	2502b57203	fix warnings	2024-07-17 22:10:12 +08:00
hongruichen	454deef83c	register qnn backend	2024-07-17 21:25:55 +08:00
hongruichen	eed960575f	add build step of QNN backend at ggml	2024-07-17 19:43:01 +08:00
hongruichen	861bb9c580	Merge tag 'b3405' into dev-refactoring	2024-07-17 17:13:55 +08:00
hongruichen	bb13795dce	refactoring: remove unused functions and variables	2024-07-17 14:17:35 +08:00
hongruichen	63dc587dff	refactoring: make the buffer alloc and free stay in same class	2024-07-17 14:08:31 +08:00
hongruichen	b1ef302991	refactoring: remove depend of dlsym at utils.hpp	2024-07-17 12:21:33 +08:00
Johannes Gäßler	5e116e8dd5	make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)	2024-07-16 21:20:59 +02:00
hongruichen	0301b500cd	refactoring: prevent leak the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp	2024-07-17 00:18:38 +08:00
Brian	1666f92dcd	gguf-hash : update clib.json to point to original xxhash repo (#8491) * Update clib.json to point to Cyan4973 original xxhash Convinced Cyan4973 to add clib.json directly to his repo, so can now point the clib package directly to him now. Previously pointed to my fork with the clib.json package metadata https://github.com/Cyan4973/xxHash/pull/954 * gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]	2024-07-16 10:14:16 +03:00
Steve Bonds	37b12f92ab	export-lora : handle help argument (#8497) The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.	2024-07-16 10:04:45 +03:00
Georgi Gerganov	0efec57787	llama : valign + remove unused ftype (#8502)	2024-07-16 10:00:30 +03:00
compilade	7acfd4e8d5	convert_hf : faster lazy safetensors (#8482) * convert_hf : faster lazy safetensors This makes '--dry-run' much, much faster. * convert_hf : fix memory leak in lazy MoE conversion The '_lazy' queue was sometimes self-referential, which caused reference cycles of objects old enough to avoid garbage collection until potential memory exhaustion.	2024-07-15 23:13:10 -04:00
Xuan Son Nguyen	97bdd26eee	Refactor lora adapter support (#8332) * lora: load to devide buft * add patch tensor function * correct tensor patch * llama_lora_adapter_apply * correct ggml_backend_tensor_copy * add llm_build_mm * fix auto merge * update based on review comments * add convert script * no more transpose A * add f16 convert * add metadata check * add sanity check * fix ftype * add requirements * fix requirements * fix outfile * conversion: only allow selected models * fix types * cuda : do not use dmmv if the tensor does not have enough cols * llama : lora fixes * do not disable mmap with lora Co-authored-by: slaren <slarengh@gmail.com> * llm_build_lora_mm_id * convert_lora : MoE LoRA conversion support * convert_lora : prefer safetensors, similarly to convert_hf * convert_hf : simplify modify_tensors for InternLM2 * convert_lora : lazy conversion * llama : load and use alpha from LoRA adapters * llama : use llm_build_lora_mm in most model graphs * auto scale * Revert "auto scale" This reverts commit `42415a4874`. * remove redundant params * Apply suggestions from code review Co-authored-by: slaren <slarengh@gmail.com> * change kv metadata * move add_type to __init__ * convert_hf : move add_type to main() * convert_lora : use the GGUFWriter from Model instead of overwriting it --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-07-15 20:50:47 +02:00
Xuan Son Nguyen	4db8f60fe7	fix ci (#8494)	2024-07-15 19:23:10 +02:00
hongruichen	ff601abc1c	add todo	2024-07-16 00:05:40 +08:00
Daniel Bevenius	8fac431b06	ggml : suppress unknown pragma 'GCC' on windows (#8460) This commit adds a macro guard to pragma GCC to avoid the following warning on windows: ```console C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068: unknown pragma 'GCC' [C:\lama.cpp\build\ggml\src\ggml.vcxproj] ```	2024-07-15 15:48:17 +03:00

3519 commits