Srihari-mcw
806c5a4e5b
Remove additional snippets to fix CI/CD issues with the python constants.py script
2024-08-20 04:02:17 -07:00
Srihari-mcw
43c7be57c1
Add the BF16 delta data types in constants.py
2024-08-19 22:18:10 -07:00
Srihari-mcw
7927655e42
Update the type name in llama.cpp
2024-08-19 22:17:37 -07:00
Srihari-mcw
4a2f703fbb
Add changes to use a union data type for better conversion to the strong type - based on 5f2e011e2eed2f685521c707b3e74280fcb81dd3 from llamafile
2024-08-19 22:15:32 -07:00
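For context, the union-based conversion referenced in the commit above works by bit-casting the float through a union before rounding to bfloat16. A minimal sketch, assuming the same round-to-nearest-even and NaN-quieting behavior as ggml's BF16 helpers (names are illustrative, not the patch's own):

```cpp
#include <cstdint>

// Sketch: float -> bfloat16 via a union bit-cast, rounding to nearest even.
// Type punning through a union is the technique the commit refers to.
static inline uint16_t fp32_to_bf16(float s) {
    union { float f; uint32_t i; } u = { s };
    if ((u.i & 0x7fffffff) > 0x7f800000) {               // NaN: force quiet
        return (uint16_t) ((u.i >> 16) | 64);
    }
    return (uint16_t) ((u.i + (0x7fff + ((u.i >> 16) & 1))) >> 16);
}
```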
Srihari-mcw
ef693e99d7
Add the new data types across files
2024-08-12 05:58:11 -07:00
Srihari-mcw
f7ce132258
Add changes to fix compiler issues
2024-08-12 05:56:16 -07:00
Srihari-mcw
db6657eeaf
Fix more conflicts in quantize.cpp
2024-08-12 05:56:16 -07:00
Srihari-mcw
5a6a235ac7
Fix build issues in sgemm.cpp post rebase
2024-08-12 05:56:16 -07:00
Srihari-mcw
c480818d97
Fix issues with SSE3 version for vec_dot_q4_0_b16_q8_0_b16
2024-08-12 05:56:16 -07:00
Srihari-mcw
9e5174ce5d
Remove additional ifdef conditions
2024-08-12 05:56:16 -07:00
Srihari-mcw
983b03ab6a
Add additional comments
2024-08-12 05:56:16 -07:00
Srihari-mcw
e26fd70dce
Introduce Q4_0 and Q8_0 quantizations with BF16 delta values
2024-08-12 05:54:21 -07:00
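The gist of these new types: the standard Q4_0 block layout, but with the per-block delta stored as bfloat16 instead of FP16. A minimal sketch of the layout and scalar dequantization, assuming illustrative field names rather than the patch's exact ones:

```cpp
#include <cstdint>

#define QK4_0 32

typedef uint16_t bf16_t;                 // raw bfloat16 bits

struct block_q4_0_b16 {
    bf16_t  d;                           // per-block delta (scale), stored as BF16
    uint8_t qs[QK4_0 / 2];               // 32 x 4-bit quants, two per byte
};

// BF16 -> FP32 is a plain shift into the high half of the float
static inline float bf16_to_fp32(bf16_t h) {
    union { uint32_t i; float f; } u = { (uint32_t) h << 16 };
    return u.f;
}

// Scalar dequantization: x[j] = (quant - 8) * delta
static void dequantize_q4_0_b16(const block_q4_0_b16 * b, float * y) {
    const float d = bf16_to_fp32(b->d);
    for (int j = 0; j < QK4_0 / 2; ++j) {
        y[j]             = ((b->qs[j] & 0x0F) - 8) * d;  // low nibble
        y[j + QK4_0 / 2] = ((b->qs[j] >>   4) - 8) * d;  // high nibble
    }
}
```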
compilade
4134999e01
gguf-py : Numpy dequantization for most types ( #8939 )
...
* gguf-py : Numpy dequantization for most types
* gguf-py : Numpy dequantization for grid-based i-quants
2024-08-11 14:45:41 -04:00
Georgi Gerganov
8cd1bcfd3f
flake.lock: Update ( #8979 )
2024-08-11 06:58:58 -07:00
Neo Zhang
a21c6fd450
update guide ( #8909 )
...
Co-authored-by: Neo Zhang <>
2024-08-11 14:07:43 +05:30
fairydreaming
33309f661a
llama : check all graph nodes when searching for result_embd_pooled ( #8956 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-11 10:35:26 +02:00
Markus Tavenrath
7c5bfd57f8
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. ( #8943 )
...
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging from the code, which either launches compute kernels or copies tensors, adding barriers only for shader reads/writes and transfers is sufficient (see the sketch after this entry).
* Fix small typo
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-08-11 10:09:09 +02:00
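To illustrate the barrier change described above: instead of a full pipeline synchronization, a single VkMemoryBarrier restricted to compute and transfer stages covers the hazards of a stream of kernel launches and tensor copies. A hedged sketch, not the backend's actual code:

```cpp
#include <vulkan/vulkan.h>

// Sketch: memory barrier covering only shader and transfer read/write
// hazards, rather than synchronizing the entire pipeline.
static void sync_shader_and_transfer(VkCommandBuffer cmd) {
    VkMemoryBarrier barrier = {};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT  | VK_ACCESS_TRANSFER_READ_BIT;

    const VkPipelineStageFlags stages =
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT;

    vkCmdPipelineBarrier(cmd, stages, stages, 0, 1, &barrier, 0, nullptr, 0, nullptr);
}
```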
slaren
6e02327e8b
metal : fix uninitialized abort_callback ( #8968 )
2024-08-10 15:42:10 +02:00
Xuan Son Nguyen
7eb23840ed
llama : default n_swa for phi-3 ( #8931 )
...
* default n_swa for phi-3
* fix
* double check swa
2024-08-10 13:04:40 +02:00
fairydreaming
7c3f55c100
Add support for encoder-only T5 models ( #8900 )
...
* gguf-py : add T5ENCODER model architecture
* common : call llama_decode() during warmup only if the model has decoder
* convert-hf : add T5EncoderModel
* llama : add llama_model_has_decoder() API function
* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
* llama : add support for LLM_ARCH_T5ENCODER
* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
* llama-embedding : add support for encoder-only models
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-10 11:43:26 +02:00
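The warmup change in the entry above boils down to gating llama_decode() on the new API. A minimal sketch using llama_model_has_encoder() and llama_model_has_decoder(), the latter added by this very change:

```cpp
#include "llama.h"

// Warmup that works for decoder-only, encoder-decoder, and encoder-only
// models: run each half only if the model actually has it.
static void warmup(llama_context * ctx, const llama_model * model, llama_batch batch) {
    if (llama_model_has_encoder(model)) {
        llama_encode(ctx, batch);
    }
    if (llama_model_has_decoder(model)) {
        llama_decode(ctx, batch);
    }
}
```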
Matteo Mortari
911b437f22
gguf-py : fix double call to add_architecture() ( #8952 )
...
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
2024-08-10 08:58:49 +03:00
Georgi Gerganov
b72942fac9
Merge commit from fork
2024-08-09 23:03:21 +03:00
fairydreaming
6afd1a99dc
llama : add support for lora adapters in T5 model ( #8938 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-09 18:53:09 +02:00
Georgi Gerganov
272e3bd95e
make : fix llava obj file race ( #8946 )
...
ggml-ci
2024-08-09 18:24:30 +03:00
Georgi Gerganov
45a55b91aa
llama : better replace_all (cont) ( #8926 )
...
* llama : better replace_all (cont)
ggml-ci
* code : deduplicate replace_all
ggml-ci
2024-08-09 18:23:52 +03:00
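The deduplicated helper is essentially the textbook loop; the key detail is advancing past each substitution so the function cannot loop forever when the replacement contains the needle. A sketch:

```cpp
#include <string>

// Replace every occurrence of `search` in `s` with `replace`; advancing
// `pos` by replace.length() avoids re-matching inside the replacement.
static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return;
    }
    size_t pos = 0;
    while ((pos = s.find(search, pos)) != std::string::npos) {
        s.replace(pos, search.length(), replace);
        pos += replace.length();
    }
}
```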
tc-mb
3071c0a5f2
llava : support MiniCPM-V-2.5 ( #7599 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when the image needs to be resized smaller
* address review comments
* address review comments
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove the directory added at line 33 of the top-level CMakeLists.txt (not the one in examples)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* move load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-09 13:33:53 +03:00
Georgi Gerganov
4305b57c80
sync : ggml
2024-08-09 10:03:48 +03:00
Matt Stephenson
70c0ea3560
whisper : use vulkan as gpu backend when available (whisper/2302)
...
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
---------
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-08-09 10:03:44 +03:00
Daniel Bevenius
5b2c04f492
embedding : add --pooling option to README.md [no ci] ( #8934 )
...
This commit adds the `--pooling` option to the README.md file in the
`examples/embedding` directory.
The motivation for adding this option is that currently, if the model
used does not specify a pooling type, the embedding example fails
with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples
section.
2024-08-09 09:33:30 +03:00
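With the option in place, a model lacking pooling metadata can be run by passing the pooling type explicitly; for example (model path hypothetical):

```console
./llama-embedding -m models/model.gguf --pooling mean -p "Hello World!"
```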
Daniel Bevenius
6f6496bb09
llama : fix typo in llama_tensor_get_type comment [no ci] ( #8937 )
2024-08-09 09:32:23 +03:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings ( #8936 )
2024-08-09 09:32:02 +03:00
compilade
345a686d82
llama : reduce useless copies when saving session ( #8916 )
...
* llama : avoid useless copies in dummy session writer
* llama : avoid double tensor copy when saving session to buffer
2024-08-08 23:54:00 -04:00
compilade
3a14e00366
gguf-py : simplify support for quant types ( #8838 )
...
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap
Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.
* gguf-py : add generic quantize and dequantize functions
The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
2024-08-08 13:33:09 -04:00
Georgi Gerganov
afd27f01fe
scripts : sync cann files ( #0 )
2024-08-08 14:56:52 +03:00
Georgi Gerganov
366d486c16
scripts : fix sync filenames ( #0 )
2024-08-08 14:40:12 +03:00
Georgi Gerganov
e44a561ab0
sync : ggml
2024-08-08 13:19:47 +03:00
Borislav Stanimirov
f93d49ab1e
ggml : ignore more msvc warnings (ggml/906)
2024-08-08 13:19:31 +03:00
Georgi Gerganov
5b33ea1ee7
metal : fix struct name (ggml/912)
...
ggml-ci
2024-08-08 13:19:31 +03:00
Conrad Kramer
85fca8deb6
metal : add abort callback (ggml/905)
2024-08-08 13:19:30 +03:00
Pablo Duboue
ebd541a570
make : clean llamafile objects ( #8923 )
...
`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
2024-08-08 11:44:51 +03:00
slaren
15fa07a5c5
make : use C compiler to build metal embed object ( #8899 )
...
* make : use C compiler to build metal embed object
* use rm + rmdir to avoid -r flag in rm
2024-08-07 18:24:05 +02:00
slaren
be55695eff
ggml-backend : fix async copy from CPU ( #8897 )
...
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
2024-08-07 13:29:02 +02:00
Ouadie EL FAROUKI
0478174d59
[SYCL] Updated SYCL device filtering ( #8901 )
...
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
2024-08-07 11:25:36 +01:00
Johannes Gäßler
a8dbc6f753
CUDA/HIP: fix tests/test-backend-ops ( #8896 )
2024-08-07 09:07:52 +02:00
Zhenwei Jin
506122d854
llama-bench : add support for getting cpu info on Windows ( #8824 )
...
* Add support for getting cpu info on Windows for llama_bench
* refactor
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-07 03:01:06 +02:00
Daniel Bevenius
725e3d9437
quantize : update usage comment in quantize.cpp ( #8889 )
...
This commit updates the usage comment in quantize.cpp to reflect the
new name of the executable, which is llama-quantize.
2024-08-07 01:43:00 +02:00
Nexes the Old
31958546c3
typo correction ( #8891 )
2024-08-07 01:41:54 +02:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) ( #8857 )
...
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
2024-08-06 17:33:39 +02:00
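For reference, the hotswap endpoint lets a client inspect and rescale loaded adapters at runtime without restarting the server; a usage sketch assuming the /lora-adapters route shape described in the server docs (host, port, and id/scale values illustrative):

```console
# list currently loaded adapters
curl http://localhost:8080/lora-adapters
# set adapter 0 to half strength
curl -X POST http://localhost:8080/lora-adapters \
     -H "Content-Type: application/json" \
     -d '[{"id": 0, "scale": 0.5}]'
```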
Johannes Gäßler
641f5dd2a6
CUDA: fix padding logic for FP16/FP32 ( #8884 )
2024-08-06 17:13:55 +02:00
Daniel Bevenius
5f4dcb1e60
simple : update name of executable to llama-simple ( #8885 )
...
This commit updates the name of the executable in README.md from
`simple` to `llama-simple`.
2024-08-06 16:44:35 +02:00