Srihari-mcw
806c5a4e5b
Remove additional snippets to fix CI/CD issues with the python constants.py script
2024-08-20 04:02:17 -07:00
Srihari-mcw
43c7be57c1
Add the BF16 delta data types in constants.py
2024-08-19 22:18:10 -07:00
Srihari-mcw
7927655e42
Update the type name in llama.cpp
2024-08-19 22:17:37 -07:00
Srihari-mcw
4a2f703fbb
Add changes to use a union data type for better conversion to the strong type - based on 5f2e011e2eed2f685521c707b3e74280fcb81dd3 from llamafile
2024-08-19 22:15:32 -07:00
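For context, the union-based conversion referenced in the commit above works by bit-casting the float through a union before rounding to bfloat16. A minimal sketch, assuming the same round-to-nearest-even and NaN-quieting behavior as ggml's BF16 helpers (names are illustrative, not the patch's own):

```cpp
#include <cstdint>

// Sketch: float -> bfloat16 via a union bit-cast, rounding to nearest even.
// Type punning through a union is the technique the commit refers to.
static inline uint16_t fp32_to_bf16(float s) {
    union { float f; uint32_t i; } u = { s };
    if ((u.i & 0x7fffffff) > 0x7f800000) {               // NaN: force quiet
        return (uint16_t) ((u.i >> 16) | 64);
    }
    return (uint16_t) ((u.i + (0x7fff + ((u.i >> 16) & 1))) >> 16);
}
```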
Srihari-mcw
ef693e99d7
Add the new data types across files
2024-08-12 05:58:11 -07:00
Srihari-mcw
f7ce132258
Add changes to fix compiler issues
2024-08-12 05:56:16 -07:00
Srihari-mcw
db6657eeaf
Fix more conflicts in quantize.cpp
2024-08-12 05:56:16 -07:00
Srihari-mcw
5a6a235ac7
Fix build issues in sgemm.cpp post rebase
2024-08-12 05:56:16 -07:00
Srihari-mcw
c480818d97
Fix issues with SSE3 version for vec_dot_q4_0_b16_q8_0_b16
2024-08-12 05:56:16 -07:00
Srihari-mcw
9e5174ce5d
Remove additional ifdef conditions
2024-08-12 05:56:16 -07:00
Srihari-mcw
983b03ab6a
Add additional comments
2024-08-12 05:56:16 -07:00
Srihari-mcw
e26fd70dce
Introduce Q4_0 and Q8_0 quantizations with BF16 delta values
2024-08-12 05:54:21 -07:00
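The gist of these new types: the standard Q4_0 block layout, but with the per-block delta stored as bfloat16 instead of FP16. A minimal sketch of the layout and scalar dequantization, assuming illustrative field names rather than the patch's exact ones:

```cpp
#include <cstdint>

#define QK4_0 32

typedef uint16_t bf16_t;                 // raw bfloat16 bits

struct block_q4_0_b16 {
    bf16_t  d;                           // per-block delta (scale), stored as BF16
    uint8_t qs[QK4_0 / 2];               // 32 x 4-bit quants, two per byte
};

// BF16 -> FP32 is a plain shift into the high half of the float
static inline float bf16_to_fp32(bf16_t h) {
    union { uint32_t i; float f; } u = { (uint32_t) h << 16 };
    return u.f;
}

// Scalar dequantization: x[j] = (quant - 8) * delta
static void dequantize_q4_0_b16(const block_q4_0_b16 * b, float * y) {
    const float d = bf16_to_fp32(b->d);
    for (int j = 0; j < QK4_0 / 2; ++j) {
        y[j]             = ((b->qs[j] & 0x0F) - 8) * d;  // low nibble
        y[j + QK4_0 / 2] = ((b->qs[j] >>   4) - 8) * d;  // high nibble
    }
}
```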
compilade
4134999e01
gguf-py : Numpy dequantization for most types ( #8939 )
...
* gguf-py : Numpy dequantization for most types
* gguf-py : Numpy dequantization for grid-based i-quants
2024-08-11 14:45:41 -04:00
Georgi Gerganov
8cd1bcfd3f
flake.lock: Update ( #8979 )
2024-08-11 06:58:58 -07:00
Neo Zhang
a21c6fd450
update guide ( #8909 )
...
Co-authored-by: Neo Zhang <>
2024-08-11 14:07:43 +05:30
fairydreaming
33309f661a
llama : check all graph nodes when searching for result_embd_pooled ( #8956 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-11 10:35:26 +02:00
Markus Tavenrath
7c5bfd57f8
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. ( #8943 )
...
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging from the code, which either launches compute kernels or copies tensors, adding barriers only for shader reads/writes and transfers is sufficient (see the sketch after this entry).
* Fix small typo
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-08-11 10:09:09 +02:00
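To illustrate the barrier change described above: instead of a full pipeline synchronization, a single VkMemoryBarrier restricted to compute and transfer stages covers the hazards of a stream of kernel launches and tensor copies. A hedged sketch, not the backend's actual code:

```cpp
#include <vulkan/vulkan.h>

// Sketch: memory barrier covering only shader and transfer read/write
// hazards, rather than synchronizing the entire pipeline.
static void sync_shader_and_transfer(VkCommandBuffer cmd) {
    VkMemoryBarrier barrier = {};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT  | VK_ACCESS_TRANSFER_READ_BIT;

    const VkPipelineStageFlags stages =
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT;

    vkCmdPipelineBarrier(cmd, stages, stages, 0, 1, &barrier, 0, nullptr, 0, nullptr);
}
```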
slaren
6e02327e8b
metal : fix uninitialized abort_callback ( #8968 )
2024-08-10 15:42:10 +02:00
Xuan Son Nguyen
7eb23840ed
llama : default n_swa for phi-3 ( #8931 )
...
* default n_swa for phi-3
* fix
* double check swa
2024-08-10 13:04:40 +02:00
fairydreaming
7c3f55c100
Add support for encoder-only T5 models ( #8900 )
...
* gguf-py : add T5ENCODER model architecture
* common : call llama_decode() during warmup only if the model has decoder
* convert-hf : add T5EncoderModel
* llama : add llama_model_has_decoder() API function
* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
* llama : add support for LLM_ARCH_T5ENCODER
* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
* llama-embedding : add support for encoder-only models
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-10 11:43:26 +02:00
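The warmup change in the entry above boils down to gating llama_decode() on the new API. A minimal sketch using llama_model_has_encoder() and llama_model_has_decoder(), the latter added by this very change:

```cpp
#include "llama.h"

// Warmup that works for decoder-only, encoder-decoder, and encoder-only
// models: run each half only if the model actually has it.
static void warmup(llama_context * ctx, const llama_model * model, llama_batch batch) {
    if (llama_model_has_encoder(model)) {
        llama_encode(ctx, batch);
    }
    if (llama_model_has_decoder(model)) {
        llama_decode(ctx, batch);
    }
}
```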
Matteo Mortari
911b437f22
gguf-py : fix double call to add_architecture() ( #8952 )
...
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
2024-08-10 08:58:49 +03:00
Georgi Gerganov
b72942fac9
Merge commit from fork
2024-08-09 23:03:21 +03:00
fairydreaming
6afd1a99dc
llama : add support for lora adapters in T5 model ( #8938 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-09 18:53:09 +02:00
Georgi Gerganov
272e3bd95e
make : fix llava obj file race ( #8946 )
...
ggml-ci
2024-08-09 18:24:30 +03:00
Georgi Gerganov
45a55b91aa
llama : better replace_all (cont) ( #8926 )
...
* llama : better replace_all (cont)
ggml-ci
* code : deduplicate replace_all
ggml-ci
2024-08-09 18:23:52 +03:00
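The deduplicated helper is essentially the textbook loop; the key detail is advancing past each substitution so the function cannot loop forever when the replacement contains the needle. A sketch:

```cpp
#include <string>

// Replace every occurrence of `search` in `s` with `replace`; advancing
// `pos` by replace.length() avoids re-matching inside the replacement.
static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return;
    }
    size_t pos = 0;
    while ((pos = s.find(search, pos)) != std::string::npos) {
        s.replace(pos, search.length(), replace);
        pos += replace.length();
    }
}
```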
tc-mb
3071c0a5f2
llava : support MiniCPM-V-2.5 ( #7599 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when the image needs to be resized smaller
* address review comments
* address review comments
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove the directory added at line 33 of the top-level CMakeLists.txt (not the one in examples)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* move load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-09 13:33:53 +03:00
Georgi Gerganov
4305b57c80
sync : ggml
2024-08-09 10:03:48 +03:00
Matt Stephenson
70c0ea3560
whisper : use vulkan as gpu backend when available (whisper/2302)
...
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
---------
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-08-09 10:03:44 +03:00
Daniel Bevenius
5b2c04f492
embedding : add --pooling option to README.md [no ci] ( #8934 )
...
This commit adds the `--pooling` option to the README.md file in the
`examples/embedding` directory.
The motivation for adding this option is that currently, if the model
used does not specify a pooling type, the embedding example fails
with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples
section.
2024-08-09 09:33:30 +03:00
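With the option in place, a model lacking pooling metadata can be run by passing the pooling type explicitly; for example (model path hypothetical):

```console
./llama-embedding -m models/model.gguf --pooling mean -p "Hello World!"
```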
Daniel Bevenius
6f6496bb09
llama : fix typo in llama_tensor_get_type comment [no ci] ( #8937 )
2024-08-09 09:32:23 +03:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings ( #8936 )
2024-08-09 09:32:02 +03:00
compilade
345a686d82
llama : reduce useless copies when saving session ( #8916 )
...
* llama : avoid useless copies in dummy session writer
* llama : avoid double tensor copy when saving session to buffer
2024-08-08 23:54:00 -04:00
compilade
3a14e00366
gguf-py : simplify support for quant types ( #8838 )
...
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap
Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.
* gguf-py : add generic quantize and dequantize functions
The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
2024-08-08 13:33:09 -04:00
Georgi Gerganov
afd27f01fe
scripts : sync cann files ( #0 )
2024-08-08 14:56:52 +03:00
Georgi Gerganov
366d486c16
scripts : fix sync filenames ( #0 )
2024-08-08 14:40:12 +03:00
Georgi Gerganov
e44a561ab0
sync : ggml
2024-08-08 13:19:47 +03:00
Borislav Stanimirov
f93d49ab1e
ggml : ignore more msvc warnings (ggml/906)
2024-08-08 13:19:31 +03:00
Georgi Gerganov
5b33ea1ee7
metal : fix struct name (ggml/912)
...
ggml-ci
2024-08-08 13:19:31 +03:00
Conrad Kramer
85fca8deb6
metal : add abort callback (ggml/905)
2024-08-08 13:19:30 +03:00
Pablo Duboue
ebd541a570
make : clean llamafile objects ( #8923 )
...
`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
2024-08-08 11:44:51 +03:00
slaren
15fa07a5c5
make : use C compiler to build metal embed object ( #8899 )
...
* make : use C compiler to build metal embed object
* use rm + rmdir to avoid -r flag in rm
2024-08-07 18:24:05 +02:00
slaren
be55695eff
ggml-backend : fix async copy from CPU ( #8897 )
...
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
2024-08-07 13:29:02 +02:00
Ouadie EL FAROUKI
0478174d59
[SYCL] Updated SYCL device filtering ( #8901 )
...
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
2024-08-07 11:25:36 +01:00
Johannes Gäßler
a8dbc6f753
CUDA/HIP: fix tests/test-backend-ops ( #8896 )
2024-08-07 09:07:52 +02:00
Zhenwei Jin
506122d854
llama-bench : add support for getting cpu info on Windows ( #8824 )
...
* Add support for getting cpu info on Windows for llama_bench
* refactor
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-07 03:01:06 +02:00
Daniel Bevenius
725e3d9437
quantize : update usage comment in quantize.cpp ( #8889 )
...
This commit updates the usage comment in quantize.cpp to reflect the
new name of the executable, which is llama-quantize.
2024-08-07 01:43:00 +02:00
Nexes the Old
31958546c3
typo correction ( #8891 )
2024-08-07 01:41:54 +02:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) ( #8857 )
...
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
2024-08-06 17:33:39 +02:00
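For reference, the hotswap endpoint lets a client inspect and rescale loaded adapters at runtime without restarting the server; a usage sketch assuming the /lora-adapters route shape described in the server docs (host, port, and id/scale values illustrative):

```console
# list currently loaded adapters
curl http://localhost:8080/lora-adapters
# set adapter 0 to half strength
curl -X POST http://localhost:8080/lora-adapters \
     -H "Content-Type: application/json" \
     -d '[{"id": 0, "scale": 0.5}]'
```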
Johannes Gäßler
641f5dd2a6
CUDA: fix padding logic for FP16/FP32 ( #8884 )
2024-08-06 17:13:55 +02:00
Daniel Bevenius
5f4dcb1e60
simple : update name of executable to llama-simple ( #8885 )
...
This commit updates the name of the executable in README.md from
`simple` to `llama-simple`.
2024-08-06 16:44:35 +02:00