Commit graph

2203 commits

Author SHA1 Message Date
root
a3cf7bf60f removed redundant else from cli argument processing of --numa 2024-02-15 22:21:09 +00:00
bmwl
a5c9a5d52e
Merge branch 'ggerganov:master' into master 2024-02-15 09:40:16 -08:00
bmwl
7d1f026a58
Update ggml.c
simplified return for platforms without NUMA support

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-15 09:39:32 -08:00
Douglas Hanley
4524290e87
Use correct type of pooling for embedding models (#5500)
Use correct type of pooling for embedding models
2024-02-15 12:21:49 -05:00
Georgi Gerganov
c06e45d729
clip : fix wrong loop condition 2024-02-15 18:49:08 +02:00
slaren
9060a1e9df
cuda : print message when initialization fails (#5512)
* cuda : print message when initialization fails

* use CUDA_NAME both times
2024-02-15 16:49:01 +01:00
root
da652113f1 unified ggml_numa_strategy enum and fixed text alignment in server.cpp example 2024-02-15 15:35:32 +00:00
bmwl
5de34f568a
Merge branch 'ggerganov:master' into master 2024-02-15 07:18:20 -08:00
bmwl
377b58ffe0
Update common/common.cpp
Remove whitespace and align brace

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-15 07:15:05 -08:00
bmwl
c847828c89
Update examples/server/server.cpp
remove whitespace and align brace

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-15 07:13:41 -08:00
bmwl
1585fec58b
Update ggml.c
align paremeters

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-15 07:13:00 -08:00
bmwl
4ffe18ee5e
Update ggml.c
Remove whitespace

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-15 07:12:24 -08:00
bmwl
dc828c4556
Update ggml.h
Align enum values

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-15 07:11:28 -08:00
Georgi Gerganov
9350a1cf21
scripts : add hf.sh helper script (#5501)
* scripts : add hf.sh helper scripts

* hf : add error logs

* hf : add support for --repo and --file
2024-02-15 15:41:15 +02:00
Michaël de Vries
73122473ff
fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487)
* fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false

* fix(gguf-py): added missing cls and mask token ids to the gguf metadata
2024-02-15 14:14:37 +01:00
Elbios
0d4177126b
llava : fix memory management bug (#5491)
* Fix memory management in llava and server code

Fixes this error:

llama_new_context_with_model: graph splits (measure): 3
Available slots:
 -> Slot 0 - max context: 6000
{"timestamp":1707926446,"level":"INFO","function":"main","line":2623,"message":"model loaded"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
slot 0 - encoding image [id: 1]
munmap_chunk(): invalid pointer
Aborted

* Make it cleaner by checking size in batch free wrapper
2024-02-15 10:01:57 +02:00
John
7930a8a6e8
llaba : hotfix for llava-1.6 image number (#5495)
Co-authored-by: John <cmt-nct@users.noreply.github.com>
2024-02-15 09:59:18 +02:00
Neuman Vong
704359e299
vulkan: Find optimal memory type but with fallback (#5381)
* @0cc4m feedback

* More feedback @0cc4m
2024-02-15 07:11:15 +01:00
root
e237527feb Added #ifdefs for non-Linux OS that don't have cpu_set_t datatype 2024-02-14 19:37:10 +00:00
root
7fb5427813 Fix up some boolean vs enum comparisons 2024-02-14 17:14:26 +00:00
bmwl
a47bb697cb
Merge branch 'ggerganov:master' into master 2024-02-14 08:53:54 -08:00
Rune
594fca3fef
readme : fix typo (#5490)
executabhle -> executable
2024-02-14 17:15:49 +02:00
John
ccbb277f46
llava : update README.md (#5489)
* Update README.md

* Update README.md

* Update examples/llava/README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-14 16:49:42 +02:00
bmwl
c590bceaef
Merge branch 'ggerganov:master' into master 2024-02-14 01:46:33 -08:00
root
0fb40ae755 split numa init out from llama_backend_init and created llama_numa_init. Updated all code paths and samples 2024-02-14 09:46:06 +00:00
Michael Podvitskiy
8084d55440
cmake : ARM intrinsics detection for MSVC (#5401) 2024-02-14 10:49:01 +02:00
John
aa23412989
llava : support v1.6 (#5267)
* Create llava-survery-v2.py

* Update convert-image-encoder-to-gguf.py

* Update convert-image-encoder-to-gguf.py

* Rename llava-survery-v2.py to llava-surgery-v2.py

* Update convert-image-encoder-to-gguf.py

will now search for projector

* Update convert-image-encoder-to-gguf.py

whoops

* Update llava-surgery-v2.py

* Clip: Bugfix for normalization (it did not loat the 3 std and mean values)
Clip: bicubic resize function
Clip: added save-to-bmp/pil for debugging and conversion from/to 32/8 images
Clip: added normalization with FP16 precision simulation (image tensors match HF implementation, can be switched off, only used for llava-1.6)
Clip: added newline tensor, mergetype kv, image-grid kv, new resize-pad function with resolution from gridpoints
Clip: clip_image_preprocess now returns a float * vector instead of float, this way llava 1.5 and 1.6 is supported
llava: added ggml cpu graph for embedding patching, added spatial_unpad preliminary support, added a lot of comments that need to be cleaned when all is final
convert-image-encoder: fixed image-grid flattening

* whitespace corrections

* ws

* Tensors are now properly permuted.
Before the embeddings were inserted 1:1, now they are split into the 24x24 patches as in reference.

* ws

* added verbose_prompt support into cli
added stopwords for llava-1.6 into cli

* moved llava functions to llava.cpp, made clip.h C compatible API, replaced vector style functions with pointers, added a debug define to remove functions from compilation while not needed

* ws

* convert : skip unknown tensors (need for LLaVA)

* llava : update readme

* llava : fix compile warnings

* llava : style

* convert : add --skip-unknown CLI arg

* server : remove clip structs

* bugfix for non llava-1.6

It should now work with llava-1.5 as well

* clip : minor code rearrange

* llava : update readme a bit

---------

Co-authored-by: John <cmt-nct@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-14 09:38:35 +02:00
bmwl
0e05042b45
Merge branch 'ggerganov:master' into master 2024-02-13 20:34:16 -08:00
AT
f5ca054855
Early return for zero size calls to get_tensor. (#5482)
* Early return for zero size calls to get_tensor.

Signed-off-by: Adam Treat <treat.adam@gmail.com>

* Update ggml-kompute.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml-kompute.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Add an early return to the get/set tensor when the size is null.

Signed-off-by: Adam Treat <treat.adam@gmail.com>

* Early return after the assertions.

Signed-off-by: Adam Treat <treat.adam@gmail.com>

* Since we do the early return in the generic backend now no reason to do so here as well.

Signed-off-by: Adam Treat <treat.adam@gmail.com>

---------

Signed-off-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-13 22:44:25 +01:00
John
6c00a06692
gguf : add python reader example (#5216)
* Update CMakeLists.txt

* Create reader.py

* Update reader.py

* Update reader.py

another whitespace :|

* Update reader.py

* lintlintlint
2024-02-13 19:56:38 +02:00
Jared Van Bortel
ea9c8e1143
llama : add support for Nomic Embed (#5468) 2024-02-13 12:03:53 -05:00
Aarni Koskela
c4e6dd59e4
llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478)
* common : don't crash if newline token is not found

* common : llama_byte_to_token: allow falling back to finding just the token byte in SPM vocabs
2024-02-13 18:18:16 +02:00
Aarni Koskela
037259be68
llama : make load error reporting more granular (#5477)
Makes it easier to pinpoint where e.g. `unordered_map::at: key not found` comes from.
2024-02-13 15:24:50 +02:00
Daniel Bevenius
263978904c
finetune : rename feed-forward tensors (w1/w2/w3) (#4839)
* finetune: rename feed-forward tensors (w1/w2/w3)

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be inline with the names
used in the llama_layer struct in llama.cpp.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* train-text-from-scratch: rename ff tensors

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be inline with the names
used in the llama_layer struct in llama.cpp

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-13 15:15:42 +02:00
Georgi Gerganov
cf45252a7c
tests : multi-thread the tokenizer tests (#5474)
* tests : multi-thread the tokenizer tests

ggml-ci

* unicode : fix data race for unidentified codepoints

ggml-ci

* unicode : minor style fixes

ggml-ci
2024-02-13 15:14:22 +02:00
Douglas Hanley
03bf161eb6
llama : support batched embeddings (#5466)
* batched embedding: pool outputs by sequence id. updated embedding example

* bring back non-causal attention

* embd : minor improvements

* llama : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-13 14:06:58 +02:00
Johannes Gäßler
ad014bba97
make: add error message for bad CUDA version (#5444)
* make: add error message for bad CUDA version

* Update Makefile

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-13 12:38:37 +01:00
Georgi Gerganov
49cc1f7d67
bert : add tests + fix quantization (#5475)
* llama : do not quantize pos embd and token type tensors

* ci : add BERT tests

ggml-ci

* ci : do not do BERT tests on low-perf nodes

ggml-ci
2024-02-13 13:01:29 +02:00
Georgi Gerganov
99b8b43d7b
tests : disable moe test (#5473) 2024-02-13 11:20:24 +02:00
bmwl
e37b8f022d
Merge branch 'ggerganov:master' into master 2024-02-12 23:17:56 -08:00
Kawrakow
895407f31b
ggml-quants : fix compiler warnings (shadow variable) (#5472)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-13 09:07:57 +02:00
root
9d42825c3f Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges 2024-02-13 06:47:40 +00:00
root
5a942099e5 Merge branch 'master' of https://github.com/bmtwl/llama.cpp 2024-02-13 04:39:48 +00:00
bmwl
87f8d9e5e0
Merge branch 'ggerganov:master' into master 2024-02-12 20:39:41 -08:00
Georgi Gerganov
099afc6274
llama : fix quantization when tensors are missing (#5423) 2024-02-12 20:14:39 +02:00
Georgi Gerganov
df334a1125
swift : package no longer use ggml dependency (#5465)
* Revert "swift : update Package.swift to use ggml as dependency (#4691)"

This reverts commit ece9a45e8f.

* spm : add ggml headers
2024-02-12 19:54:29 +02:00
Lee
dbd8828eb0
py : fix persimmon n_rot conversion (#5460)
* convert : fix persimmon offical weight conversion to write correct n_rot.

* Update convert-persimmon-to-gguf.py

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-12 19:29:57 +02:00
Abhilash Majumder
43fe07c1a4
ggml-sycl: Replace 3d ops with macro (#5458)
* use macro

* use macro

* fix format
2024-02-12 20:22:05 +05:30
Daniel Bevenius
4a46d2b792
llava : remove prog parameter from ArgumentParser (#5457)
* llava: remove prog parameter from ArgumentParser

This commit removes the `prog` parameter from `ArgumentParser`
so that it uses the default value which is the name of the script.

The motivation for this change is that currently the usage output looks
like this:
```console
$ python examples/llava/convert-image-encoder-to-gguf.py --help
usage: convert_hf_to_gguf.py [-h] ...
```
And with this change it will look like this:
```console
$ python examples/llava/convert-image-encoder-to-gguf.py --help
usage: convert-image-encoder-to-gguf.py [-h] ...
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* ci: add W503 to flake8 ignore list

This commit adds W503 to the ignore list for flake8. This is done to
avoid the following error:
W503 line break before binary operator

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-12 10:38:44 +02:00
Georgi Gerganov
3b169441df
sync : ggml (#5452)
* ggml-alloc : v3 (ggml/727)

* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci

* sync : ggml

* update llama.cpp, clip.cpp, export-lora.cpp

* update finetune.cpp, train-text-from-scratch.cpp

ggml-ci

* ggml-backend : reduce alignment to 32 to match gguf and fix mmap

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-12 09:16:06 +02:00