Commit graph

1521 commits

Author SHA1 Message Date
KerfuffleV2
52bdc7e946 Reorganize scripts 2023-11-09 14:52:44 -07:00
Jared Van Bortel
5738b2f3b6 gguf-py : bump minor version 2023-11-09 12:28:28 -05:00
Jared Van Bortel
233cb0741f cleanup 2023-11-09 12:11:41 -05:00
KerfuffleV2
bca0962575 Add convert-gguf-endian.py script 2023-11-09 08:35:35 -07:00
KerfuffleV2
cc58ad00b0 Merge branch 'master' into feat-gguf-py-read-refactor 2023-11-09 05:25:24 -07:00
Galunid
a75fa576ab
scripts: Generalize convert scripts (#3838)
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-09 11:09:29 +01:00
KerfuffleV2
0d0306e7df Include a gguf Python package version bump 2023-11-09 02:56:20 -07:00
KerfuffleV2
8e250fe527 Add more information to GGUFReader and examples comments 2023-11-09 02:52:42 -07:00
KerfuffleV2
2360aaadb4 Make examples executable, formatting changes 2023-11-09 00:25:20 -07:00
Kerfuffle
855486c912
Update gguf-py/gguf/gguf_reader.py type hint
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:22:00 -07:00
Kerfuffle
2af29ffeaa
Update gguf-py/examples/modify_gguf.py formatting
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:21:36 -07:00
Kerfuffle
4a5cd6924f
Clean up gguf-py/examples/modify_gguf.py whitespace
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:21:15 -07:00
Mihai
57ad015dc3
server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
2023-11-08 20:00:34 -06:00
KerfuffleV2
b56ed66195 Damagage is not a word. 2023-11-08 09:11:22 -07:00
KerfuffleV2
fffdac32b5 Fix an issue with state init in GGUFReader
Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly
2023-11-08 09:03:47 -07:00
slaren
875fb42871
ggml-alloc : fix backend assignments of views (#3982) 2023-11-08 13:15:14 +01:00
Jared Van Bortel
f2292fcc19 fix NamedTuple and Enum usage 2023-11-07 21:12:26 -05:00
Jared Van Bortel
f364636b2e style cleanup with flake8 2023-11-07 21:06:41 -05:00
KerfuffleV2
ce865b3ce3 Fix missing return statement in add_tensor 2023-11-07 18:43:23 -07:00
Jared Van Bortel
a6f5742a53 sort imports with isort (again) 2023-11-07 20:28:53 -05:00
KerfuffleV2
d7688dc937 Various type annotation fixes. 2023-11-07 17:30:11 -07:00
KerfuffleV2
8047aa192f Replay changes from #3871
Credit to @cebtenzzre for that pull
2023-11-07 15:01:36 -07:00
KerfuffleV2
b8c80df741 gguf-py: Refactor and add file reading support 2023-11-07 14:41:58 -07:00
Jared Van Bortel
0a7c980b6f
gguf : track writer state, free unneeded tensors, cleanup (#3871) 2023-11-07 12:43:04 -05:00
Georgi Gerganov
413503d4b9
make : do not add linker flags when compiling static llava lib (#3977) 2023-11-07 20:25:32 +03:00
xaedes
e9c1cecb9d
ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope

rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.

the code for the backward process is nearly identically to the forward process:
the only difference is the sign of the sin-values.

to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:

for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sinus sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
2023-11-07 10:04:51 +02:00
Matthew Tejo
54b4df8886
Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings
from the cli. This switches to a generic function to load the params
from the cli options.
2023-11-07 10:43:59 +03:00
Meng Zhang
46876d2a2c
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* protyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build

* doc: add comments to ggml_cublas_loaded()

* fix defined(...)
2023-11-07 08:49:08 +02:00
Damian Stewart
381efbf480
llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical depts on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-11-07 00:36:23 +03:00
slaren
2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat

ggml-ci

* silence common.cpp warning (bonus)
2023-11-05 18:45:16 +01:00
Kerfuffle
d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences

* Fix edge case when second hex digit is NUL
2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm
bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958) 2023-11-05 18:15:27 +02:00
Jared Van Bortel
132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05 10:08:57 -05:00
Meng Zhang
3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05 14:40:08 +02:00
Eve
c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review sugguestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
Eve
a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949) 2023-11-05 09:46:44 +02:00
slaren
48ade94538
cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de.

ggml-ci
2023-11-05 09:12:13 +02:00
Kerfuffle
f28af0d81a
gguf-py: Support 01.AI Yi models (#3943) 2023-11-04 16:20:34 -06:00
Peter Sugihara
d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) 2023-11-03 21:18:18 +02:00
Xiao-Yong Jin
5ba3746171
ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
slaren
abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) 2023-11-03 12:13:09 +01:00
Georgi Gerganov
8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov
05816027d6
common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre
3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
Kerfuffle
629f917cd6
cuda : add ROCM aliases for CUDA pool stuff (#3918) 2023-11-02 21:58:22 +02:00
Andrei
51b2fc11f7
cmake : fix relative path to git submodule index (#3915) 2023-11-02 21:40:31 +02:00
Georgi Gerganov
224e7d5b14
readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Georgi Gerganov
c7743fe1c1
cuda : fix const ptrs warning causing ROCm build issues (#3913) 2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko
d6069051de
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using cuda memory pools for async alloc/dealloc.

* If cuda device doesnt support memory pool than use old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
Georgi Gerganov
4ff1046d75
gguf : print error for GGUFv1 files (#3908) 2023-11-02 16:22:30 +02:00