Commit graph

1534 commits

Author SHA1 Message Date
KerfuffleV2
4814b4bbcd Promote add_X_token to GGUF metadata for BOS and EOS 2023-11-10 14:12:55 -07:00
Jared Van Bortel
f22b2f2045 cleanup 2023-11-10 14:46:57 -05:00
KerfuffleV2
9ce51b69b0 gguf-py: SpecialVocab: Always try available sources for special token ids
gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
2023-11-10 05:50:45 -07:00
KerfuffleV2
960f912a14 convert.py: We can't currently support Q8_0 on big endian. 2023-11-10 05:50:15 -07:00
KerfuffleV2
0b0e726b2d And include scripts/__init__.py, derp 2023-11-10 00:55:15 -07:00
KerfuffleV2
eff662d66e Set up gguf- scripts in pyproject.toml 2023-11-10 00:53:23 -07:00
Jared Van Bortel
a21e9e7126 fix python 3.8 compat 2023-11-09 21:23:42 -05:00
Jared Van Bortel
795dc0f048 constants : remove unneeded type annotations 2023-11-09 21:03:05 -05:00
Jared Van Bortel
5608cd8d89 cleanup 2023-11-09 20:59:59 -05:00
Kerfuffle
7d3580d5b1
Murder accidental tuple in gguf-py/scripts/gguf-dump.py
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 17:50:11 -07:00
KerfuffleV2
382f9751fd A few for gguf-dump.py cleanups 2023-11-09 17:08:44 -07:00
KerfuffleV2
bd241db879 Add JSON dumping support to gguf-dump.py
Which I kind of regret now
2023-11-09 16:56:27 -07:00
KerfuffleV2
a04f0487b0 Make GGUFReader endian detection less arbitrary 2023-11-09 16:55:58 -07:00
KerfuffleV2
52bdc7e946 Reorganize scripts 2023-11-09 14:52:44 -07:00
Jared Van Bortel
5738b2f3b6 gguf-py : bump minor version 2023-11-09 12:28:28 -05:00
Jared Van Bortel
233cb0741f cleanup 2023-11-09 12:11:41 -05:00
KerfuffleV2
bca0962575 Add convert-gguf-endian.py script 2023-11-09 08:35:35 -07:00
KerfuffleV2
cc58ad00b0 Merge branch 'master' into feat-gguf-py-read-refactor 2023-11-09 05:25:24 -07:00
Galunid
a75fa576ab
scripts: Generalize convert scripts (#3838)
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-09 11:09:29 +01:00
KerfuffleV2
0d0306e7df Include a gguf Python package version bump 2023-11-09 02:56:20 -07:00
KerfuffleV2
8e250fe527 Add more information to GGUFReader and examples comments 2023-11-09 02:52:42 -07:00
KerfuffleV2
2360aaadb4 Make examples executable, formatting changes 2023-11-09 00:25:20 -07:00
Kerfuffle
855486c912
Update gguf-py/gguf/gguf_reader.py type hint
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:22:00 -07:00
Kerfuffle
2af29ffeaa
Update gguf-py/examples/modify_gguf.py formatting
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:21:36 -07:00
Kerfuffle
4a5cd6924f
Clean up gguf-py/examples/modify_gguf.py whitespace
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-09 00:21:15 -07:00
Mihai
57ad015dc3
server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
2023-11-08 20:00:34 -06:00
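For context on the commit above: min-p sampling (introduced in the PR referenced by the commit message) keeps only tokens whose probability is at least `min_p` times the probability of the most likely token. A minimal Python sketch of the idea — names and structure are illustrative, not the actual `server.cpp` implementation:

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens whose probability is >= min_p * the top probability.

    probs: list of (token, probability) pairs.
    Illustrative sketch only, not the llama.cpp sampler code.
    """
    threshold = min_p * max(p for _, p in probs)
    return [(tok, p) for tok, p in probs if p >= threshold]
```

Unlike a fixed top-p cutoff, the threshold scales with the model's confidence: a sharply peaked distribution prunes more aggressively than a flat one.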
KerfuffleV2
b56ed66195 Damagage is not a word. 2023-11-08 09:11:22 -07:00
KerfuffleV2
fffdac32b5 Fix an issue with state init in GGUFReader
Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly
2023-11-08 09:03:47 -07:00
slaren
875fb42871
ggml-alloc : fix backend assignments of views (#3982) 2023-11-08 13:15:14 +01:00
Jared Van Bortel
f2292fcc19 fix NamedTuple and Enum usage 2023-11-07 21:12:26 -05:00
Jared Van Bortel
f364636b2e style cleanup with flake8 2023-11-07 21:06:41 -05:00
KerfuffleV2
ce865b3ce3 Fix missing return statement in add_tensor 2023-11-07 18:43:23 -07:00
Jared Van Bortel
a6f5742a53 sort imports with isort (again) 2023-11-07 20:28:53 -05:00
KerfuffleV2
d7688dc937 Various type annotation fixes. 2023-11-07 17:30:11 -07:00
KerfuffleV2
8047aa192f Replay changes from #3871
Credit to @cebtenzzre for that pull
2023-11-07 15:01:36 -07:00
KerfuffleV2
b8c80df741 gguf-py: Refactor and add file reading support 2023-11-07 14:41:58 -07:00
Jared Van Bortel
0a7c980b6f
gguf : track writer state, free unneeded tensors, cleanup (#3871) 2023-11-07 12:43:04 -05:00
Georgi Gerganov
413503d4b9
make : do not add linker flags when compiling static llava lib (#3977) 2023-11-07 20:25:32 +03:00
xaedes
e9c1cecb9d
ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope

rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.

the code for the backward process is nearly identically to the forward process:
the only difference is the sign of the sin-values.

to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:

for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sinus sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
2023-11-07 10:04:51 +02:00
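The reuse-the-forward-code idea in the commit above can be sketched abstractly: RoPE applies a 2D rotation by an angle θ to each pair of values, and the backward pass is the rotation by −θ, so negating the sin term is the only change needed. A hedged Python illustration of that single shared code path (not the actual `ggml_compute_forward_rope_f32` code):

```python
import math

def rope_rotate(x0, x1, theta, forward=True):
    """Rotate the pair (x0, x1) by theta.

    The backward pass reuses this exact code with the sin term negated,
    mirroring the commit's `bool forward` argument. Illustrative sketch only.
    """
    sin_t = math.sin(theta) if forward else -math.sin(theta)
    cos_t = math.cos(theta)
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)
```

Because rotation by −θ undoes rotation by θ, a forward call followed by a backward call with the same angle recovers the original pair, which is what keeps the two passes from drifting apart after changes like YaRN.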
Matthew Tejo
54b4df8886
Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings
from the cli. This switches to a generic function to load the params
from the cli options.
2023-11-07 10:43:59 +03:00
Meng Zhang
46876d2a2c
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* prototyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build

* doc: add comments to ggml_cublas_loaded()

* fix defined(...)
2023-11-07 08:49:08 +02:00
Damian Stewart
381efbf480
llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical depts on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-11-07 00:36:23 +03:00
slaren
2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat

ggml-ci

* silence common.cpp warning (bonus)
2023-11-05 18:45:16 +01:00
Kerfuffle
d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences

* Fix edge case when second hex digit is NUL
2023-11-05 10:06:06 -07:00
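The edge case named in the second bullet — a `\x` escape whose second hex digit is the string terminator — comes down to not reading past end-of-string when only one hex digit is present. A small parser sketch in Python showing the safe bound check (hypothetical, not the `common.cpp` code):

```python
def process_escapes(s):
    r"""Expand \x hex escapes, accepting one or two hex digits.

    The inner loop bounds-checks j before indexing, so "\x4" at the
    very end of the string is handled without reading past the end.
    Illustrative sketch only.
    """
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] == "x":
            j, hex_digits = i + 2, ""
            # consume at most two hex digits, stopping safely at end of string
            while j < len(s) and len(hex_digits) < 2 and s[j] in "0123456789abcdefABCDEF":
                hex_digits += s[j]
                j += 1
            if hex_digits:
                out.append(chr(int(hex_digits, 16)))
                i = j
                continue
        out.append(s[i])
        i += 1
    return "".join(out)
```

A buggy version that unconditionally reads two characters after `\x` would index past the terminator on input like `"a\x4"`; the length check above is the fix the commit describes.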
Thái Hoàng Tâm
bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958) 2023-11-05 18:15:27 +02:00
Jared Van Bortel
132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05 10:08:57 -05:00
Meng Zhang
3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05 14:40:08 +02:00
Eve
c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review suggestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
Eve
a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949) 2023-11-05 09:46:44 +02:00
slaren
48ade94538
cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de.

ggml-ci
2023-11-05 09:12:13 +02:00