Galunid
b7148838f5
Rename variable
2023-11-07 23:16:43 +01:00
Galunid
88b0d9effc
Review fixes
2023-11-07 23:14:58 +01:00
Galunid
73780f5939
Change ftype from int value to str value
2023-11-07 10:34:25 +01:00
Galunid
7a3433b4b6
store_true defaults to False, not None
2023-11-07 07:25:43 +01:00
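The `store_true` fix above reflects standard `argparse` behavior: a flag declared with `action="store_true"` defaults to `False` when omitted, not `None`, so `is None` checks on it never fire. A minimal sketch (the `--bigendian` flag name is just for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--bigendian", action="store_true")

# Flag omitted: argparse fills in False, not None.
args = parser.parse_args([])
assert args.bigendian is False
assert args.bigendian is not None

# Flag present: value becomes True.
args = parser.parse_args(["--bigendian"])
assert args.bigendian is True
```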
Jared Van Bortel
05fb6f4e8c
sort imports
2023-11-06 15:57:54 -05:00
Galunid
648252ecda
Fix flake8 complaints
2023-11-06 06:28:21 +01:00
Jared Van Bortel
fefc3db527
address review comments
2023-11-05 16:24:48 -05:00
Galunid
781bc54986
Move everything to convert-hf-to-gguf.py
2023-11-05 08:42:11 +01:00
Galunid
f7de892ee5
Move util to gguf-py/gguf
2023-11-05 00:43:56 +01:00
Galunid
087f88cc15
Rename convert-generic -> convert-hf-to-gguf
2023-11-05 00:37:00 +01:00
Galunid
2120195bb1
Yarn rope for baichuan
2023-11-04 23:15:41 +01:00
Galunid
e64f4de189
Revert "Remove 'old' conversion scripts" - needed for testing
This reverts commit f4b9a7ea02.
2023-11-04 23:10:39 +01:00
Galunid
fd30850576
Add big endian support
2023-11-04 23:01:38 +01:00
Galunid
03c9683eb7
Restore support for RWForCausalLM
2023-11-04 20:43:29 +01:00
cebtenzzre
007be85087
model.py : add missing future import
2023-11-02 12:08:44 -04:00
cebtenzzre
e9abcc9c7c
fix linter complaints
2023-11-02 00:06:32 -04:00
cebtenzzre
66ccd62102
sort imports
2023-11-01 23:26:28 -04:00
cebtenzzre
8f31dc54ec
fix mypy errors
2023-11-01 23:24:46 -04:00
Galunid
4fdd7cdf2b
Review fixes, persimmon fixes
2023-11-01 02:32:49 +01:00
Galunid
3ec89dcc69
Use 'IntEnum' instead of 'Enum'
2023-10-31 22:23:26 +01:00
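The switch above matters because plain `Enum` members do not compare equal to their integer values, while `IntEnum` members do, which is what you want when the value is written directly into a binary header. A minimal sketch (the class and member names are illustrative, not the repo's actual enums):

```python
from enum import Enum, IntEnum

class PlainFtype(Enum):
    F32 = 0
    F16 = 1

class IntFtype(IntEnum):
    F32 = 0
    F16 = 1

# A plain Enum member is its own type; it is not the int 1.
assert PlainFtype.F16 != 1

# An IntEnum member IS an int: it compares, hashes, and
# participates in arithmetic like one.
assert IntFtype.F16 == 1
assert IntFtype.F16 + 1 == 2
```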
Galunid
f4b9a7ea02
Remove 'old' conversion scripts
2023-10-31 16:27:06 +01:00
Galunid
235acc18cd
Small refactor
2023-10-31 16:23:53 +01:00
Galunid
c94df09732
Rework tokenizer handling
2023-10-31 16:11:08 +01:00
Galunid
b2ba44eab2
Flake8 fixes
2023-10-31 15:38:24 +01:00
Galunid
dc3115f2a3
Add another alias to n_layers
2023-10-31 04:20:51 +01:00
Galunid
0743f7a900
Fix variable
2023-10-31 03:52:52 +01:00
Galunid
b9c664ab2f
Woops
2023-10-31 03:42:55 +01:00
Galunid
6f6856c6ea
[Untested] Initial Persimmon support
2023-10-31 03:27:04 +01:00
Galunid
94ba1db24a
Add Starcoder and Refact
2023-10-31 03:12:25 +01:00
Galunid
0afa75a9a2
Add Falcon support
2023-10-31 02:57:37 +01:00
Galunid
3bb9844de9
Get rid of dumb print
2023-10-31 01:54:24 +01:00
Galunid
08918b700e
MPT conversion fix
2023-10-31 01:52:55 +01:00
Galunid
443f7d586e
Call add_tensor before write_* functions
2023-10-29 20:00:54 +01:00
Galunid
550b925af2
Missing variable
2023-10-29 02:06:41 +01:00
Galunid
989db34149
Missing variable
2023-10-29 02:05:28 +01:00
Galunid
8618b4e74c
Add [UNTESTED] Baichuan support
2023-10-29 01:38:35 +02:00
Galunid
0ff237105d
Make gguf_writer member of Model, rework tokenizer export
2023-10-29 00:33:05 +02:00
Galunid
22201248a0
Remove comments
2023-10-27 02:05:27 +02:00
Galunid
4823b9bdcb
Initial generic convert script
2023-10-26 15:43:19 +02:00
Georgi Gerganov
6961c4bd0b
batched-bench : print params at start
2023-10-25 10:26:27 +03:00
Georgi Gerganov
cc44877486
log : disable pid in log filenames
2023-10-25 10:09:16 +03:00
cebtenzzre
ad93962657
server : add parameter -tb N, --threads-batch N ( #3584 ) ( #3768 )
Co-authored-by: Michael Coppola <m18coppola@gmail.com>
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2023-10-24 23:10:43 +03:00
Georgi Gerganov
1717521cdb
server : do not block system prompt update ( #3767 )
* server : do not block system prompt update
* server : update state machine logic to process system prompts
* server : minor
2023-10-24 23:08:20 +03:00
Georgi Gerganov
b2f7e04bd3
sync : ggml (conv ops + cuda MSVC fixes) ( #3765 )
ggml-ci
2023-10-24 21:51:20 +03:00
John Smith
abd21fc99f
cmake : add missed dependencies ( #3763 )
2023-10-24 20:48:45 +03:00
Georgi Gerganov
2b4ea35e56
cuda : add batched cuBLAS GEMM for faster attention ( #3749 )
* cmake : add helper for faster CUDA builds
* batched : add NGL arg
* ggml : skip nops in compute_forward
* cuda : minor indentation
* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
* Apply suggestions from code review
These changes plus:
```c++
#define cublasGemmBatchedEx hipblasGemmBatchedEx
```
are needed to compile with ROCM. I haven't done performance testing, but it seems to work.
I couldn't figure out how to propose a change for lines outside what the pull changed, also this is the first time trying to create a multi-part review so please forgive me if I mess something up.
* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define
* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases
* cuda : reduce mallocs in cublasGemmBatchedEx branch
* cuda : add TODO for calling cublas from kernel + using mem pool
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-10-24 16:48:37 +03:00
Galunid
daab3d7f45
Add more tokenizer tests ( #3742 )
* Add more tokenizer tests
* Add starcoder
* Update test vocab files
* Restrict bpe tokenizer tests to unicode planes
* Update comment
* Comment cosmetics
* Remove bloom vocab/test
2023-10-24 09:17:17 +02:00
Georgi Gerganov
469c9addef
metal : handle ggml_scale for n%4 != 0 ( close #3754 )
ggml-ci
2023-10-24 09:47:22 +03:00
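The `ggml_scale` fix above is about element counts that are not a multiple of the vector width: process aligned groups of 4, then fall back to a scalar loop for the `n % 4` tail. The real fix lives in the Metal shader; this Python sketch only shows the general vectorized-body-plus-scalar-tail pattern (`scale_chunks` is a hypothetical helper):

```python
def scale_chunks(xs, s, chunk=4):
    """Scale xs by s, mimicking a vectorized kernel:
    full groups of `chunk` first, then the n % chunk remainder."""
    out = []
    n = len(xs)
    head = n - n % chunk            # largest multiple of chunk <= n
    for i in range(0, head, chunk): # "vectorized" body
        out.extend(x * s for x in xs[i:i + chunk])
    out.extend(x * s for x in xs[head:])  # scalar tail for n % chunk != 0
    return out

assert scale_chunks([1, 2, 3, 4, 5], 2) == [2, 4, 6, 8, 10]
```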
Georgi Gerganov
e3932593d4
Revert "make : add optional CUDA_NATIVE_ARCH ( #2482 )"
This reverts commit 96981f37b1.
See:
https://github.com/ggerganov/llama.cpp/pull/2482#issuecomment-1775975866
2023-10-23 23:46:05 +03:00
M. Yusuf Sarıgöz
9d02956443
issues : separate bug and enhancement template + no default title ( #3748 )
2023-10-23 22:57:16 +03:00