xaedes
de6170d818
fix gradient accumulation bug where the same batch was used for each microstep
2023-09-06 21:35:21 +02:00
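A minimal sketch of the fixed behavior described above, with illustrative names (`n_batch`, `n_grad_acc`, `sample` are not the finetune code's identifiers): the position in the training data must advance on every microstep, not once per optimizer step.

```cpp
#include <cstdio>

int main() {
    const int n_batch    = 4;  // examples per microstep
    const int n_grad_acc = 3;  // microsteps per optimizer step
    int sample = 0;            // position in the training data

    for (int step = 0; step < 2; ++step) {
        for (int micro = 0; micro < n_grad_acc; ++micro) {
            // the bug reused the same slice for every microstep; the fix
            // advances `sample` so each microstep sees fresh examples
            printf("step %d, micro %d -> examples [%d, %d)\n",
                   step, micro, sample, sample + n_batch);
            sample += n_batch;
        }
    }
    return 0;
}
```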
xaedes
0393116628
Merge branch 'master' into finetune-lora
# Conflicts:
# common/common.cpp
2023-09-06 20:15:24 +02:00
xaedes
c08fcf5947
specify default lora rank with '--lora-r N'
'--lora-r N' sets the default rank for all tensors.
'--rank-wq N', etc. override this default rank for specific tensor types.
2023-09-06 20:11:49 +02:00
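A hedged sketch of the described precedence (the helper and the negative-means-unset convention are illustrative, not the actual option-parsing code):

```cpp
#include <cstdio>

// a per-tensor rank flag wins over the global '--lora-r' default
static int resolve_rank(int rank_override, int rank_default) {
    return rank_override >= 0 ? rank_override : rank_default;  // -1 = flag not given
}

int main() {
    const int lora_r  = 8;   // from '--lora-r 8'
    const int rank_wq = 16;  // from '--rank-wq 16'
    printf("wq rank: %d\n", resolve_rank(rank_wq, lora_r));  // 16 (override)
    printf("wk rank: %d\n", resolve_rank(-1, lora_r));       // 8  (default)
    return 0;
}
```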
xaedes
8c2d7e37f9
improve finetune time measurement
fix printf warnings on systems where int64_t is (long int).
change time datatypes to double because values get big with long training times.
exclude file saving from the time measurement.
converge faster to the actual time per iteration by discarding the very small duration recorded before the first iteration.
fix bug in the output of total training time: the reported value was 1000 times too small.
2023-09-06 18:06:24 +02:00
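A sketch of the measurement strategy above (helper names are illustrative): keep wall-clock sums in double, which stays precise over long runs, and leave file saving outside the timed region.

```cpp
#include <chrono>
#include <cstdio>

static double now_seconds() {
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now().time_since_epoch()).count();
}

int main() {
    const int n_iters = 3;
    double total = 0.0;
    for (int iter = 0; iter < n_iters; ++iter) {
        const double t0 = now_seconds();
        // ... forward/backward/optimizer work ...
        total += now_seconds() - t0;
        // checkpoint saving would go here, outside the timed region
    }
    printf("avg time per iteration: %.6f s\n", total / n_iters);
    return 0;
}
```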
Georgi Gerganov
178b1850eb
k-quants : fix zero-weight guard in Q6_K (ref #3040 )
2023-09-06 12:40:57 +03:00
Kerfuffle
ea2c85d5d2
convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 ( #3023 )
* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3
* Better error messages for files that cannot be converted
* Add file type to GGUF output
* Rename to convert-llama-ggml-to-gguf.py
* Include original file type information in description
* Improve some informational output
2023-09-06 02:49:11 -06:00
Cebtenzzre
9912b9efc8
build : add LLAMA_METAL_NDEBUG flag ( #3033 )
2023-09-05 18:21:10 -04:00
Cebtenzzre
9e2023156e
make : use new flag variables for recent changes ( #3019 )
2023-09-05 15:12:00 -04:00
Cebtenzzre
de2fe892af
examples : replace fprintf to stdout with printf ( #3017 )
2023-09-05 15:10:27 -04:00
Erik Scholz
c9c3220c48
convert: fix convert.py not working with int filename_stem ( #3028 )
* fix implicit int to string conversion
* convert : remove an obsolete pyright comment
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-05 19:41:00 +02:00
xaedes
867e7c2255
Merge branch 'master' into finetune-lora
2023-09-05 14:48:46 +02:00
Georgi Gerganov
d375b8f3aa
ggml : fix L-BFGS linesearch loop
2023-09-05 12:05:13 +03:00
Georgi Gerganov
786e786061
build : fix compile warnings
2023-09-05 12:02:19 +03:00
Kawrakow
d59bd97065
Guard against all weights in a super-block being zero ( #3010 )
* Guard against all weights in a super-block being zero
* Also guard against extremely small weights
Closes #2982
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-05 09:55:33 +02:00
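The guard pattern above, sketched in scalar form (the threshold constant is illustrative; the real code lives in the k-quants quantizers): find the block's max magnitude first and bail out before any division by a ~0 scale.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const float eps = 1e-9f;  // "extremely small" cutoff (illustrative value)
    const float w[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    float amax = 0.0f;
    for (float x : w) amax = std::fmax(amax, std::fabs(x));

    if (amax < eps) {
        printf("all-zero block: emit a zero scale instead of dividing by ~0\n");
    } else {
        printf("inverse scale = %g\n", 1.0f / amax);
    }
    return 0;
}
```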
Georgi Gerganov
35938ee3b0
llama : update logic for number of threads when using BLAS
2023-09-05 10:46:39 +03:00
Georgi Gerganov
921772104b
speculative : add grammar support ( #2991 )
* speculative : add grammar support
* grammars : add json_arr.gbnf
* grammar : add comments to new grammar file
* grammar : remove one nested level
* common : warm-up with 2 tokens - seems to work better
* speculative : print draft token pieces
* speculative : reuse grammar parser + better logs and comments
* speculative : avoid grammar_mem
* make : fix speculative build
2023-09-05 08:46:17 +03:00
xaedes
d07b6aac77
fix tracking of train_samples and train_tokens
2023-09-05 02:18:17 +02:00
xaedes
c1c3b0e0c2
add gradient accumulation
specify the number of gradient accumulation steps with '--grad-acc N'.
this will simulate a bigger batch size of grad_acc*batch.
2023-09-05 01:09:06 +02:00
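A hedged sketch of the mechanism just described (a scalar stands in for the gradient tensors; whether the accumulated sum is averaged is a design choice, shown here so the learning rate keeps its per-example meaning):

```cpp
#include <cstdio>

int main() {
    const int n_grad_acc = 4;  // from '--grad-acc 4'
    float grad = 0.0f;

    for (int micro = 0; micro < n_grad_acc; ++micro) {
        const float micro_grad = 0.25f;  // stand-in for one forward/backward pass
        grad += micro_grad;              // accumulate instead of updating weights
    }
    grad /= n_grad_acc;                  // average over microsteps
    printf("one optimizer step with grad = %f (acts like batch grad_acc*batch)\n",
           grad);
    return 0;
}
```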
Georgi Gerganov
2ba85c8609
py : minor
2023-09-04 22:50:50 +03:00
xaedes
d3afd7131e
Merge branch 'master' into finetune-lora
# Conflicts:
# Makefile
2023-09-04 21:44:05 +02:00
Georgi Gerganov
e36ecdccc8
build : on Mac OS enable Metal by default ( #2901 )
* build : on Mac OS enable Metal by default
* make : try to fix build on Linux
* make : move targets back to the top
* make : fix target clean
* llama : enable GPU inference by default with Metal
* llama : fix vocab_only logic when GPU is enabled
* common : better `n_gpu_layers` assignment
* readme : update Metal instructions
* make : fix merge conflict remnants
* gitignore : metal
2023-09-04 22:26:24 +03:00
slaren
bd33e5ab92
ggml-opencl : store GPU buffer in ggml_tensor::extra ( #2994 )
2023-09-04 14:59:52 +02:00
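A sketch of the pattern in that title (all names besides `extra` are stand-ins, not the real ggml/OpenCL structs): the tensor carries an opaque `extra` pointer that a backend uses to attach per-tensor device state.

```cpp
#include <cstdio>

struct tensor_like { void * extra; };         // ggml_tensor has such a void * slot
struct cl_extra    { int device_buffer_id; }; // stand-in for a cl_mem handle

int main() {
    cl_extra gpu = { 42 };
    tensor_like t = { &gpu };             // upload: stash the GPU-side handle
    cl_extra * e = (cl_extra *) t.extra;  // op execution: recover it
    printf("device buffer id: %d\n", e->device_buffer_id);
    return 0;
}
```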
Cebtenzzre
3103568144
llama-bench : make cpp file non-executable ( #2999 )
2023-09-04 13:40:18 +03:00
Leng Yue
5b8530d88c
make : add speculative example ( #3003 )
2023-09-04 13:39:57 +03:00
Aarni Koskela
e4386f417f
server : add a subtle loading animation to the edit box ( #2466 )
* editorconfig: add override for the server HTML (which is already 2-space indented)
* server: add a subtle loading animation to the edit box
2023-09-04 16:28:55 +08:00
Jiahao Li
35195689cd
2x faster (rms) norm cuda kernels (3.7% e2e improvement) ( #2985 )
* 2x faster (rms) norm cuda kernels
* Fix code style
2023-09-04 08:53:30 +02:00
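For orientation, a scalar reference of what those kernels compute: RMS norm scales each row by 1/sqrt(mean(x^2) + eps); the CUDA versions parallelize the same reduction.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const float eps = 1e-5f;  // illustrative epsilon
    const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    float ss = 0.0f;
    for (float v : x) ss += v * v;                      // the reduction the kernel speeds up
    const float scale = 1.0f / std::sqrt(ss / 4 + eps); // 1/RMS

    for (float v : x) printf("%f ", v * scale);
    printf("\n");
    return 0;
}
```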
xaedes
9ea2f7ff58
Merge branch 'master' into finetune-lora
# Conflicts:
# ggml-alloc.c
2023-09-04 02:40:44 +02:00
slaren
cf9b08485c
ggml-alloc : use virtual memory for measurement ( #2973 )
* ggml-alloc : use virtual memory for measurement
* compatibility fixes for MAP_ANONYMOUS
* fallback to fixed address for systems without virtual memory
2023-09-03 20:34:09 +02:00
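The POSIX side of the idea, sketched: reserve address space with PROT_NONE so nothing is committed, mirroring the measurement-allocator trick and the MAP_ANONYMOUS compatibility note above.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  // older BSD-style systems spell it MAP_ANON
#endif

int main() {
    const size_t size = 1ull << 36;  // 64 GiB of address space, no RAM committed
    void * base = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        printf("mmap failed: fall back to a fixed address range\n");
        return 1;
    }
    printf("reserved %zu bytes at %p\n", size, base);
    munmap(base, size);
    return 0;
}
```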
xaedes
50589ed6be
load default rms_norm and rope parameters from base model
2023-09-03 20:05:54 +02:00
xaedes
bdb7092e82
add missing gguf_free in load_checkpoint_lora_file
2023-09-03 20:04:03 +02:00
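The leak class, sketched against the gguf API of this era (header location and param fields as recalled; treat them as assumptions): every successful gguf_init_from_file() needs a matching gguf_free() on every exit path.

```cpp
#include "ggml.h"  // the gguf_* declarations lived here at this point (assumption)

bool load_checkpoint(const char * fname, struct ggml_context ** ggml_ctx) {
    struct gguf_init_params params = { /*.no_alloc =*/ false, /*.ctx =*/ ggml_ctx };
    struct gguf_context * fctx = gguf_init_from_file(fname, params);
    if (fctx == NULL) {
        return false;  // nothing to free yet
    }
    // ... read metadata and tensors; bail-out paths must also free ...
    gguf_free(fctx);   // the kind of call this commit adds on a path that lacked it
    return true;
}
```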
xaedes
e07f5c57bb
fix printf format warnings
2023-09-03 20:03:39 +02:00
xaedes
406e0750cc
update README.md
2023-09-03 19:25:18 +02:00
Georgi Gerganov
47068e5170
speculative : PoC for speeding-up inference via speculative sampling ( #2926 )
* speculative : initial example
* speculative : print encoding speed
* speculative : add --draft CLI arg
2023-09-03 15:12:08 +03:00
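A conceptual sketch only (token values are made up): the draft model proposes several tokens, the target model scores them in one batched pass, and the longest agreeing prefix is accepted.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const std::vector<int> draft  = {12, 7, 42};  // proposed by the cheap draft model
    const std::vector<int> target = {12, 7, 99};  // what the target model would emit

    size_t accepted = 0;
    while (accepted < draft.size() && draft[accepted] == target[accepted]) {
        ++accepted;  // each accepted draft token skips a full target decode step
    }
    if (accepted < target.size()) {
        printf("accepted %zu draft tokens, then emit target token %d\n",
               accepted, target[accepted]);  // first disagreement comes from the target
    } else {
        printf("accepted all %zu draft tokens\n", accepted);
    }
    return 0;
}
```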
Georgi Gerganov
8f429fa511
perplexity : fix ETA by warming up the model with an empty run
2023-09-03 13:43:17 +03:00
Kerfuffle
6519e9c99c
gguf(python): Fix special vocab handling when id < 0 ( #2984 )
2023-09-03 04:38:43 -06:00
Georgi Gerganov
b7f2aa9e51
metal : restore 363f0bf and fix reduce in F16_F32 kernels ( #2986 )
2023-09-03 13:23:33 +03:00
Alon
73a12a6344
cov : disable comment in PRs ( #2989 )
2023-09-03 13:19:01 +03:00
opparco
3730134776
llama : fix bpe tokenize from byte ( #2889 )
2023-09-03 13:18:09 +03:00
Georgi Gerganov
d9151e6f57
metal : revert 6af0bab until we fix it
This restores the generated text to be the same as before #2959
2023-09-03 12:40:56 +03:00
Alon
afc43d5f82
cov : add Code Coverage and codecov.io integration ( #2928 )
* update .gitignore
* makefile: add coverage support (lcov, gcovr)
* add code-coverage workflow
* update code coverage workflow
* run on ubuntu 20.04
* use gcc-8
* check why the job hangs
* add env vars
* add LLAMA_CODE_COVERAGE=1 again
* add CODECOV_TOKEN
* add missing make lcov-report
* install lcov
* update Makefile -pb flag
* remove unused GGML_NITER from workflows
* wrap coverage output files in COV_TARGETS
2023-09-03 11:48:49 +03:00
Wentai Zhang
6460f758db
opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() ( #2955 )
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Kawrakow
ca82cf7bac
metal : more optimizations ( #2959 )
* Very minor speedup via simd-group synchronization in f16 x f32
* Another very minor speedup on metal
* Quite significant PP speedup on metal
* Another attempt
* Minor
* Massive improvement for TG for fp16
* ~4-5% improvement for Q8_0 TG on metal
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03 11:06:22 +03:00
kchro3
6a31a3bd98
swift : add support for k-quants ( #2983 )
2023-09-03 09:21:05 +03:00
Kerfuffle
cff7b0bf07
convert.py : BPE fixes ( #2938 )
* convert.py: BPE fixes?
* Remove unnecessary conditional in addl token error handling
2023-09-03 08:52:13 +03:00
Ido S
340af42f09
docs : add catai to README.md ( #2967 )
2023-09-03 08:50:51 +03:00
momonga
c42f0ec6b3
examples : fix gpt-neox ( #2943 )
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-09-03 08:36:28 +03:00
kchro3
2753415afd
swift : add missing c file to Package.swift ( #2978 )
2023-09-03 08:27:25 +03:00
Cebtenzzre
bc054af97a
make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS ( #2886 )
* make : remove unused -DGGML_BIG_ENDIAN
* make : put preprocessor stuff in CPPFLAGS
* make : pass Raspberry Pi arch flags to g++ as well
* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS
* make : fix inverted conditional
2023-09-03 08:26:59 +03:00
xaedes
80ac697df9
move measurement memory segment to upper region of the address space
2023-09-02 21:44:20 +02:00
xaedes
2d2bdc0df7
remove unnecessary "0x" before "%p" output
2023-09-02 21:28:08 +02:00
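Context for that cleanup: the output of %p is implementation-defined, and common C libraries already include a 0x prefix, so a literal "0x%p" prints a doubled "0x0x...".

```cpp
#include <cstdio>

int main() {
    int x = 0;
    printf("%p\n", (void *) &x);  // e.g. "0x7ffd..." on glibc; no manual "0x" needed
    return 0;
}
```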