Commit graph

1464 commits

Author SHA1 Message Date
xaedes
9139fec7ff
fix code formating of long function declarations 2023-09-16 20:38:23 +02:00
xaedes
8d82d4c8e6
remove static from process_escape since we need it exposed in header 2023-09-16 20:37:56 +02:00
xaedes
7930caf24c
fix usage of llama_tokenize 2023-09-16 20:36:43 +02:00
xaedes
d3e06d3e73
Merge branch 'master' into finetune-lora
# Conflicts:
#	Makefile
#	examples/baby-llama/baby-llama.cpp
#	examples/train-text-from-scratch/train-text-from-scratch.cpp
#	llama.cpp
2023-09-16 20:31:58 +02:00
xaedes
571dc94da9
increase train_samples by used_samples instead of number of batches
on batch can contain more than one sample when option "fill_with_next_samples" is used
2023-09-16 20:23:05 +02:00
xaedes
48d3509190
save and load head_count_kv in lora checkpoints 2023-09-16 20:20:23 +02:00
IsaacDynamo
b541b4f0b1
Enable BUILD_SHARED_LIBS=ON on all Windows builds (#3215) 2023-09-16 19:35:25 +02:00
xaedes
7aa9ea7f20
fix consume_common_train_arg 2023-09-16 19:08:51 +02:00
xaedes
bef1e97875
move common opt_callback into common/train 2023-09-16 18:54:57 +02:00
xaedes
e9758ae1d2
move common train params into common/train 2023-09-16 18:45:59 +02:00
xaedes
ee27333b16
move train data saving code into callback to unify code of opt_callback
train_params are still different in finetune and train-text-from-scratch, so it can't yet be moved to train.h|cpp
2023-09-16 17:50:16 +02:00
xaedes
a8c8907c62
move train state into struct train_state 2023-09-16 17:30:38 +02:00
Vlad
5dbc2b3213
Enable build with CUDA 11.0 (make) (#3132)
* CUDA 11.0 fixes

* Cleaner CUDA/host flags separation

Also renamed GGML_ASSUME into GGML_CUDA_ASSUME
2023-09-16 16:55:43 +02:00
xaedes
9f4b1bf88d
move common train functions into common/train.[h|cpp] 2023-09-16 16:17:13 +02:00
xaedes
00b656f6db
remove lbfgs related train parameters 2023-09-16 15:59:46 +02:00
goerch
b08e75baea
Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (#3170)
* Fix für #2721

* Reenable tokenizer test for LLaMa

* Add `console.cpp` dependency

* Fix dependency to `common`

* Fixing wrong fix.

* Make console usage platform specific

Work on compiler warnings.

* Adapting makefile

* Remove trailing whitespace

* Adapting the other parts of the makefile

* Fix typo.

* Fixing the last deviations from sentencepiece indicated by test-tokenizer-1

* Simplify logic

* Add missing change...

* Fix ugly compiler warning

* llama_tokenize should accept strings containing NUL now

* Adding huichen's test case
2023-09-16 13:41:33 +02:00
xaedes
ab56b63b27
update train-text-from-scratch with tokenization, sample selection and shuffling from finetune 2023-09-15 23:45:54 +02:00
xaedes
cc60b3f639
remove outcommented old code 2023-09-15 23:45:05 +02:00
xaedes
4f2ce91b9e
add static keywords 2023-09-15 23:44:53 +02:00
Cebtenzzre
e6616cf0db
examples : add compiler version and target to build info (#2998) 2023-09-15 16:59:49 -04:00
Cebtenzzre
3aefaab9e5
check C++ code with -Wmissing-declarations (#3184) 2023-09-15 15:38:27 -04:00
Cebtenzzre
69eb67e282
fix build numbers by setting fetch-depth=0 (#3197) 2023-09-15 15:18:15 -04:00
Meng Zhang
4fe09dfe66
llama : add support for StarCoder model architectures (#3187)
* add placeholder of starcoder in gguf / llama.cpp

* support convert starcoder weights to gguf

* convert MQA to MHA

* fix ffn_down name

* add LLM_ARCH_STARCODER to llama.cpp

* set head_count_kv = 1

* load starcoder weight

* add max_position_embeddings

* set n_positions to max_positioin_embeddings

* properly load all starcoder params

* fix head count kv

* fix comments

* fix vram calculation for starcoder

* store mqa directly

* add input embeddings handling

* add TBD

* working in cpu, metal buggy

* cleanup useless code

* metal : fix out-of-bounds access in soft_max kernels

* llama : make starcoder graph build more consistent with others

* refactor: cleanup comments a bit

* add other starcoder models: 3B, 7B, 15B

* support-mqa-directly

* fix: remove max_position_embeddings, use n_train_ctx

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix: switch to space from tab

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-15 22:02:13 +03:00
Cebtenzzre
80291a1d02
common : do not use GNU zero-length __VA_ARGS__ extension (#3195) 2023-09-15 21:02:01 +03:00
Georgi Gerganov
c6f1491da0
metal : fix bug in soft_max kernels (out-of-bounds access) (#3194) 2023-09-15 20:17:24 +03:00
Cebtenzzre
e3d87a6c36
convert : make ftype optional in simple scripts (#3185) 2023-09-15 12:29:02 -04:00
Georgi Gerganov
8c00b7a6ff
sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192)
* sync : ggml (Metal F32 support + reduce ggml-alloc size)

ggml-ci

* llama-bench : fix ggml_cpu_has_metal() duplicate function

ggml-ci
2023-09-15 19:06:03 +03:00
Engininja2
7e50d34be6
cmake : fix building shared libs for clang (rocm) on windows (#3176) 2023-09-15 15:24:30 +03:00
Evgeny Kurnevsky
235f7c193b
flake : use pkg-config instead of pkgconfig (#3188)
pkgconfig is an alias, it got removed from nixpkgs:
295a5e1e2b/pkgs/top-level/aliases.nix (L1408)
2023-09-15 11:10:22 +03:00
Georgi Gerganov
a51b687657
metal : relax conditions on fast matrix multiplication kernel (#3168)
* metal : relax conditions on fast matrix multiplication kernel

* metal : revert the concurrnecy change because it was wrong

* llama : remove experimental stuff
2023-09-15 11:09:24 +03:00
Andrei
76164fe2e6
cmake : fix llama.h location when built outside of root directory (#3179) 2023-09-15 11:07:40 +03:00
Ali Tariq
c2ab6fe661
ci : Cloud-V for RISC-V builds (#3160)
* Added Cloud-V File

* Replaced Makefile with original one

---------

Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
2023-09-15 11:06:56 +03:00
Roland
2d770505a8
llama : remove mtest (#3177)
* Remove mtest

* remove from common/common.h and examples/main/main.cpp
2023-09-15 10:28:45 +03:00
Cebtenzzre
98311c4277
llama : make quantize example up to 2.7x faster (#3115) 2023-09-14 21:09:53 -04:00
xaedes
76804fab1d
exclude some more known zero values from computations in flash_attn_f32 & flash_attn_back_f32 2023-09-14 22:19:39 +02:00
jneem
feea179e9f
flake : allow $out/include to already exist (#3175) 2023-09-14 21:54:47 +03:00
xaedes
d88dae2980
block tiling for out-prod inspired by mul-mat
block sizes are empirically optimized

roughly doubles the flops of out-prod
2023-09-14 19:50:02 +02:00
Andrei
769266a543
cmake : compile ggml-rocm with -fpic when building shared library (#3158) 2023-09-14 20:38:16 +03:00
Asbjørn Olling
cf8238e7f4
flake : include llama.h in nix output (#3159) 2023-09-14 20:25:00 +03:00
Cebtenzzre
4b8560e72a
make : fix clang++ detection, move some definitions to CPPFLAGS (#3155)
* make : fix clang++ detection

* make : fix compiler definitions outside of CPPFLAGS
2023-09-14 20:22:47 +03:00
Alon
83a53b753a
CI: add FreeBSD & simplify CUDA windows (#3053)
* add freebsd to ci

* bump actions/checkout to v3
* bump cuda 12.1.0 -> 12.2.0
* bump Jimver/cuda-toolkit version

* unify and simplify "Copy and pack Cuda runtime"
* install only necessary cuda sub packages
2023-09-14 19:21:25 +02:00
akawrykow
5c872dbca2
falcon : use stated vocab size (#2914) 2023-09-14 20:19:42 +03:00
bandoti
990a5e226a
cmake : add relocatable Llama package (#2960)
* Keep static libs and headers with install

* Add logic to generate Config package

* Use proper build info

* Add llama as import library

* Prefix target with package name

* Add example project using CMake package

* Update README

* Update README

* Remove trailing whitespace
2023-09-14 20:04:40 +03:00
dylan
980ab41afb
docker : add gpu image CI builds (#3103)
Enables the GPU enabled container images to be built and pushed
alongside the CPU containers.

Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
2023-09-14 19:47:00 +03:00
Kerfuffle
e394084166
gguf-py : support identity operation in TensorNameMap (#3095)
Make try_suffixes keyword param optional.
2023-09-14 19:32:26 +03:00
jameswu2014
4c8643dd6e
feature : support Baichuan serial models (#3009) 2023-09-14 12:32:10 -04:00
xaedes
0971fee710
reshuffle original sample order instead of the previous shuffled order
otherwise resumed reshuffle will not result in same sample order
2023-09-14 18:21:23 +02:00
Leng Yue
35f73049af
speculative : add heuristic algorithm (#3006)
* Add heuristic algo for speculative

* Constrain minimum n_draft to 2

* speculative : improve heuristic impl

* speculative : be more rewarding upon guessing max drafted tokens

* speculative : fix typos

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-14 19:14:44 +03:00
xaedes
3a9c1d7f5a
set lora_alpha to value of lora_r if it is not set via command line
otherwise only changing lora_r will change scaling of lora adapter used in prediction
2023-09-14 17:58:31 +02:00
xaedes
20cf1a4589
use unrolled vec_mad in out_prod
y is vec_mad result vec.
x is vec_mad input vec.
v is vec_mad input scalar.

ggml_vec_mad_f32_unroll will internally loop over x and v with same y.

GGML_VEC_MAD_UNROLL is by default defined to 32.

This value is empirical optimized using performance test runs of out-prod in openllama-3b finetune with 256 context length and batch size 1. It gives 23% performance boost for out_prod.

Full measurements of out-prod runtime in ms:
	unroll_xv	unroll_yv
1	67014.643	87826.469
2	77117.552	89077.656
4	72091.311	109121.657
8	61077.543	88678.334
16	56914.67	79514.947
24	59024.595	84350.254
28	55952.446	83368.73
32	51476.658	85177.745
36	55973.792	84659.92
40	55139.616	93844.738
48	60736.392	93330.267
64	99856.878	116994.99

Second column is when unrollying yv instead of xv
2023-09-14 17:20:29 +02:00