llama.cpp

Author	SHA1	Message	Date
xaedes	8c2d7e37f9	improve finetune time measurement fix printf warnings on systems where int64_t is (long int). change time datatypes to double because values get big with long training times. exclude file saving from time measurement. converge faster to actual time per iteration by removing very small first duration before first iteration was performed. fix bug in output of total training time, the reported value was 1000 times too small.	2023-09-06 18:06:24 +02:00
xaedes	867e7c2255	Merge branch 'master' into finetune-lora	2023-09-05 14:48:46 +02:00
Georgi Gerganov	d375b8f3aa	ggml : fix L-BFGS linesearch loop	2023-09-05 12:05:13 +03:00
Georgi Gerganov	786e786061	build : fix compile warnings	2023-09-05 12:02:19 +03:00
Kawrakow	d59bd97065	Guard against all weights in a super-block being zero (#3010 ) * Guard against all weights in a super-block being zero * Also guard against extremely small weights Closes #2982 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-09-05 09:55:33 +02:00
Georgi Gerganov	35938ee3b0	llama : update logic for number of threads when using BLAS	2023-09-05 10:46:39 +03:00
Georgi Gerganov	921772104b	speculative : add grammar support (#2991 ) * speculative : add grammar support * grammars : add json_arr.gbnf * grammar : add comments to new grammar file * grammar : remove one nested level * common : warm-up with 2 tokens - seems to work better * speculative : print draft token pieces * speculative : reuse grammar parser + better logs and comments * speculative : avoid grammar_mem * make : fix speculative build	2023-09-05 08:46:17 +03:00
xaedes	d07b6aac77	fix tracking of train_samples and train_tokens	2023-09-05 02:18:17 +02:00
xaedes	c1c3b0e0c2	add gradient accumulation specify number accumulation steps with '--grad-acc N'. this will simulate a bigger batch size of grad_acc*batch.	2023-09-05 01:09:06 +02:00
Georgi Gerganov	2ba85c8609	py : minor	2023-09-04 22:50:50 +03:00
xaedes	d3afd7131e	Merge branch 'master' into finetune-lora # Conflicts: # Makefile	2023-09-04 21:44:05 +02:00
Georgi Gerganov	e36ecdccc8	build : on Mac OS enable Metal by default (#2901 ) * build : on Mac OS enable Metal by default * make : try to fix build on Linux * make : move targets back to the top * make : fix target clean * llama : enable GPU inference by default with Metal * llama : fix vocab_only logic when GPU is enabled * common : better `n_gpu_layers` assignment * readme : update Metal instructions * make : fix merge conflict remnants * gitignore : metal	2023-09-04 22:26:24 +03:00
slaren	bd33e5ab92	ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994 )	2023-09-04 14:59:52 +02:00
Cebtenzzre	3103568144	llama-bench : make cpp file non-executable (#2999 )	2023-09-04 13:40:18 +03:00
Leng Yue	5b8530d88c	make : add speculative example (#3003 )	2023-09-04 13:39:57 +03:00
Aarni Koskela	e4386f417f	server : add a subtle loading animation to the edit box (#2466 ) * editorconfig: add override for the server HTML (which already is 2-space indented) * server: add a subtle loading animation to the edit box	2023-09-04 16:28:55 +08:00
Jiahao Li	35195689cd	2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985 ) * 2x faster (rms) norm cuda kernels * Fix code style	2023-09-04 08:53:30 +02:00
xaedes	9ea2f7ff58	Merge branch 'master' into finetune-lora # Conflicts: # ggml-alloc.c	2023-09-04 02:40:44 +02:00
slaren	cf9b08485c	ggml-alloc : use virtual memory for measurement (#2973 ) * ggml-alloc : use virtual memory for measurement * compatibility fixes for MAP_ANONYMOUS * fallback to fixed address for systems without virtual memory	2023-09-03 20:34:09 +02:00
xaedes	50589ed6be	load default rms_norm and rope parameters from base model	2023-09-03 20:05:54 +02:00
xaedes	bdb7092e82	add missing gguf_free in load_checkpoint_lora_file	2023-09-03 20:04:03 +02:00
xaedes	e07f5c57bb	fix printf format warnings	2023-09-03 20:03:39 +02:00
xaedes	406e0750cc	update README.md	2023-09-03 19:25:18 +02:00
Georgi Gerganov	47068e5170	speculative : PoC for speeding-up inference via speculative sampling (#2926 ) * speculative : initial example * speculative : print encoding speed * speculative : add --draft CLI arg	2023-09-03 15:12:08 +03:00
Georgi Gerganov	8f429fa511	perplexity : fix ETA by warming up the model with an empty run	2023-09-03 13:43:17 +03:00
Kerfuffle	6519e9c99c	gguf(python): Fix special vocab handling when id < 0 (#2984 )	2023-09-03 04:38:43 -06:00
Georgi Gerganov	b7f2aa9e51	metal : restore `363f0bf` and fix reduce in F16_F32 kernels (#2986 )	2023-09-03 13:23:33 +03:00
Alon	73a12a6344	cov : disable comment in PRs (#2989 )	2023-09-03 13:19:01 +03:00
opparco	3730134776	llama : fix bpe tokenize from byte (#2889 )	2023-09-03 13:18:09 +03:00
Georgi Gerganov	d9151e6f57	metal : revert `6af0bab` until we fix it This restores the generated text to be the same as before #2959	2023-09-03 12:40:56 +03:00
Alon	afc43d5f82	cov : add Code Coverage and codecov.io integration (#2928 ) * update .gitignore * makefile: add coverage support (lcov, gcovr) * add code-coverage workflow * update code coverage workflow * run on ubuntu 20.04 * use gcc-8 * check why the job hang * add env vars * add LLAMA_CODE_COVERAGE=1 again * - add CODECOV_TOKEN - add missing make lcov-report * install lcov * update make file -pb flag * remove unused GGML_NITER from workflows * wrap coverage output files in COV_TARGETS	2023-09-03 11:48:49 +03:00
Wentai Zhang	6460f758db	opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955 ) Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>	2023-09-03 11:46:44 +03:00
Kawrakow	ca82cf7bac	metal : more optimizations (#2959 ) * Very minor speedup via simd-group synchronization in f16 x f32 * Another very minor speedup on metal * Quite significant PP speedup on metal * Another attempt * Minor * Massive improvement for TG for fp16 * ~4-5% improvement for Q8_0 TG on metal --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-03 11:06:22 +03:00
kchro3	6a31a3bd98	swift : add support for k-quants (#2983 )	2023-09-03 09:21:05 +03:00
Kerfuffle	cff7b0bf07	convert.py : BPE fixes (#2938 ) * convert.py: BPE fixes? * Remove unnecessary conditional in addl token error handling	2023-09-03 08:52:13 +03:00
Ido S	340af42f09	docs : add `catai` to `README.md` (#2967 )	2023-09-03 08:50:51 +03:00
momonga	c42f0ec6b3	examples : fix gpt-neox (#2943 ) Co-authored-by: mmnga <mmnga1mmnga@gmail.com>	2023-09-03 08:36:28 +03:00
kchro3	2753415afd	swift : add missing c file to Package.swift (#2978 )	2023-09-03 08:27:25 +03:00
Cebtenzzre	bc054af97a	make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886 ) * make : remove unused -DGGML_BIG_ENDIAN * make : put preprocessor stuff in CPPFLAGS * make : pass Raspberry Pi arch flags to g++ as well * make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS * make : fix inverted conditional	2023-09-03 08:26:59 +03:00
xaedes	80ac697df9	move measurement memory segment to upper region of the address space	2023-09-02 21:44:20 +02:00
xaedes	2d2bdc0df7	remove unnecessary "0x" before "%p" output	2023-09-02 21:28:08 +02:00
xaedes	1ce7023eed	revert last commit "bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue" "alloc was freeing an externally allocated tensor, because it calculated the end of allocator memory as alloc->data + alloc->max_size instead of alloc->data + alloc->size." This is intentional to reduce the risk of freeing external tensors when measuring. Unless max_size is not properly calculated, I don't see why this is an issue.	2023-09-02 21:27:12 +02:00
xaedes	8d982c8fd9	bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue	2023-09-02 20:53:14 +02:00
xaedes	ded6382961	add some more allocator debug prints	2023-09-02 20:52:25 +02:00
Kerfuffle	3358c381f6	logging: Fix creating empty file even when disabled (#2966 ) * logging: Fix creating empty file even when disabled * Minor formatting fix Co-authored-by: staviq <staviq@gmail.com> --------- Co-authored-by: staviq <staviq@gmail.com>	2023-09-02 11:53:55 -06:00
xaedes	cfe217f1ca	fix README.md	2023-09-02 16:11:31 +02:00
xaedes	6ee12b158b	increase measured alloc size by tensor_alignment ggml_allocr_reset will reduce the given size by up to tensor_alignment-1	2023-09-02 15:59:14 +02:00
bandoti	52315a4216	readme : update clblast instructions (#2903 ) * Update Windows CLBlast instructions * Update Windows CLBlast instructions * Remove trailing whitespace	2023-09-02 15:53:18 +03:00
Karsten Weiss	8b56b4f2c3	metal : show all Metal device instances in the system (#2952 ) * ggml_metal_init: Show all Metal device instances in the system Also show the default Metal device that was picked. * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-02 15:29:09 +03:00
Jhen-Jie Hong	21f3d1be86	k-quants : fix build on armv7 (android only) (#2920 ) * k-quants : fix build on armv7 * ggml : cleanup unused arm32 specific impl * k-quants : avoid some unused vzero / mzero define * ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm	2023-09-02 15:23:45 +03:00

1352 commits
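
Commit c1c3b0e0c2 in the list above says that '--grad-acc N' "will simulate a bigger batch size of grad_acc*batch". The following is a minimal Python sketch of that idea, not the finetune implementation itself: the toy model (y = w * x), the mean-squared-error loss, the data, and the helper names grad_mse and accumulated_grad are all assumptions made for illustration. It checks that averaging the gradients of grad_acc equally sized micro-batches gives the same gradient as one batch of size grad_acc*batch.

```python
def grad_mse(w, batch):
    """Gradient w.r.t. w of the mean squared error of y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, batch_size, grad_acc):
    """Average the gradients of grad_acc consecutive micro-batches of batch_size samples."""
    total = 0.0
    for step in range(grad_acc):
        micro = data[step * batch_size:(step + 1) * batch_size]
        total += grad_mse(w, micro)  # accumulate instead of updating w right away
    return total / grad_acc


# Hypothetical toy data: (x, y) pairs.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 1.5

big = grad_mse(w, data)                                    # one batch of 4 samples
acc = accumulated_grad(w, data, batch_size=2, grad_acc=2)  # 2 micro-batches of 2 samples

print(big, acc)
```

The equivalence holds here because the loss is a per-sample mean and the micro-batches have equal size; the memory cost stays that of a single micro-batch, while the optimizer step is taken only once per grad_acc micro-batches.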