llama.cpp

Author	SHA1	Message	Date
Kerfuffle	6519e9c99c	gguf(python): Fix special vocab handling when id < 0 (#2984)	2023-09-03 04:38:43 -06:00
Georgi Gerganov	b7f2aa9e51	metal : restore `363f0bf` and fix reduce in F16_F32 kernels (#2986)	2023-09-03 13:23:33 +03:00
Alon	73a12a6344	cov : disable comment in PRs (#2989)	2023-09-03 13:19:01 +03:00
opparco	3730134776	llama : fix bpe tokenize from byte (#2889)	2023-09-03 13:18:09 +03:00
Georgi Gerganov	d9151e6f57	metal : revert `6af0bab` until we fix it This restores the generated text to be the same as before #2959	2023-09-03 12:40:56 +03:00
Alon	afc43d5f82	cov : add Code Coverage and codecov.io integration (#2928) * update .gitignore * makefile: add coverage support (lcov, gcovr) * add code-coverage workflow * update code coverage workflow * run on ubuntu 20.04 * use gcc-8 * check why the job hang * add env vars * add LLAMA_CODE_COVERAGE=1 again * - add CODECOV_TOKEN - add missing make lcov-report * install lcov * update make file -pb flag * remove unused GGML_NITER from workflows * wrap coverage output files in COV_TARGETS	2023-09-03 11:48:49 +03:00
Wentai Zhang	6460f758db	opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>	2023-09-03 11:46:44 +03:00
Kawrakow	ca82cf7bac	metal : more optimizations (#2959) * Very minor speedup via simd-group synchronization in f16 x f32 * Another very minor speedup on metal * Quite significant PP speedup on metal * Another attempt * Minor * Massive improvement for TG for fp16 * ~4-5% improvement for Q8_0 TG on metal --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-03 11:06:22 +03:00
kchro3	6a31a3bd98	swift : add support for k-quants (#2983)	2023-09-03 09:21:05 +03:00
Kerfuffle	cff7b0bf07	convert.py : BPE fixes (#2938) * convert.py: BPE fixes? * Remove unnecessary conditional in addl token error handling	2023-09-03 08:52:13 +03:00
Ido S	340af42f09	docs : add `catai` to `README.md` (#2967)	2023-09-03 08:50:51 +03:00
momonga	c42f0ec6b3	examples : fix gpt-neox (#2943) Co-authored-by: mmnga <mmnga1mmnga@gmail.com>	2023-09-03 08:36:28 +03:00
kchro3	2753415afd	swift : add missing c file to Package.swift (#2978)	2023-09-03 08:27:25 +03:00
Cebtenzzre	bc054af97a	make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886) * make : remove unused -DGGML_BIG_ENDIAN * make : put preprocessor stuff in CPPFLAGS * make : pass Raspberry Pi arch flags to g++ as well * make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS * make : fix inverted conditional	2023-09-03 08:26:59 +03:00
xaedes	80ac697df9	move measurement memory segment to upper region of the address space	2023-09-02 21:44:20 +02:00
xaedes	2d2bdc0df7	remove unnecessary "0x" before "%p" output	2023-09-02 21:28:08 +02:00
xaedes	1ce7023eed	revert last commit "bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue" "alloc was freeing an externally allocated tensor, because it calculated the end of allocator memory as alloc->data + alloc->max_size instead of alloc->data + alloc->size." This is intentional to reduce the risk of freeing external tensors when measuring. Unless max_size is not properly calculated, I don't see why this is an issue.	2023-09-02 21:27:12 +02:00
xaedes	8d982c8fd9	bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue	2023-09-02 20:53:14 +02:00
xaedes	ded6382961	add some more allocator debug prints	2023-09-02 20:52:25 +02:00
Kerfuffle	3358c381f6	logging: Fix creating empty file even when disabled (#2966) * logging: Fix creating empty file even when disabled * Minor formatting fix Co-authored-by: staviq <staviq@gmail.com> --------- Co-authored-by: staviq <staviq@gmail.com>	2023-09-02 11:53:55 -06:00
xaedes	cfe217f1ca	fix README.md	2023-09-02 16:11:31 +02:00
xaedes	6ee12b158b	increase measured alloc size by tensor_alignment ggml_allocr_reset will reduce the given size by up to tensor_alignment-1	2023-09-02 15:59:14 +02:00
bandoti	52315a4216	readme : update clblast instructions (#2903) * Update Windows CLBlast instructions * Update Windows CLBlast instructions * Remove trailing whitespace	2023-09-02 15:53:18 +03:00
Karsten Weiss	8b56b4f2c3	metal : show all Metal device instances in the system (#2952) * ggml_metal_init: Show all Metal device instances in the system Also show the default Metal device that was picked. * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-02 15:29:09 +03:00
Jhen-Jie Hong	21f3d1be86	k-quants : fix build on armv7 (android only) (#2920) * k-quants : fix build on armv7 * ggml : cleanup unused arm32 specific impl * k-quants : avoid some unused vzero / mzero define * ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm	2023-09-02 15:23:45 +03:00
Jhen-Jie Hong	571083f508	server : avoid antiprompt in probabilities of final response (#2849)	2023-09-02 08:31:46 +08:00
Engininja2	f04d002844	cuda : vsubss4 for older versions of ROCm/clang (#2942)	2023-09-01 23:33:19 +02:00
xaedes	c32ad44f84	print time per iteration and estimate remaining time	2023-09-01 17:03:36 +02:00
xaedes	6809eb7de9	Merge branch 'master' into finetune-lora # Conflicts: # Makefile	2023-09-01 16:07:05 +02:00
ZHAOKAI WANG	69fdbb9abc	readme : quick start command fix (#2908) * quick start command fix * quick start win command fix	2023-09-01 17:06:44 +03:00
xaedes	7acb1241c6	update README.md	2023-09-01 16:04:08 +02:00
Kerfuffle	5d6f19f16b	Allow quantize to only copy tensors, some other improvements (#2931) * Allow quantize tool to only copy tensors to allow repackaging models. * Slightly better logic when requantizing. * Change help message to go to `stdout`.	2023-09-01 08:02:48 -06:00
xaedes	6cbf55a64b	add finetune to Makefile	2023-09-01 16:02:45 +02:00
Georgi Gerganov	0d58936686	llama2c : rename function	2023-09-01 17:01:11 +03:00
xaedes	5bba329e58	finetune: automatically allocate all memory and changes to command line options remove '--n_examples N' parameter, as it no longer makes sense to call optimization process multiple times in a loop. add '--only_write_lora' command line option: will skip tokenization and training, to only write a llama.cpp compatible LORA adapter. remove memory buffer related command line options. improve iteration console output.	2023-09-01 15:58:52 +02:00
Cebtenzzre	6c9c23429b	make : use unaligned vector moves on MinGW (#2945) Fixes #2922	2023-09-01 16:53:14 +03:00
m3ndax	ee8654bcd0	minor : add const qualifiers (#2853) * made the methods const # Conflicts: # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp * made method const * Update convert-llama2c-to-ggml.cpp removed write_raw and write_u32 * llama2c : remove misleading const --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-01 16:47:27 +03:00
xaedes	7e01d11a28	add ggml-alloc API function 'ggml_allocr_max_size' to get max size of alloc GGML_API size_t ggml_allocr_max_size(struct ggml_allocr * alloc);	2023-09-01 15:42:40 +02:00
xaedes	d554a70f11	initialize opt ggml context if none was provided	2023-09-01 15:41:57 +02:00
Konstantin Herud	49bb9cbe0f	docs : add java-llama.cpp to README.md (#2935)	2023-09-01 16:36:14 +03:00
Cebtenzzre	ef15649972	build : fix most gcc and clang warnings (#2861) * fix most gcc and clang warnings * baby-llama : remove commented opt_params_adam * fix some MinGW warnings * fix more MinGW warnings	2023-09-01 16:34:50 +03:00
Ben Siraphob	d8d6977f48	examples : add C grammar (#2357)	2023-09-01 16:32:14 +03:00
Tameem	5aec2cfaac	ggml : add RISC-V vector intrinsics support (#2929) * added support for RISCV CFLAGS & native compile + cross compile options * Add RISC-V Vector Intrinsics Support Added RVV intrinsics for following ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 ggml_vec_dot_q8_0_q8_0 Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai> Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai> Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>	2023-09-01 16:27:40 +03:00
Georgi Gerganov	13268c5331	metal : slight speed-up for add and mul kernels (#2917)	2023-09-01 13:42:41 +03:00
staviq	4dcd47d71d	logs : fix mingw-like builds (fixes #2898) (#2911) * fix mingw-like builds * formatting * make LOG_COMPAT easier to override and extend * simplify win detection * fix for #2940	2023-09-01 12:07:06 +03:00
Cebtenzzre	18705a30ef	llama2c : fix segfault and alloc-dealloc-mismatch (#2913) * llama2c : fix segfault if vocab is not found * llama2c : fix mismatch between new[] and delete * llama2c : fix basename on Windows * llama2c : use a destructor to prevent memory leaks	2023-09-01 12:03:49 +03:00
Kawrakow	e8d9158925	metal: somewhat faster f16 x f32 matrix multiply kernel (#2951) * Somewhat faster f16 x f32 matrix multiply kernel * Better use 32 thread groups for f16 x f32 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-09-01 11:15:57 +03:00
Cebtenzzre	bce1fef328	convert : fix another python 3.8 issue (#2949)	2023-08-31 22:13:51 -04:00
slaren	528134dd02	remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906)	2023-09-01 01:32:09 +02:00
Kerfuffle	aeefac4ff7	scripts: Use local gguf package when running from repo (#2927) * scripts: Use local gguf when running from repo	2023-08-31 16:49:24 -06:00

1477 commits