Commit graph

735 commits

Author SHA1 Message Date
Sergey Kucher
069b3d4c37 Adds --mlock argument 2023-05-02 16:19:37 +03:00
Concedo
5a10ea50da up ver 2023-05-02 18:19:08 +08:00
Concedo
9a9b217e57 updated embedded kobold lite with multiuser chat 2023-05-02 18:18:05 +08:00
Concedo
6f702f2700 fixed stop sequence crash 2023-05-02 14:56:50 +08:00
Concedo
94827172e0 Merge branch 'master' into concedo
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-cuda.cu
#	ggml-cuda.h
2023-05-02 14:38:31 +08:00
Concedo
433fa1e8b2 fix for stop sequence missing, added print for exception when loading GUI 2023-05-02 14:18:04 +08:00
Concedo
0703cdf2eb remove cloudflare insights 2023-05-02 00:38:10 +08:00
DannyDaemonic
f4cef87edf
Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking

* macOS fix

* "build (hash)" and "CMAKE_SOURCE_DIR" changes

* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

* Fix conditional dependency on missing target

* Broke out build-info.cmake, added find_package fallback, and added build info to all examples, added dependencies to Makefile

* 4 space indenting for cmake, attempt to clean up my mess in Makefile

* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
slaren
58b367c2d7
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
2023-05-01 18:11:07 +02:00
xloem
ea3a0ad6b6
llama : update stubs for systems without mmap and mlock (#1266)
Co-authored-by: John Doe <john.doe@example.com>
2023-05-01 15:58:51 +03:00
Kerfuffle
2bdc09646d
ggml : fix ggml_used_mem() (#1264) 2023-05-01 14:56:07 +03:00
Georgi Gerganov
70269cae37
llama : fix session load / save (#1263) 2023-05-01 14:54:59 +03:00
slaren
b925f1f1b0
cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
* cuBLAS: fall back to pageable memory if pinned alloc fails

* cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set
2023-05-01 13:32:22 +02:00
Alex Klinkhamer
90b19bd6ee
llama : let context be const when accessing const data (#1261) 2023-05-01 10:24:20 +03:00
Concedo
4d38795563 add UI for token unbanning 2023-05-01 12:10:21 +08:00
Concedo
3de34ee492 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-opencl.c
2023-05-01 12:03:46 +08:00
Concedo
560dacedbd update readme 2023-05-01 11:41:25 +08:00
Georgi Gerganov
7ff0dcd320
ggml : fix UB (int << 31) 2023-04-30 22:28:51 +03:00
Pavol Rusnak
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
a5d30b1f53
common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
0cc4m
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
Georgi Gerganov
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE 2023-04-30 19:07:43 +03:00
Concedo
25201233ca fixed unbantokens not following EOS 2023-05-01 00:02:45 +08:00
Concedo
294a5d00b1 Merge remote-tracking branch 'occam/clblast-further-dequant-kernels' into concedo_experimental
# Conflicts:
#	ggml-opencl.c
2023-04-30 23:56:24 +08:00
Concedo
3b5df18dbb temp fix for compilation issues on OSX (M1) 2023-04-30 23:48:46 +08:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark (#1253) 2023-04-30 12:32:37 +00:00
0cc4m
e69c924ad1 Use two memcpy calls for q5_0 buffer transfer 2023-04-30 10:44:48 +02:00
Concedo
fdd21d0eba add missing include 2023-04-30 16:15:11 +08:00
Georgi Gerganov
3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI 2023-04-30 10:25:46 +03:00
Concedo
b3315459c7 pulled the new dequants for clblast, fixed some ooms 2023-04-30 14:15:44 +08:00
Concedo
0061b90ec6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-04-30 10:35:02 +08:00
Georgi Gerganov
c3ca7a5f05
ggml : fix 32-bit ARM NEON 2023-04-29 21:34:23 +03:00
Georgi Gerganov
e8c051611a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:12:56 +03:00
Georgi Gerganov
0b5a935099
ggml : fix visibility and unused warnings 2023-04-29 19:28:36 +03:00
Georgi Gerganov
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 2023-04-29 18:43:42 +03:00
Georgi Gerganov
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Concedo
f149114395 up ver 2023-04-29 19:42:21 +08:00
Concedo
7afad2b9b5 integrated the new samplers 2023-04-29 19:41:41 +08:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h 2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h 2023-04-29 13:48:11 +03:00
Concedo
da0c34b028 Merge branch 'master' into concedo_experimental 2023-04-29 18:27:06 +08:00
Concedo
fe0e4de8e8 fixed a regression where a bad model was giving valid logits after library changes. now we run the eval through the model twice and compare logits. if they give the same logits for different inputs, the model is broken 2023-04-29 18:25:17 +08:00
0cc4m
369d903eda Move cl kernels into ggml-opencl.c 2023-04-29 10:58:34 +02:00
0cc4m
d6be497ef6 Fix q8_0 dequant kernel 2023-04-29 10:58:34 +02:00
0cc4m
1560c10f24 Work around q5_0 OpenCL issue 2023-04-29 10:58:34 +02:00
0cc4m
9439da6f95 Implement q5_0, q5_1 and q8_0 2023-04-29 10:58:31 +02:00
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 (#1223) 2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
2023-04-29 08:34:41 +03:00
Concedo
5aa185f3f7 remove preallocation 2023-04-29 12:32:37 +08:00
Concedo
bb282a4ecf reinstated the q4_3 format, for backwards compatibility. 2023-04-29 11:42:04 +08:00