Commit graph

735 commits

Author SHA1 Message Date
Sergey Kucher
069b3d4c37 Adds --mlock argument 2023-05-02 16:19:37 +03:00
Concedo
5a10ea50da up ver 2023-05-02 18:19:08 +08:00
Concedo
9a9b217e57 updated embedded kobold lite with multiuser chat 2023-05-02 18:18:05 +08:00
Concedo
6f702f2700 fixed stop sequence crash 2023-05-02 14:56:50 +08:00
Concedo
94827172e0 Merge branch 'master' into concedo
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-cuda.cu
#	ggml-cuda.h
2023-05-02 14:38:31 +08:00
Concedo
433fa1e8b2 fix for stop sequence missing, added print for exception when loading GUI 2023-05-02 14:18:04 +08:00
Concedo
0703cdf2eb remove cloudflare insights 2023-05-02 00:38:10 +08:00
DannyDaemonic
f4cef87edf
Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking

* macOS fix

* "build (hash)" and "CMAKE_SOURCE_DIR" changes

* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

* Fix conditional dependency on missing target

* Broke out build-info.cmake, added find_package fallback, and added build info to all examples, added dependencies to Makefile

* 4 space indenting for cmake, attempt to clean up my mess in Makefile

* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
slaren
58b367c2d7
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
2023-05-01 18:11:07 +02:00
xloem
ea3a0ad6b6
llama : update stubs for systems without mmap and mlock (#1266)
Co-authored-by: John Doe <john.doe@example.com>
2023-05-01 15:58:51 +03:00
Kerfuffle
2bdc09646d
ggml : fix ggml_used_mem() (#1264) 2023-05-01 14:56:07 +03:00
Georgi Gerganov
70269cae37
llama : fix session load / save (#1263) 2023-05-01 14:54:59 +03:00
slaren
b925f1f1b0
cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
* cuBLAS: fall back to pageable memory if pinned alloc fails

* cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set
2023-05-01 13:32:22 +02:00
Alex Klinkhamer
90b19bd6ee
llama : let context be const when accessing const data (#1261) 2023-05-01 10:24:20 +03:00
Concedo
4d38795563 add UI for token unbanning 2023-05-01 12:10:21 +08:00
Concedo
3de34ee492 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-opencl.c
2023-05-01 12:03:46 +08:00
Concedo
560dacedbd update readme 2023-05-01 11:41:25 +08:00
Georgi Gerganov
7ff0dcd320
ggml : fix UB (int << 31) 2023-04-30 22:28:51 +03:00
Pavol Rusnak
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
a5d30b1f53
common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
0cc4m
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
Georgi Gerganov
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE 2023-04-30 19:07:43 +03:00
Concedo
25201233ca fixed unbantokens not following EOS 2023-05-01 00:02:45 +08:00
Concedo
294a5d00b1 Merge remote-tracking branch 'occam/clblast-further-dequant-kernels' into concedo_experimental
# Conflicts:
#	ggml-opencl.c
2023-04-30 23:56:24 +08:00
Concedo
3b5df18dbb temp fix for compilation issues on OSX (M1) 2023-04-30 23:48:46 +08:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark (#1253) 2023-04-30 12:32:37 +00:00
0cc4m
e69c924ad1 Use two memcpy calls for q5_0 buffer transfer 2023-04-30 10:44:48 +02:00
Concedo
fdd21d0eba add missing include 2023-04-30 16:15:11 +08:00
Georgi Gerganov
3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI 2023-04-30 10:25:46 +03:00
Concedo
b3315459c7 pulled the new dequants for clblast, fixed some ooms 2023-04-30 14:15:44 +08:00
Concedo
0061b90ec6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-04-30 10:35:02 +08:00
Georgi Gerganov
c3ca7a5f05
ggml : fix 32-bit ARM NEON 2023-04-29 21:34:23 +03:00
Georgi Gerganov
e8c051611a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:12:56 +03:00
Georgi Gerganov
0b5a935099
ggml : fix visibility and unused warnings 2023-04-29 19:28:36 +03:00
Georgi Gerganov
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 2023-04-29 18:43:42 +03:00
Georgi Gerganov
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Concedo
f149114395 up ver 2023-04-29 19:42:21 +08:00
Concedo
7afad2b9b5 integrated the new samplers 2023-04-29 19:41:41 +08:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h 2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h 2023-04-29 13:48:11 +03:00
Concedo
da0c34b028 Merge branch 'master' into concedo_experimental 2023-04-29 18:27:06 +08:00
Concedo
fe0e4de8e8 fixed a regression where a bad model was giving valid logits after library changes. now we run the eval through the model twice and compare logits. if they give the same logits for different inputs, the model is broken 2023-04-29 18:25:17 +08:00
0cc4m
369d903eda Move cl kernels into ggml-opencl.c 2023-04-29 10:58:34 +02:00
0cc4m
d6be497ef6 Fix q8_0 dequant kernel 2023-04-29 10:58:34 +02:00
0cc4m
1560c10f24 Work around q5_0 OpenCL issue 2023-04-29 10:58:34 +02:00
0cc4m
9439da6f95 Implement q5_0, q5_1 and q8_0 2023-04-29 10:58:31 +02:00
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 (#1223) 2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
2023-04-29 08:34:41 +03:00
Concedo
5aa185f3f7 remove preallocation 2023-04-29 12:32:37 +08:00
Concedo
bb282a4ecf reinstated the q4_3 format, for backwards compatibility. 2023-04-29 11:42:04 +08:00