Henri Vasserman
b67cc50dad
Merge 'origin/master' into hipblas
2023-05-03 15:04:51 +03:00
Evan Jones
e216aa0463
llama : only copy used KV cache in get / set state (#1272)
* llama : only copy used KV cache in get / set state
* switch to ggml for copying k, v
* avoid designated initializers
2023-05-02 22:26:13 -04:00
DannyDaemonic
2485d7a4d3
Process escape sequences given in prompts (#1173)
2023-05-02 18:46:20 -07:00
DannyDaemonic
13b0c68ed7
Handle signals properly on Windows (#1123)
2023-05-02 18:01:57 -07:00
DannyDaemonic
55bc5f0900
Call sh on build-info.sh (#1294)
2023-05-02 17:52:35 -07:00
kuvaus
9daff419f6
fix build-info.h for git submodules (#1289)
* make git build info work with submodules
---------
Co-authored-by: Green Sky <green@g-s.xyz>
2023-05-03 02:43:43 +02:00
slaren
bf4b22ffe4
fix missing parameters in llama_init_from_gpt_params (#1293)
2023-05-03 01:36:45 +02:00
Ron Evans
67c77799e0
examples : add llama_init_from_gpt_params() common function (#1290)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 23:39:51 +03:00
Georgi Gerganov
0e6cbff1b7
llama : fix compile warnings
2023-05-02 23:09:08 +03:00
Georgi Gerganov
5d5817ca60
ggml : fix 32-bit ARM
2023-05-02 22:14:50 +03:00
Ron Evans
8c9be35ff9
examples : improve vertical alignment of a few variables (#1286)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 20:53:52 +03:00
Marvin Gießing
cc0bb7235c
ggml : fix ppc64le build error and make cmake detect Power processors (#1284)
* Fix ppc64le build issue
* Added support to detect ppc64* processors
2023-05-02 19:42:16 +03:00
Robert Brisita
2bb992f034
llama : allow 0 as a seed number (#1275)
2023-05-02 19:23:44 +03:00
Ron Evans
e2cd506999
main : switch input_noecho to input_echo to remove negation (#979)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 19:13:26 +03:00
slaren
2d099e5193
ggml: add names to tensors (#1268)
* ggml: add names to tensors
* minor improvements to dot file formatting
2023-05-02 16:03:00 +02:00
Henri Vasserman
fcbc262eb9
Merge 'origin/master' into hipblas
2023-05-01 22:45:29 +03:00
DannyDaemonic
f4cef87edf
Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking
* macOS fix
* "build (hash)" and "CMAKE_SOURCE_DIR" changes
* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages
* Fix conditional dependency on missing target
* Broke out build-info.cmake, added find_package fallback, added build info to all examples, and added dependencies to Makefile
* 4 space indenting for cmake, attempt to clean up my mess in Makefile
* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
slaren
58b367c2d7
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device
* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16
* fix build
* cuBLAS: update block_q5_1
2023-05-01 18:11:07 +02:00
xloem
ea3a0ad6b6
llama : update stubs for systems without mmap and mlock (#1266)
Co-authored-by: John Doe <john.doe@example.com>
2023-05-01 15:58:51 +03:00
Kerfuffle
2bdc09646d
ggml : fix ggml_used_mem() (#1264)
2023-05-01 14:56:07 +03:00
Georgi Gerganov
70269cae37
llama : fix session load / save (#1263)
2023-05-01 14:54:59 +03:00
slaren
b925f1f1b0
cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
* cuBLAS: fall back to pageable memory if pinned alloc fails
* cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set
2023-05-01 13:32:22 +02:00
Alex Klinkhamer
90b19bd6ee
llama : let context be const when accessing const data (#1261)
2023-05-01 10:24:20 +03:00
Georgi Gerganov
7ff0dcd320
ggml : fix UB (int << 31)
2023-04-30 22:28:51 +03:00
Pavol Rusnak
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
a5d30b1f53
common : better default number of threads (#934)
* commit
* fix
* try-catch
* apply code review
* improve
* improve
* add macos headers
* done
* remove color
* fix windows
* minor
* fix
* Apply suggestions from code review
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
* remove
* minor
* minor
---------
Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
0cc4m
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0
* Work around q5_0 OpenCL issue
* Fix q8_0 dequant kernel
* Move cl kernels into ggml-opencl.c
* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
Georgi Gerganov
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE
2023-04-30 19:07:43 +03:00
Henri Vasserman
c73def129a
Merge 'origin/master' into hipblas
2023-04-30 18:40:42 +03:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark (#1253)
2023-04-30 12:32:37 +00:00
Georgi Gerganov
3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI
2023-04-30 10:25:46 +03:00
Georgi Gerganov
c3ca7a5f05
ggml : fix 32-bit ARM NEON
2023-04-29 21:34:23 +03:00
Georgi Gerganov
e8c051611a
ggml : use vzip instead of vuzp for consistency
2023-04-29 21:12:56 +03:00
Georgi Gerganov
0b5a935099
ggml : fix visibility and unused warnings
2023-04-29 19:28:36 +03:00
Georgi Gerganov
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229)
2023-04-29 18:43:42 +03:00
Georgi Gerganov
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast
* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS
* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h
2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h
2023-04-29 13:48:11 +03:00
Henri Vasserman
d8ea75e952
Merge 'origin/master' into hipblas
2023-04-29 11:25:51 +03:00
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 (#1223)
2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.
New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat
Ignore EOS fix: -inf should be used.
* mirostat
* Added --logit-bias and --no-penalize-nl, removed std::span
* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)
* Save and load example adjust
* Tests
* Windows build fix
* Windows test fix
2023-04-29 08:34:41 +03:00
slaren
7fc50c051a
cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory
* cuBLAS: use host pinned memory
* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory
* cuBLAS: also pin kv cache
* fix rebase
2023-04-29 02:04:18 +02:00
Henri Vasserman
b1ee8f59b4
cuBLAS: non-contiguous tensor support (#1215)
* Cuda: non-contiguous tensor support
* remove extra stuff
* rename
* fix error
* more fixes, now OpenBLAS and CLBlast build too
* now then?
2023-04-29 01:31:56 +02:00
Stephan Walter
36d19a603b
Remove Q4_3 which is no better than Q5 (#1218)
2023-04-28 23:10:43 +00:00
Henri Vasserman
d194586f65
Merge 'origin/master' into hipblas
2023-04-28 23:03:52 +03:00
Georgi Gerganov
7f15c5c477
readme : update hot topics
2023-04-28 21:32:52 +03:00
Georgi Gerganov
55390bcaf2
ggml : sync ggml (ggml_alibi)
2023-04-28 20:51:05 +03:00
CRD716
5fba3c016b
examples : add Jeopardy example (#1168)
* Basic Setup
* Prevent Results.txt from coming up
* Prefixes, Line separators, etc
* editorcheck
* introduction to give more consistent results
* Basic graph thing
* Grading, ready for testing!
* Y'all ready to get funky?
* fix column removal stuff
* missed a few
2023-04-28 19:13:33 +03:00
Evan Jones
1481a9cf25
llama : add session file format and saved sessions in main (#1169)
2023-04-28 18:59:37 +03:00
Georgi Gerganov
11d902364b
ggml : add helper debug printf in soft_max
2023-04-28 17:59:08 +03:00