Henri Vasserman
71ac58ae53
make clang-tidy happy
2023-05-20 15:29:26 +03:00
Henri Vasserman
ad9ab0e3fe
editorconfig fixes
2023-05-20 15:27:24 +03:00
Henri Vasserman
4f97f73db2
fix indexing issue
2023-05-20 15:21:38 +03:00
Henri Vasserman
e4640eec70
Merge 'origin/master' into clfixes
2023-05-20 15:01:41 +03:00
Henri Vasserman
e71bba90b8
rewrite platform selection code.
2023-05-20 14:58:33 +03:00
Georgi Gerganov
ea600071cb
Revert "feature : add blis and other BLAS implementation support ( #1502 )"
This reverts commit 07e9ace0f9.
2023-05-20 12:03:48 +03:00
Zenix
07e9ace0f9
feature : add blis and other BLAS implementation support ( #1502 )
* feature: add blis support
* feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927
* fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
* Fix typo in INTEGER
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 12:02:48 +03:00
Georgi Gerganov
ec2e10c444
llama : add llama_init_backend() API ( close #1527 )
2023-05-20 11:06:37 +03:00
DannyDaemonic
d2c59b8ba4
Fix for mingw ( #1462 )
2023-05-20 00:40:02 -07:00
Maxime
503db28849
llama : fix name shadowing and C4146 ( #1526 )
* Fix name shadowing and C4146
* Fix if macros not using defined when required
* Update llama-util.h
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update llama-util.h
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Code style
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 10:22:37 +03:00
Georgi Gerganov
8a203f9fa1
llama : fix compile warnings in llama_set_state_data()
2023-05-20 10:14:43 +03:00
Georgi Gerganov
4fd3e29297
ggml : fix scalar implementation of Q4_1 dot
2023-05-20 10:13:19 +03:00
Henri Vasserman
6df8e93234
update Q formats
2023-05-19 23:52:35 +03:00
Henri Vasserman
057c9b7dc8
Merge 'origin/master' into clfixes
2023-05-19 23:46:18 +03:00
Georgi Gerganov
2d5db48371
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 ( #1508 )
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0
* llama : bump LLAMA_FILE_VERSION to 3
* cuda : update Q4 and Q8 dequantize kernels
* ggml : fix AVX dot products
* readme : update performance table + hot topics
2023-05-19 22:17:18 +03:00
Georgi Gerganov
6986c7835a
tests : add missing header
2023-05-19 21:17:28 +03:00
Evan Jones
943e6081cc
examples : add persistent chat ( #1495 )
* examples : add persistent chat
* examples : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19 20:39:51 +03:00
Jason McCartney
7694b52b9a
main : make reverse prompt option act as a stop token in non-interactive mode ( #1032 )
* Make reverse prompt option act as a stop token in non-interactive scenarios
* Making requested review changes
* Update gpt_params_parse and fix a merge error
* Revert "Update gpt_params_parse and fix a merge error"
This reverts commit 2bb2ff1748.
* Update gpt_params_parse and fix a merge error take 2
2023-05-19 20:24:59 +03:00
David Kennedy
79e3efb0e9
readme : adds WizardLM to the list of supported models ( #1485 )
2023-05-19 20:16:30 +03:00
Georgi Gerganov
4b7e245adf
minor : fix compile warnings
2023-05-19 20:14:51 +03:00
Henri Vasserman
35dbc8d799
wrap all CL calls in checks.
2023-05-19 13:08:57 +03:00
Henri Vasserman
772e3fbe12
add packed just in case
2023-05-19 01:12:57 +03:00
Henri Vasserman
558c672c93
Merge 'origin/master' into clfixes
2023-05-19 00:36:03 +03:00
Henri Vasserman
962e2a9cd9
Added another check to find a GPU.
2023-05-19 00:35:46 +03:00
Erik Scholz
5ea4339273
make kv_f16 the default for api users ( #1517 )
2023-05-18 19:31:01 +02:00
DannyDaemonic
ee9654138a
Fixes #1511 lambda issue for w64devkit (mingw) ( #1513 )
* Fix for w64devkit and mingw
2023-05-18 19:30:40 +02:00
Stephan Walter
dc271c52ed
Remove unused n_parts parameter ( #1509 )
2023-05-17 22:12:01 +00:00
rankaiyx
c238b5873a
benchmark-matmul: Print the average of the test results ( #1490 )
2023-05-17 16:47:58 +02:00
Tom Jobbins
2b2646931b
convert.py: Support models which are stored in a single pytorch_model.bin ( #1469 )
* Support models in a single pytorch_model.bin
* Remove spurious line with typo
2023-05-17 00:04:35 +02:00
Ilya Kurdyukov
42627421ec
~7% faster Q5_1 AVX2 code ( #1477 )
2023-05-16 18:36:47 +00:00
András Salamon
9560655409
define default model path once, sync path with readme ( #1366 )
2023-05-16 17:46:34 +02:00
sandyiscool
2a5ee023ad
Add alternate include path for openblas ( #1476 )
In some Linux distributions (Fedora, for example), the include path for OpenBLAS is located at '/usr/local/include'
2023-05-16 10:30:15 +02:00
zrm
63d20469b8
fix get_num_physical_cores() ( #1436 )
* fix get_num_physical_cores()
had been broken on complex topologies because "cpu cores" in /proc/cpuinfo is per-"physical id"
* Add spaces to maintain consistent formatting
---------
Co-authored-by: slaren <ddevesa@gmail.com>
2023-05-15 04:25:42 +02:00
Henri Vasserman
225305d32c
Merge remote-tracking branch 'origin/master'
2023-05-14 23:54:51 +03:00
slaren
b5c9295eef
benchmark-matmul: fix clang-tidy issues, report results in GFLOPS ( #1458 )
* benchmark-matmul: fix command line parsing, replace macros with functions, report results in GFLOPS
2023-05-14 22:46:00 +02:00
Johannes Gäßler
eb363627fd
cuda : deduplicated dequantization code ( #1453 )
2023-05-14 21:53:23 +03:00
Henri Vasserman
9939b87cbb
Fix Q8_0
2023-05-14 19:12:09 +03:00
xaedes
79b2d5b69d
ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 ( #1454 )
* fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase
* remove trailing whitespace
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-14 18:55:02 +03:00
Georgi Gerganov
13c351ad72
ggml : various fixes ( #1450 )
- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers
2023-05-14 18:22:50 +03:00
Henri Vasserman
394dabbc1a
remove qk as well
2023-05-14 13:22:39 +03:00
Henri Vasserman
9074e353dd
minor nitpicks
2023-05-14 13:15:09 +03:00
katsu560
60f8c361ca
ggml : add AVX support based on AVX2 code ( #1430 )
2023-05-14 10:03:51 +00:00
Henri Vasserman
0453ce3f8b
Remove all constants
2023-05-14 12:47:41 +03:00
Georgi Gerganov
601a033475
ggml : add GGML_QNT_VERSION to track quantization format changes
https://github.com/ggerganov/ggml/issues/150#issuecomment-1546625668
2023-05-14 10:20:19 +03:00
Henri Vasserman
b8fb5cdf5c
rewrite platform and device selection
2023-05-13 22:25:53 +03:00
Henri Vasserman
bb5c3e2c70
remove constants
2023-05-13 22:04:17 +03:00
Georgi Gerganov
08737ef720
cuda : fix convert function ( #1412 )
2023-05-13 17:40:58 +03:00
Georgi Gerganov
bda4d7c215
make : fix PERF build with cuBLAS
2023-05-13 17:25:09 +03:00
Georgi Gerganov
5a5aeb1e91
llama : fix unused warning
2023-05-13 16:55:14 +03:00
Georgi Gerganov
66841fdb0e
ggml : multi-thread mul and diag_mask ops ( #1428 )
2023-05-13 16:48:03 +03:00