Xuan Son Nguyen
c432a82295
fix parallel test
2024-11-20 21:51:24 +01:00
Xuan Son Nguyen
78e3cb3cf2
add parallel completion test
2024-11-20 21:35:31 +01:00
Xuan Son Nguyen
1c2f0f708c
fix save slot test
2024-11-20 19:24:24 +01:00
Xuan Son Nguyen
6af3f95f6f
fix coding style
2024-11-20 17:58:14 +01:00
Xuan Son Nguyen
472e128c0b
added all sequential tests
2024-11-20 17:57:20 +01:00
Xuan Son Nguyen
eb02373f76
log less, fix embd test
2024-11-20 16:49:35 +01:00
Xuan Son Nguyen
e34c9d78a4
styling
2024-11-20 15:01:09 +01:00
Xuan Son Nguyen
f09a9b68e1
more tests
2024-11-20 15:00:36 +01:00
Xuan Son Nguyen
3249aabc0b
add more tests
2024-11-20 12:49:18 +01:00
Xuan Son Nguyen
d7de41302b
misc
2024-11-20 11:09:16 +01:00
Xuan Son Nguyen
49cdfd3fc2
fix test on windows
2024-11-20 00:19:07 +01:00
Xuan Son Nguyen
3acaf58e38
server : replace behave with pytest
2024-11-19 23:29:46 +01:00
haopeng
42ae10bbcd
add cmake rvv support ( #10411 )
2024-11-19 21:10:31 +01:00
Georgi Gerganov
9fe0fb0626
sync : ggml
2024-11-19 20:03:21 +02:00
Plamen Minev
611fabd792
metal : fix offset integer overflows in im2col (ggml/1015)
...
-- While running StableDiffusion.cpp locally with Metal, some offsets overflow and result in incorrect calculations
2024-11-19 20:03:21 +02:00
PAB
12b0ad953a
metal : add GGML_UNARY_OP_ELU kernel (ggml/1018)
2024-11-19 20:03:21 +02:00
蕭澧邦
342397dc7e
cmake: force MSVC compiler charset to utf-8 ( #9989 )
2024-11-19 18:42:00 +01:00
bandoti
2a11b6b094
Add required ggml-base and backend libs to cmake pkg ( #10407 )
2024-11-19 17:10:30 +01:00
Diego Devesa
3ee6382d48
cuda : fix CUDA_FLAGS not being applied ( #10403 )
2024-11-19 14:29:38 +01:00
Georgi Gerganov
8e752a777b
llama : add check for KV cache shifts ( #10401 )
...
ggml-ci
2024-11-19 13:29:26 +02:00
Shane A
a88ad007de
llama : add OLMo November 2024 support ( #10394 )
...
* Add OLMo November 2024 constants
* Add OLMo November 2024 converter
* Add loading of OLMo November 2024 tensors and hyperparameters
* Add building of OLMo November 2024 model
2024-11-19 11:04:08 +02:00
Romain Biessy
2a1507c162
sycl : Add option to set the SYCL architecture for all targets ( #10266 )
...
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
2024-11-19 08:02:23 +00:00
Jeff Bolz
b3e585988f
vulkan: Optimize soft_max ( #10301 )
...
* vulkan: Optimize soft_max
Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.
Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.
Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.
* vulkan: Further soft_max optimizations
Restore the workgroup size of 512 case, use it for >1024.
Use unrollable loops for more iteration counts.
2024-11-19 08:25:17 +01:00
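The soft_max entry above mentions adding an early return for out-of-bounds rows when the dispatch is padded to 512 x H. A minimal C sketch of that guard, assuming a plain row-wise soft_max; the names soft_max_row, nrows, and ncols are illustrative, not the shader's actual code:

```c
/*
 * Hypothetical sketch (not the actual Vulkan shader): when the dispatch is
 * padded, any row index past the real row count must return early instead
 * of reading past the end of the input.
 */
#include <math.h>
#include <stddef.h>
#include <stdio.h>

static void soft_max_row(const float * src, float * dst,
                         size_t row, size_t nrows, size_t ncols) {
    if (row >= nrows) {
        return; // early return for OOB rows (dispatch can exceed nrows)
    }
    const float * x = src + row*ncols;
    float       * y = dst + row*ncols;

    float max_val = -INFINITY;           // cache the row max in a local
    for (size_t i = 0; i < ncols; i++) {
        max_val = fmaxf(max_val, x[i]);
    }
    float sum = 0.0f;
    for (size_t i = 0; i < ncols; i++) {
        y[i] = expf(x[i] - max_val);
        sum += y[i];
    }
    for (size_t i = 0; i < ncols; i++) {
        y[i] /= sum;
    }
}

int main(void) {
    const float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float dst[4];
    soft_max_row(src, dst, 0, 1, 4);
    printf("%.4f %.4f %.4f %.4f\n", dst[0], dst[1], dst[2], dst[3]);
    return 0;
}
```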
Alberto Cabrera Pérez
557924f222
sycl: Revert MUL_MAT_OP support changes ( #10385 )
2024-11-19 08:50:04 +08:00
Diego Devesa
d3481e6316
cuda : only use native when supported by cmake ( #10389 )
2024-11-18 18:43:40 +01:00
bandoti
531cb1c233
Skip searching root path for cross-compile builds ( #10383 )
2024-11-18 16:23:58 +01:00
Jeff Bolz
f139d2ea61
vulkan: remove use of null initializer ( #10372 )
...
Seems like this isn't working for vulkan-over-metal when the array is sized
by a spec constant. Maybe a spirv-cross limitation?
2024-11-18 08:28:42 -06:00
Georgi Gerganov
2eb76b2a5e
flake.lock: Update ( #10346 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum%2BL%2B0zVAWvXAGbWAuXHax3KzuejaDyo%3D' (2024-11-05)
→ 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg%3D' (2024-11-15)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-11-18 06:08:20 -08:00
0cc4m
9b75f03cd2
Vulkan: Fix device info output format specifiers ( #10366 )
...
* Vulkan: Fix device info output format specifiers
* Vulkan: Use %zu printf specifier for size_t instead of %ld
2024-11-18 11:02:43 +01:00
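The format-specifier fix above comes down to printing size_t portably. A tiny C illustration of that change; the variable name and message are made up, not the actual device-info output:

```c
#include <stdio.h>
#include <stddef.h>

int main(void) {
    size_t heap_size = 8192;
    // size_t must be printed with %zu; %ld is only correct on platforms
    // where long happens to have the same width as size_t.
    printf("device heap size: %zu bytes\n", heap_size);
    return 0;
}
```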
Johannes Gäßler
75207b3a88
docker: use GGML_NATIVE=OFF ( #10368 )
2024-11-18 00:21:53 +01:00
Johannes Gäßler
76e9e58b78
CUDA: fix MMV kernel being used for FP16 src1 ( #10357 )
2024-11-17 23:20:42 +01:00
Johannes Gäßler
ce2e59ba10
CMake: fix typo in comment [no ci] ( #10360 )
2024-11-17 12:59:38 +01:00
Diego Devesa
be5caccef9
llama : only use default buffer types for the KV cache ( #10358 )
2024-11-17 12:25:45 +01:00
Georgi Gerganov
20a780c7b6
gitignore : ignore local run scripts [no ci]
2024-11-17 13:12:22 +02:00
Georgi Gerganov
cf32a9b93a
metal : refactor kernel args into structs ( #10238 )
...
* metal : add kernel arg structs (wip)
* metal : fattn args
ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id
ggml-ci
* cont : int safety + register optimizations
ggml-ci
* metal : GGML_OP_CONCAT
ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h
ggml-ci
2024-11-17 11:23:01 +02:00
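The refactor above groups Metal kernel arguments into structs passed by reference instead of long lists of scalars. A hypothetical C sketch of that pattern, with made-up names (example_kargs, example_kernel); the real definitions live in ggml-metal-impl.h:

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical argument struct; the ne*/nb* field names mimic ggml's naming
// convention but are not the actual ggml-metal-impl.h definitions.
typedef struct {
    int64_t  ne00, ne01;  // tensor dimensions
    uint64_t nb00, nb01;  // strides in bytes
    float    scale;
} example_kargs;

// Stand-in for a kernel: it receives one struct pointer instead of many
// individual scalar arguments.
static void example_kernel(const example_kargs * args, const float * src, float * dst) {
    for (int64_t i = 0; i < args->ne00; i++) {
        dst[i] = src[i] * args->scale;
    }
}

int main(void) {
    const float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float dst[4];
    const example_kargs args = {
        .ne00 = 4, .ne01 = 1,
        .nb00 = sizeof(float), .nb01 = 4*sizeof(float),
        .scale = 0.5f,
    };
    example_kernel(&args, src, dst);
    printf("%.2f %.2f %.2f %.2f\n", dst[0], dst[1], dst[2], dst[3]);
    return 0;
}
```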
FirstTimeEZ
a43178299c
ggml : fix undefined reference to 'getcpu' ( #10354 )
...
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-17 10:39:22 +02:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec ( #10318 )
2024-11-17 09:09:55 +01:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build ( #10320 )
2024-11-17 09:06:34 +01:00
Diego Devesa
eda7e1d4f5
ggml : fix possible buffer use after free in sched reserve ( #9930 )
2024-11-17 08:31:17 +02:00
Georgi Gerganov
24203e9dd7
ggml : inttypes.h -> cinttypes ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5d9e59979c
ggml : adapt AMX to tensor->grad removal ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
a4200cafad
make : add ggml-opt ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
84274a10c3
tests : remove test-grad0
2024-11-17 08:30:29 +02:00
Georgi Gerganov
68fcb4759c
ggml : fix compile warnings ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5c9a8b22b1
scripts : update sync
2024-11-17 08:30:29 +02:00
FirstTimeEZ
0fff7fd798
docs : vulkan build instructions to use git bash mingw64 ( #10303 )
2024-11-17 00:29:18 +01:00
Johannes Gäßler
4e54be0ec6
llama/ex: remove --logdir argument ( #10339 )
2024-11-16 23:00:41 +01:00
Georgi Gerganov
db4cfd5dbc
llamafile : fix include path ( #0 )
...
ggml-ci
2024-11-16 20:36:26 +02:00
Georgi Gerganov
8ee0d09ae6
make : auto-determine dependencies ( #0 )
2024-11-16 20:36:26 +02:00