jianyuzh
533c647d0e
check for sycl blas, better performance
2024-01-23 13:34:05 +08:00
Meng, Hengyu
67e6b3cb7d
align pr4766
2024-01-23 03:32:09 +00:00
luoyu-intel
f008cc7b68
enable SYCL_F16 support
2024-01-23 02:38:44 +00:00
jianyuzh
f396a3b65e
add known issue for pvc hang issue
2024-01-23 02:38:44 +00:00
luoyu-intel
623d8031cb
fix code err
2024-01-23 02:38:44 +00:00
jianyuzh
e3481faa2f
rm original sycl code before refactor
2024-01-23 02:38:44 +00:00
jianyuzh
ae941b1b57
add syc and link for sycl readme
2024-01-23 02:38:44 +00:00
jianyuzh
35a0daaaa1
restore rm code to fix hang issue
2024-01-23 02:38:44 +00:00
luoyu-intel
d5f7d364f6
remove sycl version from include path
2024-01-23 02:38:44 +00:00
luoyu-intel
57e9fbadb2
fix return type
2024-01-23 02:38:44 +00:00
Neo Zhang Jianyu
593ce001e2
Update README_sycl.md
2024-01-23 02:38:44 +00:00
jianyuzh
d80dd65f42
dos2unix
2024-01-23 02:38:44 +00:00
jianyuzh
09b5619df4
rm trailing space
2024-01-23 02:38:44 +00:00
jianyuzh
7350fd48ef
add ls-sycl-device, rm unused files
2024-01-23 02:38:44 +00:00
jianyuzh
0d6e7219b6
add ls-sycl-device tool
2024-01-23 02:38:44 +00:00
jianyuzh
79d30d7713
add run script, comment debug code
2024-01-23 02:38:44 +00:00
jianyuzh
a8936f4902
set nthread=1 when sycl, increase performance
2024-01-23 02:38:44 +00:00
jianyuzh
95daece908
fix build with sycl
2024-01-23 02:38:44 +00:00
jianyuzh
ca2cb6982a
update readme, refactor build script
2024-01-23 02:38:44 +00:00
jianyuzh
c3c5b20ac5
mv dpct definition from folder dpct to ggml-sycl.h
2024-01-23 02:38:44 +00:00
jianyuzh
c67c2ab228
refactor device log
2024-01-23 02:38:44 +00:00
jianyuzh
a47f5ec42e
summary dpct definition in one header file to replace folder:dpct
2024-01-23 02:38:44 +00:00
jianyuzh
5b5389941e
fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481
2024-01-23 02:38:44 +00:00
jianyuzh
bd38129aeb
add print tensor function to debug
2024-01-23 02:38:44 +00:00
jianyuzh
3645f25d74
correct queue: rm dpct::get_queue
2024-01-23 02:38:44 +00:00
jianyuzh
fa3a58605b
clear CMAKE to rm unused lib and options
2024-01-23 02:38:44 +00:00
jianyuzh
c709c3cb37
ren ggml-sycl.hpp -> ggml-sycl.h
2024-01-23 02:38:44 +00:00
jianyuzh
69d76c8b58
fix error of select non-zero device, format device list
2024-01-23 02:38:44 +00:00
jianyuzh
c2ef7a9cb9
step 8, rename all macros & funcs from cuda to sycl
2024-01-23 02:38:42 +00:00
jianyuzh
3b1a743e82
step7 add debug for code path, rm log
2024-01-23 02:15:32 +00:00
jianyuzh
65f895d41b
support main device is non-zero
2024-01-23 02:15:32 +00:00
jianyuzh
3a9d2c54ba
step6, enhance error check, remove CUDA macro, enhance device id to fix non-zero id issue
2024-01-23 02:15:32 +00:00
jianyuzh
6dd32789b4
step 5 format device and print
2024-01-23 02:15:32 +00:00
jianyuzh
da752edaf5
add GGML_LIST_DEVICE function
2024-01-23 02:15:32 +00:00
jianyuzh
43f2c35859
step3 add fp16, slower 31->28
2024-01-23 02:15:32 +00:00
jianyuzh
02dffb68b8
step 2
2024-01-23 02:15:32 +00:00
jianyuzh
ff83711055
step 1
2024-01-23 02:15:32 +00:00
jianyuzh
0c00b4f654
add debug function, commit all help code
2024-01-23 02:15:32 +00:00
jianyuzh
233876936b
update init_cublas
2024-01-23 02:15:32 +00:00
jianyuzh
7a4343df61
first update for migration
2024-01-23 02:15:32 +00:00
slaren
011e8ec577
llama : fix not enough space in buffer with Qwen (#5086)
2024-01-22 23:42:41 +01:00
Kawrakow
6f9939d119
KL-divergence (#5076)
* kl-divergence: be able to save all logits to a file
* Add ability to compute KL-divergence
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 16:10:14 +02:00
Reinforce-II
780e24a22e
ggml : parallelize FP32 conversion when using BLAS (#5045)
* make GGML_TASK_INIT phase can be run in multithread
* multithreaded dequantize in mul_mat when using blas library
* minor fixes
* update outdated comment
* fix coding style
* simplify code
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-22 15:15:08 +02:00
XiaotaoChen
3ce7e8f8e7
llava : MobileVLM support (#4954)
* MobileVLM native implementation
* delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake
* move android script to example/llava directory
* Fix the editor config checks
---------
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22 15:09:35 +02:00
Someone Serge
b2d80e105a
flake.nix: add a comment about flakes vs nix
2024-01-22 12:19:30 +00:00
Someone Serge
28603cd283
nix: add a comment on the many nixpkgs-with-cuda instances
2024-01-22 12:19:30 +00:00
Someone Serge
5e97ec91ae
nix: add a comment about makeScope
2024-01-22 12:19:30 +00:00
Someone Serge
7251870780
nix: refactor the cleanSource rules
2024-01-22 12:19:30 +00:00
Someone Serge
fe8b3c0d4b
workflows: nix-ci: drop the redundant "paths" filter
2024-01-22 12:19:30 +00:00
Someone Serge
f4dd059259
workflows: nix-build-aarch64: rate limit
2024-01-22 12:19:30 +00:00