Commit graph

1995 commits

Author SHA1 Message Date
jianyuzh
a0a1304b0c add build&run script, clean CMakefile, update guide by review comments 2024-01-23 14:16:01 +08:00
jianyuzh
533c647d0e check for sycl blas, better performance 2024-01-23 13:34:05 +08:00
Meng, Hengyu
67e6b3cb7d align pr4766 2024-01-23 03:32:09 +00:00
luoyu-intel
f008cc7b68 enable SYCL_F16 support 2024-01-23 02:38:44 +00:00
jianyuzh
f396a3b65e add know issue for pvc hang issue 2024-01-23 02:38:44 +00:00
luoyu-intel
623d8031cb fix code err 2024-01-23 02:38:44 +00:00
jianyuzh
e3481faa2f rm original sycl code before refactor 2024-01-23 02:38:44 +00:00
jianyuzh
ae941b1b57 add syc and link for sycl readme 2024-01-23 02:38:44 +00:00
jianyuzh
35a0daaaa1 restore rm code to fix hang issue 2024-01-23 02:38:44 +00:00
luoyu-intel
d5f7d364f6 remove sycl version from include path 2024-01-23 02:38:44 +00:00
luoyu-intel
57e9fbadb2 fix return type 2024-01-23 02:38:44 +00:00
Neo Zhang Jianyu
593ce001e2 Update README_sycl.md 2024-01-23 02:38:44 +00:00
jianyuzh
d80dd65f42 dos2unix 2024-01-23 02:38:44 +00:00
jianyuzh
09b5619df4 rm rear space 2024-01-23 02:38:44 +00:00
jianyuzh
7350fd48ef add ls-sycl-device, rm unused files 2024-01-23 02:38:44 +00:00
jianyuzh
0d6e7219b6 add ls-sycl-device tool 2024-01-23 02:38:44 +00:00
jianyuzh
79d30d7713 add run script, comment debug code 2024-01-23 02:38:44 +00:00
jianyuzh
a8936f4902 set nthread=1 when sycl, increase performance 2024-01-23 02:38:44 +00:00
jianyuzh
95daece908 fix build with sycl 2024-01-23 02:38:44 +00:00
jianyuzh
ca2cb6982a update readme, refactor build script 2024-01-23 02:38:44 +00:00
jianyuzh
c3c5b20ac5 mv dpct definition from folder dpct to ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
c67c2ab228 refactor device log 2024-01-23 02:38:44 +00:00
jianyuzh
a47f5ec42e summary dpct definition in one header file to replace folder:dpct 2024-01-23 02:38:44 +00:00
jianyuzh
5b5389941e fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481 2024-01-23 02:38:44 +00:00
jianyuzh
bd38129aeb add print tensor function to debug 2024-01-23 02:38:44 +00:00
jianyuzh
3645f25d74 correct queue: rm dtct:get_queue 2024-01-23 02:38:44 +00:00
jianyuzh
fa3a58605b clear CMAKE to rm unused lib and options 2024-01-23 02:38:44 +00:00
jianyuzh
c709c3cb37 ren ggml-sycl.hpp -> ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
69d76c8b58 fix error of select non-zero device, format device list 2024-01-23 02:38:44 +00:00
jianyuzh
c2ef7a9cb9 step 8, rename all macro & func from cuda by sycl 2024-01-23 02:38:42 +00:00
jianyuzh
3b1a743e82 step7 add debug for code path, rm log 2024-01-23 02:15:32 +00:00
jianyuzh
65f895d41b support main device is non-zero 2024-01-23 02:15:32 +00:00
jianyuzh
3a9d2c54ba step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue 2024-01-23 02:15:32 +00:00
jianyuzh
6dd32789b4 step 5 format device and print 2024-01-23 02:15:32 +00:00
jianyuzh
da752edaf5 add GGML_LIST_DEVICE function 2024-01-23 02:15:32 +00:00
jianyuzh
43f2c35859 step3 add fp16, slower 31->28 2024-01-23 02:15:32 +00:00
jianyuzh
02dffb68b8 step 2 2024-01-23 02:15:32 +00:00
jianyuzh
ff83711055 step 1 2024-01-23 02:15:32 +00:00
jianyuzh
0c00b4f654 add debug functio, commit all help code 2024-01-23 02:15:32 +00:00
jianyuzh
233876936b update init_cublas 2024-01-23 02:15:32 +00:00
jianyuzh
7a4343df61 first update for migration 2024-01-23 02:15:32 +00:00
slaren
011e8ec577 llama : fix not enough space in buffer with Qwen (#5086) 2024-01-22 23:42:41 +01:00
Kawrakow
6f9939d119 KL-divergence (#5076)
* kl-divergence: be able to save all logits to a file

* Add ability to compute KL-divergence

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 16:10:14 +02:00
Reinforce-II
780e24a22e ggml : parallelize FP32 conversion when using BLAS (#5045)
* make GGML_TASK_INIT phase can be run in multithread

* multithreaded dequantize in mul_mat when using blas library

* minor fixes

* update outdated comment
* fix coding style

* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-22 15:15:08 +02:00
XiaotaoChen
3ce7e8f8e7 llava : MobileVLM support (#4954)
* MobileVLM native implementation

* delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake

* move android script to example/llava directory

* Fix the editor config checks

---------

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22 15:09:35 +02:00
Someone Serge
b2d80e105a flake.nix: add a comment about flakes vs nix 2024-01-22 12:19:30 +00:00
Someone Serge
28603cd283 nix: add a comment on the many nixpkgs-with-cuda instances 2024-01-22 12:19:30 +00:00
Someone Serge
5e97ec91ae nix: add a comment about makeScope 2024-01-22 12:19:30 +00:00
Someone Serge
7251870780 nix: refactor the cleanSource rules 2024-01-22 12:19:30 +00:00
Someone Serge
fe8b3c0d4b workflows: nix-ci: drop the redundant "paths" filter 2024-01-22 12:19:30 +00:00