Commit graph

1986 commits

Author SHA1 Message Date
luoyu-intel
d5f7d364f6 remove sycl version from include path 2024-01-23 02:38:44 +00:00
luoyu-intel
57e9fbadb2 fix return type 2024-01-23 02:38:44 +00:00
Neo Zhang Jianyu
593ce001e2 Update README_sycl.md 2024-01-23 02:38:44 +00:00
jianyuzh
d80dd65f42 dos2unix 2024-01-23 02:38:44 +00:00
jianyuzh
09b5619df4 rm rear space 2024-01-23 02:38:44 +00:00
jianyuzh
7350fd48ef add ls-sycl-device, rm unused files 2024-01-23 02:38:44 +00:00
jianyuzh
0d6e7219b6 add ls-sycl-device tool 2024-01-23 02:38:44 +00:00
jianyuzh
79d30d7713 add run script, comment debug code 2024-01-23 02:38:44 +00:00
jianyuzh
a8936f4902 set nthread=1 when sycl, increase performance 2024-01-23 02:38:44 +00:00
jianyuzh
95daece908 fix build with sycl 2024-01-23 02:38:44 +00:00
jianyuzh
ca2cb6982a update readme, refactor build script 2024-01-23 02:38:44 +00:00
jianyuzh
c3c5b20ac5 mv dpct definition from folder dpct to ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
c67c2ab228 refactor device log 2024-01-23 02:38:44 +00:00
jianyuzh
a47f5ec42e summarize dpct definitions in one header file to replace folder dpct 2024-01-23 02:38:44 +00:00
jianyuzh
5b5389941e fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481 2024-01-23 02:38:44 +00:00
jianyuzh
bd38129aeb add print tensor function to debug 2024-01-23 02:38:44 +00:00
jianyuzh
3645f25d74 correct queue: rm dpct:get_queue 2024-01-23 02:38:44 +00:00
jianyuzh
fa3a58605b clear CMAKE to rm unused lib and options 2024-01-23 02:38:44 +00:00
jianyuzh
c709c3cb37 ren ggml-sycl.hpp -> ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
69d76c8b58 fix error of select non-zero device, format device list 2024-01-23 02:38:44 +00:00
jianyuzh
c2ef7a9cb9 step 8, rename all macros & funcs from cuda to sycl 2024-01-23 02:38:42 +00:00
jianyuzh
3b1a743e82 step7 add debug for code path, rm log 2024-01-23 02:15:32 +00:00
jianyuzh
65f895d41b support main device is non-zero 2024-01-23 02:15:32 +00:00
jianyuzh
3a9d2c54ba step6, enhance error check, remove CUDA macro, enhance device id to fix non-zero id issue 2024-01-23 02:15:32 +00:00
jianyuzh
6dd32789b4 step 5 format device and print 2024-01-23 02:15:32 +00:00
jianyuzh
da752edaf5 add GGML_LIST_DEVICE function 2024-01-23 02:15:32 +00:00
jianyuzh
43f2c35859 step3 add fp16, slower 31->28 2024-01-23 02:15:32 +00:00
jianyuzh
02dffb68b8 step 2 2024-01-23 02:15:32 +00:00
jianyuzh
ff83711055 step 1 2024-01-23 02:15:32 +00:00
jianyuzh
0c00b4f654 add debug function, commit all helper code 2024-01-23 02:15:32 +00:00
jianyuzh
233876936b update init_cublas 2024-01-23 02:15:32 +00:00
jianyuzh
7a4343df61 first update for migration 2024-01-23 02:15:32 +00:00
slaren
011e8ec577 llama : fix not enough space in buffer with Qwen (#5086) 2024-01-22 23:42:41 +01:00
Kawrakow
6f9939d119 KL-divergence (#5076)
* kl-divergence: be able to save all logits to a file

* Add ability to compute KL-divergence

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 16:10:14 +02:00
Reinforce-II
780e24a22e ggml : parallelize FP32 conversion when using BLAS (#5045)
* make the GGML_TASK_INIT phase runnable in multiple threads

* multithreaded dequantize in mul_mat when using a BLAS library

* minor fixes

* update outdated comment
* fix coding style

* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-22 15:15:08 +02:00
XiaotaoChen
3ce7e8f8e7 llava : MobileVLM support (#4954)
* MobileVLM native implementation

* delete code related to depthwise_conv_2d and permute_cpy, replace the two with existing functions, optimize the ldp definition, and support the LLAMA_PERF option for CMake

* move android script to example/llava directory

* Fix the editor config checks

---------

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22 15:09:35 +02:00
Someone Serge
b2d80e105a flake.nix: add a comment about flakes vs nix 2024-01-22 12:19:30 +00:00
Someone Serge
28603cd283 nix: add a comment on the many nixpkgs-with-cuda instances 2024-01-22 12:19:30 +00:00
Someone Serge
5e97ec91ae nix: add a comment about makeScope 2024-01-22 12:19:30 +00:00
Someone Serge
7251870780 nix: refactor the cleanSource rules 2024-01-22 12:19:30 +00:00
Someone Serge
fe8b3c0d4b workflows: nix-ci: drop the redundant "paths" filter 2024-01-22 12:19:30 +00:00
Someone Serge
f4dd059259 workflows: nix-build-aarch64: rate limit 2024-01-22 12:19:30 +00:00
Someone Serge
f7276f7500 workflows: nix-ci: rebuild on flake.lock updates 2024-01-22 12:19:30 +00:00
Kawrakow
15bceec2d7 imatrix : keep intermediate imatrix results (#5077)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 14:18:43 +02:00
compilade
d6bd4d46dd llama : support StableLM 2 1.6B (#5052)
* llama : support StableLM 2 1.6B

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra

* convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
2024-01-22 13:21:52 +02:00
Daniel Bevenius
152d9d05e0 finetune : print sample-start/include-sample-start (#5072)
This commit adds `--sample-start` and `--include-sample-start` to the
output from the main function in finetune.cpp.

The motivation for this is that even though these are set explicitly by
the user via the command line, if one forgets to set them then it is
useful to have their values printed out. Otherwise it is possible to go
through the whole training process before realizing that the values are
not what one expected.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-22 13:11:01 +02:00
Kawrakow
66d575c45c llama : add Q3_K_XS (#5060)
* Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S

* Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K

Together with an importance matrix, this brings the perplexity
of LLaMA-v2-70B below that of the former Q2_K
with an 800 MB smaller quantized model size.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 12:43:33 +02:00
bobqianic
57744932c6 ci : fix Windows CI by updating Intel SDE version (#5053) 2024-01-22 10:55:05 +02:00
Shijie
3466c6ebcf llama : add more qwen2 models (#5071) 2024-01-22 09:33:19 +02:00
iSma
504dc37be8 Revert LLAMA_NATIVE to OFF in flake.nix (#5066) 2024-01-21 21:37:13 +00:00