Julia Longtin
588a0b19cc
expand mask, and align memory.
2024-05-13 22:12:54 +00:00
Julia Longtin
3994d81bf0
try to use vectorized zeroing function.
2024-05-13 22:12:54 +00:00
Julia Longtin
e227717136
add missing variable.
2024-05-13 22:12:54 +00:00
Julia Longtin
d5a27eb507
copy right block.
2024-05-13 22:12:54 +00:00
Julia Longtin
9f92f9730e
fix typo.
2024-05-13 22:12:54 +00:00
Julia Longtin
484c4abf8d
promote aux16 into a vector. (part three)
2024-05-13 22:12:54 +00:00
Julia Longtin
fb0fb9ff1b
promote aux16 into a vector.
2024-05-13 22:12:54 +00:00
Julia Longtin
405b5fa731
promote aux16 into a vector.
2024-05-13 22:12:54 +00:00
Julia Longtin
b92e06456c
formatting improvement.
2024-05-13 22:12:54 +00:00
Julia Longtin
ea858eee03
first fixes.
2024-05-13 22:12:54 +00:00
Julia Longtin
feed51c3f4
attempt to speed up float clearing.
2024-05-13 22:12:54 +00:00
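The float-clearing commits above replace a scalar store loop with a vectorized one. As a point of reference, a minimal portable sketch of the operation being sped up (the function name here is illustrative, not the actual symbol in the tree):

```c
#include <stddef.h>
#include <string.h>

/* Zero a buffer of n floats. memset is valid here because an all-zero
 * byte pattern is 0.0f in IEEE 754; the commits above attempt to beat
 * this baseline with vector stores on Knights Corner. */
static void vec_zero_f32(size_t n, float *x) {
    memset(x, 0, n * sizeof(float));
}
```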
Julia Longtin
2ed306623c
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
2024-05-13 22:12:50 +00:00
Julia Longtin
d5f39c3caa
force to compile.
2024-05-13 22:11:16 +00:00
Julia Longtin
b794e48ff8
tell ggml-common.h to export what we want.
2024-05-13 22:11:16 +00:00
Julia Longtin
2c5daab90f
pull in ggml specific types.
2024-05-13 22:11:16 +00:00
Julia Longtin
7080280c5b
import stdio.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
96dce97091
import stdint.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
0e6c910db9
begin work on targeting dot_q5_K_q8_K.
2024-05-13 22:11:16 +00:00
Julia Longtin
16cbe5dd81
be more specific about the length of our list of run amounts.
2024-05-13 22:11:16 +00:00
Julia Longtin
c605e951dc
spacing changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
56be29fc58
formatting changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
97c69835dc
use the same header as ggml.c, and remove some warnings.
2024-05-13 22:11:16 +00:00
Julia Longtin
580a347e59
remove intrinsics import, and use upConv to save 12 bytes of memory transit.
2024-05-13 22:11:15 +00:00
Julia Longtin
9ba28eaed3
Update ggml-phi-knc.c
2024-05-13 22:11:15 +00:00
Julia Longtin
72e2b13185
add a benchmark / test binary.
2024-05-13 22:11:15 +00:00
Julia Longtin
6f699fc98d
merge from upstream
2024-05-13 22:11:15 +00:00
Julia Longtin
926b0e8076
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
6e1b77ad58
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
f940c96aac
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
2458643dac
implement F32 dot products.
2024-05-13 22:11:15 +00:00
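The "implement F32 dot products" commit targets the single-precision dot product that the surrounding commits then vectorize with Knights Corner intrinsics. A scalar reference sketch of that operation (the name and signature are illustrative, not the actual ggml symbol):

```c
#include <stddef.h>

/* Scalar F32 dot product: sum of elementwise products of x and y.
 * This is the loop the port replaces with 512-bit vector operations. */
static float vec_dot_f32(size_t n, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```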
Julia Longtin
59ce785f61
import intrinsics.
2024-05-13 22:11:15 +00:00
Julia Longtin
c08ddb831f
use right type, and define GGML_F32_VEC_ZERO.
2024-05-13 22:11:15 +00:00
Julia Longtin
25095cac23
try to implement one intrinsic
2024-05-13 22:11:15 +00:00
Julia Longtin
8f6e535edc
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
f7f174ecc9
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
b9e2f2a332
instead of checking on glibc, check on SYS_getcpu
2024-05-13 22:11:10 +00:00
Julia Longtin
78291d93b9
handle the case that we have no glibc on the PHI.
2024-05-13 22:05:33 +00:00
Julia Longtin
757f952046
add detection of Xeon PHI: Knights Corner.
2024-05-13 22:03:26 +00:00
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
Georgi Gerganov
614d3b914e
llama : less KV padding when FA is off (#7257)
ggml-ci
2024-05-13 17:15:15 +03:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt (#7248)
2024-05-14 00:02:36 +10:00
Johannes Gäßler
1c570d8bee
perplexity: add BF16 vs. FP16 results (#7150)
2024-05-13 13:03:27 +02:00
Neo Zhang
948f4ec7c5
[SYCL] rm wait() (#7233)
2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2
* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1
* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi runtime dlls to release package
* fix path
* fix path
* fix path
* fix path
* fix path
---------
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Georgi Gerganov
6f1b63606f
cmake : fix version cmp (#7227)
2024-05-12 18:30:23 +03:00