Commit graph

2907 commits

Author SHA1 Message Date
Julia Longtin
d5a27eb507 copyright block. 2024-05-13 22:12:54 +00:00
Julia Longtin
9f92f9730e fix typo. 2024-05-13 22:12:54 +00:00
Julia Longtin
484c4abf8d promote aux16 into a vector. (part three) 2024-05-13 22:12:54 +00:00
Julia Longtin
fb0fb9ff1b promote aux16 into a vector. 2024-05-13 22:12:54 +00:00
Julia Longtin
405b5fa731 promote aux16 into a vector. 2024-05-13 22:12:54 +00:00
Julia Longtin
b92e06456c formatting improvement. 2024-05-13 22:12:54 +00:00
Julia Longtin
ea858eee03 first fixes. 2024-05-13 22:12:54 +00:00
Julia Longtin
feed51c3f4 attempt to speed up float clearing. 2024-05-13 22:12:54 +00:00
Julia Longtin
2ed306623c allow using code from ggml-phi-knc-dot_q5_K_q8_K.c 2024-05-13 22:12:50 +00:00
Julia Longtin
d5f39c3caa force to compile. 2024-05-13 22:11:16 +00:00
Julia Longtin
b794e48ff8 tell ggml-common.h to export what we want. 2024-05-13 22:11:16 +00:00
Julia Longtin
2c5daab90f pull in ggml specific types. 2024-05-13 22:11:16 +00:00
Julia Longtin
7080280c5b import stdio.h for size_t. 2024-05-13 22:11:16 +00:00
Julia Longtin
96dce97091 import stdint.h for size_t. 2024-05-13 22:11:16 +00:00
Julia Longtin
0e6c910db9 begin work on targeting dot_q5_K_q8_K. 2024-05-13 22:11:16 +00:00
Julia Longtin
16cbe5dd81 be more specific about the length of our list of run amounts. 2024-05-13 22:11:16 +00:00
Julia Longtin
c605e951dc spacing changes. 2024-05-13 22:11:16 +00:00
Julia Longtin
56be29fc58 formatting changes. 2024-05-13 22:11:16 +00:00
Julia Longtin
97c69835dc use the same header as ggml.c, and remove some warnings. 2024-05-13 22:11:16 +00:00
Julia Longtin
580a347e59 remove intrinsics import, and use upConv to save 12 bytes of memory transit. 2024-05-13 22:11:15 +00:00
Julia Longtin
9ba28eaed3 Update ggml-phi-knc.c 2024-05-13 22:11:15 +00:00
Julia Longtin
72e2b13185 add a benchmark / test binary. 2024-05-13 22:11:15 +00:00
Julia Longtin
6f699fc98d merge from upstream 2024-05-13 22:11:15 +00:00
Julia Longtin
926b0e8076 Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
6e1b77ad58 Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
f940c96aac Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
2458643dac implement F32 dot products. 2024-05-13 22:11:15 +00:00
Julia Longtin
59ce785f61 import intrinsics. 2024-05-13 22:11:15 +00:00
Julia Longtin
c08ddb831f use the right type, and define GGML_F32_VEC_ZERO. 2024-05-13 22:11:15 +00:00
Julia Longtin
25095cac23 try to implement one intrinsic 2024-05-13 22:11:15 +00:00
Julia Longtin
8f6e535edc try to detect the PHI cross compiler in make. 2024-05-13 22:11:15 +00:00
Julia Longtin
f7f174ecc9 try to detect the PHI cross compiler in make. 2024-05-13 22:11:15 +00:00
Julia Longtin
b9e2f2a332 instead of checking for glibc, check for SYS_getcpu 2024-05-13 22:11:10 +00:00
Julia Longtin
78291d93b9 handle the case that we have no glibc on the PHI. 2024-05-13 22:05:33 +00:00
Julia Longtin
757f952046 add detection of Xeon PHI: Knights Corner. 2024-05-13 22:03:26 +00:00
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion

* convert-hf : add missing ftype

This was messing with the checksums otherwise.

* convert-hf : add missing ftype to Baichuan and Xverse

I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
Georgi Gerganov
614d3b914e
llama : less KV padding when FA is off (#7257)
ggml-ci
2024-05-13 17:15:15 +03:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt (#7248) 2024-05-14 00:02:36 +10:00
Johannes Gäßler
1c570d8bee
perplexity: add BF16 vs. FP16 results (#7150) 2024-05-13 13:03:27 +02:00
Neo Zhang
948f4ec7c5
[SYCL] rm wait() (#7233) 2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2

* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support

* convert.py: don't stringify Metadata load method output

* convert.py: typo fix

* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi runtime dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Georgi Gerganov
6f1b63606f
cmake : fix version cmp (#7227) 2024-05-12 18:30:23 +03:00
slaren
b228aba91a
remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
Georgi Gerganov
7bd4ffb780
metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f
sync : ggml 2024-05-11 21:35:05 +03:00