Julia Longtin
d5f39c3caa
force compilation.
2024-05-13 22:11:16 +00:00
Julia Longtin
b794e48ff8
tell ggml-common.h to export what we want.
2024-05-13 22:11:16 +00:00
Julia Longtin
2c5daab90f
pull in ggml-specific types.
2024-05-13 22:11:16 +00:00
Julia Longtin
7080280c5b
import stdio.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
96dce97091
import stdint.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
0e6c910db9
begin work on targeting dot_q5_K_q8_K.
2024-05-13 22:11:16 +00:00
Julia Longtin
16cbe5dd81
be more specific about the length of our list of run amounts.
2024-05-13 22:11:16 +00:00
Julia Longtin
c605e951dc
spacing changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
56be29fc58
formatting changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
97c69835dc
use the same header as ggml.c, and remove some warnings.
2024-05-13 22:11:16 +00:00
Julia Longtin
580a347e59
remove intrinsics import, and use upConv to save 12 bytes of memory traffic.
2024-05-13 22:11:15 +00:00
Julia Longtin
9ba28eaed3
Update ggml-phi-knc.c
2024-05-13 22:11:15 +00:00
Julia Longtin
72e2b13185
add a benchmark / test binary.
2024-05-13 22:11:15 +00:00
Julia Longtin
6f699fc98d
merge from upstream
2024-05-13 22:11:15 +00:00
Julia Longtin
926b0e8076
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
6e1b77ad58
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
f940c96aac
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
2458643dac
implement F32 dot products.
2024-05-13 22:11:15 +00:00
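For context, the kernel being implemented here reduces to the following scalar loop. This is a hypothetical reference version (the names are ours, not the repository's); the Knights Corner code computes the same sum 16 float lanes at a time in its 512-bit registers.

    #include <stddef.h>

    /* hypothetical scalar reference for an F32 dot product; the
     * vectorized kernel accumulates the same sum in wider strides. */
    static float dot_f32(const float * x, const float * y, size_t n) {
        float sum = 0.0f;
        for (size_t i = 0; i < n; ++i) {
            sum += x[i] * y[i];
        }
        return sum;
    }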
Julia Longtin
59ce785f61
import intrinsics.
2024-05-13 22:11:15 +00:00
Julia Longtin
c08ddb831f
use the right type, and define GGML_F32_VEC_ZERO.
2024-05-13 22:11:15 +00:00
Julia Longtin
25095cac23
try to implement one intrinsic
2024-05-13 22:11:15 +00:00
Julia Longtin
8f6e535edc
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
f7f174ecc9
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
b9e2f2a332
instead of checking for glibc, check for SYS_getcpu
2024-05-13 22:11:10 +00:00
Julia Longtin
78291d93b9
handle the case that we have no glibc on the PHI.
2024-05-13 22:05:33 +00:00
Julia Longtin
757f952046
add detection of Xeon PHI: Knights Corner.
2024-05-13 22:03:26 +00:00
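The three commits above replace a glibc assumption with a check on the raw syscall. A minimal sketch of that idea, assuming a Linux target that defines SYS_getcpu (this is our illustration, not the repository's actual detection code):

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    #if defined(SYS_getcpu)
    /* glibc's getcpu() wrapper may be missing on the PHI's stripped
     * libc, so invoke the syscall directly. */
    static int current_cpu(void) {
        unsigned int cpu = 0, node = 0;
        if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0) {
            return -1;
        }
        return (int) cpu;
    }
    #endif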
compilade
ee52225067
convert-hf : support direct Q8_0 conversion ( #7234 )
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
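As background for the Q8_0 work: the format stores weights in blocks of 32 with one scale per block. A hedged sketch of the quantization step follows; the struct mirrors ggml's block_q8_0 layout except that ggml keeps the scale as fp16, and the details should be treated as assumptions rather than the converter's exact code.

    #include <math.h>
    #include <stdint.h>

    #define QK8_0 32

    typedef struct {
        float  d;          /* per-block scale (fp16 in ggml itself) */
        int8_t qs[QK8_0];  /* quantized weights                     */
    } block_q8_0_sketch;

    /* scale by the block's max |x| so every value fits in [-127, 127] */
    static void quantize_block_q8_0(const float * x, block_q8_0_sketch * b) {
        float amax = 0.0f;
        for (int i = 0; i < QK8_0; ++i) {
            const float ax = fabsf(x[i]);
            if (ax > amax) amax = ax;
        }
        b->d = amax / 127.0f;
        const float id = b->d != 0.0f ? 1.0f / b->d : 0.0f;
        for (int i = 0; i < QK8_0; ++i) {
            b->qs[i] = (int8_t) roundf(x[i] * id);
        }
    }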
Georgi Gerganov
614d3b914e
llama : less KV padding when FA is off ( #7257 )
ggml-ci
2024-05-13 17:15:15 +03:00
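A one-function sketch of the idea in the title: pad the KV cache length coarsely only when FlashAttention needs it. The pad sizes below are assumptions for illustration, not the constants in llama.cpp.

    /* hedged sketch: round n up to the padding granularity,
     * which is larger only when FlashAttention is enabled. */
    static int kv_cache_pad(int n, int flash_attn) {
        const int pad = flash_attn ? 256 : 32;
        return (n + pad - 1) / pad * pad;
    }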
k.h.lai
30e70334f7
llava-cli: fix base64 prompt ( #7248 )
2024-05-14 00:02:36 +10:00
Johannes Gäßler
1c570d8bee
perplexity: add BF16 vs. FP16 results ( #7150 )
2024-05-13 13:03:27 +02:00
Neo Zhang
948f4ec7c5
[SYCL] rm wait() ( #7233 )
2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 ( #7249 )
* refactor: rename jina tokenizers to v2
* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support ( #4858 )
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 ( #7226 )
* change default temperature of OAI compat API from 0 to 1
* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package ( #7241 )
* add oneapi runtime dlls to release package
* fix path
* fix path
* fix path
* fix path
* fix path
---------
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 ( #7235 )
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel ( #7188 )
* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Georgi Gerganov
6f1b63606f
cmake : fix version cmp ( #7227 )
2024-05-12 18:30:23 +03:00
slaren
b228aba91a
remove convert-lora-to-ggml.py ( #7204 )
2024-05-12 02:29:33 +02:00
Georgi Gerganov
7bd4ffb780
metal : fix warnings (skipme) ( #0 )
2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f
sync : ggml
2024-05-11 21:35:05 +03:00
Georgi Gerganov
6aeff24f8b
metal : fix indent (ggml/0)
2024-05-11 21:34:21 +03:00
Georgi Gerganov
325756d28d
ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
Josh Ramer
fed0108491
Script and document how to debug a single test without anything else in the loop. ( #7096 )
* A little documentation that shares my quick tips for working in the repository.
* Update startup-testing-debugging.md
* script that shows a menu of tests to pick from & run the debugger on
* debug-test.sh: Refactor CLI help message
* debug-test.sh: documentation update
* debug-test.sh: CLI Help output corrections
* debug-test.sh: minor doc fix
---------
Authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
Xuan Son Nguyen
72c177c1f6
fix system prompt handling ( #7153 )
2024-05-11 17:28:10 +02:00
compilade
5a419926b0
convert-hf : support bfloat16 conversion ( #7158 )
* convert-hf : support bfloat16 conversion
* gguf-py : flake8 fixes
* convert-hf : add missing space after comma
* convert-hf : get bit-exact same output as ./quantize
The quantization version was missing.
* convert-hf : don't round bf16 NaNs
* convert-hf : save some memory with np.int16 intermediate bf16 weights
* convert-hf : more closely match llama.cpp with which weights to keep in f32
* convert-hf : add --outtype auto-f16
This exists for model quantizers who want an initial GGUF with the most
fidelity to the original model while still using a 16-bit float type
instead of 32-bit floats.
* convert-hf : remove a semicolon because flake8 doesn't like it
It's a reflex from when programming in C/C++, I guess.
* convert-hf : support outtype templating in outfile name
* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
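The "don't round bf16 NaNs" bullet points at a subtlety of fp32-to-bf16 conversion: round-to-nearest-even can carry into the exponent and turn a NaN into infinity. A sketch of the usual technique in C (the converter itself does this on the NumPy side, so treat this as an illustration, not its exact code):

    #include <stdint.h>
    #include <string.h>

    /* fp32 -> bf16 with round-to-nearest-even; NaNs are truncated
     * instead of rounded so they cannot become infinities. */
    static uint16_t fp32_to_bf16(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        if ((u & 0x7fffffff) > 0x7f800000) {
            return (uint16_t) ((u >> 16) | 64);  /* keep it a quiet NaN */
        }
        u += 0x7fff + ((u >> 16) & 1);  /* round to nearest, ties to even */
        return (uint16_t) (u >> 16);
    }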
Georgi Gerganov
fae9d234b6
sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho
f5ef34e428
feat: implemented sigmoid function (ggml/806)
* added sigmoid function
* implemented metal kernel for sigmoid
* implemented cuda kernel for sigmoid
* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00
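The op itself is the elementwise logistic function 1 / (1 + e^-x). A minimal CPU sketch; the metal and cuda kernels in this commit apply the same formula per element on the device:

    #include <math.h>
    #include <stddef.h>

    /* elementwise sigmoid over a float buffer */
    static void sigmoid_f32(const float * x, float * y, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            y[i] = 1.0f / (1.0f + expf(-x[i]));
        }
    }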
Borislav Stanimirov
ef0d5e3ec9
build: fix and ignore msvc warnings (ggml/805)
2024-05-11 15:38:34 +03:00
CrispStrobe
3292733f95
convert : skip inaccessible HF repos ( #7210 )
2024-05-11 11:18:35 +03:00