Commit graph

2899 commits

Author SHA1 Message Date
Julia Longtin
2ed306623c allow using code from ggml-phi-knc-dot_q5_K_q8_K.c 2024-05-13 22:12:50 +00:00
Julia Longtin
d5f39c3caa force it to compile. 2024-05-13 22:11:16 +00:00
Julia Longtin
b794e48ff8 tell ggml-common.h to export what we want. 2024-05-13 22:11:16 +00:00
Julia Longtin
2c5daab90f pull in ggml specific types. 2024-05-13 22:11:16 +00:00
Julia Longtin
7080280c5b import stdio.h for size_t. 2024-05-13 22:11:16 +00:00
Julia Longtin
96dce97091 import stdint.h for size_t. 2024-05-13 22:11:16 +00:00
Julia Longtin
0e6c910db9 begin work on targeting dot_q5_K_q8_K. 2024-05-13 22:11:16 +00:00
Julia Longtin
16cbe5dd81 be more specific about the length of our list of run amounts. 2024-05-13 22:11:16 +00:00
Julia Longtin
c605e951dc spacing changes. 2024-05-13 22:11:16 +00:00
Julia Longtin
56be29fc58 formatting changes. 2024-05-13 22:11:16 +00:00
Julia Longtin
97c69835dc use the same header as ggml.c, and remove some warnings. 2024-05-13 22:11:16 +00:00
Julia Longtin
580a347e59 remove intrinsics import, and use upConv to save 12 bytes of memory transit. 2024-05-13 22:11:15 +00:00
Julia Longtin
9ba28eaed3 Update ggml-phi-knc.c 2024-05-13 22:11:15 +00:00
Julia Longtin
72e2b13185 add a benchmark / test binary. 2024-05-13 22:11:15 +00:00
Julia Longtin
6f699fc98d merge from upstream 2024-05-13 22:11:15 +00:00
Julia Longtin
926b0e8076 Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
6e1b77ad58 Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
f940c96aac Update ggml.c 2024-05-13 22:11:15 +00:00
Julia Longtin
2458643dac implement F32 dot products. 2024-05-13 22:11:15 +00:00
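The series above hand-writes a Knights Corner kernel for this operation. As a reference for the semantics that kernel has to reproduce, here is a minimal numpy sketch of a plain F32 dot product; an illustration only, not the intrinsic code from ggml-phi-knc.c:

```python
import numpy as np

# Reference semantics for the F32 dot product: the hand-written KNC intrinsic
# version is expected to match this simple fp32 accumulation (sketch only).
def vec_dot_f32(x: np.ndarray, y: np.ndarray) -> float:
    assert x.dtype == np.float32 and y.dtype == np.float32 and x.shape == y.shape
    return float(np.dot(x, y))

x = np.linspace(-1.0, 1.0, 512, dtype=np.float32)
y = np.ones(512, dtype=np.float32)
print(vec_dot_f32(x, y))  # sums the elements of x, ~0.0 here by symmetry
```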
Julia Longtin
59ce785f61 import intrinsics. 2024-05-13 22:11:15 +00:00
Julia Longtin
c08ddb831f use right type, and define GGML_F32_VEC_ZERO. 2024-05-13 22:11:15 +00:00
Julia Longtin
25095cac23 try to implement one intrinsic 2024-05-13 22:11:15 +00:00
Julia Longtin
8f6e535edc try to detect the PHI cross compiler in make. 2024-05-13 22:11:15 +00:00
Julia Longtin
f7f174ecc9 try to detect the PHI cross compiler in make. 2024-05-13 22:11:15 +00:00
Julia Longtin
b9e2f2a332 instead of checking on glibc, check on SYS_getcpu 2024-05-13 22:11:10 +00:00
Julia Longtin
78291d93b9 handle the case that we have no glibc on the PHI. 2024-05-13 22:05:33 +00:00
Julia Longtin
757f952046 add detection of Xeon PHI: Knights Corner. 2024-05-13 22:03:26 +00:00
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion

* convert-hf : add missing ftype

This was messing with the checksums otherwise.

* convert-hf : add missing ftype to Baichuan and Xverse

I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
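For context on what a direct Q8_0 conversion produces, here is a minimal numpy sketch of the Q8_0 block scheme as ggml defines it: blocks of 32 values, an fp16 scale d = amax/127, and int8 quants. This illustrates the format only; it is not the gguf-py code added in #7234.

```python
import numpy as np

QK8_0 = 32  # ggml Q8_0 block size

def quantize_q8_0_block(x: np.ndarray):
    """Quantize one block of 32 fp32 values to Q8_0 (sketch of the reference
    scheme: per-block scale d = amax/127, int8 quants q = round(x/d))."""
    assert x.shape == (QK8_0,) and x.dtype == np.float32
    amax = float(np.abs(x).max())
    d = amax / 127.0 if amax != 0.0 else 0.0
    q = np.zeros(QK8_0, dtype=np.int8) if d == 0.0 else np.round(x / d).astype(np.int8)
    return np.float16(d), q  # the scale is stored as fp16 in the block

d, q = quantize_q8_0_block(np.linspace(-1, 1, QK8_0, dtype=np.float32))
print(d, q[:4])
```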
Georgi Gerganov
614d3b914e
llama : less KV padding when FA is off (#7257)
ggml-ci
2024-05-13 17:15:15 +03:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt (#7248) 2024-05-14 00:02:36 +10:00
Johannes Gäßler
1c570d8bee
perplexity: add BF16 vs. FP16 results (#7150) 2024-05-13 13:03:27 +02:00
Neo Zhang
948f4ec7c5
[SYCL] rm wait() (#7233) 2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2

* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support

* convert.py: don't stringify Metadata load method output

* convert.py: typo fix

* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
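Clients that depended on the old implicit temperature of 0 now have to send it explicitly, which is what the updated tests do. A small sketch of such a request against a locally running llama.cpp server; the URL, port, and model name are assumptions for a local setup:

```python
import json
import urllib.request

# Explicitly send temperature=0 to keep the old greedy behaviour now that the
# OAI-compatible default is 1 (endpoint, port, and model name are assumptions).
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "temperature": 0,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```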
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi running time dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Georgi Gerganov
6f1b63606f
cmake : fix version cmp (#7227) 2024-05-12 18:30:23 +03:00
slaren
b228aba91a
remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
Georgi Gerganov
7bd4ffb780
metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f
sync : ggml 2024-05-11 21:35:05 +03:00
Georgi Gerganov
6aeff24f8b
metal : fix indent (ggml/0) 2024-05-11 21:34:21 +03:00
Georgi Gerganov
325756d28d
ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
Josh Ramer
fed0108491
Scripting & documenting debugging one test without anything else in the loop. (#7096)
* A little documentation that shares my quick tips for working in the repository.

* Update startup-testing-debugging.md

* script that shows a menu of tests to pick from & run the debugger on

* debug-test.sh: Refactor CLI help message

* debug-test.sh: documentation update

* debug-test.sh: CLI Help output corrections

* debug-test.sh: minor doc fix

---------

Co-authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
Xuan Son Nguyen
72c177c1f6
fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
compilade
5a419926b0
convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
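The bf16 path described above keeps only the high 16 bits of each fp32 weight's bit pattern (the commit uses an int16 intermediate to save memory and leaves NaNs unrounded). A rough numpy sketch of that idea, assuming round-to-nearest-even for non-NaN values and unsigned intermediates for clarity; it is not the exact gguf-py implementation:

```python
import numpy as np

def fp32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Convert fp32 weights to bf16 bit patterns stored as uint16 (sketch only)."""
    u = x.astype(np.float32).view(np.uint32)
    # Round to nearest even by adding half of the dropped mantissa, except for
    # NaNs, which pass through untouched so their payload is not disturbed.
    nan_mask = (u & 0x7FFFFFFF) > 0x7F800000
    rounded = u + (0x7FFF + ((u >> 16) & 1))
    u = np.where(nan_mask, u, rounded)
    return (u >> 16).astype(np.uint16)

# Example: 1.0, -2.5, and NaN all survive with bf16 precision.
print(fp32_to_bf16(np.array([1.0, -2.5, np.nan], dtype=np.float32)))
```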
Georgi Gerganov
fae9d234b6 sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho
f5ef34e428 feat: implemented sigmoid function (ggml/806)
* added sigmoid function

* implemented metal kernel for sigmoid

* implemented cuda kernel for sigmoid

* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00
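For reference, the new unary op computes the elementwise logistic function sigmoid(x) = 1 / (1 + exp(-x)). A small numpy sketch of that math (not the ggml, Metal, or CUDA kernel code) follows:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    # Numerically stable elementwise sigmoid: 1 / (1 + exp(-x)), computed as
    # exp(x) / (1 + exp(x)) for negative inputs to avoid overflow in exp().
    out = np.empty_like(x, dtype=np.float32)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

print(sigmoid(np.array([-4.0, 0.0, 4.0], dtype=np.float32)))  # ~[0.018, 0.5, 0.982]
```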
Borislav Stanimirov
ef0d5e3ec9 build: fix and ignore msvc warnings (ggml/805) 2024-05-11 15:38:34 +03:00