Commit graph

2449 commits

Author SHA1 Message Date
Julia Longtin
f09b3ed79e use quotes properly. 2024-03-23 20:53:16 +00:00
Julia Longtin
bb5eb95816 use better memory save operator. 2024-03-23 20:49:11 +00:00
Julia Longtin
9d7ca41703 expand mask, and align memory. 2024-03-23 20:48:43 +00:00
Julia Longtin
bd6d7e6238 try to use vectorized zeroing function. 2024-03-23 19:55:12 +00:00
Julia Longtin
f985372e3a add missing variable. 2024-03-23 19:49:16 +00:00
Julia Longtin
31d4f9312b copy right block. 2024-03-23 19:47:21 +00:00
Julia Longtin
e43a63e7c6 fix typo. 2024-03-23 16:29:30 +00:00
Julia Longtin
f092a10dc9 promote aux16 into a vector. (part three) 2024-03-23 16:27:11 +00:00
Julia Longtin
c72157a5a6 promote aux16 into a vector. 2024-03-23 16:24:11 +00:00
Julia Longtin
e3503c924a promote aux16 into a vector. 2024-03-23 16:21:20 +00:00
Julia Longtin
edb76ffddb formatting improvement. 2024-03-23 16:19:17 +00:00
Julia Longtin
6face8a0be first fixes. 2024-03-23 15:56:47 +00:00
Julia Longtin
0a2051aa88 attempt to speed up float clearing. 2024-03-23 15:55:00 +00:00
Julia Longtin
0b012c03ef allow using code from ggml-phi-knc-dot_q5_K_q8_K.c 2024-03-23 15:02:56 +00:00
Julia Longtin
0b3f17127f force to compile. 2024-03-23 14:58:33 +00:00
Julia Longtin
18f353987c tell ggml-common.h to export what we want. 2024-03-23 14:49:35 +00:00
Julia Longtin
cd20404250 pull in ggml specific types. 2024-03-23 14:38:15 +00:00
Julia Longtin
8f57803f58 import stdio.h for size_t. 2024-03-23 14:29:59 +00:00
Julia Longtin
9bcb8350d5 import stdint.h for sizeSt. 2024-03-23 14:28:29 +00:00
Julia Longtin
a7bd64c130 begin work on targeting dot_q5_K_q8_K. 2024-03-23 14:19:47 +00:00
Julia Longtin
9185e14922 be more specific about the length of our list of run amounts. 2024-03-21 20:38:49 +00:00
Julia Longtin
0979522fbe spacing changes. 2024-03-21 18:36:25 +00:00
Julia Longtin
ac3637142d formatting changes. 2024-03-20 21:34:12 +00:00
Julia Longtin
76e66e77c2 use the same header as ggml.c, and remove some warnings. 2024-03-20 21:12:22 +00:00
Julia Longtin
ee27148629 remove intrinsics import, and use upConv to save 12 bytes of memory transit. 2024-03-20 20:15:30 +00:00
Julia Longtin
ab6f3a8a8d Update ggml-phi-knc.c 2024-03-17 21:36:14 +00:00
Julia Longtin
f882673ba6 add a benchmark / test binary. 2024-03-17 21:20:14 +00:00
Julia Longtin
fe663c1b63 merge from upstream 2024-03-17 21:15:32 +00:00
Julia Longtin
eac00a72d5 Update ggml.c 2024-03-16 14:17:21 +00:00
Julia Longtin
e216a2f133 Update ggml.c 2024-03-16 14:15:51 +00:00
Julia Longtin
257ffd9955 Update ggml.c 2024-03-16 14:13:22 +00:00
Julia Longtin
717e164dd7 implement F32 dot products. 2024-03-16 14:05:03 +00:00
Julia Longtin
7a57feba0c import intrinsics. 2024-03-13 19:26:54 +00:00
Julia Longtin
a1ae649662 use right type, and define GGML_F32_VEC_ZERO. 2024-03-13 19:23:53 +00:00
Julia Longtin
f346a41deb try to implement one intrinsic 2024-03-13 19:18:10 +00:00
Julia Longtin
aec982eefd try to detect the PHI cross compiler in make. 2024-03-12 21:54:38 +00:00
Julia Longtin
a31c936c5a try to detect the PHI cross compiler in make. 2024-03-12 21:40:46 +00:00
Julia Longtin
5a2973af25 instead of checking on glibc, check on SYS_getcpu 2024-03-12 21:07:10 +00:00
Julia Longtin
7f3722beb6 handle the case that we have no glibc on the PHI. 2024-03-12 21:02:14 +00:00
Julia Longtin
868a2016ac add detection of Xeon PHI: Knights Corner. 2024-03-12 20:57:43 +00:00
slaren
306d34be7a ci : remove tidy-review (#6021) 2024-03-12 17:55:19 +02:00
Georgi Gerganov
8030da7afe
ggml : reuse quantum structs across backends (#5943)
* ggml : reuse quant blocks across backends

ggml-ci

* ggml : define helper constants only for CUDA and SYCL

ggml-ci

* ggml : define helper quantum constants for SYCL

ggml-ci
2024-03-12 14:27:20 +02:00
Georgi Gerganov
184215e783 ggml : fix UB in IQ2_S and IQ3_S (#6012) 2024-03-12 13:49:55 +02:00
Georgi Gerganov
48358b2e5b
sycl : update IQ1_S kernels (WIP - not working!) (#5995)
* sycl : try to fix after IQ1_S changes

* sycl : iq1s_grid -> iq1s_grid_gpu

* sycl : fix grid type
2024-03-12 11:15:05 +02:00
gliptic
5cdb371731 grammar : fix unnecessarily retained pointer to rules (#6003) 2024-03-11 21:59:03 +02:00
Kawrakow
44ca159faf
1.5 bit: we can do even better (#5999)
* iq1_s: we can do even better

Spent one of the 4 scale bits on a signs of a 0.125 shift.
I.e., quants are now -1 + delta, delta, 1 + delta, where delta
is +/- 0.125.

CUDA works, same performance as before.
PPL(LLaMA-v2-7B) is now 11.85!

* iq1_s: make scalar and AVX2 work with the new version

* iq1_s: make Neon work with new version.

~10% drop in performance, so will need some more work.

* iq1_s: make Metal work with new version

* iq1_s: very slightly faster dequantize on Metal

* iq1_s: fix dequantize on the CPU

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-03-11 17:53:15 +02:00
Georgi Gerganov
05b06210c9
llama : more consistent names of count variables (#5994)
* llama : more consistent names of count variables

ggml-ci

* llama : n_parallel -> n_seq_max

* common : fix param name

* examples : fix param name
2024-03-11 17:49:47 +02:00
Georgi Gerganov
83796e62bc
llama : refactor unicode stuff (#5992)
* llama : refactor unicode stuff

ggml-ci

* unicode : names

* make : fix c++ compiler

* unicode : names

* unicode : straighten tables

* zig : fix build

* unicode : put nfd normalization behind API

ggml-ci

* swift : fix build

* unicode : add BOM

* unicode : add <cstdint>

ggml-ci

* unicode : pass as cpts as const ref
2024-03-11 17:47:47 +02:00
Jakub N
828defefb6 Update server docker image URLs (#5997) 2024-03-11 14:40:42 +01:00
Xuan Son Nguyen
caa106d4e0
Server: format error to json (#5961)
* server: format error to json

* server: do not crash on grammar error

* fix api key test case

* revert limit max n_predict

* small fix

* correct coding style

* update completion.js

* launch_slot_with_task

* update docs

* update_slots

* update webui

* update readme
2024-03-11 10:56:41 +01:00