Julia Longtin
4477b8e123
add I32 vector memory clearing.
2024-03-23 21:16:23 +00:00
Julia Longtin
ea1edb0600
promote aux32 to a vector.
2024-03-23 21:12:35 +00:00
Julia Longtin
f967690a41
add missing address-of operators.
2024-03-23 21:05:50 +00:00
Julia Longtin
2fdd11fe3a
promote aux16 to a vector.
2024-03-23 21:00:51 +00:00
Julia Longtin
f09b3ed79e
use quotes properly.
2024-03-23 20:53:16 +00:00
Julia Longtin
bb5eb95816
use better memory save operator.
2024-03-23 20:49:11 +00:00
Julia Longtin
9d7ca41703
expand mask, and align memory.
2024-03-23 20:48:43 +00:00
Julia Longtin
bd6d7e6238
try to use vectorized zeroing function.
2024-03-23 19:55:12 +00:00
Julia Longtin
f985372e3a
add missing variable.
2024-03-23 19:49:16 +00:00
Julia Longtin
31d4f9312b
copy right block.
2024-03-23 19:47:21 +00:00
Julia Longtin
e43a63e7c6
fix typo.
2024-03-23 16:29:30 +00:00
Julia Longtin
f092a10dc9
promote aux16 into a vector. (part three)
2024-03-23 16:27:11 +00:00
Julia Longtin
c72157a5a6
promote aux16 into a vector.
2024-03-23 16:24:11 +00:00
Julia Longtin
e3503c924a
promote aux16 into a vector.
2024-03-23 16:21:20 +00:00
Julia Longtin
edb76ffddb
formatting improvement.
2024-03-23 16:19:17 +00:00
Julia Longtin
6face8a0be
first fixes.
2024-03-23 15:56:47 +00:00
Julia Longtin
0a2051aa88
attempt to speed up float clearing.
2024-03-23 15:55:00 +00:00
Julia Longtin
0b012c03ef
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
2024-03-23 15:02:56 +00:00
Julia Longtin
0b3f17127f
force to compile.
2024-03-23 14:58:33 +00:00
Julia Longtin
18f353987c
tell ggml-common.h to export what we want.
2024-03-23 14:49:35 +00:00
Julia Longtin
cd20404250
pull in ggml specific types.
2024-03-23 14:38:15 +00:00
Julia Longtin
8f57803f58
import stdio.h for size_t.
2024-03-23 14:29:59 +00:00
Julia Longtin
9bcb8350d5
import stdint.h for size_t.
2024-03-23 14:28:29 +00:00
Julia Longtin
a7bd64c130
begin work on targeting dot_q5_K_q8_K.
2024-03-23 14:19:47 +00:00
Julia Longtin
9185e14922
be more specific about the length of our list of run amounts.
2024-03-21 20:38:49 +00:00
Julia Longtin
0979522fbe
spacing changes.
2024-03-21 18:36:25 +00:00
Julia Longtin
ac3637142d
formatting changes.
2024-03-20 21:34:12 +00:00
Julia Longtin
76e66e77c2
use the same header as ggml.c, and remove some warnings.
2024-03-20 21:12:22 +00:00
Julia Longtin
ee27148629
remove intrinsics import, and use upConv to save 12 bytes of memory transit.
2024-03-20 20:15:30 +00:00
Julia Longtin
ab6f3a8a8d
Update ggml-phi-knc.c
2024-03-17 21:36:14 +00:00
Julia Longtin
f882673ba6
add a benchmark / test binary.
2024-03-17 21:20:14 +00:00
Julia Longtin
fe663c1b63
merge from upstream
2024-03-17 21:15:32 +00:00
Julia Longtin
eac00a72d5
Update ggml.c
2024-03-16 14:17:21 +00:00
Julia Longtin
e216a2f133
Update ggml.c
2024-03-16 14:15:51 +00:00
Julia Longtin
257ffd9955
Update ggml.c
2024-03-16 14:13:22 +00:00
Julia Longtin
717e164dd7
implement F32 dot products.
2024-03-16 14:05:03 +00:00
Julia Longtin
7a57feba0c
import intrinsics.
2024-03-13 19:26:54 +00:00
Julia Longtin
a1ae649662
use right type, and define GGML_F32_VEC_ZERO.
2024-03-13 19:23:53 +00:00
Julia Longtin
f346a41deb
try to implement one intrinsic
2024-03-13 19:18:10 +00:00
Julia Longtin
aec982eefd
try to detect the PHI cross compiler in make.
2024-03-12 21:54:38 +00:00
Julia Longtin
a31c936c5a
try to detect the PHI cross compiler in make.
2024-03-12 21:40:46 +00:00
Julia Longtin
5a2973af25
instead of checking for glibc, check for SYS_getcpu
2024-03-12 21:07:10 +00:00
Julia Longtin
7f3722beb6
handle the case that we have no glibc on the PHI.
2024-03-12 21:02:14 +00:00
Julia Longtin
868a2016ac
add detection of Xeon PHI: Knights Corner.
2024-03-12 20:57:43 +00:00
slaren
306d34be7a
ci : remove tidy-review (#6021)
2024-03-12 17:55:19 +02:00
Georgi Gerganov
8030da7afe
ggml : reuse quantum structs across backends (#5943)
* ggml : reuse quant blocks across backends
ggml-ci
* ggml : define helper constants only for CUDA and SYCL
ggml-ci
* ggml : define helper quantum constants for SYCL
ggml-ci
2024-03-12 14:27:20 +02:00
Georgi Gerganov
184215e783
ggml : fix UB in IQ2_S and IQ3_S (#6012)
2024-03-12 13:49:55 +02:00
Georgi Gerganov
48358b2e5b
sycl : update IQ1_S kernels (WIP - not working!) (#5995)
* sycl : try to fix after IQ1_S changes
* sycl : iq1s_grid -> iq1s_grid_gpu
* sycl : fix grid type
2024-03-12 11:15:05 +02:00
gliptic
5cdb371731
grammar : fix unnecessarily retained pointer to rules (#6003)
2024-03-11 21:59:03 +02:00
Kawrakow
44ca159faf
1.5 bit: we can do even better (#5999)
* iq1_s: we can do even better
Spent one of the 4 scale bits on the sign of a 0.125 shift.
I.e., quants are now -1 + delta, delta, 1 + delta, where delta
is +/- 0.125.
CUDA works, same performance as before.
PPL(LLaMA-v2-7B) is now 11.85!
* iq1_s: make scalar and AVX2 work with the new version
* iq1_s: make Neon work with new version.
~10% drop in performance, so will need some more work.
* iq1_s: make Metal work with new version
* iq1_s: very slightly faster dequantize on Metal
* iq1_s: fix dequantize on the CPU
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-03-11 17:53:15 +02:00
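The IQ1_S commit body above describes the new value mapping: each ternary quant becomes -1 + delta, delta, or 1 + delta, where delta is +/- 0.125 and its sign is taken from one of the four scale bits. A minimal sketch of that dequantization mapping, assuming illustrative names (this is not llama.cpp's actual kernel code):

```python
# Sketch of the IQ1_S value mapping described in the commit message.
# Each ternary quant q in {-1, 0, 1} is shifted by delta = +/- 0.125;
# the sign of delta is stored in one of the 4 scale bits.
def iq1s_dequant(q: int, delta_sign_bit: int, scale: float) -> float:
    """Map a ternary quant to its dequantized value (illustrative only)."""
    delta = 0.125 if delta_sign_bit else -0.125
    return scale * (q + delta)

# With scale 1.0 and a positive shift, quants -1, 0, 1 map to:
values = [iq1s_dequant(q, 1, 1.0) for q in (-1, 0, 1)]
# values == [-0.875, 0.125, 1.125]
```

This illustrates why the change is nearly free: the shift reuses an existing scale bit, so the quant storage format and per-block layout are unchanged.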