* Try to reduce some unused and typecast warnings
* Reduce compiler warnings step 2
* add a newline at the end of the file
* Initialize nreduce as size_t
* [SYCL] Remove pragma directives from mmq.cpp
* SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0
* SYCL softmax: Initialize nreduce as size_t
* ggml-sycl.cpp: fix some trailing whitespaces
* SYCL: remove the unused variables instead of commenting it out
* SYCL poo2d kernel: set NAN for invalid pooling op
* SYCL gemm.hpp: remove pragma directives
* SYCL gemm.hpp: use const cast to properly support dnnl::memory
* SYCL: wkv6 remove a comment
* SYCL: clean comments step 2
* SYCL: clean comments and variables step 3
* SYCL: Use GGML_UNUSED for unused variables
* SYCL: remove extra empty lines and a comment
* Remove TODO
* cleanup spaces
* add a stdout for unsupported op
* use sycl printf over fprintf
* remove prints for CI
* SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend
Move to compile time selection to backend to avoid latency at run time.
Add it to all mkl gemm calls and only for NVIDIA backend.
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* Formatting
* Address PR comments to increase readibility
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>