* Use batched mul_mat pathway
* rm extra line
* Explicitly state scaled data type
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* add magika inference example
* ggml : fix unaligned accesses in custom ops
* ggml : fix FP32 GELU for values that exceed the FP16 range
* use ggml_pool_1d
* add README
* Update README.md
* pad inputs if the files are too small
* cleanup
ggml-ci
* Introduce backend GUIDs
Initial proposed implementation of backend GUIDs
(Discussed in https://github.com/ggerganov/ggml/pull/741)
Hardcoded CPU backend GUID (for now)
Change ggml_backend_is_cpu logic to use GUID
* Remove redundant functions
Remove redundant functions `ggml_backend_i::get_name` and `ggml_backend_guid` which are not desired for future expansion
* Add spaces to match style
Co-authored-by: slaren <slarengh@gmail.com>
* Fix brace style to match
Co-authored-by: slaren <slarengh@gmail.com>
* Add void to () in function signature
Co-authored-by: slaren <slarengh@gmail.com>
* Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid
* add guids to all backends
ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
* implement nfd for stripping accents in wpm tokenizer
* sort nfd map; reuse iterator
* use builtin tolower
* add locale include
* Simplify to_lower cases
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Add "/chat/completions" as alias for "/v1/chat/completions"
* merge to upstream master
* minor : fix trailing whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* WIP: make i-quants work for QK_K = 64
* iq2_xs: attempt to fix AVX dot product for QK_K = 64
Tests pass, but I get gibberish.
* QK_K = 64 tests pass on ARM_NEON and Metal
Sadly, that does not mean it actually works.
* Make CUDA compile with QK_K = 64
Tests don't pass, plus we get misaligned access
* Q2_K: fixed bug in imatrix quantization for QK_K = 64
* iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>