Q6_K AVX improvements (#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
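
The "-32 offset" bullet above rests on a simple identity, sketched here in scalar C. Q6_K stores each 6-bit quant as an unsigned value in 0..63 with an implied offset of 32, so sum(scale * (q - 32) * y) equals sum(scale * q * y) minus 32 * sum(scale * bsum), where bsum is the sum of the 16 activation values in one group. This is not the AVX code from ggml-quants.c: the function names, the tiny 4-group block size, and the test data are all hypothetical, chosen only to show that the two forms agree.

```c
#include <stdint.h>

#define N_GROUPS 4   /* hypothetical: a real Q6_K block has more groups */
#define GROUP    16  /* values per scale group */

/* Reference form: subtract 32 from every quant inside the inner loop. */
static int32_t dot_naive(const uint8_t *q, const int8_t *y, const int8_t *scales) {
    int32_t sum = 0;
    for (int g = 0; g < N_GROUPS; ++g) {
        int32_t acc = 0;
        for (int i = 0; i < GROUP; ++i) {
            acc += ((int32_t)q[g * GROUP + i] - 32) * y[g * GROUP + i];
        }
        sum += scales[g] * acc;
    }
    return sum;
}

/* Offset handled once per group using the precomputed group sums of y. */
static int32_t dot_with_bsums(const uint8_t *q, const int8_t *y,
                              const int8_t *scales, const int16_t *bsums) {
    int32_t sum = 0;
    int32_t corr = 0;
    for (int g = 0; g < N_GROUPS; ++g) {
        int32_t acc = 0;
        for (int i = 0; i < GROUP; ++i) {
            acc += (int32_t)q[g * GROUP + i] * y[g * GROUP + i]; /* no subtract here */
        }
        sum  += scales[g] * acc;
        corr += scales[g] * bsums[g];
    }
    return sum - 32 * corr;
}

/* Returns 1 when both forms give the same result on deterministic data. */
int q6k_offset_demo(void) {
    uint8_t q[N_GROUPS * GROUP];
    int8_t  y[N_GROUPS * GROUP];
    int8_t  scales[N_GROUPS];
    int16_t bsums[N_GROUPS];

    for (int i = 0; i < N_GROUPS * GROUP; ++i) {
        q[i] = (uint8_t)((i * 7 + 3) % 64);        /* unsigned 6-bit quant */
        y[i] = (int8_t)(((i * 13) % 255) - 127);   /* int8 activation */
    }
    for (int g = 0; g < N_GROUPS; ++g) {
        scales[g] = (int8_t)(g * 5 - 7);
        bsums[g] = 0;
        for (int i = 0; i < GROUP; ++i) {
            bsums[g] += y[g * GROUP + i];
        }
    }
    return dot_naive(q, y, scales) == dot_with_bsums(q, y, scales, bsums);
}
```

Calling q6k_offset_demo() from a small main is enough to check the identity. Moving the subtraction out of the inner loop is what makes the group sums worth precomputing: the per-element work drops to a plain unsigned-by-signed multiply-accumulate, and the correction costs one multiply per group.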
Eve 2024-11-04 22:06:31 +00:00 committed by GitHub
parent d5a409e57f
commit 3407364776
2 changed files with 38 additions and 51 deletions


@@ -92,7 +92,7 @@ jobs:
name: llama-bin-macos-arm64.zip
macOS-latest-cmake-x64:
-runs-on: macos-12
+runs-on: macos-13
steps:
- name: Clone