Commit graph

3068 commits

Author SHA1 Message Date
Srihari-mcw
33c8d50acc
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) 2024-05-20 12:18:39 +10:00
slaren
d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Fred Douglas
1ea2a0036e
quantize : fix --keep-split check (#7374) 2024-05-19 19:37:04 +03:00
0cc4m
f030ec1f7a
Vulkan Embedding Fix (#7360)
* Fix empty Vulkan host buffers

Add fp32 fp16 matmul shader

Fix matmul shader alignment

* Remove deprecated tensor->backend uses

* Fix Vulkan validation errors on embedding models with no offloaded layers

* Fix Vulkan llava segfault when not offloading layers
2024-05-19 17:19:53 +02:00
slaren
e4e6f67be6
ggml : fix another case of quants nans (#7387) 2024-05-19 17:08:46 +02:00
Johannes Gäßler
5ca49cbecd
ggml: implement quantized KV cache for FA (#7372) 2024-05-19 16:46:13 +02:00
Johannes Gäßler
1b01f06db0
server: add test for token probs (#7347) 2024-05-19 16:26:02 +02:00
Johannes Gäßler
41858392e1
server: fix seed being reported back (#7382) 2024-05-19 17:06:33 +03:00
Anas Ahouzi
6aade19ee7
Add StableLM2 pre-tokenizer (#7349)
* Add StableLM pre-tokenizer

* Fix space

* Fix trailing whitespace
2024-05-19 22:46:46 +10:00
slaren
ab33f7a338
cuda : clear error after buffer allocation failure (#7376) 2024-05-19 14:19:37 +02:00
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
The labeler documentation recommends using the checkout action so the correct repository context is available when applying PR label settings

e.g.

    steps:
    - uses: actions/checkout@v4 # Uploads repository content to the runner
      with:
        repository: "owner/repositoryName" # One of the available inputs; see https://github.com/actions/checkout#readme for more
    - uses: actions/labeler@v5
      with:
        configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
854d365aba
cmake : update android comments (#7341) 2024-05-19 11:01:01 +03:00
teleprint-me
dcc5d4241d
fix: Remove dangling if statement 2024-05-19 00:06:30 -04:00
teleprint-me
5840b6f0b0
refactor: Simplify the get_vocab_base_pre method 2024-05-18 23:59:52 -04:00
teleprint-me
316b404d94
patch: Fix CLI option for generating vocab tests 2024-05-18 23:59:22 -04:00
teleprint-me
da5deebda1
fix: Apply fix to verbose help description and generating vocab tests option 2024-05-18 23:34:33 -04:00
teleprint-me
ce777c8910
Merge branch 'master' into auto-model-support 2024-05-18 22:46:00 -04:00
teleprint-me
d02a0f42f9
feat: Add vocab generation script 2024-05-18 22:15:12 -04:00
teleprint-me
bd32266c87
feat: Add function for generating vocab script and fix CLI opts 2024-05-18 22:14:58 -04:00
teleprint-me
0479e9695f
patch: Add exception handling for non-existent vocab related files 2024-05-18 22:14:19 -04:00
teleprint-me
4b3735ca50
chore: Remove cluttered vocab files 2024-05-18 22:13:21 -04:00
teleprint-me
1a82573126
feat: Add example script for automating generating tokenizer model checksums and tests 2024-05-18 20:49:22 -04:00
teleprint-me
006bb60d27
chore: Fix model path references 2024-05-18 19:20:19 -04:00
fraxy-v
f5bf761747
Capture CUDA logging output (#7298)
* logging: output capture in cuda module

* fix compile error

* fix: vsnprintf null-terminates the buffer, previous string handling was incorrect

* post review

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-19 00:44:42 +02:00
teleprint-me
b6f70b8a0e
chore: Fix line spacing 2024-05-18 16:59:20 -04:00
teleprint-me
832b449cbd
feat: Add pre-tokenizer CLI tooling 2024-05-18 14:33:56 -04:00
teleprint-me
04fb7886c5
chore: Apply isort to package gguf init 2024-05-18 14:33:22 -04:00
teleprint-me
2ef73ee6e4
refactor: Apply SoC for HF requests, vocab, and weights 2024-05-18 13:45:21 -04:00
teleprint-me
5eda2c9485
feat: Add pre-tokenizer logging 2024-05-18 13:21:22 -04:00
Georgi Gerganov
059031b8c4
ci : re-enable sanitizer runs (#7358)
* Revert "ci : temporary disable sanitizer builds (#6128)"

This reverts commit 4f6d1337ca.

* ci : trigger
2024-05-18 18:55:54 +03:00
Georgi Gerganov
511182eabb
android : use "ci-android" branch for CI (#7341)
* android : use "ci-android" branch for CI

* ggml : disable SIMD exp and silu for 32-bit ARM

ggml-ci

* android : do not fetch, use add_subdirectory instead

* cmake : provide binary dir
2024-05-18 20:40:39 +10:00
Johannes Gäßler
133d99c599
CUDA: deduplicate FlashAttention code (#7352) 2024-05-18 12:36:25 +02:00
Johannes Gäßler
cb42c29427
server: correct --threads documentation [no ci] (#7362) 2024-05-18 11:10:47 +02:00
Engininja2
d233b507cd
cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) 2024-05-18 10:05:17 +02:00
Steffen Röcker
0f98acfac6
llama : add support for larger Granite Code Models (20B, 34B) (#7324)
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116

A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
2024-05-18 11:04:55 +03:00
strawberrymelonpanda
ca57e0f35e
perplexity : ndot progress and show stats with < 100 tasks (#7348)
Fix a floating-point error in the ndot progress printing, and show end stats even when fewer than 100 multiple-choice tasks are run.
2024-05-18 10:57:08 +03:00
0cc4m
c1b295eea5
Update and fix Vulkan soft_max and argsort implementations (#7237)
* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation
2024-05-18 08:10:58 +02:00
Brian
de73196344
github-actions-labeler: initial commit (#7330)
* github-actions-labeler: initial commit [no ci]

* github actions: remove priority auto labeling [no ci]
2024-05-18 16:04:23 +10:00
Georgi Gerganov
b49a13dd2f
convert : fix set_vocab_sentencepiece (#6866)
* convert : fix set_vocab_sentencepiece

* Update convert-hf-to-gguf.py
2024-05-18 08:46:20 +03:00
teleprint-me
b2ca23c746
feat: Add method for generating the checksums and writing the results to a json file 2024-05-18 01:46:13 -04:00
teleprint-me
302258721b
refactor: Apply model schema to tokenizer downloads
- Add imports for json and hashlib
- Add missing models: phi, stablelm, mistral, and mixtral
- Fix constructor logic
- Fix how models are accessed
- Apply model schema to download_model method
2024-05-18 01:26:39 -04:00
teleprint-me
f7515abf49
feat: Add tokenizer types, model types, and model repos 2024-05-18 00:37:19 -04:00
teleprint-me
3ba01c7a0e
chore: Fix spacing 2024-05-18 00:10:42 -04:00
teleprint-me
1a286c8e21
refactor: Clean up variable names and separate concerns when downloading tokenizers 2024-05-17 23:27:30 -04:00
teleprint-me
5c8144e645
feat: Add download_model method and fix references for clarity to mitigate confusion 2024-05-17 23:00:12 -04:00
teleprint-me
4790f76740
feat: Add prototype for requesting vocab related files 2024-05-17 21:08:39 -04:00
teleprint-me
98cf788990
patch: Apply minor fixes for handling headers and writing content 2024-05-17 21:07:51 -04:00
slaren
05834841dc
ggml : fix quants nans when all the group weights are very close to zero (#7313) 2024-05-18 02:39:54 +02:00
Engininja2
ef277de2ad
cmake : fix typo in AMDGPU_TARGETS (#7356) 2024-05-18 02:39:25 +02:00
teleprint-me
742abebb39
refactor: Add log for status and fix url path variable name 2024-05-17 20:37:59 -04:00