Srihari-mcw
33c8d50acc
Add provisions for Windows support for BF16 code, including a CMake provision for enabling AVX512_BF16 ( #7258 )
2024-05-20 12:18:39 +10:00
slaren
d359f30921
llama : remove MPI backend ( #7395 )
2024-05-20 01:17:03 +02:00
Fred Douglas
1ea2a0036e
quantize : fix --keep-split check ( #7374 )
2024-05-19 19:37:04 +03:00
0cc4m
f030ec1f7a
Vulkan Embedding Fix ( #7360 )
* Fix empty Vulkan host buffers
* Add fp32 fp16 matmul shader
* Fix matmul shader alignment
* Remove deprecated tensor->backend uses
* Fix Vulkan validation errors on embedding models with no offloaded layers
* Fix Vulkan llava segfault when not offloading layers
2024-05-19 17:19:53 +02:00
slaren
e4e6f67be6
ggml : fix another case of quants nans ( #7387 )
2024-05-19 17:08:46 +02:00
Johannes Gäßler
5ca49cbecd
ggml: implement quantized KV cache for FA ( #7372 )
2024-05-19 16:46:13 +02:00
Johannes Gäßler
1b01f06db0
server: add test for token probs ( #7347 )
2024-05-19 16:26:02 +02:00
Johannes Gäßler
41858392e1
server: fix seed being reported back ( #7382 )
2024-05-19 17:06:33 +03:00
Anas Ahouzi
6aade19ee7
Add StableLM2 pre-tokenizer ( #7349 )
* Add StableLM pre-tokenizer
* Fix space
* Fix trailing whitespace
2024-05-19 22:46:46 +10:00
slaren
ab33f7a338
cuda : clear error after buffer allocation failure ( #7376 )
2024-05-19 14:19:37 +02:00
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] ( #7363 )
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
recommends using the checkout action to get the correct repo context
when applying settings for PR labels, e.g.:

steps:
  - uses: actions/checkout@v4 # Uploads repository content to the runner
    with:
      repository: "owner/repositoryName" # One of the available inputs; see https://github.com/actions/checkout#readme for more
  - uses: actions/labeler@v5
    with:
      configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
854d365aba
cmake : update android comments ( #7341 )
2024-05-19 11:01:01 +03:00
teleprint-me
dcc5d4241d
fix: Remove dangling if statement
2024-05-19 00:06:30 -04:00
teleprint-me
5840b6f0b0
refactor: Simplify the get_vocab_base_pre method
2024-05-18 23:59:52 -04:00
teleprint-me
316b404d94
patch: Fix CLI option for generating vocab tests
2024-05-18 23:59:22 -04:00
teleprint-me
da5deebda1
fix: Apply fix to verbose help description and generating vocab tests option
2024-05-18 23:34:33 -04:00
teleprint-me
ce777c8910
Merge branch 'master' into auto-model-support
2024-05-18 22:46:00 -04:00
teleprint-me
d02a0f42f9
feat: Add vocab generation script
2024-05-18 22:15:12 -04:00
teleprint-me
bd32266c87
feat: Add function for generating vocab script and fix CLI opts
2024-05-18 22:14:58 -04:00
teleprint-me
0479e9695f
patch: Add exception handling for non-existent vocab related files
2024-05-18 22:14:19 -04:00
teleprint-me
4b3735ca50
chore: Remove cluttered vocab files
2024-05-18 22:13:21 -04:00
teleprint-me
1a82573126
feat: Add example script for automating generating tokenizer model checksums and tests
2024-05-18 20:49:22 -04:00
teleprint-me
006bb60d27
chore: Fix model path references
2024-05-18 19:20:19 -04:00
fraxy-v
f5bf761747
Capture CUDA logging output ( #7298 )
* logging: output capture in cuda module
* fix compile error
* fix: vsnprintf terminates the output with 0; string usage was not correct
* post review
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-05-19 00:44:42 +02:00
teleprint-me
b6f70b8a0e
chore: Fix line spacing
2024-05-18 16:59:20 -04:00
teleprint-me
832b449cbd
feat: Add pre-tokenizer CLI tooling
2024-05-18 14:33:56 -04:00
teleprint-me
04fb7886c5
chore: Apply isort to package gguf init
2024-05-18 14:33:22 -04:00
teleprint-me
2ef73ee6e4
refactor: Apply SoC for HF requests, vocab, and weights
2024-05-18 13:45:21 -04:00
teleprint-me
5eda2c9485
feat: Add pre-tokenizer logging
2024-05-18 13:21:22 -04:00
Georgi Gerganov
059031b8c4
ci : re-enable sanitizer runs ( #7358 )
* Revert "ci : temporary disable sanitizer builds (#6128 )"
This reverts commit 4f6d1337ca.
* ci : trigger
2024-05-18 18:55:54 +03:00
Georgi Gerganov
511182eabb
android : use "ci-android" branch for CI ( #7341 )
* android : use "ci-android" branch for CI
* ggml : disable SIMD exp and silu for 32-bit ARM
ggml-ci
* android : do not fetch, use add_subdirectory instead
* cmake : provide binary dir
2024-05-18 20:40:39 +10:00
Johannes Gäßler
133d99c599
CUDA: deduplicate FlashAttention code ( #7352 )
2024-05-18 12:36:25 +02:00
Johannes Gäßler
cb42c29427
server: correct --threads documentation [no ci] ( #7362 )
2024-05-18 11:10:47 +02:00
Engininja2
d233b507cd
cuda : add half2 __shfl_xor() for ROCm 5.5 ( #7263 )
2024-05-18 10:05:17 +02:00
Steffen Röcker
0f98acfac6
llama : add support for larger Granite Code Models (20B, 34B) ( #7324 )
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116
There still remain a few things to fix.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
2024-05-18 11:04:55 +03:00
strawberrymelonpanda
ca57e0f35e
perplexity : ndot progress and show stats with < 100 tasks ( #7348 )
Fix floating point error with ndot printing; allow end stats on lower task counts for multiple-choice tasks.
2024-05-18 10:57:08 +03:00
0cc4m
c1b295eea5
Update and fix Vulkan soft_max and argsort implementations ( #7237 )
* Update and fix Vulkan softmax implementation
* Update and fix Vulkan argsort implementation
2024-05-18 08:10:58 +02:00
Brian
de73196344
github-actions-labeler: initial commit ( #7330 )
* github-actions-labeler: initial commit [no ci]
* github actions: remove priority auto labeling [no ci]
2024-05-18 16:04:23 +10:00
Georgi Gerganov
b49a13dd2f
convert : fix set_vocab_sentencepiece ( #6866 )
* convert : fix set_vocab_sentencepiece
* Update convert-hf-to-gguf.py
2024-05-18 08:46:20 +03:00
teleprint-me
b2ca23c746
feat: Add method for generating the checksums and writing the results to a json file
2024-05-18 01:46:13 -04:00
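The checksum commit above describes hashing tokenizer files and writing the results to a JSON file. A minimal pure-Python sketch of that idea (the function name, layout, and file selection are hypothetical, not the actual script's):

```python
import hashlib
import json
from pathlib import Path

def write_checksums(model_dir: str, out_path: str) -> dict:
    """Hash every file under model_dir and dump the results to JSON.

    Hypothetical sketch of the checksum-generation step described in the
    commits above; the real script's names and structure may differ.
    """
    results = {}
    for path in sorted(Path(model_dir).iterdir()):
        if path.is_file():
            # sha256 over the raw bytes of each tokenizer-related file
            results[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    Path(out_path).write_text(json.dumps(results, indent=2))
    return results
```

The returned dict mirrors what lands in the JSON file, so a caller can compare a freshly downloaded tokenizer against previously recorded checksums.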
teleprint-me
302258721b
refactor: Apply model schema to tokenizer downloads
- Add imports for json and hashlib
- Add missing models: phi, stablelm, mistral, and mixtral
- Fix constructor logic
- Fix how models are accessed
- Apply model schema to download_model method
2024-05-18 01:26:39 -04:00
teleprint-me
f7515abf49
feat: Add tokenizer types, model types, and model repos
2024-05-18 00:37:19 -04:00
teleprint-me
3ba01c7a0e
chore: Fix spacing
2024-05-18 00:10:42 -04:00
teleprint-me
1a286c8e21
refactor: Clean up variable names and separate concerns when downloading tokenizers
2024-05-17 23:27:30 -04:00
teleprint-me
5c8144e645
feat: Add download_model method and fix references for clarity to mitigate confusion
2024-05-17 23:00:12 -04:00
teleprint-me
4790f76740
feat: Add prototype for requesting vocab related files
2024-05-17 21:08:39 -04:00
teleprint-me
98cf788990
patch: Apply minor fixes for handling headers and writing content
2024-05-17 21:07:51 -04:00
slaren
05834841dc
ggml : fix quants nans when all the group weights are very close to zero ( #7313 )
2024-05-18 02:39:54 +02:00
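This fix (and the follow-up in #7387 above) guards quantization against NaNs when all the weights in a group are near zero. A hedged pure-Python sketch of the failure mode and the guard, not the actual ggml kernel:

```python
def quantize_group(weights):
    """Quantize one group of floats to int8 with a single shared scale.

    Illustrates the guarded reciprocal: when every weight in the group is
    zero, the scale d is 0, and a naive 1/d would be inf, turning 0 * inf
    into NaN downstream. Sketch only; ggml's real kernels differ.
    """
    amax = max((abs(w) for w in weights), default=0.0)
    d = amax / 127.0                       # dequantization scale
    inv = 1.0 / d if d != 0.0 else 0.0     # the guard: avoid inf -> NaN
    q = [round(w * inv) for w in weights]
    return d, q

d, q = quantize_group([0.0] * 32)
# an all-zero group yields d == 0.0 and all-zero quants, with no NaNs
```

Without the `d != 0.0` check, `inv` becomes `inf` for an all-zero group and `0.0 * inf` produces NaN quants, which is the class of bug these commits address.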
Engininja2
ef277de2ad
cmake : fix typo in AMDGPU_TARGETS ( #7356 )
2024-05-18 02:39:25 +02:00
teleprint-me
742abebb39
refactor: Add log for status and fix url path variable name
2024-05-17 20:37:59 -04:00