Commit graph

3068 commits

Author SHA1 Message Date
Srihari-mcw
33c8d50acc
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) 2024-05-20 12:18:39 +10:00
slaren
d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Fred Douglas
1ea2a0036e
quantize : fix --keep-split check (#7374) 2024-05-19 19:37:04 +03:00
0cc4m
f030ec1f7a
Vulkan Embedding Fix (#7360)
* Fix empty Vulkan host buffers

Add fp32 fp16 matmul shader

Fix matmul shader alignment

* Remove deprecated tensor->backend uses

* Fix Vulkan validation errors on embedding models with no offloaded layers

* Fix Vulkan llava segfault when not offloading layers
2024-05-19 17:19:53 +02:00
slaren
e4e6f67be6
ggml : fix another case of quants nans (#7387) 2024-05-19 17:08:46 +02:00
Johannes Gäßler
5ca49cbecd
ggml: implement quantized KV cache for FA (#7372) 2024-05-19 16:46:13 +02:00
Johannes Gäßler
1b01f06db0
server: add test for token probs (#7347) 2024-05-19 16:26:02 +02:00
Johannes Gäßler
41858392e1
server: fix seed being reported back (#7382) 2024-05-19 17:06:33 +03:00
Anas Ahouzi
6aade19ee7
Add StableLM2 pre-tokenizer (#7349)
* Add StableLM pre-tokenizer

* Fix space

* Fix trailing whitespace
2024-05-19 22:46:46 +10:00
slaren
ab33f7a338
cuda : clear error after buffer allocation failure (#7376) 2024-05-19 14:19:37 +02:00
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
The labeler documentation recommends using the checkout action so the correct repository context is available when applying PR label settings

e.g.

    steps:
    - uses: actions/checkout@v4 # Uploads repository content to the runner
      with:
        repository: "owner/repositoryName" # One of the available inputs; see https://github.com/actions/checkout#readme for more
    - uses: actions/labeler@v5
      with:
        configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
854d365aba
cmake : update android comments (#7341) 2024-05-19 11:01:01 +03:00
teleprint-me
dcc5d4241d
fix: Remove dangling if statement 2024-05-19 00:06:30 -04:00
teleprint-me
5840b6f0b0
refactor: Simplify the get_vocab_base_pre method 2024-05-18 23:59:52 -04:00
teleprint-me
316b404d94
patch: Fix CLI option for generating vocab tests 2024-05-18 23:59:22 -04:00
teleprint-me
da5deebda1
fix: Apply fix to verbose help description and generating vocab tests option 2024-05-18 23:34:33 -04:00
teleprint-me
ce777c8910
Merge branch 'master' into auto-model-support 2024-05-18 22:46:00 -04:00
teleprint-me
d02a0f42f9
feat: Add vocab generation script 2024-05-18 22:15:12 -04:00
teleprint-me
bd32266c87
feat: Add function for generating vocab script and fix CLI opts 2024-05-18 22:14:58 -04:00
teleprint-me
0479e9695f
patch: Add exception handling for non-existent vocab related files 2024-05-18 22:14:19 -04:00
teleprint-me
4b3735ca50
chore: Remove cluttered vocab files 2024-05-18 22:13:21 -04:00
teleprint-me
1a82573126
feat: Add example script for automating generating tokenizer model checksums and tests 2024-05-18 20:49:22 -04:00
teleprint-me
006bb60d27
chore: Fix model path references 2024-05-18 19:20:19 -04:00
fraxy-v
f5bf761747
Capture CUDA logging output (#7298)
* logging: output capture in cuda module

* fix compile error

* fix: vsnprintf null-terminates the buffer, previous string handling was incorrect

* post review

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-19 00:44:42 +02:00
teleprint-me
b6f70b8a0e
chore: Fix line spacing 2024-05-18 16:59:20 -04:00
teleprint-me
832b449cbd
feat: Add pre-tokenizer CLI tooling 2024-05-18 14:33:56 -04:00
teleprint-me
04fb7886c5
chore: Apply isort to package gguf init 2024-05-18 14:33:22 -04:00
teleprint-me
2ef73ee6e4
refactor: Apply SoC for HF requests, vocab, and weights 2024-05-18 13:45:21 -04:00
teleprint-me
5eda2c9485
feat: Add pre-tokenizer logging 2024-05-18 13:21:22 -04:00
Georgi Gerganov
059031b8c4
ci : re-enable sanitizer runs (#7358)
* Revert "ci : temporary disable sanitizer builds (#6128)"

This reverts commit 4f6d1337ca.

* ci : trigger
2024-05-18 18:55:54 +03:00
Georgi Gerganov
511182eabb
android : use "ci-android" branch for CI (#7341)
* android : use "ci-android" branch for CI

* ggml : disable SIMD exp and silu for 32-bit ARM

ggml-ci

* android : do not fetch, use add_subdirectory instead

* cmake : provide binary dir
2024-05-18 20:40:39 +10:00
Johannes Gäßler
133d99c599
CUDA: deduplicate FlashAttention code (#7352) 2024-05-18 12:36:25 +02:00
Johannes Gäßler
cb42c29427
server: correct --threads documentation [no ci] (#7362) 2024-05-18 11:10:47 +02:00
Engininja2
d233b507cd
cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) 2024-05-18 10:05:17 +02:00
Steffen Röcker
0f98acfac6
llama : add support for larger Granite Code Models (20B, 34B) (#7324)
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116

A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
2024-05-18 11:04:55 +03:00
strawberrymelonpanda
ca57e0f35e
perplexity : ndot progress and show stats with < 100 tasks (#7348)
Fix a floating-point error in the ndot progress printing, and show end stats even when fewer than 100 multiple-choice tasks are run.
2024-05-18 10:57:08 +03:00
0cc4m
c1b295eea5
Update and fix Vulkan soft_max and argsort implementations (#7237)
* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation
2024-05-18 08:10:58 +02:00
Brian
de73196344
github-actions-labeler: initial commit (#7330)
* github-actions-labeler: initial commit [no ci]

* github actions: remove priority auto labeling [no ci]
2024-05-18 16:04:23 +10:00
Georgi Gerganov
b49a13dd2f
convert : fix set_vocab_sentencepiece (#6866)
* convert : fix set_vocab_sentencepiece

* Update convert-hf-to-gguf.py
2024-05-18 08:46:20 +03:00
teleprint-me
b2ca23c746
feat: Add method for generating the checksums and writing the results to a json file 2024-05-18 01:46:13 -04:00
teleprint-me
302258721b
refactor: Apply model schema to tokenizer downloads
- Add imports for json and hashlib
- Add missing models: phi, stablelm, mistral, and mixtral
- Fix constructor logic
- Fix how models are accessed
- Apply model schema to download_model method
2024-05-18 01:26:39 -04:00
teleprint-me
f7515abf49
feat: Add tokenizer types, model types, and model repos 2024-05-18 00:37:19 -04:00
teleprint-me
3ba01c7a0e
chore: Fix spacing 2024-05-18 00:10:42 -04:00
teleprint-me
1a286c8e21
refactor: Clean up variable names and separate concerns when downloading tokenizers 2024-05-17 23:27:30 -04:00
teleprint-me
5c8144e645
feat: Add download_model method and fix references for clarity to mitigate confusion 2024-05-17 23:00:12 -04:00
teleprint-me
4790f76740
feat: Add prototype for requesting vocab related files 2024-05-17 21:08:39 -04:00
teleprint-me
98cf788990
patch: Apply minor fixes for handling headers and writing content 2024-05-17 21:07:51 -04:00
slaren
05834841dc
ggml : fix quants nans when all the group weights are very close to zero (#7313) 2024-05-18 02:39:54 +02:00
Engininja2
ef277de2ad
cmake : fix typo in AMDGPU_TARGETS (#7356) 2024-05-18 02:39:25 +02:00
teleprint-me
742abebb39
refactor: Add log for status and fix url path variable name 2024-05-17 20:37:59 -04:00