* Introduce bfloat16 support
Many models on Hugging Face (e.g. Mistral, TinyLLaMA) use bfloat16 as
their canonical floating point format.
      ┌sign
      │
      │   ┌exponent
      │   │
      │   │      ┌mantissa
      │   │      │
      │┌──┴───┐┌─┴───┐
    0b0000000000000000 brain16
This encoding has the same number of exponent bits as float32. That
makes conversion relatively straightforward, even in the absence of
hardware support. For example, converting brain16 to binary32 means
simply shifting 16 bits to the left.
      ┌sign
      │
      │   ┌exponent
      │   │
      │   │      ┌mantissa
      │   │      │
      │┌──┴───┐┌─┴───────────────────┐
    0b00000000000000000000000000000000 IEEE binary32
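A minimal sketch of that conversion (plain C++ with hypothetical helper names,
not necessarily the GGML API): widen the 16 bf16 bits into the top half of a
32-bit word and reinterpret it as a float; the reverse direction simply
truncates (real code may also round and handle NaN).

    #include <cstdint>
    #include <cstring>

    static float bf16_to_fp32(uint16_t h) {
        uint32_t bits = (uint32_t) h << 16;  // bf16 occupies the high 16 bits
        float f;
        std::memcpy(&f, &bits, sizeof f);    // bit-level reinterpretation
        return f;
    }

    static uint16_t fp32_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        return (uint16_t) (bits >> 16);      // plain truncation; rounding omitted
    }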
The issue is that converting bf16 to fp16 can result in information
loss. Only 13% of bf16 numbers can be precisely represented in fp16,
which in practice ends up covering 99.71% of Mistral 7b v0.2's
weights; however, there is currently no way other than fp32 to get
the others.
      ┌sign
      │
      │  ┌exponent
      │  │
      │  │    ┌mantissa
      │  │    │
      │┌─┴─┐┌─┴──────┐
    0b0000000000000000 IEEE binary16
This change fixes that by adding a bf16 data type to GGML. Support
for CPU inference has been implemented, along with optimizations for
the AVX2, AVX512, and AVX512BF16 ISAs. Perplexity on Mistral 7b 0.2
improves by somewhere around 0.0024 to 0.0046 compared to using fp16.
* Remove GGML code that's not needed
* Minimize the GGML API surface area for BF16
* Remove bf16 luts
* Make the GGML header look nicer
* Fix documentation
* Apply ggerganov's fixes for test-backend-ops
* Add BF16 code for new ggml_validate_row_data() function
* Further tidy on Android instructions README.md
Fixed some logic when following the README directions
* Clean up redundant information
A new user arriving will see simple directions on llama.cpp homepage
* corrected punctuation
Period after cmake, colon after termux
* re-word for clarity
"method" seems to be more correct than "alternative" in this context
* Organized required packages per build type
Building llama.cpp with the NDK on a PC doesn't require installing clang, cmake, git, or wget in Termux.
* README.md
corrected title
* fix trailing whitespace
* Fixed save_imatrix to match old behaviour for MoE
This fix is simple and clear, but unnecessarily doubles the memory overhead.
* Fixed missing idx variable
* Unconditionally increment ncall
Co-authored-by: slaren <slarengh@gmail.com>
* Fixed 2 bugs in save_imatrix()
- Fixed a segfault caused by the counts vector not being created.
- Fixed a pre-existing bug that didn't actually add to the counts for the "--combine" option.
* ncall needs summing too
* Trailing whitespace
---------
Co-authored-by: slaren <slarengh@gmail.com>
* Update log text (EOS to EOG)
The log text "found EOS" is no longer always correct here, because there is now an is-EOG check that also returns true for EOT.
* Improve log msg. further by using "an" instead of "some".
As suggested, to avoid misunderstanding (no multiple EOG tokens found, just one).
So that the size of the elements is explicit and fixed, and in turn
in sync with the fixed int size specified wrt the c-api, even with
any C compilers that have a different idea about int.
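A tiny hypothetical illustration of the idea (made-up name, not the actual
code): keep the element type explicitly 32-bit so the buffer stays in sync
with an int32_t-based C API, whatever a given compiler uses for int.

    #include <cstdint>
    #include <vector>

    // explicit 4-byte elements, matching a C API declared in terms of int32_t
    std::vector<int32_t> token_ids;   // rather than std::vector<int>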
Avoid some unused vars; need to update the compile flags later to
enable the corresponding warnings.
Have looked at tokenizer_config.json, the jinja file and the default
hardcoded template in llama.cpp.
This is also one of the models where a Global BoS is needed.
NOTE: Have taken the liberty to also add a SYSTEM: prefix to the
system message; even though default Vicuna doesn't seem to need it,
Vicuna-Orca seems to, so both models can be driven from the same
chat template config. I am assuming the system prefix should not
create any problem even in default Vicuna; however, if it does
create a problem, one can duplicate the existing vicuna block in
chaton_meta.json and make the system prefix empty in it.
This is the first model seen, based on the templates added till now
into the meta json file, that needs a Global Begin.
From the tokenizer_config json file, it appears that even the system
role should have an appropriate prefix, unlike what is seen in
llama.cpp's hardcoded default chat apply template and the chat jinja
template.
With this and the past few commits, there is now simple yet
sufficient support to help move multi-level-hierarchy config files
into SimpCfg's physically single-level, but (if required) logically
multi-level hierarchy flow.
Before this series of commits one could still have achieved this,
but it would have needed a bit more effort.
Use the commonality between Indian languages to show the mixup
issue with the simple-minded trim_dump logic and how trim_oversmart
could potentially avoid it.
Given that I am using valid strings to show the pitfalls of logic
driven by a fixed native char size, there is no need to keep the
dump and oversmart flows separate, so merge them into a common loop.
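A minimal sketch of the pitfall (hypothetical helpers, not the actual
functions): trimming raw bytes can match a byte in the middle of a multibyte
UTF-8 char and corrupt it, while trimming over wide chars compares whole
code points.

    #include <set>
    #include <string>

    // byte-wise trim: a trailing byte of a multibyte UTF-8 char may be
    // mistaken for a byte in the trim set and get chopped off
    static std::string trim_dumb(std::string s, const std::set<char> &trimSet) {
        while (!s.empty() && trimSet.count(s.back())) s.pop_back();
        return s;
    }

    // code-point-wise trim: operates on wchar_t, so chars decoded from
    // multibyte utf8 are compared as whole units
    static std::wstring trim_oversmart(std::wstring s, const std::set<wchar_t> &trimSet) {
        while (!s.empty() && trimSet.count(s.back())) s.pop_back();
        return s;
    }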
Update the notes to match the now templated flow and some of the
nitty-gritties involved.
Update DumpHexString to be templated.
Split the check-nonenglish flow wrt trim dumb and oversmart testing,
so that things which work with one, but not the other, can be
differentiated in the flow.
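A rough sketch of what the templated hex dump might look like (assumed shape,
not the actual helper), keyed off the string's value_type so it works for
std::string and std::wstring alike.

    #include <cstdio>
    #include <string>
    #include <type_traits>

    template <typename TString>
    void dump_hex_string(const TString &s) {
        // make_unsigned instead of a uint8_t cast, since wchar_t isn't 8-bit
        using CU = typename std::make_unsigned<typename TString::value_type>::type;
        for (auto c : s) {
            // print each code unit at its natural width (2 hex digits per byte)
            std::printf("%0*lx ", (int) (2 * sizeof(CU)), (unsigned long) (CU) c);
        }
        std::printf("\n");
    }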
The constructor method doesn't convert wstring to string when it
involves non-English chars which will encode to multibyte chars in
utf8, even though it does work for the already-utf8 u8string.
wcstombs doesn't seem to work for non-English chars when the locale
is set to the default "C"; it needs to be changed to something like
en_US.UTF-8 to allow it to do the conversion properly.
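A small standalone sketch of that wcstombs behaviour (not the SimpCfg code;
assumes an en_US.UTF-8 locale is installed on the machine):

    #include <clocale>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const wchar_t *wide = L"नमस्ते";  // non-English chars
        char buf[64];
        // default "C" locale: converting non-ASCII chars typically fails
        size_t n1 = std::wcstombs(buf, wide, sizeof(buf));
        std::setlocale(LC_ALL, "en_US.UTF-8");  // switch to a UTF-8 locale
        size_t n2 = std::wcstombs(buf, wide, sizeof(buf));
        std::printf("C locale: %s, UTF-8 locale: %zu bytes\n",
                    n1 == (size_t) -1 ? "failed" : "ok", n2);
        return 0;
    }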
Separate out the checks wrt the different string types.
Add a wstring_basic, which verifies that the wstring iterator
handles non-English chars properly, or at least better.
Without using imbue, I couldn't get non-English wstrings to print on
Mac. Need to check on Linux also.
Also avoid the uint8_t typecasting, given that wchar isn't 8-bit.
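And a tiny sketch of the imbue point above (again assuming an en_US.UTF-8
locale is available):

    #include <iostream>
    #include <locale>

    int main() {
        // without imbue, non-English wide chars may be dropped on Mac
        std::wcout.imbue(std::locale("en_US.UTF-8"));  // throws if locale missing
        std::wcout << L"नमस्ते" << L"\n";
        return 0;
    }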