Commit graph

3378 commits

Author SHA1 Message Date
Daniel Bevenius
091a7af9fe llama : add early return for empty range (#8327)
* llama : add early return for empty range

This commit adds an early return to the llama_kv_cache_seq_add and
llama_kv_cache_seq_div functions.

The motivation for adding this is to avoid looping over the cache
when the range is empty. I ran into this when using the self-extend
feature in main.cpp.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
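As context for the change described above, here is a minimal sketch of such a guard, assuming the shift walks cells whose positions fall in the range [p0, p1); the struct and field names are illustrative stand-ins, not the actual llama.cpp internals:

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the real KV cache types.
struct kv_cell  { int32_t pos = -1; };
struct kv_cache { std::vector<kv_cell> cells; };

// Shift the positions of cells in [p0, p1) by delta.
static void kv_cache_seq_add(kv_cache & cache, int32_t p0, int32_t p1, int32_t delta) {
    // Early return: an empty range means there is nothing to shift,
    // so skip looping over the whole cache.
    if (p0 == p1) {
        return;
    }

    for (auto & cell : cache.cells) {
        if (cell.pos >= p0 && cell.pos < p1) {
            cell.pos += delta;
        }
    }
}
```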

* llama : add static_cast to fix CI warning/error

This commit attempts to fix the following warning/error:

```console
src/llama.cpp:7271:31: error:
comparison of integer expressions of different signedness:
‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Werror=sign-compare]
 7271 |                         if (i < hparams.n_layer_dense_lead) {
      |                             ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
This can be reproduced locally by setting -Wsign-compare in the
Makefile.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
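A self-contained illustration of the kind of cast that silences this warning; the variable names mirror the snippet above, but the surrounding code is a made-up stand-in rather than the actual fix:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t n_layer_dense_lead = 3; // unsigned, like the hparams field above
    const int      n_layer            = 8;

    for (int i = 0; i < n_layer; ++i) {
        // Comparing a signed int with a uint32_t triggers -Wsign-compare;
        // an explicit static_cast makes both operands the same signedness.
        if (i < static_cast<int>(n_layer_dense_lead)) {
            printf("layer %d is a dense layer\n", i);
        }
    }
    return 0;
}
```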

* squash! llama : add early return for empty range

Remove the setting of cache.head to 0 when the range is empty.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Update src/llama.cpp

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-14 00:20:26 +08:00
jaime-m-p
0106884e98 Detokenizer fixes (#8039)
* Add llama_detokenize():
  - Update header files location
  - UNKNOWN and CONTROL are 'special pieces'
  - Remove space after UNKNOWN and CONTROL
  - Refactor llama_token_to_piece()
  - Add flag: clean_up_tokenization_spaces
  - Symmetric params for llama_tokenize() and llama_detokenize() (a usage sketch follows at the end of this entry)

* Update and fix tokenizer tests:
  - Using llama_detokenize()
  - Treat an unexpected vocab type as a test failure instead of an error
    - Useful when automating tests:
    - If you don't know the vocab type in advance
    - Differentiates it from other loading errors
  - Skip Unicode surrogates and undefined codepoints
  - Gracefully exit threads
    - Using exit() was throwing random exceptions
  - Clean up old known-problematic codepoints
  - Minor: fix a confusingly written hexadecimal codepoint

* Update bruteforce random tests
  - Add detokenizer checks
  - New generator: ascii_lr_strip
  - New generator: apostrophe
  - Add more vocabs files
  - Detokenize special tokens.
  - Replace errors with '\uFFFD' when detokenizing to 'utf-8'
  - More edge cases
  - Better detokenization results check

* Fix add_space_prefix; set to false by default
* Better leading-space removal
* Do not remove the space when decoding special tokens
* Bugfix: custom regexes were splitting undefined Unicode codepoints
* 'viking' detokenizer: clean up spaces
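As a usage sketch of the round trip this entry describes, the helper below calls llama_detokenize() with a parameter order inferred from the description above (symmetric with llama_tokenize()); treat the exact signature as an assumption and check llama.h for the authoritative declaration.

```cpp
#include "llama.h"

#include <string>
#include <vector>

// tokens -> text. A negative return value is assumed to mean the buffer was
// too small, with the required size given as the absolute value.
static std::string detokenize(const llama_model * model,
                              const std::vector<llama_token> & tokens,
                              bool unparse_special) {
    std::string text(tokens.size() * 4, '\0'); // rough initial guess
    int32_t n = llama_detokenize(model, tokens.data(), (int32_t) tokens.size(),
                                 &text[0], (int32_t) text.size(),
                                 /*remove_special=*/false, unparse_special);
    if (n < 0) {
        text.resize(-n);
        n = llama_detokenize(model, tokens.data(), (int32_t) tokens.size(),
                             &text[0], (int32_t) text.size(),
                             /*remove_special=*/false, unparse_special);
    }
    text.resize(n > 0 ? n : 0);
    return text;
}
```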
2024-07-14 00:20:26 +08:00
Xuan Son Nguyen
16ab65b7b9 Reorganize documentation pages (#8325)
* re-organize docs

* add link among docs

* add link to build docs

* fix style

* de-duplicate sections
2024-07-14 00:20:26 +08:00
Georgi Gerganov
401892e563 llama : fix compile warning (#8304) 2024-07-14 00:20:26 +08:00
Natsu
c667e897e9 cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) 2024-07-14 00:20:26 +08:00
Georgi Gerganov
c738f1bc89 convert : remove AWQ remnants (#8320) 2024-07-14 00:17:51 +08:00
Georgi Gerganov
655a624782 llama : minor indentation during tensor loading (#8304)
* llama : minor indentation during tensor loading

ggml-ci

* llama : use int for layer iterators [no ci]
2024-07-14 00:17:51 +08:00
Johannes Gäßler
1dfab16f5d CUDA: MMQ support for iq4_nl, iq4_xs (#8278) 2024-07-14 00:17:51 +08:00
Daniele
4bb7223486 CUDA: revert part of the RDNA1 optimizations (#8309)
The change to launch_bounds was causing a small performance drop of 25 t/s when computing perplexity
2024-07-14 00:17:51 +08:00
Douglas Hanley
d49328a3bf llama : streamline embeddings from "non-embedding" models (#8087) 2024-07-14 00:17:51 +08:00
Johannes Gäßler
972fbf7fbf CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) 2024-07-14 00:17:51 +08:00
Pieter Ouwerkerk
df8b4d8e39 readme : fix minor typos [no ci] (#8314) 2024-07-14 00:17:51 +08:00
Daniel Bevenius
53da9d276e passkey : add short intro to README.md [no-ci] (#8317)
* passkey : add short intro to README.md [no-ci]

This commit adds a short introduction to the README.md file in the
examples/passkey directory.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Update examples/passkey/README.md

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-14 00:17:51 +08:00
Georgi Gerganov
f5cb88cc73 llama : prefer n_ over num_ prefix (#8308) 2024-07-14 00:17:51 +08:00
Georgi Gerganov
8696144105 contributing : update guidelines (#8316) 2024-07-14 00:17:51 +08:00
Neo Zhang
a4c8edcb67 fix for multiple cards 2024-07-14 00:15:55 +08:00
Neo Zhang
aeaed61904
Merge pull request #1 from arthw/update_warp
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) cherry-pick b549a1bbef
2024-07-13 16:44:28 +08:00
arthw
74e3185cfd fix editorconfig check format issue 2024-07-13 16:02:15 +08:00
arthw
4cd9e48670 cherry-pick b549a1bbef,
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
    * fix group_norm ut

    * split softmax

    * fix softmax

    * add concat support condition

    * revert debug code

    * move QK_WARP_SIZE to presets.hpp

Fixes issues in the above PR:
  fix norm() nullptr leading to a crash on iGPU.
  use WARP_32_SIZE in place of QK_WARP_SIZE.
  optimize dmmv.cpp for iGPU.
  add sycl_hw.cpp to detect hardware info.
2024-07-13 14:44:38 +08:00
Georgi Gerganov
c5009e6128 py : switch to snake_case (#8305)
* py : switch to snake_case

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* cont : fix link

* gguf-py : use snake_case in scripts entrypoint export

* py : rename requirements for convert_legacy_llama.py

Needed for scripts/check-requirements.sh

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-07 21:20:52 +08:00
Xuan Son Nguyen
6d6ecd3200 cli: add EOT when user hit Ctrl+C (#8296)
* main: add need_insert_eot

* do not format system prompt if it is empty
2024-07-07 21:20:05 +08:00
Icecream95
cbfc850793 llama : add OpenELM support (#7359)
* Initial OpenELM support (270M only so far)

* Fill out missing entries in llama_model_type_name

* fixup! Initial OpenELM support (270M only so far)

Fix formatting

* llama : support all OpenELM models

* llama : add variable GQA and variable FFN sizes

Some metadata keys can now also be arrays to support setting
their value per-layer for models like OpenELM.

* llama : minor spacing changes

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : use std::array for per-layer hparams (see the sketch at the end of this entry)

* llama : fix save/load state

* llama : do not print hparams for vocab-only models

* llama : handle n_head == 0

* llama : use const ref for print_f and fix division by zero

* llama : fix t5 uses of n_head and n_ff

* llama : minor comment

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
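A rough sketch of the per-layer hyperparameter idea mentioned above (variable GQA and FFN sizes stored in fixed-size arrays indexed by layer); the struct, field names, and the max-layer constant are illustrative, not the actual llama.cpp definitions:

```cpp
#include <array>
#include <cstdint>

// Illustrative upper bound on the number of layers.
constexpr size_t MAX_LAYERS = 512;

struct hparams_sketch {
    uint32_t n_layer = 0;

    // Metadata keys that used to be scalars can now be arrays, so models
    // like OpenELM can vary head counts and FFN sizes per layer.
    std::array<uint32_t, MAX_LAYERS> n_head_arr    = {};
    std::array<uint32_t, MAX_LAYERS> n_head_kv_arr = {};
    std::array<uint32_t, MAX_LAYERS> n_ff_arr      = {};

    uint32_t n_head(uint32_t il) const { return n_head_arr[il]; }

    // Grouped-query attention ratio for a given layer; guard against
    // n_head_kv == 0 (the division-by-zero fix mentioned above).
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t n_kv = n_head_kv_arr[il];
        return n_kv == 0 ? 0 : n_head_arr[il] / n_kv;
    }
};
```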
2024-07-07 21:20:05 +08:00
Daniel Bevenius
63c6e90eab tokenize : add --show-count (token) option (#8299)
This commit adds a new option to the tokenize example, --show-count.
When this is set, the total number of tokens is printed to stdout.

This was added as an option because I was concerned that there might be
scripts that use the output from this program, and it might be better
not to print this information by default.

The motivation for this is that it can be useful to find out how many
tokens a file contains, for example when trying to determine prompt
input file sizes for testing.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-07 21:20:05 +08:00
ditsuke
498d561ab1 build: Export hf-to-gguf as snakecase 2024-07-07 21:20:05 +08:00
ditsuke
cb46165d9e doc: Add context for why we add an explicit pytorch source 2024-07-07 21:20:05 +08:00
ditsuke
ba8aea8457 chore: Remove rebase artifacts 2024-07-07 21:20:05 +08:00
ditsuke
1d1fea0b6e chore: Fixup requirements and build 2024-07-07 21:20:05 +08:00
ditsuke
1ee5d59f67 chore: ignore all __pycache__ 2024-07-07 21:20:05 +08:00
ditsuke
3aefc742fe fix: Update script paths in CI scripts 2024-07-07 21:20:05 +08:00
ditsuke
84f249c4e8 fix: Actually include scripts in build
Not namespaced though :(
2024-07-07 21:20:05 +08:00
ditsuke
2c753017ae build(python): Package scripts with pip-0517 compliance 2024-07-07 21:20:05 +08:00
fairydreaming
ff2ca9cfb7 Inference support for T5 and FLAN-T5 model families (#5763)
* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token() (a hedged usage sketch follows at the end of this entry)

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
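A hedged sketch of how the new encoder-decoder entry points named above might be wired together for a T5-style model; the batch handling is simplified and the overall flow is an assumption based on this entry, not a copy of the llama-cli code:

```cpp
#include "llama.h"

#include <vector>

// Run the encoder once over the prompt, then start decoding from the model's
// decoder start token. Sampling and the generation loop are omitted.
static int encode_then_decode(llama_context * ctx, const llama_model * model,
                              std::vector<llama_token> prompt) {
    if (llama_model_has_encoder(model)) {
        llama_batch enc_batch = llama_batch_get_one(prompt.data(), (int32_t) prompt.size(), 0, 0);
        if (llama_encode(ctx, enc_batch) != 0) {
            return 1; // encoder pass failed
        }
    }

    // Encoder-decoder models start decoding from a dedicated start token
    // rather than from the prompt itself.
    llama_token tok = llama_model_decoder_start_token(model);
    if (tok == -1) {
        tok = llama_token_bos(model); // assumed fallback when no start token is defined
    }

    llama_batch dec_batch = llama_batch_get_one(&tok, 1, 0, 0);
    if (llama_decode(ctx, dec_batch) != 0) {
        return 1; // decoder pass failed
    }
    // ... sample from the logits and keep feeding tokens back into llama_decode()
    return 0;
}
```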
2024-07-07 21:20:05 +08:00
Daniel Bevenius
3a710b6aaf tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)
This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS`
to the root cmake project.

The motivation for this is that currently the following warnings are
displayed when compiling the tests and common cmake subprojects:
```console
test-llama-grammar.cpp
C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror':
This function or variable may be unsafe. Consider using strerror_s
instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See
online help for details.
[C:\llama.cpp\build\tests\test-llama-grammar.vcxproj]
...
```

This compile definition is currently set for the `src` subproject;
this change moves it to the root cmake project so that it is applied
to all cmake subprojects.
2024-07-07 21:20:05 +08:00
Daniel Bevenius
ef1600090f llama : suppress unref var in Windows MSVC (#8150)
* llama : suppress unref var in Windows MSVC

This commit suppresses two warnings that are currently generated for
src/llama.cpp when building with MSVC on Windows:

```console
C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
```
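For reference, two common ways to silence C4101-style unreferenced-variable warnings, shown on a made-up snippet rather than the code at the lines quoted above:

```cpp
#include <cstdio>
#include <stdexcept>

int main() {
    try {
        throw std::runtime_error("boom");
    } catch (const std::exception &) {
        // Option 1: omit the variable name when the exception object is not
        // needed, so nothing is left unreferenced.
        std::puts("caught");
    }

    try {
        throw std::runtime_error("boom");
    } catch (const std::exception & e) {
        // Option 2: explicitly "use" the variable with a cast to void.
        (void) e;
        std::puts("caught again");
    }
    return 0;
}
```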

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-07 21:20:05 +08:00
Georgi Gerganov
e9d503a5d7 convert : fix gemma v1 tokenizer convert (#8248)
ggml-ci
2024-07-07 21:20:04 +08:00
Daniele
ab0e5dee19 Define and optimize RDNA1 (#8085) 2024-07-07 21:14:44 +08:00
slaren
80ffd6e497 ppl : fix n_seq_max for perplexity (#8277)
* ppl : fix n_seq_max for perplexity

* use 1 seq for kl_divergence
2024-07-07 21:14:44 +08:00
Xuan Son Nguyen
40a2a1b936 fix phi 3 conversion (#8262) 2024-07-07 21:14:44 +08:00
Neo Zhang
fdef7d606e replace get_work_group_size() with a local buffer 2024-07-04 11:55:23 +08:00
Neo Zhang
2493479958 skip UT for BF16 2024-07-04 08:28:58 +08:00
Neo Zhang
96e3826f83 update for title 2024-07-03 12:59:34 +08:00
AidanBeltonS
51be862438 Dequant improvements rebase (#8255)
* Single load for half2

* Store scales in local mem

* Vec load quantized values
2024-07-03 12:02:33 +08:00
MistApproach
85ec6c02c2 fix: add missing short command line argument -mli for multiline-input (#8261) 2024-07-03 11:51:04 +08:00
Clint Herron
044995e2d1 Removes multiple newlines at the end of files that were breaking the editorconfig step of CI. (#8258) 2024-07-03 11:47:48 +08:00
Faisal Zaghloul
6b695b5a2c Add JAIS model(s) (#8118)
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-07-03 11:44:37 +08:00
Daniel Bevenius
785f24b954 convert-hf : print output file name when completed (#8181)
* convert-hf : print output file name when completed

This commit adds the output file name to the log message when the
conversion is completed.

The motivation for this change is that when the `--outfile` option is not
specified, it might not be obvious where the output file is written.

With this change the output of running the script will be something like
the following:
```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Updates the output to support printing the directory if the output is
split into multiple files. The output file name is now also retrieved
from the model_instance object.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use parent attribute of Path object and string interpolation.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use os.sep instead of hardcoding the path separator.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-03 11:44:19 +08:00
slaren
726953cda5 cuda : update supports_op for matrix multiplication (#8245) 2024-07-03 11:42:59 +08:00
Neo Zhang
9c593619f3 fix multiple-GPU support, add device-selection mode, update the usage guide 2024-07-03 11:20:54 +08:00
Jianyu Zhang
de2763118f fix to support multiple GPUs, fix setting a single device, unify id/device_id/device_index 2024-07-03 10:21:29 +08:00
luoyu-intel
a9f3b10215
[SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
2024-07-02 12:50:07 +08:00