Commit graph

2875 commits

Author SHA1 Message Date
HanishKVC
6b23f15ffe ChatON:ChatOnMetaJSon: Add suffix wrt assistant messages 2024-05-06 11:27:56 +05:30
HanishKVC
d1899728aa ChatON: Test ChatParts in chat-template-apply 2024-05-06 11:27:56 +05:30
HanishKVC
9de1d6017f ChatON:ChatParts class initial go
Helps keep the user prompt and chat-hs-template tag parts separate,
but in sequence.
2024-05-06 11:27:56 +05:30
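The commit above introduces the ChatParts idea: keep chat-template tag parts and user-content parts distinguishable while preserving their order. A minimal sketch of that idea follows; the class shape, member names and Kind tags are illustrative assumptions, not the actual API from this branch.

#include <string>
#include <vector>

// Illustrative ChatParts-style container: each part remembers whether it came
// from the template (tag) or from the user, while concatenation keeps the
// original sequence.
class ChatParts {
public:
    enum class Kind { Tag, User };

    void add(Kind kind, const std::string & text) {
        parts.push_back({kind, text});
    }

    // Everything in original order, i.e. the string that would be tokenized.
    std::string str() const {
        std::string out;
        for (const auto & p : parts) {
            out += p.text;
        }
        return out;
    }

private:
    struct Part { Kind kind; std::string text; };
    std::vector<Part> parts;
};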
HanishKVC
3064a36e74 ChatON+:Update tmpl_role_kv to retrieve wrt multiple keys
Use the same for user role's begin and prefix entries.
2024-05-06 11:27:56 +05:30
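A hedged sketch of what a multi-key tmpl_role_kv lookup could look like, using nlohmann::json (the JSON library llama.cpp already uses elsewhere); the meta-file layout (template-id -> role -> key) and the key names are assumptions based on these commit messages.

#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Return the value of the first candidate key present for the given
// template-id and role, or an empty string if none is found.
static std::string tmpl_role_kv(const json & meta, const std::string & tmpl,
                                const std::string & role,
                                const std::vector<std::string> & keys) {
    const json & node = meta.at(tmpl).at(role);
    for (const auto & k : keys) {
        if (node.contains(k) && node[k].is_string()) {
            return node[k].get<std::string>();
        }
    }
    return "";
}

// e.g. the user role's begin, falling back to its prefix:
// std::string s = tmpl_role_kv(meta, "monarch", "user", {"begin", "prefix"});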
HanishKVC
f1f39c5256 ChatON:Add Monarch model template, which uses Begin + Prefix
In turn, Begin/BoS is added only for non-first user messages in a
system+user prompts chain.
2024-05-06 11:27:56 +05:30
HanishKVC
724ff38345 ChatOn: Wrap getting begin in try-catch,
so that even if a role doesn't contain a begin entry, the logic still
works fine.
2024-05-06 11:27:56 +05:30
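The try-catch idea from the commit above, sketched with nlohmann::json: a role without a begin entry simply contributes an empty begin instead of aborting. The key and structure names are taken from these commit messages, not from the actual source.

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Fetch a role's "begin" text, or fall back to "" if the role has no such
// entry, so begin+prefix handling works for every role.
static std::string role_begin_or_empty(const json & meta, const std::string & tmpl,
                                       const std::string & role) {
    try {
        return meta.at(tmpl).at(role).at("begin").get<std::string>();
    } catch (const json::exception &) {
        return "";
    }
}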
HanishKVC
d70fca7a45 ChatOn: Add begin to the mix along with prefix
Dump shows user->begin.

chat-template-apply[-single] updated to work with begin and prefix

TODO: wrap begin in a try-catch, so that begin+prefix works for any
role, irrespective of whether that role has a begin entry or not.
2024-05-06 11:27:56 +05:30
HanishKVC
0f713d4c4f ChatOn: meta json update wrt the new begin related fields 2024-05-06 11:27:56 +05:30
HanishKVC
bdd279c0c9 ChatOn:User Begin+Prefix note update, keep things simple and consistent 2024-05-06 11:27:56 +05:30
HanishKVC
84367b9fd1 ChatON: Add template for DeepSeek
While looking at the tokenized vector, I noticed that the EOS used by
llama.cpp's existing chat_apply_template differs from the one in the
tokenizer_config.json of the DeepSeek LLM, so I have added two entries:

* "deepseek-alt" which matches llama.cpp's chat_apply_template and
* "deepseek" which matches that in tokenizer_config.json.

This impacts the assistant suffix and reverse prompt entries.

Because of this, I need to revisit the other entries I added earlier at
a later time. However, since the default logic should pick the EOS from
the model file, I assume a reverse-prompt being out of sync may not
matter beyond a limit.
2024-05-06 11:27:56 +05:30
HanishKVC
f4b54069f6 ChatON: Add template for Gemma 2024-05-06 11:27:56 +05:30
HanishKVC
2a8028fba8 ChatON: Add Zephyr template to meta-json file 2024-05-06 11:27:56 +05:30
HanishKVC
57bd772bfd ChatON: Cleanup logging
Avoid showing the debug messages on screen.

meta-dump can either show on screen or not, based on how LOGXLN
is defined.
2024-05-06 11:27:56 +05:30
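Purely illustrative of the LOGXLN switch mentioned above: a compile-time definition that either prints to the screen or compiles away. The macro spelling and the guard name are guesses; the real definition lives in the ChatON sources.

#include <cstdio>

#ifdef CHATON_DEBUG_TO_SCREEN
    // debug/meta-dump lines go to stderr (uses the common ##__VA_ARGS__ extension)
    #define LOGXLN(fmt, ...) fprintf(stderr, fmt "\n", ##__VA_ARGS__)
#else
    // debug/meta-dump lines are compiled out
    #define LOGXLN(fmt, ...) do { } while (0)
#endif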
HanishKVC
217544e5ff ChatON: Keep compiler happy
Order the functions so that separate prototypes are not needed.

Also use kv_bool wrt boolean entries.

Convert string to C char *
2024-05-06 11:27:56 +05:30
HanishKVC
3f9dfc240c ChatON: Check for the boolean entries in meta-json 2024-05-06 11:27:56 +05:30
HanishKVC
42f6b45547 ChatON: Use the constants defined for the keys 2024-05-06 11:27:56 +05:30
HanishKVC
efb758ba7d ChatON: Rename helpers to kv suffix, updated wrt metaok
Renamed because they return the value of the specified key.

[main] update metaok to take a template-id, so that one can cross-check
that all needed entries for that template-id are present in the
chaton-meta-json file
2024-05-06 11:27:56 +05:30
HanishKVC
e8c24c0767 ChatOn:MetaOk: Allows template-id based cross check
For a given template-id, cross-check that all needed entries are
present in the JSON.
2024-05-06 11:27:56 +05:30
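A rough sketch of such a template-id based cross-check. The list of required entries is an assumption pieced together from the fields these commits mention (per-role prefix/suffix, reverse-prompt); the real chaton-meta-json layout is the authority.

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Return true only if the meta JSON has the entries needed for this template-id.
static bool chaton_meta_ok(const json & meta, const std::string & tmpl) {
    if (!meta.contains(tmpl)) {
        return false;
    }
    const json & node = meta[tmpl];
    for (const char * role : { "system", "user", "assistant" }) {
        if (!node.contains(role) || !node[role].contains("prefix") || !node[role].contains("suffix")) {
            return false;
        }
    }
    return node.contains("reverse-prompt");
}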
HanishKVC
b1055641e9 ChatON: Update the notes a bit 2024-05-06 11:27:56 +05:30
HanishKVC
11b47fbcfc ChatON:MetaJson: Add key constants, check metaJson loaded ifNeeded 2024-05-06 11:27:56 +05:30
HanishKVC
221ccd6462 ChatOn: Add SystemUser-1st-User-Has-Prefix flag support
Llama2 seems to need it, so the chaton-meta-json sample file has been
updated accordingly.
2024-05-06 11:27:56 +05:30
HanishKVC
f03dd2439f ChatOn:No global-begin/end in ChatApplyTmplSingle, ChatApplyTmpl
Avoid adding global begin/end markers wrt ChatApplyTmplSingle.

Add ChatApplyTmpl which goes through a vector of messages.
2024-05-06 11:27:56 +05:30
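The single-message versus vector-of-messages split described above, as a sketch. For brevity it applies one begin/prefix/suffix triple to every role and hard-codes the "skip begin for the first user message" rule from the Monarch commit; the real code looks these values up per role from the meta JSON, and no global begin/end is added in either helper.

#include <string>
#include <vector>

struct ChatMsg { std::string role; std::string content; };

// Wrap one message; begin is optional so the first user message in a
// system+user chain can skip it.
static std::string chat_apply_tmpl_single(const ChatMsg & m, bool add_begin,
                                          const std::string & begin,
                                          const std::string & prefix,
                                          const std::string & suffix) {
    return (add_begin ? begin : "") + prefix + m.content + suffix;
}

// Walk a whole conversation, wrapping each message in turn.
static std::string chat_apply_tmpl(const std::vector<ChatMsg> & msgs,
                                   const std::string & begin,
                                   const std::string & prefix,
                                   const std::string & suffix) {
    std::string out;
    bool seen_user = false;
    for (const auto & m : msgs) {
        const bool is_first_user = (m.role == "user") && !seen_user;
        if (m.role == "user") {
            seen_user = true;
        }
        out += chat_apply_tmpl_single(m, !is_first_user, begin, prefix, suffix);
    }
    return out;
}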
HanishKVC
c4cf0e9075 ChatON:Cleanup: BeginEnd, Debug log
Update the note

Rename global-prefix|suffix to global-begin|end.

Rename chat-apply-template to chat-apply-template-single, since it
handles only a single message.

Add some debug log messages to the helper functions
2024-05-06 11:27:56 +05:30
HanishKVC
d87d27512e ChatOn: update sample meta json a bit
Move [INST] [/INST] for llama2 from the global parts to the individual
role-specific parts.

Avoid an extra \n in the prefixes of llama3
2024-05-06 11:27:55 +05:30
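For concreteness, one way the reorganized llama2 entry could look, written as an embedded JSON literal parsed in C++ (the sample meta file itself is JSON). The key names and token strings below are assumptions; the sample file in the repo is the authority.

#include <nlohmann/json.hpp>

// Hypothetical shape only: [INST]/[/INST] sit in the user role's prefix/suffix
// rather than in the global begin/end.
static const nlohmann::json k_llama2_example = nlohmann::json::parse(R"({
    "llama2": {
        "global":    { "begin": "",    "end": "" },
        "user":      { "begin": "<s>", "prefix": "[INST] ", "suffix": " [/INST]" },
        "assistant": { "prefix": " ",  "suffix": " </s>" }
    }
})");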
HanishKVC
cdbe4f06ce Chaton:Sample Meta JSON cleanup 2024-05-06 11:27:55 +05:30
HanishKVC
050d329e7e ChatOn+Main: Initial go at chaton in main interactive flow 2024-05-06 11:27:55 +05:30
HanishKVC
1374a64200 Chaton:Meta: Add chatml meta data to sample meta json file 2024-05-06 11:27:55 +05:30
HanishKVC
093abc29a2 ChatOn: Update sample meta json to be a valid json 2024-05-06 11:27:55 +05:30
HanishKVC
dc56be951d ChatOn:Main: Load and dump any specified chaton meta file 2024-05-06 11:27:55 +05:30
HanishKVC
35f25196a0 ChatOn:Common: Add the needed cmdline arg params and its parsing 2024-05-06 11:27:55 +05:30
HanishKVC
2146a253e8 ChatOn: Capture the idea 2024-05-06 11:27:55 +05:30
kunnis
628b299106
Adding support for the --numa argument for llama-bench. (#7080) 2024-05-05 14:17:47 +02:00
Sigbjørn Skjæret
8f8acc8683
Disable benchmark on forked repo (#7034)
* Disable benchmark on forked repo

* only check owner on schedule event

* check owner on push also

* more readable as multi-line

* ternary won't work

* style++

* test++

* enable actions debug

* test--

* remove debug

* test++

* do debug where we can get logs

* test--

* this is driving me crazy

* correct github.event usage

* remove test condition

* correct github.event usage

* test++

* test--

* event_name is pull_request_target

* test++

* test--

* update ref checks
2024-05-05 13:38:55 +02:00
Lyle Dean
ca36326020
readme : add note that LLaMA 3 is not supported with convert.py (#7065) 2024-05-05 08:21:46 +03:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization (#7063)
* Add BPE pre-tokenization for Command-R/R+.

* Bump transformers convert requirement.

* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Brian
6fbd432211
py : logging and flake8 suppression refactoring (#7081)
Set one as executable and add basicConfig()
to another. Also added noqa tag to test scripts.
2024-05-05 08:07:48 +03:00
Xuan Son Nguyen
842500144e
gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
Jeximo
cf768b7e71
Tidy Android Instructions README.md (#7016)
* Tidy Android Instructions README.md

Remove CLBlast instructions (outdated), add OpenBLAS.

* don't assume git is installed

Added apt install git, so that git clone works

* removed OpenBlas

Linked to Linux build instructions

* fix typo

Remove word "run"

* correct style

Co-authored-by: slaren <slarengh@gmail.com>

* correct grammar

Co-authored-by: slaren <slarengh@gmail.com>

* delete reference to Android API

* remove Fdroid reference, link directly to Termux

Fdroid is not required

Co-authored-by: slaren <slarengh@gmail.com>

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
viric
fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
maor-ps
03fb8a002d
If the first token generated from the server is the stop word, the server will crash (#7038)
This will reproduce the issue with llama13b:
{
  "prompt": "Q: hello world \nA: ",
  "stop": ["\n"],
  "temperature": 0.0,
  "n_predict": 10,
  "cache_prompt": true,
  "n_probs": 10
}
2024-05-04 11:06:40 +02:00
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
* tests : add test-tokenizer-0.sh

* unicode : add all unicode number ranges

* starcoder : fix pre-tokenizer

* tests : add test that fails with DeepSeek tokenizers

* falcon : fix regex

* unicode : regenerate unicode tables

* refact : add tokenizer model

* lint : fix

* tests : disable failing tests

ggml-ci

* refact : add tests files

ggml-ci

* convert : print -> logging

ggml-ci

* lint : fix

* unicode : digit -> number

* phi-3 : update
2024-05-04 08:32:32 +03:00
Brian
a2ac89d6ef
convert.py : add python logging instead of print() (#6511)
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
2024-05-03 22:36:41 +03:00
Daniel Bevenius
433def286e
llama : rename ctx to user_data in progress_callback (#7045)
* llama : rename ctx to user_data in progress_callback

This commit renames the `ctx` parameter to `user_data` in the
`llama_progress_callback` typedef.

The motivation for this is that other callbacks use `user_data` or
`data`, and `ctx` in this case could easily be confused with
`llama_context`.

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-03 15:24:30 +02:00
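A small usage sketch of the typedef this commit touches: the second argument is opaque user data threaded through llama_model_params, not a llama_context. The callback and parameter fields are from llama.h; the tag string and the commented-out model path are placeholders.

#include <cstdio>
#include "llama.h"

// Return true to continue loading; returning false cancels the load.
static bool on_progress(float progress, void * user_data) {
    const char * tag = static_cast<const char *>(user_data);
    fprintf(stderr, "[%s] load progress: %.1f%%\n", tag, progress * 100.0f);
    return true;
}

int main() {
    static char tag[] = "model-load";
    llama_model_params mparams = llama_model_default_params();
    mparams.progress_callback           = on_progress;
    mparams.progress_callback_user_data = tag;
    // llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    return 0;
}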
Bartowski
60325fa56f
Remove .attention from skipped tensors to match more accurately (#7051) 2024-05-03 01:49:09 +02:00
alwqx
6ecf3189e0
chore: fix typo in llama.cpp (#7032)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 11:56:41 -04:00
Andrew Downing
b0d943de17
Update LOG_IMPL and LOG_TEE_IMPL (#7029)
ROCm clang defines _MSC_VER which results in the wrong implementation of LOG_IMPL and LOG_TEE_IMPL being compiled.

This fixes https://github.com/ggerganov/llama.cpp/issues/6972
2024-05-01 23:31:30 +02:00
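Not the literal patch, just an illustration of the class of guard being fixed: a compiler such as ROCm clang defines _MSC_VER yet still accepts the GNU-style variadic form, so the check should not key on _MSC_VER alone.

#include <cstdio>

// Example macro name only; the real macros are LOG_IMPL / LOG_TEE_IMPL in common/log.h.
#if !defined(_MSC_VER) || defined(__clang__)
    #define LOG_EXAMPLE(str, ...) fprintf(stderr, str, ##__VA_ARGS__)
#else
    #define LOG_EXAMPLE(str, ...) fprintf(stderr, str, __VA_ARGS__)
#endif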
l3utterfly
8d608a81b7
main : fix off by one error for context shift (#6921) 2024-05-01 22:27:41 +03:00
Johannes Gäßler
3ea0d36000
Server: add tests for batch size, different seeds (#6950) 2024-05-01 17:52:55 +02:00
Johannes Gäßler
1613ef8d8e
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) 2024-05-01 14:46:37 +02:00
slaren
c4ec9c0d3d
ci : exempt confirmed bugs from being tagged as stale (#7014) 2024-05-01 08:13:59 +03:00