Commit graph

2930 commits

Author SHA1 Message Date
HanishKVC
7ba0144e42 ChatOn:chaton_tmpl_role_kv: try except to ignore missing ifany
Cas of above reason, switch to directly accessing the keys in
dump helper, which is inturn used by meta_ok check
2024-05-06 11:27:56 +05:30
HanishKVC
adab5775bf ChatON: more detailed/spreadout json fields 2024-05-06 11:27:56 +05:30
HanishKVC
3f09eb5dea ChatOn: ChatTemplateApply[Ex] return tagged msgs parts detail
Now there is a simple and extended version of returning tagged
messages.

The extended version returns the tagged string, as well as the
details of the parts that make up that tagged message interms of
the type of parts and the lengths of the parts.
2024-05-06 11:27:56 +05:30
HanishKVC
825a78abaa ChatOn: ChatTemplateApplySingle[Ex] return parts detail
Now there is a simple and extended version of returning tagged
message wrt a single role and its content.

The extended version returns the tagged string, as well as the
details of the parts that make up that tagged message interms of
the type of parts and the lengths of the parts.
2024-05-06 11:27:56 +05:30
HanishKVC
92e780fb1a ChatON:ChatParts: Allow flexibility for more refined tokenization 2024-05-06 11:27:56 +05:30
HanishKVC
6b23f15ffe ChatON:ChatOnMetaJSon: Add suffix wrt assistant messages 2024-05-06 11:27:56 +05:30
HanishKVC
d1899728aa ChatON: Test ChatParts in chat-template-apply 2024-05-06 11:27:56 +05:30
HanishKVC
9de1d6017f ChatON:ChatParts class initial go
Helps keep user prompt and chat-hs-template tag parts seperate,
but in sequence
2024-05-06 11:27:56 +05:30
HanishKVC
3064a36e74 ChatON+:Update tmpl_role_kv to retrieve wrt multiple keys
Use the same for user role's begin and prefix entries.
2024-05-06 11:27:56 +05:30
HanishKVC
f1f39c5256 ChatON:Add Monarch model template, which uses Begin + Prefix
Inturn Begin/BoS is added only for non 1st user messages in a
system+user prompts chain.
2024-05-06 11:27:56 +05:30
HanishKVC
724ff38345 ChatOn: Wrap getting begin in try-catch,
so that even if a role doesnt contain begin, the logic will work
fine.
2024-05-06 11:27:56 +05:30
HanishKVC
d70fca7a45 ChatOn: Add begin to the mix along with prefix
Dump shows user->begin.

chat-template-apply[-single] updated to work with begin and prefix

TODO: need to wrap begin in a try-catch, so that irrespective of
role, begin+prefix will work, irrespoective of whether that role
has a begin entry or not.
2024-05-06 11:27:56 +05:30
HanishKVC
0f713d4c4f ChatOn: meta json update wrt the new begin related fields 2024-05-06 11:27:56 +05:30
HanishKVC
bdd279c0c9 ChatOn:User Begin+Prefix note update, keep things simple consistent 2024-05-06 11:27:56 +05:30
HanishKVC
84367b9fd1 ChatON: Add template for DeepSeek
Was looking at the tokenized vector, and noticed that the EOS
mentioned by existing chat_apply_template of llama.cpp, is different
from what I noticed in tokenizer_config.json of deepseek llm, so
I have added two entries

* "deepseek-alt" which matches llama.cpp's chat_apply_template and
* "deepseek" which matches that in tokenizer_config.json.

This impacts the assistant suffix and reverse prompt entries.

CasOfThis: Need to look into other entries which I added previously
at a later time. However as the default logic should be picking the
EOS from model file, so I assume reverse-prompt being outofsync,
may not matter beyond a limit, potentially.
2024-05-06 11:27:56 +05:30
HanishKVC
f4b54069f6 ChatON: Add template for Gemma 2024-05-06 11:27:56 +05:30
HanishKVC
2a8028fba8 ChatON: Add Zephyr template to meta-json file 2024-05-06 11:27:56 +05:30
HanishKVC
57bd772bfd ChatON: Cleanup logging
Avoid showing on screen the debug messages.

meta-dump can either show on screen or not, based on how LOGXLN
is defined.
2024-05-06 11:27:56 +05:30
HanishKVC
217544e5ff ChatON: Keep compiler happy
Order the functions so that no need for seperate prototypes

Also use kv_bool wrt boolean entries.

Convert string to c char *
2024-05-06 11:27:56 +05:30
HanishKVC
3f9dfc240c ChatON: Check for the boolean entries in meta-json 2024-05-06 11:27:56 +05:30
HanishKVC
42f6b45547 ChatON: Use the constants defined for the keys 2024-05-06 11:27:56 +05:30
HanishKVC
efb758ba7d ChatON: Rename helpers to kv suffix, updated wrt metaok
rename because they return value of specified key.

[main] update metaok to take template-id, so that one can cross
check that all needed entries are there wrt that template-id in
the chaton-meta-json file
2024-05-06 11:27:56 +05:30
HanishKVC
e8c24c0767 ChatOn:MetaOk: Allows template-id based cross check
For a given template-id, cross check, all needed entries are there
in the json.
2024-05-06 11:27:56 +05:30
HanishKVC
b1055641e9 ChatON: Update the notes a bit 2024-05-06 11:27:56 +05:30
HanishKVC
11b47fbcfc ChatON:MetaJson: Add key constants, check metaJson loaded ifNeeded 2024-05-06 11:27:56 +05:30
HanishKVC
221ccd6462 ChatOn: Add SystemUser-1st-User-Has-Prefix flag support
Llama2 seems to need it, so chaton-meta-json sample file updated
to use same.
2024-05-06 11:27:56 +05:30
HanishKVC
f03dd2439f ChatOn:No global-begin/end in ChatApplyTmplSingle, ChatApplyTmpl
Avoid adding global begin/end markers wrt ChatApplyTmplSingle.

Add ChatApplyTmpl which goes through a vector of messages.
2024-05-06 11:27:56 +05:30
HanishKVC
c4cf0e9075 ChatON:Cleanup: BeginEnd, Debug log
Update the note

Rename global-prefix|suffix to global-begin|end.

Rename chat-apply-template to chat-apply-template-single, cas it
handles only a single message.

Add some debug log messages to the helper functions
2024-05-06 11:27:56 +05:30
HanishKVC
d87d27512e ChatOn: update sample meta json a bit
Move [inst] [/inst] wrt llama2 from global to individual role
specific parts.

Avoid an extra \n wrt prefixes of llama3
2024-05-06 11:27:55 +05:30
HanishKVC
cdbe4f06ce Chaton:Sample Meta JSON cleanup 2024-05-06 11:27:55 +05:30
HanishKVC
050d329e7e ChatOn+Main: Initial go at chaton in main interactive flow 2024-05-06 11:27:55 +05:30
HanishKVC
1374a64200 Chaton:Meta: Add chatml meta data to sample meta json file 2024-05-06 11:27:55 +05:30
HanishKVC
093abc29a2 ChatOn: Update sample meta json to be a valid json 2024-05-06 11:27:55 +05:30
HanishKVC
dc56be951d ChatOn:Main: Load and dump any specified chaton meta file 2024-05-06 11:27:55 +05:30
HanishKVC
35f25196a0 ChatOn:Common: Add the needed cmdline arg params and its parsing 2024-05-06 11:27:55 +05:30
HanishKVC
2146a253e8 ChatOn: Capture the idea 2024-05-06 11:27:55 +05:30
kunnis
628b299106
Adding support for the --numa argument for llama-bench. (#7080) 2024-05-05 14:17:47 +02:00
Sigbjørn Skjæret
8f8acc8683
Disable benchmark on forked repo (#7034)
* Disable benchmark on forked repo

* only check owner on schedule event

* check owner on push also

* more readable as multi-line

* ternary won't work

* style++

* test++

* enable actions debug

* test--

* remove debug

* test++

* do debug where we can get logs

* test--

* this is driving me crazy

* correct github.event usage

* remove test condition

* correct github.event usage

* test++

* test--

* event_name is pull_request_target

* test++

* test--

* update ref checks
2024-05-05 13:38:55 +02:00
Lyle Dean
ca36326020
readme : add note that LLaMA 3 is not supported with convert.py (#7065) 2024-05-05 08:21:46 +03:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization (#7063)
* Add BPE pre-tokenization for Command-R/R+.

* Bump transformers convert requirement.

* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Brian
6fbd432211
py : logging and flake8 suppression refactoring (#7081)
Set one as executable and add basicConfig()
to another. Also added noqa tag to test scripts.
2024-05-05 08:07:48 +03:00
Xuan Son Nguyen
842500144e
gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
Jeximo
cf768b7e71
Tidy Android Instructions README.md (#7016)
* Tidy Android Instructions README.md

Remove CLBlast instructions(outdated), added OpenBlas.

* don't assume git is installed

Added apt install git, so that git clone works

* removed OpenBlas

Linked to Linux build instructions

* fix typo

Remove word "run"

* correct style

Co-authored-by: slaren <slarengh@gmail.com>

* correct grammar

Co-authored-by: slaren <slarengh@gmail.com>

* delete reference to Android API

* remove Fdroid reference, link directly to Termux

Fdroid is not required

Co-authored-by: slaren <slarengh@gmail.com>

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
viric
fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
maor-ps
03fb8a002d
If first token generated from the server is the stop word the server will crash (#7038)
This will reproduce the issue in llama13b
{
'prompt': 'Q: hello world \nA: ',
 'stop': ['\n'],
 'temperature': 0.0,
 'n_predict': 10,
 'cache_prompt': True,
 'n_probs': 10
}
2024-05-04 11:06:40 +02:00
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
* tests : add test-tokenizer-0.sh

* unicode : add all unicode number ranges

* starcoder : fix pre-tokenizer

* tests : add test that fails with DeepSeek tokenizers

* falcon : fix regex

* unicode : regenerate unicode tables

* refact : add tokenizer model

* lint : fix

* tests : disable failing tests

ggml-ci

* refact : add tests files

ggml-ci

* convert : print -> logging

ggml-ci

* lint : fix

* unicode : digit -> number

* phi-3 : update
2024-05-04 08:32:32 +03:00
Brian
a2ac89d6ef
convert.py : add python logging instead of print() (#6511)
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
2024-05-03 22:36:41 +03:00
Daniel Bevenius
433def286e
llama : rename ctx to user_data in progress_callback (#7045)
* llama : rename ctx to user_data in progress_callback

This commit renames the `ctx` parameter to `user_data` in the
`llama_progress_callback` typedef.

The motivation for this is that other callbacks use `user_data` or
`data`, and using `ctx` in this case might be confusing as it could be
confused with `llama_context`.

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-03 15:24:30 +02:00
Bartowski
60325fa56f
Remove .attention from skipped tensors to match more accurately (#7051) 2024-05-03 01:49:09 +02:00
alwqx
6ecf3189e0
chore: fix typo in llama.cpp (#7032)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 11:56:41 -04:00