Commit graph

3458 commits

Author SHA1 Message Date
nopperl
fa99dc27c9 Merge branch 'master' into chameleon 2024-07-22 15:14:13 +02:00
nopperl
0ee896d149
fix punctuation regex in chameleon pre-tokenizer (@compilade)
Co-authored-by: compilade <git@compilade.net>
2024-07-22 11:47:35 +00:00
nopperl
1e1e78a324
Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
2024-07-22 11:46:18 +00:00
nopperl
05f138551f add comment regarding special token regex in chameleon pre-tokenizer 2024-07-22 13:44:24 +02:00
nopperl
6e0ded3637 move swin_norm in gguf writer 2024-07-22 13:22:21 +02:00
Georgi Gerganov
6f11a83e4e
llama : allow overrides for tokenizer flags (#8614)
ggml-ci
2024-07-22 13:33:22 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests (#8611)
* models : remove duplicated gpt-2 vocab

* models : remove old stablelm vocab

* tests : re-enable MPT tokenizer tests

* tests : re-enable DeepSeek tokenizer tests

* cmake : sort

ggml-ci
2024-07-22 13:32:49 +03:00
Douglas Hanley
50e05353e8
llama : add Mistral Nemo inference support (#8604) 2024-07-22 11:06:17 +03:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token (#8619) 2024-07-22 11:02:09 +03:00
Mark Zhuang
04bab6b7da
ggml: fix compile error for RISC-V (#8623) 2024-07-22 10:56:45 +03:00
devojony
b7c11d36e6
examples: fix android example cannot be generated continuously (#8621)
When generation ends `completion_loop()` should return a NULL, not the empty string
2024-07-22 09:54:42 +03:00
Georgi Gerganov
45f2c19cc5
flake.lock: Update (#8610) 2024-07-21 06:45:10 -07:00
M-A
22f281aa16
examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)
Changes:

- Move each example into its own function. This makes the code much
  easier to read and understand.
- Make the program easy to only run one test by commenting out function
  calls in main().
- Make the output easy to parse by indenting the output for each example.
- Add shebang and +x bit to make it clear it's an executable.
- Make the host configurable via --host with a default 127.0.0.1:8080.
- Make the code look in the tools list to call the registered tool,
  instead of hardcoding the returned values. This makes the code more
  copy-pastable.
- Add error checking, so that the program exits 1 if the LLM didn't
  returned expected values. It's super useful to check for correctness.

Testing:

- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M and
  Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
  - I did not observe a failure even once in Mistral-7B-Instruct-v0.3.
  - Llama-3 failed about a third of the time in example_concurrent: it
    only returned one call instead of 3. Even for F16.

Potential follow ups:

- Do not fix the prompt encoding yet. Surprisingly it mostly works even
  if the prompt encoding is not model optimized.
- Add chained answer and response.

Test only change.
2024-07-20 22:09:17 -04:00
compilade
328884f421
gguf-py : fix some metadata name extraction edge cases (#8591)
* gguf-py : fix some metadata name extraction edge cases

* convert_lora : use the lora dir for the model card path

* gguf-py : more metadata edge cases fixes

Multiple finetune versions are now joined together,
and the removal of the basename annotation on trailing versions
is more robust.

* gguf-py : add more name metadata extraction tests

* convert_lora : fix default filename

The default filename was previously hardcoded.

* convert_hf : Model.fname_out can no longer be None

* gguf-py : do not use title case for naming convention

Some models use acronyms in lowercase,
which can't be title-cased like other words,
so it's best to simply use the same case
as in the original model name.

Note that the size label still has an uppercased suffix
to make it distinguishable from the context size of a finetune.
2024-07-20 21:58:49 -04:00
compilade
c69c63039c
convert_hf : fix Gemma v1 conversion (#8597)
* convert_hf : fix Gemma v1 conversion

* convert_hf : allow renaming tokens, but with a warning

* convert_hf : fix Gemma v1 not setting BOS and EOS tokens
2024-07-20 21:53:01 -04:00
Johannes Gäßler
69c487f4ed
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-20 22:25:26 +02:00
Georgi Gerganov
07283b1a90
gguf : handle null name during init (#8587) 2024-07-20 17:15:42 +03:00
Michael Coppola
940362224d
llama : add support for Tekken pre-tokenizer (#8579)
* llama : Added support for Tekken pre-tokenizer (#8577)

Removed uneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* * Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-20 16:43:51 +03:00
Huifeng Ou
69b9945b44
llama.swiftui: fix end of generation bug (#8268)
* fix continuing generating blank lines after getting EOT token or EOS token from LLM

* change variable name to is_done (variable name suggested by ggerganov)

* minor : fix trailing whitespace

* minor : add space

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-20 16:09:37 +03:00
Brian
c3776cacab
gguf_dump.py: fix markddown kv array print (#8588)
* gguf_dump.py: fix markddown kv array print

* Update gguf-py/scripts/gguf_dump.py

Co-authored-by: compilade <git@compilade.net>

* gguf_dump.py: refactor kv array string handling

* gguf_dump.py: escape backticks inside of strings

* gguf_dump.py: inline code markdown escape handler added

>>> escape_markdown_inline_code("hello world")
'`hello world`'
>>> escape_markdown_inline_code("hello ` world")
'``hello ` world``'

* gguf_dump.py: handle edge case about backticks on start or end of a string

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-20 17:35:25 +10:00
slaren
87e397d00b
ggml : fix quant dot product with odd number of blocks (#8549)
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-19 17:17:27 +02:00
Brian
57b1d4f9eb
convert-*.py: remove add_name from ChatGLMModel class (#8590) 2024-07-20 00:04:38 +10:00
Georgi Gerganov
d197545530
llama : bump max layers from 256 to 512 (#8530)
* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions
2024-07-19 16:50:47 +03:00
Georgi Gerganov
be0cfb4175
readme : fix server badge 2024-07-19 14:34:55 +03:00
Clint Herron
b57eb9ca4f
ggml : add friendlier error message to fopen errors (#8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-07-19 14:05:45 +03:00
Frank Mai
f299aa98ec
fix: typo of chatglm4 chat tmpl (#8586)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-07-19 11:44:41 +02:00
Brian
3d0e4367d9
convert-*.py: add general.name kv override (#8571) 2024-07-19 17:51:51 +10:00
Johannes Gäßler
a15ef8f8a0
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) 2024-07-18 23:48:47 +02:00
65a
705b7ecf60
cmake : install all ggml public headers (#8480)
Co-authored-by: 65a <65a@65a.invalid>
2024-07-18 17:47:12 +03:00
Eric Zhang
0d2c7321e9
server: use relative routes for static files in new UI (#8552)
* server: public: fix api_url on non-index pages

* server: public: use relative routes for static files in new UI
2024-07-18 12:43:49 +02:00
Brian
672a6f1018
convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)
Main thing is that the default output filename will take this form

{name}{parameters}{finetune}{version}{encoding}{kind}

In addition this add and remove some entries in the KV store and adds a metadata class with automatic heuristics capability to derive some values based on model card content

* No Change:
  - Internal GGUF Spec
    - `general.architecture`
    - `general.quantization_version`
    - `general.alignment`
    - `general.file_type`
  - General Model Details
    - `general.name`
    - `general.author`
    - `general.version`
    - `general.description`
  - Licensing details
    - `general.license`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.url`
  - Model Source during conversion
    - `general.source.url`

* Removed:
  - Model Source during conversion
    - `general.source.huggingface.repository`

* Added:
  - General Model Details
    - `general.organization`
    - `general.finetune`
    - `general.basename`
    - `general.quantized_by`
    - `general.size_label`
  - Licensing details
    - `general.license.name`
    - `general.license.link`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.doi`
    - `general.uuid`
    - `general.repo_url`
  - Model Source during conversion
    - `general.source.doi`
    - `general.source.uuid`
    - `general.source.repo_url`
  - Base Model Source
    - `general.base_model.count`
    - `general.base_model.{id}.name`
    - `general.base_model.{id}.author`
    - `general.base_model.{id}.version`
    - `general.base_model.{id}.organization`
    - `general.base_model.{id}.url` (Model Website/Paper)
    - `general.base_model.{id}.doi`
    - `general.base_model.{id}.uuid`
    - `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
  - Array based KV stores
    - `general.tags`
    - `general.languages`
    - `general.datasets`

---------

Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-07-18 20:40:15 +10:00
nopperl
15260c5ba8 fix layer input for swin norm 2024-07-18 10:26:34 +02:00
RunningLeon
3807c3de04
server : respect --special cli arg (#8553) 2024-07-18 11:06:22 +03:00
nopperl
f40cd2073a adapt to new lora implementation 2024-07-18 09:59:30 +02:00
nopperl
fa568f6a82 check for k norm separately 2024-07-18 09:30:40 +02:00
nopperl
da5e356dfb fix ci 2024-07-18 09:28:26 +02:00
nopperl
126201d1a2 add comment to conversion 2024-07-18 08:28:27 +02:00
nopperl
b16c09a1cc Merge branch 'master' into chameleon 2024-07-18 08:25:33 +02:00
Johannes Gäßler
e02b597be3
lookup: fibonacci hashing, fix crashes (#8548) 2024-07-17 23:35:44 +02:00
Al Mochkin
b3283448ce
build : Fix docker build warnings (#8535) (#8537) 2024-07-17 20:21:55 +02:00
nopperl
90766e15e2 rem tabs 2024-07-17 17:18:09 +02:00
Brian
30f80ca0bc
CONTRIBUTING.md : remove mention of noci (#8541) 2024-07-17 17:57:06 +03:00
nopperl
758612a984 suppress image token output 2024-07-17 16:19:13 +02:00
hipudding
1bdd8ae19f
[CANN] Add Ascend NPU backend (#6035)
* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developped by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <391746016@qq.com>

* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <391746016@qq.com>
2024-07-17 14:23:50 +03:00
nopperl
3d3523e432 implement swin norm 2024-07-17 12:55:47 +02:00
nopperl
c460d5c3bb return qk norm weights and biases to original format 2024-07-17 12:53:56 +02:00
Masaya, Kato
da3913d8f9
batched: fix n_predict parameter (#8527) 2024-07-17 10:34:28 +03:00
Georgi Gerganov
d65a8361fe
llama : disable context-shift for DeepSeek v2 (#8501) 2024-07-17 10:32:59 +03:00
Johannes Gäßler
5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
Brian
1666f92dcd
gguf-hash : update clib.json to point to original xxhash repo (#8491)
* Update clib.json to point to Cyan4973 original xxhash

Convinced Cyan4973 to add clib.json directly to his repo, so can now point the clib package directly to him now. Previously pointed to my fork with the clib.json package metadata

https://github.com/Cyan4973/xxHash/pull/954

* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]
2024-07-16 10:14:16 +03:00