Commit graph

3435 commits

Author SHA1 Message Date
brian khuu
4c91d077d2 convert-*.py: cast not required if Metadata.load_metadata_override returned a dict[str, Any] instead of a dict[str, object]
Co-authored-by: compilade <git@compilade.net>
2024-07-16 06:42:38 +10:00
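For context, a minimal sketch of the typing nuance behind this commit (the key and values here are hypothetical; the real signature lives in gguf-py's Metadata class):

```python
from typing import Any, cast

# With dict[str, object], pyright requires narrowing (e.g. a cast)
# before any value can be used as a concrete type:
override_obj: dict[str, object] = {"general.name": "MyModel"}
name = cast(str, override_obj["general.name"]).upper()

# With dict[str, Any], the same access type-checks without a cast:
override_any: dict[str, Any] = {"general.name": "MyModel"}
name = override_any["general.name"].upper()
```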
Brian
74383ba6d2 Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
2024-07-16 06:42:38 +10:00
brian khuu
dd14b8fdb1 convert-*.py: pyright type fixes 2024-07-16 06:42:38 +10:00
brian khuu
59a01df784 convert-*.py: refactor per model weight count estimation 2024-07-16 06:42:38 +10:00
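A rough sketch of what such an estimation boils down to (hypothetical helper; the real code iterates over the model's actual tensors):

```python
import math

def per_model_weight_count_estimation(tensor_shapes: list[tuple[int, ...]]) -> int:
    # A model's parameter count is the sum over all weight tensors
    # of the product of each tensor's dimensions.
    return sum(math.prod(shape) for shape in tensor_shapes)

# e.g. one 4096x4096 attention matrix plus a 4096-element bias:
print(per_model_weight_count_estimation([(4096, 4096), (4096,)]))  # 16781312
```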
brian khuu
2a976e1211 convert-*.py: write_tensors() --> prepare_tensors_for_writing() 2024-07-16 06:42:38 +10:00
brian khuu
fdc5a3fc80 convert-*.py: autogenerate general.uuid if missing 2024-07-16 06:42:35 +10:00
brian khuu
7ecb8f00a0 test: remove test_gguf.py and remove test_generate_any_missing_uuid() 2024-07-16 06:38:40 +10:00
brian khuu
007708e32d gguf_writer.py: generate tensor uuid if missing 2024-07-16 06:38:40 +10:00
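A hedged sketch of the idea (the namespace constant and helper name are illustrative, not the values gguf-py actually uses):

```python
import uuid

# Illustrative namespace; the real constant in gguf-py may differ.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "llama.cpp")

def generate_tensor_uuid(tensor_bytes: bytes) -> str:
    # UUIDv5 is a SHA-1 hash of namespace + name, so identical tensor
    # data always yields the same identifier.
    return str(uuid.uuid5(NAMESPACE, tensor_bytes.hex()))

print(generate_tensor_uuid(b"\x00\x01\x02\x03"))
```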
brian khuu
4dc8ddd35a convert_hf_to_gguf.py: Remove code that is already in fill_templated_filename() and GGUFWriter() 2024-07-16 06:38:40 +10:00
brian khuu
2f23927d37 convert_hf_to_gguf.py: rebase error correction 2024-07-16 06:38:40 +10:00
brian khuu
5011eefeaf convert_hf_to_gguf.py: Optional and dataclass removed from type hints as they were unused 2024-07-16 06:38:40 +10:00
brian khuu
e9734434bd convert-*.py: Remove self.model_name that was left in since last rebase 2024-07-16 06:38:40 +10:00
brian khuu
eaa47f5546 convert-*.py: separated unit test, hf_repo to repo_url 2024-07-16 06:38:40 +10:00
brian khuu
d060fcdbe2 convert-*.py: adjusted authorship KV store 2024-07-16 06:38:40 +10:00
brian khuu
91e65d9485 convert-*.py: add unittest to metadata class 2024-07-16 06:38:38 +10:00
brian khuu
3625a42061 convert-*.py: add heuristic to directory name fallback
Also add source_url for the Hugging Face URL
2024-07-16 06:37:42 +10:00
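To illustrate what such a heuristic might look like (a hypothetical sketch, not the actual implementation), a Hugging Face-style directory name can be decomposed with a regex:

```python
import re

def parse_model_dir(name: str) -> dict:
    # Split names like "Mixtral-8x7B-Instruct-v0.1" into
    # basename / parameter size / finetune / version components.
    m = re.match(r"^(?P<basename>[A-Za-z0-9]+)"
                 r"(?:-(?P<size>(?:\d+x)?\d+(?:\.\d+)?[KMB]))?"
                 r"(?:-(?P<finetune>[A-Za-z0-9]+))?"
                 r"(?:-(?P<version>v\d+(?:\.\d+)*))?$", name)
    return m.groupdict() if m else {}

print(parse_model_dir("Mixtral-8x7B-Instruct-v0.1"))
# {'basename': 'Mixtral', 'size': '8x7B', 'finetune': 'Instruct', 'version': 'v0.1'}
```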
brian khuu
39472a09da convert-*.py: need to include self in per_model_weight_count_estimation() 2024-07-16 06:37:42 +10:00
brian khuu
54918ad14e convert-*.py: refactor parameter weight class 2024-07-16 06:37:42 +10:00
brian khuu
32e80e094c convert-*.py: base_model is actually in spec for model cards 2024-07-16 06:37:42 +10:00
brian khuu
4d5cd0670a convert-*.py: use heuristics to parse _name_or_path 2024-07-16 06:37:42 +10:00
brian khuu
b0553f42da convert-*.py: adjust help message 2024-07-16 06:37:42 +10:00
brian khuu
dd1571211e convert-*.py: add quantized_by and enhance heuristics 2024-07-16 06:37:38 +10:00
brian khuu
5a86dfaa1c convert-*.py: add general.organization to kv store 2024-07-16 06:36:03 +10:00
brian khuu
f7c20793b9 convert-*.py: enable --model-name direct metadata override 2024-07-16 06:36:03 +10:00
brian khuu
b1927eed82 convert-*.py: move per model weight estimation away from util back to main script
plus some refactoring
2024-07-16 06:36:03 +10:00
brian khuu
684c604eca convert-*.py: add datasets and language to KV store 2024-07-16 06:36:03 +10:00
brian khuu
0f1d50fab7 convert-*.py: add parameter size class 2024-07-16 06:36:03 +10:00
brian khuu
8f734083dd convert-*.py: add base_version and add tags 2024-07-16 06:36:03 +10:00
brian khuu
b36e391b87 convert-*.py: parse model card in metadata util. Add license_link and license_name to kv store 2024-07-16 06:36:03 +10:00
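Hugging Face model cards keep this metadata in a YAML front-matter block at the top of README.md; a minimal parsing sketch (assuming PyYAML; the helper name is hypothetical):

```python
import yaml  # PyYAML

def parse_model_card(readme_text: str) -> dict:
    # Front matter is delimited by "---" lines before the Markdown body.
    if readme_text.startswith("---"):
        _, front_matter, _ = readme_text.split("---", 2)
        return yaml.safe_load(front_matter) or {}
    return {}

card = parse_model_card("---\nlicense: apache-2.0\nlicense_name: apache\n---\n# Model")
print(card["license"], card["license_name"])  # apache-2.0 apache
```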
brian khuu
5c263cb257 convert-*.py: encoding_scheme --> output_type 2024-07-16 06:36:03 +10:00
brian khuu
4d5f18a0e6 convert-*.py: metadata class moved to utility 2024-07-16 06:36:03 +10:00
brian khuu
916872f72f convert-*.py: model card metadata 2024-07-16 06:36:03 +10:00
brian khuu
a42c2b7efc convert-*.py: add basename and finetune metadata 2024-07-16 06:36:03 +10:00
brian khuu
dbb1b471e4 convert-*.py: add --get-outfile command and refactor 2024-07-16 06:36:03 +10:00
brian khuu
d3a936fd0e convert-*.py: licence -> license 2024-07-16 06:36:03 +10:00
Xuan Son Nguyen
97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
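For reference, the math that llm_build_lora_mm wraps: a LoRA adapter adds a scaled low-rank product on top of the frozen weight. A numpy sketch with illustrative shapes (the real code builds ggml graph ops, not numpy):

```python
import numpy as np

# y = W @ x + scale * (B @ (A @ x)), with scale = alpha / rank
d_out, d_in, rank, alpha = 64, 64, 8, 16.0
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(rank, d_in))   # low-rank down-projection
B = rng.normal(size=(d_out, rank))  # low-rank up-projection
x = rng.normal(size=(d_in,))

scale = alpha / rank
y = W @ x + scale * (B @ (A @ x))   # adapter applied at matmul time
print(y.shape)  # (64,)
```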
Xuan Son Nguyen
4db8f60fe7
fix ci (#8494) 2024-07-15 19:23:10 +02:00
Daniel Bevenius
8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] (#8472)
The README.md had stale information. In particular, the --ctx-size
"defaults to 512" claim confused me and I had to check the code to confirm
it was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth in a single place (the source) and
generate the README.md based on that.

Did:

    make llama-server
    ./llama-server --help > t.txt
    vimdiff t.txt examples/server/README.md

I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow-up could be to
automate this process with a script, as in the sketch below.

No functional change.
2024-07-15 15:04:56 +03:00
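The suggested follow-up could start as small as this (a hypothetical sketch; splicing the output into examples/server/README.md automatically would still need agreed-on markers):

```python
import subprocess

# Capture the current --help output, mirroring the manual workflow above;
# wrapping it in a Markdown code fence is left to the splicing step.
help_text = subprocess.run(
    ["./llama-server", "--help"], capture_output=True, text=True, check=True
).stdout
with open("t.txt", "w", encoding="utf-8") as f:
    f.write(help_text)
```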
Georgi Gerganov
9104bc20ed
common : add --no-cont-batching arg (#6358) 2024-07-15 14:54:58 +03:00
NikolaiLyssogor
fc690b018e
docs: fix links in development docs [no ci] (#8481)
Fixes a few links within the repo that were broken in the reorganization of the
documentation in #8325.
2024-07-15 14:46:39 +03:00
Meng, Hengyu
16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
Georgi Gerganov
3dfda05956
llama : de-duplicate deepseek2 norm 2024-07-15 14:10:39 +03:00
0cc4m
bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
compilade
090fca7a07
pydantic : replace uses of __annotations__ with get_type_hints (#8474)
* pydantic : replace uses of __annotations__ with get_type_hints

* pydantic : fix Python 3.9 and 3.10 support
2024-07-14 19:51:21 -04:00
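The difference matters because __annotations__ neither resolves string annotations (mandatory under `from __future__ import annotations`) nor walks base classes; a small demonstration:

```python
from __future__ import annotations  # all annotations become plain strings

from typing import get_type_hints

class Base:
    x: int

class Child(Base):
    y: str

print(Child.__annotations__)  # {'y': 'str'} -- unresolved, inherited 'x' missing
print(get_type_hints(Child))  # {'x': <class 'int'>, 'y': <class 'str'>}
```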
Georgi Gerganov
aaab2419ea
flake.lock: Update (#8475)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)
  → 'github:NixOS/nixpkgs/7e7c39ea35c5cdd002cd4588b03a3fb9ece6fad9?narHash=sha256-EYekUHJE2gxeo2pM/zM9Wlqw1Uw2XTJXOSAO79ksc4Y%3D' (2024-07-12)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-07-14 08:54:02 -07:00
Georgi Gerganov
73cf442e7b
llama : fix Gemma-2 Query scaling factors (#8473)
* 9B - query_pre_attn_scalar = 256 not 224

See 03e657582d

Gemma 9B should use 256, not 224 (the value given by self.config.hidden_size // self.config.num_attention_heads)

* llama : fix Gemma-2 Query scaling factor

ggml-ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-07-14 14:05:09 +03:00
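The arithmetic behind the fix, assuming the Gemma-2 9B config values hidden_size = 3584 and 16 attention heads (the commit itself confirms the 224-vs-256 mismatch):

```python
import math

hidden_size = 3584          # Gemma-2 9B (assumed config value)
num_attention_heads = 16
head_dim = 256              # set explicitly in the config, not derived

print(hidden_size // num_attention_heads)  # 224 -- the buggy derivation
print(1 / math.sqrt(head_dim))             # 0.0625 -- correct query scaling
```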
Brian
e236528e76
gguf_hash.py: Add sha256 (#8470)
* gguf_hash.py: Add sha256

* gguf_hash.py: rename string UUIDv5 --> uuid

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-14 16:47:14 +10:00
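A minimal sketch of what a whole-model sha256 amounts to (the actual gguf_hash.py layout and per-tensor reporting are not shown; the helper name is hypothetical):

```python
import hashlib

def sha256_all_tensors(tensor_blobs: list[bytes]) -> str:
    # One running digest over each tensor's raw bytes, in order,
    # yields a single fingerprint for the full set of weights.
    h = hashlib.sha256()
    for blob in tensor_blobs:
        h.update(blob)
    return h.hexdigest()

print(sha256_all_tensors([b"layer0", b"layer1"]))
```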
compilade
fa79495bb4
llama : fix pre-tokenization of non-special added tokens (#8228)
* llama : fix mpt and olmo pre-tokenizer

* llama : pre-tokenize non-special user-defined tokens first

* llama : fix detection of control-like user-defined tokens

* convert_hf : identify which user-defined tokens are control tokens

Only used in _set_vocab_gpt2() for now.

* convert_hf : identify more added control tokens for SPM tokenizers

This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly,
including HTML tags and consecutive spaces,
but it unfortunately requires model re-conversion.

There seems to be a weird behavior of the HF tokenizer for Gemma,
which prefers to use the 16-space token over more lengthy space tokens,
while using the SentencePiece tokenizer does not do this.
(the implementation in llama.cpp has the same behavior as SentencePiece)

* llama : fix wrong pre-tokenization of byte tokens

* llama : fix Viking pre-tokenizer regex

The order was previously wrong, which caused errors in some tests.

* llama : fix command-r detokenization

* convert_hf : reduce usages of the UNKNOWN token type

* llama : add UNKNOWN tokens in the special tokens cache

* convert_hf : reduce usages of UNKNOWN for InternLM2

This makes the changes from #8321 more consistent
with the other changes made here.

* test-tokenizer-random : reduce potential conflicts with #8379

* test-tokenizer-random : add a failing edge case for falcon
2024-07-13 23:35:10 -04:00
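The core idea of pre-tokenizing user-defined tokens first, as a hypothetical sketch (the real implementation works on the vocabulary's token attributes, and the whitespace split below stands in for the actual regex pre-tokenizer):

```python
import re

added_tokens = ["<|im_start|>", "<|im_end|>"]  # hypothetical user-defined tokens
pattern = "|".join(re.escape(t) for t in added_tokens)

def pre_tokenize(text: str) -> list[str]:
    pieces = []
    # Split on added tokens first so they stay atomic, then pre-tokenize
    # only the plain-text fragments in between.
    for fragment in re.split(f"({pattern})", text):
        if fragment in added_tokens:
            pieces.append(fragment)
        elif fragment:
            pieces.extend(fragment.split())
    return pieces

print(pre_tokenize("<|im_start|>user hello<|im_end|>"))
# ['<|im_start|>', 'user', 'hello', '<|im_end|>']
```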
bandoti
17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00