Main thing is that the default output filename will take this form
{name}{parameters}{finetune}{version}{encoding}{kind}
In addition this add and remove some entries in the KV store and adds a metadata class with automatic heuristics capability to derive some values based on model card content
* No Change:
- Internal GGUF Spec
- `general.architecture`
- `general.quantization_version`
- `general.alignment`
- `general.file_type`
- General Model Details
- `general.name`
- `general.author`
- `general.version`
- `general.description`
- Licensing details
- `general.license`
- Typically represents the converted GGUF repo (Unless made from scratch)
- `general.url`
- Model Source during conversion
- `general.source.url`
* Removed:
- Model Source during conversion
- `general.source.huggingface.repository`
* Added:
- General Model Details
- `general.organization`
- `general.finetune`
- `general.basename`
- `general.quantized_by`
- `general.size_label`
- Licensing details
- `general.license.name`
- `general.license.link`
- Typically represents the converted GGUF repo (Unless made from scratch)
- `general.doi`
- `general.uuid`
- `general.repo_url`
- Model Source during conversion
- `general.source.doi`
- `general.source.uuid`
- `general.source.repo_url`
- Base Model Source
- `general.base_model.count`
- `general.base_model.{id}.name`
- `general.base_model.{id}.author`
- `general.base_model.{id}.version`
- `general.base_model.{id}.organization`
- `general.base_model.{id}.url` (Model Website/Paper)
- `general.base_model.{id}.doi`
- `general.base_model.{id}.uuid`
- `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
- Array based KV stores
- `general.tags`
- `general.languages`
- `general.datasets`
---------
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture of Neural Networks), developped by
Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <391746016@qq.com>
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <391746016@qq.com>
* Update clib.json to point to Cyan4973 original xxhash
Convinced Cyan4973 to add clib.json directly to his repo, so can now point the clib package directly to him now. Previously pointed to my fork with the clib.json package metadata
https://github.com/Cyan4973/xxHash/pull/954
* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]
The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.
* convert_hf : faster lazy safetensors
This makes '--dry-run' much, much faster.
* convert_hf : fix memory leak in lazy MoE conversion
The '_lazy' queue was sometimes self-referential,
which caused reference cycles of objects old enough
to avoid garbage collection until potential memory exhaustion.
* lora: load to devide buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <slarengh@gmail.com>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs
* auto scale
* Revert "auto scale"
This reverts commit 42415a4874.
* remove redundant params
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* change kv metadata
* move add_type to __init__
* convert_hf : move add_type to main()
* convert_lora : use the GGUFWriter from Model instead of overwriting it
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:
```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\lama.cpp\build\ggml\src\ggml.vcxproj]
```
The README.md had a stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. This the server is evolving rapidly, it's probably
better to keep the source of truth at a single place (in the source) and
generate the README.md based on that.
Did:
make llama-server
./llama-server --help > t.txt
vimdiff t.txt examples/server/README.md
I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow up could be to
automate this process with a script.
No functional change.