8fdb31681a
llama.com can now load weights that use the new file format, which was introduced a few weeks ago. Unlike llama.cpp, we will keep support for old file formats in our tool, so you don't need to convert your weights when the upstream project makes breaking changes. Note that using ggjt v3 makes avx2 inference about 5% faster for me.
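For illustration, here is a minimal sketch of the kind of header dispatch that keeping several container formats alive requires, assuming the historical ggml/ggmf/ggjt magics; the type and function names are hypothetical, not llama.com's actual symbols:

    // Hypothetical sketch: identify which (possibly deprecated) container
    // format a weights file uses by reading its magic and version fields,
    // so the caller can dispatch to the matching loader.
    #include <cstdint>
    #include <cstdio>

    enum class FileVersion {
      GGML_UNVERSIONED,            // magic 'ggml', no version field
      GGMF_V1,                     // magic 'ggmf', version 1
      GGJT_V1, GGJT_V2, GGJT_V3,   // magic 'ggjt', versions 1-3
      UNKNOWN,
    };

    FileVersion DetectFileVersion(FILE *f) {
      uint32_t magic;
      if (fread(&magic, sizeof(magic), 1, f) != 1) return FileVersion::UNKNOWN;
      if (magic == 0x67676d6c) return FileVersion::GGML_UNVERSIONED;
      uint32_t version;
      if (fread(&version, sizeof(version), 1, f) != 1) return FileVersion::UNKNOWN;
      if (magic == 0x67676d66 && version == 1) return FileVersion::GGMF_V1;
      if (magic == 0x67676a74) {
        if (version == 1) return FileVersion::GGJT_V1;
        if (version == 2) return FileVersion::GGJT_V2;
        if (version == 3) return FileVersion::GGJT_V3;
      }
      return FileVersion::UNKNOWN;
    }

Keeping every branch of a dispatch like this alive, rather than deleting the old ones when upstream does, is what lets previously converted weight files keep working.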
DESCRIPTION

  ggml is a machine learning library useful for LLM inference on CPUs

LICENSE

  MIT

ORIGIN

  https://github.com/ggerganov/llama.cpp
  d8bd0013e8768aaa3dc9cfc1ff01499419d5348e

LOCAL CHANGES

  - Maintaining support for deprecated file formats
  - Make it possible for loaded prompts to be cached to disk
  - Introduce -v and --verbose flags
  - Reduce batch size from 512 to 32
  - Allow --n_keep to specify a substring of prompt
  - Don't print stats / diagnostics unless -v is passed
  - Reduce --top_p default from 0.95 to 0.70
  - Change --reverse-prompt to no longer imply --interactive
  - Permit --reverse-prompt specifying custom EOS if non-interactive
  - Refactor headers per cosmo convention
  - Remove C++ exceptions; use Die() function instead
  - Remove division from matrix multiplication (see sketch below)
  - Let quantizer convert between ggjt formats
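The division removal noted above is the classic strength-reduction trick: hoist the division out of the hot loop by multiplying with a precomputed reciprocal. A minimal sketch under that assumption, not the actual ggml kernel:

    // Illustrative only: scaled matrix-vector product with the per-element
    // division replaced by one reciprocal computed outside the loop.
    void MatVecScaled(const float *a, const float *x, float *y,
                      int rows, int cols, float scale) {
      const float inv_scale = 1.0f / scale;  // the only division performed
      for (int i = 0; i < rows; ++i) {
        float acc = 0;
        for (int j = 0; j < cols; ++j) {
          acc += a[i * cols + j] * x[j];  // accumulate the dot product
        }
        y[i] = acc * inv_scale;  // was: acc / scale, once per output element
      }
    }

Floating-point division costs far more cycles than multiplication on most CPUs and vectorizes poorly, so trading one division per output element for a single one up front is a safe win when the small rounding difference is acceptable.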