mirror of
https://github.com/jart/cosmopolitan.git
synced 2025-01-31 03:27:39 +00:00
8fdb31681a
llama.com can now load weights that use the new file format which was introduced a few weeks ago. Note that, unlike llama.cpp, we will keep support for old file formats in our tool so you don't need to convert your weights when the upstream project makes breaking changes. Please note that using ggjt v3 does make avx2 inference go 5% faster for me.
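To illustrate what keeping old file formats loadable can look like, here is a minimal sketch in C. It assumes a weights file starts with a 32-bit magic followed by a 32-bit version, and that the ggjt magic is 0x67676a74 as in upstream llama.cpp at the time. `PickLoader` and the `Loader` enum are hypothetical names for illustration and are not taken from this directory's sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic for the "ggjt" container; check llama.cc for the
   authoritative value. */
#define MAGIC_GGJT 0x67676a74u

/* Hypothetical loader identifiers, one per supported ggjt version. */
enum Loader {
  LOADER_UNKNOWN,
  LOADER_GGJT_V1,
  LOADER_GGJT_V2,
  LOADER_GGJT_V3,
};

/* Hypothetical helper: inspects the first 8 bytes of a weights file and
   picks a loader instead of rejecting older versions outright. */
enum Loader PickLoader(const unsigned char *p, size_t n) {
  uint32_t magic, version;
  if (n < 8) return LOADER_UNKNOWN; /* too short to hold magic + version */
  memcpy(&magic, p, 4);             /* read in host byte order */
  memcpy(&version, p + 4, 4);
  if (magic != MAGIC_GGJT) return LOADER_UNKNOWN;
  switch (version) {
    case 1: return LOADER_GGJT_V1;
    case 2: return LOADER_GGJT_V2;
    case 3: return LOADER_GGJT_V3;
    default: return LOADER_UNKNOWN;
  }
}
```

The point of the design is that an older version number selects an older code path rather than producing an error, which is presumably why the directory keeps separate ggjt.v1.* and ggjt.v2.* sources.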
common.cc
common.h
companionai.txt
fp16.c
fp16.h
fp16.internal.h
ggjt.v1.c
ggjt.v1.internal.h
ggjt.v1.q4_0.c
ggjt.v1.q4_0.h
ggjt.v1.q4_1.c
ggjt.v1.q4_1.h
ggjt.v1.q4_2.c
ggjt.v1.q4_2.h
ggjt.v1.q5_0.c
ggjt.v1.q5_0.h
ggjt.v1.q5_1.c
ggjt.v1.q5_1.h
ggjt.v1.q8_0.c
ggjt.v1.q8_0.h
ggjt.v1.q8_1.c
ggjt.v1.q8_1.h
ggjt.v2.c
ggjt.v2.internal.h
ggjt.v2.q4_0.c
ggjt.v2.q4_0.h
ggjt.v2.q4_1.c
ggjt.v2.q4_1.h
ggjt.v2.q5_0.c
ggjt.v2.q5_0.h
ggjt.v2.q5_1.c
ggjt.v2.q5_1.h
ggjt.v2.q8_0.c
ggjt.v2.q8_0.h
ggjt.v2.q8_1.c
ggjt.v2.q8_1.h
ggml.c
ggml.h
ggml.mk
LICENSE
llama.cc
llama.h
llama_util.h
main.cc
perplexity.cc
quantize.cc
README.cosmo
DESCRIPTION

  ggml is a machine learning library useful for LLM inference on CPUs

LICENSE

  MIT

ORIGIN

  https://github.com/ggerganov/llama.cpp
  d8bd0013e8768aaa3dc9cfc1ff01499419d5348e

LOCAL CHANGES

  - Maintain support for deprecated file formats
  - Make it possible for loaded prompts to be cached to disk
  - Introduce -v and --verbose flags
  - Reduce batch size from 512 to 32
  - Allow --n_keep to specify a substring of prompt
  - Don't print stats / diagnostics unless -v is passed
  - Reduce --top_p default from 0.95 to 0.70
  - Change --reverse-prompt to no longer imply --interactive
  - Permit --reverse-prompt specifying custom EOS if non-interactive
  - Refactor headers per cosmo convention
  - Remove C++ exceptions; use Die() function instead
  - Remove division from matrix multiplication
  - Let quantizer convert between ggmt formats
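
To give a feel for what lowering the --top_p default from 0.95 to 0.70 does, here is a minimal sketch of top-p (nucleus) filtering in C. It assumes the common rule of keeping the smallest prefix of the descending-sorted token probabilities whose cumulative mass reaches top_p. `NucleusSize` is a hypothetical name for illustration, not a function from this directory.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many of the most likely tokens survive
   top-p filtering. probs must be sorted in descending order and should
   sum to roughly 1. */
size_t NucleusSize(const double *probs, size_t n, double top_p) {
  double cum = 0;
  for (size_t i = 0; i < n; ++i) {
    cum += probs[i];                /* mass covered by tokens 0..i */
    if (cum >= top_p) return i + 1; /* smallest prefix reaching top_p */
  }
  return n; /* rounding left the total short of top_p: keep everything */
}
```

For the distribution {0.4, 0.25, 0.15, 0.1, 0.1}, a top_p of 0.70 keeps the first 3 tokens while 0.95 keeps all 5, so the lower default narrows sampling toward the most likely candidates.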