cosmopolitan/third_party/ggml/README.cosmo

DESCRIPTION

ggml is a machine learning library useful for LLM inference on CPUs

LICENSE

MIT

ORIGIN

https://github.com/ggerganov/llama.cpp
commit 0b2da20538d01926b77ea237dd1c930c4d20b686
Author: Stephan Walter <stephan@walter.name>
Date:   Wed Apr 26 20:26:42 2023 +0000

    ggml : slightly faster AVX2 implementation for Q5 (#1197)

LOCAL CHANGES

- Make it possible for loaded prompts to be cached to disk (see the
  sketch after this list)
- Introduce -v and --verbose flags
- Reduce batch size from 512 to 32
- Don't print stats / diagnostics unless -v is passed
- Reduce --top_p default from 0.95 to 0.70
- Change --reverse-prompt so it no longer implies --interactive
- Permit --reverse-prompt to specify a custom EOS when non-interactive
- Refactor headers per cosmo convention
- Replace multi-character literals like 'ggjt' with READ32BE("ggjt")
  (see below)
- Remove C++ exceptions; use a Die() function instead (see below)
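
The prompt cache works by keying a saved evaluation state on the prompt
text, so a prompt that was already evaluated once can be reloaded from
disk instead of recomputed. What follows is a conceptual sketch only,
not the port's actual code; the cache path scheme, the hash choice, and
the opaque state buffer are all assumptions:

    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a hash of the prompt text keys the cache file
       (illustrative choice, not necessarily what the port uses). */
    static uint32_t Fnv32(const char *s) {
      uint32_t h = 0x811c9dc5u;
      for (; *s; ++s) h = (h ^ (unsigned char)*s) * 0x01000193u;
      return h;
    }

    /* Returns 1 on a cache hit after filling `state` (a stand-in for
       the serialized transformer state); 0 means the caller must
       evaluate the prompt the slow way and save the result for the
       next run. */
    static int LoadCachedPrompt(const char *prompt, void *state,
                                size_t n) {
      char path[64];
      FILE *f;
      snprintf(path, sizeof(path), "/tmp/prompt.%08x.state",
               Fnv32(prompt));
      if (!(f = fopen(path, "rb"))) return 0;
      int ok = fread(state, 1, n, f) == n;
      fclose(f);
      return ok;
    }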
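
The READ32BE() change replaces multi-character literals, whose values
are implementation-defined in C++, with an explicit big-endian load, so
the magic-number comparison is well defined on every compiler. A
minimal sketch; the include path and the is_ggjt_magic() helper name
are assumptions:

    #include "libc/intrin/bits.h"

    /* READ32BE("ggjt") packs the bytes big-endian, i.e. 0x67676A74,
       matching the ggjt file magic without relying on how the
       compiler happens to evaluate the literal 'ggjt'. */
    static int is_ggjt_magic(const unsigned char *p) {
      return READ32BE(p) == READ32BE("ggjt");
    }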
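
Exception removal follows one pattern throughout: a fatal condition
prints a diagnostic and exits instead of throwing. The port's actual
Die() signature is not reproduced here; this is a sketch of the shape
such a helper typically takes:

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Print a formatted message to stderr and terminate; replaces
       code like `throw std::runtime_error("failed to open model")`. */
    static void Die(const char *fmt, ...) {
      va_list va;
      va_start(va, fmt);
      vfprintf(stderr, fmt, va);
      va_end(va);
      fputc('\n', stderr);
      exit(1);
    }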