DESCRIPTION

  ggml is a machine learning library useful for LLM inference on CPUs

LICENSE

  MIT

ORIGIN

  https://github.com/ggerganov/llama.cpp
  d8bd0013e8768aaa3dc9cfc1ff01499419d5348e

LOCAL CHANGES

  - Maintain support for deprecated file formats
  - Make it possible for loaded prompts to be cached to disk
  - Introduce -v and --verbose flags
  - Reduce batch size from 512 to 32
  - Allow --n_keep to specify a substring of prompt
  - Don't print stats / diagnostics unless -v is passed
  - Reduce --top_p default from 0.95 to 0.70
  - Change --reverse-prompt to no longer imply --interactive
  - Permit --reverse-prompt specifying custom EOS if non-interactive
  - Refactor headers per cosmo convention
  - Remove C++ exceptions; use Die() function instead
  - Remove division from matrix multiplication
  - Let quantizer convert between ggmt formats
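
EXAMPLE

  The lower --top_p default narrows the pool of tokens the sampler may
  choose from. The sketch below illustrates the general top-p (nucleus)
  idea only; it is not the code in this directory. TopPKeepCount is a
  hypothetical helper, and the probabilities in the usage note are made
  up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of ggml or llama.cpp.
// Returns how many of the most probable tokens survive top-p filtering:
// the smallest count whose cumulative probability reaches top_p.
std::size_t TopPKeepCount(std::vector<double> probs, double top_p) {
  assert(top_p > 0.0 && top_p <= 1.0);
  // Consider candidates from most to least probable.
  std::sort(probs.begin(), probs.end(), std::greater<double>());
  double cumulative = 0.0;
  std::size_t kept = 0;
  for (double p : probs) {
    cumulative += p;
    ++kept;
    if (cumulative >= top_p) break;  // nucleus is complete
  }
  return kept;
}
```

  With token probabilities {0.40, 0.25, 0.15, 0.10, 0.06, 0.04}, a
  threshold of 0.95 keeps five candidates while 0.70 keeps three, so
  under these assumptions the new default samples from a smaller, more
  probable set.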