cosmopolitan/third_party/ggml/README.cosmo

DESCRIPTION

  ggml is a machine learning library useful for LLM inference on CPUs

LICENSE

  MIT

ORIGIN

  https://github.com/ggerganov/llama.cpp
  d8bd0013e8768aaa3dc9cfc1ff01499419d5348e

LOCAL CHANGES

  - Maintain support for deprecated file formats
  - Make it possible for loaded prompts to be cached to disk
  - Introduce -v and --verbose flags
  - Reduce batch size from 512 to 32
  - Allow --n_keep to specify a substring of the prompt
  - Don't print stats / diagnostics unless -v is passed
  - Reduce --top_p default from 0.95 to 0.70
  - Change --reverse-prompt to no longer imply --interactive
  - Permit --reverse-prompt to specify a custom EOS in non-interactive mode
  - Refactor headers per cosmo convention
  - Remove C++ exceptions; use Die() function instead
  - Remove division from matrix multiplication
  - Let the quantizer convert between ggjt formats