Commit graph

5 commits

Author SHA1 Message Date
Justine Tunney
e7eb0b3070
Make more ML improvements
- Fix UX issues with llama.com
- Do housekeeping on libm code
- Add more vectorization to GGML
- Get GGJT quantizer programs working well
- Have the quantizer keep the output layer as f16c
- Prefetching improves performance 15% if you use fewer threads
2023-05-16 08:07:23 -07:00
Justine Tunney
282dd8e7b7
Get radpajama to build
    make -j8 o//third_party/radpajama/radpajama.com
    make -j8 o//third_party/radpajama/radpajama-chat.com

This change gets the radpajama.mk config working. This package depends
on THIRD_PARTY_GGML but it's configured to call ggjt_v1(), so that the
library will provide the old quantizers. The ggml_quantize_chunk() API
will now dispatch to older quantizers based on the configured version.
2023-05-13 20:44:36 -07:00
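
The version-based dispatch described in the commit above could look roughly like the following. This is a minimal sketch, not the code in THIRD_PARTY_GGML: every name here (`sketch_use_v1`, `sketch_quantize_chunk`, and so on) is hypothetical, and the stub quantizers only write a tag byte so the chosen path is observable.

```c
#include <assert.h>
#include <stddef.h>

/* All identifiers below are hypothetical stand-ins for illustration. */
enum sketch_version { SKETCH_V1 = 1, SKETCH_V2 = 2 };

/* Process-wide setting; the newest format is assumed to be the default. */
static enum sketch_version sketch_configured_version = SKETCH_V2;

/* A package that needs the old quantizers opts in once at startup. */
void sketch_use_v1(void) { sketch_configured_version = SKETCH_V1; }
void sketch_use_v2(void) { sketch_configured_version = SKETCH_V2; }

/* Stub "quantizers": each writes one tag byte and reports bytes written. */
static size_t sketch_quantize_v1(const float *src, unsigned char *dst, size_t n) {
  (void)src;
  (void)n;
  dst[0] = 1;
  return 1;
}

static size_t sketch_quantize_v2(const float *src, unsigned char *dst, size_t n) {
  (void)src;
  (void)n;
  dst[0] = 2;
  return 2 - 1;
}

/* Single entry point that forwards to the quantizer for the configured version. */
size_t sketch_quantize_chunk(const float *src, unsigned char *dst, size_t n) {
  assert(src != NULL && dst != NULL);
  switch (sketch_configured_version) {
    case SKETCH_V1:
      return sketch_quantize_v1(src, dst, n);
    case SKETCH_V2:
      return sketch_quantize_v2(src, dst, n);
    default:
      return 0; /* unknown version: nothing written */
  }
}
```

With this shape, callers keep using one function, and a package built against the older format only has to call the opt-in function before quantizing.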
Justine Tunney
5a4cf9560f
Add support for new GGJT v2 quantizers
This change makes quantized models (e.g. q4_0) go 10% faster on Macs,
but it doesn't offer much improvement for Intel PC hardware.

This change syncs llama.cpp 699b1ad7fe6f7b9e41d3cb41e61a8cc3ea5fc6b5,
which recently made a breaking change to nearly all its file formats
without any migration. Since that'll break hundreds upon hundreds of
models on websites like HuggingFace, llama.com will support both file
formats, because llama.com will never ever break the GGJT file format.
2023-05-13 08:08:32 -07:00
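
Supporting both file formats implies telling them apart when a model is opened. The sketch below shows one way that could work, assuming the file begins with a little-endian 32-bit magic number (`0x67676a74`, which spells `ggjt`) followed by a 32-bit version. That header layout and all identifiers are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed magic number; spells 'ggjt' when read as a big-endian string. */
#define SKETCH_MAGIC_GGJT 0x67676a74u

/* Hypothetical result type for the detected on-disk format. */
enum sketch_format {
  SKETCH_FORMAT_UNKNOWN,
  SKETCH_FORMAT_GGJT_V1,
  SKETCH_FORMAT_GGJT_V2,
};

/* Decode a little-endian 32-bit integer regardless of host byte order. */
static uint32_t sketch_read32le(const unsigned char *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}

/* Inspect the first bytes of a model file and pick which loader to use. */
enum sketch_format sketch_detect_format(const unsigned char *header,
                                        size_t size) {
  assert(header != NULL);
  if (size < 8) return SKETCH_FORMAT_UNKNOWN; /* too short for magic+version */
  if (sketch_read32le(header) != SKETCH_MAGIC_GGJT) {
    return SKETCH_FORMAT_UNKNOWN;
  }
  switch (sketch_read32le(header + 4)) {
    case 1:
      return SKETCH_FORMAT_GGJT_V1; /* older quantizers */
    case 2:
      return SKETCH_FORMAT_GGJT_V2; /* newer quantizers */
    default:
      return SKETCH_FORMAT_UNKNOWN;
  }
}
```

A loader built this way keeps old files working by routing them to the old code path instead of rejecting them.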
Justine Tunney
5f57fc1f59
Upgrade llama.cpp to e6a46b0ed1884c77267dc70693183e3b7164e0e0
2023-05-10 04:20:48 -07:00
Justine Tunney
e8b43903b2
Import llama.cpp
https://github.com/ggerganov/llama.cpp
0b2da20538d01926b77ea237dd1c930c4d20b686
See third_party/ggml/README.cosmo for changes
2023-04-27 14:37:14 -07:00