Commit graph

17 commits

Author SHA1 Message Date
Justine Tunney
e6b7c16a53
Make changes needed for new demo 2023-06-15 23:22:49 -07:00
Justine Tunney
8fdb31681a
Introduce support for GGJT v3 file format
llama.com can now load weights that use the new file format which was
introduced a few weeks ago. Note that, unlike llama.cpp, we will keep
support for old file formats in our tool so you don't need to convert
your weights when the upstream project makes breaking changes. Please
note that using ggjt v3 does make avx2 inference go 5% faster for me.
2023-06-03 15:46:21 -07:00
Justine Tunney
e7eb0b3070
Make more ML improvements
- Fix UX issues with llama.com
- Do housekeeping on libm code
- Add more vectorization to GGML
- Get GGJT quantizer programs working well
- Have the quantizer keep the output layer as f16c
- Prefetching improves performance 15% if you use fewer threads
2023-05-16 08:07:23 -07:00
Justine Tunney
4a8a81eb9f
Fix llama.com interactive mode regressions 2023-05-13 00:09:38 -07:00
Justine Tunney
45186c74ac
Introduce -q (quiet flag) and improve ctrl-c ux 2023-05-12 09:46:07 -07:00
Justine Tunney
e8de1e4766
Fix subtoken antiprompt scanning 2023-05-12 08:55:40 -07:00
Justine Tunney
80c174d494
Clean up llama.com anti/stop/reverse-prompt code
Example use case for JSON completion:

    $ m=opt
    $ make -j16 m=$m o/$m/third_party/ggml/llama.com
    $ o/$m/third_party/ggml/llama.com -m llama.bin -p '{"key": "life", "val": ' -r '}'
    42}

This provides better control. More sophisticated facilities for
controlling text generation will be provided soon enough.
2023-05-12 08:20:58 -07:00
Justine Tunney
ca19ecf49c
Fine tune crash reports for llama.com 2023-05-12 06:24:26 -07:00
Justine Tunney
a88290e595
Make sure llama.com terminal cleanup happens 2023-05-10 15:56:01 -07:00
Justine Tunney
290a49952e
Fix some more issues with aarch64 and llama.cpp 2023-05-10 07:34:26 -07:00
Justine Tunney
5f57fc1f59
Upgrade llama.cpp to e6a46b0ed1884c77267dc70693183e3b7164e0e0 2023-05-10 04:20:48 -07:00
Justine Tunney
4c093155a3
Get llama.com building as an aarch64 native binary 2023-05-10 04:20:47 -07:00
Justine Tunney
3dac9f8999
Use Companion AI in llama.com by default 2023-04-30 23:08:15 -07:00
Justine Tunney
d9e27203d4
Incorporate some fixes and updates for GGML 2023-04-28 20:24:55 -07:00
Justine Tunney
b31ba86ace
Introduce prompt caching so prompts load instantly
This change also introduces an ephemeral status line in non-verbose mode
to display a load percentage status when slow operations are happening.
2023-04-28 16:15:26 -07:00
Justine Tunney
1c2da3a55a
Make shell usability improvements to llama.cpp
- Introduce -v and --verbose flags
- Don't print stats / diagnostics unless -v is passed
- Reduce --top_p default from 0.95 to 0.70
- Change --reverse-prompt to no longer imply --interactive
- Permit --reverse-prompt specifying custom EOS if non-interactive
2023-04-28 02:54:11 -07:00
Justine Tunney
e8b43903b2
Import llama.cpp
https://github.com/ggerganov/llama.cpp
0b2da20538d01926b77ea237dd1c930c4d20b686
See third_party/ggml/README.cosmo for changes
2023-04-27 14:37:14 -07:00