Commit graph

587 commits

Author SHA1 Message Date
Justine Tunney
210187cf77
Perform some code cleanup 2023-05-15 16:32:10 -07:00
Justine Tunney
cc1732bc42
Make AARCH64 harder, better, faster, stronger
- Perform some housekeeping on scalar math function code
- Import ARM's Optimized Routines for SIMD string processing
- Upgrade to latest Chromium zlib and enable more SIMD optimizations
2023-05-15 02:15:34 -07:00
Justine Tunney
550b52abf6
Port a lot more code to AARCH64
- Introduce epoll_pwait()
- Rewrite -ftrapv and ffs() libraries in C code
- Use more FreeBSD code in math function library
- Get significantly more tests passing on qemu-aarch64
- Fix many Musl long double functions that were broken on AARCH64
2023-05-14 09:37:26 -07:00
Ariel Núñez
91791e9f38
Started removing features from RedPajama to make it easier to understand for beginners (#817) 2023-05-14 09:16:22 -07:00
Justine Tunney
89d1fad7ee
Enable crash reports for radpajama executables 2023-05-13 21:16:03 -07:00
Justine Tunney
296ee3ec58
Make some other fixes to radpajama build config 2023-05-13 21:09:28 -07:00
Justine Tunney
282dd8e7b7
Get radpajama to build
make -j8 o//third_party/radpajama/radpajama.com
    make -j8 o//third_party/radpajama/radpajama-chat.com

This change gets the radpajama.mk config working. This package depends
on THIRD_PARTY_GGML but it's configured to call ggjt_v1(), so that the
library will provide the old quantizers. The ggml_quantize_chunk() API
will now dispatch to older quantizers based on the configured version.
2023-05-13 20:44:36 -07:00
Justine Tunney
410c8785c9
Fix the AARCH64 build 2023-05-13 08:19:44 -07:00
Justine Tunney
5a4cf9560f
Add support for new GGJT v2 quantizers
This change makes quantized models (e.g. q4_0) go 10% faster on Macs
however doesn't offer much improvement for Intel PC hardware.

This change syncs llama.cpp 699b1ad7fe6f7b9e41d3cb41e61a8cc3ea5fc6b5
which recently made a breaking change to nearly all its file formats
without any migration. Since that'll break hundreds upon hundreds of
models on websites like HuggingFace llama.com will support both file
formats because llama.com will never ever break the GGJT file format
2023-05-13 08:08:32 -07:00
Justine Tunney
802e7eb4ef
Mop up more test regressions 2023-05-13 01:09:44 -07:00
Justine Tunney
4a8a81eb9f
Fix llama.com interactive mode regressions 2023-05-13 00:09:38 -07:00
Justine Tunney
fd34ef732d
Make considerably more progress on AARCH64
- Utilities like pledge.com now build
- kprintf() will no longer balk at 48-bit addresses
- There's a new aarch64-dbg build mode that should work
- gc() and defer() are mostly pacified; avoid using them on aarch64
- THIRD_PART_STB now has Arm Neon intrinsics for fast image handling
2023-05-12 22:42:57 -07:00
Justine Tunney
1bfb3aab1b
Make Arm Neon intrinsics work with make tags 2023-05-12 18:32:53 -07:00
Justine Tunney
45186c74ac
Introduce -q (quiet flag) and improve ctrl-c ux 2023-05-12 09:46:07 -07:00
Justine Tunney
e8de1e4766
Fix subtoken antiprompt scanning 2023-05-12 08:55:40 -07:00
Justine Tunney
80c174d494
Clean up llama.com anti/stop/reverse-prompt code
Example use case for JSON completion:

    $ m=opt
    $ make -j16 m=$m o/$m/third_party/ggml/llama.com
    $ o/$m/third_party/ggml/llama.com -m llama.bin -p '{"key": "life", "val": ' -r '}'
    42}

This provides better control. More sophisticated facilities for
controlling text generation will be provided soon enough.
2023-05-12 08:20:58 -07:00
Justine Tunney
bbfe4fbd11
Make llama.com n_predict be -1 by default 2023-05-12 08:20:34 -07:00
Justine Tunney
ca19ecf49c
Fine tune crash reports for llama.com 2023-05-12 06:24:26 -07:00
Justine Tunney
4edbc98811
Get MbedTLS and its unit tests passing AARCH64 2023-05-11 21:53:15 -07:00
Justine Tunney
5e2f7f7ced
Get LIBC_TESTLIB building on AARCH64 2023-05-11 19:57:09 -07:00
Justine Tunney
95fab334e4
Use yield on aarch in spin locks 2023-05-11 19:57:09 -07:00
Ariel Núñez
b3e3359d22
Import radpajama (a redpajama.cpp fork) (#814)
This is the relevant commit: bfa6466199

Model download links:
https://huggingface.co/ceonlabs/radpajama/tree/main
2023-05-11 07:12:08 -07:00
Justine Tunney
1f6f9e6701
Remove division from matrix multiplication
This change reduces llama.com CPU cycles systemically by 2.5% according
to the Linux Kernel `perf stat -Bddd` utility.
2023-05-10 21:19:54 -07:00
Justine Tunney
a88290e595
Make sure llama.com terminal cleanup happens 2023-05-10 15:56:01 -07:00
Justine Tunney
5250feb7ad
There must only be one strerror() 2023-05-10 15:34:13 -07:00
Justine Tunney
bb3ebedfce
Fix load time measurement 2023-05-10 07:54:21 -07:00
Justine Tunney
290a49952e
Fix some more issues with aarch64 and llama.cpp 2023-05-10 07:34:26 -07:00
Justine Tunney
12a33858c9
There must be only one clock() 2023-05-10 06:16:01 -07:00
Justine Tunney
6cb9553706
Fix alignment bug in llama.com 2023-05-10 06:15:32 -07:00
Justine Tunney
ca990ef091
Make llama.com -h print to stdout 2023-05-10 04:55:59 -07:00
Justine Tunney
5f57fc1f59
Upgrade llama.cpp to e6a46b0ed1884c77267dc70693183e3b7164e0e0 2023-05-10 04:20:48 -07:00
Justine Tunney
86d9323a43
Remove sys_getrandom() on NetBSD
This fixes an apparent regression caused by
3f0bcdc3ef where getrandom() on NetBSD 9.2
doesn't appear to work; ktrace oddly reports:

    1446      1 .ape     CALL  #91 (unimplemented getdopt)
    1446      1 .ape     RET   #91 (unimplemented getdopt) -1 errno 78
    Function not implemented
    1446      1 .ape     PSIG  SIGSYS SIG_DFL: code=SI_NOINFO
2023-05-10 04:20:47 -07:00
Justine Tunney
a0237a017c
Get llama.com working on aarch64 2023-05-10 04:20:47 -07:00
Justine Tunney
4c093155a3
Get llama.com building as an aarch64 native binary 2023-05-10 04:20:47 -07:00
Justine Tunney
d04430f4ef
Get LIBC_MEM and LIBC_STDIO building with aarch64 2023-05-10 04:20:47 -07:00
Justine Tunney
ae0ee59614
Get aarch64 hello world working
$ m=aarch64-tiny
    $ make -j8 m=$m o/$m/tool/hello/hello.com o/third_party/qemu/qemu-aarch64
    $ o/third_party/qemu/qemu-aarch64 o/$m/tool/hello/hello.com
    hello world
    $ ls -hal o/$m/tool/hello/hello.com
    -rwxr-xr-x 1 jart jart 4.0K May  9 05:04 o/aarch64-tiny/tool/hello/hello.com
2023-05-10 04:20:47 -07:00
Justine Tunney
e5e3cdf447
Get LIBC_RUNTIME and LIBC_CALLS building on aarch64 2023-05-10 04:20:47 -07:00
Justine Tunney
036b9a0002
Make further progress on non-x86 support 2023-05-10 04:20:47 -07:00
Justine Tunney
135080fd3e
Get libc/tinymath/ compiling on aarch64 2023-05-10 04:20:46 -07:00
Justine Tunney
2b73e72d59
Make more code aarch64 friendly 2023-05-10 04:20:46 -07:00
Justine Tunney
ca2860947f
Make progress towards aarch64 build 2023-05-10 04:20:46 -07:00
Justine Tunney
08ff26c817
Add qemu-aarch64 2023-05-10 04:20:46 -07:00
Justine Tunney
57cc257f58
Vendor musl-cross-make gcc 9.2.0 aarch64 2023-05-10 04:20:46 -07:00
Justine Tunney
12438cce16
Fix regression with Python linker eaxmples
We can once again create 2mb statically-linked Python binaries:

    $ make -j8 m=tiny o/tiny/examples/pyapp/pyapp.com
    $ ls -hal o/tiny/examples/pyapp/pyapp.com
    -rwxr-xr-x 1 jart jart 2.1M May  1 14:04 o/tiny/examples/pyapp/pyapp.com
    $ o/tiny/examples/pyapp/pyapp.com
    cosmopolitan is cool!

The regression was caused by Python thread support in b15f9eb58
2023-05-01 14:12:15 -07:00
Justine Tunney
3dac9f8999
Use Companion AI in llama.com by default 2023-04-30 23:08:15 -07:00
Justine Tunney
d9e27203d4
Incorporate some fixes and updates for GGML 2023-04-28 20:24:55 -07:00
Justine Tunney
b31ba86ace
Introduce prompt caching so prompts load instantly
This change also introduces an ephemeral status line in non-verbose mode
to display a load percentage status when slow operations are happening.
2023-04-28 16:15:26 -07:00
Justine Tunney
1c2da3a55a
Make shell usability improvements to llama.cpp
- Introduce -v and --verbose flags
- Don't print stats / diagnostics unless -v is passed
- Reduce --top_p default from 0.95 to 0.70
- Change --reverse-prompt to no longer imply --interactive
- Permit --reverse-prompt specifying custom EOS if non-interactive
2023-04-28 02:54:11 -07:00
Justine Tunney
420f889ac3
Further optimize the math library
The sincosf() function is now twice as fast, thanks to ARM Limited. The
same might also be true of logf() and expm1f() which have been updated.
2023-04-28 01:20:47 -07:00
Justine Tunney
e8b43903b2
Import llama.cpp
https://github.com/ggerganov/llama.cpp
0b2da20538d01926b77ea237dd1c930c4d20b686
See third_party/ggml/README.cosmo for changes
2023-04-27 14:37:14 -07:00