llama.com can now load weights that use the new file format, which was
introduced upstream a few weeks ago. Note that, unlike llama.cpp, we
will keep support for old file formats in our tool, so you don't need
to convert your weights when the upstream project makes breaking
changes. Using GGJT v3 does make AVX2 inference go 5% faster for me.
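As a rough sketch of how such multi-format loading can work, the file
header can be sniffed before choosing a decoder. The magic values
below are the real GGML/GGMF/GGJT file magics, but sniff_format() and
the enum are illustrative assumptions, not the actual llama.com loader
API; the code also assumes a little-endian host.

    #include <stdint.h>
    #include <stdio.h>

    #define MAGIC_GGML 0x67676d6cu  /* 'ggml': legacy, unversioned */
    #define MAGIC_GGMF 0x67676d66u  /* 'ggmf': versioned, pre-mmap */
    #define MAGIC_GGJT 0x67676a74u  /* 'ggjt': mmap-aligned format */

    enum file_format {
        FMT_UNKNOWN, FMT_GGML, FMT_GGMF,
        FMT_GGJT_V1, FMT_GGJT_V2, FMT_GGJT_V3,
    };

    /* Sniffs the weights header so both old and new files load. */
    static enum file_format sniff_format(FILE *f) {
        uint32_t magic, version;
        if (fread(&magic, sizeof(magic), 1, f) != 1) return FMT_UNKNOWN;
        if (magic == MAGIC_GGML) return FMT_GGML;  /* no version field */
        if (fread(&version, sizeof(version), 1, f) != 1) return FMT_UNKNOWN;
        if (magic == MAGIC_GGMF) return FMT_GGMF;
        if (magic == MAGIC_GGJT) {
            if (version == 1) return FMT_GGJT_V1;
            if (version == 2) return FMT_GGJT_V2;
            if (version == 3) return FMT_GGJT_V3;
        }
        return FMT_UNKNOWN;
    }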
- Fix UX issues with llama.com
- Do housekeeping on libm code
- Add more vectorization to GGML
- Get GGJT quantizer programs working well
- Have the quantizer keep the output layer as f16
- Prefetching improves performance by 15% when fewer threads are used
  (see the sketch after this list)
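Here is a minimal sketch of what software prefetching in a kernel's
inner loop looks like; dot_with_prefetch() and the PREFETCH_AHEAD
distance are assumptions for illustration, not the actual GGML code.

    #include <stddef.h>

    /* Assumed tuning constant: how far ahead (in floats) to prefetch. */
    #define PREFETCH_AHEAD 64

    /* Minimal sketch: prefetch upcoming cache lines while
       accumulating a dot product. */
    static float dot_with_prefetch(const float *a, const float *b,
                                   size_t n) {
        float sum = 0;
        for (size_t i = 0; i < n; ++i) {
            if (i + PREFETCH_AHEAD < n) {
                __builtin_prefetch(a + i + PREFETCH_AHEAD, 0, 1);
                __builtin_prefetch(b + i + PREFETCH_AHEAD, 0, 1);
            }
            sum += a[i] * b[i];
        }
        return sum;
    }

With fewer threads there is less contention for memory bandwidth, which
is plausibly why explicit prefetching pays off more in that setting.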
This change gets the radpajama.mk config working. This package depends
on THIRD_PARTY_GGML, but it's configured to call ggjt_v1(), so that the
library will provide the old quantizers. The ggml_quantize_chunk() API
will now dispatch to older quantizers based on the configured version.
The binaries can be built as follows:

make -j8 o//third_party/radpajama/radpajama.com
make -j8 o//third_party/radpajama/radpajama-chat.com
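Here is a minimal sketch of that version-dispatch pattern, assuming
q4_0 as the quantization type; the per-version quantizer stubs and the
global version flag are illustrative stand-ins, and only ggjt_v1() and
the dispatch idea come from the text above.

    #include <stddef.h>

    /* Hypothetical stand-ins for the per-version q4_0 quantizers;
       the real ones live inside the GGML library. */
    static size_t quantize_q4_0_v1(const float *src, void *dst, size_t n) {
        (void)src; (void)dst; return n / 2;  /* stub */
    }
    static size_t quantize_q4_0_v2(const float *src, void *dst, size_t n) {
        (void)src; (void)dst; return n / 2;  /* stub */
    }

    static int g_ggjt_version = 1;

    void ggjt_v1(void) { g_ggjt_version = 1; }  /* use old quantizers */
    void ggjt_v2(void) { g_ggjt_version = 2; }  /* use new quantizers */

    /* Sketch: route each chunk to the quantizer matching the
       configured file-format version. */
    size_t quantize_chunk_q4_0(const float *src, void *dst, size_t n) {
        return g_ggjt_version == 1 ? quantize_q4_0_v1(src, dst, n)
                                   : quantize_q4_0_v2(src, dst, n);
    }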
This change makes quantized models (e.g. q4_0) go 10% faster on Macs,
but it doesn't offer much improvement on Intel PC hardware.
This change syncs llama.cpp 699b1ad7fe6f7b9e41d3cb41e61a8cc3ea5fc6b5,
which recently made a breaking change to nearly all of its file formats
without any migration path. Since that would break hundreds upon
hundreds of models on websites like HuggingFace, llama.com will support
both file formats, because llama.com will never break the GGJT file
format.