cosmopolitan

mirror of https://github.com/jart/cosmopolitan.git synced 2025-01-31 11:37:35 +00:00

Author	SHA1	Message	Date
Justine Tunney	fa20edc44d	Reduce header complexity - Remove most __ASSEMBLER__ __LINKER__ ifdefs - Rename libc/intrin/bits.h to libc/serialize.h - Block pthread cancelation in fchmodat() polyfill - Remove `clang-format off` statements in third_party	2023-11-28 14:39:42 -08:00
Justine Tunney	96f979dfc5	Rename makefiles BUILD.mk This way they appear at the top of directory listings.	2023-11-28 11:21:08 -08:00
Justine Tunney	68c7c9c1e0	Clean up some code - Use good ELF technique in cosmo_dlopen() - Make strerror() conform more to other libc impls - Introduce __clear_cache() and use it in cosmo_dlopen() - Remove libc/fmt/fmt.h header (trying to kill off LIBC_FMT)	2023-11-16 17:31:07 -08:00
Justine Tunney	791f79fcb3	Make improvements - We now serialize the file descriptor table when spawning / executing processes on Windows. This means you can now inherit more stuff than just standard i/o. It's needed by bash, which duplicates the console to file descriptor #255. We also now do a better job serializing the environment variables, so you're less likely to encounter E2BIG when using your bash shell. We also no longer coerce environ to uppercase. - execve() on Windows now remotely controls its parent process to make it spawn a replacement for itself. Then it'll be able to terminate immediately once the spawn succeeds, without having to linger around for its lifetime as a shell process proxying the exit code. When the process worker thread running in the parent sees the child die, it's given a handle to the new child, to replace it in the process table. - execve() and posix_spawn() on Windows will now provide CreateProcess an explicit handle list. This allows us to remove handle locks, which enables better fork/spawn concurrency, with seriously correct thread safety. Other codebases like Go use the same technique. On the other hand, fork() still favors the conventional WIN32 inheritance approach, which can be a little bit messy, but is controlled by guaranteeing perfectly clean slates at both the spawning and execution boundaries. - sigset_t is now 64 bits. Having it be 128 bits was a mistake because there's no reason to use that and it's only supported by FreeBSD. By using the system word size, signal mask manipulation on Windows goes very fast. Furthermore, @asyncsignalsafe functions have been rewritten on Windows to take advantage of signal masking, now that it's much more pleasant to use. - All the overlapped i/o code on Windows has been rewritten for pretty good signal and cancelation safety. We're now able to ensure overlap data structures are cleaned up so long as you don't longjmp() out of a signal handler that interrupted an i/o operation. 
Latencies are also improved thanks to the removal of lots of "busy wait" code. Waits should be optimal for everything except poll(), which shall be the last and final demon we slay in the win32 i/o horror show. - getrusage() on Windows is now able to report RUSAGE_CHILDREN as well as RUSAGE_SELF, thanks to aggregation in the process manager thread.	2023-10-08 08:59:53 -07:00
Justine Tunney	ec480f5aa0	Make improvements - Every unit test now passes on Apple Silicon. The final piece of this puzzle was porting our POSIX threads cancelation support, since that works differently on ARM64 XNU vs. AMD64. Our semaphore support on Apple Silicon is also superior now compared to AMD64, thanks to the grand central dispatch library which lets *NSYNC locks go faster. - The Cosmopolitan runtime is now more stable, particularly on Windows. To do this, thread local storage is mandatory at all runtime levels, and the innermost packages of the C library are no longer being built using ASAN. TLS is being bootstrapped with a 128-byte TIB during the process startup phase, and then later on the runtime re-allocates it either statically or dynamically to support code using _Thread_local. fork() and execve() now do a better job cooperating with threads. We can now check how much stack memory is left in the process or thread when functions like kprintf() / execve() etc. call alloca(), so that they can raise ENOMEM, reduce a buffer size, or just print a warning. - POSIX signal emulation is now implemented the same way kernels do it with pthread_kill() and raise(). Any thread can interrupt any other thread, regardless of what it's doing. If it's blocked on read/write then the killer thread will cancel its i/o operation so that EINTR can be returned in the target thread immediately. If it's doing a tight CPU bound operation, then that's also interrupted by the signal delivery. Signal delivery now works by suspending a thread, pushing context data structures onto its stack, and redirecting its execution to a trampoline function, which calls SetThreadContext(GetCurrentThread()) when it's done. - We're now doing a better job managing locks and handles. On NetBSD we now close semaphore file descriptors in forked children. Semaphores on Windows can now be canceled immediately, which means mutexes/condition variables will now go faster. 
Apple Silicon semaphores can be canceled too. We're now using Apple's pthread_yield() function. Apple _nocancel syscalls are now used on XNU when appropriate to ensure pthread_cancel requests aren't lost. The MbedTLS library has been updated to support POSIX thread cancelations. See tool/build/runitd.c for an example of how it can be used for production multi-threaded tls servers. Handles on Windows now leak less often across processes. All i/o operations on Windows are now overlapped, which means file pointers can no longer be inherited across dup() and fork() for the time being. - We now spawn a thread on Windows to deliver SIGCHLD and wake up wait4(), which means, for example, that posix_spawn() now goes 3x faster. POSIX spawn is also now more correct. Like Musl, it's now able to report the failure code of execve() via a pipe, although our approach favors using shared memory to do that on systems that have a true vfork() function. - We now spawn a thread to deliver SIGALRM to threads when setitimer() is used. This enables the most precise wakeups the OS makes possible. - The Cosmopolitan runtime now uses less memory. On NetBSD, for example, it turned out the kernel would actually commit the PT_GNU_STACK size, which caused RSS to be 6mb for every process. Now it's down to ~4kb. On Apple Silicon, we reduce the mandatory upstream thread size to the smallest possible size to reduce the memory overhead of Cosmo threads. The examples directory has a program called greenbean which can spawn a web server on Linux with 10,000 worker threads and have the memory usage of the process be ~77mb. The 1024 byte overhead of POSIX-style thread-local storage is now optional; it won't be allocated until the pthread_setspecific/getspecific functions are called. On Windows, the threads that get spawned which are internal to the libc implementation use reserve rather than commit memory, which shaves a few hundred kb. 
- sigaltstack() is now supported on Windows; however, it can't currently be used to handle stack overflows, since crash signals are still generated by WIN32. The crash handler will still switch to the alt stack, though, which is helpful in environments with tiny threads. - Test binaries are now smaller. Many of the mandatory dependencies of the test runner have been removed. This ensures many programs can do a better job only linking the thing they're testing. This caused the test binaries for LIBC_FMT, for example, to decrease from 200kb to 50kb. - long double is no longer used in the implementation details of libc, except in the APIs that define it. The old code that used long double for time (instead of struct timespec) has now been thoroughly removed. - ShowCrashReports() is now much tinier in MODE=tiny. Instead of doing backtraces itself, it'll just print a command you can run on the shell using our new `cosmoaddr2line` program to view the backtrace. - Crash report signal handling now works in a much better way. Instead of terminating the process, it now relies on SA_RESETHAND so that the default SIG_DFL behavior can terminate the process if necessary. - Our pledge() functionality has now been fully ported to AARCH64 Linux.	2023-09-18 21:04:47 -07:00
Justine Tunney	0d748ad58e	Fix warnings This change fixes Cosmopolitan so it has fewer opinions about compiler warnings. The whole repository had to be cleaned up to be buildable in -Werror -Wall mode. This lets us benefit from things like strict const checking. Some actual bugs might have been caught too.	2023-09-01 20:50:18 -07:00
Justine Tunney	3a9cac4892	Fix small matters and improve sysconf() - Fix mkdeps.com out of memory error - Remove static memory from __get_cpu_count() - Add support for passing hyphen to cat in cocmd - Change more ZipOS errors from ENOTSUP to EROFS - Specify mem_unit in sysinfo() output on BSD OSes	2023-08-17 00:32:11 -07:00
Justine Tunney	c776a32f75	Replace COSMO define with _COSMO_SOURCE This change might cause ABI breakages for /opt/cosmos. It's needed to help us better conform to header declaration practices.	2023-08-13 20:55:04 -07:00
Justine Tunney	e11fa30791	Move zipos into runtime package This way complex runtime features (e.g. ftrace, symbol tables) can always yoink zipos support. This is important now that apelink.com automates embedding symbol tables for multiple cpus.	2023-08-11 23:14:02 -07:00
Justine Tunney	7e0a09feec	Mint APE Loader v1.5 This change ports APE Loader to Linux AARCH64, so that Raspberry Pi users can run programs like redbean without the executable needing to modify itself. Progress has also slipped into this change on the issue of better conforming to user expectations and industry standards regarding which symbols we're allowed to declare.	2023-07-26 13:54:49 -07:00
Justine Tunney	0409096658	Get us closer to building busybox This change undefines __linux__ and adds APIs like clock_settime(). The gosh darned getopt_long() API has been reintroduced, thanks to OpenBSD.	2023-06-18 04:13:45 -07:00
Justine Tunney	d7c79f43ef	Clean up more code - Found some bugs in LLVM compiler-rt library - The useless LIBC_STUBS package is now deleted - Improve the overflow checking story even further - Get chibicc tests working in MODE=dbg mode again - The libc/isystem/ headers now have correctly named guards	2023-06-18 01:00:05 -07:00
Justine Tunney	e6b7c16a53	Make changes needed for new demo	2023-06-15 23:22:49 -07:00
Justine Tunney	4d629fd424	Fix stack abuse in llama.cc This change also incorporates improvements for MODE=asan. It's been confirmed that o/asan/third_party/ggml/llama.com will work. Fixes #829	2023-06-08 07:12:26 -07:00
Justine Tunney	daf4454a06	Validate privileged code relationships - Work towards improving non-optimized build support - Introduce MODE=zero which is -O0 without ASAN/UBSAN - Use system GCC when ~/.cosmo.mk has USE_SYSTEM_TOOLCHAIN=1 - Have package.com check .privileged code doesn't call non-privileged	2023-06-08 04:38:06 -07:00
Justine Tunney	b94b29d79c	Prevent ftrace from misaligning functions	2023-06-06 06:00:31 -07:00
Justine Tunney	eb40cb371d	Get --ftrace working on aarch64 This change implements a new approach to function call logging that's based on the GCC flag -fpatchable-function-entry. Read the commentary in build/config.mk to learn how it works.	2023-06-05 23:35:31 -07:00
Justine Tunney	9cc3e37263	Upgrade to Cosmopolitan GCC 11.2.0 for aarch64	2023-06-05 02:07:28 -07:00
Justine Tunney	b5eab2b0b7	Get POSIX threads working on Apple Silicon It's now possible to run a working ape-m1 o/aarch64/third_party/ggml/llama.com on Apple M1 hardware running XNU!	2023-06-03 18:33:01 -07:00
Justine Tunney	8fdb31681a	Introduce support for GGJT v3 file format llama.com can now load weights that use the new file format which was introduced a few weeks ago. Note that, unlike llama.cpp, we will keep support for old file formats in our tool so you don't need to convert your weights when the upstream project makes breaking changes. Please note that using ggjt v3 does make avx2 inference go 5% faster for me.	2023-06-03 15:46:21 -07:00
Justine Tunney	1904a3cae8	Sync llama.cpp to 6986c7835adc13ba3f9d933b95671bb1f3984dc6	2023-06-03 10:29:12 -07:00
Justine Tunney	1422e96b4e	Introduce native support for MacOS ARM64 There's a new program named ape/ape-m1.c which will be used to build an embeddable binary that can load ape and elf executables. The support is mostly working so far, but we're still chasing down ABI issues.	2023-05-20 04:17:03 -07:00
Justine Tunney	e7eb0b3070	Make more ML improvements - Fix UX issues with llama.com - Do housekeeping on libm code - Add more vectorization to GGML - Get GGJT quantizer programs working well - Have the quantizer keep the output layer as f16c - Prefetching improves performance 15% if you use fewer threads	2023-05-16 08:07:23 -07:00
Justine Tunney	80db9de173	Make the intrinsics more readable	2023-05-15 23:12:11 -07:00
Justine Tunney	210187cf77	Perform some code cleanup	2023-05-15 16:32:10 -07:00
Justine Tunney	282dd8e7b7	Get radpajama to build The following now build: `make -j8 o//third_party/radpajama/radpajama.com` and `make -j8 o//third_party/radpajama/radpajama-chat.com`. This change gets the radpajama.mk config working. This package depends on THIRD_PARTY_GGML, but it's configured to call ggjt_v1(), so that the library will provide the old quantizers. The ggml_quantize_chunk() API will now dispatch to older quantizers based on the configured version.	2023-05-13 20:44:36 -07:00
Justine Tunney	410c8785c9	Fix the AARCH64 build	2023-05-13 08:19:44 -07:00
Justine Tunney	5a4cf9560f	Add support for new GGJT v2 quantizers This change makes quantized models (e.g. q4_0) go 10% faster on Macs; however, it doesn't offer much improvement for Intel PC hardware. This change syncs llama.cpp 699b1ad7fe6f7b9e41d3cb41e61a8cc3ea5fc6b5, which recently made a breaking change to nearly all its file formats without any migration. Since that'll break hundreds upon hundreds of models on websites like HuggingFace, llama.com will support both file formats, because llama.com will never ever break the GGJT file format.	2023-05-13 08:08:32 -07:00
Justine Tunney	4a8a81eb9f	Fix llama.com interactive mode regressions	2023-05-13 00:09:38 -07:00
Justine Tunney	45186c74ac	Introduce -q (quiet flag) and improve ctrl-c ux	2023-05-12 09:46:07 -07:00
Justine Tunney	e8de1e4766	Fix subtoken antiprompt scanning	2023-05-12 08:55:40 -07:00
Justine Tunney	80c174d494	Clean up llama.com anti/stop/reverse-prompt code Example use case for JSON completion: `m=opt`, then `make -j16 m=$m o/$m/third_party/ggml/llama.com`, then `o/$m/third_party/ggml/llama.com -m llama.bin -p '{"key": "life", "val": ' -r '}'`, which outputs `42}`. This provides better control. More sophisticated facilities for controlling text generation will be provided soon enough.	2023-05-12 08:20:58 -07:00
Justine Tunney	bbfe4fbd11	Make llama.com n_predict be -1 by default	2023-05-12 08:20:34 -07:00
Justine Tunney	ca19ecf49c	Fine tune crash reports for llama.com	2023-05-12 06:24:26 -07:00
Justine Tunney	95fab334e4	Use yield on aarch in spin locks	2023-05-11 19:57:09 -07:00
Justine Tunney	1f6f9e6701	Remove division from matrix multiplication This change reduces llama.com CPU cycles systemically by 2.5% according to the Linux Kernel `perf stat -Bddd` utility.	2023-05-10 21:19:54 -07:00
Justine Tunney	a88290e595	Make sure llama.com terminal cleanup happens	2023-05-10 15:56:01 -07:00
Justine Tunney	bb3ebedfce	Fix load time measurement	2023-05-10 07:54:21 -07:00
Justine Tunney	290a49952e	Fix some more issues with aarch64 and llama.cpp	2023-05-10 07:34:26 -07:00
Justine Tunney	6cb9553706	Fix alignment bug in llama.com	2023-05-10 06:15:32 -07:00
Justine Tunney	ca990ef091	Make `llama.com -h` print to stdout	2023-05-10 04:55:59 -07:00
Justine Tunney	5f57fc1f59	Upgrade llama.cpp to e6a46b0ed1884c77267dc70693183e3b7164e0e0	2023-05-10 04:20:48 -07:00
Justine Tunney	a0237a017c	Get llama.com working on aarch64	2023-05-10 04:20:47 -07:00
Justine Tunney	4c093155a3	Get llama.com building as an aarch64 native binary	2023-05-10 04:20:47 -07:00
Justine Tunney	d04430f4ef	Get LIBC_MEM and LIBC_STDIO building with aarch64	2023-05-10 04:20:47 -07:00
Justine Tunney	2b73e72d59	Make more code aarch64 friendly	2023-05-10 04:20:46 -07:00
Justine Tunney	3dac9f8999	Use Companion AI in llama.com by default	2023-04-30 23:08:15 -07:00
Justine Tunney	d9e27203d4	Incorporate some fixes and updates for GGML	2023-04-28 20:24:55 -07:00
Justine Tunney	b31ba86ace	Introduce prompt caching so prompts load instantly This change also introduces an ephemeral status line in non-verbose mode to display a load percentage status when slow operations are happening.	2023-04-28 16:15:26 -07:00
Justine Tunney	1c2da3a55a	Make shell usability improvements to llama.cpp - Introduce -v and --verbose flags - Don't print stats / diagnostics unless -v is passed - Reduce --top_p default from 0.95 to 0.70 - Change --reverse-prompt to no longer imply --interactive - Permit --reverse-prompt specifying custom EOS if non-interactive	2023-04-28 02:54:11 -07:00
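Commit ec480f5aa0 notes that posix_spawn() can now, like Musl, report the failure code of execve() via a pipe. Below is a minimal sketch of that general pipe technique, assuming a POSIX system with fork(); it is not Cosmopolitan's code (which, per the commit, prefers shared memory where a true vfork() exists), and the helper name `spawn_report_errno()` is hypothetical. The trick is that the pipe's write end is close-on-exec: a successful exec closes it so the parent reads EOF, while a failed exec writes errno into it first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` with `argv`. Returns 0 and sets *pid if
   execv() succeeded in the child, otherwise returns the errno it failed with. */
int spawn_report_errno(const char *path, char *const argv[], pid_t *pid) {
  int fds[2];
  if (pipe(fds) == -1) return errno;
  /* Close-on-exec write end: a successful execv() closes it for us. */
  if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
    int e = errno;
    close(fds[0]);
    close(fds[1]);
    return e;
  }
  pid_t child = fork();
  if (child == -1) {
    int e = errno;
    close(fds[0]);
    close(fds[1]);
    return e;
  }
  if (child == 0) {
    close(fds[0]);
    execv(path, argv);
    int e = errno; /* only reached if execv() failed */
    ssize_t w = write(fds[1], &e, sizeof(e));
    (void)w;
    _exit(127);
  }
  close(fds[1]); /* parent keeps only the read end */
  int err = 0;
  ssize_t n;
  do {
    n = read(fds[0], &err, sizeof(err));
  } while (n == -1 && errno == EINTR);
  close(fds[0]);
  if (n == (ssize_t)sizeof(err)) {
    waitpid(child, NULL, 0); /* reap the child that failed to exec */
    return err;
  }
  *pid = child; /* EOF: the exec succeeded */
  return 0;
}
```

The caller still owns the child on success and must waitpid() it; on failure the helper reaps it and leaves `*pid` untouched.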
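Commit 95fab334e4 ("Use yield on aarch in spin locks") only names the change. What follows is a generic C11 test-and-test-and-set spin lock that issues the ARM64 `yield` hint (and `pause` on x86) while busy-waiting. It is a sketch under that assumption, not Cosmopolitan's lock, and `spin_t`, `cpu_relax()`, and the `spin_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
  atomic_int locked;
} spin_t;

/* Busy-wait hint: YIELD on ARM64, PAUSE on x86; no-op elsewhere. */
static inline void cpu_relax(void) {
#if defined(__aarch64__)
  __asm__ volatile("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
  __asm__ volatile("pause" ::: "memory");
#endif
}

static bool spin_trylock(spin_t *s) {
  /* Acquire ordering so the critical section can't move above the lock. */
  return !atomic_exchange_explicit(&s->locked, 1, memory_order_acquire);
}

static void spin_lock(spin_t *s) {
  while (!spin_trylock(s)) {
    /* Spin on a plain load so waiters don't keep bouncing the cache line. */
    while (atomic_load_explicit(&s->locked, memory_order_relaxed))
      cpu_relax();
  }
}

static void spin_unlock(spin_t *s) {
  atomic_store_explicit(&s->locked, 0, memory_order_release);
}
```

The hint doesn't change correctness; it just tells the core the thread is spinning, which can free resources for a sibling hardware thread.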


52 commits