cosmopolitan

mirror of https://github.com/jart/cosmopolitan.git synced 2025-01-31 11:37:35 +00:00

Author	SHA1	Message	Date
Justine Tunney	98c5847727	Fix fork waiter leak in nsync This change fixes a bug where nsync waiter objects would leak. It meant that long-running programs like runitd would run out of file descriptors on NetBSD, where waiter objects have ksem file descriptors. On other OSes this bug is mostly harmless, since the worst that can happen with a futex is to leak a little bit of RAM. The bug was caused by tib_nsync sneaking back in after the finalization code had cleared it. This change refactors the thread exiting code to handle nsync teardown appropriately, and in making this change I found another issue: buggy user code that tries to exit without joining joinable threads which haven't been detached would result in a deadlock. That doesn't sound so bad, except the main thread is a joinable thread, so this deadlock would be triggered in ways that put libc at fault. So we now auto-join threads, and libc will log a warning to --strace when that happens for any thread.	2024-12-31 01:30:13 -08:00
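The commit says libc now auto-joins forgotten joinable threads and logs a warning under --strace. Portable code should still join (or detach) every thread it creates. The minimal sketch below shows that discipline with the standard pthread_create()/pthread_join() calls; the names double_it and spawn_and_join_all are hypothetical, and the 64-thread cap is an arbitrary assumption that keeps the handle array on the stack.

```c
#include <pthread.h>
#include <stdint.h>

// Hypothetical worker: returns its integer argument doubled, smuggled
// through the void* result.
static void *double_it(void *arg) {
  intptr_t x = (intptr_t)arg;
  return (void *)(x * 2);
}

// Spawns n joinable threads and joins every one of them before returning,
// so no joinable thread is left behind when the program later exits.
// Returns the sum of the thread results, or -1 on error.
long spawn_and_join_all(int n) {
  pthread_t th[64];  // assumption: at most 64 threads
  long sum = 0;
  if (n < 0 || n > 64) return -1;
  for (int i = 0; i < n; ++i) {
    if (pthread_create(&th[i], NULL, double_it, (void *)(intptr_t)i)) {
      // Join the threads already started before reporting failure.
      for (int j = 0; j < i; ++j) pthread_join(th[j], NULL);
      return -1;
    }
  }
  for (int i = 0; i < n; ++i) {
    void *res;
    pthread_join(th[i], &res);
    sum += (long)(intptr_t)res;
  }
  return sum;
}
```

For fire-and-forget threads, calling pthread_detach() right after creation is the other standard way to avoid leaving joinable threads behind.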
Justine Tunney	624573207e	Make threads faster and more reliable This change doubles the performance of thread spawning. That's thanks to our new stack manager, which allows us to avoid zeroing stacks. It gives us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is faster now, since it doesn't need to acquire the pthread GIL. On NetBSD, that helps us avoid allocating too many semaphores. Even if that happens, we're now able to survive semaphores running out, and even memory running out, when allocating *NSYNC waiter objects. I found a lot more rare bugs in the POSIX threads runtime that could cause things to crash if you've got dozens of threads all spawning and joining dozens of threads. I want cosmo to be world-class and production-worthy for 2025, so happy holidays, all.	2024-12-21 22:13:00 -08:00
Justine Tunney	af7bd80430	Eliminate cyclic locks in runtime This change introduces a new deadlock detector for Cosmo's POSIX threads implementation. Error check mutexes will now track a DAG of nested locks and report EDEADLK when a deadlock is theoretically possible. These reports will occur rarely, but they're important for production hardening your code. You don't even need to change your mutexes to use the POSIX error check mode, because `cosmocc -mdbg` will enable error checking on mutexes by default globally. When cycles are found, an error message showing your demangled symbols describing the strongly connected component is printed and then SIGTRAP is raised, which means you'll also get a backtrace if you're using ShowCrashReports() too. This new error checker is so low-level and so pure that it's able to verify the relationships of every libc runtime lock, including those locks upon which the mutex implementation depends.	2024-12-16 22:25:12 -08:00
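The deadlock detector described here is Cosmopolitan's own. As an illustration of the general idea only, not its implementation, the sketch below keeps a hypothetical lock-order graph over small integer lock ids and refuses any acquisition order that would close a cycle, which is exactly when a deadlock becomes theoretically possible. The bound of 16 locks and the function names are assumptions.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16  // assumption: lock ids are 0..15

// Hypothetical lock-order graph: edge[a][b] means lock b was acquired
// while lock a was held.
static bool edge[MAX_LOCKS][MAX_LOCKS];

// Depth-first search: can `from` reach `to` through recorded edges?
static bool reaches(int from, int to, bool *seen) {
  if (from == to) return true;
  if (seen[from]) return false;
  seen[from] = true;
  for (int next = 0; next < MAX_LOCKS; ++next)
    if (edge[from][next] && reaches(next, to, seen)) return true;
  return false;
}

// Records that `acquired` is being locked while `held` is held.
// Returns false, recording nothing, if this ordering would close a cycle
// (i.e. `acquired` can already reach `held`).
bool record_lock_order(int held, int acquired) {
  bool seen[MAX_LOCKS];
  memset(seen, 0, sizeof(seen));
  if (reaches(acquired, held, seen)) return false;
  edge[held][acquired] = true;
  return true;
}
```

For example, after recording 0→1 and 1→2, attempting 2→0 is rejected because 0 already reaches 2. A real detector would also need per-thread tracking of held locks and a thread-safe graph; this sketch has neither.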
Justine Tunney	dd8544c3bd	Delve into clock rabbit hole The worst issue I had with consts.sh for clock_gettime is how it defined too many clocks. So I looked into these clocks all day to figure out how they overlap in functionality. I discovered counter-intuitive things such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10 also has some incredible new APIs that let us simplify clock_gettime():
- Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime()
- Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime()
- Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
- Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise()
Documentation on the clock crew has been added to clock_gettime() in the docstring and in redbean's documentation too. You can read that to learn interesting facts about eight essential clocks that survived this purge. This is original research you will not find on Google, OpenAI, or Claude. I've tested this change by porting *NSYNC to become fully clock agnostic, since it has extensive tests for spotting irregularities in time. I have also included these tests in the default build so they no longer need to be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across the entire amd64 and arm64 test fleets.	2024-09-04 01:32:46 -07:00
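A minimal sketch of measuring an interval with clock_gettime(CLOCK_MONOTONIC, ...), which, unlike CLOCK_REALTIME, is not meant to jump when the wall clock is adjusted. The function names are hypothetical; only clock_gettime(), struct timespec, and the clock ids are standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

// Returns a - b in nanoseconds (hypothetical helper).
long long timespec_diff_ns(struct timespec a, struct timespec b) {
  return (long long)(a.tv_sec - b.tv_sec) * 1000000000LL +
         (a.tv_nsec - b.tv_nsec);
}

// Times a small busy loop on the monotonic clock, so an NTP step or a
// manual date change during the loop can't make the result negative.
long long elapsed_ns_of_busy_loop(unsigned long iters) {
  struct timespec start, end;
  volatile unsigned long sink = 0;  // volatile keeps the loop from vanishing
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (unsigned long i = 0; i < iters; ++i) sink += i;
  clock_gettime(CLOCK_MONOTONIC, &end);
  return timespec_diff_ns(end, start);
}
```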
Justine Tunney	3c61a541bd	Introduce pthread_condattr_setclock() This is one of the few POSIX APIs that was missing. It lets you choose a monotonic clock for your condition variables. This might improve perf on some platforms. It might also grant more flexibility with NTP configs. I know Qt is one project that believes it needs this. To introduce this, I needed to change some of the *NSYNC APIs to support passing a clock param. There are also new benchmarks, demonstrating Cosmopolitan's supremacy over many libc implementations when it comes to mutex performance. Cygwin has an alarmingly bad pthread_mutex_t implementation. It is so bad that they would have been significantly better off if they'd used naive spinlocks.	2024-09-02 23:45:42 -07:00
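A minimal sketch of what pthread_condattr_setclock() enables: a condition variable whose pthread_cond_timedwait() deadline is measured on CLOCK_MONOTONIC, so setting the wall clock does not stretch or shrink the timeout. The important detail is that the absolute deadline must be computed with the same clock that was given to the condattr. The variable and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t ready_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv;
static bool ready;

// Initializes ready_cv so its timed waits use CLOCK_MONOTONIC instead of
// the default CLOCK_REALTIME. Returns 0 or an error number.
int init_monotonic_cond(void) {
  pthread_condattr_t attr;
  int rc;
  if ((rc = pthread_condattr_init(&attr))) return rc;
  rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
  if (!rc) rc = pthread_cond_init(&ready_cv, &attr);
  pthread_condattr_destroy(&attr);
  return rc;
}

void set_ready(void) {
  pthread_mutex_lock(&ready_mu);
  ready = true;
  pthread_cond_signal(&ready_cv);
  pthread_mutex_unlock(&ready_mu);
}

// Waits up to ms milliseconds for `ready`. Returns 0 on success,
// otherwise the error from pthread_cond_timedwait() (normally ETIMEDOUT).
int wait_ready_ms(long ms) {
  struct timespec deadline;
  int rc = 0;
  bool ok;
  clock_gettime(CLOCK_MONOTONIC, &deadline);  // same clock as the condattr
  deadline.tv_sec += ms / 1000;
  deadline.tv_nsec += (ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }
  pthread_mutex_lock(&ready_mu);
  // Loop to absorb spurious wakeups; stop on success or any error.
  while (!ready && rc == 0)
    rc = pthread_cond_timedwait(&ready_cv, &ready_mu, &deadline);
  ok = ready;
  pthread_mutex_unlock(&ready_mu);
  return ok ? 0 : rc;
}
```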
Justine Tunney	884d89235f	Harden against ABA problem	2024-08-26 20:01:55 -07:00
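The commit message doesn't describe its fix. One common defense against ABA in lock-free code is a version tag packed next to the pointer or index, so that a compare-and-swap fails if the slot was popped and pushed back in between. The sketch below applies that technique to a hypothetical free list of array indices using C11 atomics; it is not necessarily what this commit did, and all names are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

// Hypothetical lock-free LIFO of node indices. The head is one 64-bit
// word: low 32 bits = top index (or EMPTY), high 32 bits = version tag,
// bumped on every successful update. If another thread pops A, pops B and
// pushes A back, the index matches again but the tag does not, so a
// stale CAS fails instead of corrupting the list.
#define NODES 8
#define EMPTY UINT32_MAX

static _Atomic uint32_t next_of[NODES];
static _Atomic uint64_t head = EMPTY;  // tag 0, empty list

static uint64_t pack(uint32_t tag, uint32_t idx) {
  return (uint64_t)tag << 32 | idx;
}

void push(uint32_t idx) {
  uint64_t old = atomic_load(&head);
  do {
    next_of[idx] = (uint32_t)old;  // link to the current top
  } while (!atomic_compare_exchange_weak(
      &head, &old, pack((uint32_t)(old >> 32) + 1, idx)));
}

uint32_t pop(void) {
  uint64_t old = atomic_load(&head);
  uint32_t idx;
  do {
    idx = (uint32_t)old;
    if (idx == EMPTY) return EMPTY;
  } while (!atomic_compare_exchange_weak(
      &head, &old, pack((uint32_t)(old >> 32) + 1, next_of[idx])));
  return idx;
}
```

A 32-bit tag could in principle wrap during a single stalled CAS, but only after about four billion intervening updates.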
Justine Tunney	d3a13e8d70	Improve lock hierarchy - NetBSD no longer needs a spin lock to create semaphores - Windows fork() now locks process manager in correct order	2024-07-24 16:05:48 -07:00
Justine Tunney	86d884cce2	Get rid of .internal.h convention in LIBC_INTRIN	2024-07-19 19:38:00 -07:00
Justine Tunney	8c645fa1ee	Make mmap() scalable It's now possible to create thousands upon thousands of sparse independent memory mappings without any slowdown. The memory manager is better at tracking memory protection now, particularly on Windows, in a precise way that can be restored during fork(). You now have the highest quality mem manager possible. It's even better than some OSes like XNU, where mmap() is implemented as an O(n) operation, which means sadly things aren't much improved over there. With this change the llamafile HTTP server endpoint at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec.	2024-07-05 23:26:00 -07:00
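The claim here concerns how the memory manager scales with many live mappings. A sketch of that kind of workload, assuming POSIX mmap() with MAP_ANONYMOUS: it keeps n one-page mappings alive at once, alternating protections so that neighbors the kernel happens to place side by side can't be merged into one region, then releases them all. It doesn't time anything, and the function name is hypothetical.

```c
#define _DEFAULT_SOURCE  // exposes MAP_ANONYMOUS on glibc
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps n independent one-page regions, keeps them all alive, then unmaps
// them. Returns how many were mapped, or -1 if bookkeeping allocation fails.
int many_sparse_mappings(int n) {
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  char **maps;
  int mapped = 0;
  if (n <= 0) return 0;
  maps = calloc((size_t)n, sizeof(*maps));
  if (!maps) return -1;
  for (int i = 0; i < n; ++i) {
    // Alternating protection prevents adjacent regions from coalescing.
    int prot = (i & 1) ? PROT_READ : (PROT_READ | PROT_WRITE);
    char *p = mmap(NULL, page, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) break;
    if (prot & PROT_WRITE) p[0] = 1;  // fault in a real page where writable
    maps[mapped++] = p;
  }
  for (int i = 0; i < mapped; ++i) munmap(maps[i], page);
  free(maps);
  return mapped;
}
```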
Jōshin	2fc507c98f	Fix more vi modelines (#1006) * modelines: tw -> sw shiftwidth, not textwidth. * space-surround modelines * fix irregular modelines * Fix modeline in titlegen.c	2023-12-13 02:28:11 -05:00
Jōshin	e16a7d8f3b	flip et / noet in modelines `et` means `expandtab`.
```sh
rg 'vi: .* :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\) et\(.*\) :vi/vi: \1 xoet\2:vi/'
rg 'vi: .* :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\)noet\(.*\):vi/vi: \1et\2 :vi/'
rg 'vi: .* :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\)xoet\(.*\):vi/vi: \1noet\2:vi/'
```
	2023-12-07 22:17:11 -05:00
Jōshin	394d998315	Fix vi modelines (#989) At least in neovim, `│vi:` is not recognized as a modeline because it has no preceding whitespace. After fixing this, opening a file yields an error because `net` is not an option. (`noet`, however, is.)	2023-12-05 14:37:54 -08:00
Justine Tunney	fa20edc44d	Reduce header complexity - Remove most __ASSEMBLER__ __LINKER__ ifdefs - Rename libc/intrin/bits.h to libc/serialize.h - Block pthread cancelation in fchmodat() polyfill - Remove `clang-format off` statements in third_party	2023-11-28 14:39:42 -08:00
Justine Tunney	d0ad2694ed	Iterate more on recent changes	2023-11-11 00:28:22 -08:00
Justine Tunney	ec480f5aa0	Make improvements
- Every unit test now passes on Apple Silicon. The final piece of this puzzle was porting our POSIX threads cancelation support, since that works differently on ARM64 XNU vs. AMD64. Our semaphore support on Apple Silicon is also superior now compared to AMD64, thanks to the grand central dispatch library, which lets *NSYNC locks go faster.
- The Cosmopolitan runtime is now more stable, particularly on Windows. To do this, thread local storage is mandatory at all runtime levels, and the innermost packages of the C library are no longer being built using ASAN. TLS is being bootstrapped with a 128-byte TIB during the process startup phase, and then later on the runtime re-allocates it either statically or dynamically to support code using _Thread_local. fork() and execve() now do a better job cooperating with threads. We can now check how much stack memory is left in the process or thread when functions like kprintf() / execve() etc. call alloca(), so that they can raise ENOMEM, reduce a buffer size, or just print a warning.
- POSIX signal emulation is now implemented the same way kernels do it with pthread_kill() and raise(). Any thread can interrupt any other thread, regardless of what it's doing. If it's blocked on read/write, then the killer thread will cancel its i/o operation so that EINTR can be returned in the mark thread immediately. If it's doing a tight CPU-bound operation, then that's also interrupted by the signal delivery. Signal delivery now works by suspending a thread, pushing context data structures onto its stack, and redirecting its execution to a trampoline function, which calls SetThreadContext(GetCurrentThread()) when it's done.
- We're now doing a better job managing locks and handles. On NetBSD we now close semaphore file descriptors in forked children. Semaphores on Windows can now be canceled immediately, which means mutexes/condition variables will now go faster. Apple Silicon semaphores can be canceled too. We're now using Apple's pthread_yield() function. Apple _nocancel syscalls are now used on XNU when appropriate to ensure pthread_cancel requests aren't lost. The MbedTLS library has been updated to support POSIX thread cancelations. See tool/build/runitd.c for an example of how it can be used for production multi-threaded TLS servers. Handles on Windows now leak less often across processes. All i/o operations on Windows are now overlapped, which means file pointers can no longer be inherited across dup() and fork() for the time being.
- We now spawn a thread on Windows to deliver SIGCHLD and wake up wait4(), which means, for example, that posix_spawn() now goes 3x faster. POSIX spawn is also now more correct. Like Musl, it's now able to report the failure code of execve() via a pipe, although our approach favors using shared memory to do that on systems that have a true vfork() function.
- We now spawn a thread to deliver SIGALRM to threads when setitimer() is used. This enables the most precise wakeups the OS makes possible.
- The Cosmopolitan runtime now uses less memory. On NetBSD, for example, it turned out the kernel would actually commit the PT_GNU_STACK size, which caused RSS to be 6mb for every process. Now it's down to ~4kb. On Apple Silicon, we reduce the mandatory upstream thread size to the smallest possible size to reduce the memory overhead of Cosmo threads. The examples directory has a program called greenbean which can spawn a web server on Linux with 10,000 worker threads and have the memory usage of the process be ~77mb. The 1024-byte overhead of POSIX-style thread-local storage is now optional; it won't be allocated until the pthread_setspecific/getspecific functions are called. On Windows, the threads that get spawned which are internal to the libc implementation use reserve rather than commit memory, which shaves a few hundred kb.
- sigaltstack() is now supported on Windows, although it can't yet be used to handle stack overflows, since crash signals are still generated by WIN32. The crash handler will still switch to the alt stack, however, which is helpful in environments with tiny threads.
- Test binaries are now smaller. Many of the mandatory dependencies of the test runner have been removed. This ensures many programs can do a better job of only linking the thing they're testing. This caused the test binaries for LIBC_FMT, for example, to decrease from 200kb to 50kb.
- long double is no longer used in the implementation details of libc, except in the APIs that define it. The old code that used long double for time (instead of struct timespec) has now been thoroughly removed.
- ShowCrashReports() is now much tinier in MODE=tiny. Instead of doing backtraces itself, it'll just print a command you can run on the shell using our new `cosmoaddr2line` program to view the backtrace.
- Crash report signal handling now works in a much better way. Instead of terminating the process, it now relies on SA_RESETHAND so that the default SIG_DFL behavior can terminate the process if necessary.
- Our pledge() functionality has now been fully ported to AARCH64 Linux.	2023-09-18 21:04:47 -07:00
Justine Tunney	a359de7893	Get rid of kmalloc() This changes *NSYNC to allocate waiters on the stack so our locks don't need to depend on dynamic memory. This makes our runtime simpler, and it also fixes bugs with thread cancellation support.	2023-09-11 21:56:00 -07:00
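A sketch of the general technique of stack-allocated waiters, not *NSYNC's actual code: each blocked thread links a node that lives in its own stack frame into a shared wait list, so enqueueing needs no malloc(). This is safe only because the waiter cannot return, and so pop its frame, until a waker has unlinked the node. All names are hypothetical, and the list is LIFO purely for brevity.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
  struct waiter *next;
  bool woken;
};

static pthread_mutex_t waiters_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t waiters_cv = PTHREAD_COND_INITIALIZER;
static struct waiter *waiters;  // LIFO wait list
static int nwaiting;

// Blocks until a waker unlinks this thread's node. The node is a local
// variable, so no heap allocation is needed.
void wait_for_wakeup(void) {
  struct waiter w = {NULL, false};
  pthread_mutex_lock(&waiters_mu);
  w.next = waiters;
  waiters = &w;
  ++nwaiting;
  while (!w.woken) pthread_cond_wait(&waiters_cv, &waiters_mu);
  pthread_mutex_unlock(&waiters_mu);
}

// Unlinks and wakes one waiter; returns false if nobody was waiting.
bool wake_one(void) {
  bool did = false;
  pthread_mutex_lock(&waiters_mu);
  if (waiters) {
    struct waiter *w = waiters;
    waiters = w->next;
    --nwaiting;
    w->woken = true;  // last touch of w; its owner may return right after
    did = true;
    pthread_cond_broadcast(&waiters_cv);  // each waiter rechecks its flag
  }
  pthread_mutex_unlock(&waiters_mu);
  return did;
}

// Yields the CPU until at least n threads are queued.
void wait_until_queued(int n) {
  for (;;) {
    pthread_mutex_lock(&waiters_mu);
    int k = nwaiting;
    pthread_mutex_unlock(&waiters_mu);
    if (k >= n) return;
    sched_yield();
  }
}

void *waiter_main(void *arg) {
  (void)arg;
  wait_for_wakeup();
  return NULL;
}
```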
Justine Tunney	7e0a09feec	Mint APE Loader v1.5 This change ports APE Loader to Linux AARCH64, so that Raspberry Pi users can run programs like redbean without the executable needing to modify itself. Progress has also slipped into this change on the issue of better conforming to user expectations and industry standards regarding which symbols we're allowed to declare.	2023-07-26 13:54:49 -07:00
Justine Tunney	bcf9af94bf	Get threads working well on MacOS Arm64 - Now using 10x better GCD semaphores - We now generate Linux-like thread ids - We now use fast system clock / sleep libraries - The APE M1 loader now generates Linux-like stacks	2023-06-04 01:57:10 -07:00
Justine Tunney	c995838e5c	Make improvements - Clean up sigaction() code - Add a port scanner example - Introduce a ParseCidr() API - Clean up our futex abstraction code - Fix a harmless integer overflow in ParseIp() - Use kernel semaphores on NetBSD to make threads much faster	2022-11-07 02:26:06 -08:00
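The ParseCidr() and ParseIp() signatures aren't shown in this log, so rather than guess at Cosmopolitan's API, the sketch below uses a hypothetical parse_cidr4(). It parses dotted-quad "a.b.c.d/n" and range-checks each octet while accumulating it, which is one way to keep oversized input from overflowing the accumulator (the commit doesn't say how ParseIp() overflowed).

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper, NOT Cosmopolitan's ParseCidr(): parses "a.b.c.d/n"
// into a host-order IPv4 address and a prefix length in [0, 32].
bool parse_cidr4(const char *s, uint32_t *addr, int *prefix) {
  uint32_t ip = 0;
  int n = 0, digits;
  for (int part = 0; part < 4; ++part) {
    unsigned octet = 0;
    digits = 0;
    while (*s >= '0' && *s <= '9') {
      octet = octet * 10 + (unsigned)(*s++ - '0');
      if (octet > 255) return false;  // reject before the value can grow
      ++digits;
    }
    if (!digits) return false;
    ip = ip << 8 | octet;
    if (part < 3 && *s++ != '.') return false;
  }
  if (*s++ != '/') return false;
  digits = 0;
  while (*s >= '0' && *s <= '9') {
    n = n * 10 + (*s++ - '0');
    if (n > 32) return false;
    ++digits;
  }
  if (!digits || *s != '\0') return false;
  *addr = ip;
  *prefix = n;
  return true;
}
```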

19 commits