cosmopolitan

mirror of https://github.com/jart/cosmopolitan.git synced 2025-01-31 11:37:35 +00:00

Author	SHA1	Message	Date
Justine Tunney	fbdf9d028c	Rewrite Windows poll() We can now await signals, files, pipes, and console simultaneously. This change also gives a deeper review and testing to changes made yesterday.	2024-09-10 20:04:02 -07:00
Justine Tunney	a0a404a431	Fix issues with previous commit	2024-09-10 01:59:46 -07:00
Justine Tunney	2f48a02b44	Make recursive mutexes faster Recursive mutexes now go as fast as normal mutexes. The tradeoff is they are no longer safe to use in signal handlers. However you can still have signal safe mutexes if you set your mutex to both recursive and pshared. You can also make functions that use recursive mutexes signal safe using sigprocmask to ensure recursion doesn't happen due to any signal handler. The impact of this change is that, on Windows, many functions which edit the file descriptor table rely on recursive mutexes, e.g. open(). If you develop your app so it uses pread() and pwrite() then your app should go very fast when performing a heavily multithreaded and contended workload. For example, when scaling to 40+ cores, NSYNC mutexes can go as much as 1000x faster (in CPU time) than the naive recursive lock implementation. Now recursive will use NSYNC under the hood when it's possible to do so.	2024-09-10 00:08:59 -07:00
Justine Tunney	95fee8614d	Test recursive mutex code more	2024-09-09 00:19:23 -07:00
Justine Tunney	dd8544c3bd	Delve into clock rabbit hole The worst issue I had with consts.sh for clock_gettime is how it defined too many clocks. So I looked into these clocks all day to figure out how they overlap in functionality. I discovered counter-intuitive things such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10 also has some incredible new APIs, that let us simplify clock_gettime(). - Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime() - Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise() - Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise() - Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime() - Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime() - Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise() Documentation on the clock crew has been added to clock_gettime() in the docstring and in redbean's documentation too. You can read that to learn interesting facts about eight essential clocks that survived this purge. This is original research you will not find on Google, OpenAI, or Claude. I've tested this change by porting *NSYNC to become fully clock agnostic since it has extensive tests for spotting irregularities in time. I have also included these tests in the default build so they no longer need to be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across the entire amd64 and arm64 test fleets.	2024-09-04 01:32:46 -07:00
Justine Tunney	3c61a541bd	Introduce pthread_condattr_setclock() This is one of the few POSIX APIs that was missing. It lets you choose a monotonic clock for your condition variables. This might improve perf on some platforms. It might also grant more flexibility with NTP configs. I know Qt is one project that believes it needs this. To introduce this, I needed to change some of the *NSYNC APIs, to support passing a clock param. There's also new benchmarks, demonstrating Cosmopolitan's supremacy over many libc implementations when it comes to mutex performance. Cygwin has an alarmingly bad pthread_mutex_t implementation. It is so bad that they would have been significantly better off if they'd used naive spinlocks.	2024-09-02 23:45:42 -07:00
Justine Tunney	90460ceb3c	Make Cosmo mutexes competitive with Apple Libc While we have always licked glibc and musl libc on gnu/systemd sadly the Apple Libc implementation of pthread_mutex_t is better than ours. It may be due to how the XNU kernel and M2 microprocessor are in league when it comes to scheduling processes and the NSYNC behavior is being penalized. We can solve this by leaning more heavily on ulock using Drepper's algo. It's kind of ironic that Linux's official mutexes work terribly on Linux but almost as good as Apple Libc if used on MacOS.	2024-09-02 19:03:11 -07:00
Justine Tunney	2ec413b5a9	Fix bugs in poll(), select(), ppoll(), and pselect() poll() and select() now delegate to ppoll() and pselect() for assurances that both polyfill implementations are correct and well-tested. Poll now polyfills XNU and BSD quirks re: the handling of POLLNVAL and the other similar status flags. This change resolves a misunderstanding concerning how select(exceptfds) is intended to map to POLLPRI. We now use E2BIG for bouncing requests that exceed the 64 handle limit on Windows. With pipes and consoles on Windows our poll impl will now report POLLHUP correctly. Issues with Windows path generation have been fixed. For example, it was problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/") due to the need to un-UNC paths in some additional places. Calling fstat on UNC style volume path handles will now work. posix_spawn now supports simulating the opening of /dev/null and other special paths on Windows. Cosmopolitan no longer defines epoll(). I think wepoll is a nice project for using epoll() on Windows socket handles. However we need generalized file descriptor support to make epoll() for Windows work well enough for inclusion in a C library. It's also not worth having epoll() if we can't get it to work on XNU and BSD OSes which provide different abstractions. Even epoll() on Linux isn't that great of an abstraction since it's full of footguns. Last time I tried to get it to be useful I had little luck. Considering how long it took to get poll() and select() to be consistent across platforms, we really have no business claiming to have epoll too. While it'd be nice to have fully implemented, the only software that use epoll() are event i/o libraries used by things like nodejs. Event i/o is not the best paradigm for handling i/o; threads make so much more sense.	2024-09-02 00:29:52 -07:00
Justine Tunney	11d9fb521d	Make atomics faster on aarch64 This change implements the compiler runtime for ARM v8.1 ISE atomics and gets rid of the mandatory -mno-outline-atomics flag. It can dramatically speed things up, on newer ARM CPUs, as indicated by the changed lines in test/libc/thread/footek_test.c. In llamafile dispatching on hwcap atomic also shaved microseconds off synchronization barriers.	2024-08-16 11:14:46 -07:00
Justine Tunney	31194165d2	Remove .internal from more header filenames	2024-08-04 12:52:25 -07:00
Justine Tunney	bb815eafaf	Update Musl Libc code We have now implemented all of Musl's localization code, the same way that Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"), just in case anything stops working as expected.	2024-07-30 22:51:29 -07:00
Justine Tunney	cf1559c448	Remove __threaded variable	2024-07-28 23:43:30 -07:00
Justine Tunney	e18fe1e112	Freshen build/bootstrap/cocmd See https://news.ycombinator.com/item?id=41055121	2024-07-27 23:22:11 -07:00
Justine Tunney	18a620cc1a	Make some improvements of little consequence	2024-07-27 08:20:18 -07:00
Justine Tunney	59692b0882	Make spinlocks faster (take two) This change is green on x86 and arm test fleet.	2024-07-26 00:45:24 -07:00
Justine Tunney	02e1cbcd00	Revert "Make spin locks go faster" This reverts commit `c8e25d811c`.	2024-07-25 22:24:32 -07:00
Justine Tunney	c8e25d811c	Make spin locks go faster	2024-07-25 17:37:11 -07:00
Justine Tunney	d3a13e8d70	Improve lock hierarchy - NetBSD no longer needs a spin lock to create semaphores - Windows fork() now locks process manager in correct order	2024-07-24 16:05:48 -07:00
Justine Tunney	7ba9a73840	Remove more _Atomic keywords from public headers It's been thirteen years and C++ still hasn't implemented this wonderful simple builtin keyword. In C++23 a solution was provided for making this work in C++ which is libcxx's stdatomic.h. Including that header schleps in literally 253 unique header files!! Many of the header files it needs are libc header files like pthread.h where we need to have the _Atomic() keyword, but since <atomic> depends on pthreads we can't have it include the <stdatomic.h> header that defines _Atomic for C++ users, and instead we simply make the type non-atomic, hoping and praying only C code shall use those internal data structures. This just shows how STL clowns can't be trusted to define the innermost primitives of a language. They should instead be focusing on being the best at algorithms and data structures.	2024-07-24 13:56:03 -07:00
Justine Tunney	5dd7ddb9ea	Remove bad defines from early days of project These definitions were causing issues with building LLVM. It is possible they also caused crashes we've seen with our MacOS ARM64 OpenMP support.	2024-07-24 12:11:21 -07:00
Justine Tunney	e398f3887c	Make more improvements to threads and mappings - NetBSD should now have faster synchronization - POSIX barriers may now be shared across processes - An edge case with memory map tracking has been fixed - Grand Central Dispatch is no longer used on MacOS ARM64 - POSIX mutexes in normal mode now use futexes across processes	2024-07-24 01:19:54 -07:00
Justine Tunney	6e809ee49b	Add unit test for process shared conditions	2024-07-22 18:48:54 -07:00
Justine Tunney	61c36c1dd6	Allow pthread_condattr_setpshared() to set shared	2024-07-22 18:41:45 -07:00
Justine Tunney	0a9a6f86bb	Support process shared condition variables	2024-07-22 16:35:29 -07:00
Justine Tunney	3de6632be6	Graduate some clock_gettime() constants to #define - CLOCK_THREAD_CPUTIME_ID - CLOCK_PROCESS_CPUTIME_ID Cosmo now supports the above constants universally across supported OSes therefore it's now safe to let programs detect their presence w/ #ifdefs	2024-07-22 07:14:35 -07:00
Justine Tunney	5d2d9e9640	Add back missing TlsAlloc() call Cosmopolitan Libc once called this important function although somewhere along the way, possibly in a refactoring, it got removed and __tls_alloc has always been zero ever since.	2024-07-21 20:45:27 -07:00
Justine Tunney	30afd6ddbb	Improve multithreading	2024-07-21 14:40:45 -07:00
Justine Tunney	29ce25c767	Start writing formal specification for APE	2024-07-20 10:04:22 -07:00
Justine Tunney	86d884cce2	Get rid of .internal.h convention in LIBC_INTRIN	2024-07-19 19:38:00 -07:00
Justine Tunney	567d8fe32d	Create variables for page size	2024-07-18 21:16:53 -07:00
Justine Tunney	3f2a1b696e	Fix greenbean example The memory leak detector was crashing. When using gc() you shouldn't use the CheckForMemoryLeaks() function from inside the same function, due to how it runs the atexit handlers.	2024-07-07 17:52:33 -07:00
Justine Tunney	6be030cd7c	Fix MODE=tinylinux build	2024-07-06 01:51:08 -07:00
Justine Tunney	8c645fa1ee	Make mmap() scalable It's now possible to create thousands of thousands of sparse independent memory mappings, without any slowdown. The memory manager is better with tracking memory protection now, particularly on Windows in a precise way that can be restored during fork(). You now have the highest quality mem manager possible. It's even better than some OSes like XNU, where mmap() is implemented as an O(n) operation which means sadly things aren't much improved over there. With this change the llamafile HTTP server endpoint at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec	2024-07-05 23:26:00 -07:00
Justine Tunney	01587de761	Simplify memory manager	2024-07-05 05:47:15 -07:00
Justine Tunney	bd6d9ff99a	Get deathstar demo working again on metal	2024-07-04 03:44:17 -07:00
Justine Tunney	15ea0524b3	Reduce code size of mandatory runtime This change reduces o/tiny/examples/life from 44kb to 24kb in size since it avoids linking mmap() when unnecessary. This is important, to helping cosmo not completely lose touch with its roots.	2024-07-04 02:50:20 -07:00
Justine Tunney	76957983cf	Make POSIX threads improvements - Ensure SIGTHR isn't blocked in newly created threads - Use TIB rather than thread_local for thread atexits - Make POSIX thread keys atomic within thread - Don't bother logging prctl() to --strace - Log thread destructor names to --strace	2024-06-30 15:38:59 -07:00
Justine Tunney	4cb5e21ba8	Introduce pthread_decimate_np() api This is useful with CheckForMemoryLeaks().	2024-06-30 02:26:06 -07:00
Justine Tunney	464858dbb4	Fix bugs with new memory manager This fixes a regression in mmap(MAP_FIXED) on Windows caused by a recent revision. This change also fixes ZipOS so it no longer needs a MAP_FIXED mapping to open files from the PKZIP store. The memory mapping mutex was implemented incorrectly earlier which meant that ftrace and strace could cause crashes. This lock and other recursive mutexes are rewritten so that it should be provable that recursive mutexes in cosmopolitan are asynchronous signal safe.	2024-06-29 10:53:57 -07:00
Justine Tunney	d461c6f47d	Do more quality assurance work	2024-06-24 06:53:49 -07:00
Justine Tunney	c4c812c154	Introduce ctl::set and ctl::map We now have a C++ red-black tree implementation that implements standard template library compatible APIs while compiling 10x faster than libcxx. It's not as beautiful as the red-black tree implementation in Plinko but this will get the job done and the test proves it upholds all invariants. This change also restores CheckForMemoryLeaks() support and fixes a real actual bug I discovered with Doug Lea's dlmalloc_inspect_all() function.	2024-06-23 22:27:11 -07:00
Justine Tunney	f2c8ddbbe3	Fix --strace use-after-free in pthread_join()	2024-06-22 06:05:52 -07:00
Justine Tunney	d1d4388201	Delete ASAN It hasn't been helpful enough to justify the maintenance burden. What actually does help is mprotect(), kprintf(), --ftrace and --strace which can always be counted upon to work correctly. We aren't losing much with this change. Support for ASAN on AARCH64 was never implemented. Applying ASAN to the core libc runtimes was disabled many months ago. If there is some way to have an ASAN runtime for user programs that is less invasive we can potentially consider reintroducing support. But now is premature.	2024-06-22 05:45:49 -07:00
Justine Tunney	6ffed14b9c	Rewrite memory manager Actually Portable Executable now supports Android. Cosmo's old mmap code required a 47 bit address space. The new implementation is very agnostic and supports both smaller address spaces (e.g. embedded) and even modern 56-bit PML5T paging for x86 which finally came true on Zen4 Threadripper. Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native page size. This fixes a longstanding POSIX conformance issue, concerning file mappings that overlap the end of file. Other aspects of conformance have been improved too, such as the subtleties of address assignment and the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE. On Windows, mappings larger than 100 megabytes won't be broken down into thousands of independent 64kb mappings. Support for MAP_STACK is removed by this change; please use NewCosmoStack() instead. Stack overflow avoidance is now being implemented using the POSIX thread APIs. Please use GetStackBottom() and GetStackAddr(), instead of the old error-prone GetStackAddr() and HaveStackMemory() APIs which are removed.	2024-06-22 05:45:11 -07:00
Jōshin	89fc95fefd	Rerun clang-format on the repo (#1217) 🚨 clang-format changes output per version! This is with version 19.0.0. The modifications seem to be fixing the old version’s errors - mainly involving omitted whitespace around binary ops and inserted whitespace between goto labels and colons (if followed by a curly brace.) Also fixes a few mistakes made by e.g. someone (ahem) forgetting to pass his ctl/string.h modifications through it. We should add this to .git-blame-ignore-revs once we have its final hash on master.	2024-06-15 16:34:48 -04:00
Justine Tunney	3609f65de3	Make malloc() go 200x faster If pthread_create() is linked into the binary, then the cosmo runtime will create an independent dlmalloc arena for each core. Whenever the malloc() function is used it will index `g_heaps[sched_getcpu() / 2]` to find the arena with the greatest hyperthread / numa locality. This may be configured via an environment variable. For example if you say `export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways. Your process may be configured to have anywhere between 1 - 128 heaps. We need this revision because it makes multithreaded C++ applications faster. For example, an HTTP server I'm working on that makes extreme use of the STL went from 16k to 2000k requests per second, after this change was made. To understand why, try out the malloc_test benchmark which calls malloc() + realloc() in a loop across many threads, which sees a 250x improvement in process clock time and 200x on wall time. The tradeoff is this adds ~25ns of latency to individual malloc calls compared to MODE=tiny, once the cosmo runtime has transitioned into a fully multi-threaded state. If you don't need malloc() to be scalable then cosmo provides many options for you. For starters the heap count variable above can be set to put the process back in single heap mode plus you can go even faster still, if you include tinymalloc.inc like many of the programs in tool/build/.. are already doing since that'll shave tens of kb off your binary footprint too. There's also MODE=tiny which is configured to use just 1 plain old dlmalloc arena by default. Another tradeoff is we need more memory now (except in MODE=tiny), to track the provenance of memory allocation. This is so allocations can be freely shared across threads, and because OSes can reschedule code to different CPUs at any time.	2024-06-05 02:02:14 -07:00
Jōshin	f032b5570b	Run clang-format (#1197)	2024-06-01 16:30:43 -04:00
Justine Tunney	07cef612c3	Make dlmalloc 2.4x faster for multithreading This change adds a TLS freelist for small dynamic memory allocations. Cosmopolitan's TIB is now 512 bytes in size. Single-threaded malloc() performance isn't impacted by this, until pthread_create() is called. Single-threaded programs may also want to consider using: #include "libc/mem/tinymalloc.inc" Which will shave 30k off the executable size and sometimes go faster.	2024-05-28 11:18:34 -07:00
Justine Tunney	8e68384e15	Upgrade to 2022-era LLVM LIBCXX	2024-05-27 02:12:27 -07:00
Justine Tunney	ae2a7ac844	Fix thread-local storage bugs on aarch64 This change fixes an issue where .tbss memory might not be initialized.	2024-05-08 04:20:22 -07:00
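
Example for 3c61a541bd (pthread_condattr_setclock). The entry above says this API lets you choose a monotonic clock for condition variables. The following is a minimal sketch of how a caller might use it, assuming only standard POSIX calls. The helper name wait_with_monotonic_deadline() is hypothetical and is not taken from the Cosmopolitan sources. Nothing ever signals the condition variable here, so the wait is expected to end with ETIMEDOUT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

// Hypothetical helper (not a Cosmopolitan API): waits on a condition
// variable bound to CLOCK_MONOTONIC until timeout_ms has elapsed.
// Returns the final pthread_cond_timedwait() result, or the error
// from pthread_condattr_setclock() if the clock is unsupported.
int wait_with_monotonic_deadline(long timeout_ms) {
  pthread_condattr_t attr;
  pthread_cond_t cond;
  pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  struct timespec deadline;
  int rc;

  assert(timeout_ms >= 0);

  // Bind the condition variable to the monotonic clock, so timed
  // waits are measured against a clock that wall-time changes
  // (e.g. NTP adjustments) do not move.
  pthread_condattr_init(&attr);
  rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
  if (rc) {
    pthread_condattr_destroy(&attr);
    return rc;
  }
  pthread_cond_init(&cond, &attr);
  pthread_condattr_destroy(&attr);

  // The absolute deadline must be computed on the same clock.
  clock_gettime(CLOCK_MONOTONIC, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }

  // A return of 0 can only be a spurious wakeup in this sketch,
  // because no other thread signals; keep waiting until timeout.
  pthread_mutex_lock(&mu);
  do {
    rc = pthread_cond_timedwait(&cond, &mu, &deadline);
  } while (rc == 0);
  pthread_mutex_unlock(&mu);

  pthread_cond_destroy(&cond);
  pthread_mutex_destroy(&mu);
  return rc;
}
```

Calling wait_with_monotonic_deadline(20) should block for roughly 20 milliseconds and return ETIMEDOUT on platforms that support CLOCK_MONOTONIC for condition variables.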

240 commits
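
Example for 2f48a02b44 (recursive mutexes). That entry in the log above notes that functions using recursive mutexes can be kept signal safe by blocking signals so that a handler cannot re-enter them. The sketch below illustrates the idea under stated assumptions: the names g_lock, counter_init(), counter_add(), and counter_add_twice() are hypothetical, and pthread_sigmask() is used as the per-thread counterpart of the sigprocmask() call the commit message mentions. It is not the code from the Cosmopolitan tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

// Hypothetical state for illustration only.
static pthread_mutex_t g_lock;
static int g_counter;

// Initializes g_lock as a recursive mutex. Returns 0 on success.
int counter_init(void) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc) return rc;
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  if (!rc) rc = pthread_mutex_init(&g_lock, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}

// Adds n to the counter and returns the new value. All signals are
// blocked in the calling thread while the lock is held, so a signal
// handler cannot interrupt this thread and recurse into the lock.
int counter_add(int n) {
  sigset_t all, old;
  int result;
  sigfillset(&all);
  pthread_sigmask(SIG_BLOCK, &all, &old);
  pthread_mutex_lock(&g_lock);
  g_counter += n;
  result = g_counter;
  pthread_mutex_unlock(&g_lock);
  pthread_sigmask(SIG_SETMASK, &old, 0);  // restore the previous mask
  return result;
}

// Holds the lock while calling counter_add() twice; the nested
// acquisitions succeed because the mutex is recursive.
int counter_add_twice(int n) {
  int result;
  pthread_mutex_lock(&g_lock);
  counter_add(n);
  result = counter_add(n);
  pthread_mutex_unlock(&g_lock);
  return result;
}
```

The design choice is to pay for two mask changes per call rather than rely on the mutex itself being safe inside a handler, which matches the tradeoff the commit message describes.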