cosmopolitan

mirror of https://github.com/jart/cosmopolitan.git synced 2025-01-31 03:27:39 +00:00

Author	SHA1	Message	Date
Justine Tunney	0b3c81dd4e	Make fork() go 30% faster This change makes fork() go nearly as fast as sys_fork() on UNIX. As for Windows this change shaves about 4-5ms off fork() + wait() latency. This is accomplished by using WriteProcessMemory() from the parent process to setup the address space of a suspended process; it is better than a pipe	2025-01-01 04:59:38 -08:00
Justine Tunney	98c5847727	Fix fork waiter leak in nsync This change fixes a bug where nsync waiter objects would leak. It'd mean that long-running programs like runitd would run out of file descriptors on NetBSD where waiter objects have ksem file descriptors. On other OSes this bug is mostly harmless since the worst that can happen with a futex is to leak a little bit of ram. The bug was caused because tib_nsync was sneaking back in after the finalization code had cleared it. This change refactors the thread exiting code to handle nsync teardown appropriately and in making this change I found another issue, which is that user code which is buggy, and tries to exit without joining joinable threads which haven't been detached, would result in a deadlock. That doesn't sound so bad, except the main thread is a joinable thread. So this deadlock would be triggered in ways that put libc at fault. So we now auto-join threads and libc will log a warning to --strace when that happens for any thread	2024-12-31 01:30:13 -08:00
Justine Tunney	55b7aa1632	Allow user to override pthread mutex and cond	2024-12-23 21:57:52 -08:00
Justine Tunney	c8e10eef30	Make bulk_free() go faster	2024-12-23 20:31:57 -08:00
Justine Tunney	624573207e	Make threads faster and more reliable This change doubles the performance of thread spawning. That's thanks to our new stack manager, which allows us to avoid zeroing stacks. It gives us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is faster now, since it doesn't need to acquire the pthread GIL. On NetBSD, that helps us avoid allocating too many semaphores. Even if that happens we're now able to survive semaphores running out and even memory running out, when allocating *NSYNC waiter objects. I found a lot more rare bugs in the POSIX threads runtime that could cause things to crash, if you've got dozens of threads all spawning and joining dozens of threads. I want cosmo to be world class production worthy for 2025 so happy holidays all	2024-12-21 22:13:00 -08:00
Justine Tunney	c8c81af0c7	Remove distracting code from dlmalloc	2024-12-16 22:54:30 -08:00
Justine Tunney	af7bd80430	Eliminate cyclic locks in runtime This change introduces a new deadlock detector for Cosmo's POSIX threads implementation. Error check mutexes will now track a DAG of nested locks and report EDEADLK when a deadlock is theoretically possible. These will occur rarely, but it's important for production hardening your code. You don't even need to change your mutexes to use the POSIX error check mode because `cosmocc -mdbg` will enable error checking on mutexes by default globally. When cycles are found, an error message showing your demangled symbols describing the strongly connected component are printed and then the SIGTRAP is raised, which means you'll also get a backtrace if you're using ShowCrashReports() too. This new error checker is so low-level and so pure that it's able to verify the relationships of every libc runtime lock, including those locks upon which the mutex implementation depends.	2024-12-16 22:25:12 -08:00
Justine Tunney	69402f4d78	Support building ltests.c in MODE=dbg Fixes #1226	2024-12-13 08:19:42 -08:00
Justine Tunney	9ddbfd921e	Introduce cosmo_futex_wait and cosmo_futex_wake Cosmopolitan Futexes are now exposed as a public API.	2024-11-22 11:25:15 -08:00
Justine Tunney	1312f60245	Strongly link tr and sed into system() and popen()	2024-11-15 21:23:49 -08:00
Justine Tunney	cafdb456ed	Strongly link glob() into system() and popen()	2024-11-15 20:37:34 -08:00
Justine Tunney	5edc0819c0	Define glob64	2024-10-12 15:26:10 -07:00
Justine Tunney	a8bc7ac119	Import some Chromium Zlib changes	2024-10-11 07:04:02 -07:00
Justine Tunney	dcf9596620	Make more fixups and quality assurance	2024-10-07 15:29:53 -07:00
Justine Tunney	12cc2de22e	Make contended mutexes 30% faster on aarch64 On Raspberry Pi 5, benchmark_mu_contended takes 359µs in *NSYNC upstream and in Cosmopolitan it takes 272µs.	2024-09-26 09:24:25 -07:00
Justine Tunney	dd8c4dbd7d	Write more tests for signal handling There's now a much stronger level of assurance that signaling on Windows will be atomic, low-latency, low tail latency, and shall never deadlock.	2024-09-21 05:24:56 -07:00
Justine Tunney	87a6669900	Make more Windows socket fixes and improvements This change makes send() / sendto() always block on Windows. It's needed because poll(POLLOUT) doesn't guarantee a socket is immediately writable on Windows, and it caused rsync to fail because it made that assumption. The only exception is when a SO_SNDTIMEO is specified which will EAGAIN. Tests are added confirming MSG_WAITALL and MSG_NOSIGNAL work as expected on all our supported OSes. Most of the platform-specific MSG_FOO magnums have been deleted, with the exception of MSG_FASTOPEN. Your --strace log will now show MSG_FOO flags as symbols rather than numbers. I've also removed cv_wait_example_test because it's 0.3% flaky with Qemu under system load since it depends on a process being readily scheduled.	2024-09-18 20:29:42 -07:00
Justine Tunney	bb7942e557	Improve socket option story	2024-09-17 01:17:07 -07:00
Justine Tunney	949c398327	Clean up more code	2024-09-15 02:45:16 -07:00
Gabriel Ravier	e3d28de8a6	Fix UB in gdtoa hexadecimal float scanf and strtod (#1288) When reading hexadecimal floats, cosmopolitan would previously sometimes print a number of warnings relating to undefined behavior on left shift: third_party/gdtoa/gethex.c:172: ubsan warning: signed left shift changed sign bit or overflowed 12 'int' 28 'int' is undefined behavior This is because gdtoa assumes left shifts are safe when overflow happens even on signed integers - this is false: the C standard considers it UB. This is easy to fix, by simply casting the shifted value to unsigned, as doing so does not change the value or the semantics of the left shifting (except for avoiding the undefined behavior, as the C standard specifies that unsigned overflow yields wraparound, avoiding undefined behaviour). This commit does this, and adds a testcase that previously triggered UB. (this also adds test macros to test for exact float equality, instead of the existing {EXPECT,ASSERT}_FLOAT_EQ macros which only tests inputs for being "almost equal" (with a significant epsilon) whereas exact equality makes more sense for certain things such as reading floats from strings, and modifies other testcases for sscanf/fscanf of floats to utilize it).	2024-09-14 17:11:04 -07:00
Justine Tunney	ed1f992cb7	Fix default open mode in redbean unix.open()	2024-09-14 00:10:21 -07:00
Justine Tunney	b5fcb59a85	Implement more bf16/fp16 compiler runtimes Fixes #1259	2024-09-13 05:06:34 -07:00
Justine Tunney	a5c0189bf6	Make vim startup faster It appears that GetFileAttributes(u"\\etc\\passwd") can take two seconds on Windows 10 at unpredictable times for reasons which are mysterious to me. Let's try avoiding that path entirely and pray to Microsoft it works	2024-09-11 00:52:34 -07:00
Justine Tunney	fbdf9d028c	Rewrite Windows poll() We can now await signals, files, pipes, and console simultaneously. This change also gives a deeper review and testing to changes made yesterday.	2024-09-10 20:04:02 -07:00
Justine Tunney	cceddd21b2	Reduce latency of poll() on Windows When polling sockets poll() can now let you know about an event in about 10µs rather than 10ms. If you're not polling sockets then poll() reports console events now in microseconds instead of milliseconds.	2024-09-10 04:12:21 -07:00
Justine Tunney	2f48a02b44	Make recursive mutexes faster Recursive mutexes now go as fast as normal mutexes. The tradeoff is they are no longer safe to use in signal handlers. However you can still have signal safe mutexes if you set your mutex to both recursive and pshared. You can also make functions that use recursive mutexes signal safe using sigprocmask to ensure recursion doesn't happen due to any signal handler. The impact of this change is that, on Windows, many functions which edit the file descriptor table rely on recursive mutexes, e.g. open(). If you develop your app so it uses pread() and pwrite() then your app should go very fast when performing a heavily multithreaded and contended workload. For example, when scaling to 40+ cores, NSYNC mutexes can go as much as 1000x faster (in CPU time) than the naive recursive lock implementation. Now recursive will use NSYNC under the hood when it's possible to do so	2024-09-10 00:08:59 -07:00
Justine Tunney	95fee8614d	Test recursive mutex code more	2024-09-09 00:19:23 -07:00
Gabriel Ravier	4754f200ee	Fix printf-family long double prec/rounding issues (#1283) Currently, in cosmopolitan, there is no handling of the current rounding mode for long double conversions, such that round-to-nearest gets always used, regardless of the current rounding mode. %Le also improperly calls gdtoa with a too small precision (which led to relatively similar bugs). This patch fixes these issues, in particular by modifying the FPI object passed to gdtoa such that it is modifiable (so that __fmt can adjust its rounding field to correspond to FLT_ROUNDS (note that this is not needed for dtoa, which checks FLT_ROUNDS directly)) and ors STRTOG_Neg into the kind field in both of the __fmt_dfpbits and __fmt_ldfpbits functions, as the gdtoa function also depends on it to be able to accurately round any negative arguments. The change to kind also requires a few other changes to make sure kind's upper bits (which include STRTOG_Neg) are masked off when attempting to only examine the lower bits' value. Furthermore, this patch also makes exactly one change in gdtoa, which appears to be needed to fix rounding issues with FE_TOWARDZERO (this seems like a gdtoa bug). The patch also adds a few tests for these issues, along with also taking the opportunity to clean up some of the previous tests to do the asserts in the right order (i.e. with the first argument as the expected result, and the second one being used as the value that it is compared against).	2024-09-07 18:26:04 -07:00
Justine Tunney	d1157d471f	Upgrade pl_mpeg This change gets printvideo working on aarch64. Performance improvements have been introduced for magikarp decimation on aarch64. The last of the old portable x86 intrinsics library is gone, but it still lives in Blink	2024-09-06 19:10:34 -07:00
Justine Tunney	dd8544c3bd	Delve into clock rabbit hole The worst issue I had with consts.sh for clock_gettime is how it defined too many clocks. So I looked into these clocks all day to figure out how they overlap in functionality. I discovered counter-intuitive things such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10 also has some incredible new APIs, that let us simplify clock_gettime(). - Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime() - Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise() - Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise() - Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime() - Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime() - Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise() Documentation on the clock crew has been added to clock_gettime() in the docstring and in redbean's documentation too. You can read that to learn interesting facts about eight essential clocks that survived this purge. This is original research you will not find on Google, OpenAI, or Claude. I've tested this change by porting *NSYNC to become fully clock agnostic since it has extensive tests for spotting irregularities in time. I have also included these tests in the default build so they no longer need to be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across the entire amd64 and arm64 test fleets.	2024-09-04 01:32:46 -07:00
Justine Tunney	3c61a541bd	Introduce pthread_condattr_setclock() This is one of the few POSIX APIs that was missing. It lets you choose a monotonic clock for your condition variables. This might improve perf on some platforms. It might also grant more flexibility with NTP configs. I know Qt is one project that believes it needs this. To introduce this, I needed to change some of the *NSYNC APIs, to support passing a clock param. There's also new benchmarks, demonstrating Cosmopolitan's supremacy over many libc implementations when it comes to mutex performance. Cygwin has an alarmingly bad pthread_mutex_t implementation. It is so bad that they would have been significantly better off if they'd used naive spinlocks.	2024-09-02 23:45:42 -07:00
Justine Tunney	90460ceb3c	Make Cosmo mutexes competitive with Apple Libc While we have always licked glibc and musl libc on gnu/systemd sadly the Apple Libc implementation of pthread_mutex_t is better than ours. It may be due to how the XNU kernel and M2 microprocessor are in league when it comes to scheduling processes and the NSYNC behavior is being penalized. We can solve this by leaning more heavily on ulock using Drepper's algo. It's kind of ironic that Linux's official mutexes work terribly on Linux but almost as good as Apple Libc if used on MacOS.	2024-09-02 19:03:11 -07:00
Justine Tunney	2ec413b5a9	Fix bugs in poll(), select(), ppoll(), and pselect() poll() and select() now delegate to ppoll() and pselect() for assurances that both polyfill implementations are correct and well-tested. Poll now polyfills XNU and BSD quirks re: the handling of POLLNVAL and the other similar status flags. This change resolves a misunderstanding concerning how select(exceptfds) is intended to map to POLLPRI. We now use E2BIG for bouncing requests that exceed the 64 handle limit on Windows. With pipes and consoles on Windows our poll impl will now report POLLHUP correctly. Issues with Windows path generation have been fixed. For example, it was problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/") due to the need to un-UNC paths in some additional places. Calling fstat on UNC style volume path handles will now work. posix_spawn now supports simulating the opening of /dev/null and other special paths on Windows. Cosmopolitan no longer defines epoll(). I think wepoll is a nice project for using epoll() on Windows socket handles. However we need generalized file descriptor support to make epoll() for Windows work well enough for inclusion in a C library. It's also not worth having epoll() if we can't get it to work on XNU and BSD OSes which provide different abstractions. Even epoll() on Linux isn't that great of an abstraction since it's full of footguns. Last time I tried to get it to be useful I had little luck. Considering how long it took to get poll() and select() to be consistent across platforms, we really have no business claiming to have epoll too. While it'd be nice to have fully implemented, the only software that use epoll() are event i/o libraries used by things like nodejs. Event i/o is not the best paradigm for handling i/o; threads make so much more sense.	2024-09-02 00:29:52 -07:00
Gabriel Ravier	a089c07ddc	Fix printf funcs on memory pressure with floats (#1275) Cosmopolitan's printf-family functions will currently crash if one tries formatting a floating point number with a larger precision (large enough that gdtoa attempts to allocate memory to format the number) while under memory pressure (i.e. when malloc fails) because gdtoa fails to check if malloc fails. The added tests (which would previously crash under cosmopolitan without this patch) show how to reproduce the issue. This patch fixes this, and adds the aforementioned tests.	2024-09-01 14:42:14 -07:00
Justine Tunney	7c83f4abc8	Make improvements - wcsstr() is now linearly complex - strstr16() is now linearly complex - strstr() is now vectorized on aarch64 (10x) - strstr() now uses KMP on pathological cases - memmem() is now vectorized on aarch64 (10x) - memmem() now uses KMP on pathological cases - Disable shared_ptr::owner_before until fixed - Make iswlower(), iswupper() consistent with glibc - Remove figure space from iswspace() implementation - Include line and paragraph separator in iswcntrl() - Use Musl wcwidth(), iswalpha(), iswpunct(), towlower(), towupper()	2024-09-01 01:27:47 -07:00
Justine Tunney	c9152b6f14	Release Cosmopolitan v3.8.0 This change switches c++ exception handling from sjlj to standard dwarf. It's needed because clang for aarch64 doesn't support sjlj. It turns out that libunwind had a bare-metal configuration that made this easy to do. This change gets the new experimental cosmocc -mclang flag in a state of working so well that it can now be used to build all of llamafile and it goes 3x faster in terms of build latency, without trading away any perf. The int_fast16_t and int_fast32_t types are now always defined as 32-bit in the interest of having more abi consistency between cosmocc -mgcc and -mclang mode.	2024-08-30 20:14:07 -07:00
Justine Tunney	884d89235f	Harden against aba problem	2024-08-26 20:01:55 -07:00
Justine Tunney	610c951f71	Fix the build	2024-08-26 16:44:05 -07:00
Justine Tunney	111ec9a989	Fix bug we added to *NSYNC a while ago This is believed to fix a crash, that's possible in nsync_waiter_free_() when you call pthread_cond_timedwait(), or nsync_cv_wait_with_deadline() where an assertion can fail. Thanks ipv4.games for helping me find this!	2024-08-26 12:25:50 -07:00
Justine Tunney	863c704684	Add string similarity function	2024-08-17 16:45:07 -07:00
Justine Tunney	60e697f7b2	Move LoadZipArgs() to cosmo.h	2024-08-17 12:06:27 -07:00
Justine Tunney	eb6e96f036	Change InfoZIP to not auto-append .zip to pathname	2024-08-17 06:45:23 -07:00
Justine Tunney	77be460290	Make Windows REPLs great again	2024-08-17 06:32:10 -07:00
Justine Tunney	1d532ba3f8	Disable some anti-Musl Lua tests	2024-08-17 02:20:08 -07:00
Justine Tunney	b2a1811c01	Add missing pragma	2024-08-16 21:49:28 -07:00
Justine Tunney	2eda50929b	Add stdfloat header Fixes #1260	2024-08-16 21:38:00 -07:00
Justine Tunney	1671283f1a	Avoid clobbering errno	2024-08-15 23:54:14 -07:00
Justine Tunney	0a79c6961f	Make malloc scalable on all platforms It turns out sched_getcpu() didn't work on many platforms. So the system call now has tests and is well documented. We now employ new workarounds on platforms where it isn't supported in our malloc() implementation. It was previously the case that malloc() was only scalable on Linux/Windows for x86-64. Now the other platforms are scalable too.	2024-08-15 23:32:53 -07:00
Justine Tunney	31194165d2	Remove .internal from more header filenames	2024-08-04 12:52:25 -07:00
Justine Tunney	4ed4a1095a	Improve build latency	2024-07-31 01:21:27 -07:00


921 commits