Commit graph

858 commits

Author SHA1 Message Date
Justine Tunney
035b0e2a62
Attempt to fix MODE=dbg Windows execve() flake 2025-01-05 14:05:49 -08:00
Justine Tunney
29eb7e67bb
Fix fork() regression on Windows
Recent optimizations to fork() introduced a regression, that could cause
the subprocess to fail unexpectedly, when TlsAlloc() returns a different
index. This is because we were burning the indexes into the displacement
of x86 opcodes. So when fork() happened and the executable memory copied
it would use the old index. Right now the way this is being solved is to
not copy the executable on fork() and then re-apply code changes. If you
need to be able to preserve self-modified code on fork, reach out and we
can implement a better solution for you. This gets us unblocked quickly.
2025-01-05 09:25:23 -08:00
Justine Tunney
42a3bb729a
Make execve() linger when it can't spoof parent
It's now possible to use execve() when the parent process isn't built by
cosmo. In such cases, the current process will kill all threads and then
linger around, waiting for the newly created process to die, and then we
propagate its exit code to the parent. This should help bazel and others

Allocating private anonymous memory is now 5x faster on Windows. This is
thanks to VirtualAlloc() which is faster than the file mapping APIs. The
fork() function also now goes 30% faster, since we are able to avoid the
VirtualProtect() calls on mappings in most cases now.

Fixes #1253
2025-01-04 21:13:37 -08:00
Justine Tunney
b734eec836
Test restricting tests to single cpu 2025-01-03 19:51:09 -08:00
Justine Tunney
fe01642a20
Add missing lock to fork() on Windows 2025-01-03 19:01:58 -08:00
Justine Tunney
8db646f6b2
Fix bug with systemvpe()
See #1253
2025-01-02 09:19:59 -08:00
Justine Tunney
f24c854b28
Write more runtime tests and fix bugs
This change adds tests for the new memory manager code particularly with
its windows support. Function call tracing now works reliably on Silicon
since our function hooker was missing new Apple self-modifying code APIs

Many tests that were disabled a long time ago on aarch64 are reactivated
by this change, now that arm support is on equal terms with x86. There's
been a lot of places where ftrace could cause deadlocks, which have been
hunted down across all platforms thanks to new tests. A bug in Windows's
kill() function has been identified.
2025-01-01 22:25:22 -08:00
Justine Tunney
0b3c81dd4e
Make fork() go 30% faster
This change makes fork() go nearly as fast as sys_fork() on UNIX. As for
Windows this change shaves about 4-5ms off fork() + wait() latency. This
is accomplished by using WriteProcessMemory() from the parent process to
setup the address space of a suspended process; it is better than a pipe
2025-01-01 04:59:38 -08:00
Justine Tunney
98c5847727
Fix fork waiter leak in nsync
This change fixes a bug where nsync waiter objects would leak. It'd mean
that long-running programs like runitd would run out of file descriptors
on NetBSD where waiter objects have ksem file descriptors. On other OSes
this bug is mostly harmless since the worst that can happen with a futex
is to leak a little bit of ram. The bug was caused because tib_nsync was
sneaking back in after the finalization code had cleared it. This change
refactors the thread exiting code to handle nsync teardown appropriately
and in making this change I found another issue, which is that user code
which is buggy, and tries to exit without joining joinable threads which
haven't been detached, would result in a deadlock. That doesn't sound so
bad, except the main thread is a joinable thread. So this deadlock would
be triggered in ways that put libc at fault. So we now auto-join threads
and libc will log a warning to --strace when that happens for any thread
2024-12-31 01:30:13 -08:00
Justine Tunney
a51ccc8fb1
Remove old shuffle header 2024-12-30 03:03:32 -08:00
Justine Tunney
aca4214ff6
Simplify memory manager code 2024-12-28 17:09:28 -08:00
Justine Tunney
379cd77078
Improve memory manager and signal handling
On Windows, mmap() now chooses addresses transactionally. It reduces the
risk of badness when interacting with the WIN32 memory manager. We don't
throw darts anymore. There is also no more retry limit, since we recover
from mystery maps more gracefully. The subroutine for combining adjacent
maps has been rewritten for clarity. The print maps subroutine is better

This change goes to great lengths to perfect the stack overflow code. On
Windows you can now longjmp() out of a crash signal handler. Guard pages
previously weren't being restored properly by the signal handler. That's
fixed, so on Windows you can now handle a stack overflow multiple times.
Great thought has been put into selecting the perfect SIGSTKSZ constants
so you can save sigaltstack() memory. You can now use kprintf() with 512
bytes of stack available. The guard pages beneath the main stack are now
recorded in the memory manager.

This change fixes getcontext() so it works right with the %rax register.
2024-12-27 01:33:00 -08:00
Justine Tunney
36e5861b0c
Reduce stack virtual memory consumption on Linux 2024-12-25 20:58:08 -08:00
Justine Tunney
2de3845b25
Build tool for hunting down flakes 2024-12-24 11:36:16 -08:00
Justine Tunney
55b7aa1632
Allow user to override pthread mutex and cond 2024-12-23 21:57:52 -08:00
Justine Tunney
c8e10eef30
Make bulk_free() go faster 2024-12-23 20:31:57 -08:00
Justine Tunney
624573207e
Make threads faster and more reliable
This change doubles the performance of thread spawning. That's thanks to
our new stack manager, which allows us to avoid zeroing stacks. It gives
us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is
faster now, since it doesn't need to acquire the pthread GIL. On NetBSD,
that helps us avoid allocating too many semaphores. Even if that happens
we're now able to survive semaphores running out and even memory running
out, when allocating *NSYNC waiter objects. I found a lot more rare bugs
in the POSIX threads runtime that could cause things to crash, if you've
got dozens of threads all spawning and joining dozens of threads. I want
cosmo to be world class production worthy for 2025 so happy holidays all
2024-12-21 22:13:00 -08:00
Justine Tunney
906bd06a5a
Fix MODE=tiny build 2024-12-17 01:36:29 -08:00
Justine Tunney
af7bd80430
Eliminate cyclic locks in runtime
This change introduces a new deadlock detector for Cosmo's POSIX threads
implementation. Error check mutexes will now track a DAG of nested locks
and report EDEADLK when a deadlock is theoretically possible. These will
occur rarely, but it's important for production hardening your code. You
don't even need to change your mutexes to use the POSIX error check mode
because `cosmocc -mdbg` will enable error checking on mutexes by default
globally. When cycles are found, an error message showing your demangled
symbols describing the strongly connected component are printed and then
the SIGTRAP is raised, which means you'll also get a backtrace if you're
using ShowCrashReports() too. This new error checker is so low-level and
so pure that it's able to verify the relationships of every libc runtime
lock, including those locks upon which the mutex implementation depends.
2024-12-16 22:25:12 -08:00
Justine Tunney
9cc1bd04b2
Test rwlock more 2024-12-14 09:40:13 -08:00
Justine Tunney
2d43d400c6
Support process shared pthread_rwlock
Cosmo now has a non-nsync implementation of POSIX read-write locks. It's
possible to call pthread_rwlockattr_setpshared in PTHREAD_PROCESS_SHARED
mode. Furthermore, if cosmo is built with PTHREAD_USE_NSYNC set to zero,
then Cosmo shouldn't use nsync at all. That's helpful if you want to not
link any Apache 2.0 licensed code.
2024-12-13 03:00:06 -08:00
Justine Tunney
c22b413ac4
Make strcasestr() faster 2024-12-12 22:50:20 -08:00
Justine Tunney
9ddbfd921e
Introduce cosmo_futex_wait and cosmo_futex_wake
Cosmopolitan Futexes are now exposed as a public API.
2024-11-22 11:25:15 -08:00
Justine Tunney
5c3f854acb
Fix strcasestr()
Fixes #1323
2024-11-20 14:07:17 -08:00
Justine Tunney
cafdb456ed
Strongly link glob() into system() and popen() 2024-11-15 20:37:34 -08:00
Justine Tunney
000d6dbb0f
Make getentropy() faster 2024-10-10 18:45:15 -07:00
Justine Tunney
e4d6eb382a
Make memchr() and memccpy() faster 2024-09-30 05:54:34 -07:00
Gabriel Ravier
333c3d1f0a
Add the uppercase B conversion specifier to printf (#1300)
The (uppercase) B conversion specifier is specified by the C standard to
have the same behavior as the (lowercase) b conversion specifier, except
that whenever the # flag is used, the (uppercase) B conversion specifier
alters a nonzero result by prefixing it with "0B", instead of with "0b".

This commit adds this conversion specifier alongside a few tests for it.
2024-09-26 04:27:45 -07:00
Justine Tunney
518eabadf5
Further optimize poll() on Windows 2024-09-22 22:28:59 -07:00
Gabriel Ravier
476926790a
Make printf %La test-case work properly on AArch64 (#1298)
As the size of long double changes between x86 and AArch64, this results
in one of the printf a conversion specifier test-cases getting different
output between the two. Note that both outputs (and a few more) are 100%
standards-conforming, but the testcase currently only expects a specific
one - the one that had been seen on x86 when initially writing the test.

This patch fixes the testcase so it accepts either of those two outputs.
2024-09-22 01:21:03 -07:00
Justine Tunney
0d74673213
Introduce interprocess signaling on Windows
This change gets rsync working without any warning or errors. On Windows
we now create a bunch of C:\var\sig\x\y.pid shared memory files, so sigs
can be delivered between processes. WinMain() creates this file when the
process starts. If the program links signaling system calls then we make
a thread at startup too, which allows asynchronous delivery each quantum
and cancelation points can spot these signals potentially faster on wait

See #1240
2024-09-19 03:02:13 -07:00
Justine Tunney
56ca00b022
Release Cosmopolitan v3.9.0 2024-09-15 22:39:09 -07:00
Gabriel Ravier
e260d90096
Fix 0 before decimal-point in hex float printf fns (#1297)
The C standard specifies that, upon handling the a conversion specifier,
the argument is converted to a string in which "there is one hexadecimal
digit (which is nonzero [...]) before the decimal-point character", this
being a requirement which cosmopolitan does not currently always handle,
sometimes printing numbers like "0x0.1p+5", where a correct output would
have been e.g. "0x1.0p+1" (despite both representing the same value, the
first one illegally has a '0' digit before the decimal-point character).
2024-09-15 18:02:47 -07:00
Gabriel Ravier
81bc8d0963
Fix printing decimal-point character on printf %#a (#1296)
The C standard indicates that when processing the a conversion specifier
"if the precision is zero *and* the # flag is not specified, no decimal-
point character appears.". This means that __fmt needs to ensure that it
prints the decimal-point character not only when the precision is non-0,
but also when the # flag is specified - cosmopolitan currently does not.

This patch fixes this, along with adding a few tests for this behaviour.
2024-09-15 16:42:51 -07:00
Gabriel Ravier
b55e4d61a9
Hopefully completely fix printf-family %a rounding (#1287)
The a conversion specifier to printf had some issues w.r.t. rounding, in
particular in edge cases w.r.t. "to nearest, ties to even" rounding (for
instance, "%.1a" with 0x1.78p+4 outputted 0x1.7p+4 instead of 0x1.8p+4).

This patch fixes this and adds several tests w.r.t ties to even rounding
2024-09-15 12:11:27 -07:00
Justine Tunney
c260144843
Introduce sigtimedwait() on Windows 2024-09-15 01:18:27 -07:00
Gabriel Ravier
675abfa029
Add POSIX's apostrophe flag for printf-based funcs (#1285)
POSIX specifies the <apostrophe> flag character for printf as formatting
decimal conversions with the thousands' grouping characters specified by
the current locale. Given that cosmopolitan currently has no support for
obtaining the locale's grouping character, all that is required (when in
the C/POSIX locale) for supporting this flag is ignoring it, and as it's
already used to indicate quoting (for non-decimal conversions), all that
has to be done is to avoid having it be an alias for the <space> flag so
that decimal conversions don't accidentally behave as though the <space>
flag has also been specified whenever the <apostrophe> flag is utilized.

This patch adds this flag, as described above, along with a test for it.
2024-09-14 17:17:40 -07:00
Gabriel Ravier
e3d28de8a6
Fix UB in gdtoa hexadecimal float scanf and strtod (#1288)
When reading hexadecimal floats, cosmopolitan would previously sometimes
print a number of warnings relating to undefined behavior on left shift:

third_party/gdtoa/gethex.c:172: ubsan warning: signed left shift changed
sign bit or overflowed 12 'int' 28 'int' is undefined behavior

This is because gdtoa assumes left shifts are safe when overflow happens
even on signed integers - this is false: the C standard considers it UB.
This is easy to fix, by simply casting the shifted value to unsigned, as
doing so does not change the value or the semantics of the left shifting
(except for avoiding the undefined behavior, as the C standard specifies
that unsigned overflow yields wraparound, avoiding undefined behaviour).

This commit does this, and adds a testcase that previously triggered UB.
(this also adds test macros to test for exact float equality, instead of
the existing {EXPECT,ASSERT}_FLOAT_EQ macros which only tests inputs for
being "almost equal" (with a significant epsilon) whereas exact equality
makes more sense for certain things such as reading floats from strings,
and modifies other testcases for sscanf/fscanf of floats to utilize it).
2024-09-14 17:11:04 -07:00
Gabriel Ravier
7f21547122
Fix occasional crash in test/libc/intrin/mmap_test (#1289)
This test would sometimes crash due to the EZBENCH2() macro occasionally
running the first benchmark (BenchMmapPrivate()) less times than it does
the second benchmark (BenchUnmap()) - this would then lead to a crash in
BenchUnmap() because BenchUnmap() expects that BenchMmapPrivate() has to
previously have been called at least as many times as it has itself such
that a region of memory has been mapped, for BenchUnmap() to then unmap.

This commit fixes this by utilizing the newer BENCHMARK() macro (instead
of the EZBENCH2() macro) which runs the benchmark using an count of runs
specified directly by the benchmark itself, which allows us to make sure
that the two benchmark functions get ran the exact same amount of times.
2024-09-14 17:07:56 -07:00
Justine Tunney
1260f9d0ed
Add more tests for strlcpy()
I asked ChatGPT o1 for an optimized version of strlcpy() but it couldn't
produce an implementation that didn't crash. Eventually it just gave up.
2024-09-13 01:14:35 -07:00
Justine Tunney
e142124730
Rewrite Windows connect()
Our old code wasn't working with projects like Qt that call connect() in
O_NONBLOCK mode multiple times. This change overhauls connect() to use a
simpler WSAConnect() API and follows the same pattern as cosmo accept().
This change also reduces the binary footprint of read(), which no longer
needs to depend on our enormous clock_gettime() function.
2024-09-12 23:07:52 -07:00
Justine Tunney
acd6c32184
Rewrite Windows accept()
This change should fix the Windows issues Qt Creator has been having, by
ensuring accept() and accept4() work in O_NONBLOCK mode. I switched away
from AcceptEx() which is buggy, back to using WSAAccept(). This requires
making a tradeoff where we have to accept a busy loop. However it is low
latency in nature, just like our new and improved Windows poll() code. I
was furthermore able to eliminate a bunch of Windows-related test todos.
2024-09-12 04:23:38 -07:00
Justine Tunney
6f868fe1de
Fix polling of files on Windows 2024-09-11 17:13:23 -07:00
Gabriel Ravier
4d05060aac
Partially fix printf hex float numbers/%a rounding (#1286)
Hexadecimal printing of floating-point numbers in cosmopolitan (that is,
using the the conversion specifier) is improved to have correct rounding
of results in rounding modes other than the default one (ie. FE_NEAREST)

This commit fixes that, and adds tests for the change (note that there's
still some rounding issues with the a conversion specifier in general in
relatively rare cases (that is without non-default rounding modes) where
I've left commented-out tests for anyone interested in improving it more
2024-09-10 20:42:52 -07:00
Justine Tunney
fbdf9d028c
Rewrite Windows poll()
We can now await signals, files, pipes, and console simultaneously. This
change also gives a deeper review and testing to changes made yesterday.
2024-09-10 20:04:02 -07:00
Justine Tunney
cceddd21b2
Reduce latency of poll() on Windows
When polling sockets poll() can now let you know about an event in about
10µs rather than 10ms. If you're not polling sockets then poll() reports
console events now in microseconds instead of milliseconds.
2024-09-10 04:12:21 -07:00
Justine Tunney
a0a404a431
Fix issues with previous commit 2024-09-10 01:59:46 -07:00
Justine Tunney
2f48a02b44
Make recursive mutexes faster
Recursive mutexes now go as fast as normal mutexes. The tradeoff is they
are no longer safe to use in signal handlers. However you can still have
signal safe mutexes if you set your mutex to both recursive and pshared.
You can also make functions that use recursive mutexes signal safe using
sigprocmask to ensure recursion doesn't happen due to any signal handler

The impact of this change is that, on Windows, many functions which edit
the file descriptor table rely on recursive mutexes, e.g. open(). If you
develop your app so it uses pread() and pwrite() then your app should go
very fast when performing a heavily multithreaded and contended workload

For example, when scaling to 40+ cores, *NSYNC mutexes can go as much as
1000x faster (in CPU time) than the naive recursive lock implementation.
Now recursive will use *NSYNC under the hood when it's possible to do so
2024-09-10 00:08:59 -07:00
Justine Tunney
95fee8614d
Test recursive mutex code more 2024-09-09 00:19:23 -07:00
Gabriel Ravier
4754f200ee
Fix printf-family long double prec/rounding issues (#1283)
Currently, in cosmopolitan, there is no handling of the current rounding
mode for long double conversions, such that round-to-nearest gets always
used, regardless of the current rounding mode. %Le also improperly calls
gdtoa with a too small precision (which led to relatively similar bugs).

This patch fixes these issues, in particular by modifying the FPI object
passed to gdtoa such that it is modifiable (so that __fmt can adjust its
rounding field to correspond to FLT_ROUNDS (note that this is not needed
for dtoa, which checks FLT_ROUNDS directly)) and ors STRTOG_Neg into the
kind field in both of the __fmt_dfpbits and __fmt_ldfpbits functions, as
the gdtoa function also depends on it to be able to accurately round any
negative arguments. The change to kind also requires a few other changes
to make sure kind's upper bits (which include STRTOG_Neg) are masked off
when attempting to only examine the lower bits' value. Furthermore, this
patch also makes exactly one change in gdtoa, which appears to be needed
to fix rounding issues with FE_TOWARDZERO (this seems like a gdtoa bug).

The patch also adds a few tests for these issues, along with also taking
the opportunity to clean up some of the previous tests to do the asserts
in the right order (i.e. with the first argument as the expected result,
and the second one being used as the value that it is compared against).
2024-09-07 18:26:04 -07:00