Commit graph

113 commits

Author SHA1 Message Date
Justine Tunney
035b0e2a62
Attempt to fix MODE=dbg Windows execve() flake 2025-01-05 14:05:49 -08:00
Justine Tunney
29eb7e67bb
Fix fork() regression on Windows
Recent optimizations to fork() introduced a regression, that could cause
the subprocess to fail unexpectedly, when TlsAlloc() returns a different
index. This is because we were burning the indexes into the displacement
of x86 opcodes. So when fork() happened and the executable memory copied
it would use the old index. Right now the way this is being solved is to
not copy the executable on fork() and then re-apply code changes. If you
need to be able to preserve self-modified code on fork, reach out and we
can implement a better solution for you. This gets us unblocked quickly.
2025-01-05 09:25:23 -08:00
Justine Tunney
f71f61cd40
Add some temporary logging statements 2025-01-04 23:37:32 -08:00
Justine Tunney
42a3bb729a
Make execve() linger when it can't spoof parent
It's now possible to use execve() when the parent process isn't built by
cosmo. In such cases, the current process will kill all threads and then
linger around, waiting for the newly created process to die, and then we
propagate its exit code to the parent. This should help bazel and others

Allocating private anonymous memory is now 5x faster on Windows. This is
thanks to VirtualAlloc() which is faster than the file mapping APIs. The
fork() function also now goes 30% faster, since we are able to avoid the
VirtualProtect() calls on mappings in most cases now.

Fixes #1253
2025-01-04 21:13:37 -08:00
Justine Tunney
fe01642a20
Add missing lock to fork() on Windows 2025-01-03 19:01:58 -08:00
Justine Tunney
a15958edc6
Remove some legacy cruft
Function trace logs will report stack usage accurately. It won't include
the argv/environ block. Our clone() polyfill is now simpler and does not
use as much stack memory. Function call tracing on x86 is now faster too
2025-01-02 18:44:07 -08:00
Justine Tunney
fde03f8487
Remove leaf attribute where appropriate
This change fixes a bug where gcc assumed thread synchronization such as
pthread_cond_wait() wouldn't alter static variables, because the headers
were using __attribute__((__leaf__)) inappropriately.
2025-01-02 08:07:15 -08:00
Justine Tunney
f24c854b28
Write more runtime tests and fix bugs
This change adds tests for the new memory manager code particularly with
its windows support. Function call tracing now works reliably on Silicon
since our function hooker was missing new Apple self-modifying code APIs

Many tests that were disabled a long time ago on aarch64 are reactivated
by this change, now that arm support is on equal terms with x86. There's
been a lot of places where ftrace could cause deadlocks, which have been
hunted down across all platforms thanks to new tests. A bug in Windows's
kill() function has been identified.
2025-01-01 22:25:22 -08:00
Justine Tunney
0b3c81dd4e
Make fork() go 30% faster
This change makes fork() go nearly as fast as sys_fork() on UNIX. As for
Windows this change shaves about 4-5ms off fork() + wait() latency. This
is accomplished by using WriteProcessMemory() from the parent process to
setup the address space of a suspended process; it is better than a pipe
2025-01-01 04:59:38 -08:00
Justine Tunney
98c5847727
Fix fork waiter leak in nsync
This change fixes a bug where nsync waiter objects would leak. It'd mean
that long-running programs like runitd would run out of file descriptors
on NetBSD where waiter objects have ksem file descriptors. On other OSes
this bug is mostly harmless since the worst that can happen with a futex
is to leak a little bit of ram. The bug was caused because tib_nsync was
sneaking back in after the finalization code had cleared it. This change
refactors the thread exiting code to handle nsync teardown appropriately
and in making this change I found another issue, which is that user code
which is buggy, and tries to exit without joining joinable threads which
haven't been detached, would result in a deadlock. That doesn't sound so
bad, except the main thread is a joinable thread. So this deadlock would
be triggered in ways that put libc at fault. So we now auto-join threads
and libc will log a warning to --strace when that happens for any thread
2024-12-31 01:30:13 -08:00
Justine Tunney
aca4214ff6
Simplify memory manager code 2024-12-28 17:09:28 -08:00
Justine Tunney
36e5861b0c
Reduce stack virtual memory consumption on Linux 2024-12-25 20:58:08 -08:00
Justine Tunney
cc8a9eb93c
Document execve() limitation on Windows
Closes #1253
2024-12-24 12:20:48 -08:00
Justine Tunney
93e22c581f
Reduce pthread memory usage 2024-12-24 10:30:59 -08:00
Justine Tunney
55b7aa1632
Allow user to override pthread mutex and cond 2024-12-23 21:57:52 -08:00
Justine Tunney
4705705548
Fix bugs in times() function 2024-12-23 20:57:10 -08:00
Justine Tunney
624573207e
Make threads faster and more reliable
This change doubles the performance of thread spawning. That's thanks to
our new stack manager, which allows us to avoid zeroing stacks. It gives
us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is
faster now, since it doesn't need to acquire the pthread GIL. On NetBSD,
that helps us avoid allocating too many semaphores. Even if that happens
we're now able to survive semaphores running out and even memory running
out, when allocating *NSYNC waiter objects. I found a lot more rare bugs
in the POSIX threads runtime that could cause things to crash, if you've
got dozens of threads all spawning and joining dozens of threads. I want
cosmo to be world class production worthy for 2025 so happy holidays all
2024-12-21 22:13:00 -08:00
Justine Tunney
af7bd80430
Eliminate cyclic locks in runtime
This change introduces a new deadlock detector for Cosmo's POSIX threads
implementation. Error check mutexes will now track a DAG of nested locks
and report EDEADLK when a deadlock is theoretically possible. These will
occur rarely, but it's important for production hardening your code. You
don't even need to change your mutexes to use the POSIX error check mode
because `cosmocc -mdbg` will enable error checking on mutexes by default
globally. When cycles are found, an error message showing your demangled
symbols describing the strongly connected component are printed and then
the SIGTRAP is raised, which means you'll also get a backtrace if you're
using ShowCrashReports() too. This new error checker is so low-level and
so pure that it's able to verify the relationships of every libc runtime
lock, including those locks upon which the mutex implementation depends.
2024-12-16 22:25:12 -08:00
Justine Tunney
26c051c297
Spoof PID across execve() on Windows
It's now possible with cosmo and redbean, to deliver a signal to a child
process after it has called execve(). However the executed program needs
to be compiled using cosmocc. The cosmo runtime WinMain() implementation
now intercepts a _COSMO_PID environment variable that's set by execve().
It ensures the child process will use the same C:\ProgramData\cosmo\sigs
file, which is where kill() will place the delivered signal. We are able
to do this on Windows even better than NetBSD, which has a bug with this

Fixes #1334
2024-12-14 13:13:08 -08:00
Justine Tunney
cafdb456ed
Strongly link glob() into system() and popen() 2024-11-15 20:37:34 -08:00
Justine Tunney
dc1afc968b
Fix fork() crash on Windows
On Windows, sometimes fork() could crash with message likes:

    fork() ViewOrDie(170000) failed with win32 error 487

This is due to a bug in our file descriptor inheritance. We have cursors
which are shared between processes. They let us track the file positions
of read() and write() operations. At startup they were being mmap()ed to
memory addresses that were assigned by WIN32. That's bad because Windows
likes to give us memory addresses beneath the program image in the first
4mb range that are likely to conflict with other assignments. That ended
up causing problems because fork() needs to be able to assume that a map
will be possible to resurrect at the same address. But for one reason or
another, Windows libraries we don't control could sneak allocations into
the memory space that overlap with these mappings. This change solves it
by choosing a random memory address instead when mapping cursor objects.
2024-10-12 15:38:58 -07:00
Justine Tunney
706cb66310
Tune posix_spawn() successful fix 2024-10-11 07:10:43 -07:00
Justine Tunney
d8fac40f55
Attempt to fix Emacs spawning issue 2024-10-11 06:09:46 -07:00
Justine Tunney
518eabadf5
Further optimize poll() on Windows 2024-09-22 22:28:59 -07:00
Justine Tunney
dd8c4dbd7d
Write more tests for signal handling
There's now a much stronger level of assurance that signaling on Windows
will be atomic, low-latency, low tail latency, and shall never deadlock.
2024-09-21 05:24:56 -07:00
Justine Tunney
f68fc1f815
Put more thought into new signaling code 2024-09-19 20:21:33 -07:00
Justine Tunney
d50f4c02f6
Make revision to previous change 2024-09-19 03:29:39 -07:00
Justine Tunney
0d74673213
Introduce interprocess signaling on Windows
This change gets rsync working without any warning or errors. On Windows
we now create a bunch of C:\var\sig\x\y.pid shared memory files, so sigs
can be delivered between processes. WinMain() creates this file when the
process starts. If the program links signaling system calls then we make
a thread at startup too, which allows asynchronous delivery each quantum
and cancelation points can spot these signals potentially faster on wait

See #1240
2024-09-19 03:02:13 -07:00
Justine Tunney
8527462b95
Fix ECHILD with WNOHANG on Windows
This change along with a patch for rsync's safe_write() function that'll
that'll soon be added to superconfigure, gets rsync working. There's one
remaining issue (which isn't a blocker) which is how rsync logs an error
about abnormal process termination since there's currently no way for us
to send non-fatal signals between processes. rsync in cosmos is restored

Fixes #1240
2024-09-18 23:26:02 -07:00
Justine Tunney
b73673e984
Freshen bootstrap binaries 2024-09-15 20:37:32 -07:00
Justine Tunney
949c398327
Clean up more code 2024-09-15 02:45:16 -07:00
Justine Tunney
a0a404a431
Fix issues with previous commit 2024-09-10 01:59:46 -07:00
Justine Tunney
03875beadb
Add missing ICANON features 2024-09-05 03:17:19 -07:00
Justine Tunney
3c61a541bd
Introduce pthread_condattr_setclock()
This is one of the few POSIX APIs that was missing. It lets you choose a
monotonic clock for your condition variables. This might improve perf on
some platforms. It might also grant more flexibility with NTP configs. I
know Qt is one project that believes it needs this. To introduce this, I
needed to change some the *NSYNC APIs, to support passing a clock param.
There's also new benchmarks, demonstrating Cosmopolitan's supremacy over
many libc implementations when it comes to mutex performance. Cygwin has
an alarmingly bad pthread_mutex_t implementation. It is so bad that they
would have been significantly better off if they'd used naive spinlocks.
2024-09-02 23:45:42 -07:00
Justine Tunney
2ec413b5a9
Fix bugs in poll(), select(), ppoll(), and pselect()
poll() and select() now delegate to ppoll() and pselect() for assurances
that both polyfill implementations are correct and well-tested. Poll now
polyfills XNU and BSD quirks re: the hanndling of POLLNVAL and the other
similar status flags. This change resolves a misunderstanding concerning
how select(exceptfds) is intended to map to POLPRI. We now use E2BIG for
bouncing requests that exceed the 64 handle limit on Windows. With pipes
and consoles on Windows our poll impl will now report POLLHUP correctly.

Issues with Windows path generation have been fixed. For example, it was
problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/")
due to the need to un-UNC paths in some additional places. Calling fstat
on UNC style volume path handles will now work. posix_spawn now supports
simulating the opening of /dev/null and other special paths on Windows.

Cosmopolitan no longer defines epoll(). I think wepoll is a nice project
for using epoll() on Windows socket handles. However we need generalized
file descriptor support to make epoll() for Windows work well enough for
inclusion in a C library. It's also not worth having epoll() if we can't
get it to work on XNU and BSD OSes which provide different abstractions.
Even epoll() on Linux isn't that great of an abstraction since it's full
of footguns. Last time I tried to get it to be useful I had little luck.
Considering how long it took to get poll() and select() to be consistent
across platforms, we really have no business claiming to have epoll too.
While it'd be nice to have fully implemented, the only software that use
epoll() are event i/o libraries used by things like nodejs. Event i/o is
not the best paradigm for handling i/o; threads make so much more sense.
2024-09-02 00:29:52 -07:00
Justine Tunney
39e7f24947
Fix handling of paths with dirfd on Windows
This change fixes an issue with all system calls ending with *at(), when
the caller passes `dirfd != AT_FDCWD` and an absolute path. It's because
the old code was turning paths like C:\bin\ls into \\C:\bin\ls\C:\bin\ls
after being converted from paths like /C/bin/ls. I noticed this when the
Emacs dired mode stopped working. It's unclear if it's a regression with
Cosmopolitan Libc or if this was introduced by the Emacs v29 upgrade. It
also impacted posix_spawn() for which a newly minted example now exists.
2024-09-01 17:52:30 -07:00
Justine Tunney
cca0edd62b
Make pthread mutex non-recursive 2024-09-01 02:05:17 -07:00
Justine Tunney
bb06230f1e
Avoid linker conflicts on DescribeFoo symbols
These symbols belong to the user. It caused a confusing error for Blink.
2024-08-24 18:10:22 -07:00
Justine Tunney
31194165d2
Remove .internal from more header filenames 2024-08-04 12:52:25 -07:00
Justine Tunney
3f26dfbb31
Share file offset across execve() on Windows
This is a breaking change. It defines the new environment variable named
_COSMO_FDS_V2 which is used for inheriting non-stdio file descriptors on
execve() or posix_spawn(). No effort has been spent thus far integrating
with the older variable. If a new binary launches the older ones or vice
versa they'll only be able to pass stdin / stdout / stderr to each other
therefore it's important that you upgrade all your cosmo binaries if you
depend on this functionality. You'll be glad you did because inheritance
of file descriptors is more aligned with the POSIX standard than before.
2024-08-03 17:48:00 -07:00
Justine Tunney
761c6ad615
Share file offset across processes
This change ensures that if a file descriptor for an open disk file gets
shared by multiple processes within a process tree, then lseek() changes
will be visible across processes, and read() / write() are synchronized.
Note this only applies to Windows, because UNIX kernels already do this.
2024-08-03 01:39:11 -07:00
Justine Tunney
cf1559c448
Remove __threaded variable 2024-07-28 23:43:30 -07:00
Justine Tunney
cdfcee51ca
Properly serialize fork() operations
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.

   threads      count task
         1       1062 fork+exit+wait
         2        668 fork+exit+wait
         4         66 fork+exit+wait
         8         19 fork+exit+wait
        16         22 fork+exit+wait
        32         16 fork+exit+wait

Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to do conditions

   threads      count task
         1       1085 fork+exit+wait
         2        842 fork+exit+wait
         4        532 fork+exit+wait
         8        400 fork+exit+wait
        16        276 fork+exit+wait
        32         66 fork+exit+wait

With OpenBSD which also lacks vfork(), things were just as bad as NetBSD

   threads      count task
         1        584 fork+exit+wait
         2        687 fork+exit+wait
         4        206 fork+exit+wait
         8         24 fork+exit+wait
        16         33 fork+exit+wait
        32         26 fork+exit+wait

But since OpenBSD has futexes fork() works terrifically thanks to *NSYNC

   threads      count task
         1        525 fork+exit+wait
         2        580 fork+exit+wait
         4        451 fork+exit+wait
         8        479 fork+exit+wait
        16        408 fork+exit+wait
        32        373 fork+exit+wait

This issue would most likely only manifest itself, when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating

Needless to say vfork() rules and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.

   threads      count task
         1       2559 vfork+exit+wait
         2       5389 vfork+exit+wait
         4      34933 vfork+exit+wait
         8      43273 vfork+exit+wait
        16      49648 vfork+exit+wait
        32      40247 vfork+exit+wait

So it's a shame that so few OSes support vfork(). It creates an unsavory
situation, where someone wanting to build a server that spawns processes
would be better served to not use threads and favor a multiprocess model
2024-07-27 08:23:44 -07:00
Justine Tunney
642e9cb91a
Introduce cosmocc flags -mdbg -mtiny -moptlinux
The cosmocc.zip toolchain will now include four builds of the libcosmo.a
runtime libraries. You can pass the -mdbg flag if you want to debug your
cosmopolitan runtime. You can pass the -moptlinux flag if you don't want
windows code lurking in your binary. See tool/cosmocc/README.md for more
details on how these flags may be used and their important implications.
2024-07-26 05:10:25 -07:00
Justine Tunney
59692b0882
Make spinlocks faster (take two)
This change is green on x86 and arm test fleet.
2024-07-26 00:45:24 -07:00
Justine Tunney
d3a13e8d70
Improve lock hierarchy
- NetBSD no longer needs a spin lock to create semaphores
- Windows fork() now locks process manager in correct order
2024-07-24 16:05:48 -07:00
Justine Tunney
5660ec4741
Release Cosmopolitan v3.6.0
This release is an atomic upgrade to GCC 14.1.0 with C23 and C++23
2024-07-23 03:28:19 -07:00
Justine Tunney
5d2d9e9640
Add back missing TlsAlloc() call
Cosmopolitan Libc once called this important function although somewhere
along the way, possibly in a refactoring, it got removed and __tls_alloc
has always been zero ever since.
2024-07-21 20:45:27 -07:00
Justine Tunney
7ebaff34c6
Fix ctype.h and wctype.h 2024-07-21 15:54:17 -07:00
Justine Tunney
2018cac11f
Use better memory strategy on Windows
Rather than using the the rollo global to pick addresses, we select them
randomly now using a conservative vaspace.
2024-07-20 02:20:03 -07:00