When programs like ar.ape and compile.ape are run on eCryptFs partitions
on Linux, copy_file_range() will fail with EINVAL which is wrong because
eCryptFs which doesn't support this system call, should raise EOPNOTSUPP
See https://github.com/jart/cosmopolitan/discussions/1305
We were too zealous about security before by only setting the owner bits
and that would cause issues for projects like redbean that check "other"
bits to determine if it's safe to serve a file. Since that doesn't exist
on Windows, it's better to have things work than not work. So what we'll
do instead is return modes like 0664 for files and 0775 for directories.
This change gets rsync working without any warning or errors. On Windows
we now create a bunch of C:\var\sig\x\y.pid shared memory files, so sigs
can be delivered between processes. WinMain() creates this file when the
process starts. If the program links signaling system calls then we make
a thread at startup too, which allows asynchronous delivery each quantum
and cancelation points can spot these signals potentially faster on wait
See #1240
Spawning processes would leak lots of memory, due to a missing free call
in ntspawn(). Our tooling never caught this since ntspawn() must use the
WIN32 memory allocator. It means every time posix_spawn, fork, or execve
got called, we would leak 162kb of memory. I'm proud to say that's fixed
You would think this is an important bug fix, but unfortunately all UNIX
implementations I've evaluated have a bug in read that causes signals to
not be handled atomically. The only exception is the latest iteration of
Cosmopolitan's read/write polyfill on Windows, which is somewhat ironic.
While always blocking statx did not lead to particularly bad results for
most cases (most code that uses statx appears to utilize a fallback when
statx is unavailable), it does lead to using usually far less used (thus
far less well tested) code: for example, musl's current fstatat fallback
for statx fails to set any values for stx_rdev_major and stx_rdev_minor,
which the raw syscall wouldn't (I've have sent a patch to musl for this,
but this won't fix older versions of musl and binaries/OSes using them).
Along with the fact that statx extends stat in several useful ways, this
seems to indicate it is far better to simply allow statx whenever pledge
also allows stat-family syscalls, i.e. for both rpath and wpath pledges.
Our old code wasn't working with projects like Qt that call connect() in
O_NONBLOCK mode multiple times. This change overhauls connect() to use a
simpler WSAConnect() API and follows the same pattern as cosmo accept().
This change also reduces the binary footprint of read(), which no longer
needs to depend on our enormous clock_gettime() function.
This change should fix the Windows issues Qt Creator has been having, by
ensuring accept() and accept4() work in O_NONBLOCK mode. I switched away
from AcceptEx() which is buggy, back to using WSAAccept(). This requires
making a tradeoff where we have to accept a busy loop. However it is low
latency in nature, just like our new and improved Windows poll() code. I
was furthermore able to eliminate a bunch of Windows-related test todos.
It appears that GetFileAttributes(u"\\etc\\passwd") can take two seconds
on Windows 10 at unpredictable times for reasons which are mysterious to
me. Let's try avoiding that path entirely and pray to Microsoft it works
This issue probably only impacted the earliest releases of Windows 7 and
we only support Windows 10+ these days, so it's not worth adding 2000 ms
of startup latency to vim when ~/.vim doesn't exist.
This change fixes a CreateProcess failure when a process is spawned with
no handles inherited. This is due to violating a common design rule in C
that works as follows: when you have (ptr, size) the ptr must be ignored
when size is zero. That's because cosmo's malloc(0) always returns a non
null pointer, which was happening in __describe_fds(), but ntspawn() was
basing its decision off the nullness of the pointer rather than its size
When polling sockets poll() can now let you know about an event in about
10µs rather than 10ms. If you're not polling sockets then poll() reports
console events now in microseconds instead of milliseconds.
Recursive mutexes now go as fast as normal mutexes. The tradeoff is they
are no longer safe to use in signal handlers. However you can still have
signal safe mutexes if you set your mutex to both recursive and pshared.
You can also make functions that use recursive mutexes signal safe using
sigprocmask to ensure recursion doesn't happen due to any signal handler
The impact of this change is that, on Windows, many functions which edit
the file descriptor table rely on recursive mutexes, e.g. open(). If you
develop your app so it uses pread() and pwrite() then your app should go
very fast when performing a heavily multithreaded and contended workload
For example, when scaling to 40+ cores, *NSYNC mutexes can go as much as
1000x faster (in CPU time) than the naive recursive lock implementation.
Now recursive will use *NSYNC under the hood when it's possible to do so
The worst issue I had with consts.sh for clock_gettime is how it defined
too many clocks. So I looked into these clocks all day to figure out how
how they overlap in functionality. I discovered counter-intuitive things
such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and
that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10
also has some incredible new APIs, that let us simplify clock_gettime().
- Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime()
- Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime()
- Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
- Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise()
Documentation on the clock crew has been added to clock_gettime() in the
docstring and in redbean's documentation too. You can read that to learn
interesting facts about eight essential clocks that survived this purge.
This is original research you will not find on Google, OpenAI, or Claude
I've tested this change by porting *NSYNC to become fully clock agnostic
since it has extensive tests for spotting irregularities in time. I have
also included these tests in the default build so they no longer need to
be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across
the entire amd64 and arm64 test fleets.
This is one of the few POSIX APIs that was missing. It lets you choose a
monotonic clock for your condition variables. This might improve perf on
some platforms. It might also grant more flexibility with NTP configs. I
know Qt is one project that believes it needs this. To introduce this, I
needed to change some the *NSYNC APIs, to support passing a clock param.
There's also new benchmarks, demonstrating Cosmopolitan's supremacy over
many libc implementations when it comes to mutex performance. Cygwin has
an alarmingly bad pthread_mutex_t implementation. It is so bad that they
would have been significantly better off if they'd used naive spinlocks.
On Windows file system tools like `ls` would print errors when they find
things like WSL symlinks, which can't be read by WIN32. I don't know how
they got on my hard drive but this change ensures Cosmo will handle them
more gracefully. If a reparse point can't be followed, then fstatat will
return information about the link itself. If readlink encounters reparse
points that are WIN32 symlinks, then it'll log more helpful details when
using MODE=dbg (a.k.a. cosmocc -mdbg). Speaking of which, this change is
also going to help you troubleshoot locks; when you build your app using
the cosmocc -mdbg flag your --strace logs will now show lock acquisition
poll() and select() now delegate to ppoll() and pselect() for assurances
that both polyfill implementations are correct and well-tested. Poll now
polyfills XNU and BSD quirks re: the hanndling of POLLNVAL and the other
similar status flags. This change resolves a misunderstanding concerning
how select(exceptfds) is intended to map to POLPRI. We now use E2BIG for
bouncing requests that exceed the 64 handle limit on Windows. With pipes
and consoles on Windows our poll impl will now report POLLHUP correctly.
Issues with Windows path generation have been fixed. For example, it was
problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/")
due to the need to un-UNC paths in some additional places. Calling fstat
on UNC style volume path handles will now work. posix_spawn now supports
simulating the opening of /dev/null and other special paths on Windows.
Cosmopolitan no longer defines epoll(). I think wepoll is a nice project
for using epoll() on Windows socket handles. However we need generalized
file descriptor support to make epoll() for Windows work well enough for
inclusion in a C library. It's also not worth having epoll() if we can't
get it to work on XNU and BSD OSes which provide different abstractions.
Even epoll() on Linux isn't that great of an abstraction since it's full
of footguns. Last time I tried to get it to be useful I had little luck.
Considering how long it took to get poll() and select() to be consistent
across platforms, we really have no business claiming to have epoll too.
While it'd be nice to have fully implemented, the only software that use
epoll() are event i/o libraries used by things like nodejs. Event i/o is
not the best paradigm for handling i/o; threads make so much more sense.
This change fixes an issue with all system calls ending with *at(), when
the caller passes `dirfd != AT_FDCWD` and an absolute path. It's because
the old code was turning paths like C:\bin\ls into \\C:\bin\ls\C:\bin\ls
after being converted from paths like /C/bin/ls. I noticed this when the
Emacs dired mode stopped working. It's unclear if it's a regression with
Cosmopolitan Libc or if this was introduced by the Emacs v29 upgrade. It
also impacted posix_spawn() for which a newly minted example now exists.
It turns out sched_getcpu() didn't work on many platforms. So the system
call now has tests and is well documented. We now employ new workarounds
on platforms where it isn't supported in our malloc() implementation. It
was previously the case that malloc() was only scalable on Linux/Windows
for x86-64. Now the other platforms are scalable too.
This is a breaking change. It defines the new environment variable named
_COSMO_FDS_V2 which is used for inheriting non-stdio file descriptors on
execve() or posix_spawn(). No effort has been spent thus far integrating
with the older variable. If a new binary launches the older ones or vice
versa they'll only be able to pass stdin / stdout / stderr to each other
therefore it's important that you upgrade all your cosmo binaries if you
depend on this functionality. You'll be glad you did because inheritance
of file descriptors is more aligned with the POSIX standard than before.
This change ensures that if a file descriptor for an open disk file gets
shared by multiple processes within a process tree, then lseek() changes
will be visible across processes, and read() / write() are synchronized.
Note this only applies to Windows, because UNIX kernels already do this.
We now have implement all of Musl's localization code, the same way that
Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"),
just in case anything stops working as expected.