- On Windows connect() can now be interrupted by a signal; connect() w/
O_NONBLOCK will now raise EINPROGRESS; and connect() with SO_SNDTIMEO
will raise ETIMEDOUT after the interval has elapsed.
- We now get the AcceptEx(), ConnectEx(), and TransmitFile() functions
from the WIN32 API the officially blessed way, using WSAIoctl().
- Do nothing on Windows when fsync() is called on a directory handle.
This was raising EACCES earlier becaues GENERIC_WRITE is required on
the handle. It's possible to FlushFileBuffers() a directory handle if
it's opened with write access but MSDN doesn't document what it does.
If you have any idea, please let us know!
- Prefer manual reset event objects for read() and write() on Windows.
- Do some code cleanup on our dlmalloc customizations.
- Fix errno type error in Windows blocking routines.
- Make the futex polyfill simpler and faster.
This change deletes mkfifo() so that GNU Make on Windows will work in
parallel mode using its pipe-based implementation. There's an example
called greenbean2 now, which shows how to build a scalable web server
for Windows with 10k+ threads. The accuracy of clock_nanosleep is now
significantly improved on Linux.
- We now serialize the file descriptor table when spawning / executing
processes on Windows. This means you can now inherit more stuff than
just standard i/o. It's needed by bash, which duplicates the console
to file descriptor #255. We also now do a better job serializing the
environment variables, so you're less likely to encounter E2BIG when
using your bash shell. We also no longer coerce environ to uppercase
- execve() on Windows now remotely controls its parent process to make
them spawn a replacement for itself. Then it'll be able to terminate
immediately once the spawn succeeds, without having to linger around
for the lifetime as a shell process for proxying the exit code. When
process worker thread running in the parent sees the child die, it's
given a handle to the new child, to replace it in the process table.
- execve() and posix_spawn() on Windows will now provide CreateProcess
an explicit handle list. This allows us to remove handle locks which
enables better fork/spawn concurrency, with seriously correct thread
safety. Other codebases like Go use the same technique. On the other
hand fork() still favors the conventional WIN32 inheritence approach
which can be a little bit messy, but is *controlled* by guaranteeing
perfectly clean slates at both the spawning and execution boundaries
- sigset_t is now 64 bits. Having it be 128 bits was a mistake because
there's no reason to use that and it's only supported by FreeBSD. By
using the system word size, signal mask manipulation on Windows goes
very fast. Furthermore @asyncsignalsafe funcs have been rewritten on
Windows to take advantage of signal masking, now that it's much more
pleasant to use.
- All the overlapped i/o code on Windows has been rewritten for pretty
good signal and cancelation safety. We're now able to ensure overlap
data structures are cleaned up so long as you don't longjmp() out of
out of a signal handler that interrupted an i/o operation. Latencies
are also improved thanks to the removal of lots of "busy wait" code.
Waits should be optimal for everything except poll(), which shall be
the last and final demon we slay in the win32 i/o horror show.
- getrusage() on Windows is now able to report RUSAGE_CHILDREN as well
as RUSAGE_SELF, thanks to aggregation in the process manager thread.
It's now possible to use sigaltstack() to recover from stack overflows
on Windows. Several bugs in sigaltstack() have been fixed, for all our
supported platforms. There's a newer better example showing how to use
this, along with three independent unit tests just to further showcase
the various techniques.
The `cat` command now works properly, when run by itself on the bash
command prompt. It's working beautifully so far, and is only missing
a few keystrokes for clearing words and lines. Definitely works more
well than the one that ships with WIN32 :-)
Thanks to @autumnjolitz (in #876) the Cosmopolitan codebase is now
acquainted with Apple's outstanding ulock system calls which offer
something much closer to futexes than Grand Central Dispatch which
wasn't quite as good, since its wait function can't be interrupted
by signals (therefore necessitating a busy loop) and it also needs
semaphore objects to be created and freed. Even though ulock is an
internal Apple API, strictly speaking, the benefits of futexes are
so great that it's worth the risk for now especially since we have
the GCD implementation still as a quick escape hatch if it changes
Here's why this change is important for x86 XNU users. Cosmo has a
suboptimal polyfill when the operating system doesn't offer an API
that let's us implement futexes properly. Sadly we had to use that
on X86 XNU until now. The polyfill works using clock_nanosleep, to
poll the futex in a busy loop with exponential backoff. On XNU x86
clock_nanosleep suffers from us not being able to use a fast clock
gettime implementation, which had a compounding effect that's made
the polyfill function even more poorly. On X86 XNU we also need to
polyfill sched_yield() using select(), which made things even more
troublesome. Now that we have futexes we don't have any busy loops
anymore for both condition variables and thread joining so optimal
performance is attained. To demonstrate, consider these benchmarks
Before:
$ ./lockscale_test.com -b
consumed 38.8377 seconds real time and
0.087131 seconds cpu time
After:
$ ./lockscale_test.com -b
consumed 0.007955 seconds real time and
0.011515 seconds cpu time
Fixes#876
This reverts commit b01282e23e. Some tests
are broken. It's not clear how it'll impact metal yet. Let's revisit the
memory optimization benefits of this change again sometime soon.
There was a glitch in the refactoring of __map_phdrs( ) in
commit ec480f5aa0, which caused it to try to map the PT_NOTE
program segment into virtual memory, which in turn caused the
memory page at BANE to be wrongly remapped to the start of
the program image (0x100000) rather than physical address 0.
This affected subsequent page allocation operations because
the `struct mman` was located at BANE + 0x0500.
Co-authored-by: tkchia <tkchia-cosmo@gmx.com>
This reduces the virtual memory usage of Emacs for me by 30%. We now
have a simpler implementation that uses read(), rather mmap()ing the
whole executable.
- This change fixes a bug that allowed unbuffered printf() output (to
streams like stderr) to be truncated. This regression was introduced
some time between now and the last release.
- POSIX specifies all functions as thread safe by default. This change
works towards cleaning up our use of the @threadsafe / @threadunsafe
documentation annotations to reflect that. The goal is (1) to use
@threadunsafe to document functions which POSIX say needn't be thread
safe, and (2) use @threadsafe to document functions that we chose to
implement as thread safe even though POSIX didn't mandate it.
- Tidy up the clock_gettime() implementation. We're now trying out a
cleaner approach to system call support that aims to maintain the
Linux errno convention as long as possible. This also fixes bugs that
existed previously, where the vDSO errno wasn't being translated
properly. The gettimeofday() system call is now a wrapper for
clock_gettime(), which reduces bloat in apps that use both.
- The recently-introduced improvements to the execute bit on Windows has
had bugs fixed. access(X_OK) on a directory on Windows now succeeds.
fstat() will now perform the MZ/#! ReadFile() operation correctly.
- Windows.h is no longer included in libc/isystem/, because it confused
PCRE's build system into thinking Cosmopolitan is a WIN32 platform.
Cosmo's Windows.h polyfill was never even really that good, since it
only defines a subset of the subset of WIN32 APIs that Cosmo defines.
- The setlongerjmp() / longerjmp() APIs are removed. While they're nice
APIs that are superior to the standardized setjmp / longjmp functions,
they weren't superior enough to not be dead code in the monorepo. If
you use these APIs, please file an issue and they'll be restored.
- The .com appending magic has now been removed from APE Loader.
This test broke itself due to relying on the current time. Mocking out
gettimeofday() confirms this. However, two ipv4 subject alt name tests
still surprisingly fail even with a fake current time. We'll want this
investigated further soon.
- ARM Neon headers are now exported in libc/isystem/
- stat() and access() now do a better job reporting which files are
executable which ones aren't. They do this by reading the first two
bytes in a file to see if it's `MZ` or `#!`.
Unlike CMD.EXE, CreateProcess() doesn't care if an executable name ends
with .COM or .EXE. We now have the unbourne shell and bash working well
on Windows, so we don't need DOS anymore. Making this change will grant
us better performance, particularly for builds, because commandv() will
need to make fewer system calls. Path mangling magic still happens with
WinMain() and ntspawn() in order to do things like turn \ into / so the
interop works well at the borders. But all the code in libraries, which
did that, has been removed. It's not possible for libraries to abstract
the differences between paths.