Uses GitHub’s actions/cache@v4 to store the cosmocc distribution and the
output directory between runs of the build workflow, with the version of
cosmocc as the cache key.
Upgrades to actions/checkout@v4.
`!(a < b)` is not the same as `b < a`.
I think I originally wrote it this way to avoid making weak_ptr a friend
of shared_ptr, but weak_ptr already is a friend.
This change fixes a segmentation fault when comparing loaders that don't
have a target kernel set. Additionally, adds evbarm, which is the output
of uname -m on NetBSD on aarch64.
Let's say you pass the `-M blink-mips.elf` flag to apelink, so that your
ape binary will bundle a compressed build of blink, and the shell script
will extract that binary and launch your program under it, if running on
a MIPS system. However, for any given microprocessor architecture, we'll
need a separate loader for each operating system. The issue is ELF OSABI
isn't very useful. As an example, SerenityOS and Linux both have SYSV in
the OSABI field. So to tell their binaries apart we'd have to delve into
various other conventions, like special sections and PT_NOTE structures.
To make things simple this change introduces the `-k OS` flag to apelink
which generate shell script content that ensures `OS` matches `uname -s`
before attempting to execute a loader. For example, you could say:
apelink -k Linux -M blink-linux-arm.elf -M blink-linux-mips.elf \
-k Darwin -M blink-darwin-ppc.elf \
...
To introduce support for old 32-bit architectures on multiple OSes, when
building your cosmo binary.
This updates apelink to support machine architectures not in the source
program input list by adding additional loaders, extracting the correct
one that matches the host uname machine. With this change, blink can be
supplied as the additional loader to run the program in x86_64 VMs. The
change has been verified against blink 1.0, powerpc64le and mips64el in
Docker using QEMU.
This updates the cosmocc toolchain packaging script to work on MacOS. It
has been tested on GitHub Actions macos-13 (x86_64) and macos-14 (arm64)
runners, and is verified to still work on Ubuntu (GitHub Actions runners
ubuntu-24.04 and ubuntu-24.04-arm). It'll help bring cosmocc to MacPorts
by running the packaging script. We favor `gmake` rather than the `make`
command because it distinguishes GNU Make from BSD Make, and Xcode Make.
Additionally, APE loader from the bootstrapper toolchain is used instead
of a system APE, which may not be available.
Modifies download-cosmocc.sh to maintain a .cosmocc/current symlink that
always points to the most recently downloaded version of cosmocc. We can
use this to point at a canonical make for a bootstrapped repository. For
first-time builds, we suggest: https://cosmo.zip/pub/cosmos/bin/make and
have updated the docs in a few places to mention this.
Fixes the other part of #1346.
Recent optimizations to fork() introduced a regression, that could cause
the subprocess to fail unexpectedly, when TlsAlloc() returns a different
index. This is because we were burning the indexes into the displacement
of x86 opcodes. So when fork() happened and the executable memory copied
it would use the old index. Right now the way this is being solved is to
not copy the executable on fork() and then re-apply code changes. If you
need to be able to preserve self-modified code on fork, reach out and we
can implement a better solution for you. This gets us unblocked quickly.
It's now possible to use execve() when the parent process isn't built by
cosmo. In such cases, the current process will kill all threads and then
linger around, waiting for the newly created process to die, and then we
propagate its exit code to the parent. This should help bazel and others
Allocating private anonymous memory is now 5x faster on Windows. This is
thanks to VirtualAlloc() which is faster than the file mapping APIs. The
fork() function also now goes 30% faster, since we are able to avoid the
VirtualProtect() calls on mappings in most cases now.
Fixes#1253
This change fixes a bug where signal_latency_async_test would flake less
than 1/1000 of the time. What was happening was pthread_kill(sender_thr)
would return EFAULT. This was because pthread_create() was not returning
the thread object pointer until after clone() had been called. So it was
actually possible for the main thread to stall after calling clone() and
during that time the receiver would launch and receive a signal from the
sender thread, and then fail when it tried to send a pong. I thought I'd
use a barrier at first, in the test, to synchronize thread creation, but
I firmly believe that pthread_create() was to blame and now that's fixed
Function trace logs will report stack usage accurately. It won't include
the argv/environ block. Our clone() polyfill is now simpler and does not
use as much stack memory. Function call tracing on x86 is now faster too
This change fixes a bug where gcc assumed thread synchronization such as
pthread_cond_wait() wouldn't alter static variables, because the headers
were using __attribute__((__leaf__)) inappropriately.
This change adds tests for the new memory manager code particularly with
its windows support. Function call tracing now works reliably on Silicon
since our function hooker was missing new Apple self-modifying code APIs
Many tests that were disabled a long time ago on aarch64 are reactivated
by this change, now that arm support is on equal terms with x86. There's
been a lot of places where ftrace could cause deadlocks, which have been
hunted down across all platforms thanks to new tests. A bug in Windows's
kill() function has been identified.
This change makes fork() go nearly as fast as sys_fork() on UNIX. As for
Windows this change shaves about 4-5ms off fork() + wait() latency. This
is accomplished by using WriteProcessMemory() from the parent process to
setup the address space of a suspended process; it is better than a pipe
This change fixes a bug where nsync waiter objects would leak. It'd mean
that long-running programs like runitd would run out of file descriptors
on NetBSD where waiter objects have ksem file descriptors. On other OSes
this bug is mostly harmless since the worst that can happen with a futex
is to leak a little bit of ram. The bug was caused because tib_nsync was
sneaking back in after the finalization code had cleared it. This change
refactors the thread exiting code to handle nsync teardown appropriately
and in making this change I found another issue, which is that user code
which is buggy, and tries to exit without joining joinable threads which
haven't been detached, would result in a deadlock. That doesn't sound so
bad, except the main thread is a joinable thread. So this deadlock would
be triggered in ways that put libc at fault. So we now auto-join threads
and libc will log a warning to --strace when that happens for any thread
On Windows, mmap() now chooses addresses transactionally. It reduces the
risk of badness when interacting with the WIN32 memory manager. We don't
throw darts anymore. There is also no more retry limit, since we recover
from mystery maps more gracefully. The subroutine for combining adjacent
maps has been rewritten for clarity. The print maps subroutine is better
This change goes to great lengths to perfect the stack overflow code. On
Windows you can now longjmp() out of a crash signal handler. Guard pages
previously weren't being restored properly by the signal handler. That's
fixed, so on Windows you can now handle a stack overflow multiple times.
Great thought has been put into selecting the perfect SIGSTKSZ constants
so you can save sigaltstack() memory. You can now use kprintf() with 512
bytes of stack available. The guard pages beneath the main stack are now
recorded in the memory manager.
This change fixes getcontext() so it works right with the %rax register.
This change doubles the performance of thread spawning. That's thanks to
our new stack manager, which allows us to avoid zeroing stacks. It gives
us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is
faster now, since it doesn't need to acquire the pthread GIL. On NetBSD,
that helps us avoid allocating too many semaphores. Even if that happens
we're now able to survive semaphores running out and even memory running
out, when allocating *NSYNC waiter objects. I found a lot more rare bugs
in the POSIX threads runtime that could cause things to crash, if you've
got dozens of threads all spawning and joining dozens of threads. I want
cosmo to be world class production worthy for 2025 so happy holidays all
This change introduces a new deadlock detector for Cosmo's POSIX threads
implementation. Error check mutexes will now track a DAG of nested locks
and report EDEADLK when a deadlock is theoretically possible. These will
occur rarely, but it's important for production hardening your code. You
don't even need to change your mutexes to use the POSIX error check mode
because `cosmocc -mdbg` will enable error checking on mutexes by default
globally. When cycles are found, an error message showing your demangled
symbols describing the strongly connected component are printed and then
the SIGTRAP is raised, which means you'll also get a backtrace if you're
using ShowCrashReports() too. This new error checker is so low-level and
so pure that it's able to verify the relationships of every libc runtime
lock, including those locks upon which the mutex implementation depends.
It's now possible with cosmo and redbean, to deliver a signal to a child
process after it has called execve(). However the executed program needs
to be compiled using cosmocc. The cosmo runtime WinMain() implementation
now intercepts a _COSMO_PID environment variable that's set by execve().
It ensures the child process will use the same C:\ProgramData\cosmo\sigs
file, which is where kill() will place the delivered signal. We are able
to do this on Windows even better than NetBSD, which has a bug with this
Fixes#1334
Cosmo now has a non-nsync implementation of POSIX read-write locks. It's
possible to call pthread_rwlockattr_setpshared in PTHREAD_PROCESS_SHARED
mode. Furthermore, if cosmo is built with PTHREAD_USE_NSYNC set to zero,
then Cosmo shouldn't use nsync at all. That's helpful if you want to not
link any Apache 2.0 licensed code.
In the course of playing with redbean I was confused about how the state
was behaving and then noticed that some stuff is maybe getting edited by
multiple processes. I tried to improve things by changing the definition
of the counter variables to be explicitly atomic. Claude assures me that
most modern Unixes support cross-process atomics, so I just went with it
on that front.
I also added some mutexes to the shared state to try to synchronize some
other things that might get written or read from workers but couldn't be
made atomic, mainly the rusage and time values. I could've probably been
less granular and just had a global shared-state lock, but I opted to be
fairly granular as a starting point.
This also reorders the resetting of the lastmeltdown timespec before the
SIGUSR2 signal is sent; hopefully this is okay.
American Fuzzy Lop didn't need to try very hard, to crash our privileged
__demangle() implementation. This change helps ensure our barebones impl
will fail rather than crash when given adversarial input data.
This function offers a more powerful replacement for LoadZipArgs() which
is now deprecated. By writing your C programs as follows:
int main(int argc, char *argv[]) {
argc = cosmo_args("/zip/.args", &argv);
// ...
}
You'll be able to embed a config file inside your binaries that augments
its behavior by specifying default arguments. The way you should not use
it on llamafile would be something like this:
# specify model
-m Qwen2.5-Coder-34B-Instruct.Q6_K.gguf
# prevent settings below from being changed
...
# specify system prompt
--system-prompt "\
you are a woke ai assistant\n
you can use the following tools:\n
- shell: run bash code
- search: ask google for help
- report: you see something say something"
# hide system prompt in user interface
--no-display-prompt
You can now easily compile this program with non-cosmocc toolchains. The
glaring iconv() api usage mistake is now fixed. Restoring the terminal's
state on exit now works better. We try our best to limit the terminal to
80x24 cells.
On Windows, sometimes fork() could crash with message likes:
fork() ViewOrDie(170000) failed with win32 error 487
This is due to a bug in our file descriptor inheritance. We have cursors
which are shared between processes. They let us track the file positions
of read() and write() operations. At startup they were being mmap()ed to
memory addresses that were assigned by WIN32. That's bad because Windows
likes to give us memory addresses beneath the program image in the first
4mb range that are likely to conflict with other assignments. That ended
up causing problems because fork() needs to be able to assume that a map
will be possible to resurrect at the same address. But for one reason or
another, Windows libraries we don't control could sneak allocations into
the memory space that overlap with these mappings. This change solves it
by choosing a random memory address instead when mapping cursor objects.
When programs like ar.ape and compile.ape are run on eCryptFs partitions
on Linux, copy_file_range() will fail with EINVAL which is wrong because
eCryptFs which doesn't support this system call, should raise EOPNOTSUPP
See https://github.com/jart/cosmopolitan/discussions/1305
The (uppercase) B conversion specifier is specified by the C standard to
have the same behavior as the (lowercase) b conversion specifier, except
that whenever the # flag is used, the (uppercase) B conversion specifier
alters a nonzero result by prefixing it with "0B", instead of with "0b".
This commit adds this conversion specifier alongside a few tests for it.
We were too zealous about security before by only setting the owner bits
and that would cause issues for projects like redbean that check "other"
bits to determine if it's safe to serve a file. Since that doesn't exist
on Windows, it's better to have things work than not work. So what we'll
do instead is return modes like 0664 for files and 0775 for directories.
This program was originally ported to Cosmopolitan before we had threads
so it was designed to use a single thread. That caused issues for people
with slower computers, like an Intel Core i5, where Gyarados would go so
slow that the audio would skip. I would also get audio skipping when the
terminal was put in full screen mode. Now we use two threads and smarter
timing, so NESEMU1 should go reliably fast on everyone's computer today.
As the size of long double changes between x86 and AArch64, this results
in one of the printf a conversion specifier test-cases getting different
output between the two. Note that both outputs (and a few more) are 100%
standards-conforming, but the testcase currently only expects a specific
one - the one that had been seen on x86 when initially writing the test.
This patch fixes the testcase so it accepts either of those two outputs.
This change gets rsync working without any warning or errors. On Windows
we now create a bunch of C:\var\sig\x\y.pid shared memory files, so sigs
can be delivered between processes. WinMain() creates this file when the
process starts. If the program links signaling system calls then we make
a thread at startup too, which allows asynchronous delivery each quantum
and cancelation points can spot these signals potentially faster on wait
See #1240
This change along with a patch for rsync's safe_write() function that'll
that'll soon be added to superconfigure, gets rsync working. There's one
remaining issue (which isn't a blocker) which is how rsync logs an error
about abnormal process termination since there's currently no way for us
to send non-fatal signals between processes. rsync in cosmos is restored
Fixes#1240
This change makes send() / sendto() always block on Windows. It's needed
because poll(POLLOUT) doesn't guarantee a socket is immediately writable
on Windows, and it caused rsync to fail because it made that assumption.
The only exception is when a SO_SNDTIMEO is specified which will EAGAIN.
Tests are added confirming MSG_WAITALL and MSG_NOSIGNAL work as expected
on all our supported OSes. Most of the platform-specific MSG_FOO magnums
have been deleted, with the exception of MSG_FASTOPEN. Your --strace log
will now show MSG_FOO flags as symbols rather than numbers.
I've also removed cv_wait_example_test because it's 0.3% flaky with Qemu
under system load since it depends on a process being readily scheduled.
There is a bug in WIN32 where using CancelIoEx() on an overlapped i/o op
initiated by WSASend() will cause WSAGetOverlappedResult() to report the
operation failed when it actually succeeded. We now work around that, by
having send and sendto initially consult WSAPoll() on O_NONBLOCK sockets
The C standard specifies that, upon handling the a conversion specifier,
the argument is converted to a string in which "there is one hexadecimal
digit (which is nonzero [...]) before the decimal-point character", this
being a requirement which cosmopolitan does not currently always handle,
sometimes printing numbers like "0x0.1p+5", where a correct output would
have been e.g. "0x1.0p+1" (despite both representing the same value, the
first one illegally has a '0' digit before the decimal-point character).
The C standard indicates that when processing the a conversion specifier
"if the precision is zero *and* the # flag is not specified, no decimal-
point character appears.". This means that __fmt needs to ensure that it
prints the decimal-point character not only when the precision is non-0,
but also when the # flag is specified - cosmopolitan currently does not.
This patch fixes this, along with adding a few tests for this behaviour.
The a conversion specifier to printf had some issues w.r.t. rounding, in
particular in edge cases w.r.t. "to nearest, ties to even" rounding (for
instance, "%.1a" with 0x1.78p+4 outputted 0x1.7p+4 instead of 0x1.8p+4).
This patch fixes this and adds several tests w.r.t ties to even rounding
Spawning processes would leak lots of memory, due to a missing free call
in ntspawn(). Our tooling never caught this since ntspawn() must use the
WIN32 memory allocator. It means every time posix_spawn, fork, or execve
got called, we would leak 162kb of memory. I'm proud to say that's fixed
You would think this is an important bug fix, but unfortunately all UNIX
implementations I've evaluated have a bug in read that causes signals to
not be handled atomically. The only exception is the latest iteration of
Cosmopolitan's read/write polyfill on Windows, which is somewhat ironic.
POSIX specifies the <apostrophe> flag character for printf as formatting
decimal conversions with the thousands' grouping characters specified by
the current locale. Given that cosmopolitan currently has no support for
obtaining the locale's grouping character, all that is required (when in
the C/POSIX locale) for supporting this flag is ignoring it, and as it's
already used to indicate quoting (for non-decimal conversions), all that
has to be done is to avoid having it be an alias for the <space> flag so
that decimal conversions don't accidentally behave as though the <space>
flag has also been specified whenever the <apostrophe> flag is utilized.
This patch adds this flag, as described above, along with a test for it.
When reading hexadecimal floats, cosmopolitan would previously sometimes
print a number of warnings relating to undefined behavior on left shift:
third_party/gdtoa/gethex.c:172: ubsan warning: signed left shift changed
sign bit or overflowed 12 'int' 28 'int' is undefined behavior
This is because gdtoa assumes left shifts are safe when overflow happens
even on signed integers - this is false: the C standard considers it UB.
This is easy to fix, by simply casting the shifted value to unsigned, as
doing so does not change the value or the semantics of the left shifting
(except for avoiding the undefined behavior, as the C standard specifies
that unsigned overflow yields wraparound, avoiding undefined behaviour).
This commit does this, and adds a testcase that previously triggered UB.
(this also adds test macros to test for exact float equality, instead of
the existing {EXPECT,ASSERT}_FLOAT_EQ macros which only tests inputs for
being "almost equal" (with a significant epsilon) whereas exact equality
makes more sense for certain things such as reading floats from strings,
and modifies other testcases for sscanf/fscanf of floats to utilize it).
This test would sometimes crash due to the EZBENCH2() macro occasionally
running the first benchmark (BenchMmapPrivate()) less times than it does
the second benchmark (BenchUnmap()) - this would then lead to a crash in
BenchUnmap() because BenchUnmap() expects that BenchMmapPrivate() has to
previously have been called at least as many times as it has itself such
that a region of memory has been mapped, for BenchUnmap() to then unmap.
This commit fixes this by utilizing the newer BENCHMARK() macro (instead
of the EZBENCH2() macro) which runs the benchmark using an count of runs
specified directly by the benchmark itself, which allows us to make sure
that the two benchmark functions get ran the exact same amount of times.
Currently, cosmopolitan's pledge sandbox .so shared object wrongly tries
to use a bunch of UBSAN symbols, which are not defined when outside of a
cosmopolitan-based context (save if the sandboxed binary also happens to
be itself using UBSAN, but that's obviously very commonly not the case).
Fix this by making it such that the sandbox .so shared object traps when
UBSAN is triggered, avoiding any attempt to call into the UBSAN runtime.
While always blocking statx did not lead to particularly bad results for
most cases (most code that uses statx appears to utilize a fallback when
statx is unavailable), it does lead to using usually far less used (thus
far less well tested) code: for example, musl's current fstatat fallback
for statx fails to set any values for stx_rdev_major and stx_rdev_minor,
which the raw syscall wouldn't (I've have sent a patch to musl for this,
but this won't fix older versions of musl and binaries/OSes using them).
Along with the fact that statx extends stat in several useful ways, this
seems to indicate it is far better to simply allow statx whenever pledge
also allows stat-family syscalls, i.e. for both rpath and wpath pledges.
When sem_wait() used its futexes it would always use process shared mode
which can be problematic on platforms like Windows, where that causes it
to use the slow futex polyfill. Now when sem_init() is called in private
mode that'll be passed along so we can use a faster WaitOnAddress() call
Our old code wasn't working with projects like Qt that call connect() in
O_NONBLOCK mode multiple times. This change overhauls connect() to use a
simpler WSAConnect() API and follows the same pattern as cosmo accept().
This change also reduces the binary footprint of read(), which no longer
needs to depend on our enormous clock_gettime() function.
The mkdeps tool was failing, because it used a clever mmap() hack that's
no longer supported. I've also removed tinymalloc.inc from build tooling
because Windows doesn't like the way it uses overcommit memory. Sadly it
means our build tool binaries will be larger. It's less of an issue, now
that we are no longer putting build tool binaries in the git repository.
This change should fix the Windows issues Qt Creator has been having, by
ensuring accept() and accept4() work in O_NONBLOCK mode. I switched away
from AcceptEx() which is buggy, back to using WSAAccept(). This requires
making a tradeoff where we have to accept a busy loop. However it is low
latency in nature, just like our new and improved Windows poll() code. I
was furthermore able to eliminate a bunch of Windows-related test todos.
It appears that GetFileAttributes(u"\\etc\\passwd") can take two seconds
on Windows 10 at unpredictable times for reasons which are mysterious to
me. Let's try avoiding that path entirely and pray to Microsoft it works
This issue probably only impacted the earliest releases of Windows 7 and
we only support Windows 10+ these days, so it's not worth adding 2000 ms
of startup latency to vim when ~/.vim doesn't exist.
Hexadecimal printing of floating-point numbers in cosmopolitan (that is,
using the the conversion specifier) is improved to have correct rounding
of results in rounding modes other than the default one (ie. FE_NEAREST)
This commit fixes that, and adds tests for the change (note that there's
still some rounding issues with the a conversion specifier in general in
relatively rare cases (that is without non-default rounding modes) where
I've left commented-out tests for anyone interested in improving it more
This change fixes a CreateProcess failure when a process is spawned with
no handles inherited. This is due to violating a common design rule in C
that works as follows: when you have (ptr, size) the ptr must be ignored
when size is zero. That's because cosmo's malloc(0) always returns a non
null pointer, which was happening in __describe_fds(), but ntspawn() was
basing its decision off the nullness of the pointer rather than its size
When polling sockets poll() can now let you know about an event in about
10µs rather than 10ms. If you're not polling sockets then poll() reports
console events now in microseconds instead of milliseconds.
Recursive mutexes now go as fast as normal mutexes. The tradeoff is they
are no longer safe to use in signal handlers. However you can still have
signal safe mutexes if you set your mutex to both recursive and pshared.
You can also make functions that use recursive mutexes signal safe using
sigprocmask to ensure recursion doesn't happen due to any signal handler
The impact of this change is that, on Windows, many functions which edit
the file descriptor table rely on recursive mutexes, e.g. open(). If you
develop your app so it uses pread() and pwrite() then your app should go
very fast when performing a heavily multithreaded and contended workload
For example, when scaling to 40+ cores, *NSYNC mutexes can go as much as
1000x faster (in CPU time) than the naive recursive lock implementation.
Now recursive will use *NSYNC under the hood when it's possible to do so
Using callbacks is still problematic with cosmo_dlopen() due to the need
to restore the TLS register. So using callbacks is even more strict than
using signal handlers. We are better off introducing a cosmoaudio_poll()
function. It makes the API more UNIX-like. How bad could the latency be?
Currently, in cosmopolitan, there is no handling of the current rounding
mode for long double conversions, such that round-to-nearest gets always
used, regardless of the current rounding mode. %Le also improperly calls
gdtoa with a too small precision (which led to relatively similar bugs).
This patch fixes these issues, in particular by modifying the FPI object
passed to gdtoa such that it is modifiable (so that __fmt can adjust its
rounding field to correspond to FLT_ROUNDS (note that this is not needed
for dtoa, which checks FLT_ROUNDS directly)) and ors STRTOG_Neg into the
kind field in both of the __fmt_dfpbits and __fmt_ldfpbits functions, as
the gdtoa function also depends on it to be able to accurately round any
negative arguments. The change to kind also requires a few other changes
to make sure kind's upper bits (which include STRTOG_Neg) are masked off
when attempting to only examine the lower bits' value. Furthermore, this
patch also makes exactly one change in gdtoa, which appears to be needed
to fix rounding issues with FE_TOWARDZERO (this seems like a gdtoa bug).
The patch also adds a few tests for these issues, along with also taking
the opportunity to clean up some of the previous tests to do the asserts
in the right order (i.e. with the first argument as the expected result,
and the second one being used as the value that it is compared against).
Before this commit, cosmopolitan had some issues with handling arguments
of 0 and signs, such as returning an incorrect sign when the input value
== -0.0, and incorrectly handling ndigit == 0 on fcvt (ndigit determines
the amount of digits *after* the radix character on fcvt, thus the parts
before it still must be outputted before fcvt's job is completely done).
This patch fixes these issues, and adds tests with corresponding inputs.
This change introduces comsoaudio v1. We're using a new strategy when it
comes to dynamic linking of dso files and building miniaudio device code
which I think will be fast and stable in the long run. You now have your
choice of reading/writing to the internal ring buffer abstraction or you
can specify a device-driven callback function instead. It's now possible
to not open the microphone when you don't need it since touching the mic
causes security popups to happen. The DLL is now built statically, so it
only needs to depend on kernel32. Our NES terminal emulator now uses the
cosmoaudio library and is confirmed to be working on Windows, Mac, Linux
The C Standard specifies that, when a conversion specification specifies
a conversion specifier of n, the type of the passed pointer is specified
by the length modifier (if any), i.e. that e.g. the argument for %hhn is
of type signed char *, but Cosmopolitan currently does not handle this -
instead always simply assuming that the pointer always points to an int.
This patch implements, and tests, length modifiers with the n conversion
specifier, with the tests testing all of the available length modifiers.
This change gets printvideo working on aarch64. Performance improvements
have been introduced for magikarp decimation on aarch64. The last of the
old portable x86 intrinsics library is gone, but it still lives in Blink
The worst issue I had with consts.sh for clock_gettime is how it defined
too many clocks. So I looked into these clocks all day to figure out how
how they overlap in functionality. I discovered counter-intuitive things
such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and
that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10
also has some incredible new APIs, that let us simplify clock_gettime().
- Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime()
- Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime()
- Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
- Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise()
Documentation on the clock crew has been added to clock_gettime() in the
docstring and in redbean's documentation too. You can read that to learn
interesting facts about eight essential clocks that survived this purge.
This is original research you will not find on Google, OpenAI, or Claude
I've tested this change by porting *NSYNC to become fully clock agnostic
since it has extensive tests for spotting irregularities in time. I have
also included these tests in the default build so they no longer need to
be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across
the entire amd64 and arm64 test fleets.
POSIX specifies the C conversion specifier as being "equivalent to %lc",
i.e. printf("%C", arg) is equivalent in behaviour to printf("%lc", arg).
This patch implements this conversion specifier, and adds a test for it,
alongside another test, which ensures that va_arg uses the correct size,
even though we set signbit to 63 in the code (which one might think will
result in the wrong size of argument being va_arg-ed, but having signbit
set to 63 is in fact what __fmt_stoa expects and is a requirement for it
properly formatting the wchar_t argument - this does not result in wrong
usage of va_arg because the implementation of the c conversion specifier
(which the implementation of the C conversion specifier fallsthrough to)
always calls va_arg with an argument type of int, to avoid the very same
bug occuring with %lc, as the l length modifier also sets signbit to 63)
This is one of the few POSIX APIs that was missing. It lets you choose a
monotonic clock for your condition variables. This might improve perf on
some platforms. It might also grant more flexibility with NTP configs. I
know Qt is one project that believes it needs this. To introduce this, I
needed to change some the *NSYNC APIs, to support passing a clock param.
There's also new benchmarks, demonstrating Cosmopolitan's supremacy over
many libc implementations when it comes to mutex performance. Cygwin has
an alarmingly bad pthread_mutex_t implementation. It is so bad that they
would have been significantly better off if they'd used naive spinlocks.
On Windows file system tools like `ls` would print errors when they find
things like WSL symlinks, which can't be read by WIN32. I don't know how
they got on my hard drive but this change ensures Cosmo will handle them
more gracefully. If a reparse point can't be followed, then fstatat will
return information about the link itself. If readlink encounters reparse
points that are WIN32 symlinks, then it'll log more helpful details when
using MODE=dbg (a.k.a. cosmocc -mdbg). Speaking of which, this change is
also going to help you troubleshoot locks; when you build your app using
the cosmocc -mdbg flag your --strace logs will now show lock acquisition
While we have always licked glibc and musl libc on gnu/systemd sadly the
Apple Libc implementation of pthread_mutex_t is better than ours. It may
be due to how the XNU kernel and M2 microprocessor are in league when it
comes to scheduling processes and the NSYNC behavior is being penalized.
We can solve this by leaning more heavily on ulock using Drepper's algo.
It's kind of ironic that Linux's official mutexes work terribly on Linux
but almost as good as Apple Libc if used on MacOS.
poll() and select() now delegate to ppoll() and pselect() for assurances
that both polyfill implementations are correct and well-tested. Poll now
polyfills XNU and BSD quirks re: the hanndling of POLLNVAL and the other
similar status flags. This change resolves a misunderstanding concerning
how select(exceptfds) is intended to map to POLPRI. We now use E2BIG for
bouncing requests that exceed the 64 handle limit on Windows. With pipes
and consoles on Windows our poll impl will now report POLLHUP correctly.
Issues with Windows path generation have been fixed. For example, it was
problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/")
due to the need to un-UNC paths in some additional places. Calling fstat
on UNC style volume path handles will now work. posix_spawn now supports
simulating the opening of /dev/null and other special paths on Windows.
Cosmopolitan no longer defines epoll(). I think wepoll is a nice project
for using epoll() on Windows socket handles. However we need generalized
file descriptor support to make epoll() for Windows work well enough for
inclusion in a C library. It's also not worth having epoll() if we can't
get it to work on XNU and BSD OSes which provide different abstractions.
Even epoll() on Linux isn't that great of an abstraction since it's full
of footguns. Last time I tried to get it to be useful I had little luck.
Considering how long it took to get poll() and select() to be consistent
across platforms, we really have no business claiming to have epoll too.
While it'd be nice to have fully implemented, the only software that use
epoll() are event i/o libraries used by things like nodejs. Event i/o is
not the best paradigm for handling i/o; threads make so much more sense.
This change fixes an issue with all system calls ending with *at(), when
the caller passes `dirfd != AT_FDCWD` and an absolute path. It's because
the old code was turning paths like C:\bin\ls into \\C:\bin\ls\C:\bin\ls
after being converted from paths like /C/bin/ls. I noticed this when the
Emacs dired mode stopped working. It's unclear if it's a regression with
Cosmopolitan Libc or if this was introduced by the Emacs v29 upgrade. It
also impacted posix_spawn() for which a newly minted example now exists.
Cosmopolitan's printf-family functions will currently crash if one tries
formatting a floating point number with a larger precision (large enough
that gdtoa attempts to allocate memory to format the number) while under
memory pressure (i.e. when malloc fails) because gdtoa fails to check if
malloc fails.
The added tests (which would previously crash under cosmopolitan without
this patch) show how to reproduce the issue.
This patch fixes this, and adds the aforementioned tests.
This method is supposed to give equivalence iff two shared pointers both
own the same object, even if they point to different addresses. We can't
control the exact order of the control blocks in memory, so the test can
only check that this equivalence/non-equivalence relationship holds, and
this is in fact all that it should check.
I'd previously introduced a bunch of small wrappers around the class and
functions under test to avoid excessive cpp use, but we can achieve this
more expediently with simple using-declarations. This also cuts out some
over-specified tests (e.g. there's no reason a stateful deleter wouldn't
compile.)
Cosmopolitan's printf-family functions currently very poorly handle
being passed a long double infinity.
For instance, a program such as:
```cpp
#include <stdio.h>
int main()
{
printf("%f\n", 1.0 / 0.0);
printf("%Lf\n", 1.0L / 0.0L);
printf("%e\n", 1.0 / 0.0);
printf("%Le\n", 1.0L / 0.0L);
printf("%g\n", 1.0 / 0.0);
printf("%Lg\n", 1.0L / 0.0L);
}
```
will currently output the following:
```
inf
0.000000[followed by 32763 more zeros]
inf
N.aN0000e-32769
inf
N.aNe-32769
```
when the correct expected output would be:
```
inf
inf
inf
inf
inf
inf
```
This patch fixes this, and adds tests for the behavior.
- wcsstr() is now linearly complex
- strstr16() is now linearly complex
- strstr() is now vectorized on aarch64 (10x)
- strstr() now uses KMP on pathological cases
- memmem() is now vectorized on aarch64 (10x)
- memmem() now uses KMP on pathological cases
- Disable shared_ptr::owner_before until fixed
- Make iswlower(), iswupper() consistent with glibc
- Remove figure space from iswspace() implementation
- Include line and paragraph separator in iswcntrl()
- Use Musl wcwidth(), iswalpha(), iswpunct(), towlower(), towupper()
This change switches c++ exception handling from sjlj to standard dwarf.
It's needed because clang for aarch64 doesn't support sjlj. It turns out
that libunwind had a bare-metal configuration that made this easy to do.
This change gets the new experimental cosmocc -mclang flag in a state of
working so well that it can now be used to build all of llamafile and it
goes 3x faster in terms of build latency, without trading away any perf.
The int_fast16_t and int_fast32_t types are now always defined as 32-bit
in the interest of having more abi consistency between cosmocc -mgcc and
-mclang mode.
The server was originally written before I implemented support for POSIX
thread cancelation. We now use standard pthreads APIs instead of talking
directly to *NSYNC. This means we're no longer using *NSYNC notes, which
aren't as good as the POSIX thread cancelation support I added to *NSYNC
which was only made possible by making *NSYNC part of libc. I believe it
will solve a crash we observed recently with ipv4.games, courtesy of the
individual who goes by the hacker alias Lambro.
This was suppressed recently and it's the worst possible idea when doing
greenfield software development with C. I'm so sorry it slipped through.
If the C standards committee was smart they would change the standard so
that implicit int becomes implicit long. Then problems such as this will
never occur and we could even use traditional C safely if we wanted too.
C++ code compiles very slowly with cosmocc, possibly because we're using
LLVM LIBCXX with GCC, and LLVM doesn't work as hard to make GCC go fast.
Therefore, it should be possible, to ask cosmocc to favor Clang over GCC
under the hood. On llamafile, my intention's to use this to make certain
files, e.g. llama.cpp/common.cpp, go from taking 17 seconds to 5 seconds
This new -mclang flag isn't ready for production yet since there's still
the question of how to get Clang to generate SJLJ exception code. If you
use this, then it's recommended you also pass -fno-exceptions.
The tradeoff is we're adding a 121mb binary to the cosmocc distribution.
There are no plans as of yet to fully migrate to Clang since GCC is very
good and has always treated us well.
This is believed to fix a crash, that's possible in nsync_waiter_free_()
when you call pthread_cond_timedwait(), or nsync_cv_wait_with_deadline()
where an assertion can fail. Thanks ipv4.games for helping me find this!
This change implements the compiler runtime for ARM v8.1 ISE atomics and
gets rid of the mandatory -mno-outline-atomics flag. It can dramatically
speed things up, on newer ARM CPUs, as indicated by the changed lines in
test/libc/thread/footek_test.c. In llamafile dispatching on hwcap atomic
also shaved microseconds off synchronization barriers.
So far I haven't found any way to run native Arm64 code on Windows Arm64
without using MSVC. When I build a PE binary from scratch that should be
a valid Windows Arm64 program, the OS refuses to run it. Possibly due to
requiring additional content like XML manifests or relocation or control
flow integrity data that isn't normally required on x64. I've also tried
using VirtualAlloc2() to JIT an Arm64 native function, but VirtualAlloc2
always fails with invalid parameter. I tried using MSVC to create an ARM
DLL that my x64 emulated program can link at runtime, to pass a function
pointer with ARM code, but LoadLibrary() rejects ARM DLLs as invalid exe
The only option left, is likely to write a new program like ape/ape-m1.c
which can be compiled by MSVC to load and run an AARCH64 ELF executable.
The emulated x64 binary would detect emulation using IsWow64Process2 and
then drop the loader executable in a temporary folder, and re-launch the
original executable, using the Arm64 segments of the cosmocc fat binary.
It turns out sched_getcpu() didn't work on many platforms. So the system
call now has tests and is well documented. We now employ new workarounds
on platforms where it isn't supported in our malloc() implementation. It
was previously the case that malloc() was only scalable on Linux/Windows
for x86-64. Now the other platforms are scalable too.
Using https://nightly.link/ with GitHub actions artifacts you can have a
nightly build (but not a _release_ -- there's no releasing or
pre-releasing happening) of cosmocc.
Example URL if this PR were merged:
https://nightly.link/jart/cosmopolitan/workflows/nightly-cosmocc/master/cosmocc.zip
Or you can just download it directly from the GitHub "Actions"
https://github.com/jart/cosmopolitan/actions workflow summary page of a
particular run
example from my own fork:

could download by clicking on the artifact
or by using third-party service to provide a link for unauthenticated
requests (like wget or curl)
https://nightly.link/jcbhmr/cosmopolitan/workflows/tool-cosmocc-package-sh/master/cosmocc.zip
this would be useful for users who don't want to or can't figure out how
to build cosmocc themselves (like Windows) but still want to use a
nightly build since a fix hasn't been released as a release version yet.
this would also be a good way to test the release process but instead of
pushing the `cosmocc.zip` to _wherever it goes now_ you publish it as a
github actions artifact for the very few nightly bleeding edge users to
use & test.
you don't have to use https://nightly.link or recommend it or anything;
i just know its a cool way to wget or curl the URLs instead of
downloading it via your browser web UI. particularly useful for
remote/ssh/web-ide development.
This is a breaking change. It defines the new environment variable named
_COSMO_FDS_V2 which is used for inheriting non-stdio file descriptors on
execve() or posix_spawn(). No effort has been spent thus far integrating
with the older variable. If a new binary launches the older ones or vice
versa they'll only be able to pass stdin / stdout / stderr to each other
therefore it's important that you upgrade all your cosmo binaries if you
depend on this functionality. You'll be glad you did because inheritance
of file descriptors is more aligned with the POSIX standard than before.
This change ensures that if a file descriptor for an open disk file gets
shared by multiple processes within a process tree, then lseek() changes
will be visible across processes, and read() / write() are synchronized.
Note this only applies to Windows, because UNIX kernels already do this.
We now have implement all of Musl's localization code, the same way that
Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"),
just in case anything stops working as expected.
This change makes a second pass, at fixing the errno issue with libcxx's
filesystem code. Previously, 89.01% of LLVM's test suite was passing and
now 98.59% of their tests pass. Best of all, it's now possible for Clang
to be built as a working APE binary that can to compile the Cosmopolitan
repository. Please note it has only been vetted so far for some objects,
and more work would obviously need to be done in cosmo, to fix warnings.
Our cosmocc binaries are now built with GCC 14.1, using the Cosmo commit
efb3a34608 from yesterday.
GCC is now configured using --enable-analyzer so you can use -fanalyzer.
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.
threads count task
1 1062 fork+exit+wait
2 668 fork+exit+wait
4 66 fork+exit+wait
8 19 fork+exit+wait
16 22 fork+exit+wait
32 16 fork+exit+wait
Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to do conditions
threads count task
1 1085 fork+exit+wait
2 842 fork+exit+wait
4 532 fork+exit+wait
8 400 fork+exit+wait
16 276 fork+exit+wait
32 66 fork+exit+wait
With OpenBSD which also lacks vfork(), things were just as bad as NetBSD
threads count task
1 584 fork+exit+wait
2 687 fork+exit+wait
4 206 fork+exit+wait
8 24 fork+exit+wait
16 33 fork+exit+wait
32 26 fork+exit+wait
But since OpenBSD has futexes fork() works terrifically thanks to *NSYNC
threads count task
1 525 fork+exit+wait
2 580 fork+exit+wait
4 451 fork+exit+wait
8 479 fork+exit+wait
16 408 fork+exit+wait
32 373 fork+exit+wait
This issue would most likely only manifest itself, when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating
Needless to say vfork() rules and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.
threads count task
1 2559 vfork+exit+wait
2 5389 vfork+exit+wait
4 34933 vfork+exit+wait
8 43273 vfork+exit+wait
16 49648 vfork+exit+wait
32 40247 vfork+exit+wait
So it's a shame that so few OSes support vfork(). It creates an unsavory
situation, where someone wanting to build a server that spawns processes
would be better served to not use threads and favor a multiprocess model
The cosmocc.zip toolchain will now include four builds of the libcosmo.a
runtime libraries. You can pass the -mdbg flag if you want to debug your
cosmopolitan runtime. You can pass the -moptlinux flag if you don't want
windows code lurking in your binary. See tool/cosmocc/README.md for more
details on how these flags may be used and their important implications.
GCC 13+ changed its policies to be very aggressive about breaking builds
that have anything resembling K&R C. I very strongly disagree with these
decisions. Users who think their compiler should should also be a linter
are perfectly welcome to opt-in to -Wimplicit-int.
It's been thirteen years and C++ still hasn't implemented this wonderful
simple builtin keyword. In C++23 a solution was provided for making this
work in C++ which is libcxx's stdatomic.h. Including that header schleps
in literally 253 unique header files!! Many of the header files it needs
are libc header files like pthread.h where we need to have the _Atomic()
keyword, but since <atomic> depends on pthreads we can't have it include
the <stdatomic.h> header that defines _Atomic for C++ users, and instead
we simply make the type non-atomic, hoping and praying only C code shall
use those internal data structures. This just shows how STL clowns can't
be trusted to define the innermost primitives of a language. They should
instead be focusing on being the best at algorithms and data structures.
- NetBSD should now have faster synchronization
- POSIX barriers may now be shared across processes
- An edge case with memory map tracking has been fixed
- Grand Central Dispatch is no longer used on MacOS ARM64
- POSIX mutexes in normal mode now use futexes across processes
- CLOCK_THREAD_CPUTIME_ID
- CLOCK_PROCESS_CPUTIME_ID
Cosmo now supports the above constants universally across supported OSes
therefore it's now safe to let programs detect their presence w/ #ifdefs
Cosmopolitan Libc once called this important function although somewhere
along the way, possibly in a refactoring, it got removed and __tls_alloc
has always been zero ever since.
The memory leak detector was crashing. When using gc() you shouldn't use
the CheckForMemoryLeaks() function from inside the same function, due to
how it runs the atexit handlers.
Cosmopolitan now supports mremap(), which is only supported on Linux and
NetBSD. First, it allows memory mappings to be relocated without copying
them; this can dramatically speed up data structures like std::vector if
the array size grows larger than 256kb. The mremap() system call is also
10x faster than munmap() when shrinking large memory mappings.
There's now two functions, getpagesize() and getgransize() which help to
write portable code that uses mmap(MAP_FIXED). Alternative sysconf() may
be called with our new _SC_GRANSIZE. The madvise() system call now has a
better wrapper with improved documentation.
It's now possible to create thousands of thousands of sparse independent
memory mappings, without any slowdown. The memory manager is better with
tracking memory protection now, particularly on Windows in a precise way
that can be restored during fork(). You now have the highest quality mem
manager possible. It's even better than some OSes like XNU, where mmap()
is implemented as an O(n) operation which means sadly things aren't much
improved over there. With this change the llamafile HTTP server endpoint
at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec
This change reduces o/tiny/examples/life from 44kb to 24kb in size since
it avoids linking mmap() when unnecessary. This is important, to helping
cosmo not completely lose touch with its roots.
The Cosmopolitan Compiler Collection now includes the following programs
- `ar.ape` is a faster alternative to `ar rcsD` for creating determistic
static archives. It's ~10x faster than GNU because it isn't quadratic.
It'll even outperform LLVM ar by 2x, thanks to writev/copy_file_range.
- `sha256sum.ape` is a faster alternative to the `sha256sum` command. It
goes 2x faster since it leverages vectorized assembly implementations.
- `resymbol` is a brand new program we invented, like objcopy, that lets
you rename all the global symbols in a .o file to have a new suffix or
prefix. In the future, this will be used by cosmocc automatically when
building -O3 math kernels, that need to be vectorized for all hardware
- `gzip.ape` is a faster version of the `gzip` command, that is included
by most Linux distros. It gains better performance using Chromium Zlib
which, once again, includes highly optimized assembly, that Mark Adler
won't merge into the official MS-DOS compatible zlib codebase.
- `cocmd` is the cosmopolitan shell. It can function as a faster `sh -c`
alternative than bash and dash as the `SHELL = /opt/cosmocc/bin/cocmd`
at the top of your Makefile. Please note you should be using the cosmo
fork of GNU make (already included), since normal make won't recognize
this as a bourne-compatible shell and remove the execve() optimization
which makes things slower. In some ways that's true. This doesn't have
a complete POSIX shell implementation. However it's enough for cosmo's
mono repo. It also implements faster behaviors in some respects.
The following programs are also introduced, which aren't as interesting.
The main reason why they're here is so Cosmopolitan's mono repo shall be
able to remove build/bootstrap/ in future editions. That way we can keep
build utilities better up to date, without bloating the git history much
- `chmod.ape` for hermeticity
- `cp.ape` for hermeticity
- `echo.ape` for hermeticity
- `objbincopy` is an objcopy-like tool that's used to build ape loader
- `package.ape` is used for strict dependency checking of object graph
- `rm.ape` for hermeticity
- `touch.ape` for hermeticity
Thanks to @aj47 (techfren.net) the new Cosmo memory manager is confirmed
to be working on Android!! The only issue turned out to be forgetting to
update the program address in the linker script. We now know w/ absolute
certainty that APE binaries as complex as llamafile, now work correctly.
- Ensure SIGTHR isn't blocked in newly created threads
- Use TIB rather than thread_local for thread atexits
- Make POSIX thread keys atomic within thread
- Don't bother logging prctl() to --strace
- Log thread destructor names to --strace
The way to use double linked lists, is to remove all the things you want
to work on, insert them into a new list on the stack. Then once you have
all the work items, you release the lock, do your work, and then lock it
again, to add the shelled out items back to a global freelist.
This fixes a regression in mmap(MAP_FIXED) on Windows caused by a recent
revision. This change also fixes ZipOS so it no longer needs a MAP_FIXED
mapping to open files from the PKZIP store. The memory mapping mutex was
implemented incorrectly earlier which meant that ftrace and strace could
cause cause crashes. This lock and other recursive mutexes are rewritten
so that it should be provable that recursive mutexes in cosmopolitan are
asynchronous signal safe.
This change introduces accumulate, addressof, advance, all_of, distance,
array, enable_if, allocator_traits, back_inserter, bad_alloc, is_signed,
any_of, copy, exception, fill, fill_n, is_same, is_same_v, out_of_range,
lexicographical_compare, is_integral, uninitialized_fill_n, is_unsigned,
numeric_limits, uninitialized_fill, iterator_traits, move_backward, min,
max, iterator_tag, move_iterator, reverse_iterator, uninitialized_move_n
This change experiments with rewriting the ctl::vector class to make the
CTL design more similar to the STL. So far it has not slowed things down
to have 42 #include lines rather than 2, since it's still almost nothing
compared to LLVM's code. In fact the closer we can flirt with being just
like libcxx, the better chance we might have of discovering exactly what
makes it so slow to compile. It would be an enormous discovery if we can
find one simple trick to solving the issue there instead.
This also fixes a bug in `ctl::string(const string &s)` when `s` is big.
We now have a C++ red-black tree implementation that implements standard
template library compatible APIs while compiling 10x faster than libcxx.
It's not as beautiful as the red-black tree implementation in Plinko but
this will get the job done and the test proves it upholds all invariants
This change also restores CheckForMemoryLeaks() support and fixes a real
actual bug I discovered with Doug Lea's dlmalloc_inspect_all() function.
It hasn't been helpful enough to be justify the maintenance burden. What
actually does help is mprotect(), kprintf(), --ftrace and --strace which
can always be counted upon to work correctly. We aren't losing much with
this change. Support for ASAN on AARCH64 was never implemented. Applying
ASAN to the core libc runtimes was disabled many months ago. If there is
some way to have an ASAN runtime for user programs that is less invasive
we can potentially consider reintroducing support. But now is premature.
Actually Portable Executable now supports Android. Cosmo's old mmap code
required a 47 bit address space. The new implementation is very agnostic
and supports both smaller address spaces (e.g. embedded) and even modern
56-bit PML5T paging for x86 which finally came true on Zen4 Threadripper
Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb
granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native
page size. This fixes a longstanding POSIX conformance issue, concerning
file mappings that overlap the end of file. Other aspects of conformance
have been improved too, such as the subtleties of address assignment and
and the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE
On Windows, mappings larger than 100 megabytes won't be broken down into
thousands of independent 64kb mappings. Support for MAP_STACK is removed
by this change; please use NewCosmoStack() instead.
Stack overflow avoidance is now being implemented using the POSIX thread
APIs. Please use GetStackBottom() and GetStackAddr(), instead of the old
error-prone GetStackAddr() and HaveStackMemory() APIs which are removed.
Explicitly value-initializes the deleter, even though I have not found a
way to get the deleter to act like it’s been default-initialized in unit
tests so far.
Uses auto in reset. The static cast is apparently not needed (unless I’m
missing some case I didn’t think of.)
Implements the general move constructor - turns out that the reason this
didn’t work before was that default_delete<U> was not move constructible
from default_delete<T>.
Drop inline specifiers from functions defined entirely inside the struct
definition since they are implicitly inline.
* Cleans up reset to match spec
Remove the variants from the T[] specialization. Also follow the spec on
the order of operations in reset, which may matter if we are deleting an
object that has a reference to the unique_ptr that is being reset. (?)
* Tests Base/Derived reset.
* Adds some constexpr declarations.
* Adds default_delete specialization for T[].
* Makes parameters const.
The STL says that these should be replaceable by user code.
new.cc now defines only a few direct functions (including a free wrapper
that perplexingly is needed since g++ didn’t want to alias "free".) Now,
all of the operators are weak references to those functions.
Moves some isbig checks into string.h, enabling smarter optimizations to
be made on small strings. Also we no longer zero out our string prior to
calling the various constructors, buying back the performance we lost on
big strings when we made the small-string optimization. We further add a
little optimization to the big_string copy constructor: if the string is
using half or more of its capacity, then we don’t recompute capacity and
just take the old string’s. As well, the copy constructor always makes a
small string when it will fit, even if copied from a big string that got
truncated.
This also reworks the test to follow the idiom adopted elsewhere re stl,
and adds a helper function to tell if a string is small based on data().
* Add ctl utility.h
Implements forward, move, swap, and declval. This commit also adds a def
for nullptr_t to cxx.inc. We need it now because the CTL headers stopped
including anything from libc++, so we no longer get their basic types.
* Use ctl::swap in string
The STL spec says that swap is located in the string_view header anyawy.
Performance-wise this is a noop, but it’s slightly cleaner.
c.inc (AFAICT erroneously) defined _Atomic(t) as `volatile t *`, when it
should have just said `volatile t`, when __STDC_VERSION__ was too small.
This happens when we’re compiling C++, but in C++11, _Atomic is a define
supplied by the STL rather than a keyword supplied by the compiler. Wait
though, it gets better: in C++11, _Atomic hooks you into the morass that
is stdatomic.h, and ultimately refers everything back to std::atomic<T>.
The gory, horrifying details are in libcxx's __atomic/cxx_atomic_impl.h.
The tldr is that for our purposes it’s fine to just say volatile and use
the normal libc/intrin/atomic.h functions.
#1219 had an issue with noisy testing label. Investigated if there is a
syntax to make it more exclusive, but turns out there isn't one yet. So
let's remove an manually add it in as needed.
Fixes#1219.
The way unique_ptr is supposed to work is as a purely compile-time check
that your raw pointers are getting deleted when they go out of scope. It
should ideally emit the same exact machine code as if you were using raw
pointers with manual deletes.
Part of what this means is that under normal circumstances, a unique_ptr
shouldn’t take up more space than a raw pointer - in other words, sizeof
unique_ptr<T> should == sizeof(T*).
The present PR doesn’t bother with the specialization for array types. I
also left a couple other parts of the STL API unimplemented. I’d love to
see someone else implement these, or I’ll get to them at some point.
The mangled name of a C++ function will typically not vary by const-ness
of a by-value parameter; in other words, there is no meaning to a const-
qualified by-value parameter in a function prototype. However, the const
keyword _does_ matter at function _definition_ time, like it does with a
variable declared in the body. So for prototypes, we strip out const for
by-value parameters; but for definitions, we leave them alone.
At function definition (as opposed to prototype), we add const to values
in parameters by default, unless we’re going to mutate them.
This commit also changes a couple of const string_view& to be simply by-
value string_view. A string_view is only two words; it rarely ever makes
sense to pass one by reference if it’s not going to be mutated.
This essentially re-does the work of #875 on top of master.
This is what I did to check that Cosmo's Lua extensions still worked:
```
$ build/bootstrap/make MODE=aarch64 o/aarch64/third_party/lua/lua
$ ape o/aarch64/third_party/lua/lua
>: 10
10
>: 010
8
>: 0b10
2
>: string.byte("\e")
27
>: "Hello, %s" % {"world"}
Hello, world
>: "*" * 3
***
```
`luaL_traceback2` was used to show the stack trace with parameter
values; it's used in `LuaCallWithTrace`, which is used in Redbean to run
Lua code. You should be able to see the extended stack trace by running
something like this: `redbean -e "function a(b)c()end a(2)"` (with
"params" indicating the extended stack trace):
```
stack traceback:
[string "function a(b)c()end a(2)"]:1: in function 'a', params: b = 2;
[string "function a(b)c()end a(2)"]:1: in main chunk
```
@pkulchenko confirmed that I get the expected result with the updated
code.
This is what I did to check that Lua itself still worked:
```
$ cd third_party/lua/test/
$ ape ../../../o/aarch64/third_party/lua/lua all.lua
```
There's one test failure, in `files.lua`:
```
***** FILE 'files.lua'*****
testing i/o
../../../o/aarch64/third_party/lua/lua: files.lua:84: assertion failed!
stack traceback:
[C]: in function 'assert'
files.lua:84: in main chunk
(...tail calls...)
all.lua:195: in main chunk
[C]: in ?
.>>> closing state <<<
```
That isn't a result of these changes; the same test is failing in
master.
The failure is here:
```lua
if not _port then -- invalid seek
local status, msg, code = io.stdin:seek("set", 1000)
assert(not status and type(msg) == "string" and type(code) == "number")
end
```
The test expects a seek to offset 1,000 on stdin to fail — but it
doesn't. `status` ends up being the new offset rather than `nil`.
If I comment out that one test, the remaining tests succeed.
🚨 clang-format changes output per version!
This is with version 19.0.0. The modifications seem to be fixing the old
version’s errors - mainly involving omitted whitespace around binary ops
and inserted whitespace between goto labels and colons (if followed by a
curly brace.)
Also fixes a few mistakes made by e.g. someone (ahem) forgetting to pass
his ctl/string.h modifications through it.
We should add this to .git-blame-ignore-revs once we have its final hash
on master.
We now fully initialize a ctl::string’s memory, so that it is always set
to a well-defined value, thus making it always safe to memcpy out of it.
This incidentally makes our string::swap function legal, which it wasn’t
before. This also saves us a store in string::reserve.
Now that we have made both big_string and small_string POD, I believe it
is safe to elide the launder calls, and have done so, thus cleaning up a
lot of the blob-related code.
I also got rid of set_big_capacity and replaced it with a set_big_string
that leaves us in a well-defined state afterwards. This function also is
able to be somewhat simpler; rather than delicate bit-twiddling, it just
reaches straight into blob and rewrites it wholesale.
Overall, this shaves about 1–2ns off of most benchmarks, and adds 1ns to
only one of them - creating a string from a char *.
This replaces the STL <new> header. Mainly, it defines a global operator
new and operator delete, as well as the placement versions of these. The
placement versions are required to not get compile errors when trying to
write a placement new statement.
Each of these operators is defined with many, many different variants. A
glance at new.cc is recommended followed by a chaser of the Alexandrescu
talk "std::allocator is to Allocation as std::vector is to Vexation". We
must provide a global-namespace source-level definition of each operator
and it is illegal for any of them to be marked inline, so here we are.
The upshot is that we no longer need to include <new>, and our optional/
vector headers are self-contained.
`big_string` is not pod which means it needs to be properly constructed
and destroyed. Instead make it POD and destroy it manually in `string`
destructor.
Manually manage the lifetime of `value_` by using an anonymous
`union`. This fixes a bunch of double-frees and double-constructs.
Additionally move the `present_` flag last. When `T` has padding
`present_` will be placed there saving `alignof(T)` bytes from
`sizeof(optional<T>)`.
There were a few errors in how capacity and memory was being handled for
small strings. The capacity errors meant that small strings would become
big strings too soon, and the memory error introduced undefined behavior
that was caught by CheckMemoryLeaks in our test file but only sometimes.
The crucial change is in reserve: we only copy n bytes into p2, and then
we manually set the null terminator instead of expecting it to have been
there already. (E.g. it might not be there for an empty small string.)
We also fix one other doozy in append when we were exactly at the small-
to-big string boundary: we set the last byte (i.e., the remainder field)
to 0, then decremented it, giving us size_t max. Whoops. We boneheadedly
fix this by setting the 0 byte after we've fixed up the remainder, so it
is at worst a no-op.
Otherwise, capacity now works the same for small strings as it does with
big strings: it's the amount of space available including the null byte.
We test all of this with a new test that only gets included if our class
under test is not std::string (presumably meaning it's ctl::string.) The
test manually verifies that the small string optimization behaves how we
expect.
Since this test checks against std::string, we go ahead and include that
other header from the STL.
Also modifies the new test we introduced to also run on std::string, but
it just does the append without expecting anything about how its data is
stored. We also check that the string has the right value afterwards.
A small-string optimization is a way of reusing inline storage space for
sufficiently small strings, rather than allocating them on the heap. The
current approach takes after an old Facebook string class: it reuses the
highest-order byte for flags and small-string size, in such a way that a
maximally-sized small string will have its last byte zeroed, making it a
null terminator for the C string.
The only flag we have is in the highest-order bit, that says whether the
string is big (set) or small (cleared.) Most of the logic switches based
on the value of this bit; e.g. data() returns big()->p if it's set, else
small()->buf if it's cleared. For a small string, the capacity is always
fixed at sizeof(string) - 1 bytes; we store the length in the last byte,
but we store it as the number of remaining bytes of capacity, so that at
max size, the last byte will read zero and serve as our null terminator.
Morally speaking, our class's storage is a union over two POD C structs.
For now I gravitated towards a slightly more obtuse approach: the string
class itself contains a blob of the right size, and we alias that blob's
pointer for the two structs, taking some care not to run afoul of object
lifetime rules in C++. If anyone wants to improve on this, contributions
are welcome.
This commit also introduces the `ctl::__` namespace. It can't be legally
spelled by library users, and serves as our version of boost's "detail".
We introduced a string::swap function, and we now use that in operator=.
operator= now takes its argument by value, so we never need to check for
the case where the pointers are equal and can just swap the entire store
of the argument with our own, leaving the C++ destructor to free our old
storage afterwards.
There are probably still a few places where our capacity is slightly off
and we grow too fast, although there don't appear to be any where we are
too slow. I will leave these to be fixed in future changes.
If pthread_create() is linked into the binary, then the cosmo runtime
will create an independent dlmalloc arena for each core. Whenever the
malloc() function is used it will index `g_heaps[sched_getcpu() / 2]`
to find the arena with the greatest hyperthread / numa locality. This
may be configured via an environment variable. For example if you say
`export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways.
Your process may be configured to have anywhere between 1 - 128 heaps
We need this revision because it makes multithreaded C++ applications
faster. For example, an HTTP server I'm working on that makes extreme
use of the STL went from 16k to 2000k requests per second, after this
change was made. To understand why, try out the malloc_test benchmark
which calls malloc() + realloc() in a loop across many threads, which
sees a a 250x improvement in process clock time and 200x on wall time
The tradeoff is this adds ~25ns of latency to individual malloc calls
compared to MODE=tiny, once the cosmo runtime has transitioned into a
fully multi-threaded state. If you don't need malloc() to be scalable
then cosmo provides many options for you. For starters the heap count
variable above can be set to put the process back in single heap mode
plus you can go even faster still, if you include tinymalloc.inc like
many of the programs in tool/build/.. are already doing since that'll
shave tens of kb off your binary footprint too. Theres also MODE=tiny
which is configured to use just 1 plain old dlmalloc arena by default
Another tradeoff is we need more memory now (except in MODE=tiny), to
track the provenance of memory allocation. This is so allocations can
be freely shared across threads, and because OSes can reschedule code
to different CPUs at any time.
It hasn't been maintained in years. I'm tired of the root level of our
project having an advertisement for Microsoft Visual Studio Code. Your
preferred editor should be Emacs or Vim.
Cosmo will now print C++ symbols correctly in --ftrace logs and
backtraces. Doing this required reducing the memory requirement
of the __demangle() function by 3x. This was accomplished using
16-bit indices and 16-bit malloc granularity. That puts a limit
on the longest symbol we can successfully decode, which I think
would be around 6553 characters long, given a 65536-byte buffer
If you follow the directions in that file then git blame will ignore the
listed commits. A commit should only go in that file if its only changes
were to formatting, particularly on a large part of the codebase (like a
change to .clang-format getting applied to the repo.)
Cribbed from here:
https://www.stefanjudis.com/today-i-learned/how-to-exclude-commits-from-git-blame/
The V8 behavior of encoding infinity as null doesn't make sense to me.
Using ±1e5000 is better, because JSON.parse decodes it as INFINITY and
the information is preserved. This could be a breaking change for some
We're now able to rewind the instruction pointer in x86 backtraces. This
helps ensure addr2line cannot print information about unrelated adjacent
code. I've restored -fno-schedule-insns2 in most cases because it really
does cause unpredictable breakage for backtraces.
This change fixes a bug where exiting a crash signal handler on Windows
after adding the signal to uc_sigmask, but not correcting the CPU state
would cause the signal handler to loop infinitely, causing process hang
Another issue is that very tiny programs, that don't link posix signals
would not have their SIGILL / SIGSEGV / etc. status reported to Cosmo's
bash shell when terminating on crash. That's fixed by a tiny handler in
WinMain() that knows how to map WIN32 crash codes to the POSIX flavors.
Microsoft caused some very gentle breakages for Cosmopolitan. They
removed the version information from the PEB which caused uname to
report WINDOWS 0.0.0. We should have called GetVersionExW but that
doesn't really exist anymore either. Windows policy is now to give
whatever version we used in ape/ape.S. Windows8 has been EOL since
2023-01-10 so lets avoid our modern executables being relegated to
legacy infrastructure. Requiring Windows 10+ going forward lets us
remove runtime compatibility bloat from the codebase. Further note
Cosmopolitan maintains a Windows Vista branch on GitHub, so anyone
preferring the older versions, can still have a future with Cosmo.
Another neat thing this fixes is UTF-8 support in the console. The
changes Microsoft made broke the if statement that enabled UTF8 in
terminals. This explains why bug reports had broken arrows. In the
future this should be less of an issue, since the PEB code is gone
which means we more strictly conform to only Microsoft's WIN32 API
Cosmo's _Cz_crc32() function now goes 73 GiB/s on Threadripper. This
will significantly improve the performance of the PKZIP file format.
This algorithm is also used by apelink, to create deterministic ids.
The normal getopt() function is bloated because it links printf(). This
change exports the original authentic bsd getopt function, that cosmo's
always used internally so cosmocc users don't need to include internals
This change adds a TLS freelist for small dynamic memory allocations.
Cosmopolitan's TIB is now 512 bytes in size. Single-threaded malloc()
performance isn't impacted by this, until pthread_create() is called.
Single-threaded programs may also want to consider using:
#include "libc/mem/tinymalloc.inc"
Which will shave 30k off the executable size and sometimes go faster.
This was originally added when we were using the mcount hooking
technique, to prevent the nop from floating above the frame pointer
creation. Now we use a better codegen technique for hooking so it's no
longer needed to sacrifice performance by using this flag.
When we removed the com suffix from ape binaries, we broke the build for
ape's python for any case-insensitive file system, i.e. Windows and XNU,
because there is a third_party/python/Python that gets mirrored in the o
directory with the python object files and clashes with the binary name.
This patch hacks around this by renaming the binary to "python3" so that
it no longer clashes with that directory.
At least on macOS, `strlen(getenv("TMPDIR"))` is 50. We now allow a /tmp
that takes up to 120 or so bytes to spell. Instead of overflowing, we do
a bounds check and the function fails successfully on even longer /tmps.
Fixes#1108 (os.tmpname crashes redbean)
This was a good idea back when we were only using it to build various
open source projects. However it no longer makes sense that many more
people are depending on cosmocc, to develop new software. Our tooling
shouldn't be making these kinds of decisions for the user.
This change solves the XNU crash loop mystery. Apple's documentation
claims to support this feature, but they only define the constant in
their header files. The kernel acknowledges thi SA_RESETHAND bit, by
clearing it from the sa_flags state, returns zero, and does nothing.
7d31fc3 made it safe to register ape with the binfmt P flag. Older files
will still get their paths passed via argv[0] and everything should just
work. This also restores the F flag that was rolled back alongside the P
flag; this seems like a good idea to me now. It makes it so /usr/bin/ape
is kind of part of the kernel; simply replacing the file will not change
the loader used. The binfmt will have to be reregistered as well.
ape/apeinstall.sh will already nag you if there's an existing binfmt for
ape. It might be nice to make it actually replace the registration.
This reverts commit df648fb174.
We're now able to pretty print a C++ backtrace upon crashing in pretty
much any runtime execution scenario. The default pledge sandbox policy
on Linux is now to return EPERM. If you call pledge and have debugging
functions linked (e.g. GetSymbolTable) then the symbol table shall get
loaded before any security policy is put in place. This change updates
build/bootstrap/fixupobj too and fixes some other sneaky build errors.
For this to work, a loader has to be able to tell the difference between
an ‘old’ and a ‘new’ binary. This is achieved via a repurposing of ELF’s
e_flags field. We previously tried to use the padding in e_ident for it,
but binutils was resetting it to zero in e.g. strip.
This introduces one new ELF flag for cosmopolitan binaries. It is called
`EF_APE_MODERN`. We choose 0x101ca75, "lol cat 5".
It should now be safe to install the ape loader binfmt registration with
the `P` flag.
It's now possible to safely print C++ backtraces from signal handlers.
This symbol demangler doesn't need malloc, tls, or even static memory.
Additionally, this change makes it 2x faster and adds test cases. It's
almost as performant and accurate as the libcxxabi implementation now.
I took one canonical IANA zone ID from each of the different colored
regions in this article, except those that do not observe DST and do
not have a Google office. See the "Time in Europe" Wikipedia article.
As to which canonical ID to use, this was somewhat arbitrary. Brussels
was obvious, as the de facto capital of the EU. For the rest, I mostly
just went with lexicographic ordering of the most recognizable options.
I've sorted the American zones. This Keeps the U.S. ones together but
does everything alphabetically otherwise. I've added the remaining
Canadian zones These have DST (and Newfoundland is off by a half-
hour from a UTC interval) so they cannot use Etc/. The Pacific/ zones
are sort of sorted. The Chathan Islands have been added. This is the
last of the zones I believe with a non-integer hour offset from UTC.
Cosmopolitan now supports 104 time zones. They're embedded inside any
binary that links the localtime() function. Doing so adds about 100kb
to the binary size. This change also gets time zones working properly
on Windows for the first time. It's not needed to have /etc/localtime
exist on Windows, since we can get this information from WIN32. We're
also now updated to the latest version of Paul Eggert's TZ library.
Signals are extremely difficult to unit test reliably. This is why
functions like sigsuspend() exist. When testing something else and
portably it becomes impossible without access to kernel internals.
OpenMP flakes in QEMU on one of my workstations. I don't think the
support is production worthy, because there's been issues on MacOS
additionally. It works great for every experiment I've used it for
though. However a flaky test is worse than no test at all. So it's
removed until someone takes an interest in productionizing it.
We have an optimized version of zlib from the Chromium project.
We need it for a lot of our libc services. It would be nice to export
this to user applications if we can, since projects like llamafile are
already depending on it under the private namespace, to avoid
needing to link zlib twice.
The feenableexcept() and fedisableexcept() APIs are now provided which
let you detect when NaNs appear the moment it happens from anywhere in
your program. Tests have also been added for the mission critical math
functions expf() and erff(), whose perfect operation has been assured.
See examples/trapping.c to see how to use this powerful functionality.
* Fix fork locking on win32
- __enable_threads / set __threaded in __proc_setup as threads are required for
win32 subprocess management
- move mmi/fds locking out of pthread_atfork.c into fork.c so it's done anytime
__threaded is set instead of being dependent of pthreads
- explicitly yoink _pthread_onfork_prepare, _pthread_onfork_parent, and
_pthread_onfork_child in pthread_create.c so they are linked in in-case they
are separated from _pthread_atfork
Big Thanks to @dfyz for help with locating the issue, testing, and devising a fix!
* fix child processes not being able to open files, initialize all necessary locks on fork
Commit bc6c183 introduced a bunch of discrepancies between what files
look like in the repo and what clang-format says they should look like.
However, there were already a few discrepancies prior to that. Most of
these discrepancies seemed to be unintentional, but a few of them were
load-bearing (e.g., a #include that violated header ordering needing
something to have been #defined by a 'later' #include.)
I opted to take what I hope is a relatively smooth-brained approach: I
reverted the .clang-format change, ran clang-format on the whole repo,
reapplied the .clang-format change, reran clang-format again, and then
reverted the commit that contained the first run. Thus the full effect
of this PR should only be to apply the changed formatting rules to the
repo, and from skimming the results, this seems to be the case.
My work can be checked by applying the short, manual commits, and then
rerunning the command listed in the autogenerated commits (those whose
messages I have prefixed auto:) and seeing if your results agree.
It might be that the other diffs should be fixed at some point but I'm
leaving that aside for now.
fd '\.c(c|pp)?$' --print0| xargs -0 clang-format -i
The `com` parameter to `TryPath` was always 1, so there was no reason to
have it. This patch changes the logic to be as though `com` was 0, which
provides a possible answer to the TODO question -- the answer is no.
If we never care about appending `.com`, then `CopyWithCwd` doesn't need
to return anything beyond a boolean success value.
__res_send returns the full answer length even if it didn't fit the
buffer, but __dns_parse expects the length of the filled part of the
buffer.
Analogous to Musl commit 77327ed064bd57b0e1865cd0e0364057ff4a53b4 which
fixed the only other __dns_parse call site.
The name resolution would abort when getting more than 63 records per
request, due to what seems to be a left-over from the original code.
This check was non-breaking but spurious prior to TCP fallback
support, since any 512-byte packet with more than 63 records was
necessarily malformed. But now, it wrongly rejects valid results.
Reported by Daniel Stefanik in Alpine Linux aports issue 15320.
f314e133929b6379eccc632bef32eaebb66a7335
Author: Rich Felker <dalias@aerifal.cx>
Date: Thu Nov 16 12:55:21 2023 -0500
mntent: fields are delimited only by tabs or spaces, not general whitespace
this matters because the kernel-provided mtab only escapes tabs,
spaces, newlines, and backslashes. it leaves carriage returns, form
feeds, and vertical tabs literal.
commit ee1d39bc1573c1ae49ee6b658938b56bbef95a6c
Author: q66 <q66@chimera-linux.org>
Date: Thu Nov 9 20:48:44 2023 +0100
mntent: unescape octal sequences
As entries in mtab are delimited by spaces, whitespace characters
are escaped as octal sequences. When reading them out, we have to
unescape these sequences to get the proper string.
Previously, the atomic store looked like it was happening while the
struct's memory was still poisoned. I was unable to observe any issues
with this, but this change seems to make the code more obviously correct
(at the cost of a redundant atomic store to zeroed space in case the map
needed to be extended.)
MaGuess on Discord pointed out the fact that cosmocc contradicts itself
on the signedness of `char`. It's up to each platform to choose one, so
the cosmo platform shall choose signed. The rationale is it makes the C
language syntax more internally similar. `char` should be `signed char`
for the same reason `int` means `signed int`. It's recommended that you
still assume `char` could go either way since that's portable thinking.
But if you want to assume we'll always have signed char, that's ok too.
You can now run commands like `x86_64-unknown-cosmo-c++ -S` when using
your cosmocc toolchain. Please note the S flag isn't supported for the
cosmocc command itself.
We technically support versions much earlier, possibly dating back to
2018 or earlier, but they're not officially supported since I have no
way of testing them.
Closes#1129
Now that these functions are behind _COSMO_SOURCE there's no reason for
having the ugly underscore anymore. To use these functions, you need to
pass -mcosmo to cosmocc.
The WIN32 CreateProcess() function does not require an .exe or .com
suffix in order to spawn an executable. Now that we have Cosmo bash
we're no longer so dependent on the cmd.exe prompt.
- Write some more unit tests
- memcpy() on ARM is now faster
- Address the Musl complex math FIXME comments
- Some libm funcs like pow() now support setting errno
- Import the latest and greatest math functions from ARM
- Use more accurate atan2f() and log1pf() implementations
- atoi() and atol() will no longer saturate or clobber errno
Multiple projects I care about make the assumption that this isn't a
system call that sleeps for a particular number of nanonseconds, but
rather a function that parks processes on kernel scheduler quantums.
Anyone who wants the old behavior should use cosmo_clock_nanosleep()
* Fix `if...fi` generation in the generated APE shell script
A shell will fail with a syntax error on an empty `if` or `else` body.
That is, neither of these is allowed:
# Empty `if`
if [ ... ]; then
fi
# Empty `else`
if [ ... ]; then
...
else
fi
There were two places where `apelink` could generate problematic `if`'s:
1. The XNU shell generation for aarch64 binaries when no loaders (either
binary or source) are provided. They can't assimilate, so the resulting
`else` body becomes empty.
There is actually a code path guarded by the `gotsome` variable that
inserts an extra `true` in this case, but the variable was never
initialized, so in practice this code path didn't activate in my
tests. This is fixed by initializing the variable.
2. The loader extraction code when no loaders are provided and XNU
support is requested. This is fixed by adding a simliar code path
that prevents an empty body from being generated.
* Update the apelink manual after commit d53c335
The `-s` option changed its meaning, but the docs weren't updated.
* Fix reading the same symbol twice when using `{f,}scanf()`
PR #924 appears to use `unget()` subtly incorrectly when parsing
floating point numbers. The rest of the code only uses `unget()`
immediately followed by `goto Done;` to return back the symbol that
can't possibly belong to the directive we're processing.
With floating-point, however, the ungot characters could very well
be valid for the *next* directive, so we will essentially read them
twice. It can't be seen in `sscanf()` tests because `unget()` is a
no-op there, but the test I added for `fscanf()` fails like this:
...
EXPECT_EQ(0xDEAD, i1)
need 57005 (or 0xdead) =
got 908973 (or 0x000ddead)
...
EXPECT_EQ(0xBEEF, i2)
need 48879 (or 0xbeef) =
got 769775 (or 0x000bbeef)
This means we read 0xDDEAD instead of 0xDEAD and 0xBBEEF instead of
0xBEEF. I checked that both musl and glibc read 0xDEAD/0xBEEF, as
expected.
Fix the failing test by removing the unneeded `unget()` calls.
* Don't read invalid floating-point numbers in `*scanf()`
Currently, we just ignore any errors from `strtod()`. They can
happen either because no valid float can be parsed at all, or
because the state machine recognizes only a prefix of a valid
floating-point number.
Fix this by making sure `strtod()` parses everything we recognized,
provided it's non-empty. This requires to pop the last character
off the FP buffer, which is supposed to be parsed by the next
`*scanf()` directive.
* Make `%c` parsing in `*scanf()` respect the C standard
Currently, `%c`-style directives always succeed even if there
are actually fewer characters in the input than requested.
Before the fix, the added test fails like this:
...
EXPECT_EQ(2, sscanf("ab", "%c %c %c", &c2, &c3, &c4))
need 2 (or 0x02 or '\2' or ENOENT) =
got 3 (or 0x03 or '\3' or ESRCH)
...
EXPECT_EQ(0, sscanf("abcd", "%5c", s2))
need 0 (or 0x0 or '\0') =
got 1 (or 0x01 or '\1' or EPERM)
musl and glibc pass this test.
It's now possible to use redbean Fetch() with arbitrary HTTP methods,
e.g. LIST which is used by Hashicorp. There's an eight char limit and
uppercase canonicalization still happens. This change also includes a
better function for launching a browser tab, that won't deadlock on a
headless workstation running Debian.
Closes#1107
Sometimes we need to interact with code that wasn't compiled using
`-fno-omit-frame-pointer`. For example, if a function pointer gets
passed and called by a foreign function, linked by cosmo_dlopen().
Function call tracing will now detect backtrace pointer corruption
and simply reduce the indentation level back to zero, as a result.
This change causes cosmocc to use -fno-inline-functions-called-once by
default, unless -Os or -finline-functions-called-once is defined. This
is important since I believe it generally makes code go faster, and it
most importantly makes --ftrace output much more understandable, since
the trace will be more likely to reflect the actual shape of the code.
We've always used this flag in the mono repo when ftracing is enabled,
but it slipped my mind to incorporate this into the cosmocc toolchain.
This change upgrades to GCC 12.3 and GNU binutils 2.42. The GNU linker
appears to have changed things so that only a single de-duplicated str
table is present in the binary, and it gets placed wherever the linker
wants, regardless of what the linker script says. To cope with that we
need to stop using .ident to embed licenses. As such, this change does
significant work to revamp how third party licenses are defined in the
codebase, using `.section .notice,"aR",@progbits`.
This new GCC 12.3 toolchain has support for GNU indirect functions. It
lets us support __target_clones__ for the first time. This is used for
optimizing the performance of libc string functions such as strlen and
friends so far on x86, by ensuring AVX systems favor a second codepath
that uses VEX encoding. It shaves some latency off certain operations.
It's a useful feature to have for scientific computing for the reasons
explained by the test/libcxx/openmp_test.cc example which compiles for
fifteen different microarchitectures. Thanks to the upgrades, it's now
also possible to use newer instruction sets, such as AVX512FP16, VNNI.
Cosmo now uses the %gs register on x86 by default for TLS. Doing it is
helpful for any program that links `cosmo_dlopen()`. Such programs had
to recompile their binaries at startup to change the TLS instructions.
That's not great, since it means every page in the executable needs to
be faulted. The work of rewriting TLS-related x86 opcodes, is moved to
fixupobj.com instead. This is great news for MacOS x86 users, since we
previously needed to morph the binary every time for that platform but
now that's no longer necessary. The only platforms where we need fixup
of TLS x86 opcodes at runtime are now Windows, OpenBSD, and NetBSD. On
Windows we morph TLS to point deeper into the TIB, based on a TlsAlloc
assignment, and on OpenBSD/NetBSD we morph %gs back into %fs since the
kernels do not allow us to specify a value for the %gs register.
OpenBSD users are now required to use APE Loader to run Cosmo binaries
and assimilation is no longer possible. OpenBSD kernel needs to change
to allow programs to specify a value for the %gs register, or it needs
to stop marking executable pages loaded by the kernel as mimmutable().
This release fixes __constructor__, .ctor, .init_array, and lastly the
.preinit_array so they behave the exact same way as glibc.
We no longer use hex constants to define math.h symbols like M_PI.
- Introduce portable sched_getcpu() api
- Support GCC's __target_clones__ feature
- Make fma() go faster on x86 in default mode
- Remove some asan checks from core libraries
- WinMain() now ensures $HOME and $USER are defined
1. `libc/isystem/complex.h` (included when you do `#include <complex.h>`)
defines `_COMPLEX_H`, and then proceeds to include `libc/complex.h`,
which contains the actual complex-related declarations. However, they
are *also* guarded by `_COMPLEX_H` and hence effectively ignored.
Fix this by changing `_COMPLEX_H` to `COSMOPOLITAN_LIBC_COMPLEX_H_`,
which is consistent with what the other headers (such as `math.h`) do.
2. Cosmopolitan could only support IPv4 multicast requests for sockets,
since a declaration for `struct ipv6_mreq` was missing. Add support
for IPv6, too, by adding the missing declaration.
- Let OpenMP be usable via cosmocc
- Let libunwind be usable via cosmocc
- Make X86_HAVE(AVXVNNI) work correctly
- Avoid using MAP_GROWSDOWN on qemu-aarch64
- Introduce in6addr_any and in6addr_loopback
- Have thread stacks use MAP_GROWSDOWN by default
- Ask OpenMP to not use filesystem to manage threads
- Make NI_MAXHOST and NI_MAXSERV available w/o _GNU_SOURCE
We recently broke MODE=dbg support when we added C++ exception support.
This change adds the missing UBSAN interfaces, needed to get it working
again. Some of the ASAN checking in the SJLJ guts needed to be disabled
since I doubt anyone's combined the two features until now.
Embedding Blink builds in Cosmo executables was a failed experiment. It
turned out to be easier than expected to let the mono repo have support
for multiple architectures. Blink still works great; it's supported and
recommended; just please use it as a separate program. For example, you
can use Blink to run Cosmo binaries on architectures like i486 / s390x.
a2753de contains some regressions, causing `fixupobj` to be
inappropriately suppressed when `-MD` or `-MMD` is passed.
This commit reverts most changes by a2753de, and:
- Treats all invocations of the compiler with `-M` and `-MM` as with the
`cpp` intent, since these flags imply `-E`.
- Handle the dependency output path specified by `-MF`.
+ This is trivial for `cosmocross` since the script does not throw
objects to and from temporary directories.
+ For `cosmocc`, the file names are calculated based on the `-MF`
value provided by the user. If this flag is not specified, the script
generates the file name based on the output file using GCC rules.
Then, before calling the real compilers, an additional `-MF` flag is
passed to override the dependency outputs with mangled file names.
If you install qemu-user from apt then glibc links a lot of address
space bloat that causes pthread_create() to ENOMEM (a.k.a. EAGAIN).
Boosting the virtual memory quota from 512m to 2048m will hopefully
future proof the build for the future, as Linux distros get fatter.
Please note this only applies to MODE=aarch64 on x86_64 builds when
you're using QEMU from Debian/Ubuntu rather than installing the one
cosmo provides in third_party/qemu/qemu-aarch64.gz. This change may
also be useful to people who are using the host compiler toolchain.
Added the implementation for `std::bad_any_cast` from upstream
`any.cpp`, and `std::bad_variant_access` from upstream `variant.cpp`.
This fixes missing `vtable` and `typeinfo` symbols when trying to link
code referencing these exception types.
- `__cxa_*` runtime functions are expected to be in the `abi` namespace,
which is currently an alias for `__cxxabiv1`.
- Rely on the header provided by `libcxxabi` for functions that we do
not implement ourselves anymore.
Some compiler flags (such as -E or -MM) instruct GCC to only run the
preprocessor and produce certain text files.
In this case, we do not want to run `fixupobj` and make the tool fail
because the input is not an ELF64 binary.
This will help C++ code that uses exceptions to be tinier. For example,
this change shaves away 1000 lines of assembly code from LLVM's libcxx,
which is 0.7% of all assembly instructions in the entire library.
We now store values in jmp_buf where the compiler wants them to be. This
fixes code that calls __builtin_setjmp() and __builtin_longjmp() such as
libunwind. All libcxxabi tests are now passing on ARM64.
See #1076
This test was was failing on GitHub Actions because GA uses Linux and
Linux supports resource usage accounting. Cosmo's compile.com program
imposes CPU, memory and file size limits on both the compiler and the
test programs themselves.
See #1076
Added the `libcxxabi` test suite as found in LLVM 17.0.6.
Some tests that do not apply to the current configuration of
comsopolitan are not added. These include:
- `backtrace_test`, `forced_unwind*`: Use unwind function unsupported in
SjLj mode.
- `noexception*`: Designed to test `libcxxabi` in no exceptions mode.
Some tests are added but not enabled due to bugs specific to GCC or
cosmopolitan. These are clearly indicated in the `BUILD.mk` file.
Renaming gc() to _gc() was a mistake since the better thing to do is put
it behind the _COSMO_SOURCE macro. We need this change because I haven't
wanted to use my amazing garbage collector ever since we renamed it. You
now need to define _COSMO_SOURCE yourself when using amalgamation header
and cosmocc users need to pass the -mcosmo flag to get the gc() function
Some other issues relating to cancelation have been fixed along the way.
We're also now putting cosmocc in a folder named `.cosmocc` so it can be
more safely excluded by grep --exclude-dir=.cosmocc --exclude-dir=o etc.
With `libunwind` and `libcxxabi` included in `libcosmo`, we can now
allow users to build C++ applications with exceptions and RTTI enabled.
The default is still disabling these two to avoid bloating the binary.
Closes#1065
* third_party: Add libcxxabi
Added libcxxabi from LLVM 17.0.6
The library implements the Itanium C++ exception handling ABI.
* third_party/libcxxabi: Enable __cxa_thread_atexit
Enable `__cxa_thread_atexit` from libcxxabi.
`__cxa_thread_atexit_impl` is still implemented by the cosmo libc.
The original `__cxa_thread_atexit` has been removed.
* third_party/libcxx: Build with exceptions
Build libcxx with exceptions enabled.
- Removed `_LIBCPP_NO_EXCEPTIONS` from `__config`.
- Switched the exception implementation to `libcxxabi`. These two files
are taken from the same `libcxx` version as mentioned in `README.cosmo`.
- Removed `new_handler_fallback` in favor of `libcxxabi` implementation.
- Enable `-fexceptions` and `-frtti` for `libcxx`.
- Removed `THIRD_PARTY_LIBCXX` dependency from `libcxxabi` and
`libunwind`. These libraries do not use any runtime `libcxx` functions,
just headers.
* libc: Remove remaining redundant cxa functions
- `__cxa_pure_virtual` in `libcxxabi` is also a stub similar to the
existing one.
- `__cxa_guard_*` from `libcxxabi` is used instead of the ones from
Android.
Now there should be no more duplicate implementations.
`__cxa_thread_atexit_impl`, `__cxa_atexit`, and related supporting
functions, are still left to other libraries as in `libcxxabi`.
`libcxxabi` is also now added to `cosmopolitan.a` to make up for the
removed functions.
Affected in-tree libraries (`third_party/double-conversion`) have been
updated.
The AMD HIP SDK for Linux ships prebuilt DSOs with an RWX PT_GNU_STACK
since old versions of GCC made it nearly impossible to build artifacts
where that wasn't the case, however modern glibc systems will flat out
refuse to link RWX DSOs from an execuatble that uses PT_GNU_STACK = RW
Now that cosmocc is unpacked into a version-specific directory under
cosmocc/, it makes more sense to put the versions in /opt/cosmocc and
maintain a symlink to the currently active one.
The cosmo_dlsym() function now returns the raw function address. You
need to call cosmo_dltramp() on the result, to make it safe to call.
This change is important, because cosmo_dltramp() magic can't always
work; for some tricky functions, you need to translate ABIs by hand.
Now we do them for assimilated binaries (except on OpenBSD or XNU
non-Silicon), for XnuSilicon, and for binaries with the preserve-
argv[0] auxv flag set. We check whether to pass the argv[0] value
at the test site rather than the Child site. We move a lot of the
test initialization into Child in the non-child case, in order to
get at the pre-init value of `__program_executable_name`. Finally,
we print out info about what we are skipping.
If cosmo_dlopen() is linked on AMD64 then the runtime will switch to
using %gs for thread-local storage. This eliminates the need for the
imported symbol trampoline. It's now safer to pass function pointers
back and forth with imported libraries. Your program gets recompiled
at runtime to make it happen and the overhead is a few milliseconds.
The toolchain will now be downloaded going forward from multiple pinned
URLs which have shasums. Either wget or curl must be installed.
This change unblocks #1053
Our dynamic linking implementation is now able to support functions with
dozens of parameters. In addition to having extra integral arguments you
can now pass vector registers using intrinsic types. Lastly, you can now
return multiple values, which is useful for functions returning structs.
It's now possible to pass flags like -Xaarch64-march=armv8.2-a+dotprod
so that cosmocc will use newer ARM ISAs. For AMD64 there's another one
worth mentioning, which looks like this: -Xx86_64-mssse3
This increases risk of fork bomb but is needed to support the NixOS.
Upstream dependencies of APE (uname, mkdir, dd, chmod, gzip, and mv)
will be removed from releases, and deleted from the cosmo.zip server
See #12
- Check `$COSMOCC`, defaulting to `/opt/cosmocc`.
- Try to get the full path of the repo `make.com`. I'm not aware of
a way of getting the path that defines a zsh function, so the best
fallback available is `$PWD`.
* Remove -f from loader usage
-f was removed in 1.5. As there is now only one flag, a couple more
bytes can be shaved off as well.
* Further loader golf
Shaves off a few bytes, paying for the cost of `RealPath` and then some
on x86_64 and offsetting some of the cost to aarch64.
* Shave off a few more bytes
Removes `-h` and flags from usage. Keeps flag-parsing logic the same,
i.e. still accepts `-h` / `--help`. Only difference is what fd and rc
the usage uses.
Still over 1k north of 8192.
* Reorder Launch arguments, pass aarch64 os
Third and fourth arguments are now identical between cosmo and Launch.
By passing sp as argument 4, we save a bit of register juggling.
Fourth argument (os) is now always passed by the loader on aarch64. It
is not yet processed by cosmo. Pushing this change separately, as the
cosmo side turns out to be somewhat more involved.
* cosmo2 receives os from loader
FreeBSD aarch64 now traps early rather than pretending to be Linux.
o/aarch64/examples/env.com still works on Linux and Xnu.
Now that our socket system call polyfills are good enough to support
Musl's DNS library we should be using that rather than the barebones
domain name system implementation we rolled on our own. There's many
benefits to making this change. So many, that I myself wouldn't feel
qualified to enumerate them all. The Musl DNS code had to be changed
in order to support Windows of course, which looks very solid so far
This commit and, by extension, PR attempts to update `stb` in the most
straightforward way possible as well as include fixes from main repo's
unmerged PRs for cases rearing their ugly heads during everyday usage:
- stb#1299: stb_rect_pack: Make rect_height_compare a stable sort
- stb#1402: stb_image: Fix "unused invalid_chunk" with STBI_FAILURE_USERMSG
- stb#1404: stb_image: Fix gif two_back memory address
- stb#1420: stb_image: Improve error reporting if file operations fail
within *_from_file functions
- stb#1445: stb_vorbis: Few static analyzers fixes
- stb#1487: stb_vorbis: Fix residue classdata bounding for
f->temp_memory_required
- stb#1490: stb_vorbis: Fix broken clamp in codebook_decode_deinterleave_repeat
- stb#1496: stb_image: Fix pnm only build
- stb#1497: stb_image: Fix memory leaks if stbi__convert failed
- stb#1498: stb_vorbis: Fix memory leaks in stb_vorbis
- stb#1499: stb_vorbis: Minor change to prevent the undefined behavior -
left shift of a negative value
- stb#1500: stb_vorbis: Fix signed integer overflow
Includes additional small fixes that I felt didn't warrant a separate PR.
Missed this when changing the code back to be like the old version.
com is now a parameter.
The only plausible way to trigger this would be to pass a loader
pathname close to MAX_PATH characters long, and then remove that
path prior to the first sys_faccessat.
This implements proposals 1 and 2a from this gist:
https://gist.github.com/mrdomino/2222cab61715fd527e82e036ba4156b1
The only reason to use realpath from the loader was to try to prevent a
TOCTOU between the loader and the binary. But this is only a real issue
in set-id contexts, and in those cases there is already a canonical way
to do it: `/dev/fd`, passed by the kernel to the loader, so all we have
to do is pass that along to the binary.
Aside from realpath, there is no reason to absolutize the path we supply
to the binary, since it can call `getcwd` as well as we can, and on non-
M1 the binary is in a much better position to make that call.
Since we no longer absolutize the path, the binary does need to do this,
so we make its argv-parsing code generic and apply that to the different
possible places the path could come from. This means that `_` is finally
usable as a relative path, as a nice side benefit.
The M1 realpath code had a significant bug - it uses the wrong offset to
truncate the `.ape` in the `$prog.ape` case.
This PR also fixes a regression in `ape $progname` out of `$PATH` on the
two BSDs (Free and Net) that did not implement `RealPath`.
`o/$mode/*` is passed through as-is. `o/*` other than `$mode` has
`$mode` inserted. `*` has `o/$mode/` prepended.
Really leveraging zsh default tab completion here; if you have built
things with `MODE=` you can leverage that for perfect tab completion
in other modes.
Fixes a regression in GetProgramExecutableName on Linux against old
loaders. In the loader case, /proc/self/exe gives the loader's path.
We tried to detect this by checking for `/usr/bin/ape`. But that is
only one of the possible places the loader could be.
Somehow or another, I previously had missed `BUILD.mk` files.
In the process I found a few straggler cases where the modeline was
different from the file, including one very involved manual fix where a
file had been treated like it was ts=2 and ts=8 on separate occasions.
The commit history in the PR shows the gory details; the BUILD.mk was
automated, everything else was mostly manual.
The ape loader now passes the program executable name directly as a
register. `x2` is used on aarch64, `%rdx` on x86_64. This is passed
as the third argument to `cosmo()` (M1) or `Launch` (non-M1) and is
assigned to the global `__program_executable_name`.
`GetProgramExecutableName` now returns this global's value, setting
it if it is initially null. `InitProgramExecutableName` first tries
exotic, secure methods: `KERN_PROC_PATHNAME` on FreeBSD/NetBSD, and
`/proc` on Linux. If those produce a reasonable response (i.e., not
`"/usr/bin/ape"`, which happens with the loader before this change),
that is used. Otherwise, if `issetugid()`, the empty string is used.
Otherwise, the old argv/envp parsing code is run.
The value returned from the loader is always the full absolute path
of the binary to be executed, having passed through `realpath`. For
the non-M1 loader, this necessitated writing `RealPath`, which uses
`readlinkat` of `"/proc/self/fd/[progfd]"` on Linux, `F_GETPATH` on
Xnu, and the `__realpath` syscall on OpenBSD. On FreeBSD/NetBSD, it
punts to `GetProgramExecutableName`, which is secure on those OSes.
With the loader, all platforms now have a secure program executable
name. With no loader or an old loader, everything still works as it
did, but setuid/setgid is not supported if the insecure pathfinding
code would have been needed.
Fixes#991.
Using this shell script:
#!/bin/sh
mkdir -p exe
for f in $(findpe); do
if [ -e exe/${f##*/}.exe ]; then
cp $f exe/${f##*/}-$(rand64).exe
else
cp $f exe/${f##*/}.exe
fi
done
rm -f /mnt/videos/microsoft.zip
zip -rj6 /mnt/videos/microsoft.zip exe
echo /mnt/videos/microsoft.zip
Helps file reports with Microsoft about incorrect AV detections.
See #1003
Please use https://github.com/mozilla-Ocho/llamafile which is better,
newer, and built on cosmocc. If you need the RadPajama model, file an
issue with llamafile asking for support.
We have received multiple reports of GCC breaking builds when compiler
flags like `-std=c11` were being passed. The workaround until the next
release is to simply not define `__STRICT_ANSI__` which is a bad idea.
At least in neovim, `│vi:` is not recognized as a modeline because it
has no preceding whitespace. After fixing this, opening a file yields
an error because `net` is not an option. (`noet`, however, is.)
* Introduce env.com
Handy tool for debugging environment issues.
* Inject path as COSMOPOLITAN_PROGRAM_EXECUTABLE
`argv[0]` was previously being used as a communication channel between
the loader and the binary, giving the binary its full path for use e.g.
in `GetProgramExecutableName`. But `argv[0]` is not a good channel for
this; much of what made 2a3813c6 so gross is due to that.
This change fixes the issue by preserving `argv[0]` and establishing a
new communication channel: `COSMOPOLITAN_PROGRAM_EXECUTABLE`.
The M1 loader will always set this as the first variable. Linux should
soon follow. On the other side, `GetProgramExecutableName` checks that
variable first. If it sees it, it trusts it as-is.
A lot of the churn in `ape/ape-m1.c` in this change is actually backing
out hacks introduced in 2a3813c6; the best comparison is:
git diff 2a3813c6^..
* ape loader: $prog.ape + login shell support
If the ape loader is invoked with `$0 = $prog.ape`, then it searches for
a `$prog` in the same directory as it and loads that. In particular, the
loader searches the `PATH` for an executable named `$prog.ape`, then for
an executable named `$prog` in the same directory. If the former but not
the latter is found, the search terminates with an error.
It also handles the special case of getting started as `-$SHELL`, which
getty uses to indicate that the shell is a login shell. The path is not
searched in this case, and the program location is read straight out of
the `SHELL` variable.
It is now possible to have `/usr/local/bin/zsh.ape` act as a login shell
for a `/usr/local/bin/zsh` αpε, insofar as the program will get started
with the 'correct' args. Unfortunately, many things break if `$0` is not
the actual full path of the executable being run; for example, backspace
does not update the display properly.
To work around the brokenness introduced by not having `$0` be the full
path of the binary, we cut the leading `-` out of `argv[0]` if present.
This gets the loader's behavior with `$prog.ape` up to par, but doesn't
tell login shells that they are login shells.
So we introduce a hack to accomplish that: if ape is run as `-$prog.ape`
and the shell is `$prog`, the binary that is loaded has a `-l` flag put
into its first argument.
As of this commit, αpε binaries can be used as login shells on OSX.
* if islogin, execfn = shell
Prior to this, execfn was not being properly set for login shells that
did not receive `$_`, which was the case for iTerm2 on Mac. There were
no observable consequences of this, but fixing it seems good anyway.
* Fix auxv location calculation
In the non-login-shell case, it was leaving a word of uninitialized
memory at `envp[i] + 1`. This reuses the previous calculation based
on `envp`.
* Better refcounting
Cribbed from [Rust Arc][1] and the [Boost docs][2]:
"""
Increasing the reference counter can always be done with
memory_order_relaxed: New references to an object can only be formed
from an existing reference, and passing an existing reference from one
thread to another must already provide any required synchronization.
It is important to enforce any possible access to the object in one
thread (through an existing reference) to happen before deleting the
object in a different thread. This is achieved by a "release" operation
after dropping a reference (any access to the object through this
reference must obviously happened before), and an "acquire" operation
before deleting the object.
It would be possible to use memory_order_acq_rel for the fetch_sub
operation, but this results in unneeded "acquire" operations when the
reference counter does not yet reach zero and may impose a performance
penalty.
"""
[1] https://moshg.github.io/rust-std-ja/src/alloc/arc.rs.html
[2] https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
* Make ZiposHandle's pos atomic
Implements a somewhat stronger guarantee than POSIX specifies: reads and
seeks are atomic. They may be arbitrarily reordered between threads, but
each one happens all the way and leaves the fd in a consistent state.
This is achieved by "locking" pos in __zipos_read by storing SIZE_MAX to
pos during the operation, so only one can be in-flight at a time. Seeks,
on the other hand, just update pos in one go, and rerun if it changed in
the meantime.
I used `LIKELY` / `UNLIKELY` to pessimize the concurrent case; hopefully
that buys back some performance.
On aarch64 hosts, MODE= is changed to MODE=aarch64, so o// targets don't
work. So On aarch64, get apelink.com out of o/aarch64/. Also prepend ape
when calling it. And finally, fetch with curl when wget isn't installed.
* Implement __zipos_dup
Makes ZiposHandle reference-counted by an `rc` field in a union with its
freelist `next` pointer. The functions `__zipos_free` and `__zipos_keep`
function as incref/decref for it. Adds `__zipos_postdup` to fix metadata
on file descriptors after dup-like operations, and adds zipos support to
`sys_dup_nt` + `sys_close_nt`.
* Remove noop __zipos_postdup
rc is never a zipos file because it is always a previously unused file
descriptor. fd is never a zipos file because that case has been handled
above by __zipos_fcntl.
Includes additional fixes from main repo's unmerged PRs:
- quickjs#132: Fix undefined behavior: shift 32 bits for uint32_t in bf_set_ui
- quickjs#171: Fix locale-aware representation of hours in Date class
- quickjs#182: Fix stack overflow in CVE-2023-31922
On UNIX if dup2(newfd) was a ZipOS file descriptor, then its resources
weren't being released, and the newly created file descriptor would be
mistaken for ZipOS due to its memory not being cleared. On Windows, an
issue also existed relating to newfd resources not being released.
I'm no longer able to reproduce the PE import table corruption that was
previously observed. Since this optimization shaves up to 64kb off each
fat APE binary we build, it's worth turning this back on again, to wait
and see if something breaks, and if so, fix it. At least until the next
release is shipped.
See #965
This change upgrades to superconfigure z0.0.23 which fixes an issue
where the compiler had harmless /home/... paths baked-in, which are
normally only present in the build environment, and usually skipped
over. Sadly on MacOS calling fstatat() on these paths would lead to
cloud file system ops that caused system calls to take a long time.
That's problematic, since cosmocc needs to be a 100% local command.
This change fixes a regression that happened some time ago when building
for AARCH64 using the vendored toolchain rather than cosmocc. The errors
that would show up `Relocations in generic ELF (EM: 62)` have been fixed
Munging of paths passed inside the system() interpreter command is no
longer supported. You have to pass your paths to posix_spawn() or the
execve() family of functions if you want them to be munged. The first
three characters must match `^/[a-z]/` in which case, it'll be turned
into a DOS-style drive path with backslashes.
- Introduce MAP_JIT which is zero on other platforms
- Invent __jit_begin() and __jit_end() which wrap Apple's APIs
- Runtime dispatch to sys_icache_invalidate() in __clear_cache()
- Use good ELF technique in cosmo_dlopen()
- Make strerror() conform more to other libc impls
- Introduce __clear_cache() and use it in cosmo_dlopen()
- Remove libc/fmt/fmt.h header (trying to kill off LIBC_FMT)
It turns out my earlier commit ddc08dc974 caused a build with
MODE=aarch64 to fail. The commit changed deathstar.c to link
in code to support a VGA console, but this is not implemented
yet for AArch64. Thanks to @ahgamut for spotting this issue.
The test was failing if the process's umask happened to be
0077, for example. The file `foo` was then created with a
file mode of 0100600, rather than the expected 0100644.
After hearing horror stories from a trusted colleague, I don't think
this is the kind of API we want to be supporting. Also SQLite wisdom
regarding fdatasync() has been added to the documentation.
Imported functions are now aspected with a trampoline that blocks
signals and changes the thread-local storage register. This means
bigger more complicated libraries can now be imported even though
the whole technique remains fundamentally unsafe.
* [metal] Ensure DF is clear when calling C from exception handler
* [metal] Mark some internal routines and declarations as `@internal`
* [metal] Fix crash under UEFI when command line string is NULL
* [metal] Fix argc & argv[] setting, & VM page freeing, for UEFI
Part of the memory occupied by the argv[] contents was
erroneously used for page tables & then later erroneously
freed. The symptom was that argv[0] would show up as an
empty string ("").
We torture test dlmalloc() in test/libc/stdio/memory_test.c. That test
was crashing on occasion on Apple M1 microprocessors when dlmalloc was
using *NSYNC locks. It was relatively easy to spot the cause, which is
this one particular compare and swap operation, which needed to change
to use sequentially-consistent ordering rather than an acquire barrier
We now have an `#include <cxxabi.h>` header which defines all the APIs
Cosmopolitan's implemented so far. The `cosmocc` README.md file is now
greatly expanded with documentation.
The `cosmocc` compiler is now being distributed as a self-contained
toolchain that's path-agnostic and it no longer requires you clone the
Cosmop repo to use it. The bin/ folder has been deleted from the mono
repo. The `fatcosmocc` command has been renamed to `cosmocc`. MacOS
support now works very well.
Our makefile generator now accepts badly formatted include lines. It's
now more hermetic with better error checking in the cosmo repo, and it
can be configured to not be hermetic at all.
Using `cosmocc -std=c11` was causing `ucontext_t` to become misaligned.
This change also adds the GNU constants on x86_64 for accessing general
registers, so you will not need `#ifdef`s to support both Cosmo and GNU
This change addresses a $PATH resolution issue where APE depends on
uname and uname is an APE program. So sorry to anyone this impacted
we'll get a release out soon.
wait4() is now solid enough to run `make -j100` on Windows. You can now
use MSG_DONTWAIT on Windows. There was a handle leak in accept() that's
been fixed. Our WIN32 overlapped i/o code has been simplified. Priority
class now inherits into subprocesses, so the verynice command will work
and the signal mask will now be inherited by execve() and posix_spawn()
description:Used to request enhancements for cosmopolitan
title:"Feature Request: "
labels:["enhancement"]
body:
- type:markdown
attributes:
value:|
[Please post your idea first in Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed needs to be implemented.](https://github.com/jart/cosmopolitan/discussions/categories/ideas)
- type:checkboxes
id:prerequisites
attributes:
label:Prerequisites
description:Please confirm the following before submitting your enhancement request.
options:
- label:I am running the latest code. Mention the version if possible as well.
required:true
- label:I carefully followed the [README.md](https://github.com/jart/cosmopolitan/blob/master/README.md).
required:true
- label:I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
required:true
- label:I reviewed the [Discussions](https://github.com/jart/cosmopolitan/discussions), and have a new and useful enhancement to share.
required:true
- type:textarea
id:feature-description
attributes:
label:Feature Description
description:Please provide a detailed written description of what you were trying to do, and what you expected `cosmopolitan` to do as an enhancement.
placeholder:Detailed description of the enhancement
validations:
required:true
- type:textarea
id:motivation
attributes:
label:Motivation
description:Please provide a detailed written description of reasons why this feature is necessary and how it is useful to `cosmopolitan` users.
placeholder:Explanation of why this feature is needed and its benefits
validations:
required:true
- type:textarea
id:possible-implementation
attributes:
label:Possible Implementation
description:If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
placeholder:Detailed description of potential implementation
Don't forget to check for any [duplicate research issue tickets](https://github.com/jart/cosmopolitan/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)
- type:checkboxes
id:research-stage
attributes:
label:Research Stage
description:Track general state of this research ticket
options:
- label:Background Research (Let's try to avoid reinventing the wheel)
- label:Hypothesis Formed (How do you think this will work and it's effect?)
- label:Strategy / Implementation Forming
- label:Analysis of results
- label:Debrief / Documentation (So people in the future can learn from us)
- type:textarea
id:background
attributes:
label:Previous existing literature and research
description:Whats the current state of the art and whats the motivation for this research?
- type:textarea
id:hypothesis
attributes:
label:Hypothesis
description:How do you think this will work and it's effect?
- type:textarea
id:implementation
attributes:
label:Implementation
description:Got an approach? e.g. a PR ready to go?
- type:textarea
id:analysis
attributes:
label:Analysis
description:How does the proposed implementation behave?
- type:textarea
id:logs
attributes:
label:Relevant log output
description:Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
description:Used to track refactoring opportunities
title:"Refactor: "
labels:["refactor"]
body:
- type:markdown
attributes:
value:|
Don't forget to [check for existing refactor issue tickets](https://github.com/jart/cosmopolitan/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
Also you may want to check [Pull request refactor label as well](https://github.com/jart/cosmopolitan/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates too.
- type:textarea
id:background-description
attributes:
label:Background Description
description:Please provide a detailed written description of the pain points you are trying to solve.
placeholder:Detailed description behind your motivation to request refactor
validations:
required:true
- type:textarea
id:possible-approaches
attributes:
label:Possible Refactor Approaches
description:If you have some idea of possible approaches to solve this problem. You may want to make it a todo list.
placeholder:Your idea of possible refactoring opportunity/approaches
You can run your ELF AARCH64 executable on Apple Silicon as follows:
```sh
ape ./llama.com
```
If you want to run the `MODE=aarch64` unit tests, you need to have
qemu-aarch64 installed as a binfmt_misc interpreter. It needs to be a
static binary if you want it to work with Landlock Make's security. You
can use the build included in our `third_party/qemu/` folder.
```
doas cp o/third_party/qemu/qemu-aarch64 /usr/bin/qemu-aarch64
doas sh -c "echo ':qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-aarch64:CF' > /proc/sys/fs/binfmt_misc/register"
make -j8 m=aarch64
```
Please note that the qemu-aarch64 binfmt_misc interpreter installation
process is *essential* for being able to use the `aarch64-unknown-cosmo`
toolchain to build fat APE binaries on your x86-64 machine.
## AMD64 + ARM64 fat APE binaries
If you've setup the qemu binfmt_misc interpreter, then you can can use
cosmo's toolchains to build fat ape binaries. It works by compiling your
program twice, so you can have a native build for both architectures in
the same file. The two programs are merged together by apelink.com which
also embeds multiple copies of APE loader and multiple symbols tables.
The easiest way to build fat APE is using `fatcosmocc`. This compiler
works by creating a concomitant `.aarch64/foo.o` for every `foo.o` you
compile. The only exception is the C preprocessor mode, which actually
runs x86-64 GCC except with macros like `__x86_64__` undefined.
This toolchain works great for C projects that are written in a portable
way and don't produce architecture-specific artifacts. One example of a
large project that can be easily built is GNU coreutils.
```sh
cd coreutils
fatcosmocc --update ||exit
./configure CC=fatcosmocc \
AR=fatcosmoar \
INSTALL=$(command -v fatcosmoinstall) \
--prefix=/opt/cosmos \
--disable-nls \
--disable-dependency-tracking \
--disable-silent-rules
make -j8
```
You'll then have a bunch of files like `src/ls` which are fat ape
binaries. If you want to run them on Windows, then you simply need to
rename the file so that it has the `.com` suffix. Better yet, consider
making that a symlink (a.k.a. reparse point). The biggest gotcha with
`fatcosmocc` though is ensuring builds don't strip binaries. For
example, Linux's `install -s` command actually understands Windows'
Portable Executable format well enough to remove the MS-DOS stub, which
is where the APE shell script is stored. You need to ensure that
`fatcosmoinstall` is used instead. Especially if your project needs to
install the libraries built by `fatacosmoar` into `/opt/cosmos`.
## Advanced Fat APE Builds
Once you get seriously involved in creating fat APE builds of software
you're going to eventually outgrow `fatcosmocc`. One example is Emacs
which is trickier to build, because it produces architecture-specific
files, and it also depends on shared files, e.g. zoneinfo. Since we like
having everything in a neat little single-file executable container that
doesn't need an "installation wizard", this tutorial will explain how we
manage to accomplish that.
What you're going to do is, instead of using `fatcosmocc`, you're going
to use both the `x86_64-unknown-cosmo-cc` and `aarch64-unknown-cosmo-cc`
toolchains independently, and then run `apelink` and `zip` to manually
build the final files. But there's a few tricks to learn first.
The first trick is to create a symlink on your system called `/zip`.
Cosmopolitan Libc normally uses that as a synthetic folder that lets you
access the assets in your zip executable. But since that's a read-only
file system, your build system should use the normal one.
```sh
doas ln -sf /opt/cosmos /zip
```
Now create a file named `rebuild-fat.sh` which runs the build twice:
```sh
#!/bin/sh
set -ex
export MODE=aarch64
export COSMOS=/opt/cosmos/aarch64
rebuild-cosmos.sh aarch64
export MODE=
export COSMOS=/opt/cosmos/x86_64
rebuild-cosmos.sh x86_64
wall.com 'finished building'
```
Then create a second file `rebuild-cosmos.sh` which runs your build: