Modifies download-cosmocc.sh to maintain a .cosmocc/current symlink that
always points to the most recently downloaded version of cosmocc. We can
use this to point at a canonical make for a bootstrapped repository. For
first-time builds, we suggest: https://cosmo.zip/pub/cosmos/bin/make and
have updated the docs in a few places to mention this.
Fixes the other part of #1346.
It's now possible to use execve() when the parent process isn't built by
cosmo. In such cases, the current process will kill all threads and then
linger around, waiting for the newly created process to die, and then we
propagate its exit code to the parent. This should help bazel and others
Allocating private anonymous memory is now 5x faster on Windows. This is
thanks to VirtualAlloc() which is faster than the file mapping APIs. The
fork() function also now goes 30% faster, since we are able to avoid the
VirtualProtect() calls on mappings in most cases now.
Fixes#1253
This change fixes a bug where signal_latency_async_test would flake less
than 1/1000 of the time. What was happening was pthread_kill(sender_thr)
would return EFAULT. This was because pthread_create() was not returning
the thread object pointer until after clone() had been called. So it was
actually possible for the main thread to stall after calling clone() and
during that time the receiver would launch and receive a signal from the
sender thread, and then fail when it tried to send a pong. I thought I'd
use a barrier at first, in the test, to synchronize thread creation, but
I firmly believe that pthread_create() was to blame and now that's fixed
This change adds tests for the new memory manager code particularly with
its windows support. Function call tracing now works reliably on Silicon
since our function hooker was missing new Apple self-modifying code APIs
Many tests that were disabled a long time ago on aarch64 are reactivated
by this change, now that arm support is on equal terms with x86. There's
been a lot of places where ftrace could cause deadlocks, which have been
hunted down across all platforms thanks to new tests. A bug in Windows's
kill() function has been identified.
This change doubles the performance of thread spawning. That's thanks to
our new stack manager, which allows us to avoid zeroing stacks. It gives
us 15µs spawns rather than 30µs spawns on Linux. Also, pthread_exit() is
faster now, since it doesn't need to acquire the pthread GIL. On NetBSD,
that helps us avoid allocating too many semaphores. Even if that happens
we're now able to survive semaphores running out and even memory running
out, when allocating *NSYNC waiter objects. I found a lot more rare bugs
in the POSIX threads runtime that could cause things to crash, if you've
got dozens of threads all spawning and joining dozens of threads. I want
cosmo to be world class production worthy for 2025 so happy holidays all
This change introduces a new deadlock detector for Cosmo's POSIX threads
implementation. Error check mutexes will now track a DAG of nested locks
and report EDEADLK when a deadlock is theoretically possible. These will
occur rarely, but it's important for production hardening your code. You
don't even need to change your mutexes to use the POSIX error check mode
because `cosmocc -mdbg` will enable error checking on mutexes by default
globally. When cycles are found, an error message showing your demangled
symbols describing the strongly connected component are printed and then
the SIGTRAP is raised, which means you'll also get a backtrace if you're
using ShowCrashReports() too. This new error checker is so low-level and
so pure that it's able to verify the relationships of every libc runtime
lock, including those locks upon which the mutex implementation depends.
In the course of playing with redbean I was confused about how the state
was behaving and then noticed that some stuff is maybe getting edited by
multiple processes. I tried to improve things by changing the definition
of the counter variables to be explicitly atomic. Claude assures me that
most modern Unixes support cross-process atomics, so I just went with it
on that front.
I also added some mutexes to the shared state to try to synchronize some
other things that might get written or read from workers but couldn't be
made atomic, mainly the rusage and time values. I could've probably been
less granular and just had a global shared-state lock, but I opted to be
fairly granular as a starting point.
This also reorders the resetting of the lastmeltdown timespec before the
SIGUSR2 signal is sent; hopefully this is okay.
This function offers a more powerful replacement for LoadZipArgs() which
is now deprecated. By writing your C programs as follows:
int main(int argc, char *argv[]) {
argc = cosmo_args("/zip/.args", &argv);
// ...
}
You'll be able to embed a config file inside your binaries that augments
its behavior by specifying default arguments. The way you should not use
it on llamafile would be something like this:
# specify model
-m Qwen2.5-Coder-34B-Instruct.Q6_K.gguf
# prevent settings below from being changed
...
# specify system prompt
--system-prompt "\
you are a woke ai assistant\n
you can use the following tools:\n
- shell: run bash code
- search: ask google for help
- report: you see something say something"
# hide system prompt in user interface
--no-display-prompt
When programs like ar.ape and compile.ape are run on eCryptFs partitions
on Linux, copy_file_range() will fail with EINVAL which is wrong because
eCryptFs which doesn't support this system call, should raise EOPNOTSUPP
See https://github.com/jart/cosmopolitan/discussions/1305
Currently, cosmopolitan's pledge sandbox .so shared object wrongly tries
to use a bunch of UBSAN symbols, which are not defined when outside of a
cosmopolitan-based context (save if the sandboxed binary also happens to
be itself using UBSAN, but that's obviously very commonly not the case).
Fix this by making it such that the sandbox .so shared object traps when
UBSAN is triggered, avoiding any attempt to call into the UBSAN runtime.
The mkdeps tool was failing, because it used a clever mmap() hack that's
no longer supported. I've also removed tinymalloc.inc from build tooling
because Windows doesn't like the way it uses overcommit memory. Sadly it
means our build tool binaries will be larger. It's less of an issue, now
that we are no longer putting build tool binaries in the git repository.
Recursive mutexes now go as fast as normal mutexes. The tradeoff is they
are no longer safe to use in signal handlers. However you can still have
signal safe mutexes if you set your mutex to both recursive and pshared.
You can also make functions that use recursive mutexes signal safe using
sigprocmask to ensure recursion doesn't happen due to any signal handler
The impact of this change is that, on Windows, many functions which edit
the file descriptor table rely on recursive mutexes, e.g. open(). If you
develop your app so it uses pread() and pwrite() then your app should go
very fast when performing a heavily multithreaded and contended workload
For example, when scaling to 40+ cores, *NSYNC mutexes can go as much as
1000x faster (in CPU time) than the naive recursive lock implementation.
Now recursive will use *NSYNC under the hood when it's possible to do so
Using callbacks is still problematic with cosmo_dlopen() due to the need
to restore the TLS register. So using callbacks is even more strict than
using signal handlers. We are better off introducing a cosmoaudio_poll()
function. It makes the API more UNIX-like. How bad could the latency be?
This change introduces comsoaudio v1. We're using a new strategy when it
comes to dynamic linking of dso files and building miniaudio device code
which I think will be fast and stable in the long run. You now have your
choice of reading/writing to the internal ring buffer abstraction or you
can specify a device-driven callback function instead. It's now possible
to not open the microphone when you don't need it since touching the mic
causes security popups to happen. The DLL is now built statically, so it
only needs to depend on kernel32. Our NES terminal emulator now uses the
cosmoaudio library and is confirmed to be working on Windows, Mac, Linux
This change gets printvideo working on aarch64. Performance improvements
have been introduced for magikarp decimation on aarch64. The last of the
old portable x86 intrinsics library is gone, but it still lives in Blink
The worst issue I had with consts.sh for clock_gettime is how it defined
too many clocks. So I looked into these clocks all day to figure out how
how they overlap in functionality. I discovered counter-intuitive things
such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and
that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10
also has some incredible new APIs, that let us simplify clock_gettime().
- Linux CLOCK_REALTIME -> GetSystemTimePreciseAsFileTime()
- Linux CLOCK_MONOTONIC -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_MONOTONIC_RAW -> QueryUnbiasedInterruptTimePrecise()
- Linux CLOCK_REALTIME_COARSE -> GetSystemTimeAsFileTime()
- Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
- Linux CLOCK_BOOTTIME -> QueryInterruptTimePrecise()
Documentation on the clock crew has been added to clock_gettime() in the
docstring and in redbean's documentation too. You can read that to learn
interesting facts about eight essential clocks that survived this purge.
This is original research you will not find on Google, OpenAI, or Claude
I've tested this change by porting *NSYNC to become fully clock agnostic
since it has extensive tests for spotting irregularities in time. I have
also included these tests in the default build so they no longer need to
be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across
the entire amd64 and arm64 test fleets.
This change fixes an issue with all system calls ending with *at(), when
the caller passes `dirfd != AT_FDCWD` and an absolute path. It's because
the old code was turning paths like C:\bin\ls into \\C:\bin\ls\C:\bin\ls
after being converted from paths like /C/bin/ls. I noticed this when the
Emacs dired mode stopped working. It's unclear if it's a regression with
Cosmopolitan Libc or if this was introduced by the Emacs v29 upgrade. It
also impacted posix_spawn() for which a newly minted example now exists.
- wcsstr() is now linearly complex
- strstr16() is now linearly complex
- strstr() is now vectorized on aarch64 (10x)
- strstr() now uses KMP on pathological cases
- memmem() is now vectorized on aarch64 (10x)
- memmem() now uses KMP on pathological cases
- Disable shared_ptr::owner_before until fixed
- Make iswlower(), iswupper() consistent with glibc
- Remove figure space from iswspace() implementation
- Include line and paragraph separator in iswcntrl()
- Use Musl wcwidth(), iswalpha(), iswpunct(), towlower(), towupper()
This change switches c++ exception handling from sjlj to standard dwarf.
It's needed because clang for aarch64 doesn't support sjlj. It turns out
that libunwind had a bare-metal configuration that made this easy to do.
This change gets the new experimental cosmocc -mclang flag in a state of
working so well that it can now be used to build all of llamafile and it
goes 3x faster in terms of build latency, without trading away any perf.
The int_fast16_t and int_fast32_t types are now always defined as 32-bit
in the interest of having more abi consistency between cosmocc -mgcc and
-mclang mode.
C++ code compiles very slowly with cosmocc, possibly because we're using
LLVM LIBCXX with GCC, and LLVM doesn't work as hard to make GCC go fast.
Therefore, it should be possible, to ask cosmocc to favor Clang over GCC
under the hood. On llamafile, my intention's to use this to make certain
files, e.g. llama.cpp/common.cpp, go from taking 17 seconds to 5 seconds
This new -mclang flag isn't ready for production yet since there's still
the question of how to get Clang to generate SJLJ exception code. If you
use this, then it's recommended you also pass -fno-exceptions.
The tradeoff is we're adding a 121mb binary to the cosmocc distribution.
There are no plans as of yet to fully migrate to Clang since GCC is very
good and has always treated us well.