This change makes a second pass at fixing the errno issue with libcxx's
filesystem code. Previously, 89.01% of LLVM's test suite was passing and
now 98.59% of their tests pass. Best of all, it's now possible for Clang
to be built as a working APE binary that can compile the Cosmopolitan
repository. Please note it has only been vetted so far for some objects,
and more work would obviously need to be done in cosmo to fix warnings.
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.
threads  count  task
1        1062   fork+exit+wait
2        668    fork+exit+wait
4        66     fork+exit+wait
8        19     fork+exit+wait
16       22     fork+exit+wait
32       16     fork+exit+wait
Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to implement
condition variables.
threads  count  task
1        1085   fork+exit+wait
2        842    fork+exit+wait
4        532    fork+exit+wait
8        400    fork+exit+wait
16       276    fork+exit+wait
32       66     fork+exit+wait
With OpenBSD, which also lacks vfork(), things were just as bad as on NetBSD.
threads  count  task
1        584    fork+exit+wait
2        687    fork+exit+wait
4        206    fork+exit+wait
8        24     fork+exit+wait
16       33     fork+exit+wait
32       26     fork+exit+wait
But since OpenBSD has futexes, fork() works terrifically thanks to *NSYNC.
threads  count  task
1        525    fork+exit+wait
2        580    fork+exit+wait
4        451    fork+exit+wait
8        479    fork+exit+wait
16       408    fork+exit+wait
32       373    fork+exit+wait
This issue would most likely only manifest itself when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating.
Needless to say, vfork() rules, and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.
threads  count  task
1        2559   vfork+exit+wait
2        5389   vfork+exit+wait
4        34933  vfork+exit+wait
8        43273  vfork+exit+wait
16       49648  vfork+exit+wait
32       40247  vfork+exit+wait
So it's a shame that so few OSes support vfork(). It creates an unsavory
situation where someone wanting to build a server that spawns processes
would be better served by avoiding threads in favor of a multiprocess model.
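The fork+exit+wait task in the tables above can be approximated with a loop like the one below. This is a minimal sketch. It assumes each thread simply counts round trips within a fixed time window. The harness that produced the numbers isn't part of this change, and `count_fork_exit_wait()` and `bench_threads()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall clock in milliseconds. */
static double now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Counts fork+exit+wait round trips completed within budget_ms, or -1. */
long count_fork_exit_wait(double budget_ms) {
  long count = 0;
  double deadline = now_ms() + budget_ms;
  while (now_ms() < deadline) {
    int status;
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(0); /* child: exit right away */
    if (waitpid(pid, &status, 0) == -1) return -1;
    ++count;
  }
  return count;
}

struct job {
  double budget_ms;
  long count;
};

static void *worker(void *arg) {
  struct job *j = arg;
  j->count = count_fork_exit_wait(j->budget_ms);
  return NULL;
}

/* Runs the loop on nthreads threads at once and sums their counts. */
long bench_threads(int nthreads, double budget_ms) {
  pthread_t th[64];
  struct job jobs[64];
  int started = 0;
  long total = 0;
  if (nthreads < 1 || nthreads > 64) return -1;
  for (; started < nthreads; ++started) {
    jobs[started].budget_ms = budget_ms;
    if (pthread_create(&th[started], NULL, worker, &jobs[started])) break;
  }
  for (int i = 0; i < started; ++i) {
    pthread_join(th[i], NULL);
    if (jobs[i].count < 0) total = -1;
    else if (total >= 0) total += jobs[i].count;
  }
  return started == nthreads ? total : -1;
}
```

For example, `bench_threads(8, 300)` runs eight threads for 300 milliseconds and returns their combined count.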
The cosmocc.zip toolchain will now include four builds of the libcosmo.a
runtime libraries. You can pass the -mdbg flag if you want to debug your
cosmopolitan runtime. You can pass the -moptlinux flag if you don't want
windows code lurking in your binary. See tool/cosmocc/README.md for more
details on how these flags may be used and their important implications.
- NetBSD should now have faster synchronization
- POSIX barriers may now be shared across processes
- An edge case with memory map tracking has been fixed
- Grand Central Dispatch is no longer used on MacOS ARM64
- POSIX mutexes in normal mode now use futexes across processes
Cosmopolitan now supports mremap(), which is only supported on Linux and
NetBSD. First, it allows memory mappings to be relocated without copying
them; this can dramatically speed up data structures like std::vector if
the array size grows larger than 256kb. The mremap() system call is also
10x faster than munmap() when shrinking large memory mappings.
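Below is a sketch of growing an anonymous mapping. It assumes the Linux mremap() signature with MREMAP_MAYMOVE; NetBSD's mremap() takes different arguments. `grow_mapping()` is a hypothetical helper. Off Linux, the sketch falls back to mmap() plus memcpy(), which is exactly the copy that mremap() avoids.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Grows an anonymous read/write mapping from old_size to new_size bytes.
   Returns the (possibly relocated) mapping, or NULL on failure, in which
   case the original mapping is left intact. */
void *grow_mapping(void *p, size_t old_size, size_t new_size) {
#ifdef __linux__
  /* The kernel moves the page table entries; no bytes are copied. */
  void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
  return q == MAP_FAILED ? NULL : q;
#else
  /* Portable fallback: allocate, copy, then release the old mapping. */
  void *q = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (q == MAP_FAILED) return NULL;
  memcpy(q, p, old_size);
  munmap(p, old_size);
  return q;
#endif
}
```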
There are now two functions, getpagesize() and getgransize(), which help
to write portable code that uses mmap(MAP_FIXED). Alternatively, sysconf()
may be called with our new _SC_GRANSIZE. The madvise() system call now has
a better wrapper with improved documentation.
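Portable code might align a MAP_FIXED address as sketched below. The sketch assumes that _SC_GRANSIZE is visible as a preprocessor macro under cosmocc, and it falls back to the page size on other systems. `map_granularity()` and `round_up_to_granularity()` are hypothetical names.

```c
#include <stdint.h>
#include <unistd.h>

/* Alignment that mmap(MAP_FIXED) addresses should honor. Uses _SC_GRANSIZE
   where it is defined (assumed: Cosmopolitan), else the page size. */
long map_granularity(void) {
#ifdef _SC_GRANSIZE
  return sysconf(_SC_GRANSIZE);
#else
  return sysconf(_SC_PAGESIZE);
#endif
}

/* Rounds addr up to the next multiple of the power-of-two granularity. */
uintptr_t round_up_to_granularity(uintptr_t addr) {
  uintptr_t g = (uintptr_t)map_granularity();
  return (addr + g - 1) & ~(g - 1);
}
```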
It's now possible to create thousands upon thousands of sparse independent
memory mappings without any slowdown. The memory manager is now better at
tracking memory protection, particularly on Windows, in a precise way
that can be restored during fork(). You now have the highest quality memory
manager possible. It's even better than some OSes like XNU, where mmap()
is implemented as an O(n) operation, which means sadly things aren't much
improved over there. With this change the llamafile HTTP server endpoint
at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec.
This change reduces o/tiny/examples/life from 44kb to 24kb in size, since
it avoids linking mmap() when unnecessary. This is important for helping
cosmo not completely lose touch with its roots.
The Cosmopolitan Compiler Collection now includes the following programs:
- `ar.ape` is a faster alternative to `ar rcsD` for creating deterministic
static archives. It's ~10x faster than GNU ar because it isn't quadratic.
It'll even outperform LLVM ar by 2x, thanks to writev/copy_file_range.
- `sha256sum.ape` is a faster alternative to the `sha256sum` command. It
goes 2x faster since it leverages vectorized assembly implementations.
- `resymbol` is a brand new program we invented, like objcopy, that lets
you rename all the global symbols in a .o file to have a new suffix or
prefix. In the future, this will be used by cosmocc automatically when
building -O3 math kernels that need to be vectorized for all hardware.
- `gzip.ape` is a faster version of the `gzip` command that is included
by most Linux distros. It gains better performance using Chromium Zlib,
which, once again, includes highly optimized assembly that Mark Adler
won't merge into the official MS-DOS compatible zlib codebase.
- `cocmd` is the cosmopolitan shell. It can serve as a faster `sh -c`
alternative to bash and dash if you put `SHELL = /opt/cosmocc/bin/cocmd`
at the top of your Makefile. Please note you should be using the cosmo
fork of GNU make (already included), since normal make won't recognize
this as a Bourne-compatible shell and will disable its execve()
optimization, which makes things slower. In some ways normal make would
be right: cocmd doesn't have a complete POSIX shell implementation.
However, it's enough for cosmo's mono repo. It also implements faster
behaviors in some respects.
The following programs are also introduced, though they aren't as
interesting. The main reason they're here is so Cosmopolitan's mono repo
will be able to remove build/bootstrap/ in future editions. That way we
can keep build utilities better up to date without bloating the git
history much.
- `chmod.ape` for hermeticity
- `cp.ape` for hermeticity
- `echo.ape` for hermeticity
- `objbincopy` is an objcopy-like tool that's used to build the ape loader
- `package.ape` is used for strict dependency checking of the object graph
- `rm.ape` for hermeticity
- `touch.ape` for hermeticity
We now have a C++ red-black tree implementation that implements standard
template library compatible APIs while compiling 10x faster than libcxx.
It's not as beautiful as the red-black tree implementation in Plinko, but
this will get the job done, and the test proves it upholds all invariants.
This change also restores CheckForMemoryLeaks() support and fixes a real
bug I discovered in Doug Lea's dlmalloc_inspect_all() function.
ASAN hasn't been helpful enough to justify the maintenance burden. What
actually does help is mprotect(), kprintf(), --ftrace and --strace, which
can always be counted upon to work correctly. We aren't losing much with
this change. Support for ASAN on AARCH64 was never implemented. Applying
ASAN to the core libc runtimes was disabled many months ago. If there is
some way to have an ASAN runtime for user programs that is less invasive,
we can potentially consider reintroducing support. But now is premature.
Actually Portable Executable now supports Android. Cosmo's old mmap code
required a 47-bit address space. The new implementation is very agnostic
and supports both smaller address spaces (e.g. embedded) and even modern
56-bit PML5T paging for x86, which finally came true on Zen4 Threadripper.
Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb
granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native
page size. This fixes a longstanding POSIX conformance issue concerning
file mappings that overlap the end of file. Other aspects of conformance
have been improved too, such as the subtleties of address assignment and
the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE.
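POSIX defines two rules here. When a file mapping's last page extends past end of file, the bytes beyond EOF in that page read as zero. Touching pages that lie wholly beyond EOF raises SIGBUS. Reporting the true page size lets programs stay within the valid part of the mapping. The sketch below checks the first rule using a hypothetical `tail_of_last_page_is_zero()` helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes len bytes (0 < len <= page size) to a temporary file, maps the
   file's first page, and returns 1 if the data reads back intact and every
   byte past end of file within that page is zero, 0 if not, -1 on error. */
int tail_of_last_page_is_zero(const char *data, size_t len) {
  size_t pagesz = (size_t)sysconf(_SC_PAGE_SIZE);
  FILE *f;
  unsigned char *p;
  int ok;
  if (len == 0 || len > pagesz) return -1;
  if (!(f = tmpfile())) return -1;
  if (fwrite(data, 1, len, f) != len || fflush(f)) {
    fclose(f);
    return -1;
  }
  p = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE, fileno(f), 0);
  if (p == MAP_FAILED) {
    fclose(f);
    return -1;
  }
  ok = memcmp(p, data, len) == 0;
  for (size_t i = len; i < pagesz; ++i) ok &= p[i] == 0;
  munmap(p, pagesz);
  fclose(f);
  return ok;
}
```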
On Windows, mappings larger than 100 megabytes won't be broken down into
thousands of independent 64kb mappings. Support for MAP_STACK is removed
by this change; please use NewCosmoStack() instead.
Stack overflow avoidance is now being implemented using the POSIX thread
APIs. Please use GetStackBottom() instead of the old, error-prone
GetStackAddr() and HaveStackMemory() APIs, which have been removed.
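Code that wants to check its own headroom with the same kind of POSIX thread APIs could follow the sketch below. It assumes that pthread_getattr_np() is available (a GNU extension that glibc provides) and that the stack grows downward. `stack_bytes_remaining()` and `have_stack_for()` are hypothetical helpers, not the cosmo functions named above.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Returns roughly how many bytes of stack remain below the caller's frame
   on the current thread, or -1 if the stack bounds can't be queried.
   Assumes the stack grows downward, as it does on x86-64 and AArch64. */
long stack_bytes_remaining(void) {
  pthread_attr_t attr;
  void *lowest;
  size_t size;
  char probe; /* its address approximates the current stack pointer */
  int rc;
  if (pthread_getattr_np(pthread_self(), &attr)) return -1;
  rc = pthread_attr_getstack(&attr, &lowest, &size);
  pthread_attr_destroy(&attr);
  if (rc) return -1;
  return (long)((uintptr_t)&probe - (uintptr_t)lowest);
}

/* Hypothetical guard: true only if more than `need` bytes remain. */
int have_stack_for(size_t need) {
  long left = stack_bytes_remaining();
  return left >= 0 && (size_t)left > need;
}
```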
This essentially re-does the work of #875 on top of master.
This is what I did to check that Cosmo's Lua extensions still worked:
```
$ build/bootstrap/make MODE=aarch64 o/aarch64/third_party/lua/lua
$ ape o/aarch64/third_party/lua/lua
>: 10
10
>: 010
8
>: 0b10
2
>: string.byte("\e")
27
>: "Hello, %s" % {"world"}
Hello, world
>: "*" * 3
***
```
`luaL_traceback2` was used to show the stack trace with parameter
values; it's used in `LuaCallWithTrace`, which is used in Redbean to run
Lua code. You should be able to see the extended stack trace by running
something like this: `redbean -e "function a(b)c()end a(2)"` (with
"params" indicating the extended stack trace):
```
stack traceback:
[string "function a(b)c()end a(2)"]:1: in function 'a', params: b = 2;
[string "function a(b)c()end a(2)"]:1: in main chunk
```
@pkulchenko confirmed that this is the expected result with the updated
code.
This is what I did to check that Lua itself still worked:
```
$ cd third_party/lua/test/
$ ape ../../../o/aarch64/third_party/lua/lua all.lua
```
There's one test failure, in `files.lua`:
```
***** FILE 'files.lua'*****
testing i/o
../../../o/aarch64/third_party/lua/lua: files.lua:84: assertion failed!
stack traceback:
[C]: in function 'assert'
files.lua:84: in main chunk
(...tail calls...)
all.lua:195: in main chunk
[C]: in ?
.>>> closing state <<<
```
That isn't a result of these changes; the same test is failing in
master.
The failure is here:
```lua
if not _port then -- invalid seek
local status, msg, code = io.stdin:seek("set", 1000)
assert(not status and type(msg) == "string" and type(code) == "number")
end
```
The test expects a seek to offset 1,000 on stdin to fail — but it
doesn't. `status` ends up being the new offset rather than `nil`.
If I comment out that one test, the remaining tests succeed.
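Whether that seek fails depends on what stdin is attached to. lseek() on a pipe fails with ESPIPE, while on a regular file it succeeds even past end of file. The run above doesn't say what stdin was, so the sketch below only illustrates the two cases and does not diagnose the failure. Both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Returns the errno from seeking a pipe's read end to offset 1000
   (0 if the seek unexpectedly succeeds, -1 if the pipe can't be made). */
int seek_errno_on_pipe(void) {
  int fds[2], err;
  if (pipe(fds)) return -1;
  err = lseek(fds[0], 1000, SEEK_SET) == -1 ? errno : 0;
  close(fds[0]);
  close(fds[1]);
  return err;
}

/* Returns the offset reached by seeking a regular file to 1000, or -1. */
long seek_on_regular_file(void) {
  FILE *f = tmpfile();
  long off;
  if (!f) return -1;
  off = (long)lseek(fileno(f), 1000, SEEK_SET);
  fclose(f);
  return off;
}
```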
If pthread_create() is linked into the binary, then the cosmo runtime
will create an independent dlmalloc arena for each core. Whenever the
malloc() function is used it will index `g_heaps[sched_getcpu() / 2]`
to find the arena with the greatest hyperthread / numa locality. This
may be configured via an environment variable. For example, if you say
`export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways.
Your process may be configured to have anywhere between 1 and 128 heaps.
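The arena selection described above is sketched below. It assumes COSMOPOLITAN_HEAP_COUNT is parsed with a plain strtol(), with out-of-range values clamped to 1..128. `configured_heap_count()` and `pick_heap()` are hypothetical names, not the runtime's actual code. The modulo wrap for when there are fewer heaps than cores is also an assumption.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

/* Hypothetical reader for COSMOPOLITAN_HEAP_COUNT, clamped to 1..128.
   Returns `fallback` when the variable is unset or empty. */
int configured_heap_count(int fallback) {
  const char *s = getenv("COSMOPOLITAN_HEAP_COUNT");
  long n;
  if (!s || !*s) return fallback;
  n = strtol(s, NULL, 10);
  if (n < 1) n = 1;
  if (n > 128) n = 128;
  return (int)n;
}

/* Picks an arena index from the CPU we're running on, so that two sibling
   hyperthreads (assumed here to be CPUs 2k and 2k+1) share one arena. */
int pick_heap(int heap_count) {
  int cpu = sched_getcpu();
  if (cpu < 0) cpu = 0;          /* sched_getcpu() can fail */
  return (cpu / 2) % heap_count; /* wrap if there are fewer heaps than cores */
}
```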
We need this revision because it makes multithreaded C++ applications
faster. For example, an HTTP server I'm working on that makes extreme
use of the STL went from 16k to 2000k requests per second after this
change was made. To understand why, try out the malloc_test benchmark,
which calls malloc() + realloc() in a loop across many threads and
sees a 250x improvement in process clock time and 200x on wall time.
The tradeoff is this adds ~25ns of latency to individual malloc calls
compared to MODE=tiny, once the cosmo runtime has transitioned into a
fully multi-threaded state. If you don't need malloc() to be scalable,
then cosmo provides many options for you. For starters, the heap count
variable above can be set to put the process back in single heap mode,
plus you can go even faster still if you include tinymalloc.inc, like
many of the programs in tool/build/.. are already doing, since that'll
shave tens of kb off your binary footprint too. There's also MODE=tiny,
which is configured to use just one plain old dlmalloc arena by default.
Another tradeoff is we need more memory now (except in MODE=tiny), to
track the provenance of memory allocation. This is so allocations can
be freely shared across threads, and because OSes can reschedule code
to different CPUs at any time.
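One way to track provenance is sketched below, under stated assumptions. Each block carries a small header naming the arena it came from, so a free() on any thread or CPU can route the block back to its owner. The `tracked_*` functions are hypothetical, and plain malloc() stands in for a per-arena allocator. The per-allocation header is the kind of overhead that costs extra memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every allocation. */
union header {
  int heap;          /* index of the arena that owns this block */
  max_align_t align; /* keeps the user pointer suitably aligned */
};

/* Allocates n bytes "from" arena `heap`; plain malloc() stands in for it. */
void *tracked_malloc(int heap, size_t n) {
  union header *h;
  if (n > SIZE_MAX - sizeof *h) return NULL;
  if (!(h = malloc(sizeof *h + n))) return NULL;
  h->heap = heap;
  return h + 1;
}

/* Reads back which arena a block came from. */
int tracked_owner(const void *p) {
  return ((const union header *)p - 1)->heap;
}

/* Frees from any thread; a real allocator would hand the block back to
   arena tracked_owner(p), not to the arena of the freeing CPU. */
void tracked_free(void *p) {
  if (p) free((union header *)p - 1);
}
```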
The V8 behavior of encoding infinity as null doesn't make sense to me.
Using ±1e5000 is better, because JSON.parse decodes it as INFINITY and
the information is preserved. This could be a breaking change for some
users.
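The encoding rule is sketched below, assuming a printf-based serializer. `json_number()` is a hypothetical helper. ±1e5000 round-trips because it overflows any IEEE 754 double, so a conforming parser such as strtod() or JSON.parse yields an infinity of the right sign.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats a finite or infinite double as a JSON number token, spelling
   infinities as 1e5000 / -1e5000 instead of null. NaN isn't handled here. */
const char *json_number(double x, char *buf, size_t size) {
  if (isinf(x))
    snprintf(buf, size, "%s", x < 0 ? "-1e5000" : "1e5000");
  else
    snprintf(buf, size, "%.17g", x);
  return buf;
}
```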
Microsoft caused some very gentle breakages for Cosmopolitan. They
removed the version information from the PEB, which caused uname to
report WINDOWS 0.0.0. We should have called GetVersionExW, but that
doesn't really exist anymore either. Windows policy is now to give
whatever version we used in ape/ape.S. Windows 8.1 has been EOL since
2023-01-10, so let's avoid our modern executables being relegated to
legacy infrastructure. Requiring Windows 10+ going forward lets us
remove runtime compatibility bloat from the codebase. Note also that
Cosmopolitan maintains a Windows Vista branch on GitHub, so anyone
preferring the older versions can still have a future with Cosmo.
Another neat thing this fixes is UTF-8 support in the console. The
changes Microsoft made broke the if statement that enabled UTF-8 in
terminals. This explains why bug reports had broken arrows. In the
future this should be less of an issue, since the PEB code is gone,
which means we more strictly conform to only Microsoft's WIN32 API.
Cosmo's _Cz_crc32() function now goes 73 GiB/s on Threadripper. This
will significantly improve the performance of the PKZIP file format.
This algorithm is also used by apelink, to create deterministic ids.
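For reference, here is the CRC-32 that PKZIP uses (reflected polynomial 0xEDB88320), written as a slow bit-at-a-time sketch. It is nothing like the vectorized _Cz_crc32() and only shows which value is being computed. `crc32_ref()` is a hypothetical name. Its running-CRC argument follows zlib's crc32() convention, so calls can be chained across buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Pass 0 to start, or
   the previous return value to continue over more data. */
uint32_t crc32_ref(uint32_t crc, const void *data, size_t n) {
  const unsigned char *p = data;
  crc = ~crc;
  while (n--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}
```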
The normal getopt() function is bloated because it links printf(). This
change exports the original, authentic BSD getopt function that cosmo has
always used internally, so cosmocc users don't need to include internals.
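The exported getopt() is the standard POSIX API, so ordinary option loops like the sketch below should work unchanged. `struct opts` and `parse_opts()` are illustrative names. Setting opterr to 0 makes getopt() return '?' silently instead of printing a diagnostic.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

struct opts {
  int verbose;
  const char *output;
};

/* Parses -v and -o FILE; returns the index of the first operand,
   or -1 on an unknown option or a missing argument. */
int parse_opts(int argc, char *argv[], struct opts *o) {
  int c;
  o->verbose = 0;
  o->output = NULL;
  opterr = 0; /* report errors ourselves instead of letting getopt print */
  while ((c = getopt(argc, argv, "vo:")) != -1) {
    switch (c) {
      case 'v': o->verbose = 1; break;
      case 'o': o->output = optarg; break;
      default: return -1; /* '?' */
    }
  }
  return optind;
}
```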
This change adds a TLS freelist for small dynamic memory allocations.
Cosmopolitan's TIB is now 512 bytes in size. Single-threaded malloc()
performance isn't impacted by this, until pthread_create() is called.
Single-threaded programs may also want to consider using
`#include "libc/mem/tinymalloc.inc"`, which will shave 30k off the
executable size and sometimes go faster.
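A thread-local freelist for one small size class might look like the sketch below. The 64-byte class, the cap of 32 cached blocks, and the `small_alloc()`/`small_free()` names are all hypothetical; cosmo's actual freelist lives inside its malloc. In this simplified version, blocks still cached by a thread when it exits are leaked.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_SIZE 64 /* hypothetical size class */
#define MAX_CACHED 32 /* hypothetical per-thread cache limit */

struct node {
  struct node *next;
};

static _Thread_local struct node *tls_free; /* this thread's cached blocks */
static _Thread_local int tls_count;

/* Pops a cached block if this thread has one, else falls back to malloc. */
void *small_alloc(void) {
  struct node *n = tls_free;
  if (n) {
    tls_free = n->next;
    --tls_count;
    return n;
  }
  return malloc(SMALL_SIZE);
}

/* Pushes the block onto this thread's list, or frees it if the list is full. */
void small_free(void *p) {
  if (!p) return;
  if (tls_count < MAX_CACHED) {
    struct node *n = p;
    n->next = tls_free;
    tls_free = n;
    ++tls_count;
    return;
  }
  free(p);
}
```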
When we removed the com suffix from ape binaries, we broke the build for
ape's python for any case-insensitive file system, i.e. Windows and XNU,
because there is a third_party/python/Python that gets mirrored in the o
directory with the python object files and clashes with the binary name.
This patch hacks around this by renaming the binary to "python3" so that
it no longer clashes with that directory.
At least on macOS, `strlen(getenv("TMPDIR"))` is 50. We now allow a /tmp
that takes up to 120 or so bytes to spell. Instead of overflowing, we do
a bounds check and the function fails successfully on even longer /tmps.
Fixes #1108 (os.tmpname crashes redbean)
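The bounds check works along these lines. The path is built with snprintf(), and the call fails with ENAMETOOLONG when the result doesn't fit, rather than writing past the buffer. This is a minimal sketch with a hypothetical `join_tmp_path()` helper, and the /tmp fallback is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes "$TMPDIR/name" (or "/tmp/name") into buf, or returns -1 with
   errno set to ENAMETOOLONG if it wouldn't fit in `size` bytes. */
int join_tmp_path(char *buf, size_t size, const char *name) {
  const char *dir = getenv("TMPDIR");
  const char *sep;
  size_t len;
  int n;
  if (!dir || !*dir) dir = "/tmp";
  len = strlen(dir);
  sep = dir[len - 1] == '/' ? "" : "/"; /* avoid doubling the slash */
  n = snprintf(buf, size, "%s%s%s", dir, sep, name);
  if (n < 0) return -1;
  if ((size_t)n >= size) {
    errno = ENAMETOOLONG;
    return -1;
  }
  return 0;
}
```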
I took one canonical IANA zone ID from each of the different colored
regions in the "Time in Europe" Wikipedia article, except those that do
not observe DST and do not have a Google office.
As to which canonical ID to use, this was somewhat arbitrary. Brussels
was obvious, as the de facto capital of the EU. For the rest, I mostly
just went with lexicographic ordering of the most recognizable options.
I've sorted the American zones. This keeps the U.S. ones together but
does everything alphabetically otherwise. I've added the remaining
Canadian zones. These have DST (and Newfoundland's offset is a half hour
off from a whole-hour UTC offset), so they cannot use Etc/. The Pacific/
zones are sort of sorted. The Chatham Islands have been added. This is,
I believe, the last of the zones with a non-integer hour offset from UTC.
Cosmopolitan now supports 104 time zones. They're embedded inside any
binary that links the localtime() function. Doing so adds about 100kb
to the binary size. This change also gets time zones working properly
on Windows for the first time. /etc/localtime doesn't need to exist on
Windows, since we can get this information from WIN32. We're
also now updated to the latest version of Paul Eggert's TZ library.
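A zone's offset can be queried from C as sketched below. The sketch assumes zones are selected through the TZ environment variable, as in other libcs built on Eggert's tzcode. `utc_offset_minutes()` is a hypothetical helper, and `tm_gmtoff` is a BSD/GNU extension field rather than ISO C.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Returns the UTC offset, in minutes, that zone `tz` observes at time `t`,
   or LONG_MIN on failure. Note: this changes the process's TZ variable. */
long utc_offset_minutes(const char *tz, time_t t) {
  struct tm tm;
  if (setenv("TZ", tz, 1)) return LONG_MIN;
  tzset(); /* re-read TZ before converting */
  if (!localtime_r(&t, &tm)) return LONG_MIN;
  return tm.tm_gmtoff / 60;
}
```

For example, `utc_offset_minutes("Europe/Brussels", time(NULL))` would report Brussels' current offset, provided that zone is available.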
Signals are extremely difficult to unit test reliably. This is why
functions like sigsuspend() exist. Testing them alongside something
else, and doing so portably, becomes impossible without access to
kernel internals.
OpenMP flakes in QEMU on one of my workstations. I don't think the
support is production worthy, because there have also been issues on
MacOS. It works great for every experiment I've used it for, though.
However, a flaky test is worse than no test at all, so it's removed
until someone takes an interest in productionizing it.
We have an optimized version of zlib from the Chromium project.
We need it for a lot of our libc services. It would be nice to export
this to user applications if we can, since projects like llamafile are
already depending on it under the private namespace, to avoid
needing to link zlib twice.
__res_send returns the full answer length even if it didn't fit the
buffer, but __dns_parse expects the length of the filled part of the
buffer.
Analogous to Musl commit 77327ed064bd57b0e1865cd0e0364057ff4a53b4 which
fixed the only other __dns_parse call site.
The name resolution would abort when getting more than 63 records per
request, due to what seems to be a left-over from the original code.
This check was non-breaking but spurious prior to TCP fallback
support, since any 512-byte packet with more than 63 records was
necessarily malformed. But now, it wrongly rejects valid results.
Reported by Daniel Stefanik in Alpine Linux aports issue 15320.
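The first fix comes down to never handing __dns_parse more bytes than the buffer holds. Below is a minimal sketch of that clamp, using a hypothetical helper name rather than musl's actual code.

```c
#include <stddef.h>

/* A res_send-style length may exceed the buffer when the answer was
   truncated to fit; only the bytes actually stored may be parsed. */
size_t answer_bytes_filled(int reported_len, size_t bufsize) {
  if (reported_len < 0) return 0; /* the query failed */
  return (size_t)reported_len < bufsize ? (size_t)reported_len : bufsize;
}
```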
commit f314e133929b6379eccc632bef32eaebb66a7335
Author: Rich Felker <dalias@aerifal.cx>
Date: Thu Nov 16 12:55:21 2023 -0500
mntent: fields are delimited only by tabs or spaces, not general whitespace
this matters because the kernel-provided mtab only escapes tabs,
spaces, newlines, and backslashes. it leaves carriage returns, form
feeds, and vertical tabs literal.
commit ee1d39bc1573c1ae49ee6b658938b56bbef95a6c
Author: q66 <q66@chimera-linux.org>
Date: Thu Nov 9 20:48:44 2023 +0100
mntent: unescape octal sequences
As entries in mtab are delimited by spaces, whitespace characters
are escaped as octal sequences. When reading them out, we have to
unescape these sequences to get the proper string.
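The two rules from these musl commits can be sketched in C. Fields are split only on spaces and tabs, and then three-digit octal escapes such as `\040` (a space) are decoded. `next_field()` and `unescape_octal()` are hypothetical helpers, not musl's getmntent internals.

```c
#include <stddef.h>
#include <string.h>

/* Returns the next field of *cursor, delimited only by spaces or tabs
   (so \r, \f and \v stay literal), or NULL at end of line. */
char *next_field(char **cursor) {
  char *p = *cursor, *start;
  while (*p == ' ' || *p == '\t') ++p;
  if (!*p) {
    *cursor = p;
    return NULL;
  }
  start = p;
  while (*p && *p != ' ' && *p != '\t') ++p;
  if (*p) *p++ = '\0';
  *cursor = p;
  return start;
}

/* Decodes backslash-octal escapes such as "\040" in place. */
void unescape_octal(char *s) {
  char *out = s;
  while (*s) {
    if (s[0] == '\\' && s[1] >= '0' && s[1] <= '7' && s[2] >= '0' &&
        s[2] <= '7' && s[3] >= '0' && s[3] <= '7') {
      *out++ = (char)(((s[1] - '0') << 6) | ((s[2] - '0') << 3) | (s[3] - '0'));
      s += 4;
    } else {
      *out++ = *s++;
    }
  }
  *out = '\0';
}
```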
Now that these functions are behind _COSMO_SOURCE there's no reason for
having the ugly underscore anymore. To use these functions, you need to
pass -mcosmo to cosmocc.
The WIN32 CreateProcess() function does not require an .exe or .com
suffix in order to spawn an executable. Now that we have Cosmo bash
we're no longer so dependent on the cmd.exe prompt.
This change upgrades to GCC 12.3 and GNU binutils 2.42. The GNU linker
appears to have changed things so that only a single de-duplicated string
table is present in the binary, and it gets placed wherever the linker
wants, regardless of what the linker script says. To cope with that we
need to stop using .ident to embed licenses. As such, this change does
significant work to revamp how third party licenses are defined in the
codebase, using `.section .notice,"aR",@progbits`.
This new GCC 12.3 toolchain has support for GNU indirect functions. It
lets us support __target_clones__ for the first time. This is used for
optimizing the performance of libc string functions such as strlen and
friends so far on x86, by ensuring AVX systems favor a second codepath
that uses VEX encoding. It shaves some latency off certain operations.
It's a useful feature to have for scientific computing for the reasons
explained by the test/libcxx/openmp_test.cc example which compiles for
fifteen different microarchitectures. Thanks to the upgrades, it's now
also possible to use newer instruction sets, such as AVX512FP16 and VNNI.
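Here `__target_clones__` is applied to a user function. The compiler emits one clone per listed target, plus a resolver that picks a clone at load time. The `MULTIVERSIONED` macro and `dot()` are illustrative. The guard (x86-64 under cosmocc or glibc) is an assumption about where ifunc dispatch is available; elsewhere the function compiles as a single version.

```c
#include <stddef.h>
#include <stdlib.h> /* pulls in libc feature macros such as __GLIBC__ */

#if defined(__GNUC__) && defined(__x86_64__) && \
    (defined(__COSMOPOLITAN__) || defined(__GLIBC__))
#define MULTIVERSIONED __attribute__((__target_clones__("avx2", "default")))
#else
#define MULTIVERSIONED
#endif

/* Compiled once for AVX2 and once for the baseline ISA when cloning is
   enabled; callers just call dot() either way. */
MULTIVERSIONED
float dot(const float *a, const float *b, size_t n) {
  float sum = 0;
  for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}
```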
Cosmo now uses the %gs register on x86 by default for TLS. Doing so is
helpful for any program that links `cosmo_dlopen()`. Such programs had
to recompile their binaries at startup to change the TLS instructions.
That's not great, since it means every page in the executable needs to
be faulted. The work of rewriting TLS-related x86 opcodes is moved to
fixupobj.com instead. This is great news for MacOS x86 users, since we
previously needed to morph the binary every time for that platform but
now that's no longer necessary. The only platforms where we need fixup
of TLS x86 opcodes at runtime are now Windows, OpenBSD, and NetBSD. On
Windows we morph TLS to point deeper into the TIB, based on a TlsAlloc
assignment, and on OpenBSD/NetBSD we morph %gs back into %fs since the
kernels do not allow us to specify a value for the %gs register.
OpenBSD users are now required to use APE Loader to run Cosmo binaries
and assimilation is no longer possible. OpenBSD kernel needs to change
to allow programs to specify a value for the %gs register, or it needs
to stop marking executable pages loaded by the kernel as mimmutable().
This release fixes __constructor__, .ctor, .init_array, and lastly the
.preinit_array so they behave the exact same way as glibc.
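One of the ordering rules being matched involves constructor priorities. With GCC and the GNU toolchain, lower priority numbers run first, and constructors without a priority run after all prioritized ones. The C illustration below records the order in which three constructors ran. It assumes a GCC-compatible compiler, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

static char order[8]; /* filled in by the constructors below */

static void record(char c) {
  size_t n = strlen(order);
  if (n + 1 < sizeof order) {
    order[n] = c;
    order[n + 1] = '\0';
  }
}

__attribute__((constructor(102))) static void second(void) { record('b'); }
__attribute__((constructor(101))) static void first(void) { record('a'); }
__attribute__((constructor)) static void unprioritized(void) { record('c'); }

/* Returns the letters in the order the constructors ran. */
const char *constructor_order(void) { return order; }
```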
We no longer use hex constants to define math.h symbols like M_PI.
- Introduce portable sched_getcpu() api
- Support GCC's __target_clones__ feature
- Make fma() go faster on x86 in default mode
- Remove some asan checks from core libraries
- WinMain() now ensures $HOME and $USER are defined
- Let OpenMP be usable via cosmocc
- Let libunwind be usable via cosmocc
- Make X86_HAVE(AVXVNNI) work correctly
- Avoid using MAP_GROWSDOWN on qemu-aarch64
- Introduce in6addr_any and in6addr_loopback
- Have thread stacks use MAP_GROWSDOWN by default
- Ask OpenMP to not use filesystem to manage threads
- Make NI_MAXHOST and NI_MAXSERV available w/o _GNU_SOURCE
We recently broke MODE=dbg support when we added C++ exception support.
This change adds the missing UBSAN interfaces, needed to get it working
again. Some of the ASAN checking in the SJLJ guts needed to be disabled
since I doubt anyone's combined the two features until now.
If you install qemu-user from apt then glibc links a lot of address
space bloat that causes pthread_create() to fail with ENOMEM (a.k.a.
EAGAIN). Boosting the virtual memory quota from 512m to 2048m will
hopefully future-proof the build as Linux distros get fatter.
Please note this only applies to MODE=aarch64 on x86_64 builds when
you're using QEMU from Debian/Ubuntu rather than installing the one
cosmo provides in third_party/qemu/qemu-aarch64.gz. This change may
also be useful to people who are using the host compiler toolchain.
Added the implementation for `std::bad_any_cast` from upstream
`any.cpp`, and `std::bad_variant_access` from upstream `variant.cpp`.
This fixes missing `vtable` and `typeinfo` symbols when trying to link
code referencing these exception types.
We now store values in jmp_buf where the compiler wants them to be. This
fixes code that calls __builtin_setjmp() and __builtin_longjmp() such as
libunwind. All libcxxabi tests are now passing on ARM64.
See #1076
This test was failing on GitHub Actions because GA uses Linux and
Linux supports resource usage accounting. Cosmo's compile.com program
imposes CPU, memory and file size limits on both the compiler and the
test programs themselves.
See #1076
Added the `libcxxabi` test suite as found in LLVM 17.0.6.
Some tests that do not apply to the current configuration of
cosmopolitan are not added. These include:
- `backtrace_test`, `forced_unwind*`: Use unwind function unsupported in
SjLj mode.
- `noexception*`: Designed to test `libcxxabi` in no exceptions mode.
Some tests are added but not enabled due to bugs specific to GCC or
cosmopolitan. These are clearly indicated in the `BUILD.mk` file.
Renaming gc() to _gc() was a mistake, since the better thing to do is put
it behind the _COSMO_SOURCE macro. We need this change because I haven't
wanted to use my amazing garbage collector ever since we renamed it. You
now need to define _COSMO_SOURCE yourself when using the amalgamation
header, and cosmocc users need to pass the -mcosmo flag to get the gc()
function.
Some other issues relating to cancelation have been fixed along the way.
We're also now putting cosmocc in a folder named `.cosmocc` so it can be
more safely excluded by grep --exclude-dir=.cosmocc --exclude-dir=o etc.
* third_party: Add libcxxabi
Added libcxxabi from LLVM 17.0.6
The library implements the Itanium C++ exception handling ABI.
* third_party/libcxxabi: Enable __cxa_thread_atexit
Enable `__cxa_thread_atexit` from libcxxabi.
`__cxa_thread_atexit_impl` is still implemented by the cosmo libc.
The original `__cxa_thread_atexit` has been removed.
* third_party/libcxx: Build with exceptions
Build libcxx with exceptions enabled.
- Removed `_LIBCPP_NO_EXCEPTIONS` from `__config`.
- Switched the exception implementation to `libcxxabi`. These two files
are taken from the same `libcxx` version as mentioned in `README.cosmo`.
- Removed `new_handler_fallback` in favor of `libcxxabi` implementation.
- Enable `-fexceptions` and `-frtti` for `libcxx`.
- Removed `THIRD_PARTY_LIBCXX` dependency from `libcxxabi` and
`libunwind`. These libraries do not use any runtime `libcxx` functions,
just headers.
* libc: Remove remaining redundant cxa functions
- `__cxa_pure_virtual` in `libcxxabi` is also a stub similar to the
existing one.
- `__cxa_guard_*` from `libcxxabi` is used instead of the ones from
Android.
Now there should be no more duplicate implementations.
`__cxa_thread_atexit_impl`, `__cxa_atexit`, and related supporting
functions, are still left to other libraries as in `libcxxabi`.
`libcxxabi` is also now added to `cosmopolitan.a` to make up for the
removed functions.
Affected in-tree libraries (`third_party/double-conversion`) have been
updated.
The toolchain will now be downloaded going forward from multiple pinned
URLs which have shasums. Either wget or curl must be installed.
This change unblocks #1053