- Every unit test now passes on Apple Silicon. The final piece of this
puzzle was porting our POSIX threads cancelation support, since that
works differently on ARM64 XNU vs. AMD64. Our semaphore support on
Apple Silicon is also superior now compared to AMD64, thanks to the
grand central dispatch library which lets *NSYNC locks go faster.
- The Cosmopolitan runtime is now more stable, particularly on Windows.
To do this, thread local storage is mandatory at all runtime levels,
and the innermost packages of the C library is no longer being built
using ASAN. TLS is being bootstrapped with a 128-byte TIB during the
process startup phase, and then later on the runtime re-allocates it
either statically or dynamically to support code using _Thread_local.
fork() and execve() now do a better job cooperating with threads. We
can now check how much stack memory is left in the process or thread
when functions like kprintf() / execve() etc. call alloca(), so that
ENOMEM can be raised, reduce a buffer size, or just print a warning.
- POSIX signal emulation is now implemented the same way kernels do it
with pthread_kill() and raise(). Any thread can interrupt any other
thread, regardless of what it's doing. If it's blocked on read/write
then the killer thread will cancel its i/o operation so that EINTR can
be returned in the mark thread immediately. If it's doing a tight CPU
bound operation, then that's also interrupted by the signal delivery.
Signal delivery works now by suspending a thread and pushing context
data structures onto its stack, and redirecting its execution to a
trampoline function, which calls SetThreadContext(GetCurrentThread())
when it's done.
- We're now doing a better job managing locks and handles. On NetBSD we
now close semaphore file descriptors in forked children. Semaphores on
Windows can now be canceled immediately, which means mutexes/condition
variables will now go faster. Apple Silicon semaphores can be canceled
too. We're now using Apple's pthread_yield() funciton. Apple _nocancel
syscalls are now used on XNU when appropriate to ensure pthread_cancel
requests aren't lost. The MbedTLS library has been updated to support
POSIX thread cancelations. See tool/build/runitd.c for an example of
how it can be used for production multi-threaded tls servers. Handles
on Windows now leak less often across processes. All i/o operations on
Windows are now overlapped, which means file pointers can no longer be
inherited across dup() and fork() for the time being.
- We now spawn a thread on Windows to deliver SIGCHLD and wakeup wait4()
which means, for example, that posix_spawn() now goes 3x faster. POSIX
spawn is also now more correct. Like Musl, it's now able to report the
failure code of execve() via a pipe although our approach favors using
shared memory to do that on systems that have a true vfork() function.
- We now spawn a thread to deliver SIGALRM to threads when setitimer()
is used. This enables the most precise wakeups the OS makes possible.
- The Cosmopolitan runtime now uses less memory. On NetBSD for example,
it turned out the kernel would actually commit the PT_GNU_STACK size
which caused RSS to be 6mb for every process. Now it's down to ~4kb.
On Apple Silicon, we reduce the mandatory upstream thread size to the
smallest possible size to reduce the memory overhead of Cosmo threads.
The examples directory has a program called greenbean which can spawn
a web server on Linux with 10,000 worker threads and have the memory
usage of the process be ~77mb. The 1024 byte overhead of POSIX-style
thread-local storage is now optional; it won't be allocated until the
pthread_setspecific/getspecific functions are called. On Windows, the
threads that get spawned which are internal to the libc implementation
use reserve rather than commit memory, which shaves a few hundred kb.
- sigaltstack() is now supported on Windows, however it's currently not
able to be used to handle stack overflows, since crash signals are
still generated by WIN32. However the crash handler will still switch
to the alt stack, which is helpful in environments with tiny threads.
- Test binaries are now smaller. Many of the mandatory dependencies of
the test runner have been removed. This ensures many programs can do a
better job only linking the the thing they're testing. This caused the
test binaries for LIBC_FMT for example, to decrease from 200kb to 50kb
- long double is no longer used in the implementation details of libc,
except in the APIs that define it. The old code that used long double
for time (instead of struct timespec) has now been thoroughly removed.
- ShowCrashReports() is now much tinier in MODE=tiny. Instead of doing
backtraces itself, it'll just print a command you can run on the shell
using our new `cosmoaddr2line` program to view the backtrace.
- Crash report signal handling now works in a much better way. Instead
of terminating the process, it now relies on SA_RESETHAND so that the
default SIG_IGN behavior can terminate the process if necessary.
- Our pledge() functionality has now been fully ported to AARCH64 Linux.
This way complex runtime features (e.g. ftrace, symbol tables) can
always yoink zipos support. This is important now that apelink.com
automates embedding symbol tables for multiple cpus.
- Found some bugs in LLVM compiler-rt library
- The useless LIBC_STUBS package is now deleted
- Improve the overflow checking story even further
- Get chibicc tests working in MODE=dbg mode again
- The libc/isystem/ headers now have correctly named guards
This change implements a new approach to function call logging, that's
based on the GCC flag: -fpatchable-function-entry. Read the commentary
in build/config.mk to learn how it works.
- Utilities like pledge.com now build
- kprintf() will no longer balk at 48-bit addresses
- There's a new aarch64-dbg build mode that should work
- gc() and defer() are mostly pacified; avoid using them on aarch64
- THIRD_PART_STB now has Arm Neon intrinsics for fast image handling
This makes it possible for us to use system() and popen() with paths
that redirect to filenames that contain spaces, e.g.
system("echo.com hello >\"hello there.txt\"")
It's difficult to solve this problem, because WIN32 only allows passing
one single argument when launching programs and each program is allowed
to tokenize that however it wants. Most software follows the convention
of cmd.exe which is poorly documented and positively byzantine.
In the future we're going to solve this by not using cmd.exe at all and
instead embedding the cocmd.com interpreter into the system() function.
In the meantime, our documentation has been updated to help recalibrate
any expectation the user might hold regarding the security of using the
Windows command interpreter.
Fixes#644
This change restores the .symtab symbol table files in our flagship
programs (e.g. redbean.com, python.com) needed to show backtraces. This
also rolls back earlier changes to zip.com w.r.t. temp directories since
the right way to do it turned out to be the -b DIR flag.
This change also improves the performance of zip.com. It turned out
mmap() wasn't being used, because zip.com was assuming a 4096-byte
granularity, but cosmo requires 65536. There was also a chance to speed
up stdio scanning using the unlocked functions.
This change also removes the futimens() call on the Landlock Make output
file workaround, since it caused problems with commands like fixupobj
which modify-in-place. It turns out if a file is opened for writing and
then no writes actually occur, then the modified time doesn't change.
- 10.5% reduction of o//depend dependency graph
- 8.8% reduction in latency of make command
- Fix issue with temporary file cleanup
There's a new -w option in compile.com that turns off the recent
Landlock output path workaround for "good commands" which do not
unlink() the output file like GNU tooling does.
Our new GNU Make unveil sandboxing appears to have zero overhead
in the grand scheme of things. Full builds are pretty fast since
the only thing that's actually slowed us down is probably libcxx
make -j16 MODE=rel
RL: took 85,732,063µs wall time
RL: ballooned to 323,612kb in size
RL: needed 828,560,521µs cpu (11% kernel)
RL: caused 39,080,670 page faults (99% memcpy)
RL: 350,073 context switches (72% consensual)
RL: performed 0 reads and 11,494,960 write i/o operations
pledge() and unveil() no longer consider ENOSYS to be an error.
These functions have also been added to Python's cosmo module.
This change also removes some WIN32 APIs and System Five magnums
which we're not using and it's doubtful anyone else would be too
This change fixes Landlock Make so that only the output target file is
unveiled, rather than unveiling the directory that contains it. This
gives us a much stronger sandbox. It also helped identify problematic
build code in our repo that should have been using o/tmp instead.
Landlock isn't able to let us unveil files that don't exist. Even if
they do, then once a file is deleted, the sandboxing for it goes away.
This caused problems for Landlock Make because tools like GNU LD will
repeatedly delete and recreate the output file. This change uses the
compile.com wrapper to ensure on changes happen to the output inode.
New binary available on https://justine.lol/make/Fixes#528
We're now able to drop both `exec` and `prot_exec` privileges
automatically when launching glibc dynamic executables. We also have
really outstanding standard error logging now, that explains which
promises are needed, even in cases where `exec` is used.
We now rewrite the binary image at runtime on Windows and XNU to change
mov %fs:0,%reg instructions to use %gs instead. There's also simpler
threading API introduced by this change and it's called _spawn() and
_join(), which has replaced most clone() usage.
This change turns symbol table compression back on using Puff, which
noticeably reduces the size of programs like redbean and Python. The
redbean web server receives some minor API additions for controlling
things like SSL in addition to filling gaps in the documentation.
- Fix some minor issues in ar.com
- Have execve() look for `ape` command
- Rewrite NT paths using /c/ rather /??/c:/
- Replace broken GCC symlinks with .sym files
- Rewrite $PATH environment variables on startup
- Make $(APE_NO_MODIFY_SELF) the default bootloader
- Add all build command dependencies to build/bootstrap
- Get the repository mostly building from source on non-Linux
- Add hierarchical auto-completion to redbean's repl
- Fetch latest localtime() and strftime() from Eggert
- Shave a few milliseconds off redbean start latency
- Fix redbean repl with multi-line statements
- Make the Lua unix module code more elegant
- Harden Lua data structure serialization
- Improve serialization
- Add Benchmark() API to redbean
- Refactor UNIX API to be assert() friendly
- Make the redbean Lua REPL print data structures
- Fix recent regressions in linenoise reverse search
- Add -i flag so redbean can be a language interpreter
This change fixes minor bugs and adds a feature, which lets us store the
ELF symbol table, inside the ZIP directory. We use the path /zip/.symtab
which can be safely removed using a zip editing tool, to make the binary
smaller after compilation. This supplements the existing method of using
a separate .com.dbg file, which is still supported. The intent is people
don't always know that it's a good idea to download the debug file. It's
not great having someone's first experience be a crash report, that only
has numbers rather than symbols. This will help fix that!
This program usually runs once at the begininng of each GNU Make
invocation. It generates an o//depend file with 170,000 lines of
Makefile code to define source -> headers relationships.
This change makes that take 650 milliseconds rather than 1,100ms
by improving the performance of strstr(), using longsort(), plus
migrating to the new append library.
The APE_NO_MODIFY_SELF loader payload has been moved out of the examples
folder and improved so that it works on BSD systems, and permits general
elf program headers. This brings its quality up enough that it should be
acceptable to use by default for many programs, e.g. Python, Lua, SQLite
and Python. It's the responsibility of the user to define an appropriate
TMPDIR if /tmp is considered an adversarial environment. Mac OS shall be
supported by APE_NO_MODIFY_SELF soon.
Fixes and improvements have been made to program_executable_name as it's
now the one true way to get the absolute path of the executing image.
This change fixes a memory leak in linenoise history loading, introduced
by performance optimizations in 51904e2687
This change fixes a longstanding regression with Mach system calls, that
23ae9dfceb back in February which impacted
our sched_yield() implementation, which is why no one noticed until now.
The Blinkenlights PC emulator has been improved. We now fix rendering on
XNU and BSD by not making the assumption that the kernel terminal driver
understands UTF8 since that seems to break its internal modeling of \r\n
which is now being addressed by using \e[𝑦H instead. The paneling is now
more compact in real mode so you won't need to make your font as tiny if
you're only emulating an 8086 program. The CLMUL ISA is now emulated too
This change also makes improvement to time. CLOCK_MONOTONIC now does the
right thing on Windows NT. The nanosecond time module functions added in
Python 3.7 have been backported.
This change doubles the performance of Argon2 password stretching simply
by not using its copy_block and xor_block helper functions, as they were
trivial to inline thus resulting in us needing to iterate over each 1024
byte block four fewer times.
This change makes code size improvements. _PyUnicode_ToNumeric() was 64k
in size and now it's 10k. The CJK codec lookup tables now use lazy delta
zigzag deflate (δzd) encoding which reduces their size from 600k to 200k
plus the code bloat caused by macro abuse in _decimal.c is now addressed
so our fully-loaded statically-linked hermetically-sealed Python virtual
interpreter container is now 9.4 megs in the default build mode and 5.5m
in MODE=tiny which leaves plenty of room for chibicc.
The pydoc web server now accommodates the use case of people who work by
SSH'ing into a different machine w/ python.com -m pydoc -p8080 -h0.0.0.0
Finally Python Capsulae delenda est and won't be supported in the future
This change gets the Python codebase into a state where it conforms to
the conventions of this codebase. It's now possible to include headers
from Python, without worrying about ordering. Python has traditionally
solved that problem by "diamonding" everything in Python.h, but that's
problematic since it means any change to any Python header invalidates
all the build artifacts. Lastly it makes tooling not work. Since it is
hard to explain to Emacs when I press C-c C-h to add an import line it
shouldn't add the header that actually defines the symbol, and instead
do follow the nonstandard Python convention.
Progress has been made on letting Python load source code from the zip
executable structure via the standard C library APIs. System calss now
recognizes zip!FILENAME alternative URIs as equivalent to zip:FILENAME
since Python uses colon as its delimiter.
Some progress has been made on embedding the notice license terms into
the Python object code. This is easier said than done since Python has
an extremely complicated ownership story.
- Some termios APIs have been added
- Implement rewinddir() dirstream API
- GetCpuCount() API added to Cosmopolitan Libc
- More bugs in Cosmopolitan Libc have been fixed
- zipobj.com now has flags for mangling the path
- Fixed bug a priori with sendfile() on certain BSDs
- Polyfill F_DUPFD and F_DUPFD_CLOEXEC across platforms
- FIOCLEX / FIONCLEX now polyfilled for fast O_CLOEXEC changes
- APE now supports a hybrid solution to no-self-modify for builds
- Many BSD-only magnums added, e.g. O_SEARCH, O_SHLOCK, SF_NODISKIO
- Reduce full build latency from ~20s to ~18s
- Bring back silent mode if `make V=0` is passed
- Demodernize utimes() polyfill so it works RHEL5
- Delete some old shell scripts that are no longer needed
- Truncate long lines when outputting builds to Emacs buffers
For the first time ever, all tests in this codebase now pass, when
run automatically on macos, freebsd, openbsd, rhel5, rhel7, alpine
and windows via the network using the runit and runitd build tools
- Fix vfork exec path etc.
- Add XNU opendir() support
- Add OpenBSD opendir() support
- Add Linux history to syscalls.sh
- Use copy_file_range on FreeBSD 13+
- Fix system calls with 7+ arguments
- Fix Windows with greater than 16 FDs
- Fix RUNIT.COM and RUNITD.COM flakiness
- Fix OpenBSD munmap() when files are mapped
- Fix long double so it's actually long on Windows
- Fix OpenBSD truncate() and ftruncate() thunk typo
- Let Windows fcntl() be used on socket files descriptors
- Fix Windows fstat() which had an accidental printf statement
- Fix RHEL5 CLOCK_MONOTONIC by not aliasing to CLOCK_MONOTONIC_RAW
This is wonderful. I never could have dreamed it would be possible
to get it working so well on so many platforms with tiny binaries.
Fixes#31Fixes#25Fixes#14
It turned out that the linker was doing the wrong with the amalgamation
library concerning weak stubs. A regression test has been added and new
binaries have been uploaded to https://justine.lol/cosmopolitan/
Ideally this should be fixed by building a tool that turns multiple .a
files into a single .a file with deduplication. As a workaround for now
the cosmopolitan.a build is restructured to not include LIBC_STUBS which
meant technical debt needed to be paid off where non-stub interfaces
were moved to LIBC_INTRIN and LIBC_NEXGEN32E.
Thank @PerfectProductions in #31 for the report!
This change pays off technical debt with the function -> DLL mappings in
libc/nt/master.sh, which was originally defined based on binary analysis
on Windows 10. It's now been updated so the kernel32/kernelbase/advapi32
imports should be exactly as they are written, on the MSDN documentation
and that wouldn't have been easy without Geoff Chappell's work thank him
https://www.geoffchappell.com/studies/windows/win32/index.htm
This program popped up on Hacker News recently. It's the only modern
compiler I've ever seen that doesn't have dependencies and is easily
modified. So I added all of the missing GNU extensions I like to use
which means it might be possible soon to build on non-Linux and have
third party not vendor gcc binaries.
This is done without using Microsoft's internal APIs. MAP_PRIVATE
mappings are copied to the subprocess via a pipe, since Microsoft
doesn't want us to have proper COW pages. MAP_SHARED mappings are
remapped without needing to do any copying. Global variables need
copying along with the stack and the whole heap of anonymous mem.
This actually improves the reliability of the redbean http server
although one shouldn't expect 10k+ connections on a home computer
that isn't running software built to serve like Linux or FreeBSD.
blinkenlights now does a pretty good job emulating what happens when
binaries boot from BIOS into long mode. So it's been much easier to
debug the bare metal process and wrinkle out many issues.