cosmopolitan/ape/specification.md
Justine Tunney 3de6632be6
Graduate some clock_gettime() constants to #define
- CLOCK_THREAD_CPUTIME_ID
- CLOCK_PROCESS_CPUTIME_ID

Cosmo now supports the above constants universally across supported OSes
therefore it's now safe to let programs detect their presence w/ #ifdefs
2024-07-22 07:14:35 -07:00

703 lines
31 KiB
Markdown

# Actually Portable Executable Specification v0.1
Actually Portable Executable (APE) is an executable file format that
polyglots the Windows Portable Executable (PE) format with a UNIX Sixth
Edition style shell script that doesn't have a shebang. This makes it
possible to produce a single file binary that executes on the stock
installations of the many OSes and architectures.
## Supported OSes and Architectures
- AMD64
- Linux
- MacOS
- Windows
- FreeBSD
- OpenBSD
- NetBSD
- BIOS
- ARM64
- Linux
- MacOS
- FreeBSD
- Windows (non-native)
## File Header
APE defines three separate file magics, all of which are 8 characters
long. Any file that starts with one of these magic values can be
considered an APE program.
### (1) APE MZ Magic
- ASCII: `MZqFpD='`
- Hex: 4d 5a 71 46 70 44 3d 27
This is the canonical magic used by almost all APE programs. It enables
maximum portability between OSes. When interpreted as a shell script, it
is assigning a single quoted string to an unused variable. The shell
will then ignore subsequent binary content that's placed inside the
string.
It is strongly recommended that this magic value be immediately followed
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
Zsh impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.
The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.
The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.
The 0x4a relative offset of the magic causes execution to jump into the
MS-DOS stub defined by Portable Executable. APE binaries built by Cosmo
Libc use tricks in the MS-DOS stub to check the operating mode and then
jump to the appropriate entrypoint, e.g. `_start()`.
#### Decoded as i8086
```asm
dec %bp
pop %dx
jno 0x4a
jo 0x4a
```
#### Decoded as i386
```asm
push %ebp
pop %edx
jno 0x4a
jo 0x4a
```
#### Decoded as x86-64
```asm
rex.WRB
pop %r10
jno 0x4a
jo 0x4a
```
### (2) APE UNIX-Only Magic
- ASCII: `jartsr='`
- Hex: 6a 61 72 74 73 72 3d 27
Being a novel executable format that was first published in 2020, the
APE file format is less understood by industry tools compared to the PE,
ELF, and Mach-O executable file formats, which have been around for
decades. For this reason, APE programs that use the MZ magic above can
attract attention from Windows AV software, which may be unwanted by
developers who aren't interested in targeting the Windows platform.
Therefore the `jartsr='` magic is defined which enables the creation of
APE binaries that can safely target all non-Windows platforms. Even
though this magic is less common, APE interpreters and binfmt-misc
installations MUST support this.
It is strongly recommended that this magic value be immediately followed
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
Zsh impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.
The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.
The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.
The 0x78 relative offset of the magic causes execution to jump into the
MS-DOS stub defined by Portable Executable. APE binaries built by Cosmo
Libc use tricks in the MS-DOS stub to check the operating mode and then
jump to the appropriate entrypoint, e.g. `_start()`.
#### Decoded as i8086 / i386 / x86-64
```asm
push $0x61
jb 0x78
jae 0x78
```
### (3) APE Debug Magic
- ASCII: `APEDBG='`
- Hex: 41 50 45 44 42 47 3d 27
While APE files must be valid shell scripts, in practice, UNIX systems
will oftentimes be configured to provide a faster safer alternative to
loading an APE binary through `/bin/sh`. The Linux Kernel can be patched
to have execve() recognize the APE format and directly load its embedded
ELF header. Linux systems can also use binfmt-misc to recognize APE's MZ
and jartsr magic, and pass them to a userspace program named `ape` that
acts as an interpreter. In such environments, the need sometimes arises
to be able to test that the `/bin/sh` is working correctly, in which
case the `APEDBG='` magic is RECOMMENDED.
APE interpreters, execve() implementations, and binfmt-misc installs
MUST ignore this magic. If necessary, steps can be taken to help files
with this magic be passed to `/bin/sh` like a normal shebang-less shell
script for execution.
## Embedded ELF Header
APE binaries MAY embed an ELF header inside them. Unlike conventional
executable file formats, this header is not stored at a fixed offset.
It's instead encoded as octal escape codes in a shell script `printf`
statement. For example:
```
printf '\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\166\105\100\000\000\000\000\000\060\013\000\000\000\000\000\000\000\000\000\000\000\000\000\000\165\312\1\1\100\0\070\0\005\000\0\0\000\000\000\000'
```
This `printf` statement MUST appear in the first 8192 bytes of the APE
executable, so as to limit how much of the initial portion of a file an
interpreter must load.
Multiple such `printf` statements MAY appear in the first 8192 bytes, in
order to specify multiple architectures. For example, fat binaries built
by the `apelink` program (provided by Cosmo Libc) will have two encoded
ELF headers, for AMD64 and ARM64, each of which point into the proper
file offsets for their respective native code. Therefore, kernels and
interpreters which load the APE format directly MUST check the
`e_machine` field of the `Elf64_Ehdr` that's decoded from the octal
codes, before accepting a `printf` shell statement as valid.
These printf statements MUST always use only unescaped ASCII characters
or octal escape codes. These printf statements MUST NOT use space saving
escape codes such as `\n`. For example, rather than saying `\n` it would
be valid to say `\012` instead. It's also valid to say `\12` but only if
the encoded characters that follow aren't an octal digit.
For example, the following algorithm may be used for parsing octal:
```c
static int ape_parse_octal(const unsigned char page[8192], int i, int *pc)
{
int c;
if ('0' <= page[i] && page[i] <= '7') {
c = page[i++] - '0';
if ('0' <= page[i] && page[i] <= '7') {
c *= 8;
c += page[i++] - '0';
if ('0' <= page[i] && page[i] <= '7') {
c *= 8;
c += page[i++] - '0';
}
}
*pc = c;
}
return i;
}
```
APE aware interpreters SHOULD only take `e_machine` into consideration.
It is the responsibility of the `_start()` function to detect the OS.
Therefore, multiple `printf` statements are only embedded in the shell
script for different CPU architectures.
The OS ABI field of an APE embedded `Elf64_Ehdr` SHOULD be set to
`ELFOSABI_FREEBSD`, since it's the only UNIX OS APE supports that
actually checks the field. However different values MAY be chosen for
binaries that don't intend to have FreeBSD in their support vector.
Counter-intuitively, the ARM64 ELF header is used on the MacOS ARM64
platform when loading from fat binaries.
## Embedded Mach-O Header (x86-64 only)
APE shell scripts that support MacOS on AMD64 must use the `dd` command
in a very specific way to specify how the embedded binary Macho-O header
is copied backward to the start of the file. For example:
```
dd if="$o" of="$o" bs=8 skip=433 count=66 conv=notrunc
```
These `dd` statements have traditionally been generated by the GNU as
and ld.bfd programs by encoding ASCII into 64-bit linker relocations,
which necessitated a fixed width for integer values. It took several
iterations over APE's history before we eventually got it right:
- `arg=" 9293"` is how we originally had ape do it
- `arg=$(( 9293))` b/c busybox sh disliked quoted space
- `arg=9293 ` is generated by modern apelink program
Software that parses the APE file format, which needs to extract the
Macho-O x86-64 header SHOULD support the old binaries that use the
previous encodings. To make backwards compatibility simple the following
regular expression may be used, which generalizes to all defined
formats:
```c
regcomp(&rx,
"bs=" // dd block size arg
"(['\"] *)?" // #1 optional quote w/ space
"(\\$\\(\\( *)?" // #2 optional math w/ space
"([[:digit:]]+)" // #3
"( *\\)\\))?" // #4 optional math w/ space
"( *['\"])?" // #5 optional quote w/ space
" +" //
"skip=" // dd skip arg
"(['\"] *)?" // #6 optional quote w/ space
"(\\$\\(\\( *)?" // #7 optional math w/ space
"([[:digit:]]+)" // #8
"( *\\)\\))?" // #9 optional math w/ space
"( *['\"])?" // #10 optional quote w/ space
" +" //
"count=" // dd count arg
"(['\"] *)?" // #11 optional quote w/ space
"(\\$\\(\\( *)?" // #12 optional math w/ space
"([[:digit:]]+)", // #13
REG_EXTENDED);
```
For further details, see the canonical implementation in
`cosmopolitan/tool/build/assimilate.c`.
## Static Linking
Actually Portable Executables are always statically linked. This
revision of the specification does not define any facility for storing
code in dynamic shared objects.
Cosmopolitan Libc provides a solution that enables APE binaries have
limited access to dlopen(). By manually loading a platform-specific
executable and asking the OS-specific libc's dlopen() to load
OS-specific libraries, it becomes possible to use GPUs and GUIs. This
has worked great for AI projects like llamafile.
There is no way for an Actually Portable Executable to interact with
OS-specific dynamic shared object extension modules to programming
languages. For example, a Lua interpreter compiled as an Actually
Portable Executable would have no way of linking extension libraries
downloaded from the Lua Rocks package manager. This is primarily because
different OSes define incompatible ABIs.
While it was possible to polyglot PE+ELF+MachO to create multi-OS
executables, it simply isn't possible to do that same thing for
DLL+DYLIB+SO. Therefore, in order to have DSOs, APE would need to either
choose one of the existing formats or invent one of its own, and then
develop its own parallel ecosystem of extension software. In the future,
the APE specification may expand to encompass this. However the focus to
date has been exclusively on executables with limited dlopen() support.
## Application Binary Interface (ABI)
APE binaries use the System V ABI, as defined by:
- [System V ABI - AMD64 Architecture Processor Supplement](https://gitlab.com/x86-psABIs/x86-64-ABI)
- AARCH64 has a uniform consensus defined by ARM Limited
There are however a few changes we've had to make.
### No Red Zone
Actually Portable Executables that have Windows and/or bare metal in
their support vector MUST be compiled using `-mno-red-zone`. This is
because, on Windows, DLLs and other software lurking in the va-space
might use tricks like SetThreadContext() to take control of a thread
whereas on bare metal, it's also generally accepted that kernel-mode
code cannot assume a red zone either due to hardware interrupts that
pull the exact same kinds of stunts.
APE software that only has truly System V ABI conformant OSes (e.g.
Linux) in their support vector MAY use the red zone optimization.
### Thread Local Storage
#### aarch64
Here's the TLS memory layout on aarch64:
```
x28
%tpidr_el0
│ _Thread_local
┌───┼───┬──────────┬──────────┐
│tib│dtv│ .tdata │ .tbss │
├───┴───┴──────────┴──────────┘
__get_tls()
```
The ARM64 code in actually portable executables use the `x28` register
to store the address of the thread information block. All aarch64 code
linked into these executables SHOULD be compiled with `-ffixed-x28`
which is supported by both Clang and GCC.
The runtime library for an actually portable executables MAY choose to
use `tpidr_el0` instead, if OSes like MacOS aren't being targeted. For
example, if the goal is to create a Linux-only fat binary linker program
for Musl Libc, then choosing to use the existing `tpidr_el0` convention
would be friction-free alternative.
It's not possible for an APE runtime that targets the full range of OSes
to use the `tpidr_el0` register for TLS because Apple won't allow it. On
MacOS ARM64 systems, this register can only be used by a runtime to
implement the `sched_getcpu()` system call. It's reserved by MacOS.
#### x86-64
Here's the TLS memory layout on x86_64:
```
__get_tls()
%fs OpenBSD/NetBSD
_Thread_local │
┌───┬──────────┬──────────┼───┐
│pad│ .tdata │ .tbss │tib│
└───┴──────────┴──────────┼───┘
Linux/FreeBSD/Windows/Mac %gs
```
Quite possibly the greatest challenge in Actually Portable Executable
working, has been overcoming the incompatibilities between OSes in how
thread-local storage works on x86-64. The AMD64 architecture defines two
special segment registers. Every OS uses one of these segment registers
to implement TLS support. However not all OSes agree on which register
to use. Some OSes grant userspace the power to define either of these
registers to hold any value that is desired. Some OSes only effectively
allow a single one of them to be changed. Lastly, some OSes, e.g.
Windows, claim ownership of the memory layout these registers point
towards too.
Here's a breakdown on how much power is granted to userspace runtimes by
each OS when it comes to changing amd64 segment registers.
| | %fs | %gs |
|---------|--------------|--------------|
| Linux | unrestricted | unrestricted |
| MacOS | inaccessible | unrestricted |
| Windows | inaccessible | restricted |
| FreeBSD | unrestricted | unrestricted |
| NetBSD | unrestricted | broken |
| OpenBSD | unrestricted | inaccessible |
Therefore, regardless of which register one we choose, some OSes are
going to be incompatible.
APE binaries are always built with a Linux compiler. So another issue
arises in the fact that our Linux-flavored GCC and Clang toolchains
(which are used to produce cross-OS binaries) are also only capable of
producing TLS instructions that use the %fs convention.
To solve these challenges, the `cosmocc` compiler will rewrite binary
objects after they've been compiled by GCC, so that the `%gs` register
is used, rather than `%fs`. Morphing x86-64 binaries after they've been
compiled is normally difficult, due to the complexity of the machine
instruction language. However GCC provides `-mno-tls-direct-seg-refs`
which greatly reduces the complexity of this task. This flag forgoes
some optimizations to make the generated code simpler. Rather than doing
clever arithmetic with `%fs` prefixes, the compiler will always generate
the thread information block address load as a separate instruction.
```c
// Change AMD code to use %gs:0x30 instead of %fs:0
// We assume -mno-tls-direct-seg-refs has been used
static void ChangeTlsFsToGs(unsigned char *p, size_t n) {
unsigned char *e = p + n - 9;
while (p <= e) {
// we're checking for the following expression:
// 0144 == p[0] && // %fs
// 0110 == (p[1] & 0373) && // rex.w (and ignore rex.r)
// (0213 == p[2] || // mov reg/mem → reg (word-sized)
// 0003 == p[2]) && // add reg/mem → reg (word-sized)
// 0004 == (p[3] & 0307) && // mod/rm (4,reg,0) means sib → reg
// 0045 == p[4] && // sib (5,4,0) → (rbp,rsp,0) → disp32
// 0000 == p[5] && // displacement (von Neumann endian)
// 0000 == p[6] && // displacement
// 0000 == p[7] && // displacement
// 0000 == p[8] // displacement
uint64_t w = READ64LE(p) & READ64LE("\377\373\377\307\377\377\377\377");
if ((w == READ64LE("\144\110\213\004\045\000\000\000") ||
w == READ64LE("\144\110\003\004\045\000\000\000")) &&
!p[8]) {
p[0] = 0145; // change %fs to %gs
p[5] = 0x30; // change 0 to 0x30
p += 9;
} else {
++p;
}
}
}
```
By favoring `%gs` we've now ensured friction-free compatibility for the
APE runtime on MacOS, Linux, and FreeBSD which are all able to conform
easily to this convention. However additional work needs to be done at
runtime when an APE program is started on Windows, OpenBSD, and NetBSD.
On these platforms, all executable pages must be faulted and morphed to
fixup the TLS instructions.
On OpenBSD and NetBSD, this is as simple as undoing the example
operation above. Earlier at compile-time we turned `%fs` into `%gs`.
Now, at runtime, `%gs` must be turned back into `%fs`. Since the
executable is morphing itself, this is easier said than done.
OpenBSD for example enforces a `W^X` invariant. Code that's executing
can't modify itself at the same time. The way Cosmopolitan solves this
is by defining a special part of the binary called `.text.privileged`.
This section is aligned to page boundaries. A GNU ld linker script is
used to ensure that code which morphs code is placed into this section,
through the use of a header-defined cosmo-specific keyword `privileged`.
Additionally, the `fixupobj` program is used by the Cosmo build system
to ensure that compiled objects don't contain privileged functions that
call non-privileged functions. Needless to say, `mprotect()` needs to be
a privileged function, so that it can be used to disable the execute bit
on all other parts of the executable except for the privileged section,
thereby making it writable. Once this has been done, code can change.
On Windows the displacement bytes of the TLS instruction are changed to
use the `%gs:0x1480+i*8` ABI where `i` is a number assigned by the WIN32
`TlsAlloc()` API. This avoids the need to call `TlsGetValue()` which is
implemented this exact same way under the hood. Even though 0x1480 isn't
explicitly documented by MSDN, this ABI is believed to be stable because
MSVC generates binaries that use this offset directly. The only caveat
is that `TlsAlloc()` must be called as early in the runtime init as
possible, to ensure an index less than 64 is returned.
### Thread Information Block (TIB)
The Actually Portable Executable Thread Information Block (TIB) is
defined by this version of the specification as follows:
- The 64-bit TIB self-pointer is stored at offset 0x00.
- The 64-bit TIB self-pointer is also stored at offset 0x30.
- The 32-bit `errno` value is stored at offset 0x3c.
All other parts of the thread information block should be considered
unspecified and therefore reserved for future specifications.
The APE thread information block is aligned on a 64-byte boundary.
Cosmopolitan Libc v3.5.8 (c. 2024-07-21) currently implements a thread
information block that's 512 bytes in size.
### Foreign Function Calls
Even though APE programs always use the System V ABI, there arises the
occasional need to interface with foreign functions, e.g. WIN32. The
`__attribute__((__ms_abi__))` annotation introduced by GCC v6 is used
for this purpose.
The ability to change a function's ABI on a case-by-case basis is
surprisingly enough supported by GCC, Clang, NVCC, and even the AMD HIP
compilers for both UNIX systems and Windows. All of these compilers
support both the System V ABI and the Microsoft x64 ABI.
APE binaries will actually favor the Microsoft ABI even when running on
UNIX OSes for certain dlopen() use-cases. For example, if we control the
code to a CUDA module, which we compile on each OS separately from our
main APE binary, then any function that's inside the APE binary whose
pointer may be passed into a foreign module SHOULD be compiled to use
the Microsoft ABI. This is because in practice the OS-specific module
may need to be compiled by MSVC, where MS ABI is the *only* ABI, which
forces our UNIX programs to partially conform. Thankfully, all UNIX
compilers support doing it on a case-by-case basis.
### Char Signedness
Actually Portable Executable defines `char` as signed.
Therefore conformant APE software MUST use `-fsigned-char` when building
code for aarch64, as well as any other architecture that (unlike x86-64)
would otherwise define `char` as being `unsigned char` by default.
This decision was one of the cases where it made sense to offer a more
consistent runtime experience for fat multi-arch binaries. However you
SHOULD still write code to assume `char` can go either way. But if all
you care about is using APE, then you CAN assume `char` is signed.
### Long Double
On AMD64 platforms, APE binaries define `long double` as 80-bit.
On ARM64 platforms, APE binaries define `long double` as 128-bit.
We accept inconsistency in this case, because hardware acceleration is
far more valuable than stylistic consistency in the case of mathematics.
One challenge arises on AMD64 for supporting `long double` across OSes.
Unlike UNIX systems, the Windows Executive on x86-64 initializes the x87
FPU to have double (64-bit) precision rather than 80-bit. That's because
code compiled by MSVC treats `long double` as though it were `double` to
prefer always using the more modern SSE instructions. However System V
requires genuine 80-bit `long double` support on AMD64.
Therefore, if an APE program detects that it's been started on a Windows
x86-64 system, then it SHOULD use the following assembly to initialize
the x87 FPU in System V ABI mode.
```asm
fldcw 1f(%rip)
.rodata
.balign 2
// 8087 FPU Control Word
// IM: Invalid Operation ───────────────┐
// DM: Denormal Operand ───────────────┐│
// ZM: Zero Divide ───────────────────┐││
// OM: Overflow ─────────────────────┐│││
// UM: Underflow ───────────────────┐││││
// PM: Precision ──────────────────┐│││││
// PC: Precision Control ───────┐ ││││││
// {float,∅,double,long double}│ ││││││
// RC: Rounding Control ──────┐ │ ││││││
// {even, →-∞, →+∞, →0} │┌┤ ││││││
// ┌┤││ ││││││
// d││││rr││││││
1: .short 0b00000000000000000001101111111
.previous
```
## Executable File Alignment
Actually Portable Executable is a statically-linked flat executable file
format that is, as a thing in itself, agnostic to file alignments. For
example, the shell script payload at the beginning of the file and its
statements have no such requirements. Alignment requirements are however
imposed by the executable formats that APE wraps.
1. ELF requires that file offsets be congruent with virtual addresses
modulo the CPU page size. So when we add a shell script to the start
of an executable, we need to round up to the page size in order to
maintain ELF's invariant. Although no such roundup is required on the
program segments once the invariant is restored. ELF loaders will
happily map program headers from arbitrary file intervals (which may
overlap) onto arbitrarily virtual intervals (which don't need to be
contiguous). In order to do that, the loaders will generally use
UNIX's mmap() function which is more restrictive and only accepts
addresses and offsets that are page aligned. To make it possible to
map an unaligned ELF program header that could potentially start and
stop at any byte, ELF loaders round-out the intervals, which means
adjacent unrelated data might also get mapped, which may need to be
explicitly zero'd. Thanks to the cleverness of ELF, it's possible to
have an executable file be very tiny, without needing any alignment
bytes, and it'll be loaded into a properly aligned virtual space
where segments can be as sparse as we want them to be.
2. PE doesn't care about congruence and instead defines two separate
kinds of alignment. First, PE requires that the layout of segment
memory inside the file be aligned on at minimum the classic 512 byte
MS-DOS page size. This means that, unlike ELF, some alignment padding
may need to be encoded into the file, making it slightly larger. Next
PE imposes an alignment restriction on segments once they've been
mapped into the virtual address space, which must be rounded to the
system page size. Like ELF, PE segments need to be properly ordered
but they're allowed to drift apart once mapped in a non-contiguous
sparsely mapped way. When inserting shell script content at the start
of a PE file, the most problematic thing is the need to round up to
the 64kb system granularity, which results in a lot of needless bytes
of padding being inserted by a naive second-pass linker.
3. Apple's Mach-O format is the strictest of them all. While both ELF
and PE are defined in such a way that invites great creativity, XNU
will simply refuse to an executable that does anything creative with
alignment. All loaded segments need to both start and end on a page
aligned address. XNU also wants segments to be contiguous similar to
portable executable, except it applies to both the file and virtual
spaces, which must follow the same structure.
Actually Portable Executables must conform to the strictest requirements
demanded by the support vector. Therefore an APE binary that has headers
for all three of the above executable formats MUST conform to the Apple
way of doing things. GNU ld linker scripts aren't very good at producing
ELF binaries that rigidly conform to this simple naive layout. There are
so many ways things can go wrong, where third party code might slip its
own custom section name in-between the linker script sections that are
explicitly defined, thereby causing ELF's powerful features to manifest
and the resulting content overlapping. The best `ld` flag that helps is
`--orphan-handling=error` which can help with explaining such mysteries.
While Cosmopolitan was originally defined to just use stock GNU tools,
this proved intractable over time, and the project has been evolving in
the direction of building its own. Inventing the `apelink` program was
what enabled the project to achieve multi-architecture binaries whereas
previously it was only possible to do multi-OS binaries. In the future,
our hope is that a fast power linker like Mold can be adapted to produce
fat APE binaries directly from object files in one pass.
## Position Independent Code
APE doesn't currently support position independent executable formats.
This is because APE was originally written for the GNU linker, where PIC
and PIE were after-thoughts and never fully incorporated with the older
more powerful linker script techniques upon which APE relies. Future
iterations of this specification are intended to converge on modern
standards, as our tooling becomes developed enough to support it.
However this only applies to the wrapped executable formats themselves.
While our convention to date has been to always load ELF programs at the
4mb mark, this is not guaranteed across OSes and architectures. Programs
should have no expectations that a program will be loaded to any given
address. For example, Cosmo currently implements APE on AARCH64 as
loading executables to a starting address of 0x000800000000. This
address occupies a sweet spot of requirements.
## Address Space
In order to create a single binary that supports as many platforms as
possible without needing to be recompiled, there's a very narrow range
of addresses that can be used. That range is somewhere between 32 bits
and 39 bits.
- Embedded devices that claim to be 64-bit will oftentimes only support
a virtual address space that's 39 bits in size.
- We can't load executable images on AARCH64 beneath 0x100000000 (4gb)
because Apple forbids doing that, possibly in an effort to enforce a
best practice for spotting 32-bit to 64-bit transition bugs. Please
note that this restriction only applies to Apple ARM64 systems. The
x86-64 version of XNU will happily load APE binaries to 0x00400000.
- The AMD64 architecture on desktops and servers can usually be counted
upon to provide a 47-bit address space. The Linux Kernel for instance
grants each userspace program full dominion over addresses 0x00200000
through 0x00007fffffffffff provided the hardware supports this. On
modern workstations supporting Intel and AMD's new PML5T feature which
virtualizes memory using a radix trie that's five layers deep, Linux
is able to offer userspace its choice of fixed addresses from
0x00200000 through 0x00ffffffffffffff. The only exception to this rule
we've encountered so far is that Windows 7 and Windows Vista behaved
similar to embedded devices in reducing the number of va bits.
## Page Size
APE software MUST be page size agnostic. For many years the industry had
converged on a strong consensus of having a page size that's 4096 bytes.
However this convention was never guaranteed. New computers have become
extremely popular, such as Apple Silicon, that use a 16kb page size.
By convention, Cosmopolitan Libc currently generates ELF headers for
x86-64 that are strictly aligned on a 4096-byte page size. On ARM64
Cosmopolitan is currently implemented to always generate ELF headers
aligned on a 16kb page size.
In addition to being page size agnostic, APE software that cares about
working correctly on Windows needs to be aware of the concept of
allocation granularity. While the page size on Windows is generally 4kb
in size, memory mappings can only be created on addresses that aligned
to the system allocation granularity, which is generally 64kb. If you
use a function like mmap() with Cosmopolitan Libc, then the `addr` and
`offset` parameters need to be aligned to `sysconf(_SC_GRANSIZE)` or
else your software won't work on Windows. Windows has other limitations
too, such as lacking the ability to carve or punch holes in mappings.