mirror of
https://github.com/jart/cosmopolitan.git
synced 2025-01-31 03:27:39 +00:00
Start writing formal specification for APE
This commit is contained in:
parent
7996bf67b5
commit
29ce25c767
2 changed files with 272 additions and 1 deletions
271
ape/specification.md
Normal file
271
ape/specification.md
Normal file
|
@ -0,0 +1,271 @@
|
||||||
|
# Actually Portable Executable Specification v0.1
|
||||||
|
|
||||||
|
Actually Portable Executable (APE) is an executable file format that
|
||||||
|
polyglots the Windows Portable Executable (PE) format with a UNIX Sixth
|
||||||
|
Edition style shell script that doesn't have a shebang. This makes it
|
||||||
|
possible to produce a single file binary that executes on the stock
|
||||||
|
installations of the many OSes and architectures.
|
||||||
|
|
||||||
|
## Supported OSes and Architectures
|
||||||
|
|
||||||
|
- AMD64
|
||||||
|
- Linux
|
||||||
|
- MacOS
|
||||||
|
- Windows
|
||||||
|
- FreeBSD
|
||||||
|
- OpenBSD
|
||||||
|
- NetBSD
|
||||||
|
- BIOS
|
||||||
|
|
||||||
|
- ARM64
|
||||||
|
- Linux
|
||||||
|
- MacOS
|
||||||
|
- FreeBSD
|
||||||
|
- Windows (non-native)
|
||||||
|
|
||||||
|
## File Header
|
||||||
|
|
||||||
|
APE defines three separate file magics, all of which are 8 characters
|
||||||
|
long. Any file that starts with one of these magic values can be
|
||||||
|
considered an APE program.
|
||||||
|
|
||||||
|
### (1) APE MZ Magic
|
||||||
|
|
||||||
|
- ASCII: `MZqFpD='`
|
||||||
|
- Hex: 4d 5a 71 46 70 44 3d 27
|
||||||
|
|
||||||
|
This is the canonical magic used by almost all APE programs. It enables
|
||||||
|
maximum portability between OSes. When interpreted as a shell script, it
|
||||||
|
is assiging a single quoted string to an unused variable. The shell will
|
||||||
|
then ignore subsequent binary content that's placed inside the string.
|
||||||
|
|
||||||
|
It is strongly recommended that this magic value be immediately followed
|
||||||
|
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
|
||||||
|
Zsh impose a binary safety check before handing off files that don't
|
||||||
|
have a shebang to `/bin/sh`. That check applies to the first line, which
|
||||||
|
can't contain NUL characters.
|
||||||
|
|
||||||
|
The letters were carefully chosen so as to be valid x86 instructions in
|
||||||
|
all operating modes. This makes it possible to store a BIOS bootloader
|
||||||
|
disk image inside an APE binary. For example, simple CLI programs built
|
||||||
|
with Cosmopolitan Libc will boot from BIOS into long mode if they're
|
||||||
|
treated as a floppy disk image.
|
||||||
|
|
||||||
|
The letters also allow for the possibility of being treated on x86-64 as
|
||||||
|
a flat executable, where the PE / ELF / Mach-O executable structures are
|
||||||
|
ignored, and execution simply begins at the beginning of the file,
|
||||||
|
similar to how MS-DOS .COM binaries work.
|
||||||
|
|
||||||
|
The 0x4a relative offset of the magic causes execution to jump into the
|
||||||
|
MS-DOS stub defined by Portable Executable. APE binaries built by Cosmo
|
||||||
|
Libc use tricks in the MS-DOS stub to check the operating mode and then
|
||||||
|
jump to the appropriate entrypoint, e.g. `_start()`.
|
||||||
|
|
||||||
|
#### Decoded as i8086
|
||||||
|
|
||||||
|
```asm
|
||||||
|
dec %bp
|
||||||
|
pop %dx
|
||||||
|
jno 0x4a
|
||||||
|
jo 0x4a
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Decoded as i386
|
||||||
|
|
||||||
|
```asm
|
||||||
|
push %ebp
|
||||||
|
pop %edx
|
||||||
|
jno 0x4a
|
||||||
|
jo 0x4a
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Decoded as x86-64
|
||||||
|
|
||||||
|
```asm
|
||||||
|
rex.WRB
|
||||||
|
pop %r10
|
||||||
|
jno 0x4a
|
||||||
|
jo 0x4a
|
||||||
|
```
|
||||||
|
|
||||||
|
### (2) APE UNIX-Only Magic
|
||||||
|
|
||||||
|
- ASCII: `jartsr='`
|
||||||
|
- Hex: 6a 61 72 74 73 72 3d 27
|
||||||
|
|
||||||
|
Being a novel executable format that was first published in 2020, the
|
||||||
|
APE file format is less understood by industry tools compared to the PE,
|
||||||
|
ELF, and Mach-O executable file formats, which have been around for
|
||||||
|
decades. For this reason, APE programs that use the MZ magic above can
|
||||||
|
attract attention from Windows AV software, which may be unwanted by
|
||||||
|
developers who aren't interested in targeting the Windows platform.
|
||||||
|
Therefore the `jartsr='` magic is defined which enables the creation of
|
||||||
|
APE binaries that can safely target all non-Windows platforms. Even
|
||||||
|
though this magic is less common, APE interpreters and binfmt-misc
|
||||||
|
installations MUST support this.
|
||||||
|
|
||||||
|
It is strongly recommended that this magic value be immediately followed
|
||||||
|
by a newline (\n or hex 0a) character. Some shells, e.g. FreeBSD SH and
|
||||||
|
Zsh impose a binary safety check before handing off files that don't
|
||||||
|
have a shebang to `/bin/sh`. That check applies to the first line, which
|
||||||
|
can't contain NUL characters.
|
||||||
|
|
||||||
|
The letters were carefully chosen so as to be valid x86 instructions in
|
||||||
|
all operating modes. This makes it possible to store a BIOS bootloader
|
||||||
|
disk image inside an APE binary. For example, simple CLI programs built
|
||||||
|
with Cosmopolitan Libc will boot from BIOS into long mode if they're
|
||||||
|
treated as a floppy disk image.
|
||||||
|
|
||||||
|
The letters also allow for the possibility of being treated on x86-64 as
|
||||||
|
a flat executable, where the PE / ELF / Mach-O executable structures are
|
||||||
|
ignored, and execution simply begins at the beginning of the file,
|
||||||
|
similar to how MS-DOS .COM binaries work.
|
||||||
|
|
||||||
|
The 0x78 relative offset of the magic causes execution to jump into the
|
||||||
|
MS-DOS stub defined by Portable Executable. APE binaries built by Cosmo
|
||||||
|
Libc use tricks in the MS-DOS stub to check the operating mode and then
|
||||||
|
jump to the appropriate entrypoint, e.g. `_start()`.
|
||||||
|
|
||||||
|
#### Decoded as i8086 / i386 / x86-64
|
||||||
|
|
||||||
|
```asm
|
||||||
|
push $0x61
|
||||||
|
jb 0x78
|
||||||
|
jae 0x78
|
||||||
|
```
|
||||||
|
|
||||||
|
### (3) APE Debug Magic
|
||||||
|
|
||||||
|
- ASCII: `APEDBG='`
|
||||||
|
- Hex: 41 50 45 44 42 47 3d 27
|
||||||
|
|
||||||
|
While APE files must be valid shell scripts, in practice, UNIX systems
|
||||||
|
will oftentimes be configured to provide a faster safer alternative to
|
||||||
|
loading an APE binary through `/bin/sh`. The Linux Kernel can be patched
|
||||||
|
to have execve() recognize the APE format and directly load its embedded
|
||||||
|
ELF header. Linux systems can also use binfmt-misc to recognize APE's MZ
|
||||||
|
and jartsr magic, and pass them to a userspace program named `ape` that
|
||||||
|
acts as an interpreter. In such environments, the need sometimes arises
|
||||||
|
to be able to test that the `/bin/sh` is working correctly, in which
|
||||||
|
case the `APEDBG='` magic is RECOMMENDED.
|
||||||
|
|
||||||
|
APE interpreters, execve() implementations, and binfmt-misc installs
|
||||||
|
MUST ignore this magic. If necessary, steps can be taken to help files
|
||||||
|
with this magic be passed to `/bin/sh` like a normal shebang-less shell
|
||||||
|
script for execution.
|
||||||
|
|
||||||
|
## Embedded ELF Header
|
||||||
|
|
||||||
|
APE binaries MAY embed an ELF header inside them. Unlike conventional
|
||||||
|
executable file formats, this header is not stored at a fixed offset.
|
||||||
|
It's instead encoded as octal escape codes in a shell script `printf`
|
||||||
|
statement. For example:
|
||||||
|
|
||||||
|
```
|
||||||
|
printf '\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\166\105\100\000\000\000\000\000\060\013\000\000\000\000\000\000\000\000\000\000\000\000\000\000\165\312\1\1\100\0\070\0\005\000\0\0\000\000\000\000'
|
||||||
|
```
|
||||||
|
|
||||||
|
This `printf` statement MUST appear in the first 8192 bytes of the APE
|
||||||
|
executable, so as to limit how much of the initial portion of a file an
|
||||||
|
intepreter must load.
|
||||||
|
|
||||||
|
Multiple such `printf` statements MAY appear in hte first 8192 bytes, in
|
||||||
|
order to specify multiple architectures. For example, fat binaries built
|
||||||
|
by the `apelink` program (provided by Cosmo Libc) will have two encoded
|
||||||
|
ELF headers, for amd64 and arm64, each of which point into the proper
|
||||||
|
file offsets for their respective native code. Therefore, kernels and
|
||||||
|
interpreters which load the APE format directly MUST check the
|
||||||
|
`e_machine` field of the `Elf64_Ehdr` that's decoded from the octal
|
||||||
|
codes, before accepting a `printf` shell statement as valid.
|
||||||
|
|
||||||
|
These printf statements MUST always use only unescaped ASCII characters
|
||||||
|
or octal escape codes. These printf statements MUST NOT use space saving
|
||||||
|
escape codes such as `\n`. For example, rather than saying `\n` it would
|
||||||
|
be valid to say `\012` instead. It's also valid to say `\12` but only if
|
||||||
|
the encoded characters that follow aren't an octal digit.
|
||||||
|
|
||||||
|
For example, the following algorithm may be used for parsing octal:
|
||||||
|
|
||||||
|
```c
|
||||||
|
static int ape_parse_octal(const unsigned char page[8192], int i, int *pc)
|
||||||
|
{
|
||||||
|
int c;
|
||||||
|
if ('0' <= page[i] && page[i] <= '7') {
|
||||||
|
c = page[i++] - '0';
|
||||||
|
if ('0' <= page[i] && page[i] <= '7') {
|
||||||
|
c *= 8;
|
||||||
|
c += page[i++] - '0';
|
||||||
|
if ('0' <= page[i] && page[i] <= '7') {
|
||||||
|
c *= 8;
|
||||||
|
c += page[i++] - '0';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
*pc = c;
|
||||||
|
}
|
||||||
|
return i;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
APE aware interpreters SHOULD only take `e_machine` into consideration.
|
||||||
|
It is the responsibility of the `_start()` function to detect the OS.
|
||||||
|
Therefore, multiple `printf` statements are only embedded in the shell
|
||||||
|
script for different CPU architectures.
|
||||||
|
|
||||||
|
The OS ABI field of an APE embedded `Elf64_Ehdr` SHOULD be set to
|
||||||
|
`ELFOSABI_FREEBSD`, since it's the only UNIX OS APE supports that
|
||||||
|
actually checks the field. However different values MAY be chosen for
|
||||||
|
binaries that don't intend to have FreeBSD in their support vector.
|
||||||
|
|
||||||
|
Counter-intuitively, the ARM64 ELF header is used on the MacOS ARM64
|
||||||
|
platform when loading from fat binaries.
|
||||||
|
|
||||||
|
## Embedded Mach-O Header (x86-64 only)
|
||||||
|
|
||||||
|
APE shell scripts that support MacOS on AMD64 must use the `dd` command
|
||||||
|
in a very specific way to specify how the embedded binary Macho-O header
|
||||||
|
is copied backward to the start of the file. For example:
|
||||||
|
|
||||||
|
```
|
||||||
|
dd if="$o" of="$o" bs=8 skip=433 count=66 conv=notrunc
|
||||||
|
```
|
||||||
|
|
||||||
|
These `dd` statements have traditionally been generated by the GNU as
|
||||||
|
and ld.bfd programs by encoding ASCII into 64-bit linker relocations,
|
||||||
|
which necessitated a fixed width for integer values. It took several
|
||||||
|
iterations over APE's history before we eventually got it right:
|
||||||
|
|
||||||
|
- `arg=" 9293"` is how we originally had ape do it
|
||||||
|
- `arg=$(( 9293))` b/c busybox sh disliked quoted space
|
||||||
|
- `arg=9293 ` is generated by modern apelink program
|
||||||
|
|
||||||
|
Software that parses the APE file format, which needs to extract to be
|
||||||
|
able extract the Macho-O x86-64 header SHOULD support the old binaries
|
||||||
|
that use the previous encodings. To make backwards compatibility simple
|
||||||
|
the following regular expression may be used, which generalizes to all
|
||||||
|
defined formats:
|
||||||
|
|
||||||
|
```c
|
||||||
|
regcomp(&rx,
|
||||||
|
"bs=" // dd block size arg
|
||||||
|
"(['\"] *)?" // #1 optional quote w/ space
|
||||||
|
"(\\$\\(\\( *)?" // #2 optional math w/ space
|
||||||
|
"([[:digit:]]+)" // #3
|
||||||
|
"( *\\)\\))?" // #4 optional math w/ space
|
||||||
|
"( *['\"])?" // #5 optional quote w/ space
|
||||||
|
" +" //
|
||||||
|
"skip=" // dd skip arg
|
||||||
|
"(['\"] *)?" // #6 optional quote w/ space
|
||||||
|
"(\\$\\(\\( *)?" // #7 optional math w/ space
|
||||||
|
"([[:digit:]]+)" // #8
|
||||||
|
"( *\\)\\))?" // #9 optional math w/ space
|
||||||
|
"( *['\"])?" // #10 optional quote w/ space
|
||||||
|
" +" //
|
||||||
|
"count=" // dd count arg
|
||||||
|
"(['\"] *)?" // #11 optional quote w/ space
|
||||||
|
"(\\$\\(\\( *)?" // #12 optional math w/ space
|
||||||
|
"([[:digit:]]+)", // #13
|
||||||
|
REG_EXTENDED);
|
||||||
|
```
|
||||||
|
|
||||||
|
For further details, see the canonical implementation in
|
||||||
|
`cosmopolitan/tool/build/assimilate.c`.
|
|
@ -26,7 +26,7 @@
|
||||||
* pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
|
* pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
|
||||||
* // ...
|
* // ...
|
||||||
* pthread_mutex_lock(&lock);
|
* pthread_mutex_lock(&lock);
|
||||||
* pthread_cond_signal(&cond, &lock);
|
* pthread_cond_signal(&cond);
|
||||||
* pthread_mutex_unlock(&lock);
|
* pthread_mutex_unlock(&lock);
|
||||||
*
|
*
|
||||||
* This function has no effect if there aren't any threads currently
|
* This function has no effect if there aren't any threads currently
|
||||||
|
|
Loading…
Reference in a new issue