Commit graph

193 commits

Author SHA1 Message Date
Justine Tunney
42a3bb729a
Make execve() linger when it can't spoof parent
It's now possible to use execve() when the parent process isn't built by
cosmo. In such cases, the current process will kill all threads and then
linger around, waiting for the newly created process to die, and then we
propagate its exit code to the parent. This should help bazel and others

Allocating private anonymous memory is now 5x faster on Windows. This is
thanks to VirtualAlloc() which is faster than the file mapping APIs. The
fork() function also now goes 30% faster, since we are able to avoid the
VirtualProtect() calls on mappings in most cases now.

Fixes #1253
2025-01-04 21:13:37 -08:00
Justine Tunney
0b3c81dd4e
Make fork() go 30% faster
This change makes fork() go nearly as fast as sys_fork() on UNIX. As for
Windows this change shaves about 4-5ms off fork() + wait() latency. This
is accomplished by using WriteProcessMemory() from the parent process to
setup the address space of a suspended process; it is better than a pipe
2025-01-01 04:59:38 -08:00
Justine Tunney
379cd77078
Improve memory manager and signal handling
On Windows, mmap() now chooses addresses transactionally. It reduces the
risk of badness when interacting with the WIN32 memory manager. We don't
throw darts anymore. There is also no more retry limit, since we recover
from mystery maps more gracefully. The subroutine for combining adjacent
maps has been rewritten for clarity. The print maps subroutine is better

This change goes to great lengths to perfect the stack overflow code. On
Windows you can now longjmp() out of a crash signal handler. Guard pages
previously weren't being restored properly by the signal handler. That's
fixed, so on Windows you can now handle a stack overflow multiple times.
Great thought has been put into selecting the perfect SIGSTKSZ constants
so you can save sigaltstack() memory. You can now use kprintf() with 512
bytes of stack available. The guard pages beneath the main stack are now
recorded in the memory manager.

This change fixes getcontext() so it works right with the %rax register.
2024-12-27 01:33:00 -08:00
Justine Tunney
b490e23d63
Improve Windows sleep accuracy from 15ms to 15µs 2024-12-06 23:03:57 -08:00
Bach Le
baad1df71d
Add several NT functions (#1318)
With these addtions, I could build and run a
[sokol](https://github.com/floooh/sokol) application (using OpenGL) on
both Linux and Windows.
2024-10-27 21:10:32 -07:00
Justine Tunney
0d74673213
Introduce interprocess signaling on Windows
This change gets rsync working without any warning or errors. On Windows
we now create a bunch of C:\var\sig\x\y.pid shared memory files, so sigs
can be delivered between processes. WinMain() creates this file when the
process starts. If the program links signaling system calls then we make
a thread at startup too, which allows asynchronous delivery each quantum
and cancelation points can spot these signals potentially faster on wait

See #1240
2024-09-19 03:02:13 -07:00
mierenhoop
6397999fca
Fix unicode look-alike in define (#1294) 2024-09-15 12:13:25 -07:00
Justine Tunney
dd8544c3bd
Delve into clock rabbit hole
The worst issue I had with consts.sh for clock_gettime is how it defined
too many clocks. So I looked into these clocks all day to figure out how
how they overlap in functionality. I discovered counter-intuitive things
such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and
that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10
also has some incredible new APIs, that let us simplify clock_gettime().

  - Linux CLOCK_REALTIME         -> GetSystemTimePreciseAsFileTime()
  - Linux CLOCK_MONOTONIC        -> QueryUnbiasedInterruptTimePrecise()
  - Linux CLOCK_MONOTONIC_RAW    -> QueryUnbiasedInterruptTimePrecise()
  - Linux CLOCK_REALTIME_COARSE  -> GetSystemTimeAsFileTime()
  - Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
  - Linux CLOCK_BOOTTIME         -> QueryInterruptTimePrecise()

Documentation on the clock crew has been added to clock_gettime() in the
docstring and in redbean's documentation too. You can read that to learn
interesting facts about eight essential clocks that survived this purge.
This is original research you will not find on Google, OpenAI, or Claude

I've tested this change by porting *NSYNC to become fully clock agnostic
since it has extensive tests for spotting irregularities in time. I have
also included these tests in the default build so they no longer need to
be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across
the entire amd64 and arm64 test fleets.
2024-09-04 01:32:46 -07:00
Justine Tunney
610c951f71
Fix the build 2024-08-26 16:44:05 -07:00
Justine Tunney
5bd22aef12
Experiment with supporting Windows Arm64 natively
So far I haven't found any way to run native Arm64 code on Windows Arm64
without using MSVC. When I build a PE binary from scratch that should be
a valid Windows Arm64 program, the OS refuses to run it. Possibly due to
requiring additional content like XML manifests or relocation or control
flow integrity data that isn't normally required on x64. I've also tried
using VirtualAlloc2() to JIT an Arm64 native function, but VirtualAlloc2
always fails with invalid parameter. I tried using MSVC to create an ARM
DLL that my x64 emulated program can link at runtime, to pass a function
pointer with ARM code, but LoadLibrary() rejects ARM DLLs as invalid exe

The only option left, is likely to write a new program like ape/ape-m1.c
which can be compiled by MSVC to load and run an AARCH64 ELF executable.
The emulated x64 binary would detect emulation using IsWow64Process2 and
then drop the loader executable in a temporary folder, and re-launch the
original executable, using the Arm64 segments of the cosmocc fat binary.
2024-08-16 06:43:59 -07:00
Justine Tunney
31194165d2
Remove .internal from more header filenames 2024-08-04 12:52:25 -07:00
Justine Tunney
101fb3d9b3
Make some new Windows 10 memory APIs available 2024-07-19 22:26:49 -07:00
Justine Tunney
f7780de24b
Make realloc() go 100x faster on Linux/NetBSD
Cosmopolitan now supports mremap(), which is only supported on Linux and
NetBSD. First, it allows memory mappings to be relocated without copying
them; this can dramatically speed up data structures like std::vector if
the array size grows larger than 256kb. The mremap() system call is also
10x faster than munmap() when shrinking large memory mappings.

There's now two functions, getpagesize() and getgransize() which help to
write portable code that uses mmap(MAP_FIXED). Alternative sysconf() may
be called with our new _SC_GRANSIZE. The madvise() system call now has a
better wrapper with improved documentation.
2024-07-07 12:40:30 -07:00
Jōshin
89fc95fefd
Rerun clang-format on the repo (#1217)
🚨 clang-format changes output per version!

This is with version 19.0.0. The modifications seem to be fixing the old
version’s errors - mainly involving omitted whitespace around binary ops
and inserted whitespace between goto labels and colons (if followed by a
curly brace.)

Also fixes a few mistakes made by e.g. someone (ahem) forgetting to pass
his ctl/string.h modifications through it.

We should add this to .git-blame-ignore-revs once we have its final hash
on master.
2024-06-15 16:34:48 -04:00
Jōshin
f032b5570b
Run clang-format (#1197) 2024-06-01 16:30:43 -04:00
Justine Tunney
e4d25d68e4
Drop support for Windows 8
Microsoft caused some very gentle breakages for Cosmopolitan. They
removed the version information from the PEB which caused uname to
report WINDOWS 0.0.0. We should have called GetVersionExW but that
doesn't really exist anymore either. Windows policy is now to give
whatever version we used in ape/ape.S. Windows8 has been EOL since
2023-01-10 so lets avoid our modern executables being relegated to
legacy infrastructure. Requiring Windows 10+ going forward lets us
remove runtime compatibility bloat from the codebase. Further note
Cosmopolitan maintains a Windows Vista branch on GitHub, so anyone
preferring the older versions, can still have a future with Cosmo.

Another neat thing this fixes is UTF-8 support in the console. The
changes Microsoft made broke the if statement that enabled UTF8 in
terminals. This explains why bug reports had broken arrows. In the
future this should be less of an issue, since the PEB code is gone
which means we more strictly conform to only Microsoft's WIN32 API
2024-05-29 19:37:47 -07:00
Justine Tunney
5fd7b07fac
Improve AVX512 feature detection 2024-05-07 03:19:49 -07:00
Justine Tunney
06d916b449
Add VirtualAlloc2 WIN32 API 2024-05-04 23:26:40 -07:00
Justine Tunney
b0df6c1fce
Implement proper time zone support
Cosmopolitan now supports 104 time zones. They're embedded inside any
binary that links the localtime() function. Doing so adds about 100kb
to the binary size. This change also gets time zones working properly
on Windows for the first time. It's not needed to have /etc/localtime
exist on Windows, since we can get this information from WIN32. We're
also now updated to the latest version of Paul Eggert's TZ library.
2024-05-04 23:06:37 -07:00
Justine Tunney
d5ebb1fa5b
Add MapViewOfFile3 WIN32 API 2024-05-04 12:25:41 -07:00
Justine Tunney
0ef36489c8
Walk back most uses of __STRICT_ANSI__ 2024-02-27 04:09:49 -08:00
Justine Tunney
77a92f517b
Introduce getcpu() system call from glibc 2024-02-21 18:17:20 -08:00
Justine Tunney
3eb405e0e2
Resurrect <windows.h> as <windowsesque.h> 2024-02-21 16:41:11 -08:00
Justine Tunney
957c61cbbf
Release Cosmopolitan v3.3
This change upgrades to GCC 12.3 and GNU binutils 2.42. The GNU linker
appears to have changed things so that only a single de-duplicated str
table is present in the binary, and it gets placed wherever the linker
wants, regardless of what the linker script says. To cope with that we
need to stop using .ident to embed licenses. As such, this change does
significant work to revamp how third party licenses are defined in the
codebase, using `.section .notice,"aR",@progbits`.

This new GCC 12.3 toolchain has support for GNU indirect functions. It
lets us support __target_clones__ for the first time. This is used for
optimizing the performance of libc string functions such as strlen and
friends so far on x86, by ensuring AVX systems favor a second codepath
that uses VEX encoding. It shaves some latency off certain operations.
It's a useful feature to have for scientific computing for the reasons
explained by the test/libcxx/openmp_test.cc example which compiles for
fifteen different microarchitectures. Thanks to the upgrades, it's now
also possible to use newer instruction sets, such as AVX512FP16, VNNI.

Cosmo now uses the %gs register on x86 by default for TLS. Doing it is
helpful for any program that links `cosmo_dlopen()`. Such programs had
to recompile their binaries at startup to change the TLS instructions.
That's not great, since it means every page in the executable needs to
be faulted. The work of rewriting TLS-related x86 opcodes, is moved to
fixupobj.com instead. This is great news for MacOS x86 users, since we
previously needed to morph the binary every time for that platform but
now that's no longer necessary. The only platforms where we need fixup
of TLS x86 opcodes at runtime are now Windows, OpenBSD, and NetBSD. On
Windows we morph TLS to point deeper into the TIB, based on a TlsAlloc
assignment, and on OpenBSD/NetBSD we morph %gs back into %fs since the
kernels do not allow us to specify a value for the %gs register.

OpenBSD users are now required to use APE Loader to run Cosmo binaries
and assimilation is no longer possible. OpenBSD kernel needs to change
to allow programs to specify a value for the %gs register, or it needs
to stop marking executable pages loaded by the kernel as mimmutable().

This release fixes __constructor__, .ctor, .init_array, and lastly the
.preinit_array so they behave the exact same way as glibc.

We no longer use hex constants to define math.h symbols like M_PI.
2024-02-20 13:27:59 -08:00
Justine Tunney
2ab9e9f7fd
Make improvements
- Introduce portable sched_getcpu() api
- Support GCC's __target_clones__ feature
- Make fma() go faster on x86 in default mode
- Remove some asan checks from core libraries
- WinMain() now ensures $HOME and $USER are defined
2024-02-12 10:23:00 -08:00
Justine Tunney
5f8e9f14c1
Add OpenMP support 2024-01-28 22:39:02 -08:00
Justine Tunney
ce0143e2a1
Fix madvise() on Windows 2023-12-27 22:41:46 -08:00
Jōshin
3a8e01a77a
more modeline errata (#1019)
Somehow or another, I previously had missed `BUILD.mk` files.

In the process I found a few straggler cases where the modeline was
different from the file, including one very involved manual fix where a
file had been treated like it was ts=2 and ts=8 on separate occasions.

The commit history in the PR shows the gory details; the BUILD.mk was
automated, everything else was mostly manual.
2023-12-16 23:07:10 -05:00
Jōshin
2fc507c98f
Fix more vi modelines (#1006)
* modelines: tw -> sw

shiftwidth, not textwidth.

* space-surround modelines

* fix irregular modelines

* Fix modeline in titlegen.c
2023-12-13 02:28:11 -05:00
Justine Tunney
4f66d7f2dd
Add WIN32 pseudo console APIs
See #999
2023-12-10 01:29:25 -08:00
Jōshin
e16a7d8f3b
flip et / noet in modelines
`et` means `expandtab`.

```sh
rg 'vi: .* :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\) et\(.*\)  :vi/vi: \1 xoet\2:vi/'
rg 'vi: .*  :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\)noet\(.*\):vi/vi: \1et\2  :vi/'
rg 'vi: .*  :vi' -l -0 | \
  xargs -0 sed -i '' 's/vi: \(.*\)xoet\(.*\):vi/vi: \1noet\2:vi/'
```
2023-12-07 22:17:11 -05:00
Jōshin
394d998315
Fix vi modelines (#989)
At least in neovim, `│vi:` is not recognized as a modeline because it
has no preceding whitespace. After fixing this, opening a file yields
an error because `net` is not an option. (`noet`, however, is.)
2023-12-05 14:37:54 -08:00
Justine Tunney
fa20edc44d
Reduce header complexity
- Remove most __ASSEMBLER__ __LINKER__ ifdefs
- Rename libc/intrin/bits.h to libc/serialize.h
- Block pthread cancelation in fchmodat() polyfill
- Remove `clang-format off` statements in third_party
2023-11-28 14:39:42 -08:00
Justine Tunney
96f979dfc5
Rename makefiles BUILD.mk
This way they appear at the top of directory listings.
2023-11-28 11:21:08 -08:00
Justine Tunney
8caf1b48a9
Improve time/sleep accuracy on Windows
It's now almost as good as Linux thanks to a Windows 8+ API.
2023-11-18 01:57:44 -08:00
Justine Tunney
2c9d2943d6
Introduce AddDllDirectory() 2023-11-17 10:35:34 -08:00
Justine Tunney
68c7c9c1e0
Clean up some code
- Use good ELF technique in cosmo_dlopen()
- Make strerror() conform more to other libc impls
- Introduce __clear_cache() and use it in cosmo_dlopen()
- Remove libc/fmt/fmt.h header (trying to kill off LIBC_FMT)
2023-11-16 17:31:07 -08:00
Justine Tunney
1351d3cede
Remove bool from public headers 2023-11-15 20:58:46 -08:00
tkchia
ed17d3008b
[metal] Add a uprintf() routine, for non-emergency boot logging (#905)
* [metal] Add a uprintf() routine, for non-emergency boot logging
* [metal] _Really_ push forward timing of VGA TTY initialization
* [metal] Do something useful with uprintf()
* [metal] Locate some ACPI tables, for later hardware detection

Specifically the code now tries to find the ACPI RSDP,
RSDT/XSDT, FADT, & MADT tables, whether in legacy BIOS
bootup mode or in a UEFI bootup.  These are useful for
figuring out how to (re)enable asynchronous interrupts
in legacy 8259 PIC mode.
2023-10-25 14:32:20 -07:00
Justine Tunney
cdf556e7d2
Implement signal handler tail recursion
GNU Make on Windows now appears to be working reliably. This change also
fixes a bug where, after fork the Windows thread handle wasn't reset and
that caused undefined behavior using SetThreadContext() with our signals
2023-10-14 10:38:15 -07:00
Justine Tunney
4bcb107cb0
Fix ctrl-c in redbean on Windows 2023-10-13 08:10:03 -07:00
Justine Tunney
49b0eaa69f
Improve threading and i/o routines
- On Windows connect() can now be interrupted by a signal; connect() w/
  O_NONBLOCK will now raise EINPROGRESS; and connect() with SO_SNDTIMEO
  will raise ETIMEDOUT after the interval has elapsed.

- We now get the AcceptEx(), ConnectEx(), and TransmitFile() functions
  from the WIN32 API the officially blessed way, using WSAIoctl().

- Do nothing on Windows when fsync() is called on a directory handle.
  This was raising EACCES earlier becaues GENERIC_WRITE is required on
  the handle. It's possible to FlushFileBuffers() a directory handle if
  it's opened with write access but MSDN doesn't document what it does.
  If you have any idea, please let us know!

- Prefer manual reset event objects for read() and write() on Windows.

- Do some code cleanup on our dlmalloc customizations.

- Fix errno type error in Windows blocking routines.

- Make the futex polyfill simpler and faster.
2023-10-12 23:13:04 -07:00
Justine Tunney
3b086af91b
Fix issues for latest GCC toolchain 2023-10-11 14:54:42 -07:00
Justine Tunney
791f79fcb3
Make improvements
- We now serialize the file descriptor table when spawning / executing
  processes on Windows. This means you can now inherit more stuff than
  just standard i/o. It's needed by bash, which duplicates the console
  to file descriptor #255. We also now do a better job serializing the
  environment variables, so you're less likely to encounter E2BIG when
  using your bash shell. We also no longer coerce environ to uppercase

- execve() on Windows now remotely controls its parent process to make
  them spawn a replacement for itself. Then it'll be able to terminate
  immediately once the spawn succeeds, without having to linger around
  for the lifetime as a shell process for proxying the exit code. When
  process worker thread running in the parent sees the child die, it's
  given a handle to the new child, to replace it in the process table.

- execve() and posix_spawn() on Windows will now provide CreateProcess
  an explicit handle list. This allows us to remove handle locks which
  enables better fork/spawn concurrency, with seriously correct thread
  safety. Other codebases like Go use the same technique. On the other
  hand fork() still favors the conventional WIN32 inheritence approach
  which can be a little bit messy, but is *controlled* by guaranteeing
  perfectly clean slates at both the spawning and execution boundaries

- sigset_t is now 64 bits. Having it be 128 bits was a mistake because
  there's no reason to use that and it's only supported by FreeBSD. By
  using the system word size, signal mask manipulation on Windows goes
  very fast. Furthermore @asyncsignalsafe funcs have been rewritten on
  Windows to take advantage of signal masking, now that it's much more
  pleasant to use.

- All the overlapped i/o code on Windows has been rewritten for pretty
  good signal and cancelation safety. We're now able to ensure overlap
  data structures are cleaned up so long as you don't longjmp() out of
  out of a signal handler that interrupted an i/o operation. Latencies
  are also improved thanks to the removal of lots of "busy wait" code.
  Waits should be optimal for everything except poll(), which shall be
  the last and final demon we slay in the win32 i/o horror show.

- getrusage() on Windows is now able to report RUSAGE_CHILDREN as well
  as RUSAGE_SELF, thanks to aggregation in the process manager thread.
2023-10-08 08:59:53 -07:00
Justine Tunney
4631d34d0d
Improve stack overflow recovery
It's now possible to use sigaltstack() to recover from stack overflows
on Windows. Several bugs in sigaltstack() have been fixed, for all our
supported platforms. There's a newer better example showing how to use
this, along with three independent unit tests just to further showcase
the various techniques.
2023-10-04 07:35:17 -07:00
Justine Tunney
ff77f2a6af
Make improvements
- This change fixes a bug that allowed unbuffered printf() output (to
  streams like stderr) to be truncated. This regression was introduced
  some time between now and the last release.

- POSIX specifies all functions as thread safe by default. This change
  works towards cleaning up our use of the @threadsafe / @threadunsafe
  documentation annotations to reflect that. The goal is (1) to use
  @threadunsafe to document functions which POSIX say needn't be thread
  safe, and (2) use @threadsafe to document functions that we chose to
  implement as thread safe even though POSIX didn't mandate it.

- Tidy up the clock_gettime() implementation. We're now trying out a
  cleaner approach to system call support that aims to maintain the
  Linux errno convention as long as possible. This also fixes bugs that
  existed previously, where the vDSO errno wasn't being translated
  properly. The gettimeofday() system call is now a wrapper for
  clock_gettime(), which reduces bloat in apps that use both.

- The recently-introduced improvements to the execute bit on Windows has
  had bugs fixed. access(X_OK) on a directory on Windows now succeeds.
  fstat() will now perform the MZ/#! ReadFile() operation correctly.

- Windows.h is no longer included in libc/isystem/, because it confused
  PCRE's build system into thinking Cosmopolitan is a WIN32 platform.
  Cosmo's Windows.h polyfill was never even really that good, since it
  only defines a subset of the subset of WIN32 APIs that Cosmo defines.

- The setlongerjmp() / longerjmp() APIs are removed. While they're nice
  APIs that are superior to the standardized setjmp / longjmp functions,
  they weren't superior enough to not be dead code in the monorepo. If
  you use these APIs, please file an issue and they'll be restored.

- The .com appending magic has now been removed from APE Loader.
2023-10-03 06:17:16 -07:00
Justine Tunney
0c5dd7b342
Make improvements
- Improved async signal safety of read() particularly for longjmp()
- Started adding cancel cleanup handlers for locks / etc on Windows
- Make /dev/tty work better particularly for uses like `foo | less`
- Eagerly read console input into a linked list, so poll can signal
- Fix some libc definitional bugs, which configure scripts detected
2023-09-21 07:30:39 -07:00
Justine Tunney
d6c2830850
Rewrite Windows console input handling
This change removes our use of ENABLE_VIRTUAL_TERMINAL_INPUT (which
isn't very good) in favor of having read() translate Windows Console
input events to ANSI/XTERM sequences by hand. This makes it possible to
capture important keystrokes (e.g. ctrl-space) that weren't possible
before. Most importantly this change also removes the stdin/sigwinch
worker threads, which never really worked that well. Interactive TTY
sessions will now work reliably when a Cosmo process spawns or forks
another Cosmo process, e.g. unbourne.com launching emacs.com.
2023-09-19 11:53:27 -07:00
Justine Tunney
ec480f5aa0
Make improvements
- Every unit test now passes on Apple Silicon. The final piece of this
  puzzle was porting our POSIX threads cancelation support, since that
  works differently on ARM64 XNU vs. AMD64. Our semaphore support on
  Apple Silicon is also superior now compared to AMD64, thanks to the
  grand central dispatch library which lets *NSYNC locks go faster.

- The Cosmopolitan runtime is now more stable, particularly on Windows.
  To do this, thread local storage is mandatory at all runtime levels,
  and the innermost packages of the C library is no longer being built
  using ASAN. TLS is being bootstrapped with a 128-byte TIB during the
  process startup phase, and then later on the runtime re-allocates it
  either statically or dynamically to support code using _Thread_local.
  fork() and execve() now do a better job cooperating with threads. We
  can now check how much stack memory is left in the process or thread
  when functions like kprintf() / execve() etc. call alloca(), so that
  ENOMEM can be raised, reduce a buffer size, or just print a warning.

- POSIX signal emulation is now implemented the same way kernels do it
  with pthread_kill() and raise(). Any thread can interrupt any other
  thread, regardless of what it's doing. If it's blocked on read/write
  then the killer thread will cancel its i/o operation so that EINTR can
  be returned in the mark thread immediately. If it's doing a tight CPU
  bound operation, then that's also interrupted by the signal delivery.
  Signal delivery works now by suspending a thread and pushing context
  data structures onto its stack, and redirecting its execution to a
  trampoline function, which calls SetThreadContext(GetCurrentThread())
  when it's done.

- We're now doing a better job managing locks and handles. On NetBSD we
  now close semaphore file descriptors in forked children. Semaphores on
  Windows can now be canceled immediately, which means mutexes/condition
  variables will now go faster. Apple Silicon semaphores can be canceled
  too. We're now using Apple's pthread_yield() funciton. Apple _nocancel
  syscalls are now used on XNU when appropriate to ensure pthread_cancel
  requests aren't lost. The MbedTLS library has been updated to support
  POSIX thread cancelations. See tool/build/runitd.c for an example of
  how it can be used for production multi-threaded tls servers. Handles
  on Windows now leak less often across processes. All i/o operations on
  Windows are now overlapped, which means file pointers can no longer be
  inherited across dup() and fork() for the time being.

- We now spawn a thread on Windows to deliver SIGCHLD and wakeup wait4()
  which means, for example, that posix_spawn() now goes 3x faster. POSIX
  spawn is also now more correct. Like Musl, it's now able to report the
  failure code of execve() via a pipe although our approach favors using
  shared memory to do that on systems that have a true vfork() function.

- We now spawn a thread to deliver SIGALRM to threads when setitimer()
  is used. This enables the most precise wakeups the OS makes possible.

- The Cosmopolitan runtime now uses less memory. On NetBSD for example,
  it turned out the kernel would actually commit the PT_GNU_STACK size
  which caused RSS to be 6mb for every process. Now it's down to ~4kb.
  On Apple Silicon, we reduce the mandatory upstream thread size to the
  smallest possible size to reduce the memory overhead of Cosmo threads.
  The examples directory has a program called greenbean which can spawn
  a web server on Linux with 10,000 worker threads and have the memory
  usage of the process be ~77mb. The 1024 byte overhead of POSIX-style
  thread-local storage is now optional; it won't be allocated until the
  pthread_setspecific/getspecific functions are called. On Windows, the
  threads that get spawned which are internal to the libc implementation
  use reserve rather than commit memory, which shaves a few hundred kb.

- sigaltstack() is now supported on Windows, however it's currently not
  able to be used to handle stack overflows, since crash signals are
  still generated by WIN32. However the crash handler will still switch
  to the alt stack, which is helpful in environments with tiny threads.

- Test binaries are now smaller. Many of the mandatory dependencies of
  the test runner have been removed. This ensures many programs can do a
  better job only linking the the thing they're testing. This caused the
  test binaries for LIBC_FMT for example, to decrease from 200kb to 50kb

- long double is no longer used in the implementation details of libc,
  except in the APIs that define it. The old code that used long double
  for time (instead of struct timespec) has now been thoroughly removed.

- ShowCrashReports() is now much tinier in MODE=tiny. Instead of doing
  backtraces itself, it'll just print a command you can run on the shell
  using our new `cosmoaddr2line` program to view the backtrace.

- Crash report signal handling now works in a much better way. Instead
  of terminating the process, it now relies on SA_RESETHAND so that the
  default SIG_IGN behavior can terminate the process if necessary.

- Our pledge() functionality has now been fully ported to AARCH64 Linux.
2023-09-18 21:04:47 -07:00
Justine Tunney
99dc1281f5
Overhaul Windows signal handling
The new asynchronous signal delivery technique is now also being used
for tkill(), raise(), etc. Many subtle issues have been addresesd. We
now signal handling on Windows that's remarkably similar to the POSIX
behaviors. However that's just across threads. We're lacking a way to
have the signal semantics work well, across multiple WIN32 processes.
2023-09-08 01:49:41 -07:00