cosmopolitan

mirror of https://github.com/jart/cosmopolitan.git synced 2025-01-31 11:37:35 +00:00

Author	SHA1	Message	Date
Justine Tunney	957c61cbbf	Release Cosmopolitan v3.3 This change upgrades to GCC 12.3 and GNU binutils 2.42. The GNU linker appears to have changed things so that only a single de-duplicated str table is present in the binary, and it gets placed wherever the linker wants, regardless of what the linker script says. To cope with that we need to stop using .ident to embed licenses. As such, this change does significant work to revamp how third party licenses are defined in the codebase, using `.section .notice,"aR",@progbits`. This new GCC 12.3 toolchain has support for GNU indirect functions. It lets us support __target_clones__ for the first time. This is used for optimizing the performance of libc string functions such as strlen and friends so far on x86, by ensuring AVX systems favor a second codepath that uses VEX encoding. It shaves some latency off certain operations. It's a useful feature to have for scientific computing for the reasons explained by the test/libcxx/openmp_test.cc example which compiles for fifteen different microarchitectures. Thanks to the upgrades, it's now also possible to use newer instruction sets, such as AVX512FP16, VNNI. Cosmo now uses the %gs register on x86 by default for TLS. Doing it is helpful for any program that links `cosmo_dlopen()`. Such programs had to recompile their binaries at startup to change the TLS instructions. That's not great, since it means every page in the executable needs to be faulted. The work of rewriting TLS-related x86 opcodes, is moved to fixupobj.com instead. This is great news for MacOS x86 users, since we previously needed to morph the binary every time for that platform but now that's no longer necessary. The only platforms where we need fixup of TLS x86 opcodes at runtime are now Windows, OpenBSD, and NetBSD. On Windows we morph TLS to point deeper into the TIB, based on a TlsAlloc assignment, and on OpenBSD/NetBSD we morph %gs back into %fs since the kernels do not allow us to specify a value for the %gs register. OpenBSD users are now required to use APE Loader to run Cosmo binaries and assimilation is no longer possible. OpenBSD kernel needs to change to allow programs to specify a value for the %gs register, or it needs to stop marking executable pages loaded by the kernel as mimmutable(). This release fixes __constructor__, .ctor, .init_array, and lastly the .preinit_array so they behave the exact same way as glibc. We no longer use hex constants to define math.h symbols like M_PI.	2024-02-20 13:27:59 -08:00
Jōshin	e16a7d8f3b	flip et / noet in modelines `et` means `expandtab`. ```sh rg 'vi: .* :vi' -l -0 \| \ xargs -0 sed -i '' 's/vi: \(.\) et\(.\) :vi/vi: \1 xoet\2:vi/' rg 'vi: .* :vi' -l -0 \| \ xargs -0 sed -i '' 's/vi: \(.\)noet\(.\):vi/vi: \1et\2 :vi/' rg 'vi: .* :vi' -l -0 \| \ xargs -0 sed -i '' 's/vi: \(.\)xoet\(.\):vi/vi: \1noet\2:vi/' ```	2023-12-07 22:17:11 -05:00
Jōshin	394d998315	Fix vi modelines (#989 ) At least in neovim, `│vi:` is not recognized as a modeline because it has no preceding whitespace. After fixing this, opening a file yields an error because `net` is not an option. (`noet`, however, is.)	2023-12-05 14:37:54 -08:00
Justine Tunney	fa20edc44d	Reduce header complexity - Remove most __ASSEMBLER__ __LINKER__ ifdefs - Rename libc/intrin/bits.h to libc/serialize.h - Block pthread cancelation in fchmodat() polyfill - Remove `clang-format off` statements in third_party	2023-11-28 14:39:42 -08:00
Justine Tunney	68c7c9c1e0	Clean up some code - Use good ELF technique in cosmo_dlopen() - Make strerror() conform more to other libc impls - Introduce __clear_cache() and use it in cosmo_dlopen() - Remove libc/fmt/fmt.h header (trying to kill off LIBC_FMT)	2023-11-16 17:31:07 -08:00
Justine Tunney	ec480f5aa0	Make improvements - Every unit test now passes on Apple Silicon. The final piece of this puzzle was porting our POSIX threads cancelation support, since that works differently on ARM64 XNU vs. AMD64. Our semaphore support on Apple Silicon is also superior now compared to AMD64, thanks to the grand central dispatch library which lets *NSYNC locks go faster. - The Cosmopolitan runtime is now more stable, particularly on Windows. To do this, thread local storage is mandatory at all runtime levels, and the innermost packages of the C library is no longer being built using ASAN. TLS is being bootstrapped with a 128-byte TIB during the process startup phase, and then later on the runtime re-allocates it either statically or dynamically to support code using _Thread_local. fork() and execve() now do a better job cooperating with threads. We can now check how much stack memory is left in the process or thread when functions like kprintf() / execve() etc. call alloca(), so that ENOMEM can be raised, reduce a buffer size, or just print a warning. - POSIX signal emulation is now implemented the same way kernels do it with pthread_kill() and raise(). Any thread can interrupt any other thread, regardless of what it's doing. If it's blocked on read/write then the killer thread will cancel its i/o operation so that EINTR can be returned in the mark thread immediately. If it's doing a tight CPU bound operation, then that's also interrupted by the signal delivery. Signal delivery works now by suspending a thread and pushing context data structures onto its stack, and redirecting its execution to a trampoline function, which calls SetThreadContext(GetCurrentThread()) when it's done. - We're now doing a better job managing locks and handles. On NetBSD we now close semaphore file descriptors in forked children. Semaphores on Windows can now be canceled immediately, which means mutexes/condition variables will now go faster. Apple Silicon semaphores can be canceled too. We're now using Apple's pthread_yield() funciton. Apple _nocancel syscalls are now used on XNU when appropriate to ensure pthread_cancel requests aren't lost. The MbedTLS library has been updated to support POSIX thread cancelations. See tool/build/runitd.c for an example of how it can be used for production multi-threaded tls servers. Handles on Windows now leak less often across processes. All i/o operations on Windows are now overlapped, which means file pointers can no longer be inherited across dup() and fork() for the time being. - We now spawn a thread on Windows to deliver SIGCHLD and wakeup wait4() which means, for example, that posix_spawn() now goes 3x faster. POSIX spawn is also now more correct. Like Musl, it's now able to report the failure code of execve() via a pipe although our approach favors using shared memory to do that on systems that have a true vfork() function. - We now spawn a thread to deliver SIGALRM to threads when setitimer() is used. This enables the most precise wakeups the OS makes possible. - The Cosmopolitan runtime now uses less memory. On NetBSD for example, it turned out the kernel would actually commit the PT_GNU_STACK size which caused RSS to be 6mb for every process. Now it's down to ~4kb. On Apple Silicon, we reduce the mandatory upstream thread size to the smallest possible size to reduce the memory overhead of Cosmo threads. The examples directory has a program called greenbean which can spawn a web server on Linux with 10,000 worker threads and have the memory usage of the process be ~77mb. The 1024 byte overhead of POSIX-style thread-local storage is now optional; it won't be allocated until the pthread_setspecific/getspecific functions are called. On Windows, the threads that get spawned which are internal to the libc implementation use reserve rather than commit memory, which shaves a few hundred kb. - sigaltstack() is now supported on Windows, however it's currently not able to be used to handle stack overflows, since crash signals are still generated by WIN32. However the crash handler will still switch to the alt stack, which is helpful in environments with tiny threads. - Test binaries are now smaller. Many of the mandatory dependencies of the test runner have been removed. This ensures many programs can do a better job only linking the the thing they're testing. This caused the test binaries for LIBC_FMT for example, to decrease from 200kb to 50kb - long double is no longer used in the implementation details of libc, except in the APIs that define it. The old code that used long double for time (instead of struct timespec) has now been thoroughly removed. - ShowCrashReports() is now much tinier in MODE=tiny. Instead of doing backtraces itself, it'll just print a command you can run on the shell using our new `cosmoaddr2line` program to view the backtrace. - Crash report signal handling now works in a much better way. Instead of terminating the process, it now relies on SA_RESETHAND so that the default SIG_IGN behavior can terminate the process if necessary. - Our pledge() functionality has now been fully ported to AARCH64 Linux.	2023-09-18 21:04:47 -07:00
Justine Tunney	10fd8bdb70	Unbloat the build This change resurrects `ae5d06dc53`	2022-08-11 00:15:29 -07:00
Justine Tunney	c1d99676c4	Revert "Unbloat build config" This reverts commit `ae5d06dc53`.	2022-08-10 12:44:56 -07:00
Justine Tunney	ae5d06dc53	Unbloat build config - 10.5% reduction of o//depend dependency graph - 8.8% reduction in latency of make command - Fix issue with temporary file cleanup There's a new -w option in compile.com that turns off the recent Landlock output path workaround for "good commands" which do not unlink() the output file like GNU tooling does. Our new GNU Make unveil sandboxing appears to have zero overhead in the grand scheme of things. Full builds are pretty fast since the only thing that's actually slowed us down is probably libcxx make -j16 MODE=rel RL: took 85,732,063µs wall time RL: ballooned to 323,612kb in size RL: needed 828,560,521µs cpu (11% kernel) RL: caused 39,080,670 page faults (99% memcpy) RL: 350,073 context switches (72% consensual) RL: performed 0 reads and 11,494,960 write i/o operations pledge() and unveil() no longer consider ENOSYS to be an error. These functions have also been added to Python's cosmo module. This change also removes some WIN32 APIs and System Five magnums which we're not using and it's doubtful anyone else would be too	2022-08-10 04:43:09 -07:00
Justine Tunney	398f0c16fb	Add SNI support to redbean and improve SSL perf This change makes SSL virtual hosting possible. You can now load multiple certificates for multiple domains and redbean will just figure out which one to use, even if you only have 1 ip address. You can also use a jumbo certificate that lists all your domains in the the subject alternative names. This change also makes performance improvements to MbedTLS. Here are some benchmarks vs. `cc1920749e` BEFORE AFTER (microsecs) suite_ssl.com 2512881 191738 13.11x faster suite_pkparse.com 36291 3295 11.01x faster suite_x509parse.com 854669 120293 7.10x faster suite_pkwrite.com 6549 1265 5.18x faster suite_ecdsa.com 53347 18778 2.84x faster suite_pk.com 49051 18717 2.62x faster suite_ecdh.com 19535 9502 2.06x faster suite_shax.com 15848 7965 1.99x faster suite_rsa.com 353257 184828 1.91x faster suite_x509write.com 162646 85733 1.90x faster suite_ecp.com 20503 11050 1.86x faster suite_hmac_drbg.no_reseed.com 19528 11417 1.71x faster suite_hmac_drbg.nopr.com 12460 8010 1.56x faster suite_mpi.com 687124 442661 1.55x faster suite_hmac_drbg.pr.com 11890 7752 1.53x faster There aren't any special tricks to the performance imporvements. It's mostly due to code cleanup, assembly and intel instructions like mulx, adox, and adcx.	2021-07-23 13:56:13 -07:00
Justine Tunney	e51034bab3	Make GCM AES faster 13.22% mbedtls_aesni_gcm_mult 13.03% mbedtls_gcm_update 9.85% mbedtls_aesni_crypt_ecb Overhead improvement (perf record) 10.97% mbedtls_aesni_gcm_mult 10.59% mbedtls_aesni_crypt_ecb 2.26% mbedtls_gcm_update	2021-07-06 08:27:16 -07:00
Justine Tunney	0ecd71f697	Make chacha20 go faster	2021-07-05 14:03:50 -07:00
Justine Tunney	cc1920749e	Add SSL to redbean Your redbean can now interoperate with clients that require TLS crypto. This is accomplished using a protocol polyglot that lets us distinguish between HTTP and HTTPS regardless of the port number. Certificates will be generated automatically, if none are supplied by the user. Footprint increases by only a few hundred kb so redbean in MODY=tiny is now 1.0mb - Add lseek() polyfills for ZIP executable - Automatically polyfill /tmp/FOO paths on NT - Fix readdir() / ftw() / nftw() bugs on Windows - Introduce -B flag for slower SSL that's stronger - Remove mbedtls features Cosmopolitan doesn't need - Have base64 decoder support the uri-safe alternative - Remove Truncated HMAC because it's forbidden by the IETF - Add all the mbedtls test suites and make them go 3x faster - Support opendir() / readdir() / closedir() on ZIP executable - Use Everest for ECDHE-ECDSA because it's so good it's so good - Add tinier implementation of sha1 since it's not worth the rom - Add chi-square monte-carlo mean correlation tests for getrandom() - Source entropy on Windows from the proper interface everyone uses We're continuing to outperform NGINX and other servers on raw message throughput. Using SSL means that instead of 1,000,000 qps you can get around 300,000 qps. However redbean isn't as fast as NGINX yet at SSL handshakes, since redbean can do 2,627 per second and NGINX does 4.3k Right now, the SSL UX story works best if you give your redbean a key signing key since that can be easily generated by openssl using a one liner then redbean will do all the things that are impossibly hard to do like signing ecdsa and rsa certificates that'll work in chrome. We should integrate the let's encrypt acme protocol in the future. Live Demo: https://redbean.justine.lol/ Root Cert: https://redbean.justine.lol/redbean1.crt	2021-06-24 13:20:50 -07:00
Justine Tunney	1beeb7a829	Flatten Mbed TLS directory structure	2021-06-24 11:13:12 -07:00

14 commits