Commit graph

31 commits

Author SHA1 Message Date
Justine Tunney
e5e3cdf447
Get LIBC_RUNTIME and LIBC_CALLS building on aarch64 2023-05-10 04:20:47 -07:00
Justine Tunney
e522aa3a07
Make more threading improvements
- ASAN memory morgue is now lockless
- Make C11 atomics header more portable
- Rewrote pthread keys support to be lockless
- Simplify Python's unicode table unpacking code
- Make crash report write(2) closer to being atomic
- Make it possible to strace/ftrace a single thread
- ASAN now checks nul-terminated strings fast and properly
- Windows fork() now restores TLS memory of calling thread
2022-11-01 23:28:26 -07:00
Justine Tunney
cdb2284f0d
Remove stdio lock macros from amalgamation 2022-09-10 12:03:36 -07:00
Justine Tunney
155b378a39
Tidy up the threading implementation
The organization of the source files is now much more rational.
Old experiments that didn't work out are now deleted. Naming of
things like files is now more intuitive.
2022-09-10 02:56:25 -07:00
Justine Tunney
2d17ab016c
Perform more low-level code cleanup 2022-09-09 04:07:08 -07:00
Justine Tunney
69f4152f38 Always initialize thread local storage
We had previously not enabled TLS in MODE=tiny in order to keep the
smallest example programs (e.g. life.com) just 16kb in size. But it
was error prone doing that, so now we just always enable it because
this change uses hacks to ensure it won't increase life.com's size.

This change also fixes a bug on NetBSD, where signal handlers would
break thread local storage if SA_SIGINFO was being used. This looks
like it might be a bug in NetBSD, but it's got a simple workaround.
2022-07-19 00:21:46 -07:00
Justine Tunney
b1d9d11be1 Simplify TLS and reduce startup latency
This change simplifies the thread-local storage support code. On Windows
and Mac OS X the startup latency of __enable_tls() has been reduced from
30ms to 1ms. On Windows, TLS memory accesses will now go much faster due
to better self-modifying code that prevents a function call and acquires
our thread information block pointer in a single instruction.
2022-07-18 04:10:54 -07:00
Justine Tunney
694a0da990 Make function call tracing lockless 2022-07-11 08:04:58 -07:00
Justine Tunney
3f015b1e51 Make some minor fixups to bug reporting, etc. 2022-07-11 05:58:24 -07:00
Justine Tunney
fbc053e018 Make fixes and improvements
- Introduce __assert_disable global
- Improve strsignal() thread safety
- Make system call tracing thread safe
- Fix SO_RCVTIMEO / SO_SNDTIMEO on Windows
- Refactor DescribeFoo() functions into one place
- Fix fork() on Windows when TLS and MAP_STACK exist
- Round upwards in setsockopt(SO_RCVTIMEO) on Windows
- Disable futexes on OpenBSD which seem extremely broken
- Implement a better kludge for monotonic time on Windows
2022-06-25 21:09:09 -07:00
Justine Tunney
8cdec62f5b Apply even more fixups
- Finish cleaning up the stdio unlocked APIs
- Make __cxa_finalize() properly thread safe
- Don't log locks if threads aren't being used
- Add some more mutex guards to places using _mmi
- Specific lock names now appear in the --ftrace logs
- Fix mkdeps.com generating invalid Makefiles sometimes
- Simplify and fix bugs in the test runner infrastructure
- Fix issue where sometimes some functions wouldn't be logged
2022-06-12 11:57:00 -07:00
Justine Tunney
1f229e4efc Use re-entrant locks on stdio 2022-05-22 08:28:33 -07:00
Justine Tunney
db0d8dd806 Support Linux binfmt_misc and APE loading on Apple
The "no modify self" variant of Actually Portable Executable is now
supported on all platforms. If you use `$(APE_NO_MODIFY_SELF)` then
ld.bfd will embed a 4096 byte ELF binary and a 4096 byte Macho file
which are installed on the fly to ${TMPDIR:-/tmp}, which enables us
launch the executable, without needing to copy the whole executable

To prevent it from copying a tiny executable to your temp directory
you need to install the `ape` command (renamed from ape-loader), to
a system path. For example:

    # FreeBSD / NetBSD / OpenBSD
    make -j8 o//ape/ape
    cp o//ape/ape /usr/bin/ape

    # Mac OS
    # make -j8 o//ape/ape.macho
    curl https://justine.lol/ape.macho >/usr/bin/ape
    chmod +x /usr/bin/ape

On Linux you can get even more performance with the new binfmt_misc
support which makes launching non-modifying APE binaries as fast as
launching ELF executables. Running the following command:

    # Linux
    ape/apeinstall.sh

Will copy APE loader to /usr/bin/ape and register with binfmt_misc
Lastly, this change also fixes a really interesting race condition
with OpenBSD thread joining.
2022-05-21 09:28:25 -07:00
Justine Tunney
c8a2f04058 Reduce ftrace overhead to 280ns 2022-05-20 04:46:42 -07:00
Justine Tunney
4245da19e2 Remove some old dead code from ftrace 2022-05-20 04:01:50 -07:00
Justine Tunney
c21412bc2c Remove carriage return from function traces 2022-05-19 17:46:14 -07:00
Justine Tunney
ec2cb88058 Make fixes and improvements
- Document more compiler flags
- Expose new __print_maps() api
- Better overflow checking in mmap()
- Improve the shell example somewhat
- Fix minor runtime bugs regarding stacks
- Make kill() on fork()+execve()'d children work
- Support CLONE_CHILD_CLEARTID for proper joining
- Fix recent possible deadlock regression with --ftrace
2022-05-19 16:57:49 -07:00
Justine Tunney
9208c83f7a Make some systemic improvements
- add vdso dump utility
- tests now log stack usage
- rename g_ftrace to __ftrace
- make internal spinlocks go faster
- add conformant c11 atomics library
- function tracing now logs stack usage
- make function call tracing thread safe
- add -X unsecure (no ssl) mode to redbean
- munmap() has more consistent behavior now
- pacify fsync() calls on python unit tests
- make --strace flag work better in redbean
- start minimizing and documenting compiler flags
2022-05-18 16:52:36 -07:00
Justine Tunney
55de4ca6b5 Support thread local storage 2022-05-16 13:20:08 -07:00
Justine Tunney
2f56ebfe78 Do code cleanup use duff device linenoise i/o 2022-04-22 18:56:52 -07:00
Justine Tunney
dc0ea6640e Fix bugs with recent change
This change makes further effort towards improving our poll()
implementation on the New Technology. The stdin worker didn't work out
so well for Python so it's not being used for now. System call tracing
with the --strace flag should now be less noisy now on Windows unless
you modify the strace.internal.h defines to turn on some optional ones
that are most useful for debugging the system call wrappers.
2022-04-16 10:40:23 -07:00
Justine Tunney
046c7ebd4a Improve locks and signals
- Introduce fast spinlock API
- Double rand64() perf w/ spinlock
- Improve raise() on New Technology
- Support gettid() across platforms
- Implement SA_NODEFER on New Technology
- Move the lock intrinsics into LIBC_INTRIN
- Make SIGTRAP recoverable on New Technology
- Block SIGCHLD in wait4() on New Technology
- Add threading prototypes for XNU and FreeBSD
- Rewrite abort() fixing its minor bugs on XNU/NT
- Shave down a lot of the content in libc/bits/bits.h
- Let signal handlers modify CPU registers on New Technology
2022-04-12 05:20:17 -07:00
Justine Tunney
23b72eb617 Add support for symbol table in .com files
This change fixes minor bugs and adds a feature, which lets us store the
ELF symbol table, inside the ZIP directory. We use the path /zip/.symtab
which can be safely removed using a zip editing tool, to make the binary
smaller after compilation. This supplements the existing method of using
a separate .com.dbg file, which is still supported. The intent is people
don't always know that it's a good idea to download the debug file. It's
not great having someone's first experience be a crash report, that only
has numbers rather than symbols. This will help fix that!
2022-03-23 06:34:46 -07:00
Justine Tunney
b45d50b690 Make improvements
- Fix build flakes
- Polyfill SIGWINCH on Windows
- Fix an execve issue on Windows
- Make strerror show more information
- Improve cmd.exe setup/teardown on Windows
- Support bracketed paste mode in Blinkenlights
- Show keyboard shortcuts in Blinkenlights status bar
- Fixed copy_file_range() and copyfile() w/ zip filesystem
- Size optimize GetDosArgv() to keep life.com 12kb in size
- Improve Blinkenlights ability to load weird ELF executables
- Fix program_executable_name and add GetInterpreterExecutableName
- Make Python in tiny mode fail better if docstrings are requested
- Update Python test exclusions in tiny* modes such as tinylinux
- Add bulletproof unbreakable kprintf() troubleshooting function
- Remove "oldskool" keyword from ape.S for virus scanners
- Fix issue that caused backtraces to not print sometimes
- Improve Blinkenlights serial uart character i/o
- Make clock_gettime() not clobber errno on xnu
- Improve sha256 cpuid check for old computers
- Integrate some bestline linenoise fixes
- Show runit process names better in htop
- Remove SIGPIPE from ShowCrashReports()
- Make realpath() not clobber errno
- Avoid attaching GDB on non-Linux
- Improve img.com example
2022-03-16 13:40:10 -07:00
Justine Tunney
226aaf3547 Improve memory safety
This commit makes numerous refinements to cosmopolitan memory handling.

The default stack size has been reduced from 2mb to 128kb. A new macro
is now provided so you can easily reconfigure the stack size to be any
value you want. Work around the breaking change by adding to your main:

    STATIC_STACK_SIZE(0x00200000);  // 2mb stack

If you're not sure how much stack you need, then you can use:

    STATIC_YOINK("stack_usage_logging");

After which you can `sort -nr o/$MODE/stack.log`. Based on the unit test
suite, nothing in the Cosmopolitan repository (except for Python) needs
a stack size greater than 30kb. There are also new macros for detecting
the size and address of the stack at runtime, e.g. GetStackAddr(). We
also now support sigaltstack() so if you want to see nice looking crash
reports whenever a stack overflow happens, you can put this in main():

    ShowCrashReports();

Under `make MODE=dbg` and `make MODE=asan` the unit testing framework
will now automatically print backtraces of memory allocations when
things like memory leaks happen. Bugs are now fixed in ASAN global
variable overrun detection. The memtrack and asan runtimes also handle
edge cases now. The new tools helped to identify a few memory leaks,
which are fixed by this change.

This change should fix an issue reported in #288 with ARG_MAX limits.
Fixing this doubled the performance of MKDEPS.COM and AR.COM yet again.
2021-10-13 17:27:13 -07:00
Gautham
d852640a1e
Add Python ftrace contextmanager (#285) 2021-10-13 11:00:25 -07:00
Justine Tunney
9cb54218ab Add error checks to Python objectifier (#281)
PYOBJ.COM was failing when statically analyzing _pyio.py in MODE=dbg
because co_consts contained a big number, which dirtied the interpreter
exception state. We now do comprehensive error checking w/ Python API.

The -DSTACK_FRAME_UNLIMITED CPPFLAG has been removed from DES since its
self test function has been fixed to use heap memory rather than making
aggressive use of the stack.

This change also fixes a regression with function tracing (the --ftrace
flag a.k.a. ftrace_install() a.k.a. cosmo.ftrace) in ASAN build modes.
Lastly, the _tracemalloc module should now always be available for use
in MODE=dbg.
2021-10-02 06:17:17 -07:00
Justine Tunney
39bf41f4eb Make numerous improvements
- Python static hello world now 1.8mb
- Python static fully loaded now 10mb
- Python HTTPS client now uses MbedTLS
- Python REPL now completes import stmts
- Increase stack size for Python for now
- Begin synthesizing posixpath and ntpath
- Restore Python \N{UNICODE NAME} support
- Restore Python NFKD symbol normalization
- Add optimized code path for Intel SHA-NI
- Get more Python unit tests passing faster
- Get Python help() pagination working on NT
- Python hashlib now supports MbedTLS PBKDF2
- Make memcpy/memmove/memcmp/bcmp/etc. faster
- Add Mersenne Twister and Vigna to LIBC_RAND
- Provide privileged __printf() for error code
- Fix zipos opendir() so that it reports ENOTDIR
- Add basic chmod() implementation for Windows NT
- Add Cosmo's best functions to Python cosmo module
- Pin function trace indent depth to that of caller
- Show memory diagram on invalid access in MODE=dbg
- Differentiate stack overflow on crash in MODE=dbg
- Add stb_truetype and tools for analyzing font files
- Upgrade to UNICODE 13 and reduce its binary footprint
- COMPILE.COM now logs resource usage of build commands
- Start implementing basic poll() support on bare metal
- Set getauxval(AT_EXECFN) to GetModuleFileName() on NT
- Add descriptions to strerror() in non-TINY build modes
- Add COUNTBRANCH() macro to help with micro-optimizations
- Make error / backtrace / asan / memory code more unbreakable
- Add fast perfect C implementation of μ-Law and a-Law audio codecs
- Make strtol() functions consistent with other libc implementations
- Improve Linenoise implementation (see also github.com/jart/bestline)
- COMPILE.COM now suppresses stdout/stderr of successful build commands
2021-09-28 01:52:34 -07:00
Justine Tunney
51904e2687 Improve Python and Linenoise
This change reinvents all the GNU Readline features I discovered that I
couldn't live without, e.g. UTF-8, CTRL-R search and CTRL-Y yanking. It
now feels just as good in terms of user interface from the subconscious
workflow perspective. It's real nice to finally have an embeddable line
reader that's actually good with a 30 kb footprint and a bsd-2 license.

This change adds a directory to the examples folder, explaining how the
new Python compiler may be used.  Some of the bugs with Python binaries
have been addressed but overall it's still a work in progress.
2021-09-11 22:30:37 -07:00
Justine Tunney
398f0c16fb Add SNI support to redbean and improve SSL perf
This change makes SSL virtual hosting possible. You can now load
multiple certificates for multiple domains and redbean will just
figure out which one to use, even if you only have 1 ip address.
You can also use a jumbo certificate that lists all your domains
in the the subject alternative names.

This change also makes performance improvements to MbedTLS. Here
are some benchmarks vs. cc1920749e

                                   BEFORE    AFTER   (microsecs)
suite_ssl.com                     2512881   191738 13.11x faster
suite_pkparse.com                   36291     3295 11.01x faster
suite_x509parse.com                854669   120293  7.10x faster
suite_pkwrite.com                    6549     1265  5.18x faster
suite_ecdsa.com                     53347    18778  2.84x faster
suite_pk.com                        49051    18717  2.62x faster
suite_ecdh.com                      19535     9502  2.06x faster
suite_shax.com                      15848     7965  1.99x faster
suite_rsa.com                      353257   184828  1.91x faster
suite_x509write.com                162646    85733  1.90x faster
suite_ecp.com                       20503    11050  1.86x faster
suite_hmac_drbg.no_reseed.com       19528    11417  1.71x faster
suite_hmac_drbg.nopr.com            12460     8010  1.56x faster
suite_mpi.com                      687124   442661  1.55x faster
suite_hmac_drbg.pr.com              11890     7752  1.53x faster

There aren't any special tricks to the performance imporvements.
It's mostly due to code cleanup, assembly and intel instructions
like mulx, adox, and adcx.
2021-07-23 13:56:13 -07:00
Justine Tunney
f3e28aa192 Make SSL handshakes much faster
This change boosts SSL handshake performance from 2,627 to ~10,000 per
second which is the same level of performance as NGINX at establishing
secure connections. That's impressive if we consider that redbean is a
forking frontend application server. This was accomplished by:

  1. Enabling either SSL session caching or SSL tickets. We choose to
     use tickets since they reduce network round trips too and that's
     a more important metric than wrk'ing localhost.

  2. Fixing mbedtls_mpi_sub_abs() which is the most frequently called
     function. It's called about 12,000 times during an SSL handshake
     since it's the basis of most arithmetic operations like addition
     and for some strange reason it was designed to make two needless
     copies in addition to calling malloc and free. That's now fixed.

  3. Improving TLS output buffering during the SSL handshake only, so
     that only a single is write and read system call is needed until
     blocking on the ping pong.

redbean will now do a better job wiping sensitive memory from a child
process as soon as it's not needed. The nice thing about fork is it's
much faster than reverse proxying so the goal is to use the different
address spaces along with setuid() to minimize the risk that a server
key will be compromised in the event that application code is hacked.
2021-07-11 23:17:47 -07:00