Commit graph

912 commits

Author SHA1 Message Date
Justine Tunney
1312f60245
Strongly link tr and sed into system() and popen() 2024-11-15 21:23:49 -08:00
Justine Tunney
cafdb456ed
Strongly link glob() into system() and popen() 2024-11-15 20:37:34 -08:00
Justine Tunney
5edc0819c0
Define glob64 2024-10-12 15:26:10 -07:00
Justine Tunney
a8bc7ac119
Import some Chromium Zlib changes 2024-10-11 07:04:02 -07:00
Justine Tunney
dcf9596620
Make more fixups and quality assurance 2024-10-07 15:29:53 -07:00
Justine Tunney
12cc2de22e
Make contended mutexes 30% faster on aarch64
On Raspberry Pi 5, benchmark_mu_contended takes 359µs in *NSYNC upstream
and in Cosmopolitan it takes 272µs.
2024-09-26 09:24:25 -07:00
Justine Tunney
dd8c4dbd7d
Write more tests for signal handling
There's now a much stronger level of assurance that signaling on Windows
will be atomic, low-latency, low tail latency, and shall never deadlock.
2024-09-21 05:24:56 -07:00
Justine Tunney
87a6669900
Make more Windows socket fixes and improvements
This change makes send() / sendto() always block on Windows. It's needed
because poll(POLLOUT) doesn't guarantee a socket is immediately writable
on Windows, and it caused rsync to fail because it made that assumption.
The only exception is when a SO_SNDTIMEO is specified which will EAGAIN.

Tests are added confirming MSG_WAITALL and MSG_NOSIGNAL work as expected
on all our supported OSes. Most of the platform-specific MSG_FOO magnums
have been deleted, with the exception of MSG_FASTOPEN. Your --strace log
will now show MSG_FOO flags as symbols rather than numbers.

I've also removed cv_wait_example_test because it's 0.3% flaky with Qemu
under system load since it depends on a process being readily scheduled.
2024-09-18 20:29:42 -07:00
Justine Tunney
bb7942e557
Improve socket option story 2024-09-17 01:17:07 -07:00
Justine Tunney
949c398327
Clean up more code 2024-09-15 02:45:16 -07:00
Gabriel Ravier
e3d28de8a6
Fix UB in gdtoa hexadecimal float scanf and strtod (#1288)
When reading hexadecimal floats, cosmopolitan would previously sometimes
print a number of warnings relating to undefined behavior on left shift:

third_party/gdtoa/gethex.c:172: ubsan warning: signed left shift changed
sign bit or overflowed 12 'int' 28 'int' is undefined behavior

This is because gdtoa assumes left shifts are safe when overflow happens
even on signed integers - this is false: the C standard considers it UB.
This is easy to fix, by simply casting the shifted value to unsigned, as
doing so does not change the value or the semantics of the left shifting
(except for avoiding the undefined behavior, as the C standard specifies
that unsigned overflow yields wraparound, avoiding undefined behaviour).

This commit does this, and adds a testcase that previously triggered UB.
(this also adds test macros to test for exact float equality, instead of
the existing {EXPECT,ASSERT}_FLOAT_EQ macros which only tests inputs for
being "almost equal" (with a significant epsilon) whereas exact equality
makes more sense for certain things such as reading floats from strings,
and modifies other testcases for sscanf/fscanf of floats to utilize it).
2024-09-14 17:11:04 -07:00
Justine Tunney
ed1f992cb7
Fix default open mode in redbean unix.open() 2024-09-14 00:10:21 -07:00
Justine Tunney
b5fcb59a85
Implement more bf16/fp16 compiler runtimes
Fixes #1259
2024-09-13 05:06:34 -07:00
Justine Tunney
a5c0189bf6
Make vim startup faster
It appears that GetFileAttributes(u"\\etc\\passwd") can take two seconds
on Windows 10 at unpredictable times for reasons which are mysterious to
me. Let's try avoiding that path entirely and pray to Microsoft it works
2024-09-11 00:52:34 -07:00
Justine Tunney
fbdf9d028c
Rewrite Windows poll()
We can now await signals, files, pipes, and console simultaneously. This
change also gives a deeper review and testing to changes made yesterday.
2024-09-10 20:04:02 -07:00
Justine Tunney
cceddd21b2
Reduce latency of poll() on Windows
When polling sockets poll() can now let you know about an event in about
10µs rather than 10ms. If you're not polling sockets then poll() reports
console events now in microseconds instead of milliseconds.
2024-09-10 04:12:21 -07:00
Justine Tunney
2f48a02b44
Make recursive mutexes faster
Recursive mutexes now go as fast as normal mutexes. The tradeoff is they
are no longer safe to use in signal handlers. However you can still have
signal safe mutexes if you set your mutex to both recursive and pshared.
You can also make functions that use recursive mutexes signal safe using
sigprocmask to ensure recursion doesn't happen due to any signal handler

The impact of this change is that, on Windows, many functions which edit
the file descriptor table rely on recursive mutexes, e.g. open(). If you
develop your app so it uses pread() and pwrite() then your app should go
very fast when performing a heavily multithreaded and contended workload

For example, when scaling to 40+ cores, *NSYNC mutexes can go as much as
1000x faster (in CPU time) than the naive recursive lock implementation.
Now recursive will use *NSYNC under the hood when it's possible to do so
2024-09-10 00:08:59 -07:00
Justine Tunney
95fee8614d
Test recursive mutex code more 2024-09-09 00:19:23 -07:00
Gabriel Ravier
4754f200ee
Fix printf-family long double prec/rounding issues (#1283)
Currently, in cosmopolitan, there is no handling of the current rounding
mode for long double conversions, such that round-to-nearest gets always
used, regardless of the current rounding mode. %Le also improperly calls
gdtoa with a too small precision (which led to relatively similar bugs).

This patch fixes these issues, in particular by modifying the FPI object
passed to gdtoa such that it is modifiable (so that __fmt can adjust its
rounding field to correspond to FLT_ROUNDS (note that this is not needed
for dtoa, which checks FLT_ROUNDS directly)) and ors STRTOG_Neg into the
kind field in both of the __fmt_dfpbits and __fmt_ldfpbits functions, as
the gdtoa function also depends on it to be able to accurately round any
negative arguments. The change to kind also requires a few other changes
to make sure kind's upper bits (which include STRTOG_Neg) are masked off
when attempting to only examine the lower bits' value. Furthermore, this
patch also makes exactly one change in gdtoa, which appears to be needed
to fix rounding issues with FE_TOWARDZERO (this seems like a gdtoa bug).

The patch also adds a few tests for these issues, along with also taking
the opportunity to clean up some of the previous tests to do the asserts
in the right order (i.e. with the first argument as the expected result,
and the second one being used as the value that it is compared against).
2024-09-07 18:26:04 -07:00
Justine Tunney
d1157d471f
Upgrade pl_mpeg
This change gets printvideo working on aarch64. Performance improvements
have been introduced for magikarp decimation on aarch64. The last of the
old portable x86 intrinsics library is gone, but it still lives in Blink
2024-09-06 19:10:34 -07:00
Justine Tunney
dd8544c3bd
Delve into clock rabbit hole
The worst issue I had with consts.sh for clock_gettime is how it defined
too many clocks. So I looked into these clocks all day to figure out how
how they overlap in functionality. I discovered counter-intuitive things
such as how CLOCK_MONOTONIC should be CLOCK_UPTIME on MacOS and BSD, and
that CLOCK_BOOTTIME should be CLOCK_MONOTONIC on MacOS / BSD. Windows 10
also has some incredible new APIs, that let us simplify clock_gettime().

  - Linux CLOCK_REALTIME         -> GetSystemTimePreciseAsFileTime()
  - Linux CLOCK_MONOTONIC        -> QueryUnbiasedInterruptTimePrecise()
  - Linux CLOCK_MONOTONIC_RAW    -> QueryUnbiasedInterruptTimePrecise()
  - Linux CLOCK_REALTIME_COARSE  -> GetSystemTimeAsFileTime()
  - Linux CLOCK_MONOTONIC_COARSE -> QueryUnbiasedInterruptTime()
  - Linux CLOCK_BOOTTIME         -> QueryInterruptTimePrecise()

Documentation on the clock crew has been added to clock_gettime() in the
docstring and in redbean's documentation too. You can read that to learn
interesting facts about eight essential clocks that survived this purge.
This is original research you will not find on Google, OpenAI, or Claude

I've tested this change by porting *NSYNC to become fully clock agnostic
since it has extensive tests for spotting irregularities in time. I have
also included these tests in the default build so they no longer need to
be run manually. Both CLOCK_REALTIME and CLOCK_MONOTONIC are good across
the entire amd64 and arm64 test fleets.
2024-09-04 01:32:46 -07:00
Justine Tunney
3c61a541bd
Introduce pthread_condattr_setclock()
This is one of the few POSIX APIs that was missing. It lets you choose a
monotonic clock for your condition variables. This might improve perf on
some platforms. It might also grant more flexibility with NTP configs. I
know Qt is one project that believes it needs this. To introduce this, I
needed to change some the *NSYNC APIs, to support passing a clock param.
There's also new benchmarks, demonstrating Cosmopolitan's supremacy over
many libc implementations when it comes to mutex performance. Cygwin has
an alarmingly bad pthread_mutex_t implementation. It is so bad that they
would have been significantly better off if they'd used naive spinlocks.
2024-09-02 23:45:42 -07:00
Justine Tunney
90460ceb3c
Make Cosmo mutexes competitive with Apple Libc
While we have always licked glibc and musl libc on gnu/systemd sadly the
Apple Libc implementation of pthread_mutex_t is better than ours. It may
be due to how the XNU kernel and M2 microprocessor are in league when it
comes to scheduling processes and the NSYNC behavior is being penalized.
We can solve this by leaning more heavily on ulock using Drepper's algo.
It's kind of ironic that Linux's official mutexes work terribly on Linux
but almost as good as Apple Libc if used on MacOS.
2024-09-02 19:03:11 -07:00
Justine Tunney
2ec413b5a9
Fix bugs in poll(), select(), ppoll(), and pselect()
poll() and select() now delegate to ppoll() and pselect() for assurances
that both polyfill implementations are correct and well-tested. Poll now
polyfills XNU and BSD quirks re: the hanndling of POLLNVAL and the other
similar status flags. This change resolves a misunderstanding concerning
how select(exceptfds) is intended to map to POLPRI. We now use E2BIG for
bouncing requests that exceed the 64 handle limit on Windows. With pipes
and consoles on Windows our poll impl will now report POLLHUP correctly.

Issues with Windows path generation have been fixed. For example, it was
problematic on Windows to say: posix_spawn_file_actions_addchdir_np("/")
due to the need to un-UNC paths in some additional places. Calling fstat
on UNC style volume path handles will now work. posix_spawn now supports
simulating the opening of /dev/null and other special paths on Windows.

Cosmopolitan no longer defines epoll(). I think wepoll is a nice project
for using epoll() on Windows socket handles. However we need generalized
file descriptor support to make epoll() for Windows work well enough for
inclusion in a C library. It's also not worth having epoll() if we can't
get it to work on XNU and BSD OSes which provide different abstractions.
Even epoll() on Linux isn't that great of an abstraction since it's full
of footguns. Last time I tried to get it to be useful I had little luck.
Considering how long it took to get poll() and select() to be consistent
across platforms, we really have no business claiming to have epoll too.
While it'd be nice to have fully implemented, the only software that use
epoll() are event i/o libraries used by things like nodejs. Event i/o is
not the best paradigm for handling i/o; threads make so much more sense.
2024-09-02 00:29:52 -07:00
Gabriel Ravier
a089c07ddc
Fix printf funcs on memory pressure with floats (#1275)
Cosmopolitan's printf-family functions will currently crash if one tries
formatting a floating point number with a larger precision (large enough
that gdtoa attempts to allocate memory to format the number) while under
memory pressure (i.e. when malloc fails) because gdtoa fails to check if
malloc fails.

The added tests (which would previously crash under cosmopolitan without
this patch) show how to reproduce the issue.

This patch fixes this, and adds the aforementioned tests.
2024-09-01 14:42:14 -07:00
Justine Tunney
7c83f4abc8
Make improvements
- wcsstr() is now linearly complex
- strstr16() is now linearly complex
- strstr() is now vectorized on aarch64 (10x)
- strstr() now uses KMP on pathological cases
- memmem() is now vectorized on aarch64 (10x)
- memmem() now uses KMP on pathological cases
- Disable shared_ptr::owner_before until fixed
- Make iswlower(), iswupper() consistent with glibc
- Remove figure space from iswspace() implementation
- Include line and paragraph separator in iswcntrl()
- Use Musl wcwidth(), iswalpha(), iswpunct(), towlower(), towupper()
2024-09-01 01:27:47 -07:00
Justine Tunney
c9152b6f14
Release Cosmopolitan v3.8.0
This change switches c++ exception handling from sjlj to standard dwarf.
It's needed because clang for aarch64 doesn't support sjlj. It turns out
that libunwind had a bare-metal configuration that made this easy to do.

This change gets the new experimental cosmocc -mclang flag in a state of
working so well that it can now be used to build all of llamafile and it
goes 3x faster in terms of build latency, without trading away any perf.

The int_fast16_t and int_fast32_t types are now always defined as 32-bit
in the interest of having more abi consistency between cosmocc -mgcc and
-mclang mode.
2024-08-30 20:14:07 -07:00
Justine Tunney
884d89235f
Harden against aba problem 2024-08-26 20:01:55 -07:00
Justine Tunney
610c951f71
Fix the build 2024-08-26 16:44:05 -07:00
Justine Tunney
111ec9a989
Fix bug we added to *NSYNC a while ago
This is believed to fix a crash, that's possible in nsync_waiter_free_()
when you call pthread_cond_timedwait(), or nsync_cv_wait_with_deadline()
where an assertion can fail. Thanks ipv4.games for helping me find this!
2024-08-26 12:25:50 -07:00
Justine Tunney
863c704684
Add string similarity function 2024-08-17 16:45:07 -07:00
Justine Tunney
60e697f7b2
Move LoadZipArgs() to cosmo.h 2024-08-17 12:06:27 -07:00
Justine Tunney
eb6e96f036
Change InfoZIP to not auto-append .zip to pathname 2024-08-17 06:45:23 -07:00
Justine Tunney
77be460290
Make Windows REPLs great again 2024-08-17 06:32:10 -07:00
Justine Tunney
1d532ba3f8
Disable some anti-Musl Lua tests 2024-08-17 02:20:08 -07:00
Justine Tunney
b2a1811c01
Add missing pragma 2024-08-16 21:49:28 -07:00
Justine Tunney
2eda50929b
Add stdfloat header
Fixes #1260
2024-08-16 21:38:00 -07:00
Justine Tunney
1671283f1a
Avoid clobbering errno 2024-08-15 23:54:14 -07:00
Justine Tunney
0a79c6961f
Make malloc scalable on all platforms
It turns out sched_getcpu() didn't work on many platforms. So the system
call now has tests and is well documented. We now employ new workarounds
on platforms where it isn't supported in our malloc() implementation. It
was previously the case that malloc() was only scalable on Linux/Windows
for x86-64. Now the other platforms are scalable too.
2024-08-15 23:32:53 -07:00
Justine Tunney
31194165d2
Remove .internal from more header filenames 2024-08-04 12:52:25 -07:00
Justine Tunney
4ed4a1095a
Improve build latency 2024-07-31 01:21:27 -07:00
Justine Tunney
8d8aecb6d9
Avoid legacy instruction penalties on x86 2024-07-31 01:02:38 -07:00
Justine Tunney
bb815eafaf
Update Musl Libc code
We now have implement all of Musl's localization code, the same way that
Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"),
just in case anything stops working as expected.
2024-07-30 22:51:29 -07:00
Justine Tunney
1092d9b7e8
Remove old usages of %n 2024-07-29 01:39:36 -07:00
Justine Tunney
cf1559c448
Remove __threaded variable 2024-07-28 23:43:30 -07:00
Justine Tunney
c1a0b017e9
Fix the build 2024-07-28 21:02:04 -07:00
Justine Tunney
77d3a07ff2
Fix std::filesystem
This change makes a second pass, at fixing the errno issue with libcxx's
filesystem code. Previously, 89.01% of LLVM's test suite was passing and
now 98.59% of their tests pass. Best of all, it's now possible for Clang
to be built as a working APE binary that can to compile the Cosmopolitan
repository. Please note it has only been vetted so far for some objects,
and more work would obviously need to be done in cosmo, to fix warnings.
2024-07-28 17:31:21 -07:00
Justine Tunney
0d7c272d3f
Don't use sendfile() in libcxx 2024-07-28 17:31:21 -07:00
Justine Tunney
f147d3dde9
Fix some static analysis issues 2024-07-27 09:16:54 -07:00
Justine Tunney
cdfcee51ca
Properly serialize fork() operations
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.

   threads      count task
         1       1062 fork+exit+wait
         2        668 fork+exit+wait
         4         66 fork+exit+wait
         8         19 fork+exit+wait
        16         22 fork+exit+wait
        32         16 fork+exit+wait

Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to do conditions

   threads      count task
         1       1085 fork+exit+wait
         2        842 fork+exit+wait
         4        532 fork+exit+wait
         8        400 fork+exit+wait
        16        276 fork+exit+wait
        32         66 fork+exit+wait

With OpenBSD which also lacks vfork(), things were just as bad as NetBSD

   threads      count task
         1        584 fork+exit+wait
         2        687 fork+exit+wait
         4        206 fork+exit+wait
         8         24 fork+exit+wait
        16         33 fork+exit+wait
        32         26 fork+exit+wait

But since OpenBSD has futexes fork() works terrifically thanks to *NSYNC

   threads      count task
         1        525 fork+exit+wait
         2        580 fork+exit+wait
         4        451 fork+exit+wait
         8        479 fork+exit+wait
        16        408 fork+exit+wait
        32        373 fork+exit+wait

This issue would most likely only manifest itself, when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating

Needless to say vfork() rules and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.

   threads      count task
         1       2559 vfork+exit+wait
         2       5389 vfork+exit+wait
         4      34933 vfork+exit+wait
         8      43273 vfork+exit+wait
        16      49648 vfork+exit+wait
        32      40247 vfork+exit+wait

So it's a shame that so few OSes support vfork(). It creates an unsavory
situation, where someone wanting to build a server that spawns processes
would be better served to not use threads and favor a multiprocess model
2024-07-27 08:23:44 -07:00