- wcsstr() now runs in linear time
- strstr16() now runs in linear time
- strstr() is now vectorized on aarch64 (10x)
- strstr() now uses KMP on pathological cases
- memmem() is now vectorized on aarch64 (10x)
- memmem() now uses KMP on pathological cases
- Disable shared_ptr::owner_before until fixed
- Make iswlower(), iswupper() consistent with glibc
- Remove figure space from iswspace() implementation
- Include line and paragraph separator in iswcntrl()
- Use Musl wcwidth(), iswalpha(), iswpunct(), towlower(), towupper()
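
To illustrate the KMP fallback mentioned in the list above, here is a minimal sketch of a linear-time memmem(). The name kmp_memmem() is hypothetical and this is not the Cosmopolitan implementation; it only shows the failure-table idea that keeps the worst case at O(haystack + needle), without any of the vectorized fast paths.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper, not the libc implementation: returns a pointer to the
// first occurrence of needle[0..nlen) in hay[0..hlen), or NULL if absent.
void *kmp_memmem(const void *hay, size_t hlen, const void *needle,
                 size_t nlen) {
  const unsigned char *h = hay;
  const unsigned char *n = needle;
  if (!nlen) return (void *)h;  // empty needle matches at the start
  if (nlen > hlen) return NULL;
  // fail[i] is the length of the longest proper prefix of needle[0..i]
  // that is also a suffix of needle[0..i]
  size_t *fail = malloc(nlen * sizeof(*fail));
  if (!fail) return NULL;  // this sketch reports allocation failure as "not found"
  fail[0] = 0;
  for (size_t i = 1, k = 0; i < nlen; ++i) {
    while (k && n[i] != n[k]) k = fail[k - 1];
    if (n[i] == n[k]) ++k;
    fail[i] = k;
  }
  void *res = NULL;
  for (size_t i = 0, k = 0; i < hlen; ++i) {
    // on a mismatch, fall back in the needle instead of rewinding the haystack
    while (k && h[i] != n[k]) k = fail[k - 1];
    if (h[i] == n[k]) ++k;
    if (k == nlen) {
      res = (void *)(h + i + 1 - nlen);
      break;
    }
  }
  free(fail);
  return res;
}
```

A pathological input for naive search is a long run of one byte followed by a different byte, e.g. searching "aaab" in "aaaaaaaaab"; the failure table lets the scan move forward without re-reading the haystack.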
This change switches C++ exception handling from SJLJ to standard DWARF.
It's needed because Clang for aarch64 doesn't support SJLJ. It turns out
that libunwind had a bare-metal configuration that made this easy to do.
This change gets the new experimental cosmocc -mclang flag working well
enough that it can now be used to build all of llamafile. Build latency
is about 3x lower, without trading away any runtime performance.
The int_fast16_t and int_fast32_t types are now always defined as 32-bit
in the interest of having more abi consistency between cosmocc -mgcc and
-mclang mode.
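
As a small sketch of why this matters for ABI consistency, the helpers below (hypothetical names) report the width the current toolchain picks for these types. The C standard only guarantees a minimum width, so the result can differ between C libraries.

```c
#include <limits.h>
#include <stdint.h>

// Width in bits of the "fast" integer types chosen by this toolchain.
// Hypothetical helpers for illustration only.
int fast16_bits(void) { return (int)(sizeof(int_fast16_t) * CHAR_BIT); }
int fast32_bits(void) { return (int)(sizeof(int_fast32_t) * CHAR_BIT); }
```

With the change described above, both should report 32 under cosmocc in either mode. For comparison, glibc on x86-64 defines both as 64-bit, so these types are best avoided in structs shared across toolchains.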
The server was originally written before I implemented support for POSIX
thread cancelation. We now use standard pthreads APIs instead of talking
directly to *NSYNC. This means we're no longer using *NSYNC notes, which
aren't as good as the POSIX thread cancelation support I added to *NSYNC
which was only made possible by making *NSYNC part of libc. I believe it
will solve a crash we observed recently with ipv4.games, courtesy of the
individual who goes by the hacker alias Lambro.
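
For readers unfamiliar with the standard API, here is a minimal sketch of POSIX thread cancelation with a cleanup handler. It is not the server's code; worker(), on_cancel(), and cancel_demo() are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int cleanup_ran;

// Runs when the worker is canceled, e.g. to release locks or buffers.
static void on_cancel(void *arg) {
  (void)arg;
  atomic_store(&cleanup_ran, 1);
}

static void *worker(void *arg) {
  (void)arg;
  pthread_cleanup_push(on_cancel, NULL);
  for (;;) {
    sleep(1);  // sleep() is a cancelation point
  }
  pthread_cleanup_pop(0);  // never reached, but push and pop must be paired
  return NULL;
}

// Returns 1 if the worker was canceled and its cleanup handler ran.
int cancel_demo(void) {
  pthread_t th;
  void *rc;
  atomic_store(&cleanup_ran, 0);
  if (pthread_create(&th, NULL, worker, NULL)) return 0;
  if (pthread_cancel(th)) return 0;
  if (pthread_join(th, &rc)) return 0;
  return rc == PTHREAD_CANCELED && atomic_load(&cleanup_ran);
}
```

The cancelation request is acted on at the next cancelation point, so a worker blocked in a system call can be torn down without a separate notification object.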
This was suppressed recently and it's the worst possible idea when doing
greenfield software development with C. I'm so sorry it slipped through.
If the C standards committee was smart they would change the standard so
that implicit int becomes implicit long. Then problems such as this will
never occur and we could even use traditional C safely if we wanted to.
C++ code compiles very slowly with cosmocc, possibly because we're using
LLVM LIBCXX with GCC, and LLVM doesn't work as hard to make GCC go fast.
Therefore it should be possible to ask cosmocc to favor Clang over GCC
under the hood. On llamafile, my intention is to use this to make certain
files, e.g. llama.cpp/common.cpp, go from taking 17 seconds to 5 seconds.
This new -mclang flag isn't ready for production yet since there's still
the question of how to get Clang to generate SJLJ exception code. If you
use this, then it's recommended you also pass -fno-exceptions.
The tradeoff is we're adding a 121 MB binary to the cosmocc distribution.
There are no plans as of yet to fully migrate to Clang since GCC is very
good and has always treated us well.
This is believed to fix a crash that's possible in nsync_waiter_free_()
when you call pthread_cond_timedwait() or nsync_cv_wait_with_deadline(),
where an assertion can fail. Thanks ipv4.games for helping me find this!
This change implements the compiler runtime for ARMv8.1 LSE atomics and
gets rid of the mandatory -mno-outline-atomics flag. It can dramatically
speed things up on newer ARM CPUs, as indicated by the changed lines in
test/libc/thread/footek_test.c. In llamafile, dispatching on the hwcap
atomics bit also shaved microseconds off synchronization barriers.
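
The kind of code that benefits is ordinary C11 atomics under contention. The sketch below uses hypothetical names and is not taken from footek_test.c. As far as I understand it, when outline atomics are enabled on AArch64 the compiler lowers operations like the fetch-add here into calls to runtime helpers that pick LSE instructions or a load/store-exclusive loop at run time.

```c
#include <pthread.h>
#include <stdatomic.h>

#define THREADS 4
#define ITERATIONS 100000

static atomic_long counter;

static void *bump(void *arg) {
  (void)arg;
  for (int i = 0; i < ITERATIONS; ++i) {
    // a single contended read-modify-write per iteration
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
  }
  return NULL;
}

// Returns the final counter value; THREADS * ITERATIONS if all threads ran.
long count_with_atomics(void) {
  pthread_t th[THREADS];
  int started = 0;
  atomic_store(&counter, 0);
  while (started < THREADS &&
         !pthread_create(&th[started], NULL, bump, NULL)) {
    ++started;
  }
  for (int i = 0; i < started; ++i) pthread_join(th[i], NULL);
  return atomic_load(&counter);
}
```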
So far I haven't found any way to run native Arm64 code on Windows Arm64
without using MSVC. When I build a PE binary from scratch that should be
a valid Windows Arm64 program, the OS refuses to run it, possibly because
it requires additional content like XML manifests, relocations, or control
flow integrity data that isn't normally required on x64. I've also tried
using VirtualAlloc2() to JIT an Arm64 native function, but VirtualAlloc2()
always fails with invalid parameter. I tried using MSVC to create an ARM
DLL that my emulated x64 program can link at runtime, to pass a function
pointer with ARM code, but LoadLibrary() rejects ARM DLLs as invalid
executables.
The only option left is likely to write a new program like ape/ape-m1.c
which can be compiled by MSVC to load and run an AARCH64 ELF executable.
The emulated x64 binary would detect emulation using IsWow64Process2 and
then drop the loader executable in a temporary folder, and re-launch the
original executable, using the Arm64 segments of the cosmocc fat binary.
It turns out sched_getcpu() didn't work on many platforms. So the system
call now has tests and is well documented. Our malloc() implementation
now employs new workarounds on platforms where it isn't supported. It
was previously the case that malloc() was only scalable on Linux/Windows
for x86-64. Now the other platforms are scalable too.
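
To show why sched_getcpu() matters for allocator scalability, here is a minimal sketch of picking a per-CPU arena with a fallback. pick_arena() is a hypothetical helper, and the fallback shown (hashing a thread-local address) is my own assumption, not necessarily the workaround Cosmopolitan uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes sched_getcpu() on glibc
#endif
#include <sched.h>
#include <stdint.h>

// Hypothetical helper: picks one of `narenas` shards for the calling thread.
// `narenas` must be greater than zero.
unsigned pick_arena(unsigned narenas) {
  int cpu = sched_getcpu();
  if (cpu >= 0) return (unsigned)cpu % narenas;
  // Assumed fallback for platforms without sched_getcpu(): spread threads
  // across arenas using the address of a thread-local variable.
  static _Thread_local char tls_marker;
  uintptr_t x = (uintptr_t)&tls_marker;
  x ^= x >> 16;
  return (unsigned)(x % narenas);
}
```

Threads that land on different arenas don't contend on the same lock, which is what makes the allocator scale with core count.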
Using https://nightly.link/ with GitHub Actions artifacts you can have a
nightly build (but not a _release_ -- there's no releasing or
pre-releasing happening) of cosmocc.
Example URL if this PR were merged:
https://nightly.link/jart/cosmopolitan/workflows/nightly-cosmocc/master/cosmocc.zip
Or you can just download it directly from the workflow summary page of a
particular run under GitHub "Actions"
(https://github.com/jart/cosmopolitan/actions).
example from my own fork:
![image](https://github.com/user-attachments/assets/8ba708dd-8289-4f8b-932c-cf535ee86f62)
you could download it by clicking on the artifact, or by using a
third-party service that provides a link for unauthenticated requests
(like wget or curl):
https://nightly.link/jcbhmr/cosmopolitan/workflows/tool-cosmocc-package-sh/master/cosmocc.zip
this would be useful for users who don't want to or can't figure out how
to build cosmocc themselves (e.g. on Windows) but still want to use a
nightly build because a fix they need hasn't shipped in a release yet.
this would also be a good way to test the release process: instead of
pushing the `cosmocc.zip` to _wherever it goes now_, you publish it as a
GitHub Actions artifact for the very few bleeding-edge nightly users to
use & test.
you don't have to use https://nightly.link or recommend it or anything;
i just know it's a cool way to wget or curl the URLs instead of
downloading through the browser web UI. particularly useful for
remote/ssh/web-ide development.