Commit graph

872 commits

Author SHA1 Message Date
Justine Tunney
4ed4a1095a
Improve build latency 2024-07-31 01:21:27 -07:00
Justine Tunney
8d8aecb6d9
Avoid legacy instruction penalties on x86 2024-07-31 01:02:38 -07:00
Justine Tunney
bb815eafaf
Update Musl Libc code
We have now implemented all of Musl's localization code, the same way that
Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"),
just in case anything stops working as expected.
2024-07-30 22:51:29 -07:00
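The opt-in mentioned in the commit above can be sketched with standard C alone. The helper name `init_utf8_locale` and the fallback to the "C" locale are illustrative assumptions, not something the commit prescribes.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: request the UTF-8 locale named in the commit
   message, falling back to the portable "C" locale if the C library
   does not recognize "C.UTF-8". Returns the active locale name. */
const char *init_utf8_locale(void) {
  const char *name = setlocale(LC_ALL, "C.UTF-8");
  if (!name)
    name = setlocale(LC_ALL, "C"); /* ISO C guarantees this one exists */
  assert(name != NULL);
  return name;
}
```

Calling it once at the top of `main()`, before any locale-sensitive function runs, is the usual pattern.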
Justine Tunney
1092d9b7e8
Remove old usages of %n 2024-07-29 01:39:36 -07:00
Justine Tunney
cf1559c448
Remove __threaded variable 2024-07-28 23:43:30 -07:00
Justine Tunney
c1a0b017e9
Fix the build 2024-07-28 21:02:04 -07:00
Justine Tunney
77d3a07ff2
Fix std::filesystem
This change makes a second pass at fixing the errno issue with libcxx's
filesystem code. Previously, 89.01% of LLVM's test suite was passing and
now 98.59% of their tests pass. Best of all, it's now possible for Clang
to be built as a working APE binary that can compile the Cosmopolitan
repository. Please note it has only been vetted so far for some objects,
and more work would obviously need to be done in cosmo, to fix warnings.
2024-07-28 17:31:21 -07:00
Justine Tunney
0d7c272d3f
Don't use sendfile() in libcxx 2024-07-28 17:31:21 -07:00
Justine Tunney
f147d3dde9
Fix some static analysis issues 2024-07-27 09:16:54 -07:00
Justine Tunney
cdfcee51ca
Properly serialize fork() operations
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.

   threads      count task
         1       1062 fork+exit+wait
         2        668 fork+exit+wait
         4         66 fork+exit+wait
         8         19 fork+exit+wait
        16         22 fork+exit+wait
        32         16 fork+exit+wait

Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to do conditions

   threads      count task
         1       1085 fork+exit+wait
         2        842 fork+exit+wait
         4        532 fork+exit+wait
         8        400 fork+exit+wait
        16        276 fork+exit+wait
        32         66 fork+exit+wait

With OpenBSD, which also lacks vfork(), things were just as bad as on NetBSD

   threads      count task
         1        584 fork+exit+wait
         2        687 fork+exit+wait
         4        206 fork+exit+wait
         8         24 fork+exit+wait
        16         33 fork+exit+wait
        32         26 fork+exit+wait

But since OpenBSD has futexes, fork() works terrifically thanks to *NSYNC

   threads      count task
         1        525 fork+exit+wait
         2        580 fork+exit+wait
         4        451 fork+exit+wait
         8        479 fork+exit+wait
        16        408 fork+exit+wait
        32        373 fork+exit+wait

This issue would most likely only manifest itself when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating

Needless to say vfork() rules and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.

   threads      count task
         1       2559 vfork+exit+wait
         2       5389 vfork+exit+wait
         4      34933 vfork+exit+wait
         8      43273 vfork+exit+wait
        16      49648 vfork+exit+wait
        32      40247 vfork+exit+wait

So it's a shame that so few OSes support vfork(). It creates an unsavory
situation, where someone wanting to build a server that spawns processes
would be better served to not use threads and favor a multiprocess model
2024-07-27 08:23:44 -07:00
Justine Tunney
efb3a34608
Fix package.sh build error 2024-07-26 06:57:24 -07:00
Justine Tunney
642e9cb91a
Introduce cosmocc flags -mdbg -mtiny -moptlinux
The cosmocc.zip toolchain will now include four builds of the libcosmo.a
runtime libraries. You can pass the -mdbg flag if you want to debug your
cosmopolitan runtime. You can pass the -moptlinux flag if you don't want
windows code lurking in your binary. See tool/cosmocc/README.md for more
details on how these flags may be used and their important implications.
2024-07-26 05:10:25 -07:00
Justine Tunney
edd5e2c8e3
Expose gethostbyname() 2024-07-25 19:21:35 -07:00
Justine Tunney
2c4b88753b
Add special errno handling to libcxx 2024-07-25 01:23:02 -07:00
Justine Tunney
d3a13e8d70
Improve lock hierarchy
- NetBSD no longer needs a spin lock to create semaphores
- Windows fork() now locks process manager in correct order
2024-07-24 16:05:48 -07:00
Justine Tunney
5dd7ddb9ea
Remove bad defines from early days of project
These definitions were causing issues with building LLVM. It is possible
they also caused crashes we've seen with our MacOS ARM64 OpenMP support.
2024-07-24 12:11:21 -07:00
Justine Tunney
f25fbbaaeb
Use libcxx abi v1 2024-07-24 09:49:48 -07:00
Justine Tunney
e398f3887c
Make more improvements to threads and mappings
- NetBSD should now have faster synchronization
- POSIX barriers may now be shared across processes
- An edge case with memory map tracking has been fixed
- Grand Central Dispatch is no longer used on MacOS ARM64
- POSIX mutexes in normal mode now use futexes across processes
2024-07-24 01:19:54 -07:00
Justine Tunney
2187d6d2dd
Fix MODE=optlinux for GitHub Actions 2024-07-23 04:17:22 -07:00
Justine Tunney
0602ff6bab
Fix MODE=optlinux and MODE=tiny builds 2024-07-23 04:04:19 -07:00
Justine Tunney
5660ec4741
Release Cosmopolitan v3.6.0
This release is an atomic upgrade to GCC 14.1.0 with C23 and C++23
2024-07-23 03:28:19 -07:00
Justine Tunney
7ebaff34c6
Fix ctype.h and wctype.h 2024-07-21 15:54:17 -07:00
Justine Tunney
86d884cce2
Get rid of .internal.h convention in LIBC_INTRIN 2024-07-19 19:38:00 -07:00
Justine Tunney
1ff037df3c
Add some documentation 2024-07-19 04:46:26 -07:00
Justine Tunney
f590e96abd
Work around QEMU bugs 2024-07-07 15:42:46 -07:00
Justine Tunney
f7780de24b
Make realloc() go 100x faster on Linux/NetBSD
Cosmopolitan now supports mremap(), which is only supported on Linux and
NetBSD. First, it allows memory mappings to be relocated without copying
them; this can dramatically speed up data structures like std::vector if
the array size grows larger than 256kb. The mremap() system call is also
10x faster than munmap() when shrinking large memory mappings.

There are now two functions, getpagesize() and getgransize(), which help to
write portable code that uses mmap(MAP_FIXED). Alternatively, sysconf() may
be called with our new _SC_GRANSIZE. The madvise() system call now has a
better wrapper with improved documentation.
2024-07-07 12:40:30 -07:00
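A rough sketch of how portable code might use the granularity query described in the commit above. It assumes `_SC_GRANSIZE` is exposed as a preprocessor macro and falls back to the page size on systems whose headers lack it; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the granularity that mmap(MAP_FIXED) addresses should be
   aligned to. _SC_GRANSIZE is the name introduced by the commit; where
   the headers lack it we assume the page size is sufficient. */
long mapping_granularity(void) {
#ifdef _SC_GRANSIZE
  long gran = sysconf(_SC_GRANSIZE);
#else
  long gran = sysconf(_SC_PAGESIZE);
#endif
  assert(gran > 0);
  return gran;
}

/* Rounds `n` up to the next multiple of `gran`, a power of two. */
size_t round_up_to(size_t n, size_t gran) {
  assert(gran && !(gran & (gran - 1)));
  return (n + gran - 1) & ~(gran - 1);
}
```

Rounding both the address and the length with `round_up_to(x, mapping_granularity())` before a MAP_FIXED call keeps the request valid on hosts where the granularity is larger than a page, such as the 64kb granularity on Windows.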
Justine Tunney
8c645fa1ee
Make mmap() scalable
It's now possible to create thousands upon thousands of sparse independent
memory mappings, without any slowdown. The memory manager is better with
tracking memory protection now, particularly on Windows in a precise way
that can be restored during fork(). You now have the highest quality mem
manager possible. It's even better than some OSes like XNU, where mmap()
is implemented as an O(n) operation which means sadly things aren't much
improved over there. With this change the llamafile HTTP server endpoint
at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec
2024-07-05 23:26:00 -07:00
Justine Tunney
01587de761
Simplify memory manager 2024-07-05 05:47:15 -07:00
Justine Tunney
15ea0524b3
Reduce code size of mandatory runtime
This change reduces o/tiny/examples/life from 44kb to 24kb in size since
it avoids linking mmap() when unnecessary. This is important for helping
cosmo not completely lose touch with its roots.
2024-07-04 02:50:20 -07:00
Justine Tunney
ca4cf67eb8
Include more programs in cosmocc
The Cosmopolitan Compiler Collection now includes the following programs

- `ar.ape` is a faster alternative to `ar rcsD` for creating deterministic
  static archives. It's ~10x faster than GNU because it isn't quadratic.
  It'll even outperform LLVM ar by 2x, thanks to writev/copy_file_range.

- `sha256sum.ape` is a faster alternative to the `sha256sum` command. It
  goes 2x faster since it leverages vectorized assembly implementations.

- `resymbol` is a brand new program we invented, like objcopy, that lets
  you rename all the global symbols in a .o file to have a new suffix or
  prefix. In the future, this will be used by cosmocc automatically when
  building -O3 math kernels, that need to be vectorized for all hardware

- `gzip.ape` is a faster version of the `gzip` command, that is included
  by most Linux distros. It gains better performance using Chromium Zlib
  which, once again, includes highly optimized assembly, that Mark Adler
  won't merge into the official MS-DOS compatible zlib codebase.

- `cocmd` is the cosmopolitan shell. It can function as a faster `sh -c`
  alternative than bash and dash as the `SHELL = /opt/cosmocc/bin/cocmd`
  at the top of your Makefile. Please note you should be using the cosmo
  fork of GNU make (already included), since normal make won't recognize
  this as a Bourne-compatible shell and will drop the execve() optimization,
  which makes things slower. In some ways normal make is right: this isn't
  a complete POSIX shell implementation. However, it's enough for cosmo's
  mono repo. It also implements faster behaviors in some respects.

The following programs are also introduced, which aren't as interesting.
The main reason why they're here is so Cosmopolitan's mono repo shall be
able to remove build/bootstrap/ in future editions. That way we can keep
build utilities better up to date, without bloating the git history much

- `chmod.ape` for hermeticity
- `cp.ape` for hermeticity
- `echo.ape` for hermeticity
- `objbincopy` is an objcopy-like tool that's used to build ape loader
- `package.ape` is used for strict dependency checking of object graph
- `rm.ape` for hermeticity
- `touch.ape` for hermeticity
2024-07-01 02:05:25 -07:00
Justine Tunney
199662071a
Make std::random_device use getentropy() 2024-06-24 07:32:07 -07:00
Justine Tunney
d461c6f47d
Do more quality assurance work 2024-06-24 06:53:49 -07:00
Justine Tunney
c4c812c154
Introduce ctl::set and ctl::map
We now have a C++ red-black tree implementation that implements standard
template library compatible APIs while compiling 10x faster than libcxx.
It's not as beautiful as the red-black tree implementation in Plinko but
this will get the job done and the test proves it upholds all invariants

This change also restores CheckForMemoryLeaks() support and fixes a real
actual bug I discovered with Doug Lea's dlmalloc_inspect_all() function.
2024-06-23 22:27:11 -07:00
Justine Tunney
388e236360
Revert misguided dlmalloc optimization 2024-06-22 09:55:02 -07:00
Justine Tunney
d1d4388201
Delete ASAN
It hasn't been helpful enough to justify the maintenance burden. What
actually does help is mprotect(), kprintf(), --ftrace and --strace which
can always be counted upon to work correctly. We aren't losing much with
this change. Support for ASAN on AARCH64 was never implemented. Applying
ASAN to the core libc runtimes was disabled many months ago. If there is
some way to have an ASAN runtime for user programs that is less invasive
we can potentially consider reintroducing support. But now is premature.
2024-06-22 05:45:49 -07:00
Justine Tunney
6ffed14b9c
Rewrite memory manager
Actually Portable Executable now supports Android. Cosmo's old mmap code
required a 47 bit address space. The new implementation is very agnostic
and supports both smaller address spaces (e.g. embedded) and even modern
56-bit PML5T paging for x86 which finally came true on Zen4 Threadripper

Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb
granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native
page size. This fixes a longstanding POSIX conformance issue, concerning
file mappings that overlap the end of file. Other aspects of conformance
have been improved too, such as the subtleties of address assignment and
the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE

On Windows, mappings larger than 100 megabytes won't be broken down into
thousands of independent 64kb mappings. Support for MAP_STACK is removed
by this change; please use NewCosmoStack() instead.

Stack overflow avoidance is now being implemented using the POSIX thread
APIs. Please use GetStackBottom() and GetStackAddr(), instead of the old
error-prone GetStackAddr() and HaveStackMemory() APIs which are removed.
2024-06-22 05:45:11 -07:00
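The end-of-file conformance point in the commit above can be checked with a small probe. This is a sketch under stated assumptions: the function name and the temporary file path are made up for illustration, and it relies on the POSIX rule that the remainder of the last page past end-of-file reads as zeros.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one native page over a file that is only a few bytes long and
   checks that the bytes past end-of-file read as zero. Returns 1 if
   they do, 0 if not, and -1 if the setup fails. */
int tail_past_eof_is_zero(void) {
  char path[] = "/tmp/eofmapXXXXXX"; /* hypothetical scratch location */
  const char data[] = "hello";
  long page = sysconf(_SC_PAGE_SIZE);
  long i;
  int fd, ok = 1;
  char *p;
  assert(page > (long)sizeof(data));
  if ((fd = mkstemp(path)) == -1)
    return -1;
  unlink(path); /* the open descriptor keeps the file alive */
  if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
    close(fd);
    return -1;
  }
  p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);
  if (p == MAP_FAILED)
    return -1;
  if (memcmp(p, data, sizeof(data)))
    ok = 0; /* the file contents must be visible */
  for (i = (long)sizeof(data); i < page; ++i)
    if (p[i])
      ok = 0; /* the tail of the page must be zero-filled */
  munmap(p, (size_t)page);
  return ok;
}
```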
Michael Lenaghan
0dbf01bf1d
Bring Lua to 5.4.6. (#1214)
This essentially re-does the work of #875 on top of master.

This is what I did to check that Cosmo's Lua extensions still worked:

```
$ build/bootstrap/make MODE=aarch64 o/aarch64/third_party/lua/lua
$ ape o/aarch64/third_party/lua/lua
>: 10
10
>: 010
8
>: 0b10
2
>: string.byte("\e")
27
>: "Hello, %s" % {"world"}
Hello, world
>: "*" * 3
***
```

`luaL_traceback2` was used to show the stack trace with parameter
values; it's used in `LuaCallWithTrace`, which is used in Redbean to run
Lua code. You should be able to see the extended stack trace by running
something like this: `redbean -e "function a(b)c()end a(2)"` (with
"params" indicating the extended stack trace):

```
stack traceback:
 [string "function a(b)c()end a(2)"]:1: in function 'a', params: b = 2;
 [string "function a(b)c()end a(2)"]:1: in main chunk
```
@pkulchenko confirmed that I get the expected result with the updated
code.

This is what I did to check that Lua itself still worked:

```
$ cd third_party/lua/test/
$ ape ../../../o/aarch64/third_party/lua/lua all.lua
```

There's one test failure, in `files.lua`:

```
***** FILE 'files.lua'*****
testing i/o
../../../o/aarch64/third_party/lua/lua: files.lua:84: assertion failed!
stack traceback:
[C]: in function 'assert'
files.lua:84: in main chunk
(...tail calls...)
all.lua:195: in main chunk
[C]: in ?
.>>> closing state <<<
```

That isn't a result of these changes; the same test is failing in
master.

The failure is here:

```lua
if not _port then   -- invalid seek
  local status, msg, code = io.stdin:seek("set", 1000)
  assert(not status and type(msg) == "string" and type(code) == "number")
end
```

The test expects a seek to offset 1,000 on stdin to fail — but it
doesn't. `status` ends up being the new offset rather than `nil`.

If I comment out that one test, the remaining tests succeed.
2024-06-15 20:13:08 -04:00
Justine Tunney
cc2c1893c5
Fix some nits 2024-06-05 04:05:49 -07:00
Justine Tunney
3609f65de3
Make malloc() go 200x faster
If pthread_create() is linked into the binary, then the cosmo runtime
will create an independent dlmalloc arena for each core. Whenever the
malloc() function is used it will index `g_heaps[sched_getcpu() / 2]`
to find the arena with the greatest hyperthread / numa locality. This
may be configured via an environment variable. For example if you say
`export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways.
Your process may be configured to have anywhere from 1 to 128 heaps

We need this revision because it makes multithreaded C++ applications
faster. For example, an HTTP server I'm working on that makes extreme
use of the STL went from 16k to 2000k requests per second, after this
change was made. To understand why, try out the malloc_test benchmark
which calls malloc() + realloc() in a loop across many threads, which
sees a 250x improvement in process clock time and 200x on wall time

The tradeoff is this adds ~25ns of latency to individual malloc calls
compared to MODE=tiny, once the cosmo runtime has transitioned into a
fully multi-threaded state. If you don't need malloc() to be scalable
then cosmo provides many options for you. For starters the heap count
variable above can be set to put the process back in single heap mode
plus you can go even faster still, if you include tinymalloc.inc like
many of the programs in tool/build/.. are already doing since that'll
shave tens of kb off your binary footprint too. There's also MODE=tiny
which is configured to use just 1 plain old dlmalloc arena by default

Another tradeoff is we need more memory now (except in MODE=tiny), to
track the provenance of memory allocation. This is so allocations can
be freely shared across threads, and because OSes can reschedule code
to different CPUs at any time.
2024-06-05 02:02:14 -07:00
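The arena selection described in the commit above can be illustrated with two small pure functions. This is only a sketch, not the runtime's code: the function names are hypothetical, the clamping range comes from the "1 to 128 heaps" statement, and the modulo wrap-around in `heap_index` is an assumption about what happens when there are fewer heaps than CPU pairs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_HEAPS 128 /* upper bound quoted in the commit message */

/* Parses a heap-count setting, clamping it to 1..MAX_HEAPS. Falls
   back to `dflt` when the string is missing or is not a number. */
int parse_heap_count(const char *s, int dflt) {
  char *end;
  long n;
  if (!s || !*s)
    return dflt;
  errno = 0;
  n = strtol(s, &end, 10);
  if (errno || *end)
    return dflt;
  if (n < 1)
    n = 1;
  if (n > MAX_HEAPS)
    n = MAX_HEAPS;
  return (int)n;
}

/* Picks an arena for the CPU the caller is running on. Dividing by two
   mirrors the `sched_getcpu() / 2` expression in the commit message;
   the modulo wrap-around is our assumption. */
int heap_index(int cpu, int heap_count) {
  assert(cpu >= 0 && heap_count >= 1);
  return cpu / 2 % heap_count;
}
```

A caller would typically feed it `getenv("COSMOPOLITAN_HEAP_COUNT")` and a default derived from the CPU count. With a heap count of 1, every CPU maps to arena 0, which corresponds to the single-heap mode the commit describes.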
Justine Tunney
9aa353d88b
Document __demangle() and fix a const func ptr bug 2024-06-02 04:15:48 -07:00
Justine Tunney
ea081b262c
Add some noexcept annotations 2024-06-01 03:19:53 -07:00
Justine Tunney
fae1c32267
Encode ±INFINITY as ±1e5000
The V8 behavior of encoding infinity as null doesn't make sense to me.
Using ±1e5000 is better, because JSON.parse decodes it as INFINITY and
the information is preserved. This could be a breaking change for some
2024-06-01 03:19:50 -07:00
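The round-trip property the commit above relies on can be shown in plain C, since `strtod()` overflows `1e5000` to infinity in the same way. The function names below are hypothetical, the `%.17g` format for finite values is our choice, and NaN handling is left out because the commit message doesn't cover it.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes a JSON number for `x` into `buf`. Infinities become 1e5000
   or -1e5000, which overflow back to infinity when parsed as a
   double. Returns the length written, or -1 if it doesn't fit. */
int json_encode_double(char *buf, size_t size, double x) {
  int n;
  assert(!isnan(x)); /* NaN is out of scope for this sketch */
  if (isinf(x))
    n = snprintf(buf, size, "%s", x < 0 ? "-1e5000" : "1e5000");
  else
    n = snprintf(buf, size, "%.17g", x);
  return n >= 0 && (size_t)n < size ? n : -1;
}

/* Parses the encoded text back into a double. */
double json_decode_double(const char *s) {
  return strtod(s, NULL);
}
```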
Justine Tunney
e4d25d68e4
Drop support for Windows 8
Microsoft caused some very gentle breakages for Cosmopolitan. They
removed the version information from the PEB which caused uname to
report WINDOWS 0.0.0. We should have called GetVersionExW but that
doesn't really exist anymore either. Windows policy is now to give
whatever version we used in ape/ape.S. Windows 8 has been EOL since
2023-01-10 so let's avoid our modern executables being relegated to
legacy infrastructure. Requiring Windows 10+ going forward lets us
remove runtime compatibility bloat from the codebase. Further note
Cosmopolitan maintains a Windows Vista branch on GitHub, so anyone
preferring the older versions, can still have a future with Cosmo.

Another neat thing this fixes is UTF-8 support in the console. The
changes Microsoft made broke the if statement that enabled UTF8 in
terminals. This explains why bug reports had broken arrows. In the
future this should be less of an issue, since the PEB code is gone
which means we more strictly conform to only Microsoft's WIN32 API
2024-05-29 19:37:47 -07:00
Justine Tunney
f31a98d50a
Fix bug with realpath() on Windows 2024-05-29 18:47:01 -07:00
Justine Tunney
a05ce3ad9d
Support avx512f + vpclmulqdq crc32() acceleration
Cosmo's _Cz_crc32() function now goes 73 GiB/s on Threadripper. This
will significantly improve the performance of the PKZIP file format.
This algorithm is also used by apelink, to create deterministic ids.
2024-05-29 10:13:37 -07:00
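When working with an accelerated checksum like the one in the commit above, a slow reference version is handy for cross-checking results. The sketch below is the textbook bit-at-a-time CRC-32 with the reflected polynomial 0xEDB88320, which is the checksum used by PKZIP; it is not Cosmopolitan's implementation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 over `size` bytes of `data`. Pass 0 as `crc`
   to start, or a previous return value to continue a running sum. */
uint32_t crc32_reference(uint32_t crc, const void *data, size_t size) {
  const unsigned char *p = data;
  assert(p || !size);
  crc = ~crc;
  while (size--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
  }
  return ~crc;
}
```

The standard check value for the ASCII string "123456789" is 0xCBF43926, which makes a convenient first test for any faster implementation.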
Justine Tunney
b74b974cfd
Introduce #include <tinygetopt.h>
The normal getopt() function is bloated because it links printf(). This
change exports the original authentic bsd getopt function, that cosmo's
always used internally so cosmocc users don't need to include internals
2024-05-29 10:11:17 -07:00
Justine Tunney
07cef612c3
Make dlmalloc 2.4x faster for multithreading
This change adds a TLS freelist for small dynamic memory allocations.
Cosmopolitan's TIB is now 512 bytes in size. Single-threaded malloc()
performance isn't impacted by this, until pthread_create() is called.
Single-threaded programs may also want to consider using:

    #include "libc/mem/tinymalloc.inc"

Which will shave 30k off the executable size and sometimes go faster.
2024-05-28 11:18:34 -07:00
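The idea of a thread-local freelist for small allocations, as named in the commit above, can be sketched in a few lines. This is a simplified illustration with one fixed size class, not the dlmalloc change itself; `SMALL_SIZE` and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_SIZE 64 /* hypothetical size class served by the cache */

struct free_node {
  struct free_node *next;
};

/* Each thread keeps its own list, so no lock is needed to use it. */
static _Thread_local struct free_node *tls_freelist;

/* Returns a SMALL_SIZE block, reusing one cached by this thread if
   possible and falling back to malloc() otherwise. */
void *small_alloc(void) {
  struct free_node *n = tls_freelist;
  if (n) {
    tls_freelist = n->next;
    return n;
  }
  return malloc(SMALL_SIZE);
}

/* Pushes a block from small_alloc() onto the calling thread's list
   instead of returning it to the shared heap. */
void small_free(void *p) {
  struct free_node *n = p;
  assert(p != NULL);
  n->next = tls_freelist;
  tls_freelist = n;
}
```

A block freed by a different thread than the one that allocated it simply lands on the freeing thread's list. A real allocator also has to bound the list length and release cached blocks when a thread exits, which this sketch omits.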
Justine Tunney
deaef81463
Favor siginfo_t over struct siginfo 2024-05-28 02:34:17 -07:00
Justine Tunney
c638eabfe0
Fix compiler warning 2024-05-27 02:23:24 -07:00
Justine Tunney
8e68384e15
Upgrade to 2022-era LLVM LIBCXX 2024-05-27 02:12:27 -07:00