Commit graph

1545 commits

Author SHA1 Message Date
Justine Tunney
3f26dfbb31
Share file offset across execve() on Windows
This is a breaking change. It defines the new environment variable named
_COSMO_FDS_V2 which is used for inheriting non-stdio file descriptors on
execve() or posix_spawn(). No effort has been spent thus far integrating
with the older variable. If a new binary launches the older ones or vice
versa they'll only be able to pass stdin / stdout / stderr to each other
therefore it's important that you upgrade all your cosmo binaries if you
depend on this functionality. You'll be glad you did because inheritance
of file descriptors is more aligned with the POSIX standard than before.
2024-08-03 17:48:00 -07:00
Justine Tunney
761c6ad615
Share file offset across processes
This change ensures that if a file descriptor for an open disk file gets
shared by multiple processes within a process tree, then lseek() changes
will be visible across processes, and read() / write() are synchronized.
Note this only applies to Windows, because UNIX kernels already do this.
2024-08-03 01:39:11 -07:00
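The offset sharing described in commit 761c6ad615 can be observed with a small probe. This is a minimal sketch rather than code from the repository: the helper name `offset_after_child_seek` and the scratch path are hypothetical. A forked child seeks an inherited descriptor and the parent then reads the offset back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after a forked child seeks the shared
   descriptor to `target`, or -1 if any step fails. */
off_t offset_after_child_seek(off_t target) {
  assert(target >= 0);
  char path[] = "/tmp/offset-demo-XXXXXX"; /* hypothetical scratch file */
  int fd = mkstemp(path);
  if (fd == -1) return -1;
  unlink(path); /* the open descriptor keeps the file alive */

  pid_t pid = fork();
  if (pid == -1) {
    close(fd);
    return -1;
  }
  if (pid == 0) {
    /* Child: move the offset of the inherited descriptor, then exit. */
    _exit(lseek(fd, target, SEEK_SET) == target ? 0 : 1);
  }

  int ws;
  off_t pos = -1;
  if (waitpid(pid, &ws, 0) == pid && WIFEXITED(ws) && !WEXITSTATUS(ws)) {
    /* Parent: query the offset without moving it. */
    pos = lseek(fd, 0, SEEK_CUR);
  }
  close(fd);
  return pos;
}
```

On a UNIX kernel, `offset_after_child_seek(1234)` should return 1234 because parent and child share one open file description. The commit message says the same now holds on Windows.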
Justine Tunney
a80ab3f8fe
Implement bf16 compiler runtime library 2024-08-02 02:04:53 -07:00
Gautham
9ebacb7892
Convert GCC 14 errors back to warnings (#1247)
https://gcc.gnu.org/gcc-14/porting_to.html#warnings-as-errors

Changing these to warnings helps build code with `cosmocc`. Perhaps this
can be a patch to `cosmocc` or skipped entirely.
2024-08-01 21:42:40 -07:00
Justine Tunney
f8cfc89eba
Allow -c to be specified with -E in cosmocc 2024-07-31 02:09:15 -07:00
Justine Tunney
8d8aecb6d9
Avoid legacy instruction penalties on x86 2024-07-31 01:02:38 -07:00
Justine Tunney
1fba310e22
Delete mislocated headers 2024-07-31 01:01:00 -07:00
Justine Tunney
bb815eafaf
Update Musl Libc code
We now implement all of Musl's localization code, the same way that
Musl implements localization. You may need setlocale(LC_ALL, "C.UTF-8"),
just in case anything stops working as expected.
2024-07-30 22:51:29 -07:00
Justine Tunney
3dab207351
Remove mkfifo() prototype 2024-07-29 07:42:54 -07:00
Justine Tunney
d40acc60b1
Detect more x86 features 2024-07-29 00:16:29 -07:00
Justine Tunney
cf1559c448
Remove __threaded variable 2024-07-28 23:43:30 -07:00
Justine Tunney
01b09bc817
Support printf %n directive 2024-07-28 22:27:06 -07:00
Justine Tunney
18964e5d76
Fix remove() directory on Windows 2024-07-28 17:31:21 -07:00
Justine Tunney
e18fe1e112
Freshen build/bootstrap/cocmd
See https://news.ycombinator.com/item?id=41055121
2024-07-27 23:22:11 -07:00
Justine Tunney
8621034d42
Release Cosmopolitan v3.6.2 2024-07-27 20:20:54 -07:00
Justine Tunney
f147d3dde9
Fix some static analysis issues 2024-07-27 09:16:54 -07:00
Justine Tunney
cdfcee51ca
Properly serialize fork() operations
This change solves an issue where many threads attempting to spawn forks
at once would cause fork() performance to degrade with the thread count.
Things got real nasty on NetBSD, which slowed down the whole test fleet,
because there's no vfork() and we're forced to use fork() in our server.

   threads      count task
         1       1062 fork+exit+wait
         2        668 fork+exit+wait
         4         66 fork+exit+wait
         8         19 fork+exit+wait
        16         22 fork+exit+wait
        32         16 fork+exit+wait

Things are now much less bad on NetBSD, but not great, since it does not
have futexes; we rely on its semaphore file descriptors to do conditions

   threads      count task
         1       1085 fork+exit+wait
         2        842 fork+exit+wait
         4        532 fork+exit+wait
         8        400 fork+exit+wait
        16        276 fork+exit+wait
        32         66 fork+exit+wait

With OpenBSD which also lacks vfork(), things were just as bad as NetBSD

   threads      count task
         1        584 fork+exit+wait
         2        687 fork+exit+wait
         4        206 fork+exit+wait
         8         24 fork+exit+wait
        16         33 fork+exit+wait
        32         26 fork+exit+wait

But since OpenBSD has futexes fork() works terrifically thanks to *NSYNC

   threads      count task
         1        525 fork+exit+wait
         2        580 fork+exit+wait
         4        451 fork+exit+wait
         8        479 fork+exit+wait
        16        408 fork+exit+wait
        32        373 fork+exit+wait

This issue would most likely only manifest itself, when pthread_atfork()
callers manage to slip a spin lock into the outermost position of fork's
list of locks. Since fork() is very slow, a spin lock can be devastating

Needless to say vfork() rules and anyone who says differently is kidding
themselves. Look at what a FreeBSD 14.1 virtual machine with equal specs
can do over the course of three hundred milliseconds.

   threads      count task
         1       2559 vfork+exit+wait
         2       5389 vfork+exit+wait
         4      34933 vfork+exit+wait
         8      43273 vfork+exit+wait
        16      49648 vfork+exit+wait
        32      40247 vfork+exit+wait

So it's a shame that so few OSes support vfork(). It creates an unsavory
situation, where someone wanting to build a server that spawns processes
would be better served to not use threads and favor a multiprocess model
2024-07-27 08:23:44 -07:00
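The fork+exit+wait counts in commit cdfcee51ca come from a benchmark that is not shown in the log. The following is only a rough sketch of how such a count could be taken, assuming each thread loops on fork(), _exit(), and waitpid() until a deadline. The names `count_forks`, `worker`, and `past_deadline` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_long g_count;        /* completed fork+exit+wait cycles */
static struct timespec g_deadline; /* monotonic time at which workers stop */

static int past_deadline(void) {
  struct timespec now;
  clock_gettime(CLOCK_MONOTONIC, &now);
  return now.tv_sec > g_deadline.tv_sec ||
         (now.tv_sec == g_deadline.tv_sec &&
          now.tv_nsec >= g_deadline.tv_nsec);
}

static void *worker(void *arg) {
  (void)arg;
  while (!past_deadline()) {
    pid_t pid = fork();
    if (pid == 0) _exit(0); /* child exits immediately */
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
      atomic_fetch_add(&g_count, 1);
  }
  return NULL;
}

/* Runs `threads` workers for roughly `millis` milliseconds and returns
   how many fork+exit+wait cycles they completed in total. */
long count_forks(int threads, long millis) {
  pthread_t th[64];
  assert(threads >= 1 && threads <= 64);
  atomic_store(&g_count, 0);
  clock_gettime(CLOCK_MONOTONIC, &g_deadline);
  g_deadline.tv_nsec += (millis % 1000) * 1000000L;
  g_deadline.tv_sec += millis / 1000 + g_deadline.tv_nsec / 1000000000L;
  g_deadline.tv_nsec %= 1000000000L;
  for (int i = 0; i < threads; ++i) {
    int rc = pthread_create(&th[i], NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < threads; ++i) pthread_join(th[i], NULL);
  return atomic_load(&g_count);
}
```

Calling `count_forks(n, 300)` for n = 1, 2, 4, ... would give one row per thread count, in the spirit of the tables above. The absolute numbers depend entirely on the host.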
Justine Tunney
18a620cc1a
Make some improvements of little consequence 2024-07-27 08:20:18 -07:00
Justine Tunney
690d3df66e
Expand the virtual address space on Windows 2024-07-27 08:19:05 -07:00
Justine Tunney
642e9cb91a
Introduce cosmocc flags -mdbg -mtiny -moptlinux
The cosmocc.zip toolchain will now include four builds of the libcosmo.a
runtime libraries. You can pass the -mdbg flag if you want to debug your
cosmopolitan runtime. You can pass the -moptlinux flag if you don't want
windows code lurking in your binary. See tool/cosmocc/README.md for more
details on how these flags may be used and their important implications.
2024-07-26 05:10:25 -07:00
Justine Tunney
59692b0882
Make spinlocks faster (take two)
This change is green on x86 and arm test fleet.
2024-07-26 00:45:24 -07:00
Justine Tunney
02e1cbcd00
Revert "Make spin locks go faster"
This reverts commit c8e25d811c.
2024-07-25 22:24:32 -07:00
Justine Tunney
0679cfeb41
Fix build 2024-07-25 22:12:08 -07:00
Justine Tunney
c8e25d811c
Make spin locks go faster 2024-07-25 17:37:11 -07:00
Justine Tunney
7d88343973
Release Cosmopolitan v3.6.1 2024-07-25 13:34:02 -07:00
Justine Tunney
2c4b88753b
Add special errno handling to libcxx 2024-07-25 01:23:02 -07:00
Justine Tunney
0f486a13c8
Fix fdlibm license 2024-07-24 20:42:08 -07:00
Justine Tunney
1020dd41cc
bzero() should be defined without special defines 2024-07-24 16:15:30 -07:00
Justine Tunney
d3a13e8d70
Improve lock hierarchy
- NetBSD no longer needs a spin lock to create semaphores
- Windows fork() now locks process manager in correct order
2024-07-24 16:05:48 -07:00
Justine Tunney
7ba9a73840
Remove more _Atomic keywords from public headers
It's been thirteen years and C++ still hasn't implemented this wonderful
simple builtin keyword. In C++23 a solution was provided for making this
work in C++ which is libcxx's stdatomic.h. Including that header schleps
in literally 253 unique header files!! Many of the header files it needs
are libc header files like pthread.h where we need to have the _Atomic()
keyword, but since <atomic> depends on pthreads we can't have it include
the <stdatomic.h> header that defines _Atomic for C++ users, and instead
we simply make the type non-atomic, hoping and praying only C code shall
use those internal data structures. This just shows how STL clowns can't
be trusted to define the innermost primitives of a language. They should
instead be focusing on being the best at algorithms and data structures.
2024-07-24 13:56:03 -07:00
Justine Tunney
5dd7ddb9ea
Remove bad defines from early days of project
These definitions were causing issues with building LLVM. It is possible
they also caused crashes we've seen with our MacOS ARM64 OpenMP support.
2024-07-24 12:11:21 -07:00
Justine Tunney
f25fbbaaeb
Use libcxx abi v1 2024-07-24 09:49:48 -07:00
Justine Tunney
fbc4b03d4c
Restore support for AMD K8 2024-07-24 08:59:29 -07:00
Justine Tunney
e398f3887c
Make more improvements to threads and mappings
- NetBSD should now have faster synchronization
- POSIX barriers may now be shared across processes
- An edge case with memory map tracking has been fixed
- Grand Central Dispatch is no longer used on MacOS ARM64
- POSIX mutexes in normal mode now use futexes across processes
2024-07-24 01:19:54 -07:00
Justine Tunney
5660ec4741
Release Cosmopolitan v3.6.0
This release is an atomic upgrade to GCC 14.1.0 with C23 and C++23
2024-07-23 03:28:19 -07:00
Justine Tunney
62ace3623a
Release Cosmopolitan v3.5.9 2024-07-22 21:02:40 -07:00
Justine Tunney
6e809ee49b
Add unit test for process shared conditions 2024-07-22 18:48:54 -07:00
Justine Tunney
61c36c1dd6
Allow pthread_condattr_setpshared() to set shared 2024-07-22 18:41:45 -07:00
Justine Tunney
0a9a6f86bb
Support process shared condition variables 2024-07-22 16:35:29 -07:00
Justine Tunney
3de6632be6
Graduate some clock_gettime() constants to #define
- CLOCK_THREAD_CPUTIME_ID
- CLOCK_PROCESS_CPUTIME_ID

Cosmo now supports the above constants universally across supported OSes
therefore it's now safe to let programs detect their presence w/ #ifdefs
2024-07-22 07:14:35 -07:00
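The #ifdef detection that commit 3de6632be6 makes safe might look like the sketch below. The helper name `thread_cpu_nanos` is hypothetical and the -1 fallback is just one possible convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Returns CPU time consumed by the calling thread in nanoseconds, or -1
   when the clock is unavailable at build time or at run time. */
long long thread_cpu_nanos(void) {
#ifdef CLOCK_THREAD_CPUTIME_ID
  struct timespec ts;
  if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) == 0) {
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }
#endif
  return -1;
}
```

The same pattern applies to CLOCK_PROCESS_CPUTIME_ID.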
Justine Tunney
62a97c919f
Fix typos in APE specification
Fixes #1244
2024-07-22 01:41:44 -07:00
Justine Tunney
5d2d9e9640
Add back missing TlsAlloc() call
Cosmopolitan Libc once called this important function although somewhere
along the way, possibly in a refactoring, it got removed and __tls_alloc
has always been zero ever since.
2024-07-21 20:45:27 -07:00
Justine Tunney
e08a4cd99e
Release Cosmopolitan v3.5.8 2024-07-21 17:01:33 -07:00
Justine Tunney
7ebaff34c6
Fix ctype.h and wctype.h 2024-07-21 15:54:17 -07:00
Justine Tunney
30afd6ddbb
Improve multithreading 2024-07-21 14:40:45 -07:00
Justine Tunney
d3167126aa
Fix regression with last commit 2024-07-20 16:43:48 -07:00
Justine Tunney
29ce25c767
Start writing formal specification for APE 2024-07-20 10:04:22 -07:00
Justine Tunney
7996bf67b5
Release Cosmopolitan v3.5.7 2024-07-20 03:48:57 -07:00
Justine Tunney
626a5d02ee
Add missing lock statement 2024-07-20 03:47:22 -07:00
Justine Tunney
527aaa41eb
Prevent MODE=tiny ShowCrashReports() looping 2024-07-20 03:34:37 -07:00
Justine Tunney
3374cbba73
Release Cosmopolitan v3.5.6 2024-07-20 02:43:10 -07:00
Justine Tunney
2018cac11f
Use better memory strategy on Windows
Rather than using the rollo global to pick addresses, we select them
randomly now using a conservative vaspace.
2024-07-20 02:20:03 -07:00
Justine Tunney
6a5d4ed65b
Fix bug with disabling sigaltstack() 2024-07-20 01:00:16 -07:00
Justine Tunney
493ffc9b7f
Release Cosmopolitan v3.5.5 2024-07-19 22:33:17 -07:00
Justine Tunney
101fb3d9b3
Make some new Windows 10 memory APIs available 2024-07-19 22:26:49 -07:00
Justine Tunney
86d884cce2
Get rid of .internal.h convention in LIBC_INTRIN 2024-07-19 19:38:00 -07:00
Justine Tunney
0ed916ad5c
Fix a bug in example code 2024-07-19 19:11:28 -07:00
Justine Tunney
1029dcc597
Reduce default stack size from 256kb to 81kb
This is the same as Musl Libc. Please note it only applies to threads.
2024-07-19 14:18:06 -07:00
Ikko Eltociear Ashimine
c697133a2d
Fix typo in accept4-sysv.c (#1235) 2024-07-19 05:46:29 -07:00
Justine Tunney
1ff037df3c
Add some documentation 2024-07-19 04:46:26 -07:00
Justine Tunney
567d8fe32d
Create variables for page size 2024-07-18 21:16:53 -07:00
Justine Tunney
23dfb79d33
Fix minor suboptimalities in memory manager 2024-07-18 19:19:51 -07:00
Justine Tunney
76cea6c687
Squeeze more performance out of memory manager 2024-07-08 03:08:42 -07:00
Justine Tunney
3f2a1b696e
Fix greenbean example
The memory leak detector was crashing. When using gc() you shouldn't use
the CheckForMemoryLeaks() function from inside the same function, due to
how it runs the atexit handlers.
2024-07-07 17:52:33 -07:00
Justine Tunney
f7780de24b
Make realloc() go 100x faster on Linux/NetBSD
Cosmopolitan now supports mremap(), which is only supported on Linux and
NetBSD. First, it allows memory mappings to be relocated without copying
them; this can dramatically speed up data structures like std::vector if
the array size grows larger than 256kb. The mremap() system call is also
10x faster than munmap() when shrinking large memory mappings.

There are now two functions, getpagesize() and getgransize(), which help
to write portable code that uses mmap(MAP_FIXED). Alternatively, sysconf()
may be called with our new _SC_GRANSIZE. The madvise() system call now has
a better wrapper with improved documentation.
2024-07-07 12:40:30 -07:00
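As a hedged illustration of commit f7780de24b, the sketch below queries the allocation granularity through sysconf() and rounds a size up to it. It assumes _SC_GRANSIZE is visible to the preprocessor as a macro, and falls back to the page size where it is not. The names `demo_gran_size` and `demo_round_up` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the granularity to which fixed mapping addresses should be
   aligned. Assumption: _SC_GRANSIZE is a macro where it exists. */
long demo_gran_size(void) {
#ifdef _SC_GRANSIZE
  return sysconf(_SC_GRANSIZE);
#else
  return sysconf(_SC_PAGESIZE); /* fallback on other C libraries */
#endif
}

/* Rounds `n` up to a multiple of `gran`, which must be a power of two. */
size_t demo_round_up(size_t n, size_t gran) {
  assert(gran && !(gran & (gran - 1)));
  return (n + gran - 1) & ~(gran - 1);
}
```

A caller preparing an mmap(MAP_FIXED) request could align both the address and the length with `demo_round_up(x, (size_t)demo_gran_size())`.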
Justine Tunney
6be030cd7c
Fix MODE=tinylinux build 2024-07-06 01:51:08 -07:00
Justine Tunney
8c645fa1ee
Make mmap() scalable
It's now possible to create thousands upon thousands of sparse independent
memory mappings, without any slowdown. The memory manager is better with
tracking memory protection now, particularly on Windows in a precise way
that can be restored during fork(). You now have the highest quality mem
manager possible. It's even better than some OSes like XNU, where mmap()
is implemented as an O(n) operation which means sadly things aren't much
improved over there. With this change the llamafile HTTP server endpoint
at /tokenize with a prompt of 50 tokens is now able to handle 2.6m r/sec
2024-07-05 23:26:00 -07:00
Justine Tunney
3756870635
Implement new red-black tree 2024-07-05 12:56:03 -07:00
Justine Tunney
fc65422660
Remove __mmap() and __munmap() 2024-07-05 12:55:46 -07:00
Justine Tunney
01587de761
Simplify memory manager 2024-07-05 05:47:15 -07:00
Justine Tunney
5a9a08d1cf
Fix regression in elf2pe program 2024-07-04 04:02:20 -07:00
Justine Tunney
bd6d9ff99a
Get deathstar demo working again on metal 2024-07-04 03:44:17 -07:00
Justine Tunney
15ea0524b3
Reduce code size of mandatory runtime
This change reduces o/tiny/examples/life from 44kb to 24kb in size since
it avoids linking mmap() when unnecessary. This is important for helping
cosmo not completely lose touch with its roots.
2024-07-04 02:50:20 -07:00
Justine Tunney
70f77aad33
Release Cosmopolitan v3.5.4 2024-07-01 07:17:57 -07:00
Justine Tunney
61370983e1
Complete the Windows TLS fix made in e437bed00 2024-07-01 07:17:57 -07:00
Justine Tunney
239f8ce76e
Release Cosmopolitan v3.5.3 2024-07-01 02:07:56 -07:00
Justine Tunney
e437bed006
Fix crash caused when Windows needs a lot of TLS 2024-06-30 20:53:43 -07:00
Justine Tunney
76957983cf
Make POSIX threads improvements
- Ensure SIGTHR isn't blocked in newly created threads
- Use TIB rather than thread_local for thread atexits
- Make POSIX thread keys atomic within thread
- Don't bother logging prctl() to --strace
- Log thread destructor names to --strace
2024-06-30 15:38:59 -07:00
Justine Tunney
387310c659
Fix issue with ctl::vector constructor 2024-06-30 02:26:38 -07:00
Justine Tunney
4cb5e21ba8
Introduce pthread_decimate_np() api
This is useful with CheckForMemoryLeaks().
2024-06-30 02:26:06 -07:00
Justine Tunney
1bf2d8e308
Further improve mmap() locking story
The way to use doubly linked lists is to remove all the things you want
to work on and insert them into a new list on the stack. Then once you have
all the work items, you release the lock, do your work, and then lock it
again, to add the shelled out items back to a global freelist.
2024-06-29 17:12:43 -07:00
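The locking pattern described in commit 1bf2d8e308 can be sketched as follows. This is an illustration under assumptions, not the memory manager's code: the `struct item` type, the list helpers, and the "work" of zeroing a value are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct item { /* hypothetical work item on a circular doubly linked list */
  struct item *prev, *next;
  int value;
};

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item g_active = {&g_active, &g_active, 0}; /* sentinel */
static struct item g_free = {&g_free, &g_free, 0};       /* sentinel */

static void list_remove(struct item *e) {
  e->prev->next = e->next;
  e->next->prev = e->prev;
  e->prev = e->next = e;
}

static void list_push(struct item *head, struct item *e) {
  e->prev = head->prev;
  e->next = head;
  head->prev->next = e;
  head->prev = e;
}

void add_active(struct item *e) {
  pthread_mutex_lock(&g_lock);
  list_push(&g_active, e);
  pthread_mutex_unlock(&g_lock);
}

int count_free(void) {
  int n = 0;
  pthread_mutex_lock(&g_lock);
  for (struct item *e = g_free.next; e != &g_free; e = e->next) ++n;
  pthread_mutex_unlock(&g_lock);
  return n;
}

/* Retires every active item whose value is >= threshold. Returns the
   number of items processed. */
int retire_items(int threshold) {
  struct item local = {&local, &local, 0}; /* private list on the stack */
  int n = 0;

  /* 1. Under the lock, move the wanted items onto the private list. */
  pthread_mutex_lock(&g_lock);
  for (struct item *e = g_active.next, *next; e != &g_active; e = next) {
    next = e->next;
    if (e->value >= threshold) {
      list_remove(e);
      list_push(&local, e);
    }
  }
  pthread_mutex_unlock(&g_lock);

  /* 2. Do the slow work with the lock released. */
  for (struct item *e = local.next; e != &local; e = e->next) {
    e->value = 0; /* stand-in for an expensive operation */
    ++n;
  }

  /* 3. Retake the lock and hand the items to the global free list. */
  pthread_mutex_lock(&g_lock);
  while (local.next != &local) {
    struct item *e = local.next;
    list_remove(e);
    list_push(&g_free, e);
  }
  pthread_mutex_unlock(&g_lock);
  return n;
}
```

Because the items are unreachable from the global list during step 2, other threads cannot observe them half-processed, and the lock is held only for pointer manipulation.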
Justine Tunney
98e684622b
Add iostream to CTL 2024-06-29 15:45:09 -07:00
Justine Tunney
617ddfee93
Release Cosmopolitan v3.5.2 2024-06-29 10:58:47 -07:00
Justine Tunney
464858dbb4
Fix bugs with new memory manager
This fixes a regression in mmap(MAP_FIXED) on Windows caused by a recent
revision. This change also fixes ZipOS so it no longer needs a MAP_FIXED
mapping to open files from the PKZIP store. The memory mapping mutex was
implemented incorrectly earlier which meant that ftrace and strace could
cause crashes. This lock and other recursive mutexes are rewritten
so that it should be provable that recursive mutexes in cosmopolitan are
asynchronous signal safe.
2024-06-29 10:53:57 -07:00
Justine Tunney
a16eb76f5e
Fix build break 2024-06-29 04:34:27 -07:00
Justine Tunney
021c53ba32
Add more CTL content 2024-06-28 19:09:54 -07:00
Justine Tunney
572ac7d100
Release Cosmopolitan v3.5.1 2024-06-24 06:54:15 -07:00
Justine Tunney
d461c6f47d
Do more quality assurance work 2024-06-24 06:53:49 -07:00
Justine Tunney
67b19ae733
Release Cosmopolitan v3.5.0 2024-06-23 22:45:14 -07:00
Justine Tunney
c4c812c154
Introduce ctl::set and ctl::map
We now have a C++ red-black tree implementation that implements standard
template library compatible APIs while compiling 10x faster than libcxx.
It's not as beautiful as the red-black tree implementation in Plinko but
this will get the job done and the test proves it upholds all invariants

This change also restores CheckForMemoryLeaks() support and fixes a real
actual bug I discovered with Doug Lea's dlmalloc_inspect_all() function.
2024-06-23 22:27:11 -07:00
Justine Tunney
f2c8ddbbe3
Fix --strace use-after-free in pthread_join() 2024-06-22 06:05:52 -07:00
Justine Tunney
d1d4388201
Delete ASAN
It hasn't been helpful enough to justify the maintenance burden. What
actually does help is mprotect(), kprintf(), --ftrace and --strace which
can always be counted upon to work correctly. We aren't losing much with
this change. Support for ASAN on AARCH64 was never implemented. Applying
ASAN to the core libc runtimes was disabled many months ago. If there is
some way to have an ASAN runtime for user programs that is less invasive
we can potentially consider reintroducing support. But now is premature.
2024-06-22 05:45:49 -07:00
Justine Tunney
6ffed14b9c
Rewrite memory manager
Actually Portable Executable now supports Android. Cosmo's old mmap code
required a 47 bit address space. The new implementation is very agnostic
and supports both smaller address spaces (e.g. embedded) and even modern
56-bit PML5T paging for x86 which finally came true on Zen4 Threadripper

Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb
granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native
page size. This fixes a longstanding POSIX conformance issue, concerning
file mappings that overlap the end of file. Other aspects of conformance
have been improved too, such as the subtleties of address assignment and
the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE

On Windows, mappings larger than 100 megabytes won't be broken down into
thousands of independent 64kb mappings. Support for MAP_STACK is removed
by this change; please use NewCosmoStack() instead.

Stack overflow avoidance is now being implemented using the POSIX thread
APIs. Please use GetStackBottom() and GetStackAddr(), instead of the old
error-prone GetStackAddr() and HaveStackMemory() APIs which are removed.
2024-06-22 05:45:11 -07:00
Steven Dee (Jōshin)
9a5a13854d
CTL: utility.h, use ctl::swap in string (#1227)
* Add ctl utility.h

Implements forward, move, swap, and declval. This commit also adds a def
for nullptr_t to cxx.inc. We need it now because the CTL headers stopped
including anything from libc++, so we no longer get their basic types.

* Use ctl::swap in string

The STL spec says that swap is located in the string_view header anyway.
Performance-wise this is a noop, but it’s slightly cleaner.
2024-06-19 01:00:59 -04:00
Steven Dee (Jōshin)
a795017416
Fix c.inc _Atomic define for C++ (#1231)
c.inc (AFAICT erroneously) defined _Atomic(t) as `volatile t *`, when it
should have just said `volatile t`, when __STDC_VERSION__ was too small.
This happens when we’re compiling C++, but in C++11, _Atomic is a define
supplied by the STL rather than a keyword supplied by the compiler. Wait
though, it gets better: in C++11, _Atomic hooks you into the morass that
is stdatomic.h, and ultimately refers everything back to std::atomic<T>.

The gory, horrifying details are in libcxx's __atomic/cxx_atomic_impl.h.
The tldr is that for our purposes it’s fine to just say volatile and use
the normal libc/intrin/atomic.h functions.
2024-06-17 21:12:02 -07:00
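The fix in commit a795017416 can be illustrated with a stand-in macro. `DEMO_ATOMIC` and `struct demo_counter` are hypothetical names, and the GCC/Clang `__atomic` builtin is used here on the assumption that it behaves like the libc/intrin/atomic.h helpers the message refers to.

```c
#include <assert.h>

/* Stand-in for the fallback branch: the member keeps type `t` itself.
   The erroneous form `volatile t *` would have turned it into a pointer. */
#define DEMO_ATOMIC(t) volatile t

struct demo_counter {
  DEMO_ATOMIC(int) hits; /* expands to `volatile int hits` */
};

/* Atomically increments the counter and returns the new value. */
int demo_bump(struct demo_counter *c) {
  assert(c != 0);
  return __atomic_add_fetch(&c->hits, 1, __ATOMIC_SEQ_CST);
}
```

With the pointer form, `sizeof(c.hits)` would be the size of a pointer and the increment would operate on an address rather than the counter.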
Jōshin
89fc95fefd
Rerun clang-format on the repo (#1217)
🚨 clang-format changes output per version!

This is with version 19.0.0. The modifications seem to be fixing the old
version’s errors - mainly involving omitted whitespace around binary ops
and inserted whitespace between goto labels and colons (if followed by a
curly brace.)

Also fixes a few mistakes made by e.g. someone (ahem) forgetting to pass
his ctl/string.h modifications through it.

We should add this to .git-blame-ignore-revs once we have its final hash
on master.
2024-06-15 16:34:48 -04:00
Justine Tunney
cc2c1893c5
Fix some nits 2024-06-05 04:05:49 -07:00
Justine Tunney
3093f0e467
Release Cosmopolitan v3.4.0 2024-06-05 03:07:03 -07:00
Justine Tunney
3609f65de3
Make malloc() go 200x faster
If pthread_create() is linked into the binary, then the cosmo runtime
will create an independent dlmalloc arena for each core. Whenever the
malloc() function is used it will index `g_heaps[sched_getcpu() / 2]`
to find the arena with the greatest hyperthread / numa locality. This
may be configured via an environment variable. For example if you say
`export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways.
Your process may be configured to have anywhere between 1 - 128 heaps

We need this revision because it makes multithreaded C++ applications
faster. For example, an HTTP server I'm working on that makes extreme
use of the STL went from 16k to 2000k requests per second, after this
change was made. To understand why, try out the malloc_test benchmark
which calls malloc() + realloc() in a loop across many threads, which
sees a 250x improvement in process clock time and 200x on wall time

The tradeoff is this adds ~25ns of latency to individual malloc calls
compared to MODE=tiny, once the cosmo runtime has transitioned into a
fully multi-threaded state. If you don't need malloc() to be scalable
then cosmo provides many options for you. For starters the heap count
variable above can be set to put the process back in single heap mode
plus you can go even faster still, if you include tinymalloc.inc like
many of the programs in tool/build/.. are already doing since that'll
shave tens of kb off your binary footprint too. There's also MODE=tiny
which is configured to use just 1 plain old dlmalloc arena by default

Another tradeoff is we need more memory now (except in MODE=tiny), to
track the provenance of memory allocation. This is so allocations can
be freely shared across threads, and because OSes can reschedule code
to different CPUs at any time.
2024-06-05 02:02:14 -07:00
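The arena selection described in commit 3609f65de3 might be sketched like this. Only the `g_heaps[sched_getcpu() / 2]` indexing, the COSMOPOLITAN_HEAP_COUNT variable, and the 1 to 128 range come from the message. The function names, the modulo wrap, and the caller-supplied default are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_HEAPS 128 /* upper bound stated in the commit message */

/* Reads the heap count from the environment, clamped to 1..128.
   `fallback` is a hypothetical default used when the variable is unset. */
int demo_heap_count(int fallback) {
  const char *s = getenv("COSMOPOLITAN_HEAP_COUNT");
  long n = s ? strtol(s, NULL, 10) : fallback;
  if (n < 1) n = 1;
  if (n > DEMO_MAX_HEAPS) n = DEMO_MAX_HEAPS;
  return (int)n;
}

/* Maps a CPU number to an arena index. Dividing by two lets a pair of
   adjacent CPU numbers share one arena. The modulo wrap is an assumption
   for when there are fewer heaps than CPU pairs. */
int demo_heap_index(int cpu, int heaps) {
  assert(heaps >= 1);
  if (cpu < 0) return 0; /* CPU number unknown: use the first arena */
  return (cpu / 2) % heaps;
}
```

On Linux the `cpu` argument would come from sched_getcpu(). Since the OS can migrate a thread at any moment, the index is only a locality hint, which is why the message says allocations must track their own provenance.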
Justine Tunney
9906f299bb
Refactor and improve CTL and other code 2024-06-04 05:45:48 -07:00