This change introduces accumulate, addressof, advance, all_of, distance,
array, enable_if, allocator_traits, back_inserter, bad_alloc, is_signed,
any_of, copy, exception, fill, fill_n, is_same, is_same_v, out_of_range,
lexicographical_compare, is_integral, uninitialized_fill_n, is_unsigned,
numeric_limits, uninitialized_fill, iterator_traits, move_backward, min,
max, iterator_tag, move_iterator, reverse_iterator, uninitialized_move_n
This change experiments with rewriting the ctl::vector class to make the
CTL design more similar to the STL. So far it has not slowed things down
to have 42 #include lines rather than 2, since it's still almost nothing
compared to LLVM's code. In fact the closer we can flirt with being just
like libcxx, the better chance we might have of discovering exactly what
makes it so slow to compile. It would be an enormous discovery if we can
find one simple trick to solving the issue there instead.
This also fixes a bug in `ctl::string(const string &s)` when `s` is big.
We now have a C++ red-black tree implementation that implements standard
template library compatible APIs while compiling 10x faster than libcxx.
It's not as beautiful as the red-black tree implementation in Plinko but
this will get the job done and the test proves it upholds all invariants
This change also restores CheckForMemoryLeaks() support and fixes a real
actual bug I discovered with Doug Lea's dlmalloc_inspect_all() function.
Actually Portable Executable now supports Android. Cosmo's old mmap code
required a 47 bit address space. The new implementation is very agnostic
and supports both smaller address spaces (e.g. embedded) and even modern
56-bit PML5T paging for x86 which finally came true on Zen4 Threadripper
Cosmopolitan no longer requires UNIX systems to observe the Windows 64kb
granularity; i.e. sysconf(_SC_PAGE_SIZE) will now report the host native
page size. This fixes a longstanding POSIX conformance issue, concerning
file mappings that overlap the end of file. Other aspects of conformance
have been improved too, such as the subtleties of address assignment and
and the various subtleties surrounding MAP_FIXED and MAP_FIXED_NOREPLACE
On Windows, mappings larger than 100 megabytes won't be broken down into
thousands of independent 64kb mappings. Support for MAP_STACK is removed
by this change; please use NewCosmoStack() instead.
Stack overflow avoidance is now being implemented using the POSIX thread
APIs. Please use GetStackBottom() and GetStackAddr(), instead of the old
error-prone GetStackAddr() and HaveStackMemory() APIs which are removed.
Explicitly value-initializes the deleter, even though I have not found a
way to get the deleter to act like it’s been default-initialized in unit
tests so far.
Uses auto in reset. The static cast is apparently not needed (unless I’m
missing some case I didn’t think of.)
Implements the general move constructor - turns out that the reason this
didn’t work before was that default_delete<U> was not move constructible
from default_delete<T>.
Drop inline specifiers from functions defined entirely inside the struct
definition since they are implicitly inline.
* Cleans up reset to match spec
Remove the variants from the T[] specialization. Also follow the spec on
the order of operations in reset, which may matter if we are deleting an
object that has a reference to the unique_ptr that is being reset. (?)
* Tests Base/Derived reset.
* Adds some constexpr declarations.
* Adds default_delete specialization for T[].
* Makes parameters const.
Moves some isbig checks into string.h, enabling smarter optimizations to
be made on small strings. Also we no longer zero out our string prior to
calling the various constructors, buying back the performance we lost on
big strings when we made the small-string optimization. We further add a
little optimization to the big_string copy constructor: if the string is
using half or more of its capacity, then we don’t recompute capacity and
just take the old string’s. As well, the copy constructor always makes a
small string when it will fit, even if copied from a big string that got
truncated.
This also reworks the test to follow the idiom adopted elsewhere re stl,
and adds a helper function to tell if a string is small based on data().
* Add ctl utility.h
Implements forward, move, swap, and declval. This commit also adds a def
for nullptr_t to cxx.inc. We need it now because the CTL headers stopped
including anything from libc++, so we no longer get their basic types.
* Use ctl::swap in string
The STL spec says that swap is located in the string_view header anyawy.
Performance-wise this is a noop, but it’s slightly cleaner.
The way unique_ptr is supposed to work is as a purely compile-time check
that your raw pointers are getting deleted when they go out of scope. It
should ideally emit the same exact machine code as if you were using raw
pointers with manual deletes.
Part of what this means is that under normal circumstances, a unique_ptr
shouldn’t take up more space than a raw pointer - in other words, sizeof
unique_ptr<T> should == sizeof(T*).
The present PR doesn’t bother with the specialization for array types. I
also left a couple other parts of the STL API unimplemented. I’d love to
see someone else implement these, or I’ll get to them at some point.
🚨 clang-format changes output per version!
This is with version 19.0.0. The modifications seem to be fixing the old
version’s errors - mainly involving omitted whitespace around binary ops
and inserted whitespace between goto labels and colons (if followed by a
curly brace.)
Also fixes a few mistakes made by e.g. someone (ahem) forgetting to pass
his ctl/string.h modifications through it.
We should add this to .git-blame-ignore-revs once we have its final hash
on master.
This replaces the STL <new> header. Mainly, it defines a global operator
new and operator delete, as well as the placement versions of these. The
placement versions are required to not get compile errors when trying to
write a placement new statement.
Each of these operators is defined with many, many different variants. A
glance at new.cc is recommended followed by a chaser of the Alexandrescu
talk "std::allocator is to Allocation as std::vector is to Vexation". We
must provide a global-namespace source-level definition of each operator
and it is illegal for any of them to be marked inline, so here we are.
The upshot is that we no longer need to include <new>, and our optional/
vector headers are self-contained.
Manually manage the lifetime of `value_` by using an anonymous
`union`. This fixes a bunch of double-frees and double-constructs.
Additionally move the `present_` flag last. When `T` has padding
`present_` will be placed there saving `alignof(T)` bytes from
`sizeof(optional<T>)`.
There were a few errors in how capacity and memory was being handled for
small strings. The capacity errors meant that small strings would become
big strings too soon, and the memory error introduced undefined behavior
that was caught by CheckMemoryLeaks in our test file but only sometimes.
The crucial change is in reserve: we only copy n bytes into p2, and then
we manually set the null terminator instead of expecting it to have been
there already. (E.g. it might not be there for an empty small string.)
We also fix one other doozy in append when we were exactly at the small-
to-big string boundary: we set the last byte (i.e., the remainder field)
to 0, then decremented it, giving us size_t max. Whoops. We boneheadedly
fix this by setting the 0 byte after we've fixed up the remainder, so it
is at worst a no-op.
Otherwise, capacity now works the same for small strings as it does with
big strings: it's the amount of space available including the null byte.
We test all of this with a new test that only gets included if our class
under test is not std::string (presumably meaning it's ctl::string.) The
test manually verifies that the small string optimization behaves how we
expect.
Since this test checks against std::string, we go ahead and include that
other header from the STL.
Also modifies the new test we introduced to also run on std::string, but
it just does the append without expecting anything about how its data is
stored. We also check that the string has the right value afterwards.
A small-string optimization is a way of reusing inline storage space for
sufficiently small strings, rather than allocating them on the heap. The
current approach takes after an old Facebook string class: it reuses the
highest-order byte for flags and small-string size, in such a way that a
maximally-sized small string will have its last byte zeroed, making it a
null terminator for the C string.
The only flag we have is in the highest-order bit, that says whether the
string is big (set) or small (cleared.) Most of the logic switches based
on the value of this bit; e.g. data() returns big()->p if it's set, else
small()->buf if it's cleared. For a small string, the capacity is always
fixed at sizeof(string) - 1 bytes; we store the length in the last byte,
but we store it as the number of remaining bytes of capacity, so that at
max size, the last byte will read zero and serve as our null terminator.
Morally speaking, our class's storage is a union over two POD C structs.
For now I gravitated towards a slightly more obtuse approach: the string
class itself contains a blob of the right size, and we alias that blob's
pointer for the two structs, taking some care not to run afoul of object
lifetime rules in C++. If anyone wants to improve on this, contributions
are welcome.
This commit also introduces the `ctl::__` namespace. It can't be legally
spelled by library users, and serves as our version of boost's "detail".
We introduced a string::swap function, and we now use that in operator=.
operator= now takes its argument by value, so we never need to check for
the case where the pointers are equal and can just swap the entire store
of the argument with our own, leaving the C++ destructor to free our old
storage afterwards.
There are probably still a few places where our capacity is slightly off
and we grow too fast, although there don't appear to be any where we are
too slow. I will leave these to be fixed in future changes.
If pthread_create() is linked into the binary, then the cosmo runtime
will create an independent dlmalloc arena for each core. Whenever the
malloc() function is used it will index `g_heaps[sched_getcpu() / 2]`
to find the arena with the greatest hyperthread / numa locality. This
may be configured via an environment variable. For example if you say
`export COSMOPOLITAN_HEAP_COUNT=1` then you can restore the old ways.
Your process may be configured to have anywhere between 1 - 128 heaps
We need this revision because it makes multithreaded C++ applications
faster. For example, an HTTP server I'm working on that makes extreme
use of the STL went from 16k to 2000k requests per second, after this
change was made. To understand why, try out the malloc_test benchmark
which calls malloc() + realloc() in a loop across many threads, which
sees a a 250x improvement in process clock time and 200x on wall time
The tradeoff is this adds ~25ns of latency to individual malloc calls
compared to MODE=tiny, once the cosmo runtime has transitioned into a
fully multi-threaded state. If you don't need malloc() to be scalable
then cosmo provides many options for you. For starters the heap count
variable above can be set to put the process back in single heap mode
plus you can go even faster still, if you include tinymalloc.inc like
many of the programs in tool/build/.. are already doing since that'll
shave tens of kb off your binary footprint too. Theres also MODE=tiny
which is configured to use just 1 plain old dlmalloc arena by default
Another tradeoff is we need more memory now (except in MODE=tiny), to
track the provenance of memory allocation. This is so allocations can
be freely shared across threads, and because OSes can reschedule code
to different CPUs at any time.