cosmopolitan/ctl/string.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

479 lines
10 KiB
C
Raw Normal View History

// -*-mode:c++;indent-tabs-mode:nil;c-basic-offset:4;tab-width:8;coding:utf-8-*-
// vi: set et ft=cpp ts=4 sts=4 sw=4 fenc=utf-8 :vi
#ifndef CTL_STRING_H_
#define CTL_STRING_H_
#include "reverse_iterator.h"
#include "string_view.h"
namespace ctl {
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
class string;
2024-07-01 10:48:28 +00:00
ctl::string strcat(ctl::string_view, ctl::string_view) noexcept __wur;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
namespace __ {
constexpr size_t string_size = 3 * sizeof(size_t);
constexpr size_t sso_max = string_size - 1;
constexpr size_t big_mask = ~(1ull << (8ull * sizeof(size_t) - 1ull));
struct small_string
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
char buf[sso_max];
// interpretation is: size == sso_max - rem
unsigned char rem;
#if 0
size_t rem : 7;
size_t big : 1 /* = 0 */;
#endif
};
struct big_string
{
char* p;
size_t n;
// interpretation is: capacity == c & big_mask
size_t c;
#if 0
size_t c : sizeof(size_t) * 8 - 1;
size_t big : 1 /* = 1 */;
#endif
};
} // namespace __
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
class string
{
public:
using value_type = char;
using size_type = size_t;
using difference_type = ptrdiff_t;
using pointer = value_type*;
using const_pointer = const value_type*;
using reference = value_type&;
using const_reference = const value_type&;
using iterator = pointer;
using const_iterator = const_pointer;
using reverse_iterator = ctl::reverse_iterator<iterator>;
using const_reverse_iterator = ctl::reverse_iterator<const_iterator>;
static constexpr size_t npos = -1;
string() noexcept
{
__builtin_memset(blob, 0, sizeof(size_t) * 2);
// equivalent to set_small_size(0) but also zeroes memory
*(((size_t*)blob) + 2) = __::sso_max << (sizeof(size_t) - 1) * 8;
}
2024-07-01 10:48:28 +00:00
string(const ctl::string_view s) noexcept
{
if (s.n <= __::sso_max) {
__builtin_memcpy(blob, s.p, s.n);
__builtin_memset(blob + s.n, 0, __::sso_max - s.n);
set_small_size(s.n);
} else {
init_big(s);
}
}
explicit string(const size_t n, const char ch = 0) noexcept
{
if (n <= __::sso_max) {
__builtin_memset(blob, ch, n);
__builtin_memset(blob + n, 0, __::sso_max - n);
set_small_size(n);
} else {
init_big(n, ch);
}
}
string(const char* const p) noexcept
2024-07-01 10:48:28 +00:00
: string(ctl::string_view(p, __builtin_strlen(p)))
{
}
string(const string& r) noexcept
{
if (r.size() <= __::sso_max) {
__builtin_memcpy(blob, r.data(), __::string_size);
set_small_size(r.size());
} else {
init_big(r);
}
}
string(const char* const p, const size_t n) noexcept
2024-07-01 10:48:28 +00:00
: string(ctl::string_view(p, n))
{
}
~string() /* noexcept */
{
if (isbig())
destroy_big();
}
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
string& operator=(string) noexcept;
const char* c_str() const noexcept;
void pop_back() noexcept;
void grow(size_t) noexcept;
void reserve(size_t) noexcept;
void resize(size_t, char = 0) noexcept;
void append(char) noexcept;
void append(char, size_t) noexcept;
void append(unsigned long) noexcept;
void append(const void*, size_t) noexcept;
2024-07-01 10:48:28 +00:00
string& insert(size_t, ctl::string_view) noexcept;
string& erase(size_t = 0, size_t = npos) noexcept;
string substr(size_t = 0, size_t = npos) const noexcept;
2024-07-01 10:48:28 +00:00
string& replace(size_t, size_t, ctl::string_view) noexcept;
bool operator==(ctl::string_view) const noexcept;
bool operator!=(ctl::string_view) const noexcept;
bool contains(ctl::string_view) const noexcept;
bool ends_with(ctl::string_view) const noexcept;
bool starts_with(ctl::string_view) const noexcept;
size_t find(char, size_t = 0) const noexcept;
2024-07-01 10:48:28 +00:00
size_t find(ctl::string_view, size_t = 0) const noexcept;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
void swap(string& s) noexcept
{
ctl::swap(blob, s.blob);
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
string(string&& s) noexcept
{
__builtin_memcpy(blob, s.blob, __::string_size);
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
s.set_small_size(0);
}
void clear() noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.n = 0;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
set_small_size(0);
}
}
bool empty() const noexcept
{
2024-06-29 02:07:35 +00:00
return isbig() ? !__b.n : __s.rem >= __::sso_max;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
2024-06-29 02:07:35 +00:00
__attribute__((__always_inline__)) char* data() noexcept
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
{
2024-06-29 02:07:35 +00:00
return isbig() ? __b.p : __s.buf;
}
2024-06-29 02:07:35 +00:00
__attribute__((__always_inline__)) const char* data() const noexcept
{
2024-06-29 02:07:35 +00:00
return isbig() ? __b.p : __s.buf;
}
2024-06-29 02:07:35 +00:00
__attribute__((__always_inline__)) size_t size() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
#if 0
2024-06-29 02:07:35 +00:00
if (!isbig() && __s.rem > __::sso_max)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
__builtin_trap();
#endif
2024-06-29 02:07:35 +00:00
return isbig() ? __b.n : __::sso_max - __s.rem;
}
size_t length() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return size();
}
size_t capacity() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
#if 0
2024-06-29 02:07:35 +00:00
if (isbig() && __b.c <= __::sso_max)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
__builtin_trap();
#endif
2024-06-29 02:07:35 +00:00
return isbig() ? __::big_mask & __b.c : __::string_size;
}
iterator begin() noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data();
}
const_iterator begin() const noexcept
{
return data();
}
const_iterator cbegin() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data();
}
reverse_iterator rbegin() noexcept
{
return reverse_iterator(end());
}
const_reverse_iterator rbegin() const noexcept
{
return const_reverse_iterator(end());
}
const_reverse_iterator crbegin() const noexcept
{
return const_reverse_iterator(end());
}
iterator end() noexcept
{
return data() + size();
}
const_iterator end() const noexcept
{
return data() + size();
}
const_iterator cend() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data() + size();
}
reverse_iterator rend() noexcept
{
return reverse_iterator(begin());
}
const_reverse_iterator rend() const noexcept
{
return const_reverse_iterator(cbegin());
}
const_reverse_iterator crend() const noexcept
{
return const_reverse_iterator(cbegin());
}
char& front()
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[0];
}
const char& front() const
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[0];
}
char& back()
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[size() - 1];
}
const char& back() const
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[size() - 1];
}
char& operator[](size_t i) noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (i >= size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[i];
}
const char& operator[](const size_t i) const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (i >= size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data()[i];
}
void push_back(const char ch) noexcept
{
append(ch);
}
2024-07-01 10:48:28 +00:00
void append(const ctl::string_view s) noexcept
{
append(s.p, s.n);
}
2024-07-01 10:48:28 +00:00
operator ctl::string_view() const noexcept
{
2024-07-01 10:48:28 +00:00
return ctl::string_view(data(), size());
}
string& operator=(const char* s) noexcept
{
clear();
append(s);
return *this;
}
2024-07-01 10:48:28 +00:00
string& operator=(const ctl::string_view s) noexcept
{
clear();
append(s);
return *this;
}
string& operator+=(const char x) noexcept
{
append(x);
return *this;
}
2024-07-01 14:10:35 +00:00
string& operator+=(const char* s) noexcept
{
append(s);
return *this;
}
string& operator+=(const ctl::string s) noexcept
{
append(s);
return *this;
}
2024-07-01 10:48:28 +00:00
string& operator+=(const ctl::string_view s) noexcept
{
append(s);
return *this;
}
2024-07-01 12:40:38 +00:00
string operator+(const char c) const noexcept
{
char s[2] = { c };
return strcat(*this, s);
}
2024-07-01 14:10:35 +00:00
string operator+(const char* s) const noexcept
2024-07-01 12:40:38 +00:00
{
return strcat(*this, s);
}
2024-07-01 14:10:35 +00:00
string operator+(const string& s) const noexcept
{
return strcat(*this, s);
}
2024-07-01 14:10:35 +00:00
string operator+(const ctl::string_view s) const noexcept
2024-07-01 12:40:38 +00:00
{
return strcat(*this, s);
}
2024-07-01 10:48:28 +00:00
int compare(const ctl::string_view s) const noexcept
{
return strcmp(*this, s);
}
2024-07-01 10:48:28 +00:00
bool operator<(const ctl::string_view s) const noexcept
{
return compare(s) < 0;
}
2024-07-01 10:48:28 +00:00
bool operator<=(const ctl::string_view s) const noexcept
{
return compare(s) <= 0;
}
2024-07-01 10:48:28 +00:00
bool operator>(const ctl::string_view s) const noexcept
{
return compare(s) > 0;
}
2024-07-01 10:48:28 +00:00
bool operator>=(const ctl::string_view s) const noexcept
{
return compare(s) >= 0;
}
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
private:
void destroy_big() noexcept;
void init_big(const string&) noexcept;
2024-07-01 10:48:28 +00:00
void init_big(ctl::string_view) noexcept;
void init_big(size_t, char) noexcept;
2024-06-29 02:07:35 +00:00
__attribute__((__always_inline__)) bool isbig() const noexcept
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
{
return *(blob + __::sso_max) & 0x80;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
void set_small_size(const size_t size) noexcept
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
{
if (size > __::sso_max)
__builtin_trap();
__s.rem = __::sso_max - size;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
void set_big_string(char* const p, const size_t n, const size_t c2) noexcept
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
{
if (c2 > __::big_mask)
__builtin_trap();
__b.p = p;
__b.n = n;
__b.c = c2 | ~__::big_mask;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
2024-07-01 10:48:28 +00:00
friend string strcat(ctl::string_view, ctl::string_view) noexcept;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
union
{
__::big_string __b;
__::small_string __s;
char blob[__::string_size];
};
};
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
static_assert(sizeof(string) == __::string_size);
static_assert(sizeof(__::small_string) == __::string_size);
static_assert(sizeof(__::big_string) == __::string_size);
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(int) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(long) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(long long) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(unsigned) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(unsigned long) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(unsigned long long) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(float) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(double) noexcept;
2024-07-01 12:40:38 +00:00
ctl::string
2024-07-01 12:54:22 +00:00
to_string(long double) noexcept;
2024-07-01 12:40:38 +00:00
} // namespace ctl
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wliteral-suffix"
inline ctl::string
operator"" s(const char* s, size_t n)
{
return ctl::string(s, n);
}
#pragma GCC diagnostic pop
#endif // CTL_STRING_H_