cosmopolitan/ctl/string.cc

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

387 lines
8.2 KiB
C++
Raw Normal View History

// -*- mode:c++; indent-tabs-mode:nil; c-basic-offset:4; coding:utf-8 -*-
// vi: set et ft=cpp ts=4 sts=4 sw=4 fenc=utf-8 :vi
//
// Copyright 2024 Justine Alexandra Roberts Tunney
//
// Permission to use, copy, modify, and/or distribute this software for
// any purpose with or without fee is hereby granted, provided that the
// above copyright notice and this permission notice appear in all copies.
//
// THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
// WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
// AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
// DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
// PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
// TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
// PERFORMANCE OF THIS SOFTWARE.
#include "string.h"
#include <__atomic/fence.h>
#include <stdckdint.h>
#include "libc/mem/mem.h"
#include "libc/str/str.h"
namespace ctl {
void
string::destroy_big() noexcept
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
{
2024-06-29 02:07:35 +00:00
auto* b = &__b;
if (b->n) {
if (b->n >= b->c)
__builtin_trap();
if (b->p[b->n])
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
if (b->c && !b->p)
__builtin_trap();
free(b->p);
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
void
string::init_big(const string& s) noexcept
{
2024-06-29 02:07:35 +00:00
char* p2;
size_t size = s.size();
size_t need = size + 1;
size_t capacity = need;
if (!(p2 = (char*)malloc(capacity)))
__builtin_trap();
memcpy(p2, s.data(), need);
set_big_string(p2, size, capacity);
}
void
string::init_big(const string_view s) noexcept
{
char* p2;
size_t need;
if (ckd_add(&need, s.n, 1 /* nul */ + 15))
__builtin_trap();
need &= -16;
if (!(p2 = (char*)malloc(need)))
__builtin_trap();
memcpy(p2, s.p, s.n);
p2[s.n] = 0;
set_big_string(p2, s.n, need);
}
void
string::init_big(const size_t n, const char ch) noexcept
{
size_t need;
char* p2;
if (ckd_add(&need, n, 1 /* nul */ + 15))
__builtin_trap();
need &= -16;
if (!(p2 = (char*)malloc(need)))
__builtin_trap();
memset(p2, ch, n);
p2[n] = 0;
set_big_string(p2, n, need);
}
const char*
string::c_str() const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
return "";
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (size() >= capacity())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (data()[size()])
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return data();
}
void
string::reserve(size_t c2) noexcept
{
char* p2;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
size_t n = size();
if (c2 < n + 1)
c2 = n + 1;
if (c2 <= __::string_size)
return;
if (ckd_add(&c2, c2, 15))
__builtin_trap();
c2 &= -16;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!isbig()) {
if (!(p2 = (char*)malloc(c2)))
__builtin_trap();
memcpy(p2, data(), __::string_size);
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
if (!(p2 = (char*)realloc(__b.p, c2)))
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
__builtin_trap();
}
std::atomic_signal_fence(std::memory_order_seq_cst);
set_big_string(p2, n, c2);
}
void
string::resize(const size_t n2, const char ch) noexcept
{
size_t c2;
if (ckd_add(&c2, n2, 1))
__builtin_trap();
reserve(c2);
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (n2 > size())
memset(data() + size(), ch, n2 - size());
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.p[__b.n = n2] = 0;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
set_small_size(n2);
data()[size()] = 0;
}
}
void
string::append(const char ch) noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
size_t n2;
if (ckd_add(&n2, size(), 2))
__builtin_trap();
if (n2 > capacity()) {
Fix some memory issues with ctl::string (#1201) There were a few errors in how capacity and memory was being handled for small strings. The capacity errors meant that small strings would become big strings too soon, and the memory error introduced undefined behavior that was caught by CheckMemoryLeaks in our test file but only sometimes. The crucial change is in reserve: we only copy n bytes into p2, and then we manually set the null terminator instead of expecting it to have been there already. (E.g. it might not be there for an empty small string.) We also fix one other doozy in append when we were exactly at the small- to-big string boundary: we set the last byte (i.e., the remainder field) to 0, then decremented it, giving us size_t max. Whoops. We boneheadedly fix this by setting the 0 byte after we've fixed up the remainder, so it is at worst a no-op. Otherwise, capacity now works the same for small strings as it does with big strings: it's the amount of space available including the null byte. We test all of this with a new test that only gets included if our class under test is not std::string (presumably meaning it's ctl::string.) The test manually verifies that the small string optimization behaves how we expect. Since this test checks against std::string, we go ahead and include that other header from the STL. Also modifies the new test we introduced to also run on std::string, but it just does the append without expecting anything about how its data is stored. We also check that the string has the right value afterwards.
2024-06-07 05:15:37 +00:00
size_t c2 = capacity();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (ckd_add(&c2, c2, c2 >> 1))
__builtin_trap();
reserve(c2);
}
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
data()[size()] = ch;
if (isbig()) {
2024-06-29 02:07:35 +00:00
++__b.n;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
--__s.rem;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
Fix some memory issues with ctl::string (#1201) There were a few errors in how capacity and memory was being handled for small strings. The capacity errors meant that small strings would become big strings too soon, and the memory error introduced undefined behavior that was caught by CheckMemoryLeaks in our test file but only sometimes. The crucial change is in reserve: we only copy n bytes into p2, and then we manually set the null terminator instead of expecting it to have been there already. (E.g. it might not be there for an empty small string.) We also fix one other doozy in append when we were exactly at the small- to-big string boundary: we set the last byte (i.e., the remainder field) to 0, then decremented it, giving us size_t max. Whoops. We boneheadedly fix this by setting the 0 byte after we've fixed up the remainder, so it is at worst a no-op. Otherwise, capacity now works the same for small strings as it does with big strings: it's the amount of space available including the null byte. We test all of this with a new test that only gets included if our class under test is not std::string (presumably meaning it's ctl::string.) The test manually verifies that the small string optimization behaves how we expect. Since this test checks against std::string, we go ahead and include that other header from the STL. Also modifies the new test we introduced to also run on std::string, but it just does the append without expecting anything about how its data is stored. We also check that the string has the right value afterwards.
2024-06-07 05:15:37 +00:00
data()[size()] = 0;
}
void
string::grow(const size_t size) noexcept
{
size_t need;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (ckd_add(&need, this->size(), size))
__builtin_trap();
if (ckd_add(&need, need, 1))
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (need <= capacity())
return;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
size_t c2 = capacity();
if (!c2)
__builtin_trap();
while (c2 < need)
if (ckd_add(&c2, c2, c2 >> 1))
__builtin_trap();
reserve(c2);
}
void
string::append(const char ch, const size_t size) noexcept
{
grow(size);
if (size)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
memset(data() + this->size(), ch, size);
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.n += size;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
__s.rem -= size;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
data()[this->size()] = 0;
}
void
string::append(const void* data, const size_t size) noexcept
{
grow(size);
if (size)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
memcpy(this->data() + this->size(), data, size);
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.n += size;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
__s.rem -= size;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
this->data()[this->size()] = 0;
}
void
string::pop_back() noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (isbig()) {
2024-06-29 02:07:35 +00:00
--__b.n;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
++__s.rem;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
data()[size()] = 0;
}
string&
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
string::operator=(string s) noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
swap(s);
return *this;
}
bool
string::operator==(const string_view s) const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (size() != s.n)
return false;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!s.n)
return true;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return !memcmp(data(), s.p, s.n);
}
bool
string::operator!=(const string_view s) const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (size() != s.n)
return true;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (!s.n)
return false;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return !!memcmp(data(), s.p, s.n);
}
bool
string::contains(const string_view s) const noexcept
{
if (!s.n)
return true;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return !!memmem(data(), size(), s.p, s.n);
}
bool
string::ends_with(const string_view s) const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (size() < s.n)
return false;
if (!s.n)
return true;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return !memcmp(data() + size() - s.n, s.p, s.n);
}
bool
string::starts_with(const string_view s) const noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (size() < s.n)
return false;
if (!s.n)
return true;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return !memcmp(data(), s.p, s.n);
}
size_t
string::find(const char ch, const size_t pos) const noexcept
{
char* q;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if ((q = (char*)memchr(data(), ch, size())))
return q - data();
return npos;
}
size_t
string::find(const string_view s, const size_t pos) const noexcept
{
char* q;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (pos > size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if ((q = (char*)memmem(data() + pos, size() - pos, s.p, s.n)))
return q - data();
return npos;
}
string
string::substr(const size_t pos, size_t count) const noexcept
{
size_t last;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (pos > size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (count > size() - pos)
count = size() - pos;
if (ckd_add(&last, pos, count))
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
last = size();
if (last > size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
return string(data() + pos, count);
}
string&
string::replace(const size_t pos,
const size_t count,
const string_view s) noexcept
{
size_t last;
if (ckd_add(&last, pos, count))
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (last > size())
__builtin_trap();
size_t need;
if (ckd_add(&need, pos, s.n))
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
size_t extra = size() - last;
if (ckd_add(&need, need, extra))
__builtin_trap();
size_t c2;
if (ckd_add(&c2, need, 1))
__builtin_trap();
reserve(c2);
if (extra)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
memmove(data() + pos + s.n, data() + last, extra);
memcpy(data() + pos, s.p, s.n);
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.p[__b.n = need] = 0;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
set_small_size(need);
data()[size()] = 0;
}
return *this;
}
string&
string::insert(const size_t i, const string_view s) noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (i > size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
size_t extra = size() - i;
size_t need;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (ckd_add(&need, size(), s.n))
__builtin_trap();
if (ckd_add(&need, need, 1))
__builtin_trap();
reserve(need);
if (extra)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
memmove(data() + i + s.n, data() + i, extra);
memcpy(data() + i, s.p, s.n);
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.n += s.n;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
2024-06-29 02:07:35 +00:00
__s.rem -= s.n;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
}
data()[size()] = 0;
return *this;
}
string&
string::erase(const size_t pos, size_t count) noexcept
{
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (pos > size())
__builtin_trap();
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
if (count > size() - pos)
count = size() - pos;
size_t extra = size() - (pos + count);
if (extra)
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
memmove(data() + pos, data() + pos + count, extra);
if (isbig()) {
2024-06-29 02:07:35 +00:00
__b.n = pos + extra;
ctl::string small-string optimization (#1199) A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string. The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator. Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome. This commit also introduces the `ctl::__` namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail". We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards. There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
2024-06-07 00:50:51 +00:00
} else {
set_small_size(pos + extra);
}
data()[size()] = 0;
return *this;
}
} // namespace ctl