Make numerous improvements

- Python static hello world now 1.8mb
- Python static fully loaded now 10mb
- Python HTTPS client now uses MbedTLS
- Python REPL now completes import stmts
- Increase stack size for Python for now
- Begin synthesizing posixpath and ntpath
- Restore Python \N{UNICODE NAME} support
- Restore Python NFKD symbol normalization
- Add optimized code path for Intel SHA-NI
- Get more Python unit tests passing faster
- Get Python help() pagination working on NT
- Python hashlib now supports MbedTLS PBKDF2
- Make memcpy/memmove/memcmp/bcmp/etc. faster
- Add Mersenne Twister and Vigna to LIBC_RAND
- Provide privileged __printf() for error code
- Fix zipos opendir() so that it reports ENOTDIR
- Add basic chmod() implementation for Windows NT
- Add Cosmo's best functions to Python cosmo module
- Pin function trace indent depth to that of caller
- Show memory diagram on invalid access in MODE=dbg
- Differentiate stack overflow on crash in MODE=dbg
- Add stb_truetype and tools for analyzing font files
- Upgrade to UNICODE 13 and reduce its binary footprint
- COMPILE.COM now logs resource usage of build commands
- Start implementing basic poll() support on bare metal
- Set getauxval(AT_EXECFN) to GetModuleFileName() on NT
- Add descriptions to strerror() in non-TINY build modes
- Add COUNTBRANCH() macro to help with micro-optimizations
- Make error / backtrace / asan / memory code more unbreakable
- Add fast perfect C implementation of μ-Law and a-Law audio codecs
- Make strtol() functions consistent with other libc implementations
- Improve Linenoise implementation (see also github.com/jart/bestline)
- COMPILE.COM now suppresses stdout/stderr of successful build commands
Justine Tunney 2021-09-27 22:58:51 -07:00
parent fa7b4f5bd1
commit 39bf41f4eb
806 changed files with 77494 additions and 63859 deletions
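
The memory-routine speedups listed above, and the `memcasecmp` change in the diff that follows, use a word-at-a-time idea: XOR eight bytes from each buffer, skip ahead while the result is zero, and fall back to a case-folded byte comparison at the first difference. The following is a minimal sketch of that idea, not the code from this commit. The names `sketch_memcasecmp`, `ascii_lower`, and `load64le` are hypothetical, case folding is ASCII-only, and `__builtin_ctzll` assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ASCII-only lowercase, standing in for a lookup table. */
static int ascii_lower(int c) {
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: loads 8 bytes as a little-endian word on any host. */
static uint64_t load64le(const unsigned char *p) {
  uint64_t w = 0;
  for (int k = 0; k < 8; ++k) w |= (uint64_t)p[k] << (8 * k);
  return w;
}

/* Sketch of a case-insensitive memory comparison (assumed name). */
int sketch_memcasecmp(const void *p, const void *q, size_t n) {
  const unsigned char *a = p, *b = q;
  size_t i = 0;
  int c;
  if (a == b) return 0;
  while (i < n) {
    /* Skip runs of byte-identical data eight bytes at a time. */
    while (n - i >= 8) {
      uint64_t w = load64le(a + i) ^ load64le(b + i);
      if (w) {
        /* In a little-endian word the lowest set bit lies in the
           first differing byte, so ctz / 8 is that byte's offset. */
        i += (unsigned)__builtin_ctzll(w) >> 3;
        break;
      }
      i += 8;
    }
    if (i == n) break;
    /* The bytes may differ only by case, so compare after folding. */
    if ((c = ascii_lower(a[i]) - ascii_lower(b[i]))) return c;
    ++i;
  }
  return 0;
}
```

Inputs that differ only in letter case defeat the fast path, because every word XOR is nonzero and the loop advances one byte per iteration. The result stays correct; only the throughput drops to the byte-wise rate.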

@@ -21,20 +21,51 @@
/**
* Compares memory case-insensitively.
*
* memcasecmp n=0 992 picoseconds
* memcasecmp n=1 1 ns/byte 590 mb/s
* memcasecmp n=2 1 ns/byte 843 mb/s
* memcasecmp n=3 1 ns/byte 885 mb/s
* memcasecmp n=4 1 ns/byte 843 mb/s
* memcasecmp n=5 1 ns/byte 820 mb/s
* memcasecmp n=6 1 ns/byte 770 mb/s
* memcasecmp n=7 1 ns/byte 765 mb/s
* memcasecmp n=8 206 ps/byte 4,724 mb/s
* memcasecmp n=9 220 ps/byte 4,428 mb/s
* memcasecmp n=15 617 ps/byte 1,581 mb/s
* memcasecmp n=16 124 ps/byte 7,873 mb/s
* memcasecmp n=17 155 ps/byte 6,274 mb/s
* memcasecmp n=31 341 ps/byte 2,860 mb/s
* memcasecmp n=32 82 ps/byte 11,810 mb/s
* memcasecmp n=33 100 ps/byte 9,743 mb/s
* memcasecmp n=80 53 ps/byte 18,169 mb/s
* memcasecmp n=128 49 ps/byte 19,890 mb/s
* memcasecmp n=256 45 ps/byte 21,595 mb/s
* memcasecmp n=16384 42 ps/byte 22,721 mb/s
* memcasecmp n=32768 40 ps/byte 24,266 mb/s
* memcasecmp n=131072 40 ps/byte 24,337 mb/s
*
* @return is <0, 0, or >0 based on uint8_t comparison
*/
int memcasecmp(const void *p, const void *q, size_t n) {
int c;
size_t i;
unsigned u;
uint64_t w;
const unsigned char *a, *b;
if ((a = p) != (b = q)) {
for (i = 0; i < n; ++i) {
while (i + 8 <= n) {
w = READ64LE(a);
w ^= READ64LE(b);
if (w) {
i += (unsigned)__builtin_ctzll(w) >> 3;
if ((w = (((uint64_t)a[0] << 000 | (uint64_t)a[1] << 010 |
(uint64_t)a[2] << 020 | (uint64_t)a[3] << 030 |
(uint64_t)a[4] << 040 | (uint64_t)a[5] << 050 |
(uint64_t)a[6] << 060 | (uint64_t)a[7] << 070) ^
((uint64_t)b[0] << 000 | (uint64_t)b[1] << 010 |
(uint64_t)b[2] << 020 | (uint64_t)b[3] << 030 |
(uint64_t)b[4] << 040 | (uint64_t)b[5] << 050 |
(uint64_t)b[6] << 060 | (uint64_t)b[7] << 070)))) {
u = __builtin_ctzll(w);
i += u >> 3;
break;
} else {
i += 8;