Initial import

2025-07-08 20:28:30 +00:00 · 2020-06-15 07:18:57 -07:00 · 2020-06-15 07:18:57 -07:00 · c91b3c5006
commit c91b3c5006
14915 changed files with 590219 additions and 0 deletions
--- a/third_party/dlmalloc/COPYING
+++ b/third_party/dlmalloc/COPYING
@@ -0,0 +1,10 @@
+/	Since dlmalloc is public domain, we intend to keep it that way. To the
+/	extent possible under law, Justine Tunney has waived all copyright and
+/	related or neighboring rights to her /third_party/dlmalloc changes, as
+/	it is written in the following disclaimers:
+/	  • unlicense.org
+/	  • creativecommons.org/publicdomain/zero/1.0/
+
+.ident	"\n
+dlmalloc (Public Domain CC0)
+Credit: Doug Lea <dl@cs.oswego.edu>"
--- a/third_party/dlmalloc/README
+++ b/third_party/dlmalloc/README
@@ -0,0 +1,825 @@
+
+  This is a version (aka dlmalloc) of malloc/free/realloc written by
+  Doug Lea and released to the public domain, as explained at
+  http://creativecommons.org/publicdomain/zero/1.0/ Send questions,
+  comments, complaints, performance data, etc to dl@cs.oswego.edu
+
+  Version 2.8.6 Wed Aug 29 06:57:58 2012  Doug Lea
+   Note: There may be an updated version of this malloc obtainable at
+           ftp://gee.cs.oswego.edu/pub/misc/malloc.c
+         Check before installing!
+
+* Quickstart
+
+  This library is all in one file to simplify the most common usage:
+  ftp it, compile it (-O3), and link it into another program. All of
+  the compile-time options default to reasonable values for use on
+  most platforms.  You might later want to step through various
+  compile-time and dynamic tuning options.
+
+  For convenience, an include file for code using this malloc is at:
+     ftp://gee.cs.oswego.edu/pub/misc/malloc-2.8.6.h
+  You don't really need this .h file unless you call functions not
+  defined in your system include files.  The .h file contains only the
+  excerpts from this file needed for using this malloc on ANSI C/C++
+  systems, so long as you haven't changed compile-time options about
+  naming and tuning parameters.  If you do, then you can create your
+  own malloc.h that does include all settings by cutting at the point
+  indicated below. Note that you may already by default be using a C
+  library containing a malloc that is based on some version of this
+  malloc (for example in linux). You might still want to use the one
+  in this file to customize settings or to avoid overheads associated
+  with library versions.
+
+* Vital statistics:
+
+  Supported pointer/size_t representation:       4 or 8 bytes
+       size_t MUST be an unsigned type of the same width as
+       pointers. (If you are using an ancient system that declares
+       size_t as a signed type, or need it to be a different width
+       than pointers, you can use a previous release of this malloc
+       (e.g. 2.7.2) supporting these.)
+
+  Alignment:                                     8 bytes (minimum)
+       Is set to 16 for NexGen32e.
+
+  Minimum overhead per allocated chunk:   4 or  8 bytes (if 4byte sizes)
+                                          8 or 16 bytes (if 8byte sizes)
+       Each malloced chunk has a hidden word of overhead holding size
+       and status information, and additional cross-check word
+       if FOOTERS is defined.
+
+  Minimum allocated size: 4-byte ptrs:  16 bytes    (including overhead)
+                          8-byte ptrs:  32 bytes    (including overhead)
+
+       Even a request for zero bytes (i.e., malloc(0)) returns a
+       pointer to something of the minimum allocatable size.
+       The maximum overhead wastage (i.e., number of extra bytes
+       allocated beyond those requested in malloc) is less than or equal
+       to the minimum size, except for requests >= mmap_threshold that
+       are serviced via mmap(), where the worst case wastage is about
+       32 bytes plus the remainder from a system page (the minimal
+       mmap unit); typically 4096 or 8192 bytes.
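The size rules above can be sketched as a small rounding function. This is an illustration under stated assumptions, not the code in this file: `chunk_size` and its parameters are hypothetical names, and the caller supplies the overhead, alignment, and minimum chunk size rather than having them derived from the build configuration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of dlmalloc's API): round a request up
   to the size of the chunk that would hold it.
     overhead  - hidden bookkeeping bytes per chunk (e.g. 4 or 8)
     align     - chunk alignment, must be a power of two (e.g. 8 or 16)
     min_chunk - smallest chunk ever handed out (e.g. 16 or 32)
   Overflow for requests close to the maximum size_t is not handled. */
static size_t chunk_size(size_t request, size_t overhead,
                         size_t align, size_t min_chunk) {
  size_t mask = align - 1;
  size_t padded = (request + overhead + mask) & ~mask;
  return padded < min_chunk ? min_chunk : padded;
}
```

Assuming 8-byte pointers, 8 bytes of overhead, 16-byte alignment, and a 32-byte minimum, `chunk_size(0, 8, 16, 32)` gives 32 and `chunk_size(25, 8, 16, 32)` gives 48, so the extra bytes stay within the minimum chunk size.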
+
+  Security: static-safe; optionally more or less
+       The "security" of malloc refers to the ability of malicious
+       code to accentuate the effects of errors (for example, freeing
+       space that is not currently malloc'ed or overwriting past the
+       ends of chunks) in code that calls malloc.  This malloc
+       guarantees not to modify any memory locations below the base of
+       heap, i.e., static variables, even in the presence of usage
+       errors.  The routines additionally detect most improper frees
+       and reallocs.  All this holds as long as the static bookkeeping
+       for malloc itself is not corrupted by some other means.  This
+       is only one aspect of security -- these checks do not, and
+       cannot, detect all possible programming errors.
+
+       If FOOTERS is defined nonzero, then each allocated chunk
+       carries an additional check word to verify that it was malloced
+       from its space.  These check words are the same within each
+       execution of a program using malloc, but differ across
+       executions, so externally crafted fake chunks cannot be
+       freed. This improves security by rejecting frees/reallocs that
+       could corrupt heap memory, in addition to the checks preventing
+       writes to statics that are always on.  This may further improve
+       security at the expense of time and space overhead.  (Note that
+       FOOTERS may also be worth using with MSPACES.)
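One way to picture such a check word is as the owning space's address mixed with a per-process secret. The sketch below assumes an XOR scheme; `footer_tag`, `footer_matches`, and the `magic` parameter are hypothetical names for illustration, and this README does not confirm that the file uses exactly this encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not dlmalloc's API.
   magic stands in for a value chosen once per program run, so tags are
   stable within an execution but differ across executions. */
static uintptr_t footer_tag(const void *owner, uintptr_t magic) {
  return (uintptr_t)owner ^ magic;
}

/* Returns nonzero when the tag decodes to the expected owning space. */
static int footer_matches(uintptr_t tag, const void *owner, uintptr_t magic) {
  return (tag ^ magic) == (uintptr_t)owner;
}
```

A chunk forged without knowledge of the run's magic value would fail `footer_matches`, which is the property the paragraph above relies on.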
+
+       By default detected errors cause the program to abort (calling
+       "abort()"). You can override this to instead proceed past
+       errors by defining PROCEED_ON_ERROR.  In this case, a bad free
+       has no effect, and a malloc that encounters a bad address
+       caused by user overwrites will ignore the bad address by
+       dropping pointers and indices to all known memory. This may
+       be appropriate for programs that should continue if at all
+       possible in the face of programming errors, although they may
+       run out of memory because dropped memory is never reclaimed.
+
+       If you don't like either of these options, you can define
+       CORRUPTION_ERROR_ACTION and USAGE_ERROR_ACTION to do anything
+       else. And if you are sure that your program using malloc has
+       no errors or vulnerabilities, you can define TRUSTWORTHY to 1,
+       which might (or might not) provide a small performance improvement.
+
+       It is also possible to limit the maximum total allocatable
+       space, using malloc_set_footprint_limit. This is not
+       designed as a security feature in itself (calls to set limits
+       are not screened or privileged), but may be useful as one
+       aspect of a secure implementation.
+
+  Thread-safety: NOT thread-safe unless USE_LOCKS defined non-zero
+       When USE_LOCKS is defined, each public call to malloc, free,
+       etc is surrounded with a lock. By default, this uses a plain
+       pthread mutex, win32 critical section, or a spin-lock if
+       available for the platform and not disabled by setting
+       USE_SPIN_LOCKS=0.  However, if USE_RECURSIVE_LOCKS is defined,
+       recursive versions are used instead (which are not required for
+       base functionality but may be needed in layered extensions).
+       Using a global lock is not especially fast, and can be a major
+       bottleneck.  It is designed only to provide minimal protection
+       in concurrent environments, and to provide a basis for
+       extensions.  If you are using malloc in a concurrent program,
+       consider instead using nedmalloc
+       (http://www.nedprod.com/programs/portable/nedmalloc/) or
+       ptmalloc (See http://www.malloc.de), which are derived from
+       versions of this malloc.
+
+  System requirements: Any combination of MORECORE and/or MMAP/MUNMAP
+       This malloc can use unix sbrk or any emulation (invoked using
+       the CALL_MORECORE macro) and/or mmap/munmap or any emulation
+       (invoked using CALL_MMAP/CALL_MUNMAP) to get and release system
+       memory.  On most unix systems, it tends to work best if both
+       MORECORE and MMAP are enabled.  On Win32, it uses emulations
+       based on VirtualAlloc. It also uses common C library functions
+       like memset.
+
+  Compliance: I believe it is compliant with the Single Unix Specification
+       (See http://www.unix.org). Also SVID/XPG, ANSI C, and probably
+       others as well.
+
+* Overview of algorithms
+
+  This is not the fastest, most space-conserving, most portable, or
+  most tunable malloc ever written. However it is among the fastest
+  while also being among the most space-conserving, portable and
+  tunable.  Consistent balance across these factors results in a good
+  general-purpose allocator for malloc-intensive programs.
+
+  In most ways, this malloc is a best-fit allocator. Generally, it
+  chooses the best-fitting existing chunk for a request, with ties
+  broken in approximately least-recently-used order. (This strategy
+  normally maintains low fragmentation.) However, for requests less
+  than 256bytes, it deviates from best-fit when there is not an
+  exactly fitting available chunk by preferring to use space adjacent
+  to that used for the previous small request, as well as by breaking
+  ties in approximately most-recently-used order. (These enhance
+  locality of series of small allocations.)  And for very large requests
+  (>= 256Kb by default), it relies on system memory mapping
+  facilities, if supported.  (This helps avoid carrying around and
+  possibly fragmenting memory used only for large chunks.)
+
+  All operations (except malloc_stats and mallinfo) have execution
+  times that are bounded by a constant factor of the number of bits in
+  a size_t, not counting any clearing in calloc or copying in realloc,
+  or actions surrounding MORECORE and MMAP that have times
+  proportional to the number of non-contiguous regions returned by
+  system allocation routines, which is often just 1. In real-time
+  applications, you can optionally suppress segment traversals using
+  NO_SEGMENT_TRAVERSAL, which assures bounded execution even when
+  system allocators return non-contiguous spaces, at the typical
+  expense of carrying around more memory and increased fragmentation.
+
+  The implementation is not very modular and seriously overuses
+  macros. Perhaps someday all C compilers will do as good a job
+  inlining modular code as can now be done by brute-force expansion,
+  but now, enough of them seem not to.
+
+  Some compilers issue a lot of warnings about code that is
+  dead/unreachable only on some platforms, and also about intentional
+  uses of negation on unsigned types. All known cases of each can be
+  ignored.
+
+  For a longer but out of date high-level description, see
+     http://gee.cs.oswego.edu/dl/html/malloc.html
+
+* MSPACES
+  If MSPACES is defined, then in addition to malloc, free, etc.,
+  this file also defines mspace_malloc, mspace_free, etc. These
+  are versions of malloc routines that take an "mspace" argument
+  obtained using create_mspace, to control all internal bookkeeping.
+  If ONLY_MSPACES is defined, only these versions are compiled.
+  So if you would like to use this allocator for only some allocations,
+  and your system malloc for others, you can compile with
+  ONLY_MSPACES and then do something like...
+    static mspace mymspace = create_mspace(0,0); // for example
+    #define mymalloc(bytes)  mspace_malloc(mymspace, bytes)
+
+  (Note: If you only need one instance of an mspace, you can instead
+  use "USE_DL_PREFIX" to relabel the global malloc.)
+
+  You can similarly create thread-local allocators by storing
+  mspaces as thread-locals. For example:
+    static __thread mspace tlms = 0;
+    void*  tlmalloc(size_t bytes) {
+      if (tlms == 0) tlms = create_mspace(0, 0);
+      return mspace_malloc(tlms, bytes);
+    }
+    void  tlfree(void* mem) { mspace_free(tlms, mem); }
+
+  Unless FOOTERS is defined, each mspace is completely independent.
+  You cannot allocate from one and free to another (although
+  conformance is only weakly checked, so usage errors are not always
+  caught). If FOOTERS is defined, then each chunk carries around a tag
+  indicating its originating mspace, and frees are directed to their
+  originating spaces. Normally, this requires use of locks.
+
+ ─────────────────────────  Compile-time options ───────────────────────────
+
+Be careful in setting #define values for numerical constants of type
+size_t. On some systems, literal values are not automatically extended
+to size_t precision unless they are explicitly cast. You can also
+use the symbolic values SIZE_MAX, SIZE_T_ONE, etc below.
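The warning about literal width can be shown with a short sketch. The `SKETCH_*` macros and helper functions are hypothetical names used only here; the point is the order of cast and operation, not any particular option in this file.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Cast first, then operate: the arithmetic is done at size_t width. */
#define SKETCH_MAX_SIZE_T (~(size_t)0)
#define SKETCH_THRESHOLD  ((size_t)256U * (size_t)1024U)

static size_t sketch_max_size(void)  { return SKETCH_MAX_SIZE_T; }
static size_t sketch_threshold(void) { return SKETCH_THRESHOLD; }

/* Operate first, then cast: ~0U is computed as an unsigned int and only
   then converted, so where size_t is wider than unsigned int the high
   bits are zero and the result is not the largest size_t. */
static size_t sketch_narrow_max(void) { return (size_t)(~0U); }
```

On a typical 64-bit target `sketch_narrow_max()` returns only the largest `unsigned int` value, which is why the casts belong on the operands rather than on the result.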
+
+WIN32                    default: defined if _WIN32 defined
+  Defining WIN32 sets up defaults for MS environment and compilers.
+  Otherwise defaults are for unix. Beware that there seem to be some
+  cases where this malloc might not be a pure drop-in replacement for
+  Win32 malloc: Random-looking failures from Win32 GDI APIs (e.g.,
+  SetDIBits()) may be due to bugs in some video driver implementations
+  when pixel buffers are malloc()ed, and the region spans more than
+  one VirtualAlloc()ed region. Because dlmalloc uses a small (64Kb)
+  default granularity, pixel buffers may straddle virtual allocation
+  regions more often than when using the Microsoft allocator.  You can
+  avoid this by using VirtualAlloc() and VirtualFree() for all pixel
+  buffers rather than using malloc().  If this is not possible,
+  recompile this malloc with a larger DEFAULT_GRANULARITY. Note:
+  in cases where MSC and gcc (cygwin) are known to differ on WIN32,
+  conditions use _MSC_VER to distinguish them.
+
+DLMALLOC_EXPORT       default: extern
+  Defines how public APIs are declared. If you want to export via a
+  Windows DLL, you might define this as
+    #define DLMALLOC_EXPORT extern  __declspec(dllexport)
+  If you want a POSIX ELF shared object, you might use
+    #define DLMALLOC_EXPORT extern __attribute__((visibility("default")))
+
+MALLOC_ALIGNMENT         default: (size_t)(2 * sizeof(void *))
+  Controls the minimum alignment for malloc'ed chunks.  It must be a
+  power of two and at least 8, even on machines for which smaller
+  alignments would suffice. It may be defined as larger than this
+  though. Note however that code and data structures are optimized for
+  the case of 8-byte alignment.
+
+MSPACES                  default: 0 (false)
+  If true, compile in support for independent allocation spaces.
+  This is only supported if HAVE_MMAP is true.
+
+ONLY_MSPACES             default: 0 (false)
+  If true, only compile in mspace versions, not regular versions.
+
+USE_LOCKS                default: 0 (false)
+  Causes each call to each public routine to be surrounded with
+  pthread or WIN32 mutex lock/unlock. (If set true, this can be
+  overridden on a per-mspace basis for mspace versions.) If set to a
+  non-zero value other than 1, locks are used, but their
+  implementation is left out, so lock functions must be supplied manually,
+  as described below.
+
+USE_SPIN_LOCKS           default: 1 iff USE_LOCKS and spin locks available
+  If true, uses custom spin locks for locking. This is currently
+  supported only for gcc >= 4.1, older gccs on x86 platforms, and recent
+  MS compilers.  Otherwise, posix locks or win32 critical sections are
+  used.
+
+USE_RECURSIVE_LOCKS      default: not defined
+  If defined nonzero, uses recursive (aka reentrant) locks, otherwise
+  uses plain mutexes. This is not required for malloc proper, but may
+  be needed for layered allocators such as nedmalloc.
+
+LOCK_AT_FORK            default: not defined
+  If defined nonzero, performs pthread_atfork upon initialization
+  to initialize child lock while holding parent lock. The implementation
+  assumes that pthread locks (not custom locks) are being used. In other
+  cases, you may need to customize the implementation.
+
+FOOTERS                  default: 0
+  If true, provide extra checking and dispatching by placing
+  information in the footers of allocated chunks. This adds
+  space and time overhead.
+
+TRUSTWORTHY                 default: 0
+  If true, omit checks for usage errors and heap space overwrites.
+
+USE_DL_PREFIX            default: NOT defined
+  Causes compiler to prefix all public routines with the string 'dl'.
+  This can be useful when you only want to use this malloc in one part
+  of a program, using your regular system malloc elsewhere.
+
+MALLOC_INSPECT_ALL       default: NOT defined
+  If defined, compiles malloc_inspect_all and mspace_inspect_all, that
+  perform traversal of all heap space.  Unless access to these
+  functions is otherwise restricted, you probably do not want to
+  include them in secure implementations.
+
+MALLOC_ABORT           default: defined as abort()
+  Defines how to abort on failed checks.  On most systems, a failed
+  check cannot die with an "assert" or even print an informative
+  message, because the underlying print routines in turn call malloc,
+  which will fail again.  Generally, the best policy is to simply call
+  abort(). It's not very useful to do more than this because many
+  errors due to overwriting will show up as address faults (null, odd
+  addresses etc) rather than malloc-triggered checks, so will also
+  abort.  Also, most compilers know that abort() does not return, so
+  can better optimize code conditionally calling it.
+
+PROCEED_ON_ERROR           default: defined as 0 (false)
+  Controls whether detected bad addresses cause them to be bypassed
+  rather than aborting. If set, detected bad arguments to free and
+  realloc are ignored. And all bookkeeping information is zeroed out
+  upon a detected overwrite of freed heap space, thus losing the
+  ability to ever return it from malloc again, but enabling the
+  application to proceed. If PROCEED_ON_ERROR is defined, the
+  static variable malloc_corruption_error_count is compiled in
+  and can be examined to see if errors have occurred. This option
+  generates slower code than the default abort policy.
+
+DEBUG                    default: NOT defined
+  The DEBUG setting is mainly intended for people trying to modify
+  this code or diagnose problems when porting to new platforms.
+  However, it may also be able to better isolate user errors than just
+  using runtime checks.  The assertions in the check routines spell
+  out in more detail the assumptions and invariants underlying the
+  algorithms.  The checking is fairly extensive, and will slow down
+  execution noticeably. Calling malloc_stats or mallinfo with DEBUG
+  set will attempt to check every non-mmapped allocated and free chunk
+  in the course of computing the summaries.
+
+ABORT_ON_ASSERT_FAILURE   default: defined as 1 (true)
+  Debugging assertion failures can be nearly impossible if your
+  version of the assert macro causes malloc to be called, which will
+  lead to a cascade of further failures, blowing the runtime stack.
+  ABORT_ON_ASSERT_FAILURE causes assertion failures to call abort(),
+  which will usually make debugging easier.
+
+MALLOC_FAILURE_ACTION     default: sets errno to ENOMEM, or no-op on win32
+  The action to take before "return 0" when malloc cannot return
+  memory because none is available.
+
+HAVE_MORECORE             default: 1 (true) unless win32 or ONLY_MSPACES
+  True if this system supports sbrk or an emulation of it.
+
+MORECORE                  default: sbrk
+  The name of the sbrk-style system routine to call to obtain more
+  memory.  See below for guidance on writing custom MORECORE
+  functions. The type of the argument to sbrk/MORECORE varies across
+  systems.  It cannot be size_t, because it supports negative
+  arguments, so it is normally the signed type of the same width as
+  size_t (sometimes declared as "intptr_t").  It doesn't much matter
+  though. Internally, we only call it with arguments less than half
+  the max value of a size_t, which should work across all reasonable
+  possibilities, although sometimes generating compiler warnings.
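As a rough illustration of the sbrk-style contract described above, here is a minimal sketch that serves memory from a fixed static arena. All names (`sketch_morecore`, `sketch_arena`, `SKETCH_ARENA_SIZE`, `SKETCH_MFAIL`) are hypothetical, and the failure value `(void *)-1` is assumed to follow the unix sbrk convention.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARENA_SIZE 4096
#define SKETCH_MFAIL ((void *)-1) /* assumed sbrk-style failure value */

static unsigned char sketch_arena[SKETCH_ARENA_SIZE];
static size_t sketch_brk = 0; /* offset of the current break in the arena */

/* Hypothetical sbrk-style routine: returns the previous break on
   success. A zero argument reports the current break, a positive one
   grows the region contiguously, a negative one gives space back. */
static void *sketch_morecore(intptr_t increment) {
  void *old = sketch_arena + sketch_brk;
  if (increment > 0) {
    if ((size_t)increment > SKETCH_ARENA_SIZE - sketch_brk)
      return SKETCH_MFAIL;
    sketch_brk += (size_t)increment;
  } else if (increment < 0) {
    /* Magnitude computed in unsigned arithmetic to avoid negating
       the most negative intptr_t. */
    size_t shrink = (size_t)0 - (size_t)increment;
    if (shrink > sketch_brk)
      return SKETCH_MFAIL;
    sketch_brk -= shrink;
  }
  return old;
}
```

A routine of this shape would be selected by defining MORECORE to its name at compile time. Because it accepts a signed argument, it also shows why the parameter cannot be size_t.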
+
+MORECORE_CONTIGUOUS       default: 1 (true) if HAVE_MORECORE
+  If true, take advantage of the fact that consecutive calls to MORECORE
+  with positive arguments always return contiguous increasing
+  addresses.  This is true of unix sbrk. It does not hurt too much to
+  set it true anyway, since malloc copes with non-contiguities.
+  Setting it false when MORECORE is definitely non-contiguous saves the
+  time, and possibly wasted space, it would otherwise take to discover this.
+
+MORECORE_CANNOT_TRIM      default: NOT defined
+  True if MORECORE cannot release space back to the system when given
+  negative arguments. This is generally necessary only if you are
+  using a hand-crafted MORECORE function that cannot handle negative
+  arguments.
+
+NO_SEGMENT_TRAVERSAL       default: 0
+  If non-zero, suppresses traversals of memory segments
+  returned by either MORECORE or CALL_MMAP. This disables
+  merging of segments that are contiguous, and selectively
+  releasing them to the OS if unused, but bounds execution times.
+
+HAVE_MMAP                 default: 1 (true)
+  True if this system supports mmap or an emulation of it.  If so, and
+  HAVE_MORECORE is not true, MMAP is used for all system
+  allocation. If set and HAVE_MORECORE is true as well, MMAP is
+  primarily used to directly allocate very large blocks. It is also
+  used as a backup strategy in cases where MORECORE fails to provide
+  space from system. Note: A single call to MUNMAP is assumed to be
+  able to unmap memory that may have been allocated using multiple calls
+  to MMAP, so long as they are adjacent.
+
+HAVE_MREMAP               default: 1 on linux, else 0
+  If true, realloc() uses mremap() to re-allocate large blocks and
+  extend or shrink allocation spaces.
+
+MMAP_CLEARS               default: 1 except on WINCE.
+  True if mmap clears memory so calloc doesn't need to. This is true
+  for standard unix mmap using /dev/zero and on WIN32 except for WINCE.
+
+USE_BUILTIN_FFS            default: 0 (i.e., not used)
+  Causes malloc to use the builtin ffs() function to compute indices.
+  Some compilers may recognize and intrinsify ffs to be faster than the
+  supplied C version. Also, the case of x86 using gcc is special-cased
+  to an asm instruction, so is already as fast as it can be, and so
+  this setting has no effect. Similarly for Win32 under recent MS compilers.
+  (On most x86s, the asm version is only slightly faster than the C version.)
+
+malloc_getpagesize         default: derive from system includes, or 4096.
+  The system page size. To the extent possible, this malloc manages
+  memory from the system in page-size units.  This may be (and
+  usually is) a function rather than a constant. This is ignored
+  if WIN32, where page size is determined using GetSystemInfo during
+  initialization.
+
+NO_MALLINFO                default: 0
+  If defined, don't compile "mallinfo". This can be a simple way
+  of dealing with mismatches between system declarations and
+  those in this file.
+
+MALLINFO_FIELD_TYPE        default: size_t
+  The type of the fields in the mallinfo struct. This was originally
+  defined as "int" in SVID etc, but is more usefully defined as
+  size_t. The value is used only if HAVE_USR_INCLUDE_MALLOC_H is not set.
+
+NO_MALLOC_STATS            default: 0
+  If defined, don't compile "malloc_stats". This avoids calls to
+  fprintf and bringing in stdio dependencies you might not want.
+
+REALLOC_ZERO_BYTES_FREES    default: not defined
+  This should be set if a call to realloc with zero bytes should
+  be the same as a call to free. Some people think it should. Otherwise,
+  since this malloc returns a unique pointer for malloc(0), so does
+  realloc(p, 0).
+
+LACKS_UNISTD_H, LACKS_FCNTL_H, LACKS_SYS_PARAM_H, LACKS_SYS_MMAN_H
+LACKS_STRINGS_H, LACKS_STRING_H, LACKS_SYS_TYPES_H,  LACKS_ERRNO_H
+LACKS_STDLIB_H LACKS_SCHED_H LACKS_TIME_H  default: NOT defined unless on WIN32
+  Define these if your system does not have these header files.
+  You might need to manually insert some of the declarations they provide.
+
+DEFAULT_GRANULARITY        default: page size if MORECORE_CONTIGUOUS,
+                                system_info.dwAllocationGranularity in WIN32,
+                                otherwise 64K.
+      Also settable using mallopt(M_GRANULARITY, x)
+  The unit for allocating and deallocating memory from the system.  On
+  most systems with contiguous MORECORE, there is no reason to
+  make this more than a page. However, systems with MMAP tend to
+  either require or encourage larger granularities.  You can increase
+  this value to prevent system allocation functions from being called so
+  often, especially if they are slow.  The value must be at least one
+  page and must be a power of two.  Setting to 0 causes initialization
+  to either page size or win32 region size.  (Note: In previous
+  versions of malloc, the equivalent of this option was called
+  "TOP_PAD")
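The two constraints on the granularity value, and the way a system request is rounded up to it, can be sketched as follows. `granularity_ok` and `granularity_round_up` are hypothetical helper names; the special value 0, which selects the default at initialization, is deliberately reported as invalid here because it is handled separately.

```c
#include <stddef.h>

/* Hypothetical check: at least one page, and a power of two. */
static int granularity_ok(size_t granularity, size_t page_size) {
  return granularity >= page_size &&
         (granularity & (granularity - 1)) == 0;
}

/* Round a byte count up to the next multiple of a power-of-two
   granularity. Overflow near the maximum size_t is not handled. */
static size_t granularity_round_up(size_t bytes, size_t granularity) {
  size_t mask = granularity - 1;
  return (bytes + mask) & ~mask;
}
```

With a 64K granularity, a request for one byte more than 64K is rounded up to 128K, which is the trade-off the paragraph above describes: fewer system calls in exchange for coarser units.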
+
+DEFAULT_TRIM_THRESHOLD    default: 2MB
+      Also settable using mallopt(M_TRIM_THRESHOLD, x)
+  The maximum amount of unused top-most memory to keep before
+  releasing via malloc_trim in free().  Automatic trimming is mainly
+  useful in long-lived programs using contiguous MORECORE.  Because
+  trimming via sbrk can be slow on some systems, and can sometimes be
+  wasteful (in cases where programs immediately afterward allocate
+  more large chunks) the value should be high enough so that your
+  overall system performance would improve by releasing this much
+  memory.  As a rough guide, you might set to a value close to the
+  average size of a process (program) running on your system.
+  Releasing this much memory would allow such a process to run in
+  memory.  Generally, it is worth tuning trim thresholds when a
+  program undergoes phases where several large chunks are allocated
+  and released in ways that can reuse each other's storage, perhaps
+  mixed with phases where there are no such chunks at all. The trim
+  value must be greater than page size to have any useful effect.  To
+  disable trimming completely, you can set to SIZE_MAX. Note that the trick
+  some people use of mallocing a huge space and then freeing it at
+  program startup, in an attempt to reserve system memory, doesn't
+  have the intended effect under automatic trimming, since that memory
+  will immediately be returned to the system.
+
+DEFAULT_MMAP_THRESHOLD       default: 256K
+      Also settable using mallopt(M_MMAP_THRESHOLD, x)
+  The request size threshold for using MMAP to directly service a
+  request. Requests of at least this size that cannot be allocated
+  using already-existing space will be serviced via mmap.  (If enough
+  normal freed space already exists it is used instead.)  Using mmap
+  segregates relatively large chunks of memory so that they can be
+  individually obtained and released from the host system. A request
+  serviced through mmap is never reused by any other request (at least
+  not directly; the system may just so happen to remap successive
+  requests to the same locations).  Segregating space in this way has
+  the benefits that: Mmapped space can always be individually released
+  back to the system, which helps keep the system level memory demands
+  of a long-lived program low.  Also, mapped memory doesn't become
+  `locked' between other chunks, as can happen with normally allocated
+  chunks, which means that even trimming via malloc_trim would not
+  release them.  However, it has the disadvantage that the space
+  cannot be reclaimed, consolidated, and then used to service later
+  requests, as happens with normal chunks.  The advantages of mmap
+  nearly always outweigh disadvantages for "large" chunks, but the
+  value of "large" may vary across systems.  The default is an
+  empirically derived value that works well in most systems. You can
+  disable mmap by setting to SIZE_MAX.
+
+MAX_RELEASE_CHECK_RATE   default: 4095 unless not HAVE_MMAP
+  The number of consolidated frees between checks to release
+  unused segments when freeing. When using non-contiguous segments,
+  especially with multiple mspaces, checking only for topmost space
+  doesn't always suffice to trigger trimming. To compensate for this,
+  free() will, with a period of MAX_RELEASE_CHECK_RATE (or the
+  current number of segments, if greater) try to release unused
+  segments to the OS when freeing chunks that result in
+  consolidation. The best value for this parameter is a compromise
+  between slowing down frees with relatively costly checks that
+  rarely trigger versus holding on to unused memory. To effectively
+  disable, set to SIZE_MAX. This may lead to a very slight speed
+  improvement at the expense of carrying around more memory.
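The periodic check can be pictured as a countdown that is reset after every release attempt. The sketch below is a simplified model under assumptions: `struct sketch_state` and the `sketch_*` functions are hypothetical, and the actual release of segments is reduced to incrementing a counter.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for the model. */
struct sketch_state {
  size_t countdown;  /* consolidating frees left before the next check */
  size_t nsegs;      /* current number of segments */
  size_t checks_run; /* how many release checks have been triggered */
};

/* Period is the configured rate or the segment count, whichever is
   greater; clamped to at least 1 so the countdown never underflows. */
static size_t sketch_period(size_t max_rate, size_t nsegs) {
  size_t p = nsegs > max_rate ? nsegs : max_rate;
  return p == 0 ? 1 : p;
}

static void sketch_init(struct sketch_state *s, size_t max_rate,
                        size_t nsegs) {
  s->nsegs = nsegs;
  s->checks_run = 0;
  s->countdown = sketch_period(max_rate, nsegs);
}

/* Called for each free that results in consolidation. */
static void sketch_on_consolidating_free(struct sketch_state *s,
                                         size_t max_rate) {
  if (--s->countdown == 0) {
    s->checks_run++; /* stands in for trying to release unused segments */
    s->countdown = sketch_period(max_rate, s->nsegs);
  }
}
```

With a rate of 4 and a single segment, ten consolidating frees trigger two checks in this model. Setting the rate to the largest size_t value makes the countdown effectively unreachable.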
+
+────────────────────────────────────────────────────────────────────────────────
+
+History:
+
+    v2.8.6 Wed Aug 29 06:57:58 2012  Doug Lea
+      * fix bad comparison in dlposix_memalign
+      * don't reuse adjusted asize in sys_alloc
+      * add LOCK_AT_FORK -- thanks to Kirill Artamonov for the suggestion
+      * reduce compiler warnings -- thanks to all who reported/suggested these
+
+    v2.8.5 Sun May 22 10:26:02 2011  Doug Lea  (dl at gee)
+      * Always perform unlink checks unless TRUSTWORTHY
+      * Add posix_memalign.
+      * Improve realloc to expand in more cases; expose realloc_in_place.
+        Thanks to Peter Buhr for the suggestion.
+      * Add footprint_limit, inspect_all, bulk_free. Thanks
+        to Barry Hayes and others for the suggestions.
+      * Internal refactorings to avoid calls while holding locks
+      * Use non-reentrant locks by default. Thanks to Roland McGrath
+        for the suggestion.
+      * Small fixes to mspace_destroy, reset_on_error.
+      * Various configuration extensions/changes. Thanks
+         to all who contributed these.
+
+    V2.8.4a Thu Apr 28 14:39:43 2011 (dl at gee.cs.oswego.edu)
+      * Update Creative Commons URL
+
+    V2.8.4 Wed May 27 09:56:23 2009  Doug Lea  (dl at gee)
+      * Use zeros instead of prev foot for is_mmapped
+      * Add mspace_track_large_chunks; thanks to Jean Brouwers
+      * Fix set_inuse in internal_realloc; thanks to Jean Brouwers
+      * Fix insufficient sys_alloc padding when using 16byte alignment
+      * Fix bad error check in mspace_footprint
+      * Adaptations for ptmalloc; thanks to Wolfram Gloger.
+      * Reentrant spin locks; thanks to Earl Chew and others
+      * Win32 improvements; thanks to Niall Douglas and Earl Chew
+      * Add NO_SEGMENT_TRAVERSAL and MAX_RELEASE_CHECK_RATE options
+      * Extension hook in malloc_state
+      * Various small adjustments to reduce warnings on some compilers
+      * Various configuration extensions/changes for more platforms. Thanks
+         to all who contributed these.
+
+    V2.8.3 Thu Sep 22 11:16:32 2005  Doug Lea  (dl at gee)
+      * Add max_footprint functions
+      * Ensure all appropriate literals are size_t
+      * Fix conditional compilation problem for some #define settings
+      * Avoid concatenating segments with the one provided
+        in create_mspace_with_base
+      * Rename some variables to avoid compiler shadowing warnings
+      * Use explicit lock initialization.
+      * Better handling of sbrk interference.
+      * Simplify and fix segment insertion, trimming and mspace_destroy
+      * Reinstate REALLOC_ZERO_BYTES_FREES option from 2.7.x
+      * Thanks especially to Dennis Flanagan for help on these.
+
+    V2.8.2 Sun Jun 12 16:01:10 2005  Doug Lea  (dl at gee)
+      * Fix memalign brace error.
+
+    V2.8.1 Wed Jun  8 16:11:46 2005  Doug Lea  (dl at gee)
+      * Fix improper #endif nesting in C++
+      * Add explicit casts needed for C++
+
+    V2.8.0 Mon May 30 14:09:02 2005  Doug Lea  (dl at gee)
+      * Use trees for large bins
+      * Support mspaces
+      * Use segments to unify sbrk-based and mmap-based system allocation,
+        removing need for emulation on most platforms without sbrk.
+      * Default safety checks
+      * Optional footer checks. Thanks to William Robertson for the idea.
+      * Internal code refactoring
+      * Incorporate suggestions and platform-specific changes.
+        Thanks to Dennis Flanagan, Colin Plumb, Niall Douglas,
+        Aaron Bachmann,  Emery Berger, and others.
+      * Speed up non-fastbin processing enough to remove fastbins.
+      * Remove useless cfree() to avoid conflicts with other apps.
+      * Remove internal memcpy, memset. Compilers handle builtins better.
+      * Remove some options that no one ever used and rename others.
+
+    V2.7.2 Sat Aug 17 09:07:30 2002  Doug Lea  (dl at gee)
+      * Fix malloc_state bitmap array misdeclaration
+
+    V2.7.1 Thu Jul 25 10:58:03 2002  Doug Lea  (dl at gee)
+      * Allow tuning of FIRST_SORTED_BIN_SIZE
+      * Use PTR_UINT as type for all ptr->int casts. Thanks to John Belmonte.
+      * Better detection and support for non-contiguousness of MORECORE.
+        Thanks to Andreas Mueller, Conal Walsh, and Wolfram Gloger
+      * Bypass most of malloc if no frees. Thanks to Emery Berger.
+      * Fix freeing of old top non-contiguous chunk in sysmalloc.
+      * Raised default trim and map thresholds to 256K.
+      * Fix mmap-related #defines. Thanks to Lubos Lunak.
+      * Fix copy macros; added LACKS_FCNTL_H. Thanks to Neal Walfield.
+      * Branch-free bin calculation
+      * Default trim and mmap thresholds now 256K.
+
+    V2.7.0 Sun Mar 11 14:14:06 2001  Doug Lea  (dl at gee)
+      * Introduce independent_comalloc and independent_calloc.
+        Thanks to Michael Pachos for motivation and help.
+      * Make optional .h file available
+      * Allow > 2GB requests on 32bit systems.
+      * new WIN32 sbrk, mmap, munmap, lock code from <Walter@GeNeSys-e.de>.
+        Thanks also to Andreas Mueller <a.mueller at paradatec.de>,
+        and Anonymous.
+      * Allow override of MALLOC_ALIGNMENT (Thanks to Ruud Waij for
+        helping test this.)
+      * memalign: check alignment arg
+      * realloc: don't try to shift chunks backwards, since this
+        leads to  more fragmentation in some programs and doesn't
+        seem to help in any others.
+      * Collect all cases in malloc requiring system memory into sysmalloc
+      * Use mmap as backup to sbrk
+      * Place all internal state in malloc_state
+      * Introduce fastbins (although similar to 2.5.1)
+      * Many minor tunings and cosmetic improvements
+      * Introduce USE_PUBLIC_MALLOC_WRAPPERS, USE_MALLOC_LOCK
+      * Introduce MALLOC_FAILURE_ACTION, MORECORE_CONTIGUOUS
+        Thanks to Tony E. Bennett <tbennett@nvidia.com> and others.
+      * Include errno.h to support default failure action.
+
+    V2.6.6 Sun Dec  5 07:42:19 1999  Doug Lea  (dl at gee)
+      * return null for negative arguments
+      * Added several WIN32 cleanups from Martin C. Fong <mcfong at yahoo.com>
+         * Add 'LACKS_SYS_PARAM_H' for those systems without 'sys/param.h'
+          (e.g. WIN32 platforms)
+         * Cleanup header file inclusion for WIN32 platforms
+         * Cleanup code to avoid Microsoft Visual C++ compiler complaints
+         * Add 'USE_DL_PREFIX' to quickly allow co-existence with existing
+           memory allocation routines
+         * Set 'malloc_getpagesize' for WIN32 platforms (needs more work)
+         * Use 'assert' rather than 'ASSERT' in WIN32 code to conform to
+           usage of 'assert' in non-WIN32 code
+         * Improve WIN32 'sbrk()' emulation's 'findRegion()' routine to
+           avoid infinite loop
+      * Always call 'fREe()' rather than 'free()'
+
+    V2.6.5 Wed Jun 17 15:57:31 1998  Doug Lea  (dl at gee)
+      * Fixed ordering problem with boundary-stamping
+
+    V2.6.3 Sun May 19 08:17:58 1996  Doug Lea  (dl at gee)
+      * Added pvalloc, as recommended by H.J. Lu
+      * Added 64bit pointer support mainly from Wolfram Gloger
+      * Added anonymously donated WIN32 sbrk emulation
+      * Malloc, calloc, getpagesize: add optimizations from Raymond Nijssen
+      * malloc_extend_top: fix mask error that caused wastage after
+        foreign sbrks
+      * Add linux mremap support code from H.J. Lu
+
+    V2.6.2 Tue Dec  5 06:52:55 1995  Doug Lea  (dl at gee)
+      * Integrated most documentation with the code.
+      * Add support for mmap, with help from
+        Wolfram Gloger (Gloger@lrz.uni-muenchen.de).
+      * Use last_remainder in more cases.
+      * Pack bins using idea from  colin@nyx10.cs.du.edu
+      * Use ordered bins instead of best-fit threshold
+      * Eliminate block-local decls to simplify tracing and debugging.
+      * Support another case of realloc via move into top
+      * Fix error occurring when initial sbrk_base not word-aligned.
+      * Rely on page size for units instead of SBRK_UNIT to
+        avoid surprises about sbrk alignment conventions.
+      * Add mallinfo, mallopt. Thanks to Raymond Nijssen
+        (raymond@es.ele.tue.nl) for the suggestion.
+      * Add `pad' argument to malloc_trim and top_pad mallopt parameter.
+      * More precautions for cases where other routines call sbrk,
+        courtesy of Wolfram Gloger (Gloger@lrz.uni-muenchen.de).
+      * Added macros etc., allowing use in linux libc from
+        H.J. Lu (hjl@gnu.ai.mit.edu)
+      * Inverted this history list
+
+    V2.6.1 Sat Dec  2 14:10:57 1995  Doug Lea  (dl at gee)
+      * Re-tuned and fixed to behave more nicely with V2.6.0 changes.
+      * Removed all preallocation code since under current scheme
+        the work required to undo bad preallocations exceeds
+        the work saved in good cases for most test programs.
+      * No longer use return list or unconsolidated bins since
+        no scheme using them consistently outperforms those that don't
+        given above changes.
+      * Use best fit for very large chunks to prevent some worst-cases.
+      * Added some support for debugging
+
+    V2.6.0 Sat Nov  4 07:05:23 1995  Doug Lea  (dl at gee)
+      * Removed footers when chunks are in use. Thanks to
+        Paul Wilson (wilson@cs.texas.edu) for the suggestion.
+
+    V2.5.4 Wed Nov  1 07:54:51 1995  Doug Lea  (dl at gee)
+      * Added malloc_trim, with help from Wolfram Gloger
+        (wmglo@Dent.MED.Uni-Muenchen.DE).
+
+    V2.5.3 Tue Apr 26 10:16:01 1994  Doug Lea  (dl at g)
+
+    V2.5.2 Tue Apr  5 16:20:40 1994  Doug Lea  (dl at g)
+      * realloc: try to expand in both directions
+      * malloc: swap order of clean-bin strategy;
+      * realloc: only conditionally expand backwards
+      * Try not to scavenge used bins
+      * Use bin counts as a guide to preallocation
+      * Occasionally bin return list chunks in first scan
+      * Add a few optimizations from colin@nyx10.cs.du.edu
+
+    V2.5.1 Sat Aug 14 15:40:43 1993  Doug Lea  (dl at g)
+      * faster bin computation & slightly different binning
+      * merged all consolidations to one part of malloc proper
+         (eliminating old malloc_find_space & malloc_clean_bin)
+      * Scan 2 returns chunks (not just 1)
+      * Propagate failure in realloc if malloc returns 0
+      * Add stuff to allow compilation on non-ANSI compilers
+          from kpv@research.att.com
+
+    V2.5 Sat Aug  7 07:41:59 1993  Doug Lea  (dl at g.oswego.edu)
+      * removed potential for odd address access in prev_chunk
+      * removed dependency on getpagesize.h
+      * misc cosmetics and a bit more internal documentation
+      * anticosmetics: mangled names in macros to evade debugger strangeness
+      * tested on sparc, hp-700, dec-mips, rs6000
+          with gcc & native cc (hp, dec only) allowing
+          Detlefs & Zorn comparison study (in SIGPLAN Notices.)
+
+    Trial version Fri Aug 28 13:14:29 1992  Doug Lea  (dl at g.oswego.edu)
+      * Based loosely on libg++-1.2X malloc. (It retains some of the overall
+         structure of old version,  but most details differ.)
+
+/* ──────────────────── Alternative MORECORE functions ─────────────────── */
+
+/*
+  Guidelines for creating a custom version of MORECORE:
+
+  * For best performance, MORECORE should allocate in multiples of pagesize.
+  * MORECORE may allocate more memory than requested. (Or even less,
+      but this will usually result in a malloc failure.)
+  * MORECORE must not allocate memory when given argument zero, but
+      instead return one past the end address of memory from previous
+      nonzero call.
+  * For best performance, consecutive calls to MORECORE with positive
+      arguments should return increasing addresses, indicating that
+      space has been contiguously extended.
+  * Even though consecutive calls to MORECORE need not return contiguous
+      addresses, it must be OK for malloc'ed chunks to span multiple
+      regions in those cases where they do happen to be contiguous.
+  * MORECORE need not handle negative arguments -- it may instead
+      just return MFAIL when given negative arguments.
+      Negative arguments are always multiples of pagesize. MORECORE
+      must not misinterpret negative args as large positive unsigned
+      args. You can suppress all such calls entirely by defining
+      MORECORE_CANNOT_TRIM.
+
+  As an example alternative MORECORE, here is a custom allocator
+  kindly contributed for pre-OSX macOS.  It uses virtually but not
+  necessarily physically contiguous non-paged memory (locked in,
+  present and won't get swapped out).  You can use it by uncommenting
+  this section, adding some #includes, and setting up the appropriate
+  defines above:
+
+      #define MORECORE osMoreCore
+
+  There is also a shutdown routine that should somehow be called for
+  cleanup upon program exit.
+
+  #define MAX_POOL_ENTRIES 100
+  #define MINIMUM_MORECORE_SIZE  (64 * 1024U)
+  static int next_os_pool;
+  void *our_os_pools[MAX_POOL_ENTRIES];
+
+  void *osMoreCore(int size)
+  {
+    void *ptr = 0;
+    static void *sbrk_top = 0;
+
+    if (size > 0)
+    {
+      if (size < MINIMUM_MORECORE_SIZE)
+         size = MINIMUM_MORECORE_SIZE;
+      if (CurrentExecutionLevel() == kTaskLevel)
+         ptr = PoolAllocateResident(size + RM_PAGE_SIZE, 0);
+      if (ptr == 0)
+      {
+        return (void *) MFAIL;
+      }
+      // save ptrs so they can be freed during cleanup
+      our_os_pools[next_os_pool] = ptr;
+      next_os_pool++;
+      ptr = (void *) ((((size_t) ptr) + RM_PAGE_MASK) & ~RM_PAGE_MASK);
+      sbrk_top = (char *) ptr + size;
+      return ptr;
+    }
+    else if (size < 0)
+    {
+      // we don't currently support shrink behavior
+      return (void *) MFAIL;
+    }
+    else
+    {
+      return sbrk_top;
+    }
+  }
+
+  // cleanup any allocated memory pools
+  // called as last thing before shutting down driver
+
+  void osCleanupMem(void)
+  {
+    void **ptr;
+
+    for (ptr = our_os_pools; ptr < &our_os_pools[MAX_POOL_ENTRIES]; ptr++)
+      if (*ptr)
+      {
+         PoolDeallocate(*ptr);
+         *ptr = 0;
+      }
+  }
+
+*/
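Note on the MORECORE guidelines in the README above: the sample osMoreCore targets a classic Mac OS driver API, so here is a minimal portable sketch of the same contract, assuming a fixed static arena. The names sketch_morecore, SKETCH_MFAIL and SKETCH_ARENA_SIZE are hypothetical and not part of dlmalloc; SKETCH_MFAIL is assumed to mirror dlmalloc's MFAIL (an all-ones pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors dlmalloc's MFAIL, an all-ones pointer value. */
#define SKETCH_MFAIL ((void *)(~(size_t)0))
/* Hypothetical arena size; pick a multiple of the page size in practice. */
#define SKETCH_ARENA_SIZE ((size_t)1 << 20)

static unsigned char sketch_arena[SKETCH_ARENA_SIZE];
static size_t sketch_used; /* bytes handed out so far */

/* The argument is signed so negative requests are never mistaken
   for huge positive ones. */
void *sketch_morecore(ptrdiff_t size) {
  assert(sketch_used <= SKETCH_ARENA_SIZE);
  /* Zero: allocate nothing, report one past the end of the last region. */
  if (size == 0) return sketch_arena + sketch_used;
  /* Negative: shrinking is not supported in this sketch. */
  if (size < 0) return SKETCH_MFAIL;
  /* Positive: fail if the arena cannot satisfy the request. */
  if ((size_t)size > SKETCH_ARENA_SIZE - sketch_used) return SKETCH_MFAIL;
  void *region = sketch_arena + sketch_used;
  sketch_used += (size_t)size; /* successive regions are contiguous */
  return region;
}
```

Wiring this in would presumably follow the README's own recipe, a `#define MORECORE sketch_morecore`, together with MORECORE_CANNOT_TRIM since the sketch never shrinks.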
--- a/third_party/dlmalloc/README.cosmo
+++ b/third_party/dlmalloc/README.cosmo
@ -0,0 +1,15 @@
+Numerous local changes were made while vendoring Doug Lea's original
+dlmalloc sources. Those changes basically boil down to:
+
+  1. Fewer #ifdefs
+  2. More modules (so linker can do a better job)
+  3. Delete code we don't need (cf. Knight Capital)
+  4. Readability / stylistic consistency
+
+Since we haven't made any genuine improvements to Doug Lea's legendary
+allocator, we feel this folder faithfully presents his intended work, in
+harmony with Cosmopolitan conventions.
+
+The only deleted code we're sure has compelling merit is the mspace
+functionality. If we ever need memory pools, they might be more
+appropriately vendored under //third_party/dlmalloc_mspace.
--- a/third_party/dlmalloc/bulk_free.c
+++ b/third_party/dlmalloc/bulk_free.c
@ -0,0 +1,56 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/**
+ * Frees and clears (sets to NULL) each non-null pointer in the given
+ * array. This is likely to be faster than freeing them one-by-one. If
+ * footers are used, pointers that have been allocated in different
+ * mspaces are not freed or cleared, and the count of all such pointers
+ * is returned. For large arrays of pointers with poor locality, it may
+ * be worthwhile to sort this array before calling bulk_free.
+ */
+size_t bulk_free(void *array[], size_t nelem) {
+  /*
+   * Try to free all pointers in the given array. Note: this could be
+   * made faster, by delaying consolidation, at the price of disabling
+   * some user integrity checks. We still optimize some consolidations
+   * by combining adjacent chunks before freeing, which will occur often
+   * if allocated with ialloc or the array is sorted.
+   */
+  size_t unfreed = 0;
+  if (!PREACTION(gm)) {
+    void **a;
+    void **fence = &(array[nelem]);
+    for (a = array; a != fence; ++a) {
+      void *mem = *a;
+      if (mem != 0) {
+        mchunkptr p = mem2chunk(ADDRESS_DEATH_ACTION(mem));
+        size_t psize = chunksize(p);
+#if FOOTERS
+        if (get_mstate_for(p) != gm) {
+          ++unfreed;
+          continue;
+        }
+#endif
+        check_inuse_chunk(gm, p);
+        *a = 0;
+        if (RTCHECK(ok_address(gm, p) && ok_inuse(p))) {
+          void **b = a + 1; /* try to merge with next chunk */
+          mchunkptr next = next_chunk(p);
+          if (b != fence && *b == chunk2mem(next)) {
+            size_t newsize = chunksize(next) + psize;
+            set_inuse(gm, p, newsize);
+            *b = chunk2mem(p);
+          } else
+            dlmalloc_dispose_chunk(gm, p, psize);
+        } else {
+          CORRUPTION_ERROR_ACTION(gm);
+          break;
+        }
+      }
+    }
+    if (should_trim(gm, gm->topsize)) dlmalloc_sys_trim(gm, 0);
+    POSTACTION(gm);
+  }
+  return unfreed;
+}
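Note on the bulk_free doc comment above: it suggests sorting a large, poorly localized pointer array before the call, so that neighbouring entries are more likely to be adjacent chunks and take the merge path. A minimal sketch of that pre-sort in standard C follows. sort_pointers_by_address is a hypothetical helper, not part of dlmalloc, and comparing pointers through uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Orders two array slots by the numeric value of the pointers they hold. */
static int compare_addresses(const void *lhs, const void *rhs) {
  uintptr_t a = (uintptr_t)*(void *const *)lhs;
  uintptr_t b = (uintptr_t)*(void *const *)rhs;
  return (a > b) - (a < b); /* avoids overflow from subtracting */
}

/* Sorts the pointer array in ascending address order, in place. */
void sort_pointers_by_address(void *array[], size_t nelem) {
  assert(array != NULL || nelem == 0);
  if (nelem > 1) qsort(array, nelem, sizeof(array[0]), compare_addresses);
}
```

A caller would then pass the sorted array to bulk_free(array, nelem). On the assumed flat address space, null entries sort to the front, and bulk_free skips them anyway.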
--- a/third_party/dlmalloc/dlindependent_calloc.c
+++ b/third_party/dlmalloc/dlindependent_calloc.c
@ -0,0 +1,228 @@
+#include "libc/mem/mem.h"
+#include "libc/str/str.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/*
+  Common support for independent_X routines, handling
+    all of the combinations that can result.
+  The opts arg has:
+    bit 0 set if all elements are same size (using sizes[0])
+    bit 1 set if elements should be zeroed
+*/
+static void **ialloc(mstate m, size_t n_elements, size_t *sizes, int opts,
+                     void *chunks[]) {
+  size_t element_size;   /* chunksize of each element, if all same */
+  size_t contents_size;  /* total size of elements */
+  size_t array_size;     /* request size of pointer array */
+  void *mem;             /* malloced aggregate space */
+  mchunkptr p;           /* corresponding chunk */
+  size_t remainder_size; /* remaining bytes while splitting */
+  void **marray;         /* either "chunks" or malloced ptr array */
+  mchunkptr array_chunk; /* chunk for malloced ptr array */
+  flag_t was_enabled;    /* to disable mmap */
+  size_t size;
+  size_t i;
+
+  ensure_initialization();
+  /* compute array length, if needed */
+  if (chunks != 0) {
+    if (n_elements == 0) return chunks; /* nothing to do */
+    marray = chunks;
+    array_size = 0;
+  } else {
+    /* if empty req, must still return chunk representing empty array */
+    if (n_elements == 0) return (void **)dlmalloc(0);
+    marray = 0;
+    array_size = request2size(n_elements * (sizeof(void *)));
+  }
+
+  /* compute total element size */
+  if (opts & 0x1) { /* all-same-size */
+    element_size = request2size(*sizes);
+    contents_size = n_elements * element_size;
+  } else { /* add up all the sizes */
+    element_size = 0;
+    contents_size = 0;
+    for (i = 0; i != n_elements; ++i) contents_size += request2size(sizes[i]);
+  }
+
+  size = contents_size + array_size;
+
+  /*
+     Allocate the aggregate chunk.  First disable direct-mmapping so
+     malloc won't use it, since we would not be able to later
+     free/realloc space internal to a segregated mmap region.
+  */
+  was_enabled = use_mmap(m);
+  disable_mmap(m);
+  mem = dlmalloc(size - CHUNK_OVERHEAD);
+  if (was_enabled) enable_mmap(m);
+  if (mem == 0) return 0;
+
+  if (PREACTION(m)) return 0;
+  p = mem2chunk(mem);
+  remainder_size = chunksize(p);
+
+  assert(!is_mmapped(p));
+
+  if (opts & 0x2) { /* optionally clear the elements */
+    memset((size_t *)mem, 0, remainder_size - SIZE_T_SIZE - array_size);
+  }
+
+  /* If not provided, allocate the pointer array as final part of chunk */
+  if (marray == 0) {
+    size_t array_chunk_size;
+    array_chunk = chunk_plus_offset(p, contents_size);
+    array_chunk_size = remainder_size - contents_size;
+    marray = ADDRESS_BIRTH_ACTION((void **)(chunk2mem(array_chunk)));
+    set_size_and_pinuse_of_inuse_chunk(m, array_chunk, array_chunk_size);
+    remainder_size = contents_size;
+  }
+
+  /* split out elements */
+  for (i = 0;; ++i) {
+    marray[i] = ADDRESS_BIRTH_ACTION(chunk2mem(p));
+    if (i != n_elements - 1) {
+      if (element_size != 0)
+        size = element_size;
+      else
+        size = request2size(sizes[i]);
+      remainder_size -= size;
+      set_size_and_pinuse_of_inuse_chunk(m, p, size);
+      p = chunk_plus_offset(p, size);
+    } else { /* the final element absorbs any overallocation slop */
+      set_size_and_pinuse_of_inuse_chunk(m, p, remainder_size);
+      break;
+    }
+  }
+
+#if DEBUG + MODE_DBG + 0
+  if (marray != chunks) {
+    /* final element must have exactly exhausted chunk */
+    if (element_size != 0) {
+      assert(remainder_size == element_size);
+    } else {
+      assert(remainder_size == request2size(sizes[i]));
+    }
+    check_inuse_chunk(m, mem2chunk(marray));
+  }
+  for (i = 0; i != n_elements; ++i) check_inuse_chunk(m, mem2chunk(marray[i]));
+
+#endif /* DEBUG */
+
+  POSTACTION(m);
+  return marray;
+}
+
+/**
+ * independent_calloc(size_t n_elements, size_t elem_size, void* chunks[]);
+ *
+ * independent_calloc is similar to calloc, but instead of returning a
+ * single cleared space, it returns an array of pointers to n_elements
+ * independent elements that can hold contents of size elem_size, each
+ * of which starts out cleared, and can be independently freed,
+ * realloc'ed etc. The elements are guaranteed to be adjacently
+ * allocated (this is not guaranteed to occur with multiple callocs or
+ * mallocs), which may also improve cache locality in some applications.
+ *
+ * The "chunks" argument is optional (i.e., may be null, which is
+ * probably the most typical usage). If it is null, the returned array
+ * is itself dynamically allocated and should also be freed when it is
+ * no longer needed. Otherwise, the chunks array must be of at least
+ * n_elements in length. It is filled in with the pointers to the
+ * chunks.
+ *
+ * In either case, independent_calloc returns this pointer array, or
+ * null if the allocation failed. If n_elements is zero and "chunks"
+ * is null, it returns a chunk representing an array with zero elements
+ * (which should be freed if not wanted).
+ *
+ * Each element must be freed when it is no longer needed. This can be
+ * done all at once using bulk_free.
+ *
+ * independent_calloc simplifies and speeds up implementations of many
+ * kinds of pools. It may also be useful when constructing large data
+ * structures that initially have a fixed number of fixed-sized nodes,
+ * but the number is not known at compile time, and some of the nodes
+ * may later need to be freed. For example:
+ *
+ *   struct Node { int item; struct Node* next; };
+ *   struct Node* build_list() {
+ *     struct Node **pool;
+ *     int n = read_number_of_nodes_needed();
+ *     if (n <= 0) return 0;
+ *     pool = (struct Node**)independent_calloc(n, sizeof(struct Node), 0);
+ *     if (pool == 0) die();
+ *     // organize into a linked list...
+ *     struct Node* first = pool[0];
+ *     for (int i = 0; i < n-1; ++i)
+ *       pool[i]->next = pool[i+1];
+ *     free(pool);  // Can now free the array (or not, if it is needed later)
+ *     return first;
+ *   }
+ */
+void **dlindependent_calloc(size_t n_elements, size_t elem_size,
+                            void *chunks[]) {
+  size_t sz = elem_size; /* serves as 1-element array */
+  return ialloc(gm, n_elements, &sz, 3, chunks);
+}
+
+/**
+ * independent_comalloc(size_t n_elements, size_t sizes[], void* chunks[]);
+ *
+ * independent_comalloc allocates, all at once, a set of n_elements
+ * chunks with sizes indicated in the "sizes" array. It returns an array
+ * of pointers to these elements, each of which can be independently
+ * freed, realloc'ed etc. The elements are guaranteed to be adjacently
+ * allocated (this is not guaranteed to occur with multiple callocs or
+ * mallocs), which may also improve cache locality in some applications.
+ *
+ * The "chunks" argument is optional (i.e., may be null). If it is null
+ * the returned array is itself dynamically allocated and should also
+ * be freed when it is no longer needed. Otherwise, the chunks array
+ * must be of at least n_elements in length. It is filled in with the
+ * pointers to the chunks.
+ *
+ * In either case, independent_comalloc returns this pointer array, or
+ * null if the allocation failed.  If n_elements is zero and chunks is
+ * null, it returns a chunk representing an array with zero elements
+ * (which should be freed if not wanted).
+ *
+ * Each element must be freed when it is no longer needed. This can be
+ * done all at once using bulk_free.
+ *
+ * independent_comalloc differs from independent_calloc in that each
+ * element may have a different size, and also that it does not
+ * automatically clear elements.
+ *
+ * independent_comalloc can be used to speed up allocation in cases
+ * where several structs or objects must always be allocated at the
+ * same time.  For example:
+ *
+ *   struct Head { ... };
+ *   struct Foot { ... };
+ *
+ *   void send_message(char* msg) {
+ *     size_t msglen = strlen(msg);
+ *     size_t sizes[3] = { sizeof(struct Head), msglen, sizeof(struct Foot) };
+ *     void* chunks[3];
+ *     if (independent_comalloc(3, sizes, chunks) == 0)
+ *       die();
+ *     struct Head* head = (struct Head*)(chunks[0]);
+ *     char*        body = (char*)(chunks[1]);
+ *     struct Foot* foot = (struct Foot*)(chunks[2]);
+ *     // ...
+ *   }
+ *
+ * In general though, independent_comalloc is worth using only for
+ * larger values of n_elements. For small values, you probably won't
+ * detect enough difference from series of malloc calls to bother.
+ *
+ * Overuse of independent_comalloc can increase overall memory usage,
+ * since it cannot reuse existing noncontiguous small chunks that might
+ * be available for some of the elements.
+ */
+void **dlindependent_comalloc(size_t n_elements, size_t sizes[],
+                              void *chunks[]) {
+  return ialloc(gm, n_elements, sizes, 0, chunks);
+}
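Note on ialloc above: it sizes every element up front, takes one aggregate allocation, and then splits it into adjacent pieces. The sketch below shows only that size-then-carve arithmetic on top of plain malloc. It is not dlmalloc's implementation: sketch_carve, sketch_round_up and SKETCH_ALIGN are hypothetical, the rounding merely stands in for request2size, and unlike independent_comalloc the resulting pieces cannot be freed individually, only through the returned base pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed element alignment; a stand-in for dlmalloc's chunk granularity. */
#define SKETCH_ALIGN ((size_t)16)

/* Rounds a request up to the next multiple of SKETCH_ALIGN (minimum one unit). */
static size_t sketch_round_up(size_t n) {
  if (n == 0) n = 1;
  return (n + (SKETCH_ALIGN - 1)) & ~(SKETCH_ALIGN - 1);
}

/* Allocates one block big enough for all elements and fills chunks[]
   with adjacent pointers into it. Returns the block (free it to release
   everything) or NULL on failure. Overflow checks are omitted. */
void *sketch_carve(size_t n_elements, const size_t sizes[], void *chunks[]) {
  size_t total = 0;
  for (size_t i = 0; i != n_elements; ++i) total += sketch_round_up(sizes[i]);

  char *base = malloc(total ? total : 1);
  if (base == NULL) return NULL;

  size_t offset = 0;
  for (size_t i = 0; i != n_elements; ++i) {
    chunks[i] = base + offset; /* element i starts where i-1 ended */
    offset += sketch_round_up(sizes[i]);
  }
  assert(offset == total); /* elements exactly exhaust the block */
  return base;
}
```

The real routine additionally writes a chunk header per element, which is what lets each one be passed to free or realloc on its own.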
--- a/third_party/dlmalloc/dlmalloc-debug.c
+++ b/third_party/dlmalloc/dlmalloc-debug.c
@ -0,0 +1,247 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/* Check properties of any chunk, whether free, inuse, mmapped etc  */
+static void do_check_any_chunk(mstate m, mchunkptr p) {
+  assert((is_aligned(chunk2mem(p))) || (p->head == FENCEPOST_HEAD));
+  assert(ok_address(m, p));
+}
+
+/* Check properties of top chunk */
+void do_check_top_chunk(mstate m, mchunkptr p) {
+  msegmentptr sp = segment_holding(m, (char*)p);
+  size_t sz = p->head & ~INUSE_BITS; /* third-lowest bit can be set! */
+  assert(sp != 0);
+  assert((is_aligned(chunk2mem(p))) || (p->head == FENCEPOST_HEAD));
+  assert(ok_address(m, p));
+  assert(sz == m->topsize);
+  assert(sz > 0);
+  assert(sz == ((sp->base + sp->size) - (char*)p) - TOP_FOOT_SIZE);
+  assert(pinuse(p));
+  assert(!pinuse(chunk_plus_offset(p, sz)));
+}
+
+/* Check properties of (inuse) mmapped chunks */
+void do_check_mmapped_chunk(mstate m, mchunkptr p) {
+  size_t sz = chunksize(p);
+  size_t len = (sz + (p->prev_foot) + MMAP_FOOT_PAD);
+  assert(is_mmapped(p));
+  assert(use_mmap(m));
+  assert((is_aligned(chunk2mem(p))) || (p->head == FENCEPOST_HEAD));
+  assert(ok_address(m, p));
+  assert(!is_small(sz));
+  assert((len & (mparams.page_size - SIZE_T_ONE)) == 0);
+  assert(chunk_plus_offset(p, sz)->head == FENCEPOST_HEAD);
+  assert(chunk_plus_offset(p, sz + SIZE_T_SIZE)->head == 0);
+}
+
+/* Check properties of inuse chunks */
+void do_check_inuse_chunk(mstate m, mchunkptr p) {
+  do_check_any_chunk(m, p);
+  assert(is_inuse(p));
+  assert(next_pinuse(p));
+  /* If not pinuse and not mmapped, previous chunk has OK offset */
+  assert(is_mmapped(p) || pinuse(p) || next_chunk(prev_chunk(p)) == p);
+  if (is_mmapped(p)) do_check_mmapped_chunk(m, p);
+}
+
+/* Check properties of free chunks */
+void do_check_free_chunk(mstate m, mchunkptr p) {
+  size_t sz = chunksize(p);
+  mchunkptr next = chunk_plus_offset(p, sz);
+  do_check_any_chunk(m, p);
+  assert(!is_inuse(p));
+  assert(!next_pinuse(p));
+  assert(!is_mmapped(p));
+  if (p != m->dv && p != m->top) {
+    if (sz >= MIN_CHUNK_SIZE) {
+      assert((sz & CHUNK_ALIGN_MASK) == 0);
+      assert(is_aligned(chunk2mem(p)));
+      assert(next->prev_foot == sz);
+      assert(pinuse(p));
+      assert(next == m->top || is_inuse(next));
+      assert(p->fd->bk == p);
+      assert(p->bk->fd == p);
+    } else /* markers are always of size SIZE_T_SIZE */
+      assert(sz == SIZE_T_SIZE);
+  }
+}
+
+/* Check properties of malloced chunks at the point they are malloced */
+void do_check_malloced_chunk(mstate m, void* mem, size_t s) {
+  if (mem != 0) {
+    mchunkptr p = mem2chunk(mem);
+    size_t sz = p->head & ~INUSE_BITS;
+    do_check_inuse_chunk(m, p);
+    assert((sz & CHUNK_ALIGN_MASK) == 0);
+    assert(sz >= MIN_CHUNK_SIZE);
+    assert(sz >= s);
+    /* unless mmapped, size is less than MIN_CHUNK_SIZE more than request */
+    assert(is_mmapped(p) || sz < (s + MIN_CHUNK_SIZE));
+  }
+}
+
+/* Check a tree and its subtrees.  */
+static void do_check_tree(mstate m, tchunkptr t) {
+  tchunkptr head = 0;
+  tchunkptr u = t;
+  bindex_t tindex = t->index;
+  size_t tsize = chunksize(t);
+  bindex_t idx;
+  compute_tree_index(tsize, idx);
+  assert(tindex == idx);
+  assert(tsize >= MIN_LARGE_SIZE);
+  assert(tsize >= minsize_for_tree_index(idx));
+  assert((idx == NTREEBINS - 1) || (tsize < minsize_for_tree_index((idx + 1))));
+
+  do { /* traverse through chain of same-sized nodes */
+    do_check_any_chunk(m, ((mchunkptr)u));
+    assert(u->index == tindex);
+    assert(chunksize(u) == tsize);
+    assert(!is_inuse(u));
+    assert(!next_pinuse(u));
+    assert(u->fd->bk == u);
+    assert(u->bk->fd == u);
+    if (u->parent == 0) {
+      assert(u->child[0] == 0);
+      assert(u->child[1] == 0);
+    } else {
+      assert(head == 0); /* only one node on chain has parent */
+      head = u;
+      assert(u->parent != u);
+      assert(u->parent->child[0] == u || u->parent->child[1] == u ||
+             *((tbinptr*)(u->parent)) == u);
+      if (u->child[0] != 0) {
+        assert(u->child[0]->parent == u);
+        assert(u->child[0] != u);
+        do_check_tree(m, u->child[0]);
+      }
+      if (u->child[1] != 0) {
+        assert(u->child[1]->parent == u);
+        assert(u->child[1] != u);
+        do_check_tree(m, u->child[1]);
+      }
+      if (u->child[0] != 0 && u->child[1] != 0) {
+        assert(chunksize(u->child[0]) < chunksize(u->child[1]));
+      }
+    }
+    u = u->fd;
+  } while (u != t);
+  assert(head != 0);
+}
+
+/*  Check all the chunks in a treebin.  */
+static void do_check_treebin(mstate m, bindex_t i) {
+  tbinptr* tb = treebin_at(m, i);
+  tchunkptr t = *tb;
+  int empty = (m->treemap & (1U << i)) == 0;
+  if (t == 0) assert(empty);
+  if (!empty) do_check_tree(m, t);
+}
+
+/*  Check all the chunks in a smallbin.  */
+static void do_check_smallbin(mstate m, bindex_t i) {
+  sbinptr b = smallbin_at(m, i);
+  mchunkptr p = b->bk;
+  unsigned int empty = (m->smallmap & (1U << i)) == 0;
+  if (p == b) assert(empty);
+  if (!empty) {
+    for (; p != b; p = p->bk) {
+      size_t size = chunksize(p);
+      mchunkptr q;
+      /* each chunk claims to be free */
+      do_check_free_chunk(m, p);
+      /* chunk belongs in bin */
+      assert(small_index(size) == i);
+      assert(p->bk == b || chunksize(p->bk) == chunksize(p));
+      /* chunk is followed by an inuse chunk */
+      q = next_chunk(p);
+      if (q->head != FENCEPOST_HEAD) do_check_inuse_chunk(m, q);
+    }
+  }
+}
+
+/* Find x in a bin. Used in other check functions. */
+static int bin_find(mstate m, mchunkptr x) {
+  size_t size = chunksize(x);
+  if (is_small(size)) {
+    bindex_t sidx = small_index(size);
+    sbinptr b = smallbin_at(m, sidx);
+    if (smallmap_is_marked(m, sidx)) {
+      mchunkptr p = b;
+      do {
+        if (p == x) return 1;
+      } while ((p = p->fd) != b);
+    }
+  } else {
+    bindex_t tidx;
+    compute_tree_index(size, tidx);
+    if (treemap_is_marked(m, tidx)) {
+      tchunkptr t = *treebin_at(m, tidx);
+      size_t sizebits = size << leftshift_for_tree_index(tidx);
+      while (t != 0 && chunksize(t) != size) {
+        t = t->child[(sizebits >> (SIZE_T_BITSIZE - SIZE_T_ONE)) & 1];
+        sizebits <<= 1;
+      }
+      if (t != 0) {
+        tchunkptr u = t;
+        do {
+          if (u == (tchunkptr)x) return 1;
+        } while ((u = u->fd) != t);
+      }
+    }
+  }
+  return 0;
+}
+
+/* Traverse each chunk and check it; return total */
+static size_t traverse_and_check(mstate m) {
+  size_t sum = 0;
+  if (is_initialized(m)) {
+    msegmentptr s = &m->seg;
+    sum += m->topsize + TOP_FOOT_SIZE;
+    while (s != 0) {
+      mchunkptr q = align_as_chunk(s->base);
+      mchunkptr lastq = 0;
+      assert(pinuse(q));
+      while (segment_holds(s, q) && q != m->top && q->head != FENCEPOST_HEAD) {
+        sum += chunksize(q);
+        if (is_inuse(q)) {
+          assert(!bin_find(m, q));
+          do_check_inuse_chunk(m, q);
+        } else {
+          assert(q == m->dv || bin_find(m, q));
+          assert(lastq == 0 || is_inuse(lastq)); /* Not 2 consecutive free */
+          do_check_free_chunk(m, q);
+        }
+        lastq = q;
+        q = next_chunk(q);
+      }
+      s = s->next;
+    }
+  }
+  return sum;
+}
+
+/* Check all properties of malloc_state. */
+void do_check_malloc_state(mstate m) {
+  bindex_t i;
+  size_t total;
+  /* check bins */
+  for (i = 0; i < NSMALLBINS; ++i) do_check_smallbin(m, i);
+  for (i = 0; i < NTREEBINS; ++i) do_check_treebin(m, i);
+  if (m->dvsize != 0) { /* check dv chunk */
+    do_check_any_chunk(m, m->dv);
+    assert(m->dvsize == chunksize(m->dv));
+    assert(m->dvsize >= MIN_CHUNK_SIZE);
+    assert(bin_find(m, m->dv) == 0);
+  }
+  if (m->top != 0) { /* check top chunk */
+    do_check_top_chunk(m, m->top);
+    /*assert(m->topsize == chunksize(m->top)); redundant */
+    assert(m->topsize > 0);
+    assert(bin_find(m, m->top) == 0);
+  }
+  total = traverse_and_check(m);
+  assert(total <= m->footprint);
+  assert(m->footprint <= m->max_footprint);
+}
--- a/third_party/dlmalloc/dlmalloc-usable.c
+++ b/third_party/dlmalloc/dlmalloc-usable.c
@ -0,0 +1,10 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+size_t dlmalloc_usable_size(const void* mem) {
+  if (mem != 0) {
+    mchunkptr p = mem2chunk(mem);
+    if (is_inuse(p)) return chunksize(p) - overhead_for(p);
+  }
+  return 0;
+}
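Note on dlmalloc_usable_size above: the result is the chunk size decoded from the head word minus a per-chunk overhead. The sketch below models that decoding on a bare size_t. The bit values and overheads are an assumed reading of dlmalloc 2.8.x built without FOOTERS (PINUSE_BIT 1, CINUSE_BIT 2, FLAG4_BIT 4, and mmapped chunks marked by both in-use bits being clear, per the 2.8.4 changelog entry "Use zeros instead of prev foot for is_mmapped"). All sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag layout in the low bits of a chunk's head word. */
#define SK_PINUSE_BIT ((size_t)1) /* previous chunk in use */
#define SK_CINUSE_BIT ((size_t)2) /* this chunk in use */
#define SK_FLAG4_BIT  ((size_t)4) /* spare third-lowest bit */
#define SK_INUSE_BITS (SK_PINUSE_BIT | SK_CINUSE_BIT)
#define SK_FLAG_BITS  (SK_INUSE_BITS | SK_FLAG4_BIT)

/* Size is the head word with all flag bits masked off. */
static size_t sk_chunksize(size_t head) { return head & ~SK_FLAG_BITS; }
/* Assumed: a directly mmapped chunk has both in-use bits clear. */
static int sk_is_mmapped(size_t head) { return (head & SK_INUSE_BITS) == 0; }
/* Assumed: only the pattern "PINUSE set, CINUSE clear" means free. */
static int sk_is_inuse(size_t head) {
  return (head & SK_INUSE_BITS) != SK_PINUSE_BIT;
}

/* Usable bytes for a chunk with this head word, or 0 if it is free. */
size_t sk_usable_size(size_t head) {
  if (!sk_is_inuse(head)) return 0;
  /* Assumed overheads: one size_t normally, two for mmapped chunks. */
  size_t overhead = sk_is_mmapped(head) ? 2 * sizeof(size_t) : sizeof(size_t);
  size_t size = sk_chunksize(head);
  assert(size >= overhead);
  return size - overhead;
}
```

This is why the usable size can exceed the size originally requested: the allocator rounds requests up to its chunk granularity, and everything past the header belongs to the caller.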
--- a/third_party/dlmalloc/dlmalloc.c
+++ b/third_party/dlmalloc/dlmalloc.c
--- a/third_party/dlmalloc/dlmalloc.h
+++ b/third_party/dlmalloc/dlmalloc.h
--- a/third_party/dlmalloc/dlmalloc.mk
+++ b/third_party/dlmalloc/dlmalloc.mk
@ -0,0 +1,60 @@
+#-*-mode:makefile-gmake;indent-tabs-mode:t;tab-width:8;coding:utf-8-*-┐
+#───vi: set et ft=make ts=8 tw=8 fenc=utf-8 :vi───────────────────────┘
+
+PKGS += THIRD_PARTY_DLMALLOC
+
+THIRD_PARTY_DLMALLOC_ARTIFACTS += THIRD_PARTY_DLMALLOC_A
+THIRD_PARTY_DLMALLOC = $(THIRD_PARTY_DLMALLOC_A_DEPS) $(THIRD_PARTY_DLMALLOC_A)
+THIRD_PARTY_DLMALLOC_A = o/$(MODE)/third_party/dlmalloc/dlmalloc.a
+THIRD_PARTY_DLMALLOC_A_FILES := $(wildcard third_party/dlmalloc/*)
+THIRD_PARTY_DLMALLOC_A_HDRS = $(filter %.h,$(THIRD_PARTY_DLMALLOC_A_FILES))
+THIRD_PARTY_DLMALLOC_A_SRCS_S = $(filter %.S,$(THIRD_PARTY_DLMALLOC_A_FILES))
+THIRD_PARTY_DLMALLOC_A_SRCS_C = $(filter %.c,$(THIRD_PARTY_DLMALLOC_A_FILES))
+
+THIRD_PARTY_DLMALLOC_A_SRCS =					\
+	$(THIRD_PARTY_DLMALLOC_A_SRCS_S)			\
+	$(THIRD_PARTY_DLMALLOC_A_SRCS_C)
+
+THIRD_PARTY_DLMALLOC_A_OBJS =					\
+	$(THIRD_PARTY_DLMALLOC_A_SRCS:%=o/$(MODE)/%.zip.o)	\
+	$(THIRD_PARTY_DLMALLOC_A_SRCS_S:%.S=o/$(MODE)/%.o)	\
+	$(THIRD_PARTY_DLMALLOC_A_SRCS_C:%.c=o/$(MODE)/%.o)
+
+THIRD_PARTY_DLMALLOC_A_CHECKS =					\
+	$(THIRD_PARTY_DLMALLOC_A).pkg				\
+	$(THIRD_PARTY_DLMALLOC_A_HDRS:%=o/$(MODE)/%.ok)
+
+THIRD_PARTY_DLMALLOC_A_DIRECTDEPS =				\
+	LIBC_CALLS						\
+	LIBC_CONV						\
+	LIBC_FMT						\
+	LIBC_NEXGEN32E						\
+	LIBC_RUNTIME						\
+	LIBC_STR						\
+	LIBC_STUBS						\
+	LIBC_SYSV						\
+	LIBC_SYSV_CALLS
+
+THIRD_PARTY_DLMALLOC_A_DEPS :=					\
+	$(call uniq,$(foreach x,$(THIRD_PARTY_DLMALLOC_A_DIRECTDEPS),$($(x))))
+
+$(THIRD_PARTY_DLMALLOC_A):					\
+		third_party/dlmalloc/				\
+		$(THIRD_PARTY_DLMALLOC_A).pkg			\
+		$(THIRD_PARTY_DLMALLOC_A_OBJS)
+
+$(THIRD_PARTY_DLMALLOC_A).pkg:					\
+		$(THIRD_PARTY_DLMALLOC_A_OBJS)			\
+		$(foreach x,$(THIRD_PARTY_DLMALLOC_A_DIRECTDEPS),$($(x)_A).pkg)
+
+THIRD_PARTY_DLMALLOC_LIBS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)))
+THIRD_PARTY_DLMALLOC_SRCS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_SRCS))
+THIRD_PARTY_DLMALLOC_HDRS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_HDRS))
+THIRD_PARTY_DLMALLOC_BINS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_BINS))
+THIRD_PARTY_DLMALLOC_CHECKS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_CHECKS))
+THIRD_PARTY_DLMALLOC_OBJS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_OBJS))
+THIRD_PARTY_DLMALLOC_TESTS = $(foreach x,$(THIRD_PARTY_DLMALLOC_ARTIFACTS),$($(x)_TESTS))
+$(THIRD_PARTY_DLMALLOC_OBJS): $(BUILD_FILES) third_party/dlmalloc/dlmalloc.mk
+
+.PHONY: o/$(MODE)/third_party/dlmalloc
+o/$(MODE)/third_party/dlmalloc: $(THIRD_PARTY_DLMALLOC_CHECKS)
--- a/third_party/dlmalloc/dlmalloc_stats.c
+++ b/third_party/dlmalloc/dlmalloc_stats.c
@ -0,0 +1,47 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+#include "libc/mem/mem.h"
+#include "libc/str/str.h"
+
+/**
+ * Returns, by copy, the amount of space obtained from the system (both
+ * via sbrk and mmap), the maximum amount (which may be more than
+ * current if malloc_trim and/or munmap got called), and the current
+ * number of bytes allocated via malloc (or realloc, etc) but not yet
+ * freed. Note that this is the number of bytes allocated, not the
+ * number requested. It will be larger than the number requested because
+ * of alignment and bookkeeping overhead. Because it includes alignment
+ * wastage as being in use, this figure may be greater than zero even
+ * when no user-level chunks are allocated.
+ *
+ * The reported current and maximum system memory can be inaccurate if a
+ * program makes other calls to system memory allocation functions
+ * (normally sbrk) outside of malloc.
+ *
+ * dlmalloc_stats reports only the most commonly interesting statistics.
+ * More information can be obtained by calling mallinfo.
+ */
+struct MallocStats dlmalloc_stats(mstate m) {
+  struct MallocStats res;
+  memset(&res, 0, sizeof(res));
+  ensure_initialization();
+  if (!PREACTION(m)) {
+    check_malloc_state(m);
+    if (is_initialized(m)) {
+      msegmentptr s = &m->seg;
+      res.maxfp = m->max_footprint;
+      res.fp = m->footprint;
+      res.used = res.fp - (m->topsize + TOP_FOOT_SIZE);
+      while (s != 0) {
+        mchunkptr q = align_as_chunk(s->base);
+        while (segment_holds(s, q) && q != m->top &&
+               q->head != FENCEPOST_HEAD) {
+          if (!is_inuse(q)) res.used -= chunksize(q);
+          q = next_chunk(q);
+        }
+        s = s->next;
+      }
+    }
+    POSTACTION(m); /* drop lock */
+  }
+  return res;
+}
--- a/third_party/dlmalloc/dlmemalign-impl.c
+++ b/third_party/dlmalloc/dlmemalign-impl.c
@ -0,0 +1,70 @@
+#include "libc/mem/mem.h"
+#include "libc/sysv/errfuns.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+void* dlmemalign$impl(mstate m, size_t alignment, size_t bytes) {
+  void* mem = 0;
+  if (alignment < MIN_CHUNK_SIZE) /* must be at least a minimum chunk size */
+    alignment = MIN_CHUNK_SIZE;
+  if ((alignment & (alignment - SIZE_T_ONE)) != 0) { /* Ensure a power of 2 */
+    size_t a = MALLOC_ALIGNMENT << 1;
+    while (a < alignment) a <<= 1;
+    alignment = a;
+  }
+  if (bytes >= MAX_REQUEST - alignment) {
+    if (m != 0) { /* Test isn't needed but avoids compiler warning */
+      enomem();
+    }
+  } else {
+    size_t nb = request2size(bytes);
+    size_t req = nb + alignment + MIN_CHUNK_SIZE - CHUNK_OVERHEAD;
+    mem = dlmalloc(req);
+    if (mem != 0) {
+      mchunkptr p = mem2chunk(mem);
+      if (PREACTION(m)) return 0;
+      if ((((size_t)(mem)) & (alignment - 1)) != 0) { /* misaligned */
+        /*
+          Find an aligned spot inside chunk.  Since we need to give
+          back leading space in a chunk of at least MIN_CHUNK_SIZE, if
+          the first calculation places us at a spot with less than
+          MIN_CHUNK_SIZE leader, we can move to the next aligned spot.
+          We've allocated enough total room so that this is always
+          possible.
+        */
+        char* br = (char*)mem2chunk((size_t)(
+            ((size_t)((char*)mem + alignment - SIZE_T_ONE)) & -alignment));
+        char* pos =
+            ((size_t)(br - (char*)(p)) >= MIN_CHUNK_SIZE) ? br : br + alignment;
+        mchunkptr newp = (mchunkptr)pos;
+        size_t leadsize = pos - (char*)(p);
+        size_t newsize = chunksize(p) - leadsize;
+        if (is_mmapped(p)) { /* For mmapped chunks, just adjust offset */
+          newp->prev_foot = p->prev_foot + leadsize;
+          newp->head = newsize;
+        } else { /* Otherwise, give back leader, use the rest */
+          set_inuse(m, newp, newsize);
+          set_inuse(m, p, leadsize);
+          dlmalloc_dispose_chunk(m, p, leadsize);
+        }
+        p = newp;
+      }
+      /* Give back spare room at the end */
+      if (!is_mmapped(p)) {
+        size_t size = chunksize(p);
+        if (size > nb + MIN_CHUNK_SIZE) {
+          size_t remainder_size = size - nb;
+          mchunkptr remainder = chunk_plus_offset(p, nb);
+          set_inuse(m, p, nb);
+          set_inuse(m, remainder, remainder_size);
+          dlmalloc_dispose_chunk(m, remainder, remainder_size);
+        }
+      }
+      mem = chunk2mem(p);
+      assert(chunksize(p) >= nb);
+      assert(((size_t)mem & (alignment - 1)) == 0);
+      check_inuse_chunk(m, p);
+      POSTACTION(m);
+    }
+  }
+  return ADDRESS_BIRTH_ACTION(mem);
+}
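The alignment arithmetic in dlmemalign$impl can be tried out in isolation. The following is a minimal sketch, not the allocator's own code: `normalize_alignment` and `align_up` are hypothetical helper names, and the constants 32 and 16 are assumed stand-ins for MIN_CHUNK_SIZE and MALLOC_ALIGNMENT on a typical 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for MIN_CHUNK_SIZE and MALLOC_ALIGNMENT. */
#define SKETCH_MIN_CHUNK_SIZE ((size_t)32)
#define SKETCH_MALLOC_ALIGNMENT ((size_t)16)

/* Raise the alignment to the minimum, then round it up to a power of 2. */
static size_t normalize_alignment(size_t alignment) {
  if (alignment < SKETCH_MIN_CHUNK_SIZE) alignment = SKETCH_MIN_CHUNK_SIZE;
  if ((alignment & (alignment - 1)) != 0) { /* not a power of 2 */
    size_t a = SKETCH_MALLOC_ALIGNMENT << 1;
    while (a < alignment) a <<= 1;
    alignment = a;
  }
  return alignment;
}

/* Round addr up to the next multiple of a power-of-2 alignment. */
static uintptr_t align_up(uintptr_t addr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr + alignment - 1) & ~((uintptr_t)alignment - 1);
}
```

The mask form `& ~(alignment - 1)` used here is equivalent to the `& -alignment` in the diff when the alignment is a power of 2, which is why the function normalizes the alignment before computing the aligned spot.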
--- a/third_party/dlmalloc/dlmemalign.c
+++ b/third_party/dlmalloc/dlmemalign.c
@ -0,0 +1,9 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+#include "libc/mem/mem.h"
+
+void *dlmemalign(size_t alignment, size_t bytes) {
+  if (alignment <= MALLOC_ALIGNMENT) {
+    return dlmalloc(bytes);
+  }
+  return dlmemalign$impl(gm, alignment, bytes);
+}
--- a/third_party/dlmalloc/dlposix_memalign.c
+++ b/third_party/dlmalloc/dlposix_memalign.c
@ -0,0 +1,25 @@
+#include "libc/errno.h"
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+int dlposix_memalign(void** pp, size_t alignment, size_t bytes) {
+  void* mem = 0;
+  if (alignment == MALLOC_ALIGNMENT)
+    mem = dlmalloc(bytes);
+  else {
+    size_t d = alignment / sizeof(void*);
+    size_t r = alignment % sizeof(void*);
+    if (r != 0 || d == 0 || (d & (d - SIZE_T_ONE)) != 0)
+      return EINVAL;
+    else if (bytes <= MAX_REQUEST - alignment) {
+      if (alignment < MIN_CHUNK_SIZE) alignment = MIN_CHUNK_SIZE;
+      mem = dlmemalign$impl(gm, alignment, bytes);
+    }
+  }
+  if (mem == 0)
+    return ENOMEM;
+  else {
+    *pp = mem;
+    return 0;
+  }
+}
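The EINVAL test in dlposix_memalign accepts only alignments that are a power-of-2 multiple of sizeof(void*). A minimal sketch of that check on its own, with `check_posix_alignment` as a hypothetical name introduced only for illustration:

```c
#include <errno.h>
#include <stddef.h>

/* Returns 0 if alignment is a power-of-2 multiple of sizeof(void*),
   otherwise EINVAL. Mirrors the d/r test in the diff. */
static int check_posix_alignment(size_t alignment) {
  size_t d = alignment / sizeof(void*); /* how many pointer widths */
  size_t r = alignment % sizeof(void*); /* must divide evenly */
  if (r != 0 || d == 0 || (d & (d - 1)) != 0) return EINVAL;
  return 0;
}
```

Zero is rejected through `d == 0`, and a value such as three pointer widths is rejected because `d` is not a power of 2.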
--- a/third_party/dlmalloc/dlpvalloc.c
+++ b/third_party/dlmalloc/dlpvalloc.c
@ -0,0 +1,10 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+void *dlpvalloc(size_t bytes) {
+  size_t pagesz;
+  ensure_initialization();
+  pagesz = mparams.page_size;
+  return dlmemalign(pagesz,
+                    (bytes + pagesz - SIZE_T_ONE) & ~(pagesz - SIZE_T_ONE));
+}
--- a/third_party/dlmalloc/dlrealloc_in_place.c
+++ b/third_party/dlmalloc/dlrealloc_in_place.c
@ -0,0 +1,33 @@
+#include "libc/mem/mem.h"
+#include "libc/sysv/errfuns.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+void *dlrealloc_in_place(void *oldmem, size_t bytes) {
+  void *mem = 0;
+  if (oldmem != 0) {
+    if (bytes >= MAX_REQUEST) {
+      enomem();
+    } else {
+      size_t nb = request2size(bytes);
+      mchunkptr oldp = mem2chunk(oldmem);
+#if !FOOTERS
+      mstate m = gm;
+#else  /* FOOTERS */
+      mstate m = get_mstate_for(oldp);
+      if (!ok_magic(m)) {
+        USAGE_ERROR_ACTION(m, oldmem);
+        return 0;
+      }
+#endif /* FOOTERS */
+      if (!PREACTION(m)) {
+        mchunkptr newp = dlmalloc_try_realloc_chunk(m, oldp, nb, 0);
+        POSTACTION(m);
+        if (newp == oldp) {
+          check_inuse_chunk(m, newp);
+          mem = oldmem;
+        }
+      }
+    }
+  }
+  return mem;
+}
--- a/third_party/dlmalloc/dlvalloc.c
+++ b/third_party/dlmalloc/dlvalloc.c
@ -0,0 +1,9 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+void *dlvalloc(size_t bytes) {
+  size_t pagesz;
+  ensure_initialization();
+  pagesz = mparams.page_size;
+  return dlmemalign(pagesz, bytes);
+}
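dlvalloc passes the byte count through unchanged, whereas dlpvalloc first rounds it up to a whole number of pages. A minimal sketch of that rounding, assuming a power-of-2 page size; `round_up_to_page` is a hypothetical helper and 4096 is only an example value, since the real page size comes from mparams.page_size:

```c
#include <assert.h>
#include <stddef.h>

/* Round bytes up to a multiple of pagesz, which must be a power of 2. */
static size_t round_up_to_page(size_t bytes, size_t pagesz) {
  assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
  return (bytes + pagesz - 1) & ~(pagesz - 1);
}
```

With a 4096-byte page, a request of 1 byte becomes 4096 and a request of 4097 bytes becomes 8192, while a request that is already a page multiple is left as it is.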
--- a/third_party/dlmalloc/mallinfo.c
+++ b/third_party/dlmalloc/mallinfo.c
@ -0,0 +1,60 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+#include "libc/mem/mem.h"
+
+/**
+ * Returns (by copy) a struct containing various summary statistics:
+ *
+ * arena:     current total non-mmapped bytes allocated from system
+ * ordblks:   the number of free chunks
+ * smblks:    always zero.
+ * hblks:     current number of mmapped regions
+ * hblkhd:    total bytes held in mmapped regions
+ * usmblks:   the maximum total allocated space. This will be greater
+ *            than current total if trimming has occurred.
+ * fsmblks:   always zero
+ * uordblks:  current total allocated space (normal or mmapped)
+ * fordblks:  total free space
+ * keepcost:  the maximum number of bytes that could ideally be released
+ *            back to system via malloc_trim. ("ideally" means that
+ *            it ignores page restrictions etc.)
+ *
+ * Because these fields are ints, but internal bookkeeping may
+ * be kept as longs, the reported values may wrap around zero and
+ * thus be inaccurate.
+ */
+struct mallinfo mallinfo(void) {
+  struct mallinfo nm = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+  ensure_initialization();
+  if (!PREACTION(gm)) {
+    check_malloc_state(gm);
+    if (is_initialized(gm)) {
+      size_t nfree = SIZE_T_ONE; /* top always free */
+      size_t mfree = gm->topsize + TOP_FOOT_SIZE;
+      size_t sum = mfree;
+      msegmentptr s = &gm->seg;
+      while (s != 0) {
+        mchunkptr q = align_as_chunk(s->base);
+        while (segment_holds(s, q) && q != gm->top &&
+               q->head != FENCEPOST_HEAD) {
+          size_t sz = chunksize(q);
+          sum += sz;
+          if (!is_inuse(q)) {
+            mfree += sz;
+            ++nfree;
+          }
+          q = next_chunk(q);
+        }
+        s = s->next;
+      }
+      nm.arena = sum;
+      nm.ordblks = nfree;
+      nm.hblkhd = gm->footprint - sum;
+      nm.usmblks = gm->max_footprint;
+      nm.uordblks = gm->footprint - mfree;
+      nm.fordblks = mfree;
+      nm.keepcost = gm->topsize;
+    }
+    POSTACTION(gm);
+  }
+  return nm;
+}
--- a/third_party/dlmalloc/malloc_footprint.c
+++ b/third_party/dlmalloc/malloc_footprint.c
@ -0,0 +1,12 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+#include "libc/mem/mem.h"
+
+/**
+ * Returns the number of bytes obtained from the system. The total
+ * number of bytes allocated by malloc, realloc etc., is less than this
+ * value. Unlike mallinfo, this function returns only a precomputed
+ * result, so can be called frequently to monitor memory consumption.
+ * Even if locks are otherwise defined, this function does not use them,
+ * so results might not be up to date.
+ */
+size_t malloc_footprint(void) { return gm->footprint; }
--- a/third_party/dlmalloc/malloc_footprint_limit.c
+++ b/third_party/dlmalloc/malloc_footprint_limit.c
@ -0,0 +1,15 @@
+#include "libc/limits.h"
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/**
+ * Returns the number of bytes that the heap is allowed to obtain from
+ * the system, returning the last value returned by
+ * malloc_set_footprint_limit, or the maximum size_t value if never set.
+ * The returned value reflects a permission. There is no guarantee that
+ * this number of bytes can actually be obtained from the system.
+ */
+size_t malloc_footprint_limit(void) {
+  size_t maf = gm->footprint_limit;
+  return maf == 0 ? SIZE_MAX : maf;
+}
--- a/third_party/dlmalloc/malloc_inspect_all.c
+++ b/third_party/dlmalloc/malloc_inspect_all.c
@ -0,0 +1,71 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+static void internal_inspect_all(mstate m,
+                                 void (*handler)(void* start, void* end,
+                                                 size_t used_bytes,
+                                                 void* callback_arg),
+                                 void* arg) {
+  if (is_initialized(m)) {
+    mchunkptr top = m->top;
+    msegmentptr s;
+    for (s = &m->seg; s != 0; s = s->next) {
+      mchunkptr q = align_as_chunk(s->base);
+      while (segment_holds(s, q) && q->head != FENCEPOST_HEAD) {
+        mchunkptr next = next_chunk(q);
+        size_t sz = chunksize(q);
+        size_t used;
+        void* start;
+        if (is_inuse(q)) {
+          used = sz - CHUNK_OVERHEAD; /* must not be mmapped */
+          start = chunk2mem(q);
+        } else {
+          used = 0;
+          if (is_small(sz)) { /* offset by possible bookkeeping */
+            start = (void*)((char*)q + sizeof(struct malloc_chunk));
+          } else {
+            start = (void*)((char*)q + sizeof(struct malloc_tree_chunk));
+          }
+        }
+        if (start < (void*)next) /* skip if all space is bookkeeping */
+          handler(start, next, used, arg);
+        if (q == top) break;
+        q = next;
+      }
+    }
+  }
+}
+
+/**
+ * Traverses the heap and calls the given handler for each managed
+ * region, skipping all bytes that are (or may be) used for bookkeeping
+ * purposes.  Traversal does not include chunks that have been
+ * directly memory mapped. Each reported region begins at the start
+ * address, and continues up to but not including the end address.  The
+ * first used_bytes of the region contain allocated data. If
+ * used_bytes is zero, the region is unallocated. The handler is
+ * invoked with the given callback argument. If locks are defined, they
+ * are held during the entire traversal. It is a bad idea to invoke
+ * other malloc functions from within the handler.
+ *
+ * For example, to count the number of in-use chunks holding at least
+ * 1000 used bytes, you could write:
+ *
+ *     static int count = 0;
+ *     void count_chunks(void* start, void* end, size_t used, void* arg) {
+ *       if (used >= 1000) ++count;
+ *     }
+ *
+ * then,
+ *
+ *     malloc_inspect_all(count_chunks, NULL);
+ */
+void malloc_inspect_all(void (*handler)(void* start, void* end,
+                                        size_t used_bytes, void* callback_arg),
+                        void* arg) {
+  ensure_initialization();
+  if (!PREACTION(gm)) {
+    internal_inspect_all(gm, handler, arg);
+    POSTACTION(gm);
+  }
+}
--- a/third_party/dlmalloc/malloc_max_footprint.c
+++ b/third_party/dlmalloc/malloc_max_footprint.c
@ -0,0 +1,14 @@
+#include "third_party/dlmalloc/dlmalloc.h"
+#include "libc/mem/mem.h"
+
+/**
+ * Returns the maximum number of bytes obtained from the system. This
+ * value will be greater than current footprint if deallocated space has
+ * been reclaimed by the system. The peak number of bytes allocated by
+ * malloc, realloc etc., is less than this value. Unlike mallinfo, this
+ * function returns only a precomputed result, so can be called
+ * frequently to monitor memory consumption. Even if locks are otherwise
+ * defined, this function does not use them, so results might not be up
+ * to date.
+ */
+size_t malloc_max_footprint(void) { return gm->max_footprint; }
--- a/third_party/dlmalloc/malloc_set_footprint_limit.c
+++ b/third_party/dlmalloc/malloc_set_footprint_limit.c
@ -0,0 +1,25 @@
+#include "libc/limits.h"
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/**
+ * Sets the maximum number of bytes to obtain from the system, causing
+ * failure returns from malloc and related functions upon attempts to
+ * exceed this value. The argument value may be subject to page rounding
+ * to an enforceable limit; this actual value is returned. Using an
+ * argument of the maximum possible size_t effectively disables checks.
+ * If the argument is less than or equal to the current
+ * malloc_footprint, then all future allocations that require additional
+ * system memory will fail. However, invocation cannot retroactively
+ * deallocate existing used memory.
+ */
+size_t malloc_set_footprint_limit(size_t bytes) {
+  size_t result;                                 /* invert sense of 0 */
+  if (bytes == 0) result = granularity_align(1); /* Use minimal size */
+  else if (bytes == SIZE_MAX) {
+    result = 0; /* disable */
+  } else {
+    result = granularity_align(bytes);
+  }
+  return gm->footprint_limit = result;
+}
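The setter and malloc_footprint_limit share an inverted encoding: a stored value of 0 means "no limit", so a request of 0 has to be stored as the smallest enforceable size instead. A minimal sketch of the mapping the comments describe, assuming a 64 KiB granularity; all names here are hypothetical and the real granularity comes from mparams:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GRANULARITY ((size_t)65536) /* assumed; must be a power of 2 */

/* Round n up to a multiple of the assumed granularity. */
static size_t sketch_granularity_align(size_t n) {
  return (n + (SKETCH_GRANULARITY - 1)) & ~(SKETCH_GRANULARITY - 1);
}

/* Map a requested limit to the stored value; stored 0 means "no limit". */
static size_t encode_footprint_limit(size_t bytes) {
  if (bytes == 0) return sketch_granularity_align(1); /* minimal size */
  if (bytes == SIZE_MAX) return 0;                    /* disable checks */
  return sketch_granularity_align(bytes);
}

/* Map the stored value back to what the getter reports. */
static size_t decode_footprint_limit(size_t stored) {
  return stored == 0 ? SIZE_MAX : stored;
}
```

Returning early from each special case keeps the three branches mutually exclusive, so a request of 0 can never fall through to the general rounding and end up stored as "no limit".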
--- a/third_party/dlmalloc/malloc_trim.c
+++ b/third_party/dlmalloc/malloc_trim.c
@ -0,0 +1,31 @@
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/**
+ * If possible, gives memory back to the system (via negative arguments
+ * to sbrk) if there is unused memory at the `high' end of the malloc
+ * pool or in unused MMAP segments. You can call this after freeing
+ * large blocks of memory to potentially reduce the system-level memory
+ * requirements of a program. However, it cannot guarantee to reduce
+ * memory. Under some allocation patterns, some large free blocks of
+ * memory will be locked between two used chunks, so they cannot be
+ * given back to the system.
+ *
+ * The `pad' argument to malloc_trim represents the amount of free
+ * trailing space to leave untrimmed. If this argument is zero, only the
+ * minimum amount of memory to maintain internal data structures will be
+ * left. Non-zero arguments can be supplied to maintain enough trailing
+ * space to service future expected allocations without having to
+ * re-obtain memory from the system.
+ *
+ * @return 1 if it actually released any memory, else 0
+ */
+int malloc_trim(size_t pad) {
+  int result = 0;
+  ensure_initialization();
+  if (!PREACTION(gm)) {
+    result = dlmalloc_sys_trim(gm, pad);
+    POSTACTION(gm);
+  }
+  return result;
+}
--- a/third_party/dlmalloc/mallopt.c
+++ b/third_party/dlmalloc/mallopt.c
@ -0,0 +1,42 @@
+#include "libc/limits.h"
+#include "libc/mem/mem.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+/**
+ * Sets memory allocation parameter.
+ *
+ * The format is to provide a (parameter-number, parameter-value) pair.
+ * mallopt then sets the corresponding parameter to the argument value
+ * if it can (i.e., so long as the value is meaningful), and returns 1
+ * if successful else 0. SVID/XPG/ANSI defines four standard param
+ * numbers for mallopt, normally defined in malloc.h. None of these are
+ * used in this malloc, so setting them has no effect. But this malloc
+ * also supports other options in mallopt:
+ *
+ * Symbol            param #  default    allowed param values
+ * M_TRIM_THRESHOLD     -1   2*1024*1024   any   (-1U disables trimming)
+ * M_GRANULARITY        -2     page size   any power of 2 >= page size
+ * M_MMAP_THRESHOLD     -3      256*1024   any   (or 0 if no MMAP support)
+ */
+bool32 mallopt(int param_number, int value) {
+  size_t val;
+  ensure_initialization();
+  val = (value == -1) ? SIZE_MAX : (size_t)value;
+  switch (param_number) {
+    case M_TRIM_THRESHOLD:
+      mparams.trim_threshold = val;
+      return true;
+    case M_GRANULARITY:
+      if (val >= mparams.page_size && ((val & (val - 1)) == 0)) {
+        mparams.granularity = val;
+        return true;
+      } else {
+        return false;
+      }
+    case M_MMAP_THRESHOLD:
+      mparams.mmap_threshold = val;
+      return true;
+    default:
+      return false;
+  }
+}
--- a/third_party/dlmalloc/mtrace.c
+++ b/third_party/dlmalloc/mtrace.c
@ -0,0 +1,55 @@
+/*-*- mode:c;indent-tabs-mode:nil;c-basic-offset:2;tab-width:8;coding:utf-8 -*-│
+│vi: set net ft=c ts=2 sts=2 sw=2 fenc=utf-8                                :vi│
+╞══════════════════════════════════════════════════════════════════════════════╡
+│ Copyright 2020 Justine Alexandra Roberts Tunney                              │
+│                                                                              │
+│ This program is free software; you can redistribute it and/or modify         │
+│ it under the terms of the GNU General Public License as published by         │
+│ the Free Software Foundation; version 2 of the License.                      │
+│                                                                              │
+│ This program is distributed in the hope that it will be useful, but          │
+│ WITHOUT ANY WARRANTY; without even the implied warranty of                   │
+│ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU             │
+│ General Public License for more details.                                     │
+│                                                                              │
+│ You should have received a copy of the GNU General Public License            │
+│ along with this program; if not, write to the Free Software                  │
+│ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA                │
+│ 02110-1301 USA                                                               │
+╚─────────────────────────────────────────────────────────────────────────────*/
+#include "libc/conv/itoa.h"
+#include "libc/runtime/missioncritical.h"
+#include "libc/str/str.h"
+#include "third_party/dlmalloc/dlmalloc.h"
+
+static uintptr_t lastfree_;
+
+void *AddressBirthAction(void *addr) {
+  char buf[64], *p;
+  p = buf;
+  p = stpcpy(p, __FUNCTION__);
+  p = stpcpy(p, ": 0x");
+  p += uint64toarray_radix16((uintptr_t)addr, p);
+  *p++ = '\n';
+  __print(buf, p - buf);
+  if (lastfree_ == (uintptr_t)addr) {
+    lastfree_ = 0;
+  }
+  return addr;
+}
+
+void *AddressDeathAction(void *addr) {
+  char buf[64], *p;
+  p = buf;
+  p = stpcpy(p, __FUNCTION__);
+  p = stpcpy(p, ": 0x");
+  p += uint64toarray_radix16((uintptr_t)addr, p);
+  if (lastfree_ != (uintptr_t)addr) {
+    lastfree_ = (uintptr_t)addr;
+  } else {
+    p = stpcpy(p, " [OBVIOUS DOUBLE FREE]");
+  }
+  *p++ = '\n';
+  __print(buf, p - buf);
+  return addr;
+}
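The two hooks in mtrace.c share one piece of state, the most recently freed address, which is enough to flag only back-to-back frees of the same pointer. A minimal sketch of that bookkeeping without the printing; `note_birth`, `note_death`, and `sketch_lastfree` are hypothetical names used only for illustration:

```c
#include <stdint.h>

static uintptr_t sketch_lastfree; /* address most recently freed, or 0 */

/* Record an allocation: a reused address is no longer a suspect. */
static void note_birth(void *addr) {
  if (sketch_lastfree == (uintptr_t)addr) sketch_lastfree = 0;
}

/* Record a free: returns 1 if the same address was freed twice in a row. */
static int note_death(void *addr) {
  if (sketch_lastfree != (uintptr_t)addr) {
    sketch_lastfree = (uintptr_t)addr;
    return 0;
  }
  return 1;
}
```

Because only one address is remembered, a double free separated by a free of some other pointer goes unnoticed, which matches the "obvious" qualifier in the message printed by AddressDeathAction.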