Initial import

2025-06-27 14:58:30 +00:00 · 2020-06-15 07:18:57 -07:00 · 2020-06-15 07:18:57 -07:00 · c91b3c5006
commit c91b3c5006
14915 changed files with 590219 additions and 0 deletions
--- a/third_party/stb/README.cosmo
+++ b/third_party/stb/README.cosmo
@@ -0,0 +1,23 @@
+LOCAL CHANGES
+
+  - Rewrite endian code so it's optimizable
+  - Add malloc() to functions w/ frames greater than PAGESIZE
+  - Removed undefined behavior
+  - Removed BMP [endian code made it 100x slower than PNG/JPEG]
+  - Removed PIC [never heard of it]
+  - Removed TGA [consider imagemagick convert command]
+  - Removed PSD [consider imagemagick convert command]
+  - Removed HDR [mine eyes and wikipedia agree stb gamma math is off]
+  - Patched PNG loading edge case
+  - Fixed code C standard says is undefined
+  - PNG now uses ultra-fast Chromium zlib w/ CLMUL crc32
+  - Removed unnecessary ifdefs
+  - Removed MSVC torture code
+
+SYNCHRONIZATION POINT
+
+  commit f67165c2bb2af3060ecae7d20d6f731173485ad0
+  Author: Sean Barrett <sean2@nothings.org>
+  Date:   Mon Oct 28 09:30:02 2019 -0700
+
+      Update README.md
--- a/third_party/stb/README.txt
+++ b/third_party/stb/README.txt
@@ -0,0 +1,371 @@
+/*
+ * stb_image - v2.23 - public domain image loader - http://nothings.org/stb
+ *                                no warranty implied; use at your own risk
+ *
+ * [heavily modified by justine tunney]
+ *
+ *    JPEG baseline & progressive (no 12 bpc/arithmetic, same as stock IJG lib)
+ *    PNG 1/2/4/8/16-bit-per-channel
+ *    GIF (*comp always reports as 4-channel)
+ *    HDR (radiance rgbE format)
+ *    PNM (PPM and PGM binary only)
+ *
+ *    Animated GIF still needs a proper API, but here's one way to do it:
+ *        http://gist.github.com/urraka/685d9a6340b26b830d49
+ *
+ *    - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
+ *    - decode from arbitrary I/O callbacks
+ *
+ * ============================    Contributors    =========================
+ *
+ * Image formats                          Extensions, features
+ *  Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
+ *  Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
+ *  Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
+ *  Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
+ *  Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
+ *  Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
+ *  Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
+ *  github:urraka (animated gif)           Junggon Kim (PNM comments)
+ *  Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
+ *                                         socks-the-fox (16-bit PNG)
+ *                                         Jeremy Sawicki (ImageNet JPGs)
+ *                                         Mikhail Morozov (1-bit BMP)
+ * Optimizations & bugfixes                Anael Seghezzi (is-16-bit query)
+ *  Fabian "ryg" Giesen
+ *  Arseny Kapoulkine
+ *  John-Mark Allen
+ *  Carmelo J Fdez-Aguera
+ *
+ * Bug & warning fixes
+ * Marc LeBlanc            David Woo          Guillaume George   Martins Mozeiko
+ * Christpher Lloyd        Jerry Jansson      Joseph Thomson     Phil Jordan
+ * Dave Moore              Roy Eltham         Hayaki Saito       Nathan Reed
+ * Won Chun                Luke Graham        Johan Duparc       Nick Verigakis
+ * the Horde3D community   Thomas Ruf         Ronny Chevalier    github:rlyeh
+ * Janez Zemva             John Bartholomew   Michal Cichon      github:romigrou
+ * Jonathan Blow           Ken Hamada         Tero Hanninen      github:svdijk
+ * Laurent Gomila          Cort Stratton      Sergio Gonzalez    github:snagar
+ * Aruelien Pocheville     Thibault Reuille   Cass Everitt       github:Zelex
+ * Ryamond Barbiero        Paul Du Bois       Engin Manap        github:grim210
+ * Aldo Culquicondor       Philipp Wiesemann  Dale Weiler        github:sammyhw
+ * Oriol Ferrer Mesia      Josh Tobin         Matthew Gregan     github:phprus
+ * Julian Raschke          Gregory Mullen     Baldur Karlsson
+ * github:poppolopoppo Christian Floisand      Kevin Schmidt      JR Smith
+ * github:darealshinji Blazej Dariusz Roszkowski github:Michaelangel007
+ */
+
+/*
+ * DOCUMENTATION
+ *
+ * Limitations:
+ *    - no 12-bit-per-channel JPEG
+ *    - no JPEGs with arithmetic coding
+ *    - GIF always returns *comp=4
+ *
+ * Basic usage (see HDR discussion below for HDR usage):
+ *    int x,y,n;
+ *    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+ *    // ... process data if not NULL ...
+ *    // ... x = width, y = height, n = # 8-bit components per pixel ...
+ *    // ... replace '0' with '1'..'4' to force that many components per pixel
+ *    // ... but 'n' will always be the number it would have been if you said 0
+ *    stbi_image_free(data);
+ *
+ * Standard parameters:
+ *    int *x                 -- outputs image width in pixels
+ *    int *y                 -- outputs image height in pixels
+ *    int *channels_in_file  -- outputs # of image components in image file
+ *    int desired_channels   -- if non-zero, # of image components requested in
+ *                              result
+ *
+ * The return value from an image loader is an 'unsigned char *' which points
+ * to the pixel data, or NULL on an allocation failure or if the image is
+ * corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
+ * with each pixel consisting of N interleaved 8-bit components; the first
+ * pixel pointed to is top-left-most in the image. There is no padding between
+ * image scanlines or between pixels, regardless of format. The number of
+ * components N is 'desired_channels' if desired_channels is non-zero, or
+ * *channels_in_file otherwise. If desired_channels is non-zero,
+ * *channels_in_file has the number of components that _would_ have been
+ * output otherwise. E.g. if you set desired_channels to 4, you will always
+ * get RGBA output, but you can check *channels_in_file to see if it's trivially
+ * opaque because e.g. there were only 3 channels in the source image.
+ *
+ * An output image with N components has the following components interleaved
+ * in this order in each pixel:
+ *
+ *     N=#comp     components
+ *       1           grey
+ *       2           grey, alpha
+ *       3           red, green, blue
+ *       4           red, green, blue, alpha
+ *
+ * If image loading fails for any reason, the return value will be NULL,
+ * and *x, *y, *channels_in_file will be unchanged. The function
+ * stbi_failure_reason() can be queried for an extremely brief, end-user
+ * unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
+ * to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get
+ * slightly more user-friendly ones.
+ *
+ * Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
+ *
+ * ===========================================================================
+ *
+ * I/O callbacks
+ *
+ * I/O callbacks allow you to read from arbitrary sources, like packaged
+ * files or some other source. Data read from callbacks are processed
+ * through a small internal buffer (currently 128 bytes) to try to reduce
+ * overhead.
+ *
+ * The three functions you must define are "read" (reads some bytes of data),
+ * "skip" (skips some bytes of data), "eof" (reports if the stream is at the
+ * end).
+ *
+ * ===========================================================================
+ *
+ * HDR image support   (disable by defining STBI_NO_HDR)
+ *
+ * stb_image supports loading HDR images in general, and currently the Radiance
+ * .HDR file format specifically. You can still load any file through the
+ * existing interface; if you attempt to load an HDR file, it will be
+ * automatically remapped to LDR, assuming gamma 2.2 and an arbitrary scale
+ * factor defaulting to 1; both of these constants can be reconfigured through
+ * this interface:
+ *
+ *     stbi_hdr_to_ldr_gamma(2.2f);
+ *     stbi_hdr_to_ldr_scale(1.0f);
+ *
+ * (note, do not use _inverse_ constants; stb_image will invert them
+ * appropriately).
+ *
+ * Additionally, there is a new, parallel interface for loading files as
+ * (linear) floats to preserve the full dynamic range:
+ *
+ *    float *data = stbi_loadf(filename, &x, &y, &n, 0);
+ *
+ * If you load LDR images through this interface, those images will
+ * be promoted to floating point values, run through the inverse of
+ * constants corresponding to the above:
+ *
+ *     stbi_ldr_to_hdr_scale(1.0f);
+ *     stbi_ldr_to_hdr_gamma(2.2f);
+ *
+ * Finally, given a filename (or an open file or memory block--see header
+ * file for details) containing image data, you can query for the "most
+ * appropriate" interface to use (that is, whether the image is HDR or
+ * not), using:
+ *
+ *     stbi_is_hdr(char *filename);
+ *
+ * ===========================================================================
+ *
+ * iPhone PNG support:
+ *
+ * By default we convert iPhone-formatted PNGs back to RGB, even though
+ * they are internally encoded differently. You can disable this conversion
+ * by calling stbi_convert_iphone_png_to_rgb(0), in which case
+ * you will always get the native iPhone "format" passed through (which
+ * is BGR stored in RGB).
+ *
+ * Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
+ * pixel to remove any premultiplied alpha *only* if the image file explicitly
+ * says there's premultiplied data (currently only happens in iPhone images,
+ * and only if iPhone convert-to-rgb processing is on).
+ *
+ * ===========================================================================
+ *
+ * ADDITIONAL CONFIGURATION
+ *
+ *  - You can suppress implementation of any of the decoders to reduce
+ *    your code footprint by #defining one or more of the following
+ *    symbols before creating the implementation.
+ *
+ *        STBI_NO_JPEG
+ *        STBI_NO_PNG
+ *        STBI_NO_GIF
+ *        STBI_NO_HDR
+ *        STBI_NO_PNM   (.ppm and .pgm)
+ *
+ *   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
+ *     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
+ *
+ */
+
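Note on the pixel layout described in the stb_image documentation above: scanlines are stored top to bottom with no padding, and each pixel holds N interleaved 8-bit components, so the byte for component c of pixel (px, py) sits at offset (py * width + px) * N + c. A minimal sketch in C, assuming a buffer shaped like the one stbi_load returns. The helper names pixel_offset and is_trivially_opaque are hypothetical and are not part of stb_image.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of component `c` of pixel (px, py) in a
 * tightly packed, interleaved image that is `width` pixels wide and has `n`
 * components per pixel. Row 0 is the top scanline. */
size_t pixel_offset(int width, int n, int px, int py, int c) {
  assert(width > 0 && n >= 1 && n <= 4);
  assert(px >= 0 && px < width && py >= 0 && c >= 0 && c < n);
  return ((size_t)py * (size_t)width + (size_t)px) * (size_t)n + (size_t)c;
}

/* Hypothetical helper: when RGBA output was forced (desired_channels = 4),
 * a source file with 1 or 3 channels carried no alpha, so every output
 * alpha value is fully opaque. */
int is_trivially_opaque(int channels_in_file) {
  return channels_in_file == 1 || channels_in_file == 3;
}
```

With a real buffer, data[pixel_offset(x, n, px, py, 0)] would read the first component (grey or red) of that pixel, where x and n are the width and component count of the returned data.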
+/* stb_image_resize - v0.96 - public domain image resizing
+ * by Jorge L Rodriguez (@VinoBS) - 2014
+ * http://github.com/nothings/stb
+ *
+ * Written with emphasis on usability, portability, and efficiency. (No
+ * SIMD or threads, so it may be easily outperformed by libs that use those.)
+ * Only scaling and translation are supported, no rotations or shears.
+ * Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
+ *
+ * QUICKSTART
+ *    stbir_resize_uint8(      input_pixels , in_w , in_h , 0,
+ *                             output_pixels, out_w, out_h, 0, num_channels)
+ *    stbir_resize_float(...)
+ *    stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
+ *                             output_pixels, out_w, out_h, 0,
+ *                             num_channels , alpha_chan  , 0)
+ *    stbir_resize_uint8_srgb_edgemode(
+ *                             input_pixels , in_w , in_h , 0,
+ *                             output_pixels, out_w, out_h, 0,
+ *                             num_channels , alpha_chan  , 0, STBIR_EDGE_CLAMP)
+ *                                                          // WRAP/REFLECT/ZERO
+ */
+
+/*
+ * DOCUMENTATION
+ *
+ *    SRGB & FLOATING POINT REPRESENTATION
+ *       The sRGB functions presume IEEE floating point. If you do not have
+ *       IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
+ *       a slower implementation.
+ *
+ *    MEMORY ALLOCATION
+ *       The resize functions here perform a single memory allocation using
+ *       malloc. To control the memory allocation, before the #include that
+ *       triggers the implementation, do:
+ *
+ *          #define STBIR_MALLOC(size,context) ...
+ *          #define STBIR_FREE(ptr,context)   ...
+ *
+ *       Each resize function makes exactly one call to malloc/free, so to use
+ *       temp memory, store the temp memory in the context and return that.
+ *
+ *    ASSERT
+ *       Define STBIR_ASSERT(boolval) to override assert() and not use assert.h
+ *
+ *    OPTIMIZATION
+ *       Define STBIR_SATURATE_INT to compute clamp values in-range using
+ *       integer operations instead of float operations. This may be faster
+ *       on some platforms.
+ *
+ *    DEFAULT FILTERS
+ *       For functions which don't provide explicit control over what filters
+ *       to use, you can change the compile-time defaults with
+ *
+ *          #define STBIR_DEFAULT_FILTER_UPSAMPLE     STBIR_FILTER_something
+ *          #define STBIR_DEFAULT_FILTER_DOWNSAMPLE   STBIR_FILTER_something
+ *
+ *       See stbir_filter in the header-file section for the list of filters.
+ *
+ *    NEW FILTERS
+ *       A number of 1D filter kernels are used. For a list of
+ *       supported filters see the stbir_filter enum. To add a new filter,
+ *       write a filter function and add it to stbir__filter_info_table.
+ *
+ *    PROGRESS
+ *       For interactive use with slow resize operations, you can install
+ *       a progress-report callback:
+ *
+ *          #define STBIR_PROGRESS_REPORT(val)   some_func(val)
+ *
+ *       The parameter val is a float which goes from 0 to 1 as progress
+ *       is made.
+ *
+ *       For example:
+ *
+ *          static void my_progress_report(float progress);
+ *          #define STBIR_PROGRESS_REPORT(val) my_progress_report(val)
+ *
+ *          #define STB_IMAGE_RESIZE_IMPLEMENTATION
+ *
+ *          static void my_progress_report(float progress)
+ *          {
+ *             printf("Progress: %f%%\n", progress*100);
+ *          }
+ *
+ *    MAX CHANNELS
+ *       If your image has more than 64 channels, define STBIR_MAX_CHANNELS
+ *       to the max you'll have.
+ *
+ *    ALPHA CHANNEL
+ *       Most of the resizing functions provide the ability to control how
+ *       the alpha channel of an image is processed. The important things
+ *       to know about this:
+ *
+ *       1. The best mathematically-behaved version of alpha to use is
+ *       called "premultiplied alpha", in which the other color channels
+ *       have had the alpha value multiplied in. If you use premultiplied
+ *       alpha, linear filtering (such as image resampling done by this
+ *       library, or performed in texture units on GPUs) does the "right
+ *       thing". While premultiplied alpha is standard in the movie CGI
+ *       industry, it is still uncommon in the videogame/real-time world.
+ *
+ *       If you linearly filter non-premultiplied alpha, strange effects
+ *       occur. (For example, the 50/50 average of 99% transparent bright green
+ *       and 1% transparent black produces 50% transparent dark green when
+ *       non-premultiplied, whereas premultiplied it produces 50%
+ *       transparent near-black. The former introduces green energy
+ *       that doesn't exist in the source image.)
+ *
+ *       2. Artists should not edit premultiplied-alpha images; artists
+ *       want non-premultiplied alpha images. Thus, art tools generally output
+ *       non-premultiplied alpha images.
+ *
+ *       3. You will get best results in most cases by converting images
+ *       to premultiplied alpha before processing them mathematically.
+ *
+ *       4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
+ *       resizer does not do anything special for the alpha channel;
+ *       it is resampled identically to other channels. This produces
+ *       the correct results for premultiplied-alpha images, but produces
+ *       less-than-ideal results for non-premultiplied-alpha images.
+ *
+ *       5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
+ *       then the resizer weights the contribution of input pixels
+ *       based on their alpha values, or, equivalently, it multiplies
+ *       the alpha value into the color channels, resamples, then divides
+ *       by the resultant alpha value. Input pixels which have alpha=0 do
+ *       not contribute at all to output pixels unless _all_ of the input
+ *       pixels affecting that output pixel have alpha=0, in which case
+ *       the result for that pixel is the same as it would be without
+ *       STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
+ *       input images in integer formats. For input images in float format,
+ *       input pixels with alpha=0 have no effect, and output pixels
+ *       which have alpha=0 will be 0 in all channels. (For float images,
+ *       you can manually achieve the same result by adding a tiny epsilon
+ *       value to the alpha channel of every image, and then subtracting
+ *       or clamping it at the end.)
+ *
+ *       6. You can suppress the behavior described in #5 and make
+ *       all-0-alpha pixels have 0 in all channels by #defining
+ *       STBIR_NO_ALPHA_EPSILON.
+ *
+ *       7. You can separately control whether the alpha channel is
+ *       interpreted as linear or affected by the colorspace. By default
+ *       it is linear; you almost never want to apply the colorspace.
+ *       (For example, graphics hardware does not apply sRGB conversion
+ *       to the alpha channel.)
+ *
+ * CONTRIBUTORS
+ *    Jorge L Rodriguez: Implementation
+ *    Sean Barrett: API design, optimizations
+ *    Aras Pranckevicius: bugfix
+ *    Nathan Reed: warning fixes
+ *
+ * REVISIONS
+ *    0.96 (2019-03-04) fixed warnings
+ *    0.95 (2017-07-23) fixed warnings
+ *    0.94 (2017-03-18) fixed warnings
+ *    0.93 (2017-03-03) fixed bug with certain combinations of heights
+ *    0.92 (2017-01-02) fix integer overflow on large (>2GB) images
+ *    0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
+ *    0.90 (2014-09-17) first released version
+ *
+ * LICENSE
+ *   See end of file for license information.
+ *
+ * TODO
+ *    Don't decode all of the image data when only processing a partial tile
+ *    Don't use full-width decode buffers when only processing a partial tile
+ *    When processing wide images, break processing into tiles so data fits in L1 cache
+ *    Installable filters?
+ *    Resize that respects alpha test coverage
+ *       (Reference code: FloatImage::alphaTestCoverage and
+ *       FloatImage::scaleAlphaToCoverage:
+ *       https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp)
+ */
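Note on the ALPHA CHANNEL section of the stb_image_resize documentation above: the bright-green/black example can be checked with a few lines of arithmetic. The sketch below is not the resizer's code. It is a minimal illustration with float channels in [0, 1], and the type and function names (rgba, blend_straight, blend_premultiplied) are hypothetical.

```c
#include <assert.h>

/* RGBA pixel with float channels in [0, 1]. */
typedef struct {
  float r, g, b, a;
} rgba;

/* Multiply the alpha value into the color channels. */
rgba premultiply(rgba p) {
  assert(p.a >= 0.0f && p.a <= 1.0f);
  rgba q = {p.r * p.a, p.g * p.a, p.b * p.a, p.a};
  return q;
}

/* Divide the color channels by alpha; fully transparent pixels become 0. */
rgba unpremultiply(rgba p) {
  rgba q = {0.0f, 0.0f, 0.0f, p.a};
  if (p.a > 0.0f) {
    q.r = p.r / p.a;
    q.g = p.g / p.a;
    q.b = p.b / p.a;
  }
  return q;
}

/* 50/50 linear average of every channel, alpha included. */
rgba average(rgba x, rgba y) {
  rgba q = {(x.r + y.r) * 0.5f, (x.g + y.g) * 0.5f, (x.b + y.b) * 0.5f,
            (x.a + y.a) * 0.5f};
  return q;
}

/* Averages two non-premultiplied pixels directly. */
rgba blend_straight(rgba x, rgba y) {
  return average(x, y);
}

/* Averages in premultiplied space, then converts back for comparison. */
rgba blend_premultiplied(rgba x, rgba y) {
  return unpremultiply(average(premultiply(x), premultiply(y)));
}
```

Taking green = {0, 1, 0, 0.01} (99% transparent) and black = {0, 0, 0, 0.99} (1% transparent), blend_straight yields green of about 0.5 at alpha 0.5, the "dark green" case. blend_premultiplied yields green of about 0.01 at alpha 0.5, the "near-black" case.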
--- a/third_party/stb/idct-sse.S
+++ b/third_party/stb/idct-sse.S
@@ -0,0 +1,427 @@
+/*-*- mode:asm; indent-tabs-mode:t; tab-width:8; coding:utf-8               -*-│
+│vi: set et ft=asm ts=8 tw=8 fenc=utf-8                                     :vi│
+╞══════════════════════════════════════════════════════════════════════════════╡
+│ Copyright 2020 Justine Alexandra Roberts Tunney                              │
+│                                                                              │
+│ This program is free software; you can redistribute it and/or modify         │
+│ it under the terms of the GNU General Public License as published by         │
+│ the Free Software Foundation; version 2 of the License.                      │
+│                                                                              │
+│ This program is distributed in the hope that it will be useful, but          │
+│ WITHOUT ANY WARRANTY; without even the implied warranty of                   │
+│ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU             │
+│ General Public License for more details.                                     │
+│                                                                              │
+│ You should have received a copy of the GNU General Public License            │
+│ along with this program; if not, write to the Free Software                  │
+│ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA                │
+│ 02110-1301 USA                                                               │
+╚─────────────────────────────────────────────────────────────────────────────*/
+#include "libc/macros.h"
+
+/	Computes inverse discrete cosine transform.
+/
+/	@note	used to decode jpeg
+	.p2align 4
+stbi__idct_simd$sse:
+	push	%rbp
+	mov	%rsp,%rbp
+	movslq	%esi,%rsi
+	lea	(%rdi,%rsi),%rax
+	sub	$96,%rsp
+	movdqa	32(%rdx),%xmm0
+	movdqa	112(%rdx),%xmm9
+	movdqa	48(%rdx),%xmm1
+	movdqa	80(%rdx),%xmm7
+	movdqa	%xmm0,%xmm2
+	punpcklwd 96(%rdx),%xmm2
+	punpckhwd 96(%rdx),%xmm0
+	movdqa	%xmm9,%xmm8
+	movdqa	16(%rdx),%xmm5
+	movdqa	%xmm2,%xmm3
+	movdqa	%xmm2,%xmm6
+	movdqa	%xmm0,%xmm2
+	pmaddwd	.LC1(%rip),%xmm3
+	movdqa	%xmm0,%xmm4
+	pmaddwd	.LC1(%rip),%xmm2
+	pmaddwd	.LC0(%rip),%xmm4
+	punpckhwd %xmm1,%xmm8
+	pmaddwd	.LC0(%rip),%xmm6
+	movaps	%xmm3,-48(%rbp)
+	movdqa	(%rdx),%xmm3
+	movaps	%xmm2,-64(%rbp)
+	movdqa	64(%rdx),%xmm2
+	movdqa	%xmm3,%xmm0
+	movaps	%xmm4,-32(%rbp)
+	paddw	%xmm2,%xmm0
+	psubw	%xmm2,%xmm3
+	movaps	%xmm6,-16(%rbp)
+	movdqa	%xmm0,%xmm4
+	pxor	%xmm0,%xmm0
+	movdqa	%xmm0,%xmm11
+	movdqa	%xmm0,%xmm12
+	movdqa	%xmm0,%xmm2
+	punpcklwd %xmm4,%xmm11
+	punpckhwd %xmm3,%xmm12
+	punpcklwd %xmm3,%xmm2
+	movdqa	%xmm11,%xmm13
+	movdqa	%xmm0,%xmm11
+	movdqa	%xmm12,%xmm3
+	punpckhwd %xmm4,%xmm11
+	movdqa	%xmm8,%xmm12
+	movdqa	%xmm8,%xmm4
+	movdqa	%xmm11,%xmm14
+	movdqa	%xmm7,%xmm8
+	movdqa	%xmm9,%xmm11
+	punpckhwd %xmm5,%xmm8
+	psrad	$4,%xmm3
+	punpcklwd %xmm1,%xmm11
+	psrad	$4,%xmm13
+	psrad	$4,%xmm14
+	movdqa	%xmm11,%xmm15
+	movaps	%xmm13,-80(%rbp)
+	movdqa	%xmm8,%xmm6
+	paddw	%xmm7,%xmm1
+	pmaddwd	.LC3(%rip),%xmm15
+	movaps	%xmm14,-96(%rbp)
+	movdqa	%xmm8,%xmm14
+	movdqa	%xmm5,%xmm8
+	pmaddwd	.LC2(%rip),%xmm11
+	pmaddwd	.LC2(%rip),%xmm12
+	paddw	%xmm9,%xmm8
+	psrad	$4,%xmm2
+	pmaddwd	.LC3(%rip),%xmm4
+	pmaddwd	.LC5(%rip),%xmm6
+	pmaddwd	.LC4(%rip),%xmm14
+	movdqa	%xmm4,%xmm10
+	movdqa	%xmm7,%xmm4
+	movdqa	%xmm8,%xmm7
+	punpcklwd %xmm5,%xmm4
+	punpcklwd %xmm1,%xmm7
+	punpckhwd %xmm1,%xmm8
+	movdqa	%xmm4,%xmm13
+	movdqa	%xmm7,%xmm9
+	pmaddwd	.LC5(%rip),%xmm4
+	pmaddwd	.LC6(%rip),%xmm9
+	movdqa	%xmm8,%xmm5
+	movdqa	%xmm7,%xmm1
+	pmaddwd	.LC7(%rip),%xmm8
+	pmaddwd	.LC6(%rip),%xmm5
+	movdqa	%xmm15,%xmm7
+	paddd	%xmm9,%xmm11
+	paddd	%xmm9,%xmm4
+	movdqa	.LC8(%rip),%xmm9
+	paddd	%xmm8,%xmm14
+	paddd	%xmm10,%xmm8
+	movdqa	-96(%rbp),%xmm10
+	paddd	-64(%rbp),%xmm10
+	pmaddwd	.LC7(%rip),%xmm1
+	pmaddwd	.LC4(%rip),%xmm13
+	paddd	%xmm5,%xmm12
+	paddd	%xmm5,%xmm6
+	paddd	%xmm9,%xmm10
+	movdqa	-80(%rbp),%xmm5
+	paddd	-48(%rbp),%xmm5
+	paddd	%xmm1,%xmm13
+	paddd	%xmm1,%xmm7
+	movdqa	%xmm10,%xmm1
+	psubd	%xmm6,%xmm10
+	paddd	%xmm9,%xmm5
+	paddd	%xmm6,%xmm1
+	psrad	$10,%xmm10
+	movdqa	-16(%rbp),%xmm6
+	movdqa	%xmm1,%xmm15
+	movdqa	%xmm5,%xmm1
+	psubd	%xmm4,%xmm5
+	psrad	$10,%xmm5
+	paddd	%xmm4,%xmm1
+	paddd	%xmm2,%xmm6
+	packssdw %xmm10,%xmm5
+	movdqa	-32(%rbp),%xmm10
+	paddd	%xmm9,%xmm6
+	paddd	%xmm9,%xmm2
+	psrad	$10,%xmm15
+	psrad	$10,%xmm1
+	psubd	-16(%rbp),%xmm2
+	paddd	%xmm3,%xmm10
+	paddd	%xmm9,%xmm3
+	packssdw %xmm15,%xmm1
+	paddd	%xmm9,%xmm10
+	psubd	-32(%rbp),%xmm3
+	movdqa	%xmm10,%xmm4
+	psubd	%xmm8,%xmm10
+	paddd	%xmm8,%xmm4
+	psrad	$10,%xmm10
+	movdqa	%xmm4,%xmm15
+	movdqa	%xmm6,%xmm4
+	psubd	%xmm7,%xmm6
+	psrad	$10,%xmm6
+	psrad	$10,%xmm15
+	paddd	%xmm7,%xmm4
+	movdqa	%xmm3,%xmm7
+	psubd	%xmm14,%xmm3
+	packssdw %xmm10,%xmm6
+	psrad	$10,%xmm3
+	psrad	$10,%xmm4
+	paddd	%xmm14,%xmm7
+	movdqa	%xmm7,%xmm8
+	movdqa	%xmm2,%xmm7
+	psubd	%xmm13,%xmm2
+	paddd	%xmm13,%xmm7
+	psrad	$10,%xmm8
+	packssdw %xmm15,%xmm4
+	psrad	$10,%xmm7
+	psrad	$10,%xmm2
+	packssdw %xmm8,%xmm7
+	movdqa	-80(%rbp),%xmm8
+	packssdw %xmm3,%xmm2
+	paddd	%xmm9,%xmm8
+	paddd	-96(%rbp),%xmm9
+	psubd	-48(%rbp),%xmm8
+	psubd	-64(%rbp),%xmm9
+	movdqa	%xmm8,%xmm3
+	movdqa	%xmm9,%xmm10
+	psubd	%xmm11,%xmm8
+	paddd	%xmm12,%xmm10
+	paddd	%xmm11,%xmm3
+	psrad	$10,%xmm8
+	psrad	$10,%xmm10
+	psrad	$10,%xmm3
+	psubd	%xmm12,%xmm9
+	psrad	$10,%xmm9
+	packssdw %xmm10,%xmm3
+	movdqa	%xmm1,%xmm10
+	packssdw %xmm9,%xmm8
+	movdqa	%xmm7,%xmm9
+	punpckhwd %xmm6,%xmm7
+	punpcklwd %xmm6,%xmm9
+	punpcklwd %xmm8,%xmm10
+	punpckhwd %xmm8,%xmm1
+	movdqa	%xmm3,%xmm6
+	movdqa	%xmm4,%xmm8
+	punpckhwd %xmm5,%xmm3
+	punpcklwd %xmm5,%xmm6
+	punpcklwd %xmm2,%xmm8
+	movdqa	%xmm3,%xmm5
+	punpckhwd %xmm2,%xmm4
+	movdqa	%xmm8,%xmm3
+	movdqa	%xmm10,%xmm2
+	punpckhwd %xmm6,%xmm8
+	punpcklwd %xmm6,%xmm3
+	punpcklwd %xmm9,%xmm2
+	movdqa	%xmm8,%xmm6
+	movdqa	%xmm4,%xmm8
+	punpckhwd %xmm9,%xmm10
+	punpcklwd %xmm5,%xmm8
+	punpckhwd %xmm5,%xmm4
+	movdqa	%xmm2,%xmm5
+	punpcklwd %xmm3,%xmm5
+	punpckhwd %xmm3,%xmm2
+	movdqa	%xmm1,%xmm15
+	movdqa	%xmm10,%xmm3
+	punpckhwd %xmm7,%xmm1
+	punpckhwd %xmm6,%xmm10
+	punpcklwd %xmm6,%xmm3
+	movdqa	%xmm1,%xmm6
+	punpckhwd %xmm4,%xmm1
+	punpcklwd %xmm4,%xmm6
+	movdqa	%xmm3,%xmm4
+	punpcklwd %xmm7,%xmm15
+	punpcklwd %xmm6,%xmm4
+	punpckhwd %xmm6,%xmm3
+	movdqa	%xmm15,%xmm7
+	movdqa	%xmm4,%xmm6
+	punpcklwd %xmm8,%xmm7
+	movdqa	%xmm3,%xmm11
+	movdqa	%xmm4,%xmm12
+	movdqa	%xmm3,%xmm4
+	movdqa	%xmm5,%xmm3
+	paddw	%xmm7,%xmm3
+	movdqa	%xmm1,%xmm9
+	punpckhwd %xmm8,%xmm15
+	punpcklwd %xmm10,%xmm9
+	psubw	%xmm7,%xmm5
+	movdqa	%xmm15,%xmm7
+	movdqa	%xmm9,%xmm14
+	punpcklwd %xmm2,%xmm7
+	movdqa	%xmm1,%xmm8
+	pmaddwd	.LC0(%rip),%xmm6
+	punpckhwd %xmm10,%xmm8
+	paddw	%xmm15,%xmm10
+	movaps	%xmm6,-16(%rbp)
+	pmaddwd	.LC1(%rip),%xmm4
+	movdqa	%xmm0,%xmm6
+	pmaddwd	.LC0(%rip),%xmm11
+	pmaddwd	.LC2(%rip),%xmm14
+	pmaddwd	.LC1(%rip),%xmm12
+	pmaddwd	.LC3(%rip),%xmm9
+	movaps	%xmm4,-64(%rbp)
+	movdqa	%xmm3,%xmm4
+	movdqa	%xmm0,%xmm3
+	punpckhwd %xmm4,%xmm6
+	punpcklwd %xmm4,%xmm3
+	movdqa	%xmm0,%xmm4
+	movaps	%xmm11,-32(%rbp)
+	movdqa	%xmm6,%xmm13
+	movdqa	%xmm15,%xmm6
+	punpcklwd %xmm5,%xmm4
+	movaps	%xmm12,-48(%rbp)
+	punpckhwd %xmm2,%xmm6
+	paddw	%xmm1,%xmm2
+	punpckhwd %xmm5,%xmm0
+	movdqa	%xmm14,%xmm11
+	movdqa	%xmm2,%xmm5
+	movdqa	%xmm7,%xmm14
+	punpckhwd %xmm10,%xmm2
+	psrad	$4,%xmm13
+	punpcklwd %xmm10,%xmm5
+	movaps	%xmm13,-80(%rbp)
+	movdqa	%xmm8,%xmm12
+	movdqa	%xmm5,%xmm10
+	pmaddwd	.LC4(%rip),%xmm14
+	pmaddwd	.LC6(%rip),%xmm10
+	movdqa	%xmm2,%xmm15
+	pmaddwd	.LC7(%rip),%xmm5
+	pmaddwd	.LC3(%rip),%xmm8
+	pmaddwd	.LC5(%rip),%xmm7
+	movdqa	%xmm14,%xmm13
+	movdqa	%xmm6,%xmm14
+	paddd	%xmm5,%xmm13
+	paddd	%xmm5,%xmm9
+	pmaddwd	.LC5(%rip),%xmm6
+	psrad	$4,%xmm3
+	pmaddwd	.LC6(%rip),%xmm15
+	paddd	%xmm10,%xmm7
+	paddd	%xmm10,%xmm11
+	psrad	$4,%xmm4
+	pmaddwd	.LC2(%rip),%xmm12
+	psrad	$4,%xmm0
+	pmaddwd	.LC4(%rip),%xmm14
+	pmaddwd	.LC7(%rip),%xmm2
+	movdqa	-80(%rbp),%xmm5
+	paddd	%xmm15,%xmm12
+	paddd	-64(%rbp),%xmm5
+	paddd	%xmm2,%xmm14
+	paddd	%xmm8,%xmm2
+	movdqa	-48(%rbp),%xmm8
+	paddd	%xmm6,%xmm15
+	movdqa	.LC9(%rip),%xmm6
+	paddd	%xmm3,%xmm8
+	paddd	%xmm6,%xmm8
+	paddd	%xmm6,%xmm5
+	movdqa	%xmm5,%xmm10
+	movdqa	%xmm8,%xmm1
+	psubd	%xmm15,%xmm5
+	psubd	%xmm7,%xmm8
+	psrad	$17,%xmm5
+	paddd	%xmm7,%xmm1
+	movdqa	-32(%rbp),%xmm7
+	psrad	$17,%xmm8
+	paddd	%xmm15,%xmm10
+	paddd	%xmm6,%xmm3
+	packssdw %xmm5,%xmm8
+	movdqa	-16(%rbp),%xmm5
+	paddd	%xmm0,%xmm7
+	paddd	%xmm6,%xmm0
+	paddd	%xmm6,%xmm7
+	psrad	$17,%xmm10
+	psubd	-32(%rbp),%xmm0
+	paddd	%xmm4,%xmm5
+	psrad	$17,%xmm1
+	movdqa	%xmm7,%xmm15
+	paddd	%xmm6,%xmm5
+	packssdw %xmm10,%xmm1
+	psubd	%xmm2,%xmm7
+	movdqa	%xmm5,%xmm10
+	paddd	%xmm6,%xmm4
+	psubd	%xmm9,%xmm5
+	psubd	-16(%rbp),%xmm4
+	psrad	$17,%xmm7
+	paddd	%xmm2,%xmm15
+	psrad	$17,%xmm5
+	psubd	-48(%rbp),%xmm3
+	paddd	-80(%rbp),%xmm6
+	packssdw %xmm7,%xmm5
+	movdqa	%xmm4,%xmm2
+	movdqa	%xmm0,%xmm7
+	psubd	-64(%rbp),%xmm6
+	paddd	%xmm14,%xmm7
+	psrad	$17,%xmm15
+	paddd	%xmm13,%xmm2
+	psubd	%xmm14,%xmm0
+	psrad	$17,%xmm7
+	psubd	%xmm13,%xmm4
+	psrad	$17,%xmm0
+	paddd	%xmm9,%xmm10
+	psrad	$17,%xmm2
+	psrad	$17,%xmm4
+	packuswb %xmm8,%xmm5
+	packssdw %xmm0,%xmm4
+	packssdw %xmm7,%xmm2
+	movdqa	%xmm3,%xmm0
+	movdqa	%xmm6,%xmm7
+	psrad	$17,%xmm10
+	paddd	%xmm11,%xmm0
+	paddd	%xmm12,%xmm7
+	psubd	%xmm12,%xmm6
+	packssdw %xmm15,%xmm10
+	psubd	%xmm11,%xmm3
+	psrad	$17,%xmm7
+	packuswb %xmm10,%xmm1
+	psrad	$17,%xmm0
+	psrad	$17,%xmm6
+	psrad	$17,%xmm3
+	packssdw %xmm7,%xmm0
+	packssdw %xmm6,%xmm3
+	packuswb %xmm0,%xmm2
+	movdqa	%xmm1,%xmm0
+	packuswb %xmm4,%xmm3
+	movdqa	%xmm2,%xmm4
+	punpckhbw %xmm5,%xmm2
+	punpcklbw %xmm3,%xmm0
+	punpcklbw %xmm5,%xmm4
+	punpckhbw %xmm3,%xmm1
+	movdqa	%xmm2,%xmm3
+	movdqa	%xmm0,%xmm2
+	movdqa	%xmm1,%xmm5
+	punpcklbw %xmm4,%xmm2
+	punpckhbw %xmm4,%xmm0
+	punpcklbw %xmm3,%xmm5
+	movdqa	%xmm2,%xmm4
+	punpckhbw %xmm5,%xmm2
+	punpckhbw %xmm3,%xmm1
+	punpcklbw %xmm5,%xmm4
+	movdqa	%xmm0,%xmm3
+	punpckhbw %xmm1,%xmm0
+	movq	%xmm4,(%rdi)
+	pshufd	$78,%xmm4,%xmm4
+	punpcklbw %xmm1,%xmm3
+	movq	%xmm4,(%rax)
+	add	%rsi,%rax
+	movq	%xmm2,(%rax)
+	add	%rsi,%rax
+	pshufd	$78,%xmm2,%xmm2
+	movq	%xmm2,(%rax)
+	add	%rsi,%rax
+	movq	%xmm3,(%rax)
+	add	%rsi,%rax
+	pshufd	$78,%xmm3,%xmm3
+	movq	%xmm3,(%rax)
+	movq	%xmm0,(%rax,%rsi)
+	pshufd	$78,%xmm0,%xmm0
+	movq	%xmm0,(%rax,%rsi,2)
+	leave
+	ret
+	.endfn	stbi__idct_simd$sse,globl
+
+	.rodata.cst16
+.LC0:	.value	2217,-5350,2217,-5350,2217,-5350,2217,-5350
+.LC1:	.value	5352,2217,5352,2217,5352,2217,5352,2217
+.LC2:	.value	-6811,-8034,-6811,-8034,-6811,-8034,-6811,-8034
+.LC3:	.value	-8034,4552,-8034,4552,-8034,4552,-8034,4552
+.LC4:	.value	6813,-1597,6813,-1597,6813,-1597,6813,-1597
+.LC5:	.value	-1597,4552,-1597,4552,-1597,4552,-1597,4552
+.LC6:	.value	1131,4816,1131,4816,1131,4816,1131,4816
+.LC7:	.value	4816,-5681,4816,-5681,4816,-5681,4816,-5681
+.LC8:	.long	0x200,0x200,0x200,0x200
+.LC9:	.long	0x1010000,0x1010000,0x1010000,0x1010000
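Note on the .LC0 through .LC9 tables above: the values appear to match the 12-bit fixed-point rotation constants and rounding biases of the SIMD IDCT in upstream stb_image. The sketch below assumes the upstream conversion (scale by 4096, add 0.5, truncate toward zero) and the usual floating-point IDCT coefficients. Neither appears in this diff, so both are assumptions, and the names fixed12, idct_rotation_pairs, and row_bias are hypothetical.

```c
#include <assert.h>

/* Assumed conversion: 12 fractional bits, add 0.5, truncate toward zero.
 * For negative inputs this rounds toward zero rather than to nearest. */
int fixed12(double x) {
  return (int)(x * 4096 + 0.5);
}

/* Fills out[0..15] with the word pairs of .LC0 through .LC7, in table order.
 * Each table repeats its pair four times so pmaddwd can compute
 * even * c0 + odd * c1 across interleaved 16-bit inputs. */
void idct_rotation_pairs(int out[16]) {
  int k = 0;
  /* .LC0, .LC1 */
  out[k++] = fixed12(0.5411961);
  out[k++] = fixed12(0.5411961) + fixed12(-1.847759065);
  out[k++] = fixed12(0.5411961) + fixed12(0.765366865);
  out[k++] = fixed12(0.5411961);
  /* .LC2, .LC3 */
  out[k++] = fixed12(-1.961570560) + fixed12(0.298631336);
  out[k++] = fixed12(-1.961570560);
  out[k++] = fixed12(-1.961570560);
  out[k++] = fixed12(-1.961570560) + fixed12(3.072711026);
  /* .LC4, .LC5 */
  out[k++] = fixed12(-0.390180644) + fixed12(2.053119869);
  out[k++] = fixed12(-0.390180644);
  out[k++] = fixed12(-0.390180644);
  out[k++] = fixed12(-0.390180644) + fixed12(1.501321110);
  /* .LC6, .LC7 */
  out[k++] = fixed12(1.175875602) + fixed12(-0.899976223);
  out[k++] = fixed12(1.175875602);
  out[k++] = fixed12(1.175875602);
  out[k++] = fixed12(1.175875602) + fixed12(-2.562915447);
  assert(k == 16);
}

/* .LC9: half of 1 << 17 for rounding, plus 128 << 17 so the signed result
 * is recentered into 0..255 after the final shift right by 17. */
int row_bias(void) {
  return 65536 + (128 << 17);
}
```

Under these assumptions the first pass rounds with .LC8 (0x200, half of 1 << 10) before its psrad $10, and the second pass uses the bias above before its psrad $17.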
--- a/third_party/stb/internal.h
+++ b/third_party/stb/internal.h
@@ -0,0 +1,19 @@
+#ifndef COSMOPOLITAN_THIRD_PARTY_STB_INTERNAL_H_
+#define COSMOPOLITAN_THIRD_PARTY_STB_INTERNAL_H_
+#if !(__ASSEMBLER__ + __LINKER__ + 0)
+COSMOPOLITAN_C_START_
+
+void stbi__YCbCr_to_RGB_row(unsigned char *, const unsigned char *,
+                            const unsigned char *, const unsigned char *,
+                            unsigned, unsigned) hidden;
+int stbi__YCbCr_to_RGB_row$sse2(unsigned char *, const unsigned char *,
+                                const unsigned char *, const unsigned char *,
+                                unsigned) hidden;
+void stbi__idct_simd$sse(unsigned char *out, int out_stride,
+                         short data[64]) hidden;
+void stbi__idct_simd$avx(unsigned char *out, int out_stride,
+                         short data[64]) hidden;
+
+COSMOPOLITAN_C_END_
+#endif /* !(__ASSEMBLER__ + __LINKER__ + 0) */
+#endif /* COSMOPOLITAN_THIRD_PARTY_STB_INTERNAL_H_ */
--- a/third_party/stb/stb.mk
+++ b/third_party/stb/stb.mk
@@ -0,0 +1,82 @@
+#-*-mode:makefile-gmake;indent-tabs-mode:t;tab-width:8;coding:utf-8-*-┐
+#───vi: set et ft=make ts=8 tw=8 fenc=utf-8 :vi───────────────────────┘
+
+PKGS += THIRD_PARTY_STB
+
+THIRD_PARTY_STB_SRCS = $(THIRD_PARTY_STB_A_SRCS)
+THIRD_PARTY_STB_HDRS = $(THIRD_PARTY_STB_A_HDRS)
+
+THIRD_PARTY_STB_ARTIFACTS += THIRD_PARTY_STB_A
+THIRD_PARTY_STB = $(THIRD_PARTY_STB_A_DEPS) $(THIRD_PARTY_STB_A)
+THIRD_PARTY_STB_A = o/$(MODE)/third_party/stb/stb.a
+THIRD_PARTY_STB_A_FILES := $(wildcard third_party/stb/*)
+THIRD_PARTY_STB_A_HDRS = $(filter %.h,$(THIRD_PARTY_STB_A_FILES))
+THIRD_PARTY_STB_A_SRCS_S = $(filter %.S,$(THIRD_PARTY_STB_A_FILES))
+THIRD_PARTY_STB_A_SRCS_C = $(filter %.c,$(THIRD_PARTY_STB_A_FILES))
+THIRD_PARTY_STB_A_OBJS_S = $(THIRD_PARTY_STB_A_SRCS_S:%.S=o/$(MODE)/%.o)
+THIRD_PARTY_STB_A_OBJS_C = $(THIRD_PARTY_STB_A_SRCS_C:%.c=o/$(MODE)/%.o)
+
+THIRD_PARTY_STB_A_SRCS =				\
+	$(THIRD_PARTY_STB_A_SRCS_S)			\
+	$(THIRD_PARTY_STB_A_SRCS_C)
+
+THIRD_PARTY_STB_A_OBJS =				\
+	$(THIRD_PARTY_STB_A_SRCS:%=o/$(MODE)/%.zip.o)	\
+	$(THIRD_PARTY_STB_A_OBJS_S)			\
+	$(THIRD_PARTY_STB_A_OBJS_C)
+
+THIRD_PARTY_STB_A_DIRECTDEPS =				\
+	DSP_CORE					\
+	LIBC_ALG					\
+	LIBC_FMT					\
+	LIBC_STDIO					\
+	LIBC_BITS					\
+	LIBC_CONV					\
+	LIBC_MEM					\
+	LIBC_NEXGEN32E					\
+	LIBC_TINYMATH					\
+	LIBC_RUNTIME					\
+	LIBC_STR					\
+	LIBC_LOG					\
+	LIBC_X						\
+	LIBC_STUBS					\
+	THIRD_PARTY_ZLIB
+
+THIRD_PARTY_STB_A_DEPS :=				\
+	$(call uniq,$(foreach x,$(THIRD_PARTY_STB_A_DIRECTDEPS),$($(x))))
+
+THIRD_PARTY_STB_A_CHECKS =				\
+	$(THIRD_PARTY_STB_A).pkg			\
+	$(THIRD_PARTY_STB_A_HDRS:%=o/$(MODE)/%.ok)
+
+$(THIRD_PARTY_STB_A):					\
+		third_party/stb/			\
+		$(THIRD_PARTY_STB_A).pkg		\
+		$(THIRD_PARTY_STB_A_OBJS)
+
+$(THIRD_PARTY_STB_A).pkg:				\
+		$(THIRD_PARTY_STB_A_OBJS)		\
+		$(foreach x,$(THIRD_PARTY_STB_A_DIRECTDEPS),$($(x)_A).pkg)
+
+$(THIRD_PARTY_STB_A_OBJS):				\
+		OVERRIDE_CFLAGS +=			\
+			-ffunction-sections		\
+			-fdata-sections
+
+o/$(MODE)/third_party/stb/stb_image_write.o		\
+o/$(MODE)/third_party/stb/stb_image.o:			\
+		OVERRIDE_CFLAGS +=			\
+			-ftrapv
+
+$(THIRD_PARTY_STB_A_OBJS):				\
+		OVERRIDE_CPPFLAGS +=			\
+			-DSTACK_FRAME_UNLIMITED
+
+THIRD_PARTY_STB_LIBS = $(foreach x,$(THIRD_PARTY_STB_ARTIFACTS),$($(x)))
+THIRD_PARTY_STB_SRCS = $(foreach x,$(THIRD_PARTY_STB_ARTIFACTS),$($(x)_SRCS))
+THIRD_PARTY_STB_CHECKS = $(foreach x,$(THIRD_PARTY_STB_ARTIFACTS),$($(x)_CHECKS))
+THIRD_PARTY_STB_OBJS = $(foreach x,$(THIRD_PARTY_STB_ARTIFACTS),$($(x)_OBJS))
+$(THIRD_PARTY_STB_OBJS): $(BUILD_FILES) third_party/stb/stb.mk
+
+.PHONY: o/$(MODE)/third_party/stb
+o/$(MODE)/third_party/stb: $(THIRD_PARTY_STB_CHECKS)
--- a/third_party/stb/stb_image.c
+++ b/third_party/stb/stb_image.c
--- a/third_party/stb/stb_image.h
+++ b/third_party/stb/stb_image.h
@ -0,0 +1,118 @@
+#ifndef COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_H_
+#define COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_H_
+#if !(__ASSEMBLER__ + __LINKER__ + 0)
+COSMOPOLITAN_C_START_
+
+enum {
+  STBI_default = 0,  // only used for desired_channels
+  STBI_grey = 1,
+  STBI_grey_alpha = 2,
+  STBI_rgb = 3,
+  STBI_rgb_alpha = 4
+};
+
+struct FILE;
+
+typedef struct {
+  int (*read)(void *user, char *data,
+              int size);  // fill 'data' with 'size' bytes.  return number of
+                          // bytes actually read
+  void (*skip)(void *user, int n);  // skip the next 'n' bytes, or 'unget' the
+                                    // last -n bytes if negative
+  int (*eof)(void *user);  // returns nonzero if we are at end of file/data
+} stbi_io_callbacks;
+
+//
+// 8-bits-per-channel interface
+//
+
+unsigned char *stbi_load_from_memory(unsigned char const *buffer, int len,
+                                     int *x, int *y, int *channels_in_file,
+                                     int desired_channels) mallocesque;
+unsigned char *stbi_load_from_callbacks(stbi_io_callbacks const *clbk,
+                                        void *user, int *x, int *y,
+                                        int *channels_in_file,
+                                        int desired_channels);
+
+unsigned char *stbi_load(char const *filename, int *x, int *y,
+                         int *channels_in_file, int desired_channels);
+unsigned char *stbi_load_from_file(struct FILE *f, int *x, int *y,
+                                   int *channels_in_file, int desired_channels);
+// for stbi_load_from_file, file pointer is left pointing immediately after
+// image
+
+unsigned char *stbi_load_gif_from_memory(unsigned char const *buffer, int len,
+                                         int **delays, int *x, int *y, int *z,
+                                         int *comp, int req_comp);
+
+//
+// 16-bits-per-channel interface
+//
+
+unsigned short *stbi_load_16_from_memory(unsigned char const *buffer, int len,
+                                         int *x, int *y, int *channels_in_file,
+                                         int desired_channels);
+unsigned short *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk,
+                                            void *user, int *x, int *y,
+                                            int *channels_in_file,
+                                            int desired_channels);
+
+unsigned short *stbi_load_16(char const *filename, int *x, int *y,
+                             int *channels_in_file, int desired_channels);
+unsigned short *stbi_load_from_file_16(struct FILE *f, int *x, int *y,
+                                       int *channels_in_file,
+                                       int desired_channels);
+
+// get a VERY brief reason for failure
+// NOT THREADSAFE
+const char *stbi_failure_reason(void);
+
+// free the loaded image -- this is just free()
+void stbi_image_free(void *retval_from_stbi_load);
+
+// get image dimensions & components without fully decoding
+int stbi_info_from_memory(unsigned char const *buffer, int len, int *x, int *y,
+                          int *comp);
+int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
+                             int *y, int *comp);
+int stbi_is_16_bit_from_memory(unsigned char const *buffer, int len);
+int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+
+int stbi_info(char const *filename, int *x, int *y, int *comp);
+int stbi_info_from_file(struct FILE *f, int *x, int *y, int *comp);
+int stbi_is_16_bit(char const *filename);
+int stbi_is_16_bit_from_file(struct FILE *f);
+
+// for image formats that explicitly notate that they have premultiplied alpha,
+// we just return the colors as stored in the file. set this flag to force
+// unpremultiplication. results are undefined if the unpremultiply overflows.
+void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
+
+// indicate whether we should process iphone images back to canonical format,
+// or just pass them through "as-is"
+void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
+
+// flip the image vertically, so the first pixel in the output array is the
+// bottom left
+void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
+
+// ZLIB client - used by PNG, available for other purposes
+
+char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len,
+                                        int initial_size, int *outlen);
+char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len,
+                                                   int initial_size,
+                                                   int *outlen,
+                                                   int parse_header);
+char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
+int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer,
+                            int ilen);
+
+char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len,
+                                       int *outlen);
+int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen,
+                                     const char *ibuffer, int ilen);
+
+COSMOPOLITAN_C_END_
+#endif /* !(__ASSEMBLER__ + __LINKER__ + 0) */
+#endif /* COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_H_ */
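The `stbi_io_callbacks` comments in the header above describe a three-function contract: `read` fills a buffer and returns the byte count, `skip` moves forward or "ungets" on a negative count, and `eof` reports end of data. A minimal sketch of that contract over an in-memory buffer might look like the following. The names `mem_reader`, `io_callbacks_sketch`, `mem_read`, `mem_skip`, `mem_eof`, and `mem_callbacks` are hypothetical and are not part of stb_image; the struct only mirrors the three members declared in the header so the sketch compiles on its own, and clamping the position in `mem_skip` is an assumption about sensible behaviour, not something the header specifies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory source; not part of stb_image. */
struct mem_reader {
  const unsigned char *data;
  int len;
  int pos;
};

/* Mirrors the three members of stbi_io_callbacks (assumption: same
   signatures as in the header) so this sketch is self-contained. */
struct io_callbacks_sketch {
  int (*read)(void *user, char *data, int size);
  void (*skip)(void *user, int n);
  int (*eof)(void *user);
};

/* Copy up to 'size' bytes into 'data'; return the number actually copied. */
static int mem_read(void *user, char *data, int size) {
  struct mem_reader *r = user;
  int left;
  assert(r->pos >= 0 && r->pos <= r->len);
  left = r->len - r->pos;
  if (size > left) size = left;
  if (size <= 0) return 0;
  memcpy(data, r->data + r->pos, (size_t)size);
  r->pos += size;
  return size;
}

/* Advance by n bytes; a negative n rewinds the last -n bytes.
   The position is clamped to [0, len] (an assumption of this sketch). */
static void mem_skip(void *user, int n) {
  struct mem_reader *r = user;
  long p = (long)r->pos + n;
  if (p < 0) p = 0;
  if (p > r->len) p = r->len;
  r->pos = (int)p;
}

/* Nonzero once every byte has been consumed. */
static int mem_eof(void *user) {
  const struct mem_reader *r = user;
  return r->pos >= r->len;
}

static const struct io_callbacks_sketch mem_callbacks = {mem_read, mem_skip,
                                                         mem_eof};
```

With the real header, the same three functions would presumably be placed in an `stbi_io_callbacks` value and passed, together with a pointer to the reader state as `user`, to `stbi_load_from_callbacks`.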
--- a/third_party/stb/stb_image_resize.c
+++ b/third_party/stb/stb_image_resize.c
--- a/third_party/stb/stb_image_resize.h
+++ b/third_party/stb/stb_image_resize.h
@ -0,0 +1,192 @@
+#ifndef COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_RESIZE_H_
+#define COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_RESIZE_H_
+#if !(__ASSEMBLER__ + __LINKER__ + 0)
+COSMOPOLITAN_C_START_
+
+// Easy-to-use API:
+//
+//     * "input pixels" points to an array of image data with 'num_channels'
+//       channels (e.g. RGB=3, RGBA=4)
+//     * input_w is input image width (x-axis), input_h is input image height
+//       (y-axis)
+//     * stride is the offset between successive rows of image data in memory,
+//       in bytes. you can specify 0 to mean packed continuously in memory
+//     * alpha channel is treated identically to other channels.
+//     * colorspace is linear or sRGB as specified by function name
+//     * returned result is 1 for success or 0 in case of an error.
+//       #define STBIR_ASSERT() to trigger an assert on parameter validation
+//       errors.
+//     * Memory required grows approximately linearly with input and output
+//       size, but with discontinuities at input_w == output_w and
+//       input_h == output_h.
+//     * These functions use a "default" resampling filter defined at compile
+//       time. To change the filter, you can change the compile-time defaults
+//       by #defining STBIR_DEFAULT_FILTER_UPSAMPLE and
+//       STBIR_DEFAULT_FILTER_DOWNSAMPLE, or you can use the medium-complexity
+//       API.
+
+int stbir_resize_uint8(const unsigned char* input_pixels, int input_w,
+                       int input_h, int input_stride_in_bytes,
+                       unsigned char* output_pixels, int output_w, int output_h,
+                       int output_stride_in_bytes, int num_channels);
+
+int stbir_resize_float(const float* input_pixels, int input_w, int input_h,
+                       int input_stride_in_bytes, float* output_pixels,
+                       int output_w, int output_h, int output_stride_in_bytes,
+                       int num_channels);
+
+// The following functions interpret image data as gamma-corrected sRGB.
+// Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
+// or otherwise provide the index of the alpha channel. Flags value
+// of 0 will probably do the right thing if you're not sure what
+// the flags mean.
+
+#define STBIR_ALPHA_CHANNEL_NONE -1
+
+// Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
+// use alpha-weighted resampling (effectively premultiplying, resampling,
+// then unpremultiplying).
+#define STBIR_FLAG_ALPHA_PREMULTIPLIED (1 << 0)
+// The specified alpha channel should be handled as gamma-corrected value even
+// when doing sRGB operations.
+#define STBIR_FLAG_ALPHA_USES_COLORSPACE (1 << 1)
+
+int stbir_resize_uint8_srgb(const unsigned char* input_pixels, int input_w,
+                            int input_h, int input_stride_in_bytes,
+                            unsigned char* output_pixels, int output_w,
+                            int output_h, int output_stride_in_bytes,
+                            int num_channels, int alpha_channel, int flags);
+
+typedef enum {
+  STBIR_EDGE_CLAMP = 1,
+  STBIR_EDGE_REFLECT = 2,
+  STBIR_EDGE_WRAP = 3,
+  STBIR_EDGE_ZERO = 4,
+} stbir_edge;
+
+// This function adds the ability to specify how requests to sample off the edge
+// of the image are handled.
+int stbir_resize_uint8_srgb_edgemode(const unsigned char* input_pixels,
+                                     int input_w, int input_h,
+                                     int input_stride_in_bytes,
+                                     unsigned char* output_pixels, int output_w,
+                                     int output_h, int output_stride_in_bytes,
+                                     int num_channels, int alpha_channel,
+                                     int flags, stbir_edge edge_wrap_mode);
+
+// Medium-complexity API
+//
+// This extends the easy-to-use API as follows:
+//
+//     * Alpha-channel can be processed separately
+//       * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
+//         * Alpha channel will not be gamma corrected (unless
+//           flags&STBIR_FLAG_ALPHA_USES_COLORSPACE)
+//         * Filters will be weighted by alpha channel (unless
+//           flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
+//     * Filter can be selected explicitly
+//     * uint16 image type
+//     * sRGB colorspace available for all types
+//     * context parameter for passing to STBIR_MALLOC
+
+typedef enum {
+  // use same filter type that easy-to-use API chooses
+  STBIR_FILTER_DEFAULT = 0,
+  // a trapezoid w/1-pixel wide ramps, same result as box for integer
+  // scale ratios
+  STBIR_FILTER_BOX = 1,
+  // On upsampling, produces same results as bilinear texture filtering
+  STBIR_FILTER_TRIANGLE = 2,
+  // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
+  STBIR_FILTER_CUBICBSPLINE = 3,
+  // An interpolating cubic spline
+  STBIR_FILTER_CATMULLROM = 4,
+  // Mitchell-Netravali filter with B=1/3, C=1/3
+  STBIR_FILTER_MITCHELL = 5,
+} stbir_filter;
+
+typedef enum {
+  STBIR_COLORSPACE_LINEAR,
+  STBIR_COLORSPACE_SRGB,
+  STBIR_MAX_COLORSPACES,
+} stbir_colorspace;
+
+// The following functions are all identical except for the type of the image
+// data
+
+int stbir_resize_uint8_generic(const unsigned char* input_pixels, int input_w,
+                               int input_h, int input_stride_in_bytes,
+                               unsigned char* output_pixels, int output_w,
+                               int output_h, int output_stride_in_bytes,
+                               int num_channels, int alpha_channel, int flags,
+                               stbir_edge edge_wrap_mode, stbir_filter filter,
+                               stbir_colorspace space, void* alloc_context);
+
+int stbir_resize_uint16_generic(const uint16_t* input_pixels, int input_w,
+                                int input_h, int input_stride_in_bytes,
+                                uint16_t* output_pixels, int output_w,
+                                int output_h, int output_stride_in_bytes,
+                                int num_channels, int alpha_channel, int flags,
+                                stbir_edge edge_wrap_mode, stbir_filter filter,
+                                stbir_colorspace space, void* alloc_context);
+
+int stbir_resize_float_generic(const float* input_pixels, int input_w,
+                               int input_h, int input_stride_in_bytes,
+                               float* output_pixels, int output_w, int output_h,
+                               int output_stride_in_bytes, int num_channels,
+                               int alpha_channel, int flags,
+                               stbir_edge edge_wrap_mode, stbir_filter filter,
+                               stbir_colorspace space, void* alloc_context);
+
+// Full-complexity API
+//
+// This extends the medium API as follows:
+//
+//     * uint32 image type
+//     * not typesafe
+//     * separate filter types for each axis
+//     * separate edge modes for each axis
+//     * can specify scale explicitly for subpixel correctness
+//     * can specify image source tile using texture coordinates
+
+typedef enum {
+  STBIR_TYPE_UINT8,
+  STBIR_TYPE_UINT16,
+  STBIR_TYPE_UINT32,
+  STBIR_TYPE_FLOAT,
+  STBIR_MAX_TYPES
+} stbir_datatype;
+
+int stbir_resize(const void* input_pixels, int input_w, int input_h,
+                 int input_stride_in_bytes, void* output_pixels, int output_w,
+                 int output_h, int output_stride_in_bytes,
+                 stbir_datatype datatype, int num_channels, int alpha_channel,
+                 int flags, stbir_edge edge_mode_horizontal,
+                 stbir_edge edge_mode_vertical, stbir_filter filter_horizontal,
+                 stbir_filter filter_vertical, stbir_colorspace space,
+                 void* alloc_context);
+
+int stbir_resize_subpixel(
+    const void* input_pixels, int input_w, int input_h,
+    int input_stride_in_bytes, void* output_pixels, int output_w, int output_h,
+    int output_stride_in_bytes, stbir_datatype datatype, int num_channels,
+    int alpha_channel, int flags, stbir_edge edge_mode_horizontal,
+    stbir_edge edge_mode_vertical, stbir_filter filter_horizontal,
+    stbir_filter filter_vertical, stbir_colorspace space, void* alloc_context,
+    float x_scale, float y_scale, float x_offset, float y_offset);
+
+int stbir_resize_region(
+    const void* input_pixels, int input_w, int input_h,
+    int input_stride_in_bytes, void* output_pixels, int output_w, int output_h,
+    int output_stride_in_bytes, stbir_datatype datatype, int num_channels,
+    int alpha_channel, int flags, stbir_edge edge_mode_horizontal,
+    stbir_edge edge_mode_vertical, stbir_filter filter_horizontal,
+    stbir_filter filter_vertical, stbir_colorspace space, void* alloc_context,
+    float s0, float t0, float s1, float t1);
+// (s0, t0) & (s1, t1) are the top-left and bottom-right corners (uv addressing
+// style: [0, 1]x[0, 1]) of a region of the input image to use.
+
+COSMOPOLITAN_C_END_
+#endif /* !(__ASSEMBLER__ + __LINKER__ + 0) */
+#endif /* COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_RESIZE_H_ */
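The easy-to-use API comments in the header above say that a stride is the byte offset between successive rows and that 0 means rows are packed contiguously. As a rough illustration of that convention only, the sketch below resolves a stride of 0 and computes the byte offset of one channel sample. Both helpers, `effective_stride` and `channel_offset`, are hypothetical names invented for this note; they are not functions from stb_image_resize and may not match how the library computes addresses internally.

```c
#include <stddef.h>

/* Hypothetical helper: resolve a caller-supplied stride, where 0 means
   rows are packed contiguously (width * channels * bytes per channel). */
static int effective_stride(int stride_in_bytes, int width, int num_channels,
                            int bytes_per_channel) {
  return stride_in_bytes ? stride_in_bytes
                         : width * num_channels * bytes_per_channel;
}

/* Hypothetical helper: byte offset of channel c of pixel (x, y) in an
   interleaved image laid out with the given stride. */
static size_t channel_offset(int x, int y, int c, int stride_in_bytes,
                             int width, int num_channels,
                             int bytes_per_channel) {
  int stride = effective_stride(stride_in_bytes, width, num_channels,
                                bytes_per_channel);
  return (size_t)y * (size_t)stride +
         ((size_t)x * (size_t)num_channels + (size_t)c) *
             (size_t)bytes_per_channel;
}
```

For a 4-pixel-wide RGB image of 8-bit samples, a stride of 0 resolves to 12 bytes per row, while a padded layout would pass its real row pitch (for example 16) instead.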
--- a/third_party/stb/stb_image_write.c
+++ b/third_party/stb/stb_image_write.c
--- a/third_party/stb/stb_image_write.h
+++ b/third_party/stb/stb_image_write.h
@ -0,0 +1,34 @@
+#ifndef COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_WRITE_H_
+#define COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_WRITE_H_
+#if !(__ASSEMBLER__ + __LINKER__ + 0)
+COSMOPOLITAN_C_START_
+
+extern int stbi_write_png_compression_level;
+extern int stbi__flip_vertically_on_write;
+extern int stbi_write_tga_with_rle;
+extern int stbi_write_force_png_filter;
+
+int stbi_write_png(const char *, int, int, int, const void *, int);
+int stbi_write_bmp(const char *, int, int, int, const void *);
+int stbi_write_tga(const char *, int, int, int, const void *);
+int stbi_write_hdr(const char *, int, int, int, const float *);
+int stbi_write_jpg(const char *, int, int, int, const void *, int);
+
+typedef void stbi_write_func(void *, void *, int);
+
+int stbi_write_png_to_func(stbi_write_func *, void *, int, int, int,
+                           const void *, int);
+int stbi_write_bmp_to_func(stbi_write_func *, void *, int, int, int,
+                           const void *);
+int stbi_write_tga_to_func(stbi_write_func *, void *, int, int, int,
+                           const void *);
+int stbi_write_hdr_to_func(stbi_write_func *, void *, int, int, int,
+                           const float *);
+int stbi_write_jpg_to_func(stbi_write_func *, void *, int, int, int,
+                           const void *, int);
+
+void stbi_flip_vertically_on_write(int);
+
+COSMOPOLITAN_C_END_
+#endif /* !(__ASSEMBLER__ + __LINKER__ + 0) */
+#endif /* COSMOPOLITAN_THIRD_PARTY_STB_STB_IMAGE_WRITE_H_ */
--- a/third_party/stb/stb_image_write_png.c
+++ b/third_party/stb/stb_image_write_png.c
@ -0,0 +1,378 @@
+/* stb_image_write - v1.13 - public domain - http://nothings.org/stb
+ * writes out PNG/BMP/TGA/JPEG/HDR images to C stdio - Sean Barrett 2010-2015
+ *                                  no warranty implied; use at your own risk
+ *
+ * ABOUT:
+ *
+ *    This file is a library for writing images to stdio or a callback.
+ *
+ *    The PNG output is not optimal; it is 20-50% larger than the file
+ *    written by a decent optimizing implementation; though providing a
+ *    custom zlib compress function (see STBIW_ZLIB_COMPRESS) can
+ *    mitigate that. This library is designed for source code
+ *    compactness and simplicity, not optimal image file size or
+ *    run-time performance.
+ *
+ * USAGE:
+ *
+ *    There are five functions, one for each image file format:
+ *
+ *      stbi_write_png
+ *      stbi_write_bmp
+ *      stbi_write_tga
+ *      stbi_write_jpg
+ *      stbi_write_hdr
+ *
+ *      stbi_flip_vertically_on_write
+ *
+ *    There are also five equivalent functions that use an arbitrary
+ *    write function. You are expected to open/close your
+ *    file-equivalent before and after calling these:
+ *
+ *      stbi_write_png_to_func
+ *      stbi_write_bmp_to_func
+ *      stbi_write_tga_to_func
+ *      stbi_write_hdr_to_func
+ *      stbi_write_jpg_to_func
+ *
+ *    where the callback is:
+ *       void stbi_write_func(void *context, void *data, int size);
+ *
+ *    You can configure it with these:
+ *       stbi_write_tga_with_rle
+ *       stbi_write_png_compression_level
+ *       stbi_write_force_png_filter
+ *
+ *    Each function returns 0 on failure and non-0 on success.
+ *
+ *    The functions create an image file defined by the parameters. The
+ *    image is a rectangle of pixels stored from left-to-right,
+ *    top-to-bottom. Each pixel contains 'comp' channels of data stored
+ *    interleaved with 8-bits per channel, in the following order: 1=Y,
+ *    2=YA, 3=RGB, 4=RGBA. (Y is monochrome color.) The rectangle is 'w'
+ *    pixels wide and 'h' pixels tall. The *data pointer points to the
+ *    first byte of the top-left-most pixel. For PNG, "stride_in_bytes"
+ *    is the distance in bytes from the first byte of a row of pixels to
+ *    the first byte of the next row of pixels.
+ *
+ *    PNG creates output files with the same number of components as the
+ *    input. The BMP format expands Y to RGB in the file format and does
+ *    not output alpha.
+ *
+ *    PNG supports writing rectangles of data even when the bytes
+ *    storing rows of data are not consecutive in memory (e.g.
+ *    sub-rectangles of a larger image), by supplying the stride between
+ *    the beginning of adjacent rows. The other formats do not. (Thus
+ *    you cannot write a native-format BMP through the BMP writer, both
+ *    because it is in BGR order and because it may have padding at the
+ *    end of the line.)
+ *
+ *    PNG allows you to set the deflate compression level by setting the
+ *    global variable 'stbi_write_png_compression_level' (it defaults to
+ *    4 in this file).
+ *
+ *    HDR expects linear float data. Since the format is always 32-bit
+ *    rgb(e) data, alpha (if provided) is discarded, and for monochrome
+ *    data it is replicated across all three channels.
+ *
+ *    TGA supports RLE or non-RLE compressed data. To use
+ *    non-RLE-compressed data, set the global variable
+ *    'stbi_write_tga_with_rle' to 0.
+ *
+ *    JPEG ignores alpha channels in input data; quality is between
+ *    1 and 100. Higher quality looks better but results in a bigger
+ *    image. Output is baseline JPEG (progressive is not supported).
+ *
+ * CREDITS:
+ *
+ *
+ *    Sean Barrett           -    PNG/BMP/TGA
+ *    Baldur Karlsson        -    HDR
+ *    Jean-Sebastien Guay    -    TGA monochrome
+ *    Tim Kelsey             -    misc enhancements
+ *    Alan Hickman           -    TGA RLE
+ *    Emmanuel Julien        -    initial file IO callback implementation
+ *    Jon Olick              -    original jo_jpeg.cpp code
+ *    Daniel Gibson          -    integrate JPEG, allow external zlib
+ *    Aarni Koskela          -    allow choosing PNG filter
+ *
+ *    bugfixes:
+ *       github:Chribba
+ *       Guillaume Chereau
+ *       github:jry2
+ *       github:romigrou
+ *       Sergio Gonzalez
+ *       Jonas Karlsson
+ *       Filip Wasil
+ *       Thatcher Ulrich
+ *       github:poppolopoppo
+ *       Patrick Boettcher
+ *       github:xeekworx
+ *       Cap Petschulat
+ *       Simon Rodriguez
+ *       Ivan Tikhonov
+ *       github:ignotion
+ *       Adam Schackart
+ *
+ * LICENSE
+ *
+ *   Public Domain (www.unlicense.org)
+ */
+#include "libc/assert.h"
+#include "libc/conv/conv.h"
+#include "libc/limits.h"
+#include "libc/mem/mem.h"
+#include "libc/stdio/stdio.h"
+#include "libc/str/str.h"
+#include "third_party/stb/stb_image_write.h"
+#include "third_party/zlib/zlib.h"
+
+#define STBIW_UCHAR(x) (unsigned char)((x)&0xff)
+#define stbiw__wpng4(o, a, b, c, d)                                           \
+  ((o)[0] = STBIW_UCHAR(a), (o)[1] = STBIW_UCHAR(b), (o)[2] = STBIW_UCHAR(c), \
+   (o)[3] = STBIW_UCHAR(d), (o) += 4)
+#define stbiw__wp32(data, v) \
+  stbiw__wpng4(data, (v) >> 24, (v) >> 16, (v) >> 8, (v));
+#define stbiw__wptag(data, s) stbiw__wpng4(data, s[0], s[1], s[2], s[3])
+
+int stbi_write_png_compression_level = 4;
+int stbi_write_force_png_filter = -1;
+
+static unsigned char *stbi_zlib_compress(unsigned char *data, int size,
+                                         int *out_len, int quality) {
+  unsigned long newsize;
+  unsigned char *newdata, *trimdata;
+  assert(0 <= size && size <= INT_MAX);
+  if ((newdata = malloc((newsize = compressBound(size)))) &&
+      compress2(newdata, &newsize, data, size, quality) == Z_OK) {
+    *out_len = newsize;
+    if ((trimdata = realloc(newdata, newsize))) {
+      return trimdata;
+    } else {
+      return newdata;
+    }
+  }
+  free(newdata);
+  return NULL;
+}
+
+static void stbiw__wpcrc(unsigned char **data, int len) {
+  unsigned int crc = crc32(0, *data - len - 4, len + 4);
+  stbiw__wp32(*data, crc);
+}
+
+forceinline unsigned char stbiw__paeth(int a, int b, int c) {
+  int p = a + b - c, pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
+  if (pa <= pb && pa <= pc) return STBIW_UCHAR(a);
+  if (pb <= pc) return STBIW_UCHAR(b);
+  return STBIW_UCHAR(c);
+}
+
+// @OPTIMIZE: provide an option that always forces left-predict or paeth predict
+static void stbiw__encode_png_line(const unsigned char *pixels,
+                                   int stride_bytes, int width, int height,
+                                   int y, int n, int filter_type,
+                                   signed char *line_buffer) {
+  static const int mapping[] = {0, 1, 2, 3, 4};
+  static const int firstmap[] = {0, 1, 0, 5, 6};
+  const unsigned char *z;
+  const int *mymap;
+  int i, type, signed_stride;
+
+  mymap = (y != 0) ? mapping : firstmap;
+  type = mymap[filter_type];
+  z = pixels +
+      stride_bytes * (stbi__flip_vertically_on_write ? height - 1 - y : y);
+  signed_stride = stbi__flip_vertically_on_write ? -stride_bytes : stride_bytes;
+
+  if (type == 0) {
+    memcpy(line_buffer, z, width * n);
+    return;
+  }
+
+  for (i = 0; i < n; ++i) {
+    switch (type) {
+      case 1:
+        line_buffer[i] = z[i];
+        break;
+      case 2:
+        line_buffer[i] = z[i] - z[i - signed_stride];
+        break;
+      case 3:
+        line_buffer[i] = z[i] - (z[i - signed_stride] >> 1);
+        break;
+      case 4:
+        line_buffer[i] =
+            (signed char)(z[i] - stbiw__paeth(0, z[i - signed_stride], 0));
+        break;
+      case 5:
+        line_buffer[i] = z[i];
+        break;
+      case 6:
+        line_buffer[i] = z[i];
+        break;
+    }
+  }
+
+  switch (type) {
+    case 1:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - z[i - n];
+      }
+      break;
+    case 2:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - z[i - signed_stride];
+      }
+      break;
+    case 3:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - ((z[i - n] + z[i - signed_stride]) >> 1);
+      }
+      break;
+    case 4:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - stbiw__paeth(z[i - n], z[i - signed_stride],
+                                             z[i - signed_stride - n]);
+      }
+      break;
+    case 5:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - (z[i - n] >> 1);
+      }
+      break;
+    case 6:
+      for (i = n; i < width * n; ++i) {
+        line_buffer[i] = z[i] - stbiw__paeth(z[i - n], 0, 0);
+      }
+      break;
+  }
+}
+
+unsigned char *stbi_write_png_to_mem(const unsigned char *pixels,
+                                     int stride_bytes, int x, int y, int n,
+                                     int *out_len) {
+  int force_filter = stbi_write_force_png_filter;
+  int ctype[5] = {-1, 0, 4, 2, 6};
+  unsigned char sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
+  unsigned char *out, *o, *filt, *zlib;
+  signed char *line_buffer;
+  int j, zlen;
+
+  if (stride_bytes == 0) stride_bytes = x * n;
+
+  if (force_filter >= 5) {
+    force_filter = -1;
+  }
+
+  filt = malloc((x * n + 1) * y);
+  if (!filt) return 0;
+  line_buffer = malloc(x * n);
+  if (!line_buffer) {
+    free(filt);
+    return 0;
+  }
+  for (j = 0; j < y; ++j) {
+    int filter_type;
+    if (force_filter > -1) {
+      filter_type = force_filter;
+      stbiw__encode_png_line(pixels, stride_bytes, x, y, j, n, force_filter,
+                             line_buffer);
+    } else {  // Estimate the best filter by running through all of them:
+      int best_filter = 0, best_filter_val = 0x7fffffff, est, i;
+      for (filter_type = 0; filter_type < 5; filter_type++) {
+        stbiw__encode_png_line(pixels, stride_bytes, x, y, j, n, filter_type,
+                               line_buffer);
+
+        // Estimate the entropy of the line using this filter; the less, the
+        // better.
+        est = 0;
+        for (i = 0; i < x * n; ++i) {
+          est += abs((signed char)line_buffer[i]);
+        }
+        if (est < best_filter_val) {
+          best_filter_val = est;
+          best_filter = filter_type;
+        }
+      }
+      if (filter_type != best_filter) {  // If the last iteration already got us
+                                         // the best filter, don't redo it
+        stbiw__encode_png_line(pixels, stride_bytes, x, y, j, n, best_filter,
+                               line_buffer);
+        filter_type = best_filter;
+      }
+    }
+    // when we get here, filter_type contains the filter type, and line_buffer
+    // contains the data
+    filt[j * (x * n + 1)] = (unsigned char)filter_type;
+    memmove(filt + j * (x * n + 1) + 1, line_buffer, x * n);
+  }
+  free(line_buffer);
+  zlib = stbi_zlib_compress(filt, y * (x * n + 1), &zlen,
+                            stbi_write_png_compression_level);
+  free(filt);
+  if (!zlib) return 0;
+
+  // each tag requires 12 bytes of overhead
+  out = malloc(8 + 12 + 13 + 12 + zlen + 12);
+  if (!out) {
+    free(zlib);  // don't leak the compressed buffer on allocation failure
+    return 0;
+  }
+  *out_len = 8 + 12 + 13 + 12 + zlen + 12;
+
+  o = out;
+  memmove(o, sig, 8);
+  o += 8;
+  stbiw__wp32(o, 13);  // header length
+  stbiw__wptag(o, "IHDR");
+  stbiw__wp32(o, x);
+  stbiw__wp32(o, y);
+  *o++ = 8;
+  *o++ = STBIW_UCHAR(ctype[n]);
+  *o++ = 0;
+  *o++ = 0;
+  *o++ = 0;
+  stbiw__wpcrc(&o, 13);
+
+  stbiw__wp32(o, zlen);
+  stbiw__wptag(o, "IDAT");
+  memmove(o, zlib, zlen);
+  o += zlen;
+  free(zlib);
+  stbiw__wpcrc(&o, zlen);
+
+  stbiw__wp32(o, 0);
+  stbiw__wptag(o, "IEND");
+  stbiw__wpcrc(&o, 0);
+
+  assert(o == out + *out_len);
+
+  return out;
+}
+
+int stbi_write_png(const char *filename, int x, int y, int comp,
+                   const void *data, int stride_bytes) {
+  int len, ok;
+  FILE *f;
+  unsigned char *png;
+  png = stbi_write_png_to_mem(data, stride_bytes, x, y, comp, &len);
+  if (png == NULL) return 0;
+  f = fopen(filename, "wb");
+  if (!f) {
+    free(png);
+    return 0;
+  }
+  // report short writes and close failures instead of claiming success
+  ok = fwrite(png, 1, len, f) == (size_t)len;
+  if (fclose(f) != 0) ok = 0;
+  free(png);
+  return ok;
+}
+
+int stbi_write_png_to_func(stbi_write_func *func, void *context, int x, int y,
+                           int comp, const void *data, int stride_bytes) {
+  int len;
+  unsigned char *png;
+  png = stbi_write_png_to_mem((const unsigned char *)data, stride_bytes, x, y,
+                              comp, &len);
+  if (png == NULL) return 0;
+  func(context, png, len);
+  free(png);
+  return 1;
+}
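The PNG writer above picks a filter per scanline by encoding the line with each of the five filter types and keeping the one whose bytes, read as signed values, have the smallest sum of absolute values. The standalone sketch below restates the two small pieces involved: the Paeth predictor and that cost estimate. The names `paeth` and `filter_cost` are hypothetical stand-ins chosen for this note (the file itself uses `stbiw__paeth` and an inline loop), and the sketch is meant as an illustration of the idea, not a replacement for the code in the diff.

```c
#include <stdlib.h>

/* Paeth predictor: choose whichever of left (a), up (b), or upper-left (c)
   is closest to a + b - c, preferring a, then b, when distances tie. */
static unsigned char paeth(int a, int b, int c) {
  int p = a + b - c;
  int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
  if (pa <= pb && pa <= pc) return (unsigned char)a;
  if (pb <= pc) return (unsigned char)b;
  return (unsigned char)c;
}

/* Cost estimate for comparing filters: the sum of |byte| with each
   filtered byte interpreted as signed. A smaller sum is taken as a hint
   that the line will compress better. */
static int filter_cost(const signed char *line, int len) {
  int i, est = 0;
  for (i = 0; i < len; ++i) est += abs(line[i]);
  return est;
}
```

A caller would compute `filter_cost` for each candidate filtered line and keep the filter type with the lowest value, which is the selection rule the loop in `stbi_write_png_to_mem` follows when no filter is forced.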
--- a/third_party/stb/stb_vorbis.c
+++ b/third_party/stb/stb_vorbis.c
--- a/third_party/stb/stb_vorbis.h
+++ b/third_party/stb/stb_vorbis.h
@ -0,0 +1,264 @@
+#ifndef COSMOPOLITAN_THIRD_PARTY_STB_STB_VORBIS_H_
+#define COSMOPOLITAN_THIRD_PARTY_STB_STB_VORBIS_H_
+#include "libc/stdio/stdio.h"
+#if !(__ASSEMBLER__ + __LINKER__ + 0)
+COSMOPOLITAN_C_START_
+
+enum STBVorbisError {
+  VORBIS__no_error,
+  VORBIS_need_more_data = 1,     // not a real error
+  VORBIS_invalid_api_mixing,     // can't mix API modes
+  VORBIS_outofmem,               // not enough memory
+  VORBIS_feature_not_supported,  // uses floor 0
+  VORBIS_too_many_channels,      // STB_VORBIS_MAX_CHANNELS is too small
+  VORBIS_file_open_failure,      // fopen() failed
+  VORBIS_seek_without_length,    // can't seek in unknown-length file
+  VORBIS_unexpected_eof = 10,    // file is truncated?
+  VORBIS_seek_invalid,           // seek past EOF
+  VORBIS_invalid_setup = 20,     // decoding errors
+  VORBIS_invalid_stream,
+  VORBIS_missing_capture_pattern = 30,  // ogg errors
+  VORBIS_invalid_stream_structure_version,
+  VORBIS_continued_packet_flag_invalid,
+  VORBIS_incorrect_stream_serial_number,
+  VORBIS_invalid_first_page,
+  VORBIS_bad_packet_type,
+  VORBIS_cant_find_last_page,
+  VORBIS_seek_failed,
+  VORBIS_ogg_skeleton_not_supported
+};
+
+typedef struct {
+  char *alloc_buffer;
+  int alloc_buffer_length_in_bytes;
+} stb_vorbis_alloc;
+
+typedef struct stb_vorbis stb_vorbis;
+
+typedef struct {
+  unsigned int sample_rate;
+  int channels;
+  unsigned int setup_memory_required;
+  unsigned int setup_temp_memory_required;
+  unsigned int temp_memory_required;
+  int max_frame_size;
+} stb_vorbis_info;
+
+// get general information about the file
+stb_vorbis_info stb_vorbis_get_info(stb_vorbis *f);
+
+// get the last error detected (clears it, too)
+int stb_vorbis_get_error(stb_vorbis *f);
+
+// close an ogg vorbis file and free all memory in use
+void stb_vorbis_close(stb_vorbis *f);
+
+// this function returns the offset (in samples) from the beginning of the
+// file that will be returned by the next decode, if it is known, or -1
+// otherwise. after a flush_pushdata() call, this may take a while before
+// it becomes valid again.
+// NOT WORKING YET after a seek with PULLDATA API
+int stb_vorbis_get_sample_offset(stb_vorbis *f);
+
+// returns the current seek point within the file, or offset from the beginning
+// of the memory buffer. In pushdata mode it returns 0.
+unsigned int stb_vorbis_get_file_offset(stb_vorbis *f);
+
+////////////////////////////////////////////////////////////////////////////////
+// PUSHDATA
+
+// this API allows you to get blocks of data from any source and hand
+// them to stb_vorbis. you have to buffer them; stb_vorbis will tell
+// you how much it used, and you have to give it the rest next time;
+// and stb_vorbis may not have enough data to work with and you will
+// need to give it the same data again PLUS more. Note that the Vorbis
+// specification does not bound the size of an individual frame.
+
+stb_vorbis *stb_vorbis_open_pushdata(const unsigned char *datablock,
+                                     int datablock_length_in_bytes,
+                                     int *datablock_memory_consumed_in_bytes,
+                                     int *error,
+                                     const stb_vorbis_alloc *alloc_buffer);
+// create a vorbis decoder by passing in the initial data block containing
+//    the ogg&vorbis headers (you don't need to parse them, just provide
+//    the first N bytes of the file--you're told if it's not enough, see below)
+// on success, returns an stb_vorbis *, does not set error, returns the amount
+//    of data parsed/consumed on this call in
+//    *datablock_memory_consumed_in_bytes;
+// on failure, returns NULL and sets *error, does not change
+//    *datablock_memory_consumed_in_bytes
+// if returns NULL and *error is VORBIS_need_more_data, then the input block
+//    was incomplete and you need to pass in a larger block from the start of
+//    the file
+
+int stb_vorbis_decode_frame_pushdata(
+    stb_vorbis *f, const unsigned char *datablock,
+    int datablock_length_in_bytes,
+    int *channels,    // place to write number of float * buffers
+    float ***output,  // place to write float ** array of float * buffers
+    int *samples      // place to write number of output samples
+);
+// decode a frame of audio sample data if possible from the passed-in data block
+//
+// return value: number of bytes we used from datablock
+//
+// possible cases:
+//     0 bytes used, 0 samples output (need more data)
+//     N bytes used, 0 samples output (resynching the stream, keep going)
+//     N bytes used, M samples output (one frame of data)
+// note that after opening a file, you will ALWAYS get one N-bytes,0-sample
+// frame, because Vorbis always "discards" the first frame.
+//
+// Note that on resynch, stb_vorbis will rarely consume all of the buffer,
+// instead only datablock_length_in_bytes-3 or less. This is because it wants
+// to avoid missing parts of a page header if they cross a datablock boundary,
+// without writing state-machiney code to record a partial detection.
+//
+// The number of channels returned is stored in *channels (which can be
+// NULL--it is always the same as the number of channels reported by
+// get_info). *output will contain an array of float* buffers, one per
+// channel. In other words, (*output)[0][0] contains the first sample from
+// the first channel, and (*output)[1][0] contains the first sample from
+// the second channel.
+
+void stb_vorbis_flush_pushdata(stb_vorbis *f);
+// inform stb_vorbis that your next datablock will not be contiguous with
+// previous ones (e.g. you've seeked in the data); future attempts to decode
+// frames will cause stb_vorbis to resynchronize (as noted above), and
+// once it sees a valid Ogg page (typically 4-8KB, as large as 64KB), it
+// will begin decoding the _next_ frame.
+//
+// if you want to seek using pushdata, you need to seek in your file, then
+// call stb_vorbis_flush_pushdata(), then resume calling
+// stb_vorbis_decode_frame_pushdata(); once decoding is returning you data,
+// call stb_vorbis_get_sample_offset(), and if you don't like the result,
+// seek your file again and repeat.
+
+////////////////////////////////////////////////////////////////////////////////
+// PULLING INPUT API
+//
+// This API assumes stb_vorbis is allowed to pull data from a source--
+// either a block of memory containing the _entire_ vorbis stream, or a
+// FILE * that you or it create, or possibly some other reading mechanism
+// if you go modify the source to replace the FILE * case with some kind
+// of callback to your code. (But if you don't support seeking, you may
+// just want to go ahead and use pushdata.)
+
+// decode an entire file (decode_filename) or an entire in-memory stream
+// (decode_memory) and output the data interleaved into a malloc()ed buffer
+// stored in *output. The return value is the number of samples decoded, or
+// -1 if the file could not be opened or was not an ogg vorbis file. When
+// you're done with it, just free() the pointer returned in *output.
+int stb_vorbis_decode_filename(const char *filename, int *channels,
+                               int *sample_rate, short **output);
+
+int stb_vorbis_decode_memory(const unsigned char *mem, int len, int *channels,
+                             int *sample_rate, short **output);
+
+// create an ogg vorbis decoder from an ogg vorbis stream in memory (note
+// this must be the entire stream!). on failure, returns NULL and sets *error
+stb_vorbis *stb_vorbis_open_memory(const unsigned char *data, int len,
+                                   int *error,
+                                   const stb_vorbis_alloc *alloc_buffer);
+
+// create an ogg vorbis decoder from a filename via fopen(). on failure,
+// returns NULL and sets *error (possibly to VORBIS_file_open_failure).
+stb_vorbis *stb_vorbis_open_filename(const char *filename, int *error,
+                                     const stb_vorbis_alloc *alloc_buffer);
+
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell). on failure, returns NULL and sets *error.
+// note that stb_vorbis must "own" this stream; if you seek it in between
+// calls to stb_vorbis, it will become confused. Moreover, if you attempt to
+// perform stb_vorbis_seek_*() operations on this file, it will assume it
+// owns the _entire_ rest of the file after the start point. Use the next
+// function, stb_vorbis_open_file_section(), to limit it.
+stb_vorbis *stb_vorbis_open_file(FILE *f, int close_handle_on_close, int *error,
+                                 const stb_vorbis_alloc *alloc_buffer);
+
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell); the stream will be of length 'len' bytes.
+// on failure, returns NULL and sets *error. note that stb_vorbis must "own"
+// this stream; if you seek it in between calls to stb_vorbis, it will become
+// confused.
+stb_vorbis *stb_vorbis_open_file_section(FILE *f, int close_handle_on_close,
+                                         int *error,
+                                         const stb_vorbis_alloc *alloc_buffer,
+                                         unsigned int len);
+
+// these functions seek in the Vorbis file to (approximately) 'sample_number'.
+// after calling seek_frame(), the next call to get_frame_*() will include
+// the specified sample. after calling stb_vorbis_seek(), the next call to
+// stb_vorbis_get_samples_* will start with the specified sample. If you
+// do not need to seek to EXACTLY the target sample when using get_samples_*,
+// you can also use seek_frame().
+int stb_vorbis_seek_frame(stb_vorbis *f, unsigned int sample_number);
+int stb_vorbis_seek(stb_vorbis *f, unsigned int sample_number);
+
+// this function is equivalent to stb_vorbis_seek(f,0)
+int stb_vorbis_seek_start(stb_vorbis *f);
+
+// these functions return the total length of the vorbis stream
+unsigned int stb_vorbis_stream_length_in_samples(stb_vorbis *f);
+float stb_vorbis_stream_length_in_seconds(stb_vorbis *f);
+
+// decode the next frame and return the number of samples. the number of
+// channels returned is stored in *channels (which can be NULL--it is always
+// the same as the number of channels reported by get_info). *output will
+// contain an array of float* buffers, one per channel. These outputs will
+// be overwritten on the next call to stb_vorbis_get_frame_*.
+//
+// You generally should not intermix calls to stb_vorbis_get_frame_*()
+// and stb_vorbis_get_samples_*(), since the latter calls the former.
+int stb_vorbis_get_frame_float(stb_vorbis *f, int *channels, float ***output);
+
+int stb_vorbis_get_frame_short_interleaved(stb_vorbis *f, int num_c,
+                                           short *buffer, int num_shorts);
+// decode the next frame and return the number of *samples* per channel.
+// Note that for interleaved data, you pass in the number of shorts (the
+// size of your array), but the return value is the number of samples per
+// channel, not the total number of samples.
+//
+// The data is coerced to the number of channels you request according to the
+// channel coercion rules (see below). You must pass in the size of your
+// buffer(s) so that stb_vorbis will not overwrite the end of the buffer.
+// The maximum buffer size needed can be obtained from get_info(); however,
+// the Vorbis I specification implies an absolute maximum of 4096 samples
+// per channel.
+int stb_vorbis_get_frame_short(stb_vorbis *f, int num_c, short **buffer,
+                               int num_samples);
+
+// Channel coercion rules:
+//    Let M be the number of channels requested, and N the number of channels
+//    present, and Cn be the nth channel; let stereo L be the sum of all L and
+//    center channels, and stereo R be the sum of all R and center channels
+//    (channel assignment from the vorbis spec).
+//        M    N       output
+//        1    k      sum(Ck) for all k
+//        2    *      stereo L, stereo R
+//        k    l      k > l, the first l channels, then 0s
+//        k    l      k <= l, the first k channels
+//    Note that this is not _good_ surround etc. mixing at all! It's just so
+//    you get something useful.
+
+int stb_vorbis_get_samples_float_interleaved(stb_vorbis *f, int channels,
+                                             float *buffer, int num_floats);
+int stb_vorbis_get_samples_float(stb_vorbis *f, int channels, float **buffer,
+                                 int num_samples);
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. DOES NOT APPLY THE COERCION
+// RULES. Returns the number of samples stored per channel; it may be less than
+// requested at the end of the file. If there are no more samples in the file,
+// returns 0.
+
+int stb_vorbis_get_samples_short_interleaved(stb_vorbis *f, int channels,
+                                             short *buffer, int num_shorts);
+int stb_vorbis_get_samples_short(stb_vorbis *f, int channels, short **buffer,
+                                 int num_samples);
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. Applies the coercion rules above
+// to produce 'channels' channels. Returns the number of samples stored per
+// channel; it may be less than requested at the end of the file. If there are
+// no more samples in the file, returns 0.
+
+COSMOPOLITAN_C_END_
+#endif /* !(__ASSEMBLER__ + __LINKER__ + 0) */
+#endif /* COSMOPOLITAN_THIRD_PARTY_STB_STB_VORBIS_H_ */
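Reviewer sketch: the channel coercion table in stb_vorbis.h can be read as the small function below, applied to one time-step of samples. This only illustrates the table and is not the decoder's own code. `coerce_step` is a hypothetical name, and the M == 2 stereo downmix is deliberately left out because it depends on the Vorbis channel assignment, which the header does not spell out.

```c
/* Hypothetical helper, not part of stb_vorbis: coerce one time-step of
   n_have decoded float samples (N in the table) into n_want outputs (M).
   Returns 0 on success, or -1 for the stereo downmix case, which needs
   the Vorbis channel assignment and is not modeled in this sketch. */
static int coerce_step(const float *in, int n_have, float *out, int n_want) {
  int i;
  if (n_want == 2 && n_have != 2) return -1; /* stereo L/R mix not modeled */
  if (n_want == 1) {
    /* M = 1: sum(Ck) for all k */
    float sum = 0.0f;
    for (i = 0; i < n_have; ++i) sum += in[i];
    out[0] = sum;
    return 0;
  }
  for (i = 0; i < n_want; ++i) {
    /* k <= l: the first k channels; k > l: the first l channels, then 0s */
    out[i] = i < n_have ? in[i] : 0.0f;
  }
  return 0;
}
```

For example, three input channels coerced to four outputs yield the three inputs followed by one zero, and coerced to one output yield their sum.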
--- a/third_party/stb/ycbcr-sse2.S
+++ b/third_party/stb/ycbcr-sse2.S
@ -0,0 +1,94 @@
+/*-*- mode:asm; indent-tabs-mode:t; tab-width:8; coding:utf-8               -*-│
+│vi: set et ft=asm ts=8 tw=8 fenc=utf-8                                     :vi│
+╞══════════════════════════════════════════════════════════════════════════════╡
+│ Copyright 2020 Justine Alexandra Roberts Tunney                              │
+│                                                                              │
+│ This program is free software; you can redistribute it and/or modify         │
+│ it under the terms of the GNU General Public License as published by         │
+│ the Free Software Foundation; version 2 of the License.                      │
+│                                                                              │
+│ This program is distributed in the hope that it will be useful, but          │
+│ WITHOUT ANY WARRANTY; without even the implied warranty of                   │
+│ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU             │
+│ General Public License for more details.                                     │
+│                                                                              │
+│ You should have received a copy of the GNU General Public License            │
+│ along with this program; if not, write to the Free Software                  │
+│ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA                │
+│ 02110-1301 USA                                                               │
+╚─────────────────────────────────────────────────────────────────────────────*/
+#include "libc/macros.h"
+
+	.align	16
+stbi__YCbCr_to_RGB_row$sse2:
+	.leafprologue
+	.profilable
+	xor	%eax,%eax
+	cmp	$8,%r8d
+	jl	1f
+	xor	%eax,%eax
+	movdqa	2f(%rip),%xmm2
+	movdqa	3f(%rip),%xmm8
+	movdqa	4f(%rip),%xmm9
+	movdqa	5f(%rip),%xmm10
+	movdqa	6f(%rip),%xmm4
+	movdqa	7f(%rip),%xmm5
+	.align	16
+0:	movq	(%rsi,%rax),%xmm6
+	movq	(%rcx,%rax),%xmm7
+	movq	(%rdx,%rax),%xmm1
+	movdqa	%xmm2,%xmm0
+	punpcklbw %xmm6,%xmm0
+	pxor	%xmm2,%xmm7
+	pxor	%xmm6,%xmm6
+	punpcklbw %xmm7,%xmm6
+	pxor	%xmm2,%xmm1
+	pxor	%xmm3,%xmm3
+	punpcklbw %xmm1,%xmm3
+	psrlw	$4,%xmm0
+	movdqa	%xmm6,%xmm7
+	pmulhw	%xmm8,%xmm7
+	movdqa	%xmm3,%xmm1
+	pmulhw	%xmm9,%xmm1
+	pmulhw	%xmm10,%xmm3
+	pmulhw	%xmm4,%xmm6
+	paddw	%xmm1,%xmm6
+	paddw	%xmm0,%xmm7
+	paddw	%xmm0,%xmm3
+	paddw	%xmm0,%xmm6
+	psraw	$4,%xmm7
+	psraw	$4,%xmm3
+	packuswb %xmm3,%xmm7
+	psraw	$4,%xmm6
+	packuswb %xmm5,%xmm6
+	movdqa	%xmm7,%xmm0
+	punpcklbw %xmm6,%xmm0
+	punpckhbw %xmm6,%xmm7
+	movdqa	%xmm0,%xmm1
+	punpcklwd %xmm7,%xmm1
+	punpckhwd %xmm7,%xmm0
+	movdqu	%xmm1,(%rdi,%rax,4)
+	movdqu	%xmm0,16(%rdi,%rax,4)
+	add	$8,%rax
+	lea	7(%rax),%r9d
+	cmp	%r8d,%r9d
+	jl	0b
+1:	.leafepilogue
+	.endfn	stbi__YCbCr_to_RGB_row$sse2,globl
+
+	.rodata.cst16
+2:	.byte	128,128,128,128,128,128,128,128
+	.zero	8
+3:	.short	5743,5743,5743,5743,5743,5743,5743,5743
+4:	.short	64126,64126,64126,64126,64126,64126,64126,64126
+5:	.short	7258,7258,7258,7258,7258,7258,7258,7258
+6:	.short	62611,62611,62611,62611,62611,62611,62611,62611
+7:	.short	255,255,255,255,255,255,255,255
+
+	.end
+/	These should be better but need to get them to work
+3:	.short	11485,11485,11485,11485,11485,11485,11485,11485		# J′R m=13 99.964387%
+4:	.short	-11277,-11277,-11277,-11277,-11277,-11277,-11277,-11277	# J′G m=15 99.935941%
+5:	.short	14516,14516,14516,14516,14516,14516,14516,14516		# J′B m=13 99.947219%
+6:	.short	-23401,-23401,-23401,-23401,-23401,-23401,-23401,-23401	# J′G m=15 99.935941%
+7:	.short	255,255,255,255,255,255,255,255
--- a/third_party/stb/ycbcr.c
+++ b/third_party/stb/ycbcr.c
@ -0,0 +1,57 @@
+/*-*- mode:c;indent-tabs-mode:nil;c-basic-offset:2;tab-width:8;coding:utf-8 -*-│
+│vi: set net ft=c ts=2 sts=2 sw=2 fenc=utf-8                                :vi│
+╞══════════════════════════════════════════════════════════════════════════════╡
+│ Copyright 2020 Justine Alexandra Roberts Tunney                              │
+│                                                                              │
+│ This program is free software; you can redistribute it and/or modify         │
+│ it under the terms of the GNU General Public License as published by         │
+│ the Free Software Foundation; version 2 of the License.                      │
+│                                                                              │
+│ This program is distributed in the hope that it will be useful, but          │
+│ WITHOUT ANY WARRANTY; without even the implied warranty of                   │
+│ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU             │
+│ General Public License for more details.                                     │
+│                                                                              │
+│ You should have received a copy of the GNU General Public License            │
+│ along with this program; if not, write to the Free Software                  │
+│ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA                │
+│ 02110-1301 USA                                                               │
+╚─────────────────────────────────────────────────────────────────────────────*/
+#include "libc/limits.h"
+#include "libc/log/check.h"
+#include "libc/log/log.h"
+#include "libc/macros.h"
+#include "libc/str/str.h"
+#include "third_party/stb/internal.h"
+
+/* this is a reduced-precision calculation of YCbCr-to-RGB introduced
+   to make sure the code produces the same results in both SIMD and scalar */
+#define FLOAT2FIXED(x) (((int)((x)*4096.0f + 0.5f)) << 8)
+
+void stbi__YCbCr_to_RGB_row(unsigned char *out, const unsigned char *y,
+                            const unsigned char *pcb, const unsigned char *pcr,
+                            unsigned count, unsigned step) {
+  unsigned i;
+  unsigned char b4[4];
+  int y_fixed, r, g, b, cr, cb;
+  CHECK(step == 3 || step == 4);
+  CHECK_LE(count, INT_MAX / 4 - 4);
+  for (i = step == 4 ? stbi__YCbCr_to_RGB_row$sse2(out, y, pcb, pcr, count) : 0;
+       i < count; ++i) {
+    y_fixed = (y[i] << 20) + (1 << 19); /* rounding */
+    cr = pcr[i] - 128;
+    cb = pcb[i] - 128;
+    r = y_fixed + cr * FLOAT2FIXED(1.40200f);
+    g = y_fixed + (cr * -FLOAT2FIXED(0.71414f)) +
+        ((cb * -FLOAT2FIXED(0.34414f)) & 0xffff0000);
+    b = y_fixed + cb * FLOAT2FIXED(1.77200f);
+    r /= 1048576;
+    g /= 1048576;
+    b /= 1048576;
+    b4[0] = MIN(255, MAX(0, r));
+    b4[1] = MIN(255, MAX(0, g));
+    b4[2] = MIN(255, MAX(0, b));
+    b4[3] = 255;
+    memcpy(out + i * step, b4, step); /* never write past this pixel */
+  }
+}
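Reviewer sketch: the scalar loop in ycbcr.c does the following fixed-point math for each pixel, shown here for a single pixel so the rounding and clamping are easy to check by hand. `ycbcr_to_rgb_pixel` and `clamp255` are hypothetical names. The arithmetic is copied from the loop above, including the 0xffff0000 mask, and like the original it assumes a two's complement int of at least 32 bits.

```c
/* Same macro as ycbcr.c: 12-bit fixed point, shifted into a 20-bit scale. */
#define FLOAT2FIXED(x) (((int)((x)*4096.0f + 0.5f)) << 8)

/* Hypothetical helper standing in for the MIN/MAX clamp in ycbcr.c. */
static int clamp255(int v) {
  return v < 0 ? 0 : v > 255 ? 255 : v;
}

/* Hypothetical single-pixel version of the scalar loop body. */
static void ycbcr_to_rgb_pixel(unsigned char y, unsigned char cb,
                               unsigned char cr, unsigned char rgb[3]) {
  int y_fixed = (y << 20) + (1 << 19); /* luma in 12.20 format, plus 0.5 */
  int crc = cr - 128;                  /* center chroma around zero */
  int cbc = cb - 128;
  int r = y_fixed + crc * FLOAT2FIXED(1.40200f);
  int g = y_fixed + (crc * -FLOAT2FIXED(0.71414f)) +
          ((cbc * -FLOAT2FIXED(0.34414f)) & 0xffff0000);
  int b = y_fixed + cbc * FLOAT2FIXED(1.77200f);
  rgb[0] = clamp255(r / 1048576); /* drop the 20 fraction bits, then clamp */
  rgb[1] = clamp255(g / 1048576);
  rgb[2] = clamp255(b / 1048576);
}
```

With neutral chroma (cb == cr == 128) both chroma terms vanish, so every output channel equals the input luma. That makes a convenient sanity check when comparing the scalar path against the SSE2 routine.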