This allows reading the metadata contained in tar-split
without expensively recreating the whole tar stream
including full contents.
We have two use cases for this:
- In a situation where tar-split is distributed along with
a separate metadata stream, ensuring that the two are
exactly consistent
- Reading the tar headers allows making a ~cheap check
of consistency of on-disk layers, just checking that the
files exist in expected sizes, without reading the full
contents.
This can be implemented outside of this repo, but it's
not ideal:
- The function necessarily hard-codes some assumptions
about how tar-split determines the boundaries of
SegmentType/FileType entries (or, indeed, whether it
uses FileType entries at all). That's best maintained
directly beside the code that creates this.
- The ExpectedPadding() value is not currently exported,
so the consumer would have to heuristically guess where
the padding ends.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Fixes#65
if the read bytes is 0, then don't even create the entry for that
padding.
This sounds like the solution for the issue opened, but I haven't found
a reproducer for this issue yet. :-\
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
this function is used widely and it's JSON. And it was not written in
such a way as to have exchangable codec.. per se
So, maybe I'll just kick out the idea of using https://github.com/ugorji/go
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
I intend to not make changes to this `archive/tar` that aren't from
upstream, or are not directly related to the usage by this project...
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
the pointer to the pool may be useful, but holding on that until I get
benchmarks of memory use to show the benefit.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
io.Copy usually allocates a 32kB buffer, and due to the large
number of files processed by tar-split, this shows up in Go profiles
as a very large alloc_space total.
It doesn't seem to actually be a measurable problem in any way,
but we can allocate the buffer only once per tar-split creation,
at no additional cost to existing allocations, so let's do so,
and remove the distraction.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
There is a discrepancy of behavior of `github.com/urfave/cli` between
using go1.12 and go1.15, when the dependency is not present as vendored
source. Now this builds fine with go1.12
There are users of tar-split as a package. It is the hope that by adding
this vendored source it does not impact them depending on tar-split
itself.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
The Go implementation of gzip is the only known to produce compressed
layers with the expected digest hashes.
This change allows compressed tar layer files to be produced, which is
useful for exporting layers from non-Go tools.
Now when golang 1.11 is out, 1.9 and older versions are no longer
supported. More to say, since the archive/tar is from go-1.11, it
uses some features from new Go versions (strings.Builder and sync.Map)
not supported by anything older than Go 1.10.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a port of commits adding RawHeader() to go-1.11 archive/tar.
In addition:
* simplify the rawBytes.Write() code in readHeader()
* ignore errors from rawBytes.Write(), as (at least for go-1.11)
it never returns an error, only panics (if the buffer grew too large)
Also, remove the internal/testenv from tar_tar.go to enable go test.
As working symlink detection is non-trivial on Windows, just skip
the test on that platform.
In addition to `go test`, I did some minimal manual testing, and
it seems this code creates tar-data.json.gz which is identical
to the one made by the old version.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To ensure we don't have regressions in our padding fix, add a test case
that attempts to crash the test by creating 20GB of random junk padding.
Signed-off-by: Aleksa Sarai <asarai@suse.de>