99c8914877
This allows reading the metadata contained in tar-split without expensively recreating the whole tar stream including full contents.

We have two use cases for this:
- In a situation where tar-split is distributed along with a separate metadata stream, ensuring that the two are exactly consistent
- Reading the tar headers allows making a ~cheap check of consistency of on-disk layers, just checking that the files exist in expected sizes, without reading the full contents.

This can be implemented outside of this repo, but it's not ideal:
- The function necessarily hard-codes some assumptions about how tar-split determines the boundaries of SegmentType/FileType entries (or, indeed, whether it uses FileType entries at all). That's best maintained directly beside the code that creates this.
- The ExpectedPadding() value is not currently exported, so the consumer would have to heuristically guess where the padding ends.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
asm
This library provides assembly and disassembly of tar archives, facilitated by github.com/vbatts/tar-split/tar/storage.
Concerns
For completely safe assembly/disassembly, there will need to be a Content Addressable Storage (CAS) directory that maps to a checksum in the storage.Entry of storage.FileType.
This is because a tar archive can contain multiple records for the same path, and the last one effectively wins on extraction, even if the prior records had a different payload.
Consequently, when assembling an archive from relative paths on disk, every entry for a duplicated path is read from the same file, so all of their payloads come out identical and any differing earlier payload is lost.
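The effect can be reproduced with the standard library alone. The following is a minimal sketch, not part of this package: `buildArchive` and `extract` are hypothetical helpers, and `extract` stands in for unpacking to disk by using a map keyed on path.

```go
package main

import (
	"archive/tar"
	"bytes"
	"io"
)

// buildArchive writes one regular-file record per payload, all under the
// same path, as happens when records are appended to an archive.
func buildArchive(name string, payloads ...string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, p := range payloads {
		hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(p)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(p)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// extract simulates unpacking: later records overwrite earlier ones, so
// only the last payload for each path survives. It also returns the
// number of records seen in the stream.
func extract(archive []byte) (map[string]string, int, error) {
	files := map[string]string{}
	records := 0
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, records, nil
		}
		if err != nil {
			return nil, 0, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, 0, err
		}
		files[hdr.Name] = string(data)
		records++
	}
}
```

Building an archive with `buildArchive("etc/config", "old", "new")` and passing it to `extract` yields two records in the stream but a single surviving file, so the payload of the first record can no longer be recovered from the extracted tree.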
Thoughts
Have a look-aside directory or storage. When a clobbering record is encountered in the tar stream, the payload of the prior/existing file is first stored to the CAS. The clobbering record's file payload can then be extracted, while the payload needed to reassemble a precise tar archive is preserved.
clobbered/path/to/file.[0-N]
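One way the naming scheme above could be tracked is sketched below. Nothing here exists in the package: the `lookAside` type, its `record` method, and the `clobbered` root are hypothetical names chosen for illustration, and a real implementation would also need to sanitize paths containing `..` before using them on disk.

```go
package main

import (
	"path"
	"strconv"
)

// lookAside tracks how many times each path has been seen in a tar
// stream (hypothetical type, for illustration only).
type lookAside struct {
	root string
	seen map[string]int
}

func newLookAside(root string) *lookAside {
	return &lookAside{root: root, seen: map[string]int{}}
}

// record notes one occurrence of name. For the first occurrence it
// returns ("", false). For every later occurrence it returns the
// look-aside location where the payload about to be clobbered should be
// saved: <root>/<name>.0, <root>/<name>.1, and so on.
func (l *lookAside) record(name string) (string, bool) {
	name = path.Clean(name)
	n := l.seen[name]
	l.seen[name] = n + 1
	if n == 0 {
		return "", false
	}
	return path.Join(l.root, name+"."+strconv.Itoa(n-1)), true
}
```

With `newLookAside("clobbered")`, the first call to `record("path/to/file")` reports no clobbering, the second returns `clobbered/path/to/file.0`, and the third returns `clobbered/path/to/file.1`.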
alternatively
We could just not support tar streams that have clobbering file paths. Appending records to the archive is not incredibly common, and doesn't happen by default for most implementations. Not supporting them wouldn't be a security concern either: if it did occur, we would reassemble an archive whose signature/checksum doesn't validate, so it shouldn't be trusted anyway.
Otherwise, this will allow us to defer support for appended files as a FUTURE FEATURE.