1
0
Fork 1
mirror of https://github.com/vbatts/tar-split.git synced 2024-11-25 01:05:40 +00:00
Commit graph

75 commits

Author SHA1 Message Date
Miloslav Trmač
99c8914877 Add tar/asm.IterateHeaders
This allows reading the metadata contained in tar-split
without expensively recreating the whole tar stream
including full contents.

We have two use cases for this:
- In a situation where tar-split is distributed along with
  a separate metadata stream, ensuring that the two are
  exactly consistent
- Reading the tar headers allows making a ~cheap check
  of consistency of on-disk layers, just checking that the
  files exist in expected sizes, without reading the full
  contents.

This can be implemented outside of this repo, but it's
not ideal:
- The function necessarily hard-codes some assumptions
  about how tar-split determines the boundaries of
  SegmentType/FileType entries (or, indeed, whether it
  uses FileType entries at all). That's best maintained
  directly beside the code that creates this.
- The ExpectedPadding() value is not currently exported,
  so the consumer would have to heuristically guess where
  the padding ends.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-09-11 20:01:49 +02:00
Miloslav Trmač
cd197d3076 Correctly handle Read returning (0, nil)
It's not an EOF indication.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-07-22 02:35:45 +02:00
b6372414e5
tar/asm: don't add a padding entry if it has no bytes
Fixes #65

if the read bytes is 0, then don't even create the entry for that
padding.
This sounds like the solution for the issue opened, but I haven't found
a reproducer for this issue yet. :-\

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-07-21 09:02:43 -04:00
cad1f451fd
tar/asm: troubleshooting padding EOF issue
Reference #65

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-07-21 09:02:29 -04:00
guoguangwu
919f9abf38 chore: remove refs to deprecated io/ioutil
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-07-20 23:00:46 +08:00
e4450847fb
tar/storage: remove TODO's on sailed shipped for changing the encoding
this function is used widely and it's JSON. And it was not written in
such a way as to have exchangable codec.. per se
So, maybe I'll just kick out the idea of using https://github.com/ugorji/go

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-03-26 14:10:16 -04:00
2b88967591
*.go: gomft -s -w
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-03-25 21:05:25 -04:00
516158dbfb
*.go: linting project specific code
the pointer to the pool may be useful, but holding on that until I get
benchmarks of memory use to show the benefit.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-03-25 20:45:23 -04:00
70fb294a9b
tar/asm: go vet fixes
on go1.19.7

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2023-03-25 20:38:36 -04:00
Miloslav Trmač
8d76363085 Avoid a 32 kB file allocation on every bitBucketFilePutter.Put
io.Copy usually allocates a 32kB buffer, and due to the large
number of files processed by tar-split, this shows up in Go profiles
as a very large alloc_space total.

It doesn't seem to actually be a measurable problem in any way,
but we can allocate the buffer only once per tar-split creation,
at no additional cost to existing allocations, so let's do so,
and remove the distraction.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2021-08-21 03:24:39 +02:00
Aleksa Sarai
99430a8454
tar: asm: add an excess padding test case
To ensure we don't have regressions in our padding fix, add a test case
that attempts to crash the test by creating 20GB of random junk padding.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-11-08 02:35:01 +11:00
Aleksa Sarai
3d9db48dbe
tar: asm: store padding in chunks to avoid memory exhaustion
Previously, we would read the entire padding in a given archive into
memory in order to store it in the packer. This would cause memory
exhaustion if a malicious archive was crafted with very large amounts of
padding. Since a given SegmentType is reconstructed losslessly, we can
simply chunk up any padding into large segments to avoid this problem.
Use a reasonable default of 1MiB to avoid changing the tar-split.json of
existing archives that are not malformed.

Fixes: CVE-2017-14992
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-11-08 02:34:56 +11:00
7410961e75 tar/asm: failing test for lack of EOF nils
Reported-by: Derek McGowan <derek@mcgstyle.net>
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2016-09-26 13:39:03 -07:00
0de4e9db0c Merge pull request #27 from vbatts/bench_asm
tar/asm: basic benchmark on disasm/asm of testdata
2015-12-02 14:09:21 -06:00
1501fe6002 Merge pull request #22 from tonistiigi/stream-opt
Optimize tar stream generation
2015-12-02 14:09:08 -06:00
19b7e22058 tar/asm: basic benchmark on disasm/asm of testdata
```
PASS
BenchmarkAsm-4         5         238968475 ns/op        66841059 B/op       2449 allocs/op
ok      _/home/vbatts/src/vb/tar-split/tar/asm  2.267s
```

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-12-02 14:36:02 -05:00
2efe34695a tar/asm: remove unneeded Tee
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-12-02 12:56:52 -05:00
Tonis Tiigi
23b6435e6b Optimize tar stream generation
- New writeTo method allows to avoid creating extra pipe.
- Copy with a pooled buffer instead of allocating new buffer for each file.
- Avoid extra object allocations inside the loop.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2015-12-01 14:08:53 -08:00
11281e8c09 tar/storage: adding Getter Putter benchmark
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-12-01 15:31:48 -05:00
Tonis Tiigi
8b20f9161d Optimize JSON decoding
This allows to avoid extra allocations on `ReadBytes` and
decoding buffers.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2015-11-30 09:52:44 -08:00
10250c25e0 tar/asm: remove useless test
The iso-8859-1 archive is already tested round trip, and this test did
not do anything really.
2015-09-25 14:35:12 -04:00
7e38cefd4b common: remove in favor of stdlib unicode/utf8 2015-09-25 14:33:24 -04:00
8a361ef0d8 tar/storage: Sprintf is unnecessary
fmt.Sprintf() vs string() for this []byte conversion is too much and
does not provide any further safety.

https://gist.github.com/vbatts/ab17181086aed558dd3a
2015-09-24 09:51:58 -04:00
cde639172f tar/asm: work with non-utf8 entry names 2015-09-23 15:27:33 -04:00
032efafc29 tar/storage: work with raw (invalid utf8) names
When the entry name is not UTF-8, for example ISO-8859-1, then store the
raw bytes.
To accommodate this, we will have getters and setters for the entry's
name now. Since this most heavily affects the json marshalling, we'll
double check the sanity of the name before storing it in the JSONPacker.
2015-09-23 15:27:33 -04:00
39d06b9dc4 tar/common: get index of first invalid utf-8 char 2015-09-23 15:27:15 -04:00
2865353200 common: add a UTF-8 check helper 2015-09-23 15:27:13 -04:00
c76e42010e tar/asm: additional GNU LongLink testcase
Adding a minimal test case for GNU @LongLink.
Tested that it fails on v0.9.5, but now passes on v0.9.6 and master.
2015-08-14 07:55:18 -04:00
8f81a50860 Merge pull request #10 from LK4D4/fix_pipe_close
asm: Remove unreachable code
2015-08-13 15:36:42 -04:00
e72b4959f9 Merge pull request #9 from LK4D4/fix_json_tags
storage: Fix syntax of json tags
2015-08-13 15:35:20 -04:00
Alexander Morozov
45399711c2 tar/storage: Replace TeeReader with MultiWriter
It uses slightly less memory and more understandable.
Benchmar results:

benchmark             old ns/op     new ns/op     delta
BenchmarkPutter-4     57272         52375         -8.55%

benchmark             old allocs     new allocs     delta
BenchmarkPutter-4     21             19             -9.52%
benchmark             old bytes     new bytes     delta
BenchmarkPutter-4     19416         13336         -31.31%

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-13 11:43:31 -07:00
Alexander Morozov
ea73dc6f6f tar/storage: Benchmark for bufferFileGetPutter.Put
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-13 11:42:14 -07:00
Alexander Morozov
93c0a320a8 asm: Remove unreachable code
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-12 22:45:39 -07:00
Alexander Morozov
b1783bc86d storage: Fix syntax of json tags
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-12 22:41:28 -07:00
Alexander Morozov
e6df23162e Remove redundant TeeReader
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-12 16:46:04 -07:00
df8572a1eb tar/asm: check length before adding an entry 2015-08-11 15:57:20 -04:00
51b0481d4a tar/asm: adding a failing test due to GNU LongLink 2015-08-11 15:57:20 -04:00
Jonathan Boulle
caf6a872c9 tar/storage: switch to map[string]struct{} for set
Using an empty struct is more idiomatic/efficient for representing a
set-like container.
2015-07-22 15:32:49 -04:00
Jonathan Boulle
002d19f0b0 *: clean up assorted spelling/grammar issues
Various minor fixes noticed on walking through
2015-07-22 15:32:49 -04:00
e0e9886972 tar/asm: return instead of break
5ddec2ae4a (commitcomment-12290378)

Reported-by: Tibor Vass <tibor@docker.com>
2015-07-22 11:32:18 -04:00
c2c2dde4cb tar/storage: use filepath instead of path 2015-07-22 10:27:53 -04:00
6d59e7bc76 tar/asm: clean up return on errors
This closure on error message needs returns so that the error message is
bubbled up to the reader.
2015-07-21 12:10:09 -04:00
c74af0bae7 tar/asm: test was flipped 2015-07-20 17:26:16 -04:00
04172717de tar/asm: test for failure when mangling 2015-07-20 16:46:22 -04:00
e33913bf75 tar/asm: don't defer file closing
this `for {}` can read many files. defering the file handle close can
cause an EMFILE (too many open files).
2015-07-15 13:43:48 -04:00
86ada47639 tar/asm: handle nil tar Header
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-06-23 12:23:36 -04:00
ae13eaae94 tar/asm: remove uneeded goroutine
Reported-by: Derek McGowan <derek@mcgstyle.net>
2015-06-21 14:14:37 -04:00
46840c585a *: golint and docs 2015-03-09 14:11:11 -04:00
f7b9a6caee tar/asm: comments 2015-03-09 13:56:45 -04:00
4ab9185a57 tar/asm: package docs 2015-03-09 13:54:06 -04:00