README.md: information on metadata size

2015-03-10 11:41:20 -04:00 · 2015-03-10 11:41:20 -04:00 · 61b11c52f8
commit 61b11c52f8
parent 402c6217ac
1 changed files with 50 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -113,6 +113,56 @@ ca9e19966b892d9ad5960414abac01ef585a1e22  tar-split.tar
 ca9e19966b892d9ad5960414abac01ef585a1e22  tar-split.tar.out
 ```

+
+Stored Metadata
+---------------
+
+Since the raw bytes of the headers and padding are stored, you may be wondering
+what the size implications are. The headers are at least 512 bytes per file
+(sometimes more), the archive ends with at least 1024 null bytes, and there is
+various padding on top of that. With a naive storage implementation, the stored
+metadata therefore grows linearly with the number of files.
+
+Reusing our prior example's `tar-split.tar`, let's build the `checksize.go` example:
+
+```
+go build ./checksize.go
+```
+
+```
+$ ./checksize ./tar-split.tar
+inspecting "tar-split.tar" (size 210k)
+ -- number of files: 50
+ -- size of metadata uncompressed: 53k
+ -- size of gzip compressed metadata: 3k
+```
+
+So, assuming you manage the extraction of the archive yourself and reuse the
+file payloads from a relative path, the only additional storage cost is the
+compressed metadata: as little as 3 kB here.
+
+But let's look at a larger archive, with many files.
+
+```
+$ ls -sh ./d.tar
+1.4G ./d.tar
+$ ./checksize ~/d.tar 
+inspecting "/home/vbatts/d.tar" (size 1420749k)
+ -- number of files: 38718
+ -- size of metadata uncompressed: 43261k
+ -- size of gzip compressed metadata: 2251k
+```
+
+Here, an archive with 38,718 files has a compressed metadata footprint of about 2.2 MB.
+
+Folding the null bytes at the end of the archive into the average, the larger
+archive gives a per-file rate (43261k / 38718 and 2251k / 38718 files) of:
+
+| uncompressed | compressed |
+| :----------: | :--------: |
+| ~ 1 kB per file | ~ 0.06 kB per file |
+
+
 What's Next?
 ------------