tarsum: updates for jamtur01 comments

Signed-off-by: Vincent Batts <vbatts@redhat.com>
Vincent Batts 2014-11-20 15:46:15 -05:00
parent 9a45c4235a
commit bd9c676bb7

@@ -1,5 +1,5 @@
page_title: TarSum checksum specification page_title: TarSum checksum specification
page_description: Documentation for algorithm used in the TarSum checksum calculation page_description: Documentation for algorithms used in the TarSum checksum calculation
page_keywords: docker, checksum, validation, tarsum page_keywords: docker, checksum, validation, tarsum
# TarSum Checksum Specification # TarSum Checksum Specification
@@ -13,22 +13,21 @@ methods, and the versioning of this calculation.
## Introduction ## Introduction
The transportation of file systems, regarding docker, is done with tar(1) The transportation of filesystems, regarding Docker, is done with tar(1)
archives. There are a variety of tar serialization formats [2], and a key archives. There are a variety of tar serialization formats [2], and a key
concern here is ensuring a repeatable checksum given a set of inputs from a concern here is ensuring a repeatable checksum given a set of inputs from a
generic tar archive. Types of transportation include distribution to and from a generic tar archive. Types of transportation include distribution to and from a
registry endpoint, saving and loading through commands or docker daemon APIs, registry endpoint, saving and loading through commands or Docker daemon APIs,
transferring the build context from client to docker daemon, and committing the transferring the build context from client to Docker daemon, and committing the
filesystem of a container to become an image. filesystem of a container to become an image.
As tar archives are used for transit, but not preserved in many situations, the As tar archives are used for transit, but not preserved in many situations, the
focus of the algorithm is to ensure the integrity of the preserved filesystem, focus of the algorithm is to ensure the integrity of the preserved filesystem,
while maintaining a deterministic accountability. This includes neither while maintaining a deterministic accountability. This includes neither
constrain the ordering or manipulation of the files during the creation or constraining the ordering or manipulation of the files during the creation or
unpacking of the archive, nor include additional metadata state about the file unpacking of the archive, nor include additional metadata state about the file
system attributes. system attributes.
## Intended Audience ## Intended Audience
This document is outlining the methods used for consistent checksum calculation This document is outlining the methods used for consistent checksum calculation
@@ -39,26 +38,23 @@ should accommodate the review of source code. Ultimately, this document should
be the starting point of further refinements to the algorithm and its future be the starting point of further refinements to the algorithm and its future
versions. versions.
## Concept ## Concept
The checksum mechanism must ensure the integrity and assurance of the The checksum mechanism must ensure the integrity and assurance of the
filesystem payload. filesystem payload.
## Checksum Algorithm Profile ## Checksum Algorithm Profile
A checksum mechanism must define the following operations and attributes: A checksum mechanism must define the following operations and attributes:
* associated hashing cipher - used to checksum each file payload and attribute * Associated hashing cipher - used to checksum each file payload and attribute
information. information.
* checksum list - each file of the file system archive has its checksum * Checksum list - each file of the filesystem archive has its checksum
calculated from the payload and attributes of the file. The final checksum is calculated from the payload and attributes of the file. The final checksum is
calculated from this list, with specific ordering. calculated from this list, with specific ordering.
* version - as the algorithm adapts to requirements, there are behaviors of the * Version - as the algorithm adapts to requirements, there are behaviors of the
algorithm to manage by versioning. algorithm to manage by versioning.
* archive being calculated - the tar archive having its checksum calculated * Archive being calculated - the tar archive having its checksum calculated
## Elements of TarSum checksum ## Elements of TarSum checksum
@@ -73,13 +69,14 @@ There are two delimiters used:
Example: Example:
```
"tarsum.v1+sha256:220a60ecd4a3c32c282622a625a54db9ba0ff55b5ba9c29c7064a2bc358b6a3e" "tarsum.v1+sha256:220a60ecd4a3c32c282622a625a54db9ba0ff55b5ba9c29c7064a2bc358b6a3e"
| | \ | | | \ |
| | \ | | | \ |
|_version_|_cipher__|__ | |_version_|_cipher__|__ |
| \ | | \ |
|_calculation_mechanics_|______________________expected_sum_______________________| |_calculation_mechanics_|______________________expected_sum_______________________|
```
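The layout in the diagram above can be illustrated with a small parser. This is a minimal Python sketch, not code from the Docker tree: `TarSumLabel` and `parse_tarsum` are hypothetical names, and it assumes that `+` separates the version from the cipher and `:` separates the calculation mechanics from the expected sum, as the example string suggests.

```python
from typing import NamedTuple


class TarSumLabel(NamedTuple):
    """Hypothetical container for the three elements of a TarSum checksum."""
    version: str  # e.g. "tarsum.v1"
    cipher: str   # e.g. "sha256"
    digest: str   # hexadecimal expected sum


def parse_tarsum(checksum: str) -> TarSumLabel:
    """Split a TarSum checksum string into version, cipher and digest."""
    # ":" separates the calculation mechanics from the expected sum.
    mechanics, sep, digest = checksum.partition(":")
    if not sep or not digest:
        raise ValueError(f"missing ':' delimiter or digest in {checksum!r}")
    # "+" separates the version from the hashing cipher.
    version, sep, cipher = mechanics.partition("+")
    if not sep or not version or not cipher:
        raise ValueError(f"missing '+' delimiter, version or cipher in {checksum!r}")
    return TarSumLabel(version, cipher, digest)
```

Calling `parse_tarsum` on the example string yields the version `tarsum.v1`, the cipher `sha256`, and the 64-character hexadecimal digest.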
## Versioning ## Versioning
@@ -92,51 +89,50 @@ The general algorithm will be describe further in the 'Calculation'.
This is the initial version of TarSum. This is the initial version of TarSum.
Its element in the checksum "tarsum" Its element in the TarSum checksum string is `tarsum`.
### Version1 ### Version1
Its element in the checksum "tarsum.v1" Its element in the TarSum checksum is `tarsum.v1`.
The notable changes in this version: The notable changes in this version:
* exclusion of file mtime from the file information headers, in each file * Exclusion of file `mtime` from the file information headers, in each file
checksum calculation checksum calculation
* inclusion of extended attributes (xattrs. Also seen as "SCHILY.xattr." prefixed Pax * Inclusion of extended attributes (`xattrs`. Also seen as `SCHILY.xattr.` prefixed Pax
tar file info headers) keys and values in each file checksum calculation tar file info headers) keys and values in each file checksum calculation
### VersionDev ### VersionDev
*Do not use unless validating refinements to the checksum algorithm* *Do not use unless validating refinements to the checksum algorithm*
Its element in the checksum "tarsum.dev" Its element in the TarSum checksum is `tarsum.dev`.
This is a floating place holder for a next version. The methods used for This is a floating place holder for a next version and grounds for testing
calculation are subject to change without notice. changes. The methods used for calculation are subject to change without notice,
and this version is for testing and not for production use.
## Ciphers ## Ciphers
The official default and standard hashing cipher used in the calculation mechanic The official default and standard hashing cipher used in the calculation mechanic
is "sha256". This refers to SHA256 hash algorithm as defined in FIPS 180-4. is `sha256`. This refers to SHA256 hash algorithm as defined in FIPS 180-4.
Though the algorithm itself is not exclusively bound to this single hashing Though the TarSum algorithm itself is not exclusively bound to the single
cipher, and support for alternate hashing ciphers was later added [1]. Presently hashing cipher `sha256`, support for alternate hashing ciphers was later added
use of this is for isolated use-cases and future-proofing the TarSum checksum [1]. Use cases for alternate cipher could include future-proofing TarSum
format. checksum format and using faster cipher hashes for tar filesystem checksums.
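As an illustration of how a cipher element might be resolved to a hash implementation, here is a minimal Python sketch. The helper name `new_hash` is hypothetical, and the sketch assumes that the cipher label in a TarSum string matches an algorithm name known to Python's `hashlib`. That holds for `sha256`, but this specification does not guarantee it for other ciphers.

```python
import hashlib


def new_hash(cipher: str = "sha256"):
    """Return a fresh hash object for a cipher label (assumed to be a hashlib name)."""
    if cipher not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hashing cipher: {cipher!r}")
    return hashlib.new(cipher)
```

Every file checksum, and the final checksum, would start from a fresh object returned by a helper like this one.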
## Calculation ## Calculation
### Requirement ### Requirement
As mentioned earlier, the calculation is such that it takes into consideration As mentioned earlier, the calculation is such that it takes into consideration
the life and cycle of the tar archive. In that the tar archive is not an the lifecycle of the tar archive. In that the tar archive is not an immutable,
immutable, permanent artifact. Otherwise options like relying on a known hashing permanent artifact. Otherwise options like relying on a known hashing cipher
cipher checksum of the archive itself would be reliable enough. Since the tar checksum of the archive itself would be reliable enough. The tar archive of the
archive is used as a transportation medium, and is thrown away after its filesystem is used as a transportation medium for Docker images, and the
contents are extracted. Therefore, for consistent validation items such as archive is discarded once its contents are extracted. Therefore, for consistent
order of files in the tar archive and time stamps are subject to change once an validation items such as order of files in the tar archive and time stamps are
image is received. subject to change once an image is received.
### Process ### Process
@@ -175,7 +171,6 @@ For >= Version1, the extented attribute headers ("SCHILY.xattr." prefixed pax
headers) included after the above list. These xattrs key/values are first headers) included after the above list. These xattrs key/values are first
sorted by the keys. sorted by the keys.
#### Header Format #### Header Format
The ordered headers are written to the hash in the format of The ordered headers are written to the hash in the format of
@@ -184,13 +179,11 @@ The ordered headers are written to the hash in the format of
with no newline. with no newline.
#### Body #### Body
After the order headers of the file have been added to the checksum for the After the order headers of the file have been added to the checksum for the
file, the body of the file is written to the hash. file, the body of the file is written to the hash.
#### List of file sums #### List of file sums
The list of file sums is sorted by the string of the hexadecimal digest. The list of file sums is sorted by the string of the hexadecimal digest.
@@ -199,7 +192,6 @@ If there are two files in the tar with matching paths, the order of occurrence
for that path is reflected for the sums of the corresponding file header and for that path is reflected for the sums of the corresponding file header and
body. body.
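The ordering described above could be sketched as follows. This is one possible reading rather than the reference implementation: `FileSum` and `sort_file_sums` are hypothetical names, and using the position in the archive as a tie-breaker for equal digests is an assumption about how the order of occurrence is reflected.

```python
from typing import List, NamedTuple


class FileSum(NamedTuple):
    """Hypothetical record for one archive entry."""
    path: str      # path of the entry inside the tar archive
    position: int  # order of occurrence in the archive, starting at 0
    digest: str    # hexadecimal digest of the entry's headers and body


def sort_file_sums(sums: List[FileSum]) -> List[FileSum]:
    """Sort by hexadecimal digest string; equal digests keep archive order (assumed)."""
    return sorted(sums, key=lambda s: (s.digest, s.position))
```

Because the sort key depends only on the digest and the position, the result does not depend on the order in which the sums were collected.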
#### Final Checksum #### Final Checksum
Begin with a fresh or initial state of the associated hash cipher. If there is Begin with a fresh or initial state of the associated hash cipher. If there is
@@ -211,7 +203,6 @@ The resulting digest is formatted per the Elements of TarSum checksum,
including the TarSum version, the associated hash cipher and the hexadecimal including the TarSum version, the associated hash cipher and the hexadecimal
encoded checksum digest. encoded checksum digest.
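A minimal Python sketch of this final step, under stated assumptions: the name `final_tarsum` is hypothetical, each file sum is assumed to be written to the hash as its hexadecimal string with no separator, and the step that begins "If there is" is cut off in this excerpt, so the sketch does not model it.

```python
import hashlib
from typing import Iterable


def final_tarsum(file_digests: Iterable[str],
                 version: str = "tarsum.v1",
                 cipher: str = "sha256") -> str:
    """Combine per-file hexadecimal digests into a TarSum checksum string."""
    h = hashlib.new(cipher)  # fresh state of the associated hash cipher
    # The list of file sums is sorted by the hexadecimal digest string.
    for digest in sorted(file_digests):
        # Assumption: the hex string itself is written, without a separator.
        h.update(digest.encode("ascii"))
    # Format per the Elements of TarSum checksum: version+cipher:digest.
    return f"{version}+{cipher}:{h.hexdigest()}"
```

Because the digests are sorted before hashing, the result is the same whatever order the files appear in within the tar archive, which is the repeatability property the specification aims for.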
## Security Considerations ## Security Considerations
The initial version of TarSum has undergone one update that could invalidate The initial version of TarSum has undergone one update that could invalidate
@@ -220,7 +211,6 @@ with same names as prior files in the archive. The latter file will clobber the
prior file of the same path. Due to this the algorithm now accounts for files prior file of the same path. Due to this the algorithm now accounts for files
with matching paths, and orders the list of file sums accordingly [3]. with matching paths, and orders the list of file sums accordingly [3].
## Footnotes ## Footnotes
* [0] Versioning https://github.com/docker/docker/commit/747f89cd327db9d50251b17797c4d825162226d0 * [0] Versioning https://github.com/docker/docker/commit/747f89cd327db9d50251b17797c4d825162226d0