tarsum: updates for jamtur01 comments

Signed-off-by: Vincent Batts <vbatts@redhat.com>

parent 9a45c4235a
commit bd9c676bb7

1 changed file with 36 additions and 46 deletions

@@ -1,5 +1,5 @@
 page_title: TarSum checksum specification
-page_description: Documentation for algorithm used in the TarSum checksum calculation
+page_description: Documentation for algorithms used in the TarSum checksum calculation
 page_keywords: docker, checksum, validation, tarsum
 
 # TarSum Checksum Specification
@@ -7,58 +7,54 @@ page_keywords: docker, checksum, validation, tarsum
 ## Abstract
 
 This document describes the algorithms used in performing the TarSum checksum
-calculation on file system layers, the need for this method over existing
+calculation on filesystem layers, the need for this method over existing
 methods, and the versioning of this calculation.
 
 
 ## Introduction
 
-The transportation of file systems, regarding docker, is done with tar(1)
+The transportation of filesystems, regarding Docker, is done with tar(1)
 archives. There are a variety of tar serialization formats [2], and a key
 concern here is ensuring a repeatable checksum given a set of inputs from a
 generic tar archive. Types of transportation include distribution to and from a
-registry endpoint, saving and loading through commands or docker daemon APIs,
-transferring the build context from client to docker daemon, and committing the
-file system of a container to become an image.
+registry endpoint, saving and loading through commands or Docker daemon APIs,
+transferring the build context from client to Docker daemon, and committing the
+filesystem of a container to become an image.
 
 As tar archives are used for transit, but not preserved in many situations, the
-focus of the algorithm is to ensure the integrity of the preserved file system,
+focus of the algorithm is to ensure the integrity of the preserved filesystem,
 while maintaining a deterministic accountability. This includes neither
-constrain the ordering or manipulation of the files during the creation or
+constraining the ordering or manipulation of the files during the creation or
 unpacking of the archive, nor include additional metadata state about the file
 system attributes.
 
-
 ## Intended Audience
 
 This document is outlining the methods used for consistent checksum calculation
-for file systems transported via tar archives.
+for filesystems transported via tar archives.
 
 Auditing these methodologies is an open and iterative process. This document
 should accommodate the review of source code. Ultimately, this document should
 be the starting point of further refinements to the algorithm and its future
 versions.
 
-
 ## Concept
 
 The checksum mechanism must ensure the integrity and assurance of the
-file system payload.
+filesystem payload.
 
-
 ## Checksum Algorithm Profile
 
 A checksum mechanism must define the following operations and attributes:
 
-* associated hashing cipher - used to checksum each file payload and attribute
+* Associated hashing cipher - used to checksum each file payload and attribute
 information.
-* checksum list - each file of the file system archive has its checksum
+* Checksum list - each file of the filesystem archive has its checksum
 calculated from the payload and attributes of the file. The final checksum is
 calculated from this list, with specific ordering.
-* version - as the algorithm adapts to requirements, there are behaviors of the
+* Version - as the algorithm adapts to requirements, there are behaviors of the
 algorithm to manage by versioning.
-* archive being calculated - the tar archive having its checksum calculated
-
+* Archive being calculated - the tar archive having its checksum calculated
 
 ## Elements of TarSum checksum
 
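
The attributes listed under "Checksum Algorithm Profile" map naturally onto a small record type. The Python sketch below is illustrative only. `ChecksumProfile`, `new_hash` and `add_file` are hypothetical names and are not part of Docker's tarsum package. `add_file` assumes the caller has already serialized a file's attributes and payload into one byte string. The archive being calculated is left out of the record.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChecksumProfile:
    """Hypothetical record of the attributes a checksum mechanism defines."""

    version: str = "tarsum.v1"  # version element of the checksum string
    cipher: str = "sha256"  # associated hashing cipher, as a hashlib name
    file_sums: list[str] = field(default_factory=list)  # checksum list (hex digests)

    def new_hash(self):
        # Fresh hash object for the associated cipher.
        return hashlib.new(self.cipher)

    def add_file(self, serialized: bytes) -> str:
        # Hash one file's serialized attributes and payload, then record the sum.
        h = self.new_hash()
        h.update(serialized)
        digest = h.hexdigest()
        self.file_sums.append(digest)
        return digest
```

Keeping the cipher as a `hashlib` name lets the same record be pointed at an alternate cipher without touching the rest of it.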
|
@@ -73,13 +69,14 @@ There are two delimiters used:
 
 Example:
 
+```
 "tarsum.v1+sha256:220a60ecd4a3c32c282622a625a54db9ba0ff55b5ba9c29c7064a2bc358b6a3e"
 |         |       \                                                               |
 |         |        \                                                              |
 |_version_|_cipher__|__                                                           |
 |                      \                                                          |
 |_calculation_mechanics_|______________________expected_sum_______________________|
-
+```
 
 ## Versioning
 
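
The example string can be split on the two delimiters visible in the diagram. `:` separates the calculation mechanics from the expected sum, and `+` separates the version from the cipher. Here is a minimal Python sketch. It assumes the version element never contains `+` or `:`, which holds for `tarsum`, `tarsum.v1` and `tarsum.dev`. `parse_tarsum` and `TarSumString` are hypothetical names.

```python
import string
from typing import NamedTuple


class TarSumString(NamedTuple):
    version: str       # e.g. "tarsum.v1"
    cipher: str        # e.g. "sha256"
    expected_sum: str  # hexadecimal digest


def parse_tarsum(value: str) -> TarSumString:
    """Split "<version>+<cipher>:<hex digest>" into its three elements."""
    # ':' separates the calculation mechanics from the expected sum.
    mechanics, sep, expected = value.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {value!r}")
    # '+' separates the TarSum version from the hashing cipher.
    version, sep, cipher = mechanics.partition("+")
    if not sep or not version or not cipher:
        raise ValueError(f"missing version or cipher in {value!r}")
    if not expected or any(c not in string.hexdigits for c in expected):
        raise ValueError(f"expected sum is not hexadecimal: {expected!r}")
    return TarSumString(version, cipher, expected)
```

Applied to the example above, this yields version `tarsum.v1`, cipher `sha256` and the 64-digit hexadecimal sum.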
|
@@ -92,51 +89,50 @@ The general algorithm will be describe further in the 'Calculation'.
 
 This is the initial version of TarSum.
 
-Its element in the checksum "tarsum"
+Its element in the TarSum checksum string is `tarsum`.
 
-
 ### Version1
 
-Its element in the checksum "tarsum.v1"
+Its element in the TarSum checksum is `tarsum.v1`.
 
 The notable changes in this version:
-* exclusion of file mtime from the file information headers, in each file
+* Exclusion of file `mtime` from the file information headers, in each file
 checksum calculation
-* inclusion of extended attributes (xattrs. Also seen as "SCHILY.xattr." prefixed Pax
+* Inclusion of extended attributes (`xattrs`. Also seen as `SCHILY.xattr.` prefixed Pax
 tar file info headers) keys and values in each file checksum calculation
 
 ### VersionDev
 
 *Do not use unless validating refinements to the checksum algorithm*
 
-Its element in the checksum "tarsum.dev"
+Its element in the TarSum checksum is `tarsum.dev`.
 
-This is a floating place holder for a next version. The methods used for
-calculation are subject to change without notice.
+This is a floating place holder for a next version and grounds for testing
+changes. The methods used for calculation are subject to change without notice,
+and this version is for testing and not for production use.
 
 ## Ciphers
 
 The official default and standard hashing cipher used in the calculation mechanic
-is "sha256". This refers to SHA256 hash algorithm as defined in FIPS 180-4.
+is `sha256`. This refers to SHA256 hash algorithm as defined in FIPS 180-4.
 
-Though the algorithm itself is not exclusively bound to this single hashing
-cipher, and support for alternate hashing ciphers was later added [1]. Presently
-use of this is for isolated use-cases and future-proofing the TarSum checksum
-format.
+Though the TarSum algorithm itself is not exclusively bound to the single
+hashing cipher `sha256`, support for alternate hashing ciphers was later added
+[1]. Use cases for alternate cipher could include future-proofing TarSum
+checksum format and using faster cipher hashes for tar filesystem checksums.
 
 ## Calculation
 
 ### Requirement
 
 As mentioned earlier, the calculation is such that it takes into consideration
-the life and cycle of the tar archive. In that the tar archive is not an
-immutable, permanent artifact. Otherwise options like relying on a known hashing
-cipher checksum of the archive itself would be reliable enough. Since the tar
-archive is used as a transportation medium, and is thrown away after its
-contents are extracted. Therefore, for consistent validation items such as
-order of files in the tar archive and time stamps are subject to change once an
-image is received.
-
+the lifecycle of the tar archive. In that the tar archive is not an immutable,
+permanent artifact. Otherwise options like relying on a known hashing cipher
+checksum of the archive itself would be reliable enough. The tar archive of the
+filesystem is used as a transportation medium for Docker images, and the
+archive is discarded once its contents are extracted. Therefore, for consistent
+validation items such as order of files in the tar archive and time stamps are
+subject to change once an image is received.
 
 ### Process
 
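
The version elements and the default cipher described under "Versioning" and "Ciphers" fit naturally in a lookup table. This Python sketch is not Docker's implementation. `new_cipher` and its `allow_dev` flag are hypothetical. The sketch also assumes that any name `hashlib.new` understands is acceptable as an alternate cipher, because the list of ciphers added in [1] is not reproduced here.

```python
import hashlib

# Version element -> description, as listed under "Versioning".
VERSIONS = {
    "tarsum": "initial version",
    "tarsum.v1": "Version1",
    "tarsum.dev": "VersionDev",  # floating placeholder, testing only
}

DEFAULT_CIPHER = "sha256"  # SHA256 as defined in FIPS 180-4


def new_cipher(version: str, cipher: str = DEFAULT_CIPHER, allow_dev: bool = False):
    """Return a fresh hash object for a TarSum calculation (hypothetical helper)."""
    if version not in VERSIONS:
        raise ValueError(f"unknown TarSum version element {version!r}")
    if version == "tarsum.dev" and not allow_dev:
        # The text says not to use tarsum.dev outside of validating refinements.
        raise ValueError("tarsum.dev is for testing only")
    # Assumption: any algorithm hashlib knows is acceptable as an alternate cipher.
    # hashlib.new raises ValueError for names it does not support.
    return hashlib.new(cipher)
```

Requiring an explicit opt-in for `tarsum.dev` mirrors the "Do not use unless validating refinements" note.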
|
@@ -175,7 +171,6 @@ For >= Version1, the extented attribute headers ("SCHILY.xattr." prefixed pax
 headers) included after the above list. These xattrs key/values are first
 sorted by the keys.
 
-
 #### Header Format
 
 The ordered headers are written to the hash in the format of
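
For Version1 and later, the text describes two adjustments to a file's headers. `mtime` is dropped, and the `SCHILY.xattr.`-prefixed pax headers are appended after the base list, sorted by key. The Python sketch below makes three assumptions:

* The caller supplies the base header list in its specified order, since that list lies outside this diff.
* `tarsum.dev` behaves like Version1.
* Each xattr key is hashed without the `SCHILY.xattr.` prefix, matching how Go's `archive/tar` exposes these records in `Header.Xattrs`. Check this against the reference code.

`ordered_headers` is a hypothetical name.

```python
XATTR_PREFIX = "SCHILY.xattr."


def ordered_headers(base, pax_headers, version):
    """Return the ordered (key, value) header pairs hashed for one file.

    base: (key, value) pairs from the tar header, already in the specified
          order (that list is an input here; it is not part of this diff).
    pax_headers: dict of the file's pax records.
    version: "tarsum", "tarsum.v1" or "tarsum.dev".
    """
    if version == "tarsum":
        # Initial version: headers as given, mtime included, no xattrs.
        return list(base)
    # Version1 (and, by assumption, tarsum.dev): drop mtime ...
    headers = [(k, v) for k, v in base if k != "mtime"]
    # ... then append the xattrs sorted by key. Assumption: the key is hashed
    # without the SCHILY.xattr. prefix, as in Go's archive/tar Header.Xattrs.
    xattrs = [
        (k[len(XATTR_PREFIX):], v)
        for k, v in pax_headers.items()
        if k.startswith(XATTR_PREFIX)
    ]
    headers.extend(sorted(xattrs, key=lambda kv: kv[0]))
    return headers
```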
|
@@ -184,13 +179,11 @@ The ordered headers are written to the hash in the format of
 
 with no newline.
 
-
 #### Body
 
 After the order headers of the file have been added to the checksum for the
 file, the body of the file is written to the hash.
 
-
 #### List of file sums
 
 The list of file sums is sorted by the string of the hexadecimal digest.
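
For each file, the ordered headers go into the hash first and the body follows. The per-file sums are then sorted as hexadecimal strings. In this Python sketch the header serialization is an assumption: key immediately followed by value, UTF-8 encoded, with no newline. The exact format line is not included in this diff. `file_sum` and `sorted_sums` are hypothetical names.

```python
import hashlib


def file_sum(headers, body: bytes, cipher: str = "sha256") -> str:
    """Hex checksum of one file: its ordered headers first, then its body."""
    h = hashlib.new(cipher)
    for key, value in headers:
        # Assumed serialization: key immediately followed by value, no newline.
        h.update((key + value).encode("utf-8"))
    # The body is written after all of the file's headers.
    h.update(body)
    return h.hexdigest()


def sorted_sums(sums):
    """Sort per-file sums by their hexadecimal digest strings."""
    return sorted(sums)
```

Plain string sorting is enough here because `hexdigest()` always produces lowercase digits of equal length for a given cipher.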
|
@@ -199,7 +192,6 @@ If there are two files in the tar with matching paths, the order of occurrence
 for that path is reflected for the sums of the corresponding file header and
 body.
 
-
 #### Final Checksum
 
 Begin with a fresh or initial state of the associated hash cipher. If there is
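
The ordering rule for matching paths and the start of the final checksum can be sketched in Python as follows.

* `order_file_sums` sorts by hex digest, then refills the slots held by a repeated path so that its entries appear in order of occurrence. This is one way to satisfy the rule; the reference implementation may order such ties differently.
* `final_digest` assumes each file sum is written as its hexadecimal string. It also assumes there is no extra input before the sums, since the sentence beginning "If there is" is cut off here.

Both function names are hypothetical.

```python
import hashlib


def order_file_sums(entries):
    """Order (path, hex_sum) pairs, given in archive order, for the final hash.

    Sums are sorted by hex digest; entries sharing a path then take the slots
    their path received, filled in order of occurrence in the archive.
    """
    by_digest = sorted(range(len(entries)), key=lambda i: entries[i][1])
    slots = {}        # path -> positions it received in digest order
    occurrences = {}  # path -> archive indexes, in order of occurrence
    for slot, idx in enumerate(by_digest):
        slots.setdefault(entries[idx][0], []).append(slot)
    for idx, (path, _) in enumerate(entries):
        occurrences.setdefault(path, []).append(idx)
    ordered = [""] * len(entries)
    for path, path_slots in slots.items():
        for slot, idx in zip(path_slots, occurrences[path]):
            ordered[slot] = entries[idx][1]
    return ordered


def final_digest(ordered_sums, cipher="sha256"):
    """Fresh hash state, then each ordered file sum written in turn."""
    h = hashlib.new(cipher)
    for s in ordered_sums:
        # Assumption: each sum is written as its hexadecimal string.
        h.update(s.encode("ascii"))
    return h.hexdigest()
```

With entries `("a", "ff")`, `("b", "11")`, `("a", "00")`, the two `a` sums land in the first and last slots but keep their archive order, giving `["ff", "11", "00"]`.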
|
@@ -211,7 +203,6 @@ The resulting digest is formatted per the Elements of TarSum checksum,
 including the TarSum version, the associated hash cipher and the hexadecimal
 encoded checksum digest.
 
-
 ## Security Considerations
 
 The initial version of TarSum has undergone one update that could invalidate
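
Formatting the resulting digest per the Elements of TarSum checksum is a plain concatenation: the version, `+`, the cipher name, `:`, and the hex-encoded digest. `format_tarsum` is a hypothetical helper in this Python sketch.

```python
import hashlib


def format_tarsum(version: str, cipher: str, digest: bytes) -> str:
    """Format a final digest as "<version>+<cipher>:<hex digest>"."""
    return f"{version}+{cipher}:{digest.hex()}"


# Example: the SHA256 digest of an empty input, labelled as Version1.
example = format_tarsum("tarsum.v1", "sha256", hashlib.sha256(b"").digest())
```

`bytes.hex()` emits lowercase digits, matching the form of the example checksum string.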
|
@@ -220,7 +211,6 @@ with same names as prior files in the archive. The latter file will clobber the
 prior file of the same path. Due to this the algorithm now accounts for files
 with matching paths, and orders the list of file sums accordingly [3].
 
-
 ## Footnotes
 
 * [0] Versioning https://github.com/docker/docker/commit/747f89cd327db9d50251b17797c4d825162226d0
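
The name collision described under Security Considerations is easy to reproduce with Python's standard `tarfile` module. A tar archive may hold two members with the same name. `TarFile.getmember`, which `extractfile` uses when given a name, treats the last occurrence as the current version. `build_archive` and `duplicate_paths` are hypothetical helpers for illustration.

```python
import io
import tarfile


def build_archive(members):
    """Write (name, data) pairs, in order, into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


def duplicate_paths(fileobj):
    """Return member names that occur more than once, in order of first repeat."""
    seen, dups = set(), []
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar.getmembers():
            if member.name in seen and member.name not in dups:
                dups.append(member.name)
            seen.add(member.name)
    return dups
```

Extracting such an archive leaves only the later `a.txt` on disk. Per the text above, TarSum keeps both entries and orders their sums accordingly rather than rejecting the archive.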
|
|