# OCI Content Descriptors

An OCI image consists of several different components, arranged in a [Merkle Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Merkle_tree).
References between components in the graph are expressed through _Content Descriptors_.
A Content Descriptor (or simply _Descriptor_) describes the disposition of the targeted content.
A Content Descriptor includes the type of the content, a content identifier (_digest_), and the byte-size of the raw content.

Descriptors SHOULD be embedded in other formats to securely reference external content.

Other formats SHOULD use descriptors to securely reference external content.

This section defines the `application/vnd.oci.descriptor.v1+json` [media type](media-types.md).

## Properties

A descriptor consists of a set of properties encapsulated in key-value fields.

The following fields contain the primary properties that constitute a Descriptor:

- **`mediaType`** *string*

  This REQUIRED property contains the media type of the referenced content.
  Values MUST comply with [RFC 6838][rfc6838], including the [naming requirements in its section 4.2][rfc6838-s4.2].

  The OCI image specification defines [several of its own MIME types](media-types.md) for resources defined in the specification.

- **`digest`** *string*

  This REQUIRED property is the _digest_ of the targeted content, conforming to the requirements outlined in [Digests and Verification](#digests-and-verification).
  Retrieved content SHOULD be verified against this digest when consumed via untrusted sources.

- **`size`** *int64*

  This REQUIRED property specifies the size, in bytes, of the raw content.
  This property exists so that a client will have an expected size for the content before processing.
  If the length of the retrieved content does not match the specified length, the content SHOULD NOT be trusted.

- **`urls`** *array of strings*

  This OPTIONAL property specifies a list of URIs from which this object MAY be downloaded.
  Each entry MUST conform to [RFC 3986][rfc3986].
  Entries SHOULD use the `http` and `https` schemes, as defined in [RFC 7230][rfc7230-s2.7].

- **`annotations`** *string-string map*

  This OPTIONAL property contains arbitrary metadata for this descriptor.
  This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).

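As a rough illustration of the properties listed above, the following Python sketch models a descriptor decoded from JSON.
The names `Descriptor` and `descriptor_from_dict` are hypothetical and not part of this specification, and the check that `size` is non-negative and fits in an int64 is an assumption of the sketch.
Unknown keys are ignored rather than rejected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Descriptor:
    """Illustrative container for the descriptor properties (hypothetical name)."""
    media_type: str  # "mediaType", REQUIRED
    digest: str      # "digest", REQUIRED
    size: int        # "size", REQUIRED, size in bytes of the raw content
    urls: List[str] = field(default_factory=list)              # OPTIONAL
    annotations: Dict[str, str] = field(default_factory=dict)  # OPTIONAL


def descriptor_from_dict(raw: Dict[str, Any]) -> Descriptor:
    """Build a Descriptor from decoded JSON, rejecting missing or mistyped required fields."""
    for key, expected in (("mediaType", str), ("digest", str), ("size", int)):
        if key not in raw:
            raise ValueError(f"missing required property: {key}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(raw[key], expected) or isinstance(raw[key], bool):
            raise ValueError(f"property {key} has the wrong type")
    # Assumption of this sketch: sizes are non-negative and fit in an int64.
    if not 0 <= raw["size"] < 2**63:
        raise ValueError("size is out of range")
    return Descriptor(
        media_type=raw["mediaType"],
        digest=raw["digest"],
        size=raw["size"],
        urls=list(raw.get("urls", [])),
        annotations=dict(raw.get("annotations", {})),
    )
```

This sketch only checks presence and basic types; it does not validate the media type against RFC 6838, the URLs against RFC 3986, or the annotation rules.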
### Reserved

The following field keys are reserved and MUST NOT be used by other specifications.

- **`data`** *string*

  This key is RESERVED for future versions of the specification.

All other fields may be included in other OCI specifications.
Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.

## Digests and Verification

The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
If the identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the digest independently, and be certain that the correct content was obtained.

The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion.
The algorithm identifies the methodology used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.

The digest string MUST match the following grammar:

```
digest      := algorithm ":" hex
algorithm   := /[a-z0-9_+.-]+/
hex         := /[a-f0-9]+/
```

Some example digest strings include the following:

| digest                                                                  | description                |
|-------------------------------------------------------------------------|----------------------------|
| sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Common sha256 based digest |

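The grammar can be transcribed into a regular expression almost directly.
The following Python sketch shows one way to split a digest string into its two portions; `DIGEST_RE` and `parse_digest` are hypothetical names chosen for illustration.

```python
import re
from typing import Tuple

# Direct transcription of the grammar: algorithm ":" hex
DIGEST_RE = re.compile(r"([a-z0-9_+.-]+):([a-f0-9]+)")


def parse_digest(digest: str) -> Tuple[str, str]:
    """Split a digest string into (algorithm, hex), or raise ValueError."""
    # fullmatch() requires the whole string to match, including no trailing newline.
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"invalid digest string: {digest!r}")
    return match.group(1), match.group(2)
```

The grammar alone does not constrain the length of the hex portion, so this sketch accepts any non-empty lowercase hex string for any algorithm name.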
Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest.
Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
Heavy processing before calculating a hash SHOULD be avoided.
Implementations MAY employ some canonicalization of the underlying content to ensure stable content identifiers.

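One possible way to apply these recommendations to content that arrives as a stream is sketched below in Python.
It is an illustration only: `verify_stream` is a hypothetical helper, it supports only `sha256`, and it interleaves the size check with hashing so that oversized content is rejected without reading it to the end.

```python
import hashlib
import io
from typing import BinaryIO


def verify_stream(stream: BinaryIO, expected_size: int, expected_digest: str,
                  chunk_size: int = 64 * 1024) -> bool:
    """Return True only if the stream holds exactly expected_size bytes matching expected_digest."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm!r}")
    hasher = hashlib.sha256()
    seen = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        seen += len(chunk)
        if seen > expected_size:
            # Stop early: the content is longer than the descriptor declares.
            return False
        hasher.update(chunk)
    # Both the size and the digest have to match.
    return seen == expected_size and hasher.hexdigest() == expected_hex


# Usage with an in-memory stream standing in for a download.
payload = b"example content"
payload_digest = "sha256:" + hashlib.sha256(payload).hexdigest()
ok = verify_stream(io.BytesIO(payload), len(payload), payload_digest)
```

The raw bytes are hashed as received, with no decompression or parsing beforehand, which keeps processing before the hash to a minimum.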
### Algorithms

Although the _algorithm_ component of the digest allows a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).

A _digest_ is calculated as shown in the following pseudo-code example, where `H` is the selected hash algorithm, identified by string `<alg>`:
```
let ID(C) = Descriptor.digest
let C = <bytes>
let D = '<alg>:' + EncodeHex(H(C))
let verified = ID(C) == D
```
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
Content `C` is a string of bytes.
Function `H` returns the hash of `C` as bytes; that hash is passed to function `EncodeHex` and prefixed with `<alg>:` to obtain the _digest_ `D`.
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
After verification, the following is true:

```
D == ID(C) == '<alg>:' + EncodeHex(H(C))
```

The _digest_ is confirmed as the content identifier by independently calculating the _digest_.

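For readers who prefer something executable, the pseudo-code maps onto Python's standard `hashlib` module roughly as follows.
This is a sketch under one assumption: the algorithm name in the digest string is also a name that `hashlib.new` accepts, which holds for `sha256`.
The function names `compute_digest` and `verify` are hypothetical.

```python
import hashlib


def compute_digest(content: bytes, algorithm: str = "sha256") -> str:
    """Return '<alg>:' + EncodeHex(H(C)) for the given content C."""
    hasher = hashlib.new(algorithm)  # H, selected by name; raises ValueError if unknown
    hasher.update(content)           # C
    # hexdigest() returns lowercase hex, as the digest grammar requires.
    return f"{algorithm}:{hasher.hexdigest()}"


def verify(content: bytes, descriptor_digest: str) -> bool:
    """Recompute D from the content and compare it with ID(C)."""
    algorithm, _, _ = descriptor_digest.partition(":")
    return compute_digest(content, algorithm) == descriptor_digest
```

Here `descriptor_digest` plays the role of `ID(C)` and the return value of `compute_digest` plays the role of `D`.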
#### SHA-256

[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics.
Implementations MUST implement SHA-256 digest verification for use in descriptors.

## Examples

The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:

```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "size": 7682,
  "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270"
}
```

In the following example, the descriptor indicates that the referenced manifest is retrievable from a particular URL:

```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "size": 7682,
  "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
  "urls": [
    "https://example.com/example-manifest"
  ]
}
```

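Going the other way, a descriptor like the ones above could be produced from raw content along these lines.
The Python sketch below uses a hypothetical `describe` helper and placeholder bytes rather than a real manifest, so the resulting size and digest differ from the values in the examples.

```python
import hashlib
import json


def describe(content: bytes, media_type: str) -> dict:
    """Build the three REQUIRED descriptor properties for the given raw content."""
    return {
        "mediaType": media_type,
        "size": len(content),
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
    }


# Placeholder bytes for illustration only; not a valid image manifest.
manifest_bytes = b'{"schemaVersion": 2}'
descriptor = describe(manifest_bytes, "application/vnd.oci.image.manifest.v1+json")
print(json.dumps(descriptor, indent=2))
```

The size and digest are computed over exactly the bytes that will be stored or served, since any re-serialization of the content would change both.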
[rfc3986]: https://tools.ietf.org/html/rfc3986
[rfc6838]: https://tools.ietf.org/html/rfc6838
[rfc6838-s4.2]: https://tools.ietf.org/html/rfc6838#section-4.2
[rfc7230-s2.7]: https://tools.ietf.org/html/rfc7230#section-2.7