design: add data-flow design document
Signed-off-by: Stephen J Day <stephen.day@docker.com>
parent 95c13af2f9
commit a812199b07
1 changed files with 105 additions and 0 deletions
105
design/data-flow.md
Normal file
@@ -0,0 +1,105 @@
# Data Flow

In the past, container systems have hidden the complexity of pulling container
images behind a single operation. This document intends to shed light on that
complexity and detail how a "pull" operation will look from the perspective of
a containerd user. We use the _bundle_ as the target object in this workflow,
and walk back from there to describe the full process. In this context, we
describe both pulling an image and creating a bundle from that image.

With containerd, we redefine the "pull" to comprise the same set of steps
encompassed in prior container engines. In this model, an image defines a
collection of resources that can be used to create a _bundle_. There is no
specific format or object called an image. The goal of the pull, as a set of
steps, is to resolve the resources that comprise an image, with the separation
providing lifecycle points in the process.

A reference implementation of the complete "pull", performed client-side, will
be provided as part of containerd, but there may not be a single "pull" API
call.

A rough diagram of the data flow, along with the relevant components, is below.

![Data Flow](data-flow.png)

While the process proceeds left to right in the diagram, this document is
written right to left. By working through this process backwards, we can best
understand the approach employed by containerd.

## Running a Container

For containerd, we'd generally like to retrieve a _bundle_. This is the
runtime, on-disk container layout, which includes the filesystem and
configuration required to run the container.

Generically speaking, we can say we have the following directory:

```
config.json
rootfs/
```

The contents of `config.json` aren't interesting in this context, but for
clarity, it may be the runc config or a containerd-specific configuration file
for setting up a running container. The `rootfs` is a directory where
containerd will set up the runtime container's filesystem.

While containerd doesn't have the concept of an image, we can effectively build
this structure from an image, as projected into containerd. Given this, we can
say that the requirements for running a container are to do the following:

1. Convert the configuration from the container image into the target format
   for the containerd runtime.
2. Reproduce the root filesystem from the container image. While we could
   unpack this into `rootfs` in the bundle, we can also just pass this as a set
   of mounts to the container configuration.

The above defines the framework in which we will operate. Put differently, we
can say that we want to create a bundle by creating these two components of a
bundle.

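To illustrate the second requirement, the sketch below describes a root filesystem as a mount instead of an unpacked directory. It assumes an overlayfs-style backend, which this document does not prescribe. The `Mount` type, the `overlayMount` helper, and all paths are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a hypothetical description of one mount that a runtime could
// perform to assemble a container's root filesystem.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// overlayMount builds a single overlay mount from unpacked layer directories.
// layerDirs is ordered base layer first. overlayfs expects the topmost lower
// directory first, so the order is reversed when building lowerdir.
// Assumption: an overlayfs backend; other backends would produce other mounts.
func overlayMount(layerDirs []string, upper, work string) Mount {
	lower := make([]string, len(layerDirs))
	for i, dir := range layerDirs {
		lower[len(layerDirs)-1-i] = dir
	}
	return Mount{
		Type:   "overlay",
		Source: "overlay",
		Options: []string{
			"lowerdir=" + strings.Join(lower, ":"),
			"upperdir=" + upper,
			"workdir=" + work,
		},
	}
}

func main() {
	// Hypothetical snapshot paths, used only to show the resulting options.
	m := overlayMount(
		[]string{"/snapshots/1/fs", "/snapshots/2/fs"},
		"/snapshots/3/fs", "/snapshots/3/work",
	)
	fmt.Printf("mount -t %s %s -o %s <bundle>/rootfs\n",
		m.Type, m.Source, strings.Join(m.Options, ","))
}
```

Passing a description like this to the runtime means nothing has to be copied into `rootfs` ahead of time; the directory only needs to exist as a mount target.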
## Creating a Bundle

Now that we've defined what is required to run a container, a _bundle_, we need
to create one.

Let's say we have the following:

```
ctr run ubuntu
```

This does no pulling of images. It only takes the name and creates a _bundle_.
Broken down into steps, the process looks as follows:

1. Look up the digest of the image in the metadata store.
2. Resolve the manifest in the content store.
3. Resolve the layer snapshots in the snapshot subsystem.
4. Transform the config into the target bundle format.
5. Create a runtime snapshot for the rootfs of the container, including resolution of mounts.
6. Run the container.

From this, we can understand the required resources to _pull_ an image:

1. An entry in the metadata store with a name pointing at a particular digest.
2. The manifest must be available in the content store.
3. The result of successively applied layers must be available as a snapshot.

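The relationship between the run steps and these three resources can be sketched with in-memory maps. This is a minimal illustration, not containerd's API: the `Stores` and `Manifest` types, the `Resolve` method, and the `snapshotKey` placeholder are all hypothetical. Each lookup fails with a message naming the resource that a pull would have had to provide.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Manifest is a simplified, hypothetical stand-in for an image manifest.
type Manifest struct {
	Config string   // image configuration, opaque here
	Layers []string // layer identifiers, base layer first
}

// Stores groups hypothetical in-memory stand-ins for the three subsystems.
type Stores struct {
	Metadata  map[string]string   // image name -> manifest digest
	Content   map[string]Manifest // manifest digest -> manifest
	Snapshots map[string]bool     // snapshot key -> unpacked
}

// ErrMissing marks a resource that a pull would have had to provide.
var ErrMissing = errors.New("resource missing")

// snapshotKey is a placeholder for the key of the fully applied layer stack.
// The "Unpacking Layers" section names the chain ID as the actual key.
func snapshotKey(layers []string) string {
	return strings.Join(layers, "+")
}

// Resolve performs steps 1 to 3 of the run process without fetching anything.
func (s Stores) Resolve(name string) (Manifest, string, error) {
	digest, ok := s.Metadata[name]
	if !ok {
		return Manifest{}, "", fmt.Errorf("%w: no metadata entry for %q", ErrMissing, name)
	}
	manifest, ok := s.Content[digest]
	if !ok {
		return Manifest{}, "", fmt.Errorf("%w: manifest %s not in content store", ErrMissing, digest)
	}
	key := snapshotKey(manifest.Layers)
	if !s.Snapshots[key] {
		return Manifest{}, "", fmt.Errorf("%w: snapshot %s not unpacked", ErrMissing, key)
	}
	return manifest, key, nil
}

func main() {
	s := Stores{
		Metadata:  map[string]string{"ubuntu": "sha256:aaaa"},
		Content:   map[string]Manifest{"sha256:aaaa": {Config: "{}", Layers: []string{"l1", "l2"}}},
		Snapshots: map[string]bool{},
	}
	if _, _, err := s.Resolve("ubuntu"); err != nil {
		fmt.Println("not runnable yet:", err)
	}
	s.Snapshots[snapshotKey([]string{"l1", "l2"})] = true
	m, key, err := s.Resolve("ubuntu")
	fmt.Println(m.Layers, key, err)
}
```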
## Unpacking Layers

While this process may be pull or run driven, the idea is quite simple. For
each layer, apply the layer to a snapshot of the previous layer's result. The
resulting snapshot should be stored under the chain ID (as defined by OCI) of
the layers applied so far.

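As a sketch of how those keys are derived, the OCI image specification defines the chain ID of the first layer as its diff ID, and each subsequent chain ID as the digest of the previous chain ID, a space, and the next diff ID. The `chainIDs` helper below is a hypothetical name, it assumes sha256 digests, and the diff IDs in `main` are placeholders derived from strings, not from real layers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs returns the chain ID after each layer, given the layers' diff IDs
// in application order (base layer first):
//
//	ChainID(L0)      = DiffID(L0)
//	ChainID(L0..Ln)  = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
func chainIDs(diffIDs []string) []string {
	chain := make([]string, len(diffIDs))
	for i, diffID := range diffIDs {
		if i == 0 {
			chain[i] = diffID
			continue
		}
		sum := sha256.Sum256([]byte(chain[i-1] + " " + diffID))
		chain[i] = fmt.Sprintf("sha256:%x", sum)
	}
	return chain
}

func main() {
	// Placeholder diff IDs; real ones are digests of uncompressed layer tars.
	diffIDs := []string{
		fmt.Sprintf("sha256:%x", sha256.Sum256([]byte("layer 0"))),
		fmt.Sprintf("sha256:%x", sha256.Sum256([]byte("layer 1"))),
	}
	for i, id := range chainIDs(diffIDs) {
		fmt.Printf("layer %d -> snapshot %s\n", i, id)
	}
}
```

Because each chain ID depends on every layer below it, two images that share a base stack resolve to the same snapshot keys for the shared portion.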
## Pulling an Image

With all the above defined, pulling an image simply becomes the following:

1. Fetch the manifest for the image, verify it, and store it.
2. Fetch each layer of the image manifest, verify it, and store it.
3. Store the manifest digest under the provided name.

Note that we leave off using the name to resolve a particular location. We'll
leave that for another doc!

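The three pull steps can be sketched against in-memory stores. This is an illustration under stated assumptions, not the reference implementation: `Fetcher`, `fetchVerified`, and `pull` are hypothetical names, verification is assumed to mean comparing a sha256 digest, and the `manifest` struct reads only the `layers[].digest` fields of an OCI-style manifest. The manifest digest is passed in directly, since resolving a name to a location is out of scope here.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
)

// Fetcher is a hypothetical stand-in for a registry client that returns the
// bytes published under a digest.
type Fetcher func(digest string) ([]byte, error)

// manifest reads only the layers[].digest fields of an OCI-style manifest.
type manifest struct {
	Layers []struct {
		Digest string `json:"digest"`
	} `json:"layers"`
}

// digestOf returns the sha256 digest string for data.
func digestOf(data []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(data))
}

// fetchVerified fetches a blob and stores it only if it matches its digest.
func fetchVerified(fetch Fetcher, content map[string][]byte, digest string) ([]byte, error) {
	data, err := fetch(digest)
	if err != nil {
		return nil, err
	}
	if got := digestOf(data); got != digest {
		return nil, fmt.Errorf("digest mismatch: want %s, got %s", digest, got)
	}
	content[digest] = data
	return data, nil
}

// pull runs the three steps in order. The name is recorded last, matching
// step 3, so it is only written once the manifest and layers are stored.
func pull(fetch Fetcher, content map[string][]byte, metadata map[string]string, name, manifestDigest string) error {
	raw, err := fetchVerified(fetch, content, manifestDigest)
	if err != nil {
		return fmt.Errorf("manifest: %w", err)
	}
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return fmt.Errorf("manifest: %w", err)
	}
	for _, layer := range m.Layers {
		if _, err := fetchVerified(fetch, content, layer.Digest); err != nil {
			return fmt.Errorf("layer: %w", err)
		}
	}
	metadata[name] = manifestDigest
	return nil
}

func main() {
	// A fake remote holding one layer and a manifest that references it.
	layer := []byte("layer bytes")
	raw := []byte(`{"layers":[{"digest":"` + digestOf(layer) + `"}]}`)
	remote := map[string][]byte{digestOf(layer): layer, digestOf(raw): raw}
	fetch := func(d string) ([]byte, error) {
		if b, ok := remote[d]; ok {
			return b, nil
		}
		return nil, errors.New("not found: " + d)
	}

	content, metadata := map[string][]byte{}, map[string]string{}
	if err := pull(fetch, content, metadata, "ubuntu", digestOf(raw)); err != nil {
		panic(err)
	}
	fmt.Println(len(content), "blobs stored;", "ubuntu ->", metadata["ubuntu"])
}
```

Unpacking is deliberately absent from this sketch: as described under "Unpacking Layers", it may be driven by the pull or deferred until run.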