design: add data-flow design document
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This commit is contained in:
parent
95c13af2f9
commit
a812199b07
1 changed files with 105 additions and 0 deletions
105
design/data-flow.md
Normal file
@@ -0,0 +1,105 @@
# Data Flow

In the past, container systems have hidden the complexity of pulling container
images. This document intends to shed light on that complexity and detail how
a "pull" operation will look from the perspective of a containerd user. We use
the _bundle_ as the target object in this workflow, and walk back from there to
describe the full process. In this context, we describe both pulling an image
and creating a bundle from that image.

With containerd, we redefine the "pull" to comprise the same set of steps
encompassed in prior container engines. In this model, an image defines a
collection of resources that can be used to create a _bundle_. There is no
specific format or object called an image. The goal of the pull is to resolve
the resources that comprise an image, with the separation into steps providing
lifecycle points in the process.

A reference implementation of the complete "pull", performed client-side, will
be provided as part of containerd, but there may not be a single "pull" API
call.

A rough diagram of the data flow, along with the relevant components, is below.

![Data Flow](data-flow.png)

While the process proceeds left to right in the diagram, this document is
written right to left. By working through this process backwards, we can best
understand the approach employed by containerd.

## Running a Container

For containerd, we'd generally like to retrieve a _bundle_. This is the
runtime, on-disk container layout, which includes the filesystem and
configuration required to run the container.

Generically speaking, we can say we have the following directory:

```
config.json
rootfs/
```

The contents of `config.json` aren't interesting in this context, but for
clarity, it may be the runc config or a containerd-specific configuration file
for setting up a running container. The `rootfs` is a directory where
containerd will set up the runtime container's filesystem.

While containerd doesn't have the concept of an image, we can effectively build
this structure from an image, as projected into containerd. Given this, we can
say that the requirements for running a container are to do the following:

1. Convert the configuration from the container image into the target format
   for the containerd runtime.
2. Reproduce the root filesystem from the container image. While we could
   unpack this into `rootfs` in the bundle, we can also just pass this as a set
   of mounts to the container configuration.

The above defines the framework in which we will operate. Put differently, we
can say that we want to create a bundle by creating these two components of a
bundle.

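The two requirements above can be sketched in Go as follows. This is only an illustration: `ImageConfig`, `Mount`, `RuntimeConfig`, and `NewRuntimeConfig` are hypothetical names and shapes invented for this example, not containerd or runc types, and the mount values in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageConfig is a hypothetical, heavily trimmed image configuration.
type ImageConfig struct {
	Entrypoint []string
	Env        []string
}

// Mount is a hypothetical description of one mount that makes up the rootfs.
type Mount struct {
	Type    string   `json:"type"`
	Source  string   `json:"source"`
	Options []string `json:"options,omitempty"`
}

// RuntimeConfig is a hypothetical stand-in for the bundle's config.json.
type RuntimeConfig struct {
	Args   []string `json:"args"`
	Env    []string `json:"env"`
	Rootfs []Mount  `json:"rootfs"`
}

// NewRuntimeConfig covers both requirements: it converts the image
// configuration into the runtime format (1) and attaches the root
// filesystem as a set of mounts instead of unpacking it into rootfs/ (2).
func NewRuntimeConfig(img ImageConfig, rootfs []Mount) RuntimeConfig {
	return RuntimeConfig{
		Args:   append([]string(nil), img.Entrypoint...),
		Env:    append([]string(nil), img.Env...),
		Rootfs: append([]Mount(nil), rootfs...),
	}
}

func main() {
	img := ImageConfig{Entrypoint: []string{"/bin/sh"}, Env: []string{"PATH=/bin"}}
	mounts := []Mount{{Type: "overlay", Source: "overlay", Options: []string{"lowerdir=/example/lower"}}}

	out, err := json.MarshalIndent(NewRuntimeConfig(img, mounts), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // what a config.json could look like in this sketch
}
```

Passing the root filesystem as mounts keeps the bundle creation cheap, since nothing has to be copied into `rootfs` up front.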
## Creating a Bundle

Now that we've defined what is required to run a container, a _bundle_, we need
to create one.

Let's say we have the following:

```
ctr run ubuntu
```

This does no pulling of images. It only takes the name and creates a _bundle_.
Broken down into steps, the process looks as follows:

1. Look up the digest of the image in the metadata store.
2. Resolve the manifest in the content store.
3. Resolve the layer snapshots in the snapshot subsystem.
4. Transform the config into the target bundle format.
5. Create a runtime snapshot for the rootfs of the container, including resolution of mounts.
6. Run the container.

From this, we can understand the required resources to _pull_ an image:

1. An entry in the metadata store: a name pointing at a particular digest.
2. The manifest must be available in the content store.
3. The result of successively applied layers must be available as a snapshot.

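As a rough sketch, the first three run steps amount to checking that the three pulled resources exist. The code below models each store as an in-memory map; `Stores`, `Manifest`, `Resolve`, and `ErrNotPulled` are hypothetical names for this example only, and the digests in `main` are placeholders rather than real digests.

```go
package main

import (
	"errors"
	"fmt"
)

// Manifest is a hypothetical, trimmed manifest: a config digest plus the
// chain ID of the fully applied layers.
type Manifest struct {
	Config  string
	ChainID string
}

// Stores models the three subsystems as in-memory maps (an assumption
// made for this sketch, not how containerd stores data).
type Stores struct {
	Names     map[string]string   // metadata store: name -> manifest digest
	Content   map[string]Manifest // content store: digest -> manifest
	Snapshots map[string]bool     // snapshot subsystem: chain ID -> present
}

// ErrNotPulled reports that a resource needed to create a bundle is missing.
var ErrNotPulled = errors.New("image resources missing, pull required")

// Resolve walks steps 1-3 of the run process and never fetches anything.
func (s Stores) Resolve(name string) (Manifest, error) {
	dgst, ok := s.Names[name] // step 1: look up the digest by name
	if !ok {
		return Manifest{}, fmt.Errorf("name %q: %w", name, ErrNotPulled)
	}
	m, ok := s.Content[dgst] // step 2: resolve the manifest
	if !ok {
		return Manifest{}, fmt.Errorf("manifest %s: %w", dgst, ErrNotPulled)
	}
	if !s.Snapshots[m.ChainID] { // step 3: resolve the layer snapshot
		return Manifest{}, fmt.Errorf("snapshot %s: %w", m.ChainID, ErrNotPulled)
	}
	return m, nil
}

func main() {
	s := Stores{
		Names:     map[string]string{"ubuntu": "sha256:manifest"},
		Content:   map[string]Manifest{"sha256:manifest": {Config: "sha256:config", ChainID: "sha256:chain"}},
		Snapshots: map[string]bool{"sha256:chain": true},
	}
	m, err := s.Resolve("ubuntu")
	fmt.Println(m.ChainID, err)
}
```

Each failure case corresponds to one of the three resources a pull must provide.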
## Unpacking Layers

While this process may be pull or run driven, the idea is quite simple. For
each layer, apply the layer to a snapshot of the previous layer. The result
should be stored under the chain ID (as defined by OCI) of the resulting
application.

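For reference, the OCI image specification defines the chain ID recursively: the chain ID of the first layer is its diff ID, and each later chain ID is the digest of the previous chain ID, a space, and the next diff ID. A minimal Go sketch follows, assuming sha256 digests in `sha256:<hex>` form; `ChainIDs` is a hypothetical helper name, and the diff IDs in `main` are placeholders.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ChainIDs returns the chain ID after applying each layer in order, given
// the layers' diff IDs (digests of the uncompressed layer content):
//
//	ChainID(L0)      = DiffID(L0)
//	ChainID(L0..Ln)  = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
func ChainIDs(diffIDs []string) []string {
	chain := make([]string, len(diffIDs))
	for i, diffID := range diffIDs {
		if i == 0 {
			chain[i] = diffID
			continue
		}
		sum := sha256.Sum256([]byte(chain[i-1] + " " + diffID))
		chain[i] = fmt.Sprintf("sha256:%x", sum)
	}
	return chain
}

func main() {
	// Placeholder diff IDs; real ones are full sha256 digests.
	for i, id := range ChainIDs([]string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}) {
		fmt.Printf("snapshot for layer %d stored under %s\n", i, id)
	}
}
```

Because each chain ID depends on every layer beneath it, two images that share a prefix of layers also share the snapshots for that prefix.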
## Pulling an Image

With all the above defined, pulling an image simply becomes the following:

1. Fetch the manifest for the image, verify and store it.
2. Fetch each layer of the image manifest, verify and store them.
3. Store the manifest digest under the provided name.

Note that we leave out how the name is used to resolve a particular location.
We'll leave that for another doc!
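The three pull steps can be sketched in Go as below. This is an illustration under several assumptions: the remote is a hypothetical `Fetcher` function, the stores are plain maps, and the manifest is a made-up JSON document that only lists layer digests. None of these names come from containerd.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
)

// Fetcher is a hypothetical remote that returns the bytes stored under a digest.
type Fetcher func(dgst string) ([]byte, error)

// manifest is a hypothetical, trimmed manifest that only lists layer digests.
type manifest struct {
	Layers []string `json:"layers"`
}

// digestOf returns the sha256 digest of b in "sha256:<hex>" form.
func digestOf(b []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(b))
}

// fetchVerified fetches one blob, verifies it against its digest, and
// stores it in the content map only if verification succeeds.
func fetchVerified(fetch Fetcher, content map[string][]byte, dgst string) ([]byte, error) {
	b, err := fetch(dgst)
	if err != nil {
		return nil, err
	}
	if digestOf(b) != dgst {
		return nil, errors.New("digest mismatch for " + dgst)
	}
	content[dgst] = b
	return b, nil
}

// Pull fetches, verifies, and stores the manifest and its layers, then
// records the manifest digest under the provided name.
func Pull(fetch Fetcher, names map[string]string, content map[string][]byte, name, manifestDigest string) error {
	raw, err := fetchVerified(fetch, content, manifestDigest) // step 1
	if err != nil {
		return err
	}
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return err
	}
	for _, layer := range m.Layers { // step 2
		if _, err := fetchVerified(fetch, content, layer); err != nil {
			return err
		}
	}
	names[name] = manifestDigest // step 3: only after everything verified
	return nil
}

func main() {
	layer := []byte("layer-0")
	raw, _ := json.Marshal(manifest{Layers: []string{digestOf(layer)}})
	remote := map[string][]byte{digestOf(layer): layer, digestOf(raw): raw}
	fetch := func(d string) ([]byte, error) { return remote[d], nil }

	names, content := map[string]string{}, map[string][]byte{}
	err := Pull(fetch, names, content, "ubuntu", digestOf(raw))
	fmt.Println(err, len(content), names["ubuntu"] == digestOf(raw))
}
```

Writing the name last means a name in the metadata store always points at content that has already been verified and stored. How the name maps to a remote location is deliberately left out, matching the scope of this document.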