design: add design document for snapshots

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2016-11-28 18:51:44 -08:00 · 2016-11-28 18:51:44 -08:00 · afb175d76d
commit afb175d76d
parent fc577a1cbb
1 changed files with 183 additions and 0 deletions
--- a/design/snapshots.md
+++ b/design/snapshots.md
@ -0,0 +1,183 @@
+# Snapshots
+
+Docker containers, from the beginning, have long been built on a snapshotting
+methodology known as _layers_. _Layers_ provide the ability to fork a
+filesystem, make changes then save the changeset back to a new layer.
+
+Historically, these have been tightly integrated into the Docker daemon as a
+component called the `graphdriver`. The `graphdriver` allows one to run the
+docker daemon on several different operating systems while still maintaining
+roughly similar snapshot semantics for committing and distribution changes to
+images.
+
+The `graphdriver` is deeply integrated with the import and export of images,
+including managing layer relationships and container runtime filesystems. The
+behavior of the `graphdriver` informs the transport of image formats.
+
+In this document, we propose a more flexible model for managing layers. It
+focuses on providing an API for the base snapshotting functionality without
+coupling so tightly to the structure of images and their identification. The
+minimal API simplifies behavior without sacrificing power. This makes the
+surface area for driver implementations smaller, ensuring that behavior is more
+consistent between implementations.
+
+These differ from the concept of the graphdriver in that the LayerManipulator
+has no knowledge of images or containers. Users simply prepare and commit
+directories. We also avoid the integration between graph drivers and the tar
+format used to represent the changesets.
+
+The best aspect is that we can get to this model by refactoring the existing
+graphdrivers, minimizing the need to new code and sprawling tests.
+
+## Scope
+
+In the past, the `graphdriver` component has provided quite a lot of
+funcionality in Docker. This includes serialization, hashing, unpacking,
+packing, mounting.
+
+This _snapshot manager_ will only provide mount-oriented oriented snapshot
+access with minimal metadata. Serialization, hashing, unpacking, packing and
+mounting are not included in this design, opting for common implementations
+between graphdrivers, rather than specialized ones. This is less of a problem
+for performance, since direct access to changesets is provided in the
+interface.
+
+## Architecture
+
+The _Snapshot Manager_ provides an API for allocating, snapshotting and mounting
+abstract, layer-based filesytems. The model works by building up sets of
+directories with parent-child relationships, known as _Snapshots_.
+
+Creating _snapshots_ is a transactional operation. Each _Snapshot_ may have a
+parent snapshot. When one starts a transaction on an existing snapshot, the
+result may only be used as a parent _after_ being committed.
+
+Every snapshot has an associated `diff` directory, which contains driver
+specific data. This may include parent information and changeset data,
+depending on the implementation. We define the empty string as the ancestor of
+all snapshots, which corresponds to the empty snapshot.
+
+The `target` directory represents the active snapshot location. The driver may
+maintain internal metadata associated with the `target` but the contents is
+generally manipulated by the client.
+
+### Operations
+
+The manifestation of _snapshots_ is facilitated by the _mount_ object and
+user-defined directories used for opaque data storage. When creating a new
+snapshot, the caller provides a directory where they would like the _snapshot_
+to be mounted, called the _target_. This operation returns a list of mounts
+that, if mounted, will have the fully prepared snapshot at the requested path.
+We call this the _prepare_ operation.
+
+Once a path is _prepared_ and mounted, the caller may write new data to the
+snapshot. Depending on application, a user may want to capture these changes or
+not.
+
+If the user wants to keep the changes, the _commit_ operation is employed.  The
+_commit_ operation takes the `target` directory, which represents an open
+transaction, and a `diff` directory. A successful result will end up with the
+difference between the parent and snapshot in the `diff` directory, which
+should be treated as opaque by the caller. This new `diff` directory can then
+be used as the `parent` in calls to future _prepare_ operations.
+
+If the user wants to discard the changes, the _rollback_ operation will release
+any resources associated with the snapshot. While rollback may a rare operation
+in other transactional systems, this is a common operation for containers.
+After removal, most containers will have _rollback_ called.
+
+For both _rollback_ and _commit_ the mounts provided by _prepare_ should be
+unmounted before calling these methods.
+
+### Graph metadata
+
+As snapshots are imported into the container system, a "graph" of snapshots and
+their parents will form. Queries over this graph must be a supported operation.
+Subsequently, each snapshot ends up representing 
+
+## How snapshots work
+
+To bring the terminology of _snapshots_, we are going to demonstrate the use of
+the _snapshot manager_ from perspective of importing layers. We'll use a Go API
+to represent the process.
+
+### Importing a Layer
+
+To import a layer, we simply have the _Snapshot Manager_ provide a list of
+mounts to be applied such that our dst will capture a changeset. We start
+out by getting a path to the layer tar file and creating a temp location to
+unpack it to:
+
+	layerPath, tmpLocation := getLayerPath(), mkTmpDir() // just a path to layer tar file.
+
+Per the terminology above, `tmpLocation` is known as the `target`. `layerPath`
+is simply a tar file, representing a changset. We start by using
+`SnapshotManager` to prepare the temporary location as a snapshot point:
+
+	lm := SnapshotManager()
+	mounts, err := lm.Prepare(tmpLocation, "")
+	if err != nil { ... }
+
+Note that we provide "" as the `parent`, since we are applying the diff to an
+empty directory. We get back a list of mounts from `SnapshotManager.Prepare`.
+Before proceeding, we perform all these mounts:
+
+	if err := MountAll(mounts); err != nil { ... }
+
+Once the mounts are performed, our temporary location is ready to capture
+a diff. In practice, this works similar to a filesystem transaction. The
+next step is to unpack the layer. We have a special function, `unpackLayer`
+that applies the contents of the layer to target location and calculates the
+DiffID of the unpacked layer (this is a requirement for docker
+implementation):
+
+	digest, err := unpackLayer(tmpLocation, layer) // unpack into layer location
+	if err != nil { ... }
+
+When the above completes, we should have a filesystem the represents the
+contents of the layer. Careful implementations should verify that digest
+matches the expected DiffID. When completed, we unmount the mounts:
+
+	unmount(mounts) // optional, for now
+
+Now that we've verified and unpacked our layer, we create a location to commit
+the actual diff. For this example, we are just going to use the layer `digest`,
+but in practice, this will probably be the `ChainID`:
+
+	diffPath := filepath.Join("/layers", digest) // name location for the uncompressed layer digest
+	if err := lm.Commit(diffPath, tmpLocation); err != nil { ... }
+
+The new layer has been imported as a _snapshot_ into the `SnapshotManager`
+under the name `diffPath`. `diffPath`, which is a user opaque directory
+location, can then be used as a parent in later snapshots.
+
+### Importing the Next Layer
+
+Making a layer depend on the above is identical to the process described
+above except that the parent is provided as diffPath when calling
+`Snapshot.Prepare`:
+
+	mounts, err := lm.Prepare(tmpLocation, parentDiffPath)
+
+Because have a provided a `parent`, the resulting `tmpLocation`, after
+mounting, will have the changes from above. Any new changes will be isolated to
+the snapshot `target`.
+
+We run the same unpacking process and commit as above to get the new `diff`.
+
+### Running a Container
+
+To run a container, we simply provide `SnapshotManager.Prepare` the `diff` of
+the image we want to start the container from. After mounting, the prepared
+path can be used directly as the container's filesystem:
+
+	mounts, err := lm.Prepare(containerRootFS, imageDiffPath)
+
+The returned mounts can then be passed directly to the container runtime. If
+one would like to create a new image from the filesystem,
+SnapshotManipulator.Commit is called:
+
+	if err := lm.Commit(newImageDiff, containerRootFS); err != nil { ... }
+
+Alternatively, for most container runs, Snapshot.Rollback will be
+called to signal `SnapshotManager` to abandon the changes.