Merge pull request #475 from stevvooe/dev-report-2017-01-27
reports: development report for 2017-01-27
commit 9278a3310f
1 changed file with 256 additions and 0 deletions

reports/2017-01-27.md
@@ -0,0 +1,256 @@

# Development Report for Jan 27, 2017

This week we made a lot of progress on tools to work with local content storage
and image distribution. These parts are critical in forming an end-to-end proof
of concept, taking Docker/OCI images and turning them into bundles.

We have also defined a new GRPC protocol for interacting with the
container-shim, which is used for robust container management.

## Maintainers

* https://github.com/docker/containerd/pull/473

Derek McGowan will be joining the containerd team as a maintainer. His
extensive experience in graphdrivers and distribution will be invaluable to the
containerd project.

## Shim over GRPC

* https://github.com/docker/containerd/pull/462

```
NAME:
containerd-shim -
__ _ __ __ _
_________ ____ / /_____ _(_)___ ___ _________/ / _____/ /_ (_)___ ___
/ ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __ /_____/ ___/ __ \/ / __ `__ \
/ /__/ /_/ / / / / /_/ /_/ / / / / / __/ / / /_/ /_____(__ ) / / / / / / / / /
\___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/ \__,_/ /____/_/ /_/_/_/ /_/ /_/

shim for container lifecycle and reconnection


USAGE:
containerd-shim [global options] command [command options] [arguments...]

VERSION:
1.0.0

COMMANDS:
help, h Shows a list of commands or help for one command

GLOBAL OPTIONS:
--debug enable debug output in logs
--help, -h show help
--version, -v print the version

```

This week we completed work on porting the shim over to GRPC. This gives us a
more robust way to interface with the shim. It also allows us to have one shim
per container, where previously we had one shim per process. This drastically
reduces the memory usage for exec processes.

Previously, we also had a lot of code in the containerd core for syncing with
the shims during execution. We needed ways to signal whether the shim was
running, whether the container had been created, and whether any errors
occurred on create or when starting the container's process. Getting this
synchronization right was hard and required a lot of code. With the new flow,
it is just function calls over RPC.

```proto
service Shim {
    rpc Create(CreateRequest) returns (CreateResponse);
    rpc Start(StartRequest) returns (google.protobuf.Empty);
    rpc Delete(DeleteRequest) returns (DeleteResponse);
    rpc Exec(ExecRequest) returns (ExecResponse);
    rpc Pty(PtyRequest) returns (google.protobuf.Empty);
    rpc Events(EventsRequest) returns (stream Event);
    rpc State(StateRequest) returns (StateResponse);
}
```

The GRPC service allows us to decouple the shim's lifecycle from the
container's: we get synchronous feedback, separate from shim errors, if the
container fails to create, start, or exec.
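
To make the synchronous-feedback point concrete, here is a minimal in-process sketch of the call flow. It is not the shim's real API: `FakeShim`, `ShimError`, `exec_process`, the state names, and the bundle path are all hypothetical, and plain method calls stand in for the RPCs. The only point is that each call either returns or fails immediately, so the caller needs no side-channel signalling.

```python
class ShimError(Exception):
    """Raised synchronously when a call cannot be carried out."""


class FakeShim:
    """Toy, in-process stand-in for the Shim service (hypothetical)."""

    def __init__(self):
        self.state = "unknown"      # unknown -> created -> running
        self.execs = []             # one shim tracks many exec processes

    def create(self, bundle):
        if not bundle:
            raise ShimError("create: empty bundle path")
        self.state = "created"

    def start(self):
        if self.state != "created":
            raise ShimError("start: container is " + self.state)
        self.state = "running"

    def exec_process(self, args):
        if self.state != "running":
            raise ShimError("exec: container is " + self.state)
        self.execs.append(list(args))
        return len(self.execs)      # hypothetical id for the new process


shim = FakeShim()
try:
    shim.start()                    # fails at once: nothing was created yet
except ShimError as err:
    print("error:", err)
shim.create("/run/bundles/redis")   # made-up bundle path
shim.start()
print(shim.exec_process(["redis-cli", "ping"]), shim.state)
```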

The overhead for adding GRPC to the shim is actually less than that of the
initial implementation. We already had a few pipes that allowed you to control
resizing of the pty master and receive exit events; these are now all replaced
by one Unix socket. Unix sockets are cheap and fast, and we reduce our open fd
count this way by not relying on multiple FIFOs.

We also added a subcommand to the `ctr` command for testing and interfacing
with the shim. You can interact with a shim directly via `ctr shim` to get
events, start containers, and start exec processes.

## Distribution Tool

* https://github.com/docker/containerd/pull/452
* https://github.com/docker/containerd/pull/472
* https://github.com/docker/containerd/pull/474

Last week, @stevvooe committed the first parts of the distribution tool. The main
component provided there was the `dist fetch` command. This has been followed
up by several other low-level commands that interact with content resolution
and local storage that can be used together to work with parts of images.

With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage
- `apply`: apply a layer to a directory

When this is more solidified, we can roll these up into higher-level
operations that can be orchestrated through the `dist` tool or via GRPC.

As part of the _Development Report_, we thought it was a good idea to show
these tools in depth. Specifically, we can show going from an image locator to
a root filesystem with the current suite of commands.

### Fetching Image Resources

The first component added to the `dist` tool is the `fetch` command. It is a
low-level command for fetching image resources, such as manifests and layers.
It operates around the concept of `remotes`. Objects are fetched by providing a
`locator` and an object identifier. The `locator`, roughly analogous to an
image name or repository, is a scheme-less URL. The following is an example of
a `locator`:

```
docker.io/library/redis
```

When we say the `locator` is a "scheme-less URL", we mean that it starts with
a hostname and has a path, representing some image repository. While the
hostname may represent an actual location, we can pass it through arbitrary
resolution systems to get the actual location. In that sense, it acts like a
namespace.
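
As an illustration of that shape, the sketch below splits a locator into its hostname (the namespace) and its repository path. `split_locator` is a hypothetical helper written for this report, not part of the `dist` tool, and its validation rules are assumptions.

```python
def split_locator(locator):
    """Split a locator into (namespace hostname, repository path)."""
    if "://" in locator:
        # A locator carries no scheme prefix such as "https://".
        raise ValueError("locator must not carry a scheme: " + locator)
    host, sep, path = locator.partition("/")
    if not host or not sep or not path:
        raise ValueError("locator needs a hostname and a path: " + locator)
    return host, path


print(split_locator("docker.io/library/redis"))
```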

In practice, the `locator` can be used to resolve a `remote`. Object
identifiers are then passed to this remote, along with hints, which are then
mapped to the specific protocol and retrieved. By dispatching on this common
identifier, we should be able to support almost any protocol and discovery
mechanism imaginable.
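
The dispatch described above can be sketched as a lookup keyed by the locator's hostname. Everything here is hypothetical: `StaticRemote`, `resolve`, and the dictionary of remotes are stand-ins invented for illustration, and a real remote would speak a network protocol rather than read from a dict.

```python
class StaticRemote:
    """Hypothetical remote backed by a dict instead of a network protocol."""

    def __init__(self, objects):
        self.objects = objects      # {(repository, object id): bytes}

    def fetch(self, repository, object_id, hints=()):
        # A real remote could use the hints to choose an endpoint; this
        # stand-in accepts them and ignores them.
        return self.objects[(repository, object_id)]


def resolve(remotes, locator):
    """Pick the remote registered for the locator's hostname."""
    host, _, repository = locator.partition("/")
    if host not in remotes:
        raise LookupError("no remote registered for " + host)
    return remotes[host], repository


remotes = {"docker.io": StaticRemote({("library/redis", "latest"): b"{}"})}
remote, repository = resolve(remotes, "docker.io/library/redis")
print(remote.fetch(repository, "latest", hints=["mediatype:application/json"]))
```

Supporting another protocol in this model would mean registering another object with a `fetch` method under its own hostname.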

The actual `fetch` command currently provides anonymous access to Docker Hub
images, keyed by the `locator` namespace `docker.io`. With a `locator`,
`identifier` and `hint`, the correct protocol and endpoints are resolved and the
resource is printed to stdout. As an example, one can fetch the manifest for
`redis` with the following command:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json
```

Note that we have provided a mediatype "hint", nudging the fetch implementation
to grab the correct endpoint. We can hash the output of that to fetch the same
content by digest:

```
$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256 | awk '{print $1}')
```

The hint is now elided on the outer command, since we have pinned the content
to a particular hash. The `awk` step keeps only the hex digest, because
`shasum` also prints a file name column (`-` for stdin). The above effectively
fetches by tag, then by hash, to demonstrate the equivalence when interacting
with a remote.
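
The same equivalence can be shown without a network. In this sketch a made-up manifest is looked up once by tag and once by the digest of the bytes that came back; the two dictionaries are hypothetical stand-ins for what a remote serves, and `digest` is a helper written for this example.

```python
import hashlib


def digest(content):
    """Return the sha256 digest of bytes in the `sha256:<hex>` form."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


manifest = b'{"schemaVersion": 2}'          # made-up stand-in for a manifest
by_tag = {"latest": manifest}               # tag -> content
by_digest = {digest(manifest): manifest}    # digest -> content

fetched = by_tag["latest"]                  # first fetch: by tag
again = by_digest[digest(fetched)]          # second fetch: by digest
print(again == fetched)
```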

This is just the beginning. We should be able to centralize configuration
around fetch to implement a number of distribution methodologies that have been
challenging or impossible up to this point.

Keep reading to see how this is used with the other commands to fetch complete
images.

### Fetching all the layers of an image

If you are not yet entertained, let's bring `jq` and `xargs` into the mix for
maximum fun. Our first task will be to collect the layers into a local content
store with the `ingest` command.

The following incantation fetches the manifest and downloads each layer:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest and pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the transactions
are keyed by their digest, concurrent downloads and downloads of repeated
content are ignored. Each process is then executed in parallel using xargs. If
you run the above command twice, it will not download the layers because those
blobs are already present in the content store.
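
The "verify, then accept, and skip what is already there" behaviour can be sketched with a toy in-memory store. `ContentStore` and its `ingest` signature are hypothetical and only mirror the `--expected-digest` and `--expected-size` flags used above; the sketch covers repeated content but does not model concurrent transactions.

```python
import hashlib


class ContentStore:
    """Toy in-memory content store; blobs are keyed by their digest."""

    def __init__(self):
        self.blobs = {}

    def ingest(self, data, expected_digest, expected_size):
        """Verify data, then store it. Returns False if already present."""
        if expected_digest in self.blobs:
            return False            # repeated content: nothing to do
        if len(data) != expected_size:
            raise ValueError("size mismatch")
        actual = "sha256:" + hashlib.sha256(data).hexdigest()
        if actual != expected_digest:
            raise ValueError("digest mismatch")
        self.blobs[expected_digest] = data
        return True


layer = b"layer bytes"              # made-up layer content
dgst = "sha256:" + hashlib.sha256(layer).hexdigest()
store = ContentStore()
print(store.ingest(layer, dgst, len(layer)))    # True: verified and stored
print(store.ingest(layer, dgst, len(layer)))    # False: already present
```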

What about status? Let's first remove our content so we can monitor a download.
`dist list` can be combined with xargs and `dist delete` to remove that
content:

```
$ ./dist list -q | xargs ./dist delete
```

In a separate shell session, you could monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current working
directory. To watch the contents of this directory, you can use the following:

```
$ watch -n0.2 tree .content
```

Now, run the fetch pipeline from above. You'll see the active downloads, keyed
by locator and object, as well as the ingest transactions and the resulting
blobs becoming available in the content store. This will help you understand
what is going on internally.

### Getting to a rootfs

While we haven't yet integrated full snapshot support for layer application, we
can use the `dist apply` command to start building out a rootfs for inspection
and testing. We'll build up a similar pipeline to unpack the layers and get an
actual image rootfs.

To get access to the layers, you can use the path command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/docker/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

This returns a direct path to the blob to facilitate fast access. We can
incorporate this into the `apply` command to get to a rootfs for `redis`:

```
$ mkdir redis-rootfs
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}"
```

The above fetches the manifest, then passes each layer into the `dist apply`
command, resulting in the full redis container root filesystem. We do not do
this in parallel, since each layer must be applied sequentially. Also, note
that we have to run `apply` with `sudo`, since the layers typically have
resources with root ownership.
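
For readers less familiar with `jq`, the sketch below builds the same per-layer command lines that the filter above emits, keeping them in manifest order. `apply_commands` is a hypothetical helper and the two-layer manifest is made up; only the command template is taken from the pipeline above.

```python
import json


def apply_commands(manifest_json, rootfs):
    """Build one `dist apply` command line per layer, in manifest order."""
    manifest = json.loads(manifest_json)
    return [
        "sudo ./dist apply {} < $(./dist path -q {})".format(rootfs, layer["digest"])
        for layer in manifest["layers"]
    ]


# Made-up two-layer manifest; real digests are 64 hex characters long.
sample = json.dumps({"layers": [{"digest": "sha256:aaaa"},
                                {"digest": "sha256:bbbb"}]})
for command in apply_commands(sample, "./redis-rootfs"):
    print(command)
```

Because the list preserves the order of `.layers`, running the commands one after another applies the layers in the order the manifest lists them.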

Alternatively, you can just read the manifest from the content store, rather
than fetching it. We use fetch above to avoid having to look up the manifest
digest for our demo.

Note that this is mostly a POC. This tool has a long way to go. Things like
failed downloads and abandoned download cleanup aren't quite handled. We'll
probably make adjustments around how content store transactions are handled to
address this. We still need to incorporate snapshotting, as well as the ability
to calculate the `ChainID` under subsequent unpacking. Once we have some tools
to play around with snapshotting, we'll be able to incorporate our
`rootfs.ApplyLayer` algorithm that will get us a lot closer to a
production-worthy system.
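
As a rough idea of what calculating a `ChainID` involves, the sketch below assumes the definition from the OCI image specification: the first layer's ChainID is its DiffID, and each later ChainID is the digest of the previous ChainID, a space, and the next DiffID. Whether the tool ends up following exactly this definition is an assumption; `chain_ids` is a hypothetical helper and the DiffIDs are made up.

```python
import hashlib


def chain_ids(diff_ids):
    """Return the ChainID after each layer, given DiffIDs bottom-up."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)   # ChainID(L0) = DiffID(L0)
        else:
            joined = (chain[-1] + " " + diff_id).encode("ascii")
            chain.append("sha256:" + hashlib.sha256(joined).hexdigest())
    return chain


# Made-up DiffIDs, shortened for readability.
ids = chain_ids(["sha256:aaaa", "sha256:bbbb", "sha256:cccc"])
print(ids[0])       # sha256:aaaa
print(len(ids))     # 3
```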

From here, we'll build out full image pull and create tooling to get runtime
bundles from the fetched content.