diff --git a/reports/2017-01-27.md b/reports/2017-01-27.md new file mode 100644 index 0000000..c9438c3 --- /dev/null +++ b/reports/2017-01-27.md @@ -0,0 +1,256 @@ +# Development Report for Jan 27, 2017 + +This week we made a lot of progress on tools to work with local content storage +and image distribution. These parts are critical in forming an end to end proof +of concept, taking docker/oci images and turning them into bundles. + +We also have defined a new GRPC protocol for interacting with the +container-shim, which is used for robust container management. + +## Maintainers + +* https://github.com/docker/containerd/pull/473 + +Derek McGowan will be joining the containerd team as a maintainer. His +extensive experience in graphdrivers and distribution will be invaluable to the +containerd project. + +## Shim over GRPC + +* https://github.com/docker/containerd/pull/462 + +``` +NAME: + containerd-shim - + __ _ __ __ _ + _________ ____ / /_____ _(_)___ ___ _________/ / _____/ /_ (_)___ ___ + / ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __ /_____/ ___/ __ \/ / __ `__ \ +/ /__/ /_/ / / / / /_/ /_/ / / / / / __/ / / /_/ /_____(__ ) / / / / / / / / / +\___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/ \__,_/ /____/_/ /_/_/_/ /_/ /_/ + +shim for container lifecycle and reconnection + + +USAGE: + containerd-shim [global options] command [command options] [arguments...] + +VERSION: + 1.0.0 + +COMMANDS: + help, h Shows a list of commands or help for one command + +GLOBAL OPTIONS: + --debug enable debug output in logs + --help, -h show help + --version, -v print the version + +``` + +This week we completed work on porting the shim over to GRPC. This allows us +to have a more robust way to interface with the shim. It also allows us to +have one shim per container where previously we had one shim per process. This +drastically reduces the memory usage for exec processes. + +We also had a lot of code in the containerd core for syncing with the shims +during execution. This was because we needed ways to signal if the shim was +running, the container was created or any errors on create and then starting +the container's process. Getting this right and syncing was hard and required +a lot of code. With the new flow it is just function calls via rpc. + +```proto +service Shim { + rpc Create(CreateRequest) returns (CreateResponse); + rpc Start(StartRequest) returns (google.protobuf.Empty); + rpc Delete(DeleteRequest) returns (DeleteResponse); + rpc Exec(ExecRequest) returns (ExecResponse); + rpc Pty(PtyRequest) returns (google.protobuf.Empty); + rpc Events(EventsRequest) returns (stream Event); + rpc State(StateRequest) returns (StateResponse); +} +``` + +With the GRPC service it allows us to decouple the shim's lifecycle from the +containers, in the way that we get synchronous feedback if the container failed +to create, start, or exec from shim errors. + +The overhead for adding GRPC to the shim is actually less than the initial +implementation. We already had a few pipes that allowed you to control +resizing of the pty master and exit events, now all replaced by one unix +socket. Unix sockets are cheap and fast and we reduce our open fd count with +way by not relying on multiple fifos. + +We also added a subcommand to the `ctr` command for testing and interfacing +with the shim. You can interact with a shim directly via `ctr shim` and get +events, start containers, start exec processes. + +## Distribution Tool + +* https://github.com/docker/containerd/pull/452 +* https://github.com/docker/containerd/pull/472 +* https://github.com/docker/containerd/pull/474 + +Last week, @stevvooe committed the first parts of the distribution tool. The main +component provided there was the `dist fetch` command. This has been followed +up by several other low-level commands that interact with content resolution +and local storage that can be used together to work with parts of images. + +With this change, we add the following commands to the dist tool: + +- `ingest`: verify and accept content into storage +- `active`: display active ingest processes +- `list`: list content in storage +- `path`: provide a path to a blob by digest +- `delete`: remove a piece of content from storage +- `apply`: apply a layer to a directory + +When this is more solidified, we can roll these up into higher-level +operations that can be orchestrated through the `dist` tool or via GRPC. + +As part of the _Development Report_, we thought it was a good idea to show +these tools in depth. Specifically, we can show going from an image locator to +a root filesystem with the current suite of commands. + +### Fetching Image Resources + +The first component added to the `dist` tool is the `fetch` command. It is a +low-level command for fetching image resources, such as manifests and layers. +It operates around the concept of `remotes`. Objects are fetched by providing a +`locator` and an object identifier. The `locator`, roughly analogous to an +image name or repository, is a schema-less URL. The following is an example of +a `locator`: + +``` +docker.io/library/redis +``` + +When we say the `locator` is a "schema-less URL", we mean that it starts with +hostname and has a path, representing some image repository. While the hostname +may represent an actual location, we can pass it through arbitrary resolution +systems to get the actual location. In that sense, it acts like a namespace. + +In practice, the `locator` can be used to resolve a `remote`. Object +identifiers are then passed to this remote, along with hints, which are then +mapped to the specific protocol and retrieved. By dispatching on this common +identifier, we should be able to support almost any protocol and discovery +mechanism imaginable. + +The actual `fetch` command currently provides anonymous access to Docker Hub +images, keyed by the `locator` namespace `docker.io`. With a `locator`, +`identifier` and `hint`, the correct protocol and endpoints are resolved and the +resource is printed to stdout. As an example, one can fetch the manifest for +`redis` with the following command: + +``` +$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json +``` + +Note that we have provided a mediatype "hint", nudging the fetch implementation +to grab the correct endpoint. We can hash the output of that to fetch the same +content by digest: + +``` +$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256) +``` + +The hint now elided on the outer command, since we have affixed the content to +a particular hash. The above shows us effectively fetches by tag, then by hash +to demonstrate the equivalence when interacting with a remote. + +This is just the beginning. We should be able to centralize configuration +around fetch to implement a number of distribution methodologies that have been +challenging or impossible up to this point. + +Keep reading to see how this is used with the other commands to fetch complete +images. + +### Fetching all the layers of an image + +If you are not yet entertained, let's bring `jq` and `xargs` into the mix for +maximum fun. Our first task will be to collect the layers into a local content +store with the `ingest` command. + +The following incantation fetches the manifest and downloads each layer: + + ``` +$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \ + jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}" +``` + +The above fetches a manifest, pipes it to jq, which assembles a shell pipeline +to ingest each layer into the content store. Because the transactions are keyed +by their digest, concurrent downloads and downloads of repeated content are +ignored. Each process is then executed parallel using xargs. If you run the +above command twice, it will not download the layers because those blobs are +already present in the content store. + +What about status? Let's first remove our content so we can monitor a download. +`dist list` can be combined with xargs and `dist delete` to remove that +content: + +``` +$ ./dist list -q | xargs ./dist delete +``` + +In a separate shell session, could monitor the active downloads with the following: + +``` +$ watch -n0.2 ./dist active +``` + +For now, the content is downloaded into `.content` in the current working +directory. To watch the contents of this directory, you can use the following: + +``` +$ watch -n0.2 tree .content +``` + +Now, run the fetch pipeline from above. You'll see the active downloads, keyed +by locator and object, as well as the ingest transactions resulting blobs +becoming available in the content store. This will help to understand what is +going on internally. + +### Getting to a rootfs + +While we haven't yet integrated full snapshot support for layer application, we +can use the `dist apply` command to start building out rootfs for inspection +and testing. We'll build up a similar pipeline to unpack the layers and get an +actual image rootfs. + +To get access to the layers, you can use the path command: + +``` +$./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa +sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/docker/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa +``` + +This returns the a direct path to the blob to facilitate fast access. We can +incorporate this into the `apply` command to get to a rootfs for `redis`: + +``` +$ mkdir redis-rootfs +$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \ + jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}" +``` + +The above fetches the manifest, then passes each layer into the `dist apply` +command, resulting in the full redis container root filesystem. We do not do +this in parallel, since each layer must be applied sequentially. Also, note +that we have to run `apply` with `sudo`, since the layers typically have +resources with root ownership. + +Alternatively, you can just read the manifest from the content store, rather +than fetching it. We use fetch above to avoid having to lookup the manifest +digest for our demo. + +Note that this is mostly a POC. This tool has a long way to go. Things like +failed downloads and abandoned download cleanup aren't quite handled. We'll +probably make adjustments around how content store transactions are handled to +address this. We still need to incorporate snapshotting, as well as the ability +to calculate the `ChainID` under subsequent unpacking. Once we have some tools +to play around with snapshotting, we'll be able to incorporate our +`rootfs.ApplyLayer` algorithm that will get us a lot closer to a production +worthy system. + +From here, we'll build out full image pull and create tooling to get runtime +bundles from the fetched content.