Development Report for Feb 24, 2017

containerd Summit

Yesterday, Feb. 23, we had the containerd summit for contributors and maintainers. We started off by getting everyone up to speed on the project, roadmap, and goals before diving down into specific issues and design of containerd. We should have the videos up soon for the various sessions and Q&A.

Runtime Breakout Session

Shim

We discussed the use of the containerd shim, the costs that it adds, as well as its benefits. Overall, most of us agreed that the extra memory overhead is worth it for the feature set that it provides.

However, we did make a few changes to the shim in the master branch; a grpc API so that the shim is straightforward to interact with and much more robust in terms of error handling. We also have one shim per container vs the one shim per process, which was the current implementation.

There were a couple of additional changes we discussed. One being launching the shim inside the cgroup of the container so that any IO, CPU, and memory that the shim consumes is "charged" to the container. I'll expand a little more on why this matters in the next section when it comes to logging.

Logging

Logging is a big issue for containers and it’s not exactly straightforward. Since containers can be used in many different ways the support of logging and connecting to a containers interactively can add complexity to the runtime. Right now, the shim is responsible for handling the IO of a container and making sure that it is kept open in the event that containerd ( the daemon ) or clients above containerd crash and need to reconnect to containers that they launched.

You can use fifos for reconnecting to container logs but there is concerns about the buffer limit when nothing is reading from the fifo. If the fifo fills up, then the application in the container will block. In the end, most container systems log to files on disk but in various formats. The only part of the container's IO that needs to be a pipe, or fifo for reconnection is STDIN. STDOUT and STDERR can go directly to a file in most cases. However, logging the raw output for a container directly to a file is not ideal as both Docker and Kube have specific on disk formats.

So our initial idea that was discussed is to move some type of "log formatter" into the shim. This will allow a single container to have its logs written directly to disk from the shim without using fifos for OUT and ERR. The other problem that this would solve is when a single container has excessive logs. If a single daemon is writing all the logs to disk, you will see high CPU on the entire daemon. If we place the shim in the cgroup for the container, then only that container is affected when it starts to log excessively without affecting the performance of the entire system. The CPU and IO could be charged to the container.

We still have to work out how we will specify the log format in the shim. Go 1.8 plugins are not looking like a good solution for this. I’ll be opening an issue so that we can figure this out on the repo.

Multitenancy Metadata

A use case for containerd is that it’s a solid base for multiple container platforms. When you have a single containerd installed on a system but have Kube, Docker, and Swarm all using it, different operations such as listing of containers and access to content need to be scoped appropriately so that Docker does not return containers created by Kube and vice versa.

We were already looking to design a metadata store for clients and internal data of containerd and this seems like a good fit for this problem. In the breakout session we discussed having system details like content and containers stay in a flat namespace inside the containerd core but query and create operations by clients should be namespaced.

As simple example would be:

/docker/images/redis/3.2 /docker/containers/redis-master /kube/services/lb /kube/pods/redis/containers/logger /kube/pods/redis/containers/master

These are just simple examples and much of the format will be left up to the clients for storing metadata within the system. Clients should be able to query based on the namespace of keys as well as queries of / to see the entire system as a whole.

Plugins in Go 1.8

Looking at the latest Go 1.8 release plugins looked like a perfect solution for extending containerd by third parties. However, after discussions at the summit and with help from Tim to get clarification on the feature from the Go team it does not not look very promising in terms of implementation. We currently have the code for plugins via go 1.8 merged into master but we will have to rethink our extension method going forward.

https://github.com/containerd/containerd/issues/563

Storage and Distribution Breakout Session

Snapshots

During the image distribution breakout we first covered the current state of snapshots and where we expect to go next. The discussion started with how new snapshotters will be integrated using Go plugins or package importing rather than relying on grpc. Snapshots themselves are not currently exposed as a grpc service but there is some desire to have lower level access for debugging, content sideloading, or building. The exposed interface for snapshots beyond the existing pull/push/run workflows will be designed with this in mind. Additionally there was some discussion related to the possibility of having the mounter be pluggable and opening up snapshots to volumes use cases. The snapshot model was discussed in depth including how we will differentiate between terminal commits and online snapshots. Differentiating between the two could be important to optimize different use cases around building, importing, and backing up container changes.

Distribution

The next major topic of discussion was around distribution, primarily focused on the resolver and fetcher interfaces. We discussed what a locator will look for finding remote content as well as how they can be integrated with a git remote-like model. Stephen expressed his intent to keep the locator as opaque as possible, but to stay away from including protocol definitions in the locator value. Having a remotes table was proposed was proposed as a way to keep naming simple but allow defining more specific protocols. The current fetch interface was found to be unclear. It was proposed to remove the hints and integrate the media type checks and expectations into the resolver. Role of content store discussed for validating content and converging ingests of duplicate data.

Upcoming work

Next week we are still working towards and end to end PoC with the various subsystems of containerd working together behind the GRPC api. The API is still under development but the PoC should give people a good idea how the various subsystems will interact with one another.
We will also try to get issues up on the repository from discussions at the summit so we can continue the discussions during the summit with the broader community.
Rethinking an extension model. Extending containerd is important for many people. It makes sure that third parties do not have to commit code to the core to gain the functionality that they require. However, it’s not simple in Go and Go 1.8 plugins do not look like the solution. We need to work more on this.

7.3 KiB Raw Blame History Unescape Escape