# Development Report for Jan 27, 2017

This week we made a lot of progress on tools to work with local content storage
and image distribution. These parts are critical in forming an end-to-end proof
of concept, taking docker/oci images and turning them into bundles.

We have also defined a new GRPC protocol for interacting with the
container-shim, which is used for robust container management.

## Maintainers

* https://github.com/docker/containerd/pull/473

Derek McGowan will be joining the containerd team as a maintainer. His
extensive experience in graphdrivers and distribution will be invaluable to the
containerd project.

## Shim over GRPC

* https://github.com/docker/containerd/pull/462

```
NAME:
   containerd-shim -
                    __        _                     __           __    _
  _________  ____  / /_____ _(_)___  ___  _________/ /     _____/ /_  (_)___ ___
 / ___/ __ \/ __ \/ __/ __ `/ / __ \/ _ \/ ___/ __  /_____/ ___/ __ \/ / __ `__ \
/ /__/ /_/ / / / / /_/ /_/ / / / / /  __/ /  / /_/ /_____(__  ) / / / / / / / / /
\___/\____/_/ /_/\__/\__,_/_/_/ /_/\___/_/   \__,_/     /____/_/ /_/_/_/ /_/ /_/

shim for container lifecycle and reconnection

USAGE:
   containerd-shim [global options] command [command options] [arguments...]

VERSION:
   1.0.0

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        enable debug output in logs
   --help, -h     show help
   --version, -v  print the version
```

This week we completed work on porting the shim over to GRPC. This gives us a
more robust way to interface with the shim. It also allows us to have one shim
per container where previously we had one shim per process, which drastically
reduces the memory usage for exec processes.

We also had a lot of code in the containerd core for syncing with the shims
during execution. This was because we needed ways to signal whether the shim
was running, whether the container was created, and any errors on creating and
then starting the container's process. Getting this synchronization right was
hard and required a lot of code. With the new flow it is just function calls
via rpc.

```proto
service Shim {
	rpc Create(CreateRequest) returns (CreateResponse);
	rpc Start(StartRequest) returns (google.protobuf.Empty);
	rpc Delete(DeleteRequest) returns (DeleteResponse);
	rpc Exec(ExecRequest) returns (ExecResponse);
	rpc Pty(PtyRequest) returns (google.protobuf.Empty);
	rpc Events(EventsRequest) returns (stream Event);
	rpc State(StateRequest) returns (StateResponse);
}
```

With the GRPC service, we can decouple the shim's lifecycle from the
container's while still getting synchronous feedback if the container fails to
create, start, or exec because of shim errors.

The overhead of adding GRPC to the shim is actually less than the initial
implementation. We previously had a few pipes for controlling resizing of the
pty master and receiving exit events; these have all been replaced by one unix
socket. Unix sockets are cheap and fast, and we reduce our open fd count this
way by not relying on multiple fifos.

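To give a feel for the flow, here is a minimal Go sketch of a caller driving
the `Shim` service above over its unix socket. The socket path, import path,
and request fields are illustrative assumptions (the stubs would be generated
from the service definition), not the actual containerd code:

```go
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"

	// Hypothetical import path for the generated Shim service stubs.
	shimapi "github.com/docker/containerd/api/shim"
)

func main() {
	// Assumed socket path for illustration; each container's shim would
	// listen on its own unix socket.
	const socket = "/run/containerd/shim-redis.sock"

	conn, err := grpc.Dial(socket,
		grpc.WithInsecure(),
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := shimapi.NewShimClient(conn)
	ctx := context.Background()

	// Create and Start are now plain rpcs: any failure surfaces here as a
	// synchronous error instead of being relayed through extra pipes.
	if _, err := client.Create(ctx, &shimapi.CreateRequest{}); err != nil {
		log.Fatal(err)
	}
	if _, err := client.Start(ctx, &shimapi.StartRequest{}); err != nil {
		log.Fatal(err)
	}
}
```
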
We also added a subcommand to the `ctr` command for testing and interfacing
with the shim. You can interact with a shim directly via `ctr shim` to get
events, start containers, and start exec processes.

## Distribution Tool

* https://github.com/docker/containerd/pull/452
* https://github.com/docker/containerd/pull/472
* https://github.com/docker/containerd/pull/474

Last week, @stevvooe committed the first parts of the distribution tool. The
main component provided there was the `dist fetch` command. This has been
followed up by several other low-level commands that interact with content
resolution and local storage, and that can be used together to work with parts
of images.

With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage (see the sketch after this list)
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage
- `apply`: apply a layer to a directory

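As a rough illustration of what `ingest` does, here is a minimal Go sketch of
its verification step, assuming the `go-digest` package for hashing; the real
content store layers transactional, digest-keyed storage on top of this:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"

	"github.com/opencontainers/go-digest"
)

// verify hashes the content while counting bytes, then compares the result
// against the expected digest and size, failing the ingest on mismatch.
func verify(r io.Reader, expected digest.Digest, size int64) error {
	digester := digest.Canonical.Digester()
	n, err := io.Copy(digester.Hash(), r)
	if err != nil {
		return err
	}
	if n != size {
		return fmt.Errorf("unexpected size: %d != %d", n, size)
	}
	if actual := digester.Digest(); actual != expected {
		return fmt.Errorf("unexpected digest: %s != %s", actual, expected)
	}
	return nil
}

func main() {
	// Usage: verify <expected-digest> <expected-size> < blob
	size, err := strconv.ParseInt(os.Args[2], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := verify(os.Stdin, digest.Digest(os.Args[1]), size); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```
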
When this is more solidified, we can roll these up into higher-level
operations that can be orchestrated through the `dist` tool or via GRPC.

As part of the _Development Report_, we thought it was a good idea to show
these tools in depth. Specifically, we can show going from an image locator to
a root filesystem with the current suite of commands.

### Fetching Image Resources

The first component added to the `dist` tool is the `fetch` command. It is a
low-level command for fetching image resources, such as manifests and layers.
It operates around the concept of `remotes`. Objects are fetched by providing a
`locator` and an object identifier. The `locator`, roughly analogous to an
image name or repository, is a schema-less URL. The following is an example of
a `locator`:

```
docker.io/library/redis
```

When we say the `locator` is a "schema-less URL", we mean that it starts with
a hostname and has a path, representing some image repository. While the
hostname may represent an actual location, we can pass it through arbitrary
resolution systems to get the actual location. In that sense, it acts like a
namespace.

In practice, the `locator` can be used to resolve a `remote`. Object
identifiers are then passed to this remote, along with hints, which are then
mapped to the specific protocol and retrieved. By dispatching on this common
identifier, we should be able to support almost any protocol and discovery
mechanism imaginable.

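The following Go interfaces sketch this resolution flow; the names and
signatures are illustrative assumptions, not the actual containerd API:

```go
package remotes

import (
	"context"
	"io"
)

// Resolver maps a locator, such as "docker.io/library/redis", to a remote
// capable of fetching its objects. The locator acts as a namespace: the
// hostname may be rewritten by arbitrary resolution systems before any
// connection is made.
type Resolver interface {
	Resolve(ctx context.Context, locator string) (Remote, error)
}

// Remote fetches an object by identifier (a tag or digest). Hints, such as
// "mediatype:application/vnd...", nudge protocol-specific decisions like
// endpoint selection.
type Remote interface {
	Fetch(ctx context.Context, id string, hints ...string) (io.ReadCloser, error)
}
```
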
The actual `fetch` command currently provides anonymous access to Docker Hub
images, keyed by the `locator` namespace `docker.io`. With a `locator`,
`identifier` and `hint`, the correct protocol and endpoints are resolved and the
resource is printed to stdout. As an example, one can fetch the manifest for
`redis` with the following command:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json
```

Note that we have provided a mediatype "hint", nudging the fetch implementation
to grab the correct endpoint. We can hash the output of that to fetch the same
content by digest:

```
$ ./dist fetch docker.io/library/redis sha256:$(./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | shasum -a256 | cut -d' ' -f1)
```

The hint is now elided on the outer command, since we have affixed the content
to a particular hash. The above effectively fetches by tag, then by hash,
demonstrating the equivalence when interacting with a remote.

This is just the beginning. We should be able to centralize configuration
around fetch to implement a number of distribution methodologies that have been
challenging or impossible up to this point.

Keep reading to see how this is used with the other commands to fetch complete
images.

### Fetching all the layers of an image

If you are not yet entertained, let's bring `jq` and `xargs` into the mix for
maximum fun. Our first task will be to collect the layers into a local content
store with the `ingest` command.

The following incantation fetches the manifest and downloads each layer:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
	jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest and pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the transactions
are keyed by their digest, concurrent downloads and downloads of repeated
content are ignored. Each process is then executed in parallel using xargs. If
you run the above command twice, it will not download the layers because those
blobs are already present in the content store.

What about status? Let's first remove our content so we can monitor a download.
`dist list` can be combined with xargs and `dist delete` to remove that
content:

```
$ ./dist list -q | xargs ./dist delete
```

In a separate shell session, you can monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current working
directory. To watch the contents of this directory, you can use the following:

```
$ watch -n0.2 tree .content
```

Now, run the fetch pipeline from above. You'll see the active downloads, keyed
by locator and object, as well as the ingest transactions and the resulting
blobs becoming available in the content store. This will help you understand
what is going on internally.

### Getting to a rootfs

While we haven't yet integrated full snapshot support for layer application, we
can use the `dist apply` command to start building out a rootfs for inspection
and testing. We'll build up a similar pipeline to unpack the layers and get an
actual image rootfs.

To get access to the layers, you can use the `path` command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/docker/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

This returns a direct path to the blob to facilitate fast access. We can
incorporate this into the `apply` command to get to a rootfs for `redis`:

```
$ mkdir redis-rootfs
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
	jq -r '.layers[] | "sudo ./dist apply ./redis-rootfs < $(./dist path -q "+.digest+")"' | xargs -I{} -n1 sh -c "{}"
```

The above fetches the manifest, then passes each layer into the `dist apply`
command, resulting in the full redis container root filesystem. We do not do
this in parallel, since each layer must be applied sequentially. Also, note
that we have to run `apply` with `sudo`, since the layers typically have
resources with root ownership.

Alternatively, you can just read the manifest from the content store, rather
than fetching it. We use fetch above to avoid having to look up the manifest
digest for our demo.

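Since the content store lays blobs out as `.content/blobs/<algorithm>/<hex>`,
as shown in the `dist path` output above, reading a manifest locally amounts to
resolving that path and reading the file. A minimal Go sketch under that layout
assumption (this mirrors, but is not, what `dist path` does):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// blobPath maps a digest of the form "<algorithm>:<hex>" onto the
// .content/blobs/<algorithm>/<hex> layout of the local content store.
func blobPath(root, dgst string) (string, error) {
	alg, hex, ok := strings.Cut(dgst, ":")
	if !ok {
		return "", fmt.Errorf("invalid digest: %s", dgst)
	}
	return filepath.Join(root, "blobs", alg, hex), nil
}

func main() {
	// Usage: readblob <digest>; prints the blob (e.g. a manifest) to stdout.
	path, err := blobPath(".content", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	blob, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Stdout.Write(blob)
}
```
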
Note that this is mostly a POC. This tool has a long way to go. Things like
failed downloads and abandoned download cleanup aren't handled yet. We'll
probably make adjustments around how content store transactions are handled to
address this. We still need to incorporate snapshotting, as well as the ability
to calculate the `ChainID` during subsequent unpacking. Once we have some tools
to play around with snapshotting, we'll be able to incorporate our
`rootfs.ApplyLayer` algorithm, which will get us a lot closer to a
production-worthy system.

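For reference, the OCI image spec defines `ChainID` recursively over the layer
`DiffID`s: the first layer's `ChainID` is its `DiffID`, and each subsequent one
is the digest of the previous `ChainID` joined to the next `DiffID` by a space.
A minimal Go sketch using `go-digest`:

```go
package main

import (
	"fmt"

	"github.com/opencontainers/go-digest"
)

// chainID computes the ChainID for an ordered list of layer DiffIDs:
//
//	ChainID(L1)      = DiffID(L1)
//	ChainID(L1...Ln) = Digest(ChainID(L1...Ln-1) + " " + DiffID(Ln))
func chainID(diffIDs []digest.Digest) digest.Digest {
	if len(diffIDs) == 0 {
		return ""
	}
	id := diffIDs[0]
	for _, diffID := range diffIDs[1:] {
		id = digest.FromString(id.String() + " " + diffID.String())
	}
	return id
}

func main() {
	// Example with two DiffIDs; the second is made up for illustration.
	fmt.Println(chainID([]digest.Digest{
		"sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa",
		"sha256:1b2c3d0000000000000000000000000000000000000000000000000000000000",
	}))
}
```
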
From here, we'll build out full image pull and create tooling to get runtime
bundles from the fetched content.