design: add data-flow design document
Signed-off-by: Stephen J Day <stephen.day@docker.com>
parent 95c13af2f9
commit a812199b07
1 changed files with 105 additions and 0 deletions

design/data-flow.md (105 additions, Normal file)

@@ -0,0 +1,105 @@
# Data Flow

In the past, container systems have hidden the details and complexity of
pulling container images. This document intends to shed light on that
complexity and detail how a "pull" operation will look from the perspective of
a containerd user. We use the _bundle_ as the target object in this workflow,
and walk back from there to describe the full process. In this context, we
describe both pulling an image and creating a bundle from that image.

With containerd, we redefine the "pull" to comprise the same set of steps
encompassed in prior container engines. In this model, an image defines a
collection of resources that can be used to create a _bundle_. There is no
specific format or object called an image. The goal of the pull is to resolve
the resources that comprise an image through a set of separate steps, with the
separation providing lifecycle points in the process.

A reference implementation of the complete "pull", performed client-side, will
be provided as part of containerd, but there may not be a single "pull" API
call.

A rough diagram of the data flow, along with the relevant components, is below.

While the process proceeds left to right in the diagram, this document is
written right to left. By working through this process backwards, we can best
understand the approach employed by containerd.

## Running a Container

For containerd, we'd generally like to retrieve a _bundle_. This is the
runtime, on-disk container layout, which includes the filesystem and
configuration required to run the container.

Generically speaking, we can say we have the following directory:

```
config.json
rootfs/
```

The contents of `config.json` aren't interesting in this context, but for
clarity, it may be the runc config or a containerd-specific configuration file
for setting up a running container. The `rootfs` is a directory where
containerd will set up the runtime container's filesystem.

While containerd doesn't have the concept of an image, we can effectively build
this structure from an image, as projected into containerd. Given this, we can
say that our requirements for running a container are the following:

1. Convert the configuration from the container image into the target format
   for the containerd runtime.
2. Reproduce the root filesystem from the container image. While we could
   unpack this into `rootfs` in the bundle, we can also just pass this as a set
   of mounts to the container configuration.

The above defines the framework in which we will operate. Put differently, we
create a bundle by creating these two components: the configuration and the
root filesystem.

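As a rough illustration of the two requirements above, the following Python sketch builds an in-memory stand-in for a bundle. The function names `convert_config` and `make_bundle`, the dictionary layout of the bundle, and the example mount are hypothetical. The field names (`Entrypoint`, `Cmd`, `Env`, `WorkingDir` on the image side; `process.args`, `process.env`, `process.cwd`, `root.path` on the runtime side) are assumed to follow the OCI image and runtime specifications, which this document does not itself prescribe.

```python
from typing import Dict, List


def convert_config(image_config: Dict) -> Dict:
    """Requirement 1: project the image configuration into a runtime config."""
    cfg = image_config.get("config", {})
    # Entrypoint and Cmd may be absent or null, so fall back to empty lists.
    args = (cfg.get("Entrypoint") or []) + (cfg.get("Cmd") or [])
    return {
        "process": {
            "args": args,
            "env": cfg.get("Env") or [],
            "cwd": cfg.get("WorkingDir") or "/",
        },
        "root": {"path": "rootfs"},
    }


def make_bundle(image_config: Dict, rootfs_mounts: List[Dict]) -> Dict:
    """Requirement 2: carry the root filesystem as a set of mounts rather
    than unpacking it into rootfs/ (hypothetical in-memory layout)."""
    return {"config.json": convert_config(image_config), "mounts": rootfs_mounts}


image_config = {
    "config": {"Entrypoint": ["/bin/sh"], "Cmd": ["-c", "echo hi"], "Env": ["PATH=/bin"]}
}
mounts = [{"type": "overlay", "source": "overlay", "options": ["lowerdir=/snap/1"]}]
bundle = make_bundle(image_config, mounts)
print(bundle["config.json"]["process"]["args"])
```

A real conversion would cover many more fields (user, capabilities, namespaces), so this only shows the shape of the two steps.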
## Creating a Bundle

Now that we've defined what is required to run a container, a _bundle_, we need
to create one.

Let's say we have the following:

```
ctr run ubuntu
```

This command does not pull any images. It only takes the name and creates a
_bundle_. Broken down into steps, the process looks as follows:

1. Look up the digest of the image in the metadata store.
2. Resolve the manifest in the content store.
3. Resolve the layer snapshots in the snapshot subsystem.
4. Transform the config into the target bundle format.
5. Create a runtime snapshot for the rootfs of the container, including resolution of mounts.
6. Run the container.

From this, we can see which resources a _pull_ must make available:

1. An entry in the metadata store: a name pointing at a particular digest.
2. The manifest must be available in the content store.
3. The result of successively applied layers must be available as a snapshot.

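Steps 1 to 3, and the resources they depend on, can be sketched as plain lookups. This is a minimal sketch, not containerd's API: the three stores are modeled as Python dictionaries, and `resolve_bundle_inputs` and `can_run` are hypothetical names. It also assumes an OCI-style manifest that references a config blob, and that the snapshot is keyed by the chain ID derived from the config's `rootfs.diff_ids`.

```python
import hashlib
import json


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def resolve_bundle_inputs(name, metadata, content, snapshots):
    """Steps 1-3; raises KeyError as soon as a required resource is missing."""
    manifest_digest = metadata[name]                  # 1. name -> digest
    manifest = json.loads(content[manifest_digest])   # 2. manifest from content store
    config = json.loads(content[manifest["config"]["digest"]])
    # 3. Find the snapshot holding the result of all layers applied in order.
    chain_id = None
    for diff_id in config["rootfs"]["diff_ids"]:
        if chain_id is None:
            chain_id = diff_id
        else:
            chain_id = digest_of(f"{chain_id} {diff_id}".encode())
    return config, snapshots[chain_id]


def can_run(name, metadata, content, snapshots) -> bool:
    """True when every resource a prior pull should have produced is present."""
    try:
        resolve_bundle_inputs(name, metadata, content, snapshots)
        return True
    except KeyError:
        return False


# Populate the stores the way a completed pull would have left them.
config_blob = json.dumps(
    {"config": {"Cmd": ["/bin/sh"]}, "rootfs": {"diff_ids": ["sha256:aaa", "sha256:bbb"]}}
).encode()
manifest_blob = json.dumps({"config": {"digest": digest_of(config_blob)}, "layers": []}).encode()
content = {digest_of(b): b for b in (config_blob, manifest_blob)}
metadata = {"ubuntu": digest_of(manifest_blob)}
top = digest_of(b"sha256:aaa sha256:bbb")
snapshots = {top: [{"type": "bind", "source": "/snapshots/2", "options": ["ro"]}]}

config, mounts = resolve_bundle_inputs("ubuntu", metadata, content, snapshots)
print(can_run("ubuntu", metadata, content, snapshots))
print(can_run("alpine", metadata, content, snapshots))
```

Each `KeyError` corresponds to one of the three required resources, which is why running fails cleanly when no pull has happened.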
## Unpacking Layers

While this process may be pull- or run-driven, the idea is quite simple. For
each layer, apply the layer to a snapshot of the previous layer's result. The
result should be stored under the chain ID (as defined by OCI) of the resulting
application.

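The loop described above can be sketched as follows. The chain ID recurrence is the one given in the OCI image specification: the chain ID of the first layer is its diff ID, and each later chain ID is the digest of the previous chain ID, a space, and the next diff ID. Everything else is an assumption made for illustration: `unpack` is a hypothetical name, SHA-256 is assumed as the digest algorithm, and a layer is modeled as a dictionary of paths rather than a tar changeset.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def chain_ids(diff_ids: List[str]) -> List[str]:
    """ChainID(L0) = DiffID(L0)
    ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))"""
    out: List[str] = []
    for diff_id in diff_ids:
        if not out:
            out.append(diff_id)
        else:
            joined = f"{out[-1]} {diff_id}".encode()
            out.append("sha256:" + hashlib.sha256(joined).hexdigest())
    return out


def unpack(layers: List[Tuple[str, Dict[str, str]]], snapshots: Dict) -> Optional[str]:
    """Apply each layer on top of the previous snapshot and store the result
    under its chain ID. Returns the chain ID of the topmost snapshot."""
    parent_key = None
    keys = chain_ids([diff_id for diff_id, _ in layers])
    for (_, files), key in zip(layers, keys):
        if key not in snapshots:  # an already unpacked prefix is reused as-is
            base = dict(snapshots[parent_key]) if parent_key is not None else {}
            base.update(files)    # stand-in for applying a layer changeset
            snapshots[key] = base
        parent_key = key
    return parent_key


layers = [
    ("sha256:aaa", {"/etc/os-release": "ubuntu"}),
    ("sha256:bbb", {"/bin/app": "v1"}),
]
snapshots: Dict = {}
top = unpack(layers, snapshots)
print(sorted(snapshots[top]))
```

Because the key depends on the whole chain of layers beneath it, two images that share a prefix of layers share the corresponding snapshots.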
## Pulling an Image

With all the above defined, pulling an image simply becomes the following:

1. Fetch the manifest for the image, then verify and store it.
2. Fetch each layer listed in the image manifest, then verify and store it.
3. Store the manifest digest under the provided name.

Note that we leave out how the name is used to resolve a particular location.
We'll leave that for another doc!
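
The three steps above can be sketched as a short client-side function. This is only an illustration under stated assumptions: `pull` and `fetch_verified` are hypothetical names, the remote is any callable that returns bytes for a digest, the stores are plain dictionaries, and verification is assumed to mean checking a SHA-256 content digest. The manifest digest is passed in directly because resolving a name to a location is out of scope here.

```python
import hashlib
import json


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def fetch_verified(fetch, digest: str) -> bytes:
    """Fetch a blob and check that its content matches the requested digest."""
    data = fetch(digest)
    if digest_of(data) != digest:
        raise ValueError(f"digest mismatch for {digest}")
    return data


def pull(name, manifest_digest, fetch, content, metadata):
    # 1. Fetch the manifest, then verify and store it.
    manifest_bytes = fetch_verified(fetch, manifest_digest)
    content[manifest_digest] = manifest_bytes
    # 2. Fetch each layer listed in the manifest, then verify and store it.
    for layer in json.loads(manifest_bytes)["layers"]:
        content[layer["digest"]] = fetch_verified(fetch, layer["digest"])
    # 3. Only after everything is stored, record the name -> digest entry.
    metadata[name] = manifest_digest


# A dictionary stands in for the remote registry.
layer = b"layer-bytes"
manifest_bytes = json.dumps({"layers": [{"digest": digest_of(layer)}]}).encode()
remote = {digest_of(layer): layer, digest_of(manifest_bytes): manifest_bytes}

content, metadata = {}, {}
pull("ubuntu", digest_of(manifest_bytes), remote.__getitem__, content, metadata)
print(metadata["ubuntu"] == digest_of(manifest_bytes))
```

Writing the name last means a failed or interrupted pull never leaves a name pointing at incomplete content. Unpacking the fetched layers into snapshots is a separate step and is not shown here.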