Because they need to prepare the hypervisor networking interfaces
and have them match the ones created in the pod networking
namespace (typically to bridge TAP and veth interfaces), hypervisor
based container runtimes need the sandbox pod networking namespace
to be set up before the pod is created. They can then prepare and start
the hypervisor interfaces when creating the pod virtual machine.
To do so, we need to create per-pod persistent networking
namespaces that we pass to the CNI plugin. This patch leverages
the CNI ns package to create such namespaces under /var/run/netns,
and assign them to all pod containers.
The persistent namespace is removed when the pod is either stopped
or removed.
Since the StopPodSandbox() API can be called multiple times from
kubelet, we track the pod networking namespace state (closed or
not) so that we don't get a containernetworking/ns package error
when calling its Close() routine multiple times as well.
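
The state tracking described above can be sketched roughly as follows.
This is a minimal illustration, not the code from this patch: the netNS
interface, the sandboxNetNS wrapper and the fakeNS test double are all
hypothetical names, and fakeNS only mimics a handle that returns an
error when it is closed twice.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// netNS is a hypothetical stand-in for a network namespace handle;
// only the methods this sketch needs are listed.
type netNS interface {
	Path() string
	Close() error
}

// sandboxNetNS wraps a namespace handle and remembers whether it has
// already been closed.
type sandboxNetNS struct {
	mu     sync.Mutex
	ns     netNS
	closed bool
}

// Close releases the namespace once. Later calls return nil without
// touching the underlying handle, so repeated stop requests are harmless.
func (s *sandboxNetNS) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.closed || s.ns == nil {
		return nil
	}
	if err := s.ns.Close(); err != nil {
		return err
	}
	s.closed = true
	return nil
}

// fakeNS mimics a handle that rejects a second Close() call.
type fakeNS struct {
	path   string
	closes int
}

func (f *fakeNS) Path() string { return f.path }

func (f *fakeNS) Close() error {
	f.closes++
	if f.closes > 1 {
		return errors.New("namespace already closed")
	}
	return nil
}

func main() {
	sb := &sandboxNetNS{ns: &fakeNS{path: "/var/run/netns/example"}}

	// Simulate the stop path being invoked twice for the same pod.
	fmt.Println("first close error:", sb.Close())
	fmt.Println("second close error:", sb.Close())
}
```

Keeping the flag next to the handle means every caller of the stop path
gets the same idempotent behaviour without checking the state itself.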
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
VM based container runtimes (e.g. Clear Containers) will run each pod
in a VM and will create containers within that pod VM. Unfortunately
those runtimes will get called by ocid with the same commands
(create and start) for both the pause containers and subsequent
containers to be added to the pod namespace. Unless they work around
that by e.g. inferring that a container whose rootfs is under
"/pause" would represent a pod, they have no way to decide if they
need to create/start a VM or if they need to add a container to an
already running VM pod.
This patch tries to formalize this difference through pod
annotations. When starting a container or a sandbox, we now add two
annotations for the container type (Infrastructure or not) and the
sandbox name. This will allow VM based container runtimes to handle
two things:
- Decide if they need to create a pod VM or not.
- Keep track of which pod ID runs in a given VM, so that they
know to which sandbox they have to add containers.
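
On the runtime side, the two decisions above could look roughly like the
Go sketch below. It is only an illustration under stated assumptions:
the annotation keys and values (example.io/container-type,
example.io/sandbox-name, "sandbox", "container") are hypothetical
placeholders, not the names added by this patch, and the vmRuntime type
is invented for the example.

```go
package main

import "fmt"

// Hypothetical annotation keys and values, used for illustration only.
const (
	annotationContainerType = "example.io/container-type"
	annotationSandboxName   = "example.io/sandbox-name"

	containerTypeSandbox   = "sandbox"   // infrastructure (pause) container
	containerTypeContainer = "container" // regular pod container
)

// vmRuntime tracks which containers run in which pod VM, keyed by
// sandbox name.
type vmRuntime struct {
	podVMs map[string][]string
}

func newVMRuntime() *vmRuntime {
	return &vmRuntime{podVMs: make(map[string][]string)}
}

// create decides, from the annotations alone, whether to create a new
// pod VM or to add the container to an already known one.
func (r *vmRuntime) create(containerID string, annotations map[string]string) error {
	sandbox := annotations[annotationSandboxName]
	if sandbox == "" {
		return fmt.Errorf("container %s: missing %s annotation", containerID, annotationSandboxName)
	}

	switch annotations[annotationContainerType] {
	case containerTypeSandbox:
		if _, exists := r.podVMs[sandbox]; exists {
			return fmt.Errorf("pod VM for sandbox %s already exists", sandbox)
		}
		// A real runtime would boot the pod VM here.
		r.podVMs[sandbox] = []string{}
		return nil
	case containerTypeContainer:
		if _, exists := r.podVMs[sandbox]; !exists {
			return fmt.Errorf("no pod VM for sandbox %s", sandbox)
		}
		// A real runtime would start the container inside the VM here.
		r.podVMs[sandbox] = append(r.podVMs[sandbox], containerID)
		return nil
	default:
		return fmt.Errorf("container %s: unknown container type %q",
			containerID, annotations[annotationContainerType])
	}
}

func main() {
	r := newVMRuntime()

	pause := map[string]string{
		annotationContainerType: containerTypeSandbox,
		annotationSandboxName:   "pod-a",
	}
	app := map[string]string{
		annotationContainerType: containerTypeContainer,
		annotationSandboxName:   "pod-a",
	}

	fmt.Println("create pause:", r.create("pause-1", pause))
	fmt.Println("create app:", r.create("app-1", app))
	fmt.Println("containers in pod-a:", r.podVMs["pod-a"])
}
```

With explicit annotations the runtime no longer has to guess from the
rootfs path whether a create command refers to a pod or to a container.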
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>