Container runtimes provide different levels of isolation, from kernel
namespaces to hardware virtualization. When starting a specific
container, one may want to decide which level of isolation to use
depending on how much we trust the container workload. Fully verified
and signed containers may not need the hardware isolation layer but e.g.
CI jobs pulling packages from many untrusted sources should probably not
run only on a kernel namespace isolation layer.
Here we allow CRI-O users to define a container runtime for trusted
containers and another one for untrusted containers, and also to define
a general, default trust level. This anticipates future kubelet
implementations that would be able to tag containers as trusted or
untrusted. When missing a kubelet hint, containers are trusted by
default.
A container becomes untrusted if we get a hint in that direction from
kubelet or if the default trust level is set to "untrusted" and the
container is not privileged. In both cases CRI-O will try to use the
untrusted container runtime. For any other cases, it will switch to the
trusted one.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever get the
data from one write operation. We use this for frameing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket are using at least the same size of buffer
as the write side, because otherwise the extra data in the message
will be dropped.
This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.
This fixes https://github.com/kubernetes-incubator/cri-o/issues/569
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The bug is silly if you have a master/node cluster where node is on a
different machine than the master.
The current behavior is to give our addresses like "0.0.0.0:10101". If
you run "kubectl exec ..." from another host, that's not going to work
since on a different host 0.0.0.0 resolves to localhost and kubectl
exec fails with:
error: unable to upgrade connection: 404 page not found
This patch fixes the above by giving our correct addresses for reaching
from outside.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Some runtimes like Clear Containers need to interpret the CRI-O
annotations, to distinguish the infra container from the regular one.
Here we export those annotations and use a more standard dotted
namespace for them.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
During "Port forwarding" e2e tests, the following panic happened:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x64981d]
goroutine 52788 [running]:
panic(0x1830ee0, 0xc4200100c0)
/usr/lib/golang/src/runtime/panic.go:500 +0x1a1
github.com/kubernetes-incubator/cri-o/oci.(*Runtime).UpdateStatus(0xc4202afc00,
0x0, 0x0, 0x0)
/home/amurdaca/go/src/github.com/kubernetes-incubator/cri-o/oci/oci.go:549
+0x7d
github.com/kubernetes-incubator/cri-o/server.streamService.PortForward(0xc42026e000,
0x0, 0x0, 0x0, 0x0, 0xc420d9af40, 0x40, 0xc400000050, 0x7fe660659a28,
0xc4201cd0e0, ...)
```
The issue is `streamService.PortForward` assumed the first argument to
be the sandbox's infra container ID, thus trying to get it from memory
store using `.state.containers.Get`. Since that ID is of the sandbox
itself, it fails to get the container object from memory and panics in
`UpdateStatus`.
Fix it by looking for the sandbox's infra container ID starting from a
sandbox ID.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The ocid project was renamed to CRI-O, months ago, it is time that we moved
all of the code to the new name. We want to elminate the name ocid from use.
Move fully to crio.
Also cric is being renamed to crioctl for the time being.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
When powering off the system, we want the ocid service, to shutdown
all containers running on the system so they can cleanup properly
This patch will cleanup all pods on poweroff.
The ocid-shutdown.service drops a file /var/run/ocid.shutdown when the system
is shutting down. The ocid-shutdown.service should only be executed at system
shutdown.
On bootup sequence should be
start ocid.service
start ocid-shutdown.service (This is a NO-OP)
On system shutdown
stop ocid-shutdown.service (Creates /var/run/ocid.shutdown)
stop ocid.service (Notices /var/run/ocid.service and stops all pods before exiting.)
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This adds a very simple implementation of logging within conmon, where
every buffer read from the masterfd of the container is also written to
the log file (with errors during writing to the log file ignored).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Interleaving asynchronous updates with pod or container creations can
lead to unrecoverable races and corruptions of the pod or container hash
tables. This is fixed by serializing update against pod or container
creation operations, while pod and container creation operations can
run in parallel.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When we get a pod with DNS settings, we need to build
a resolv.conf file and mount it in all pod containers.
In order to do that, we have to track the built resolv.conf
file and store/load it.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We have moved selinux support out of opencontainers/runc into its
own package. This patch moves to using the new selinux go bindings.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The sandbox privileged flag is set to true only if either the
pod configuration privileged flag is set to true or when any
of the pod namespaces are the host ones.
A container inherit its privileged flag from its sandbox, and
will be run by the privileged runtime only if it's set to true.
In other words, the privileged runtime (when defined) will be
when one of the below conditions is true:
- The sandbox will be asked to run at least one privileged container.
- The sandbox requires access to either the host IPC or networking
namespaces.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We add a privileged flag to the container and sandbox structures
and can now select the appropriate runtime path for any container
operations depending on that flag.
Here again, the default runtime will be used for non privileged
containers and for privileged ones in case there are no privileged
runtime defined.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Previously ocicni did not have support for setting the plugin directory.
Now that it has grown support for it, use it to actually respect the
setting a user has provided for ocid.network.* options.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The CRI doesn't expect us to implicitly pull an image if it isn't
already present before we're asked to use it to create a container, and
the tests no longer depend on us doing so, either.
Limit the logic which attempts to pull an image, if it isn't present, to
only pulling the configured "pause" image, since our use of that image
for running pod sandboxes is an implementation detail that our clients
can't be expected to know or care about. Include the name of the image
that we didn't pull in the error we return when we don't pull one.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>