Kubelet can send cap add/drop ALL. Handle that in CRI-O as well.
Also, this PR is re-vendoring runtime-tools to fix capabilities add to
add caps to _all_ caps set **and** fix a shared memory issue (caps set
were initialized with the same slice, if one modifies one slice, it's
reflected on the other slices, the vendoring fixes this as well)
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Add new directory /etc/crio/hooks.d, where packagers can drop a json config
file to specify a hook.
The json must specify a valid executable to run.
The json must also specify which stage(s) to run the hook:
prestart, poststart, poststop
The json must specify under which criteria the hook should be launched
If the container HasBindMounts
If the container cmd matches a list of regular expressions
If the containers annotations matches a list of regular expressions.
If any of these match the the hook will be launched.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We calculate these values at container creation time and store
them in the container object as they are requested during container
status. This avoids re-calculation and speeds up container status.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Containers running in kubernetes currently do not specify options
for mount propagation and whether to bind or rbind the mount point.
Since docker defaults to bind and rbind, we should match their
behavious, since this is what admins expect
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Need to mv to latest released and supported version of logrus
switch github.com/Sirupsen/logrus github.com/sirupsen/logrus
Also vendor in latest containers/storage and containers/image
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If sandbox is in the same package as server, there will be a circular dependency when
kpod create is implemented
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
Move non-kubernetes-dependent portions of server struct to libkpod.
So far, only the struct fields have been moved and not their dependent
functions
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall where possible (leave
syscall.SysProcAttr and syscall.Stat_t).
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow to get updates and fixes just by re-vendoring
golang.org/x/sys/unix instead of having to update to a new go version.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Container runtimes provide different levels of isolation, from kernel
namespaces to hardware virtualization. When starting a specific
container, one may want to decide which level of isolation to use
depending on how much we trust the container workload. Fully verified
and signed containers may not need the hardware isolation layer but e.g.
CI jobs pulling packages from many untrusted sources should probably not
run only on a kernel namespace isolation layer.
Here we allow CRI-O users to define a container runtime for trusted
containers and another one for untrusted containers, and also to define
a general, default trust level. This anticipates future kubelet
implementations that would be able to tag containers as trusted or
untrusted. When missing a kubelet hint, containers are trusted by
default.
A container becomes untrusted if we get a hint in that direction from
kubelet or if the default trust level is set to "untrusted" and the
container is not privileged. In both cases CRI-O will try to use the
untrusted container runtime. For any other cases, it will switch to the
trusted one.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever get the
data from one write operation. We use this for frameing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket are using at least the same size of buffer
as the write side, because otherwise the extra data in the message
will be dropped.
This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.
This fixes https://github.com/kubernetes-incubator/cri-o/issues/569
Signed-off-by: Alexander Larsson <alexl@redhat.com>
we were blindly applying RO mount options but net addons like calico
modify those files.
This patch sets RO only when container's rootfs is RO, same behavior as
docker.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
tmpfs'es can override whatever there's on the container rootfs. We just
mkdir the volume as we're confident kube manages volumes in container.
We don't need any tmpfs nor any complex volume handling for now.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This patch fixes the following command:
kubectl run -i --tty centos --image=centos -- sh
The command above use to fail with:
/usr/bin/sh: /usr/bin/sh: cannot execute binary file
That's because we were wrongly assembling the OCI processArgs.
Thanks @alexlarsson for spotting this.
This patch basically replicates what docker does when merging container
config and image config. It also replicates how docker sets processArgs
for the OCI runtime.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Some runtimes like Clear Containers need to interpret the CRI-O
annotations, to distinguish the infra container from the regular one.
Here we export those annotations and use a more standard dotted
namespace for them.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
node-e2e tests were failing in RHEL because, if running a privileged
container, we get all capability in the spec. The spec generator wasn't
filtering caps based on actual host caps, it was just adding _everything_.
This patch makes spec generator host specific.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The ocid project was renamed to CRI-O, months ago, it is time that we moved
all of the code to the new name. We want to elminate the name ocid from use.
Move fully to crio.
Also cric is being renamed to crioctl for the time being.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Because kubelet will create broken symlinks for logPath it is necessary
to remove those symlinks before we attempt to write to them. This is a
temporary workaround while the issue is fixed upstream.
Ref: https://issues.k8s.io/44043
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This adds a very simple implementation of logging within conmon, where
every buffer read from the masterfd of the container is also written to
the log file (with errors during writing to the log file ignored).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Interleaving asynchronous updates with pod or container creations can
lead to unrecoverable races and corruptions of the pod or container hash
tables. This is fixed by serializing update against pod or container
creation operations, while pod and container creation operations can
run in parallel.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that the image package has fixes to support docker images v2s1,
we can remove our buildOCIProcessARgs() hack for empty image configs
and simplify this routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
When a pod sandbox comes with DNS settings, the resulting resolv.conf
file needs to be bind mounted in all pod containers under
/etc/resolv.conf.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We have moved selinux support out of opencontainers/runc into its
own package. This patch moves to using the new selinux go bindings.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We need to support a 2x2 matrix of use cases with both
kubelet giving us (command, args) slices and the OCI
image config file giving us (ENTRYPOINT, CMD) slices.
Here we always prioritize the kubelet information over
the OCI image one, and use the latter when the former
is incomplete.
Not that this routine will be slightly simpler when
issue #395 is fixed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The way we build the OCI Process Args slice is incorrect.
With the current implementation we may for example end up building this
slice with only the entry point arguments, if the kubelet passed
information is missing the Command slice.
We also will end up building the Args slice with the Image config
process arguments, without the defined entry point, if kubelet does not
tell us anything about the container process command to be run.
This patch fixes that by favoring the kubelet ContainerConfig
information. If that is missing, we try to complete it with the
container image information. We always use ContainerConfig.Command[] or
ImageConfig.EntryPoint[] as the first OCI Process Args slice entries.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The sandbox privileged flag is set to true only if either the
pod configuration privileged flag is set to true or when any
of the pod namespaces are the host ones.
A container inherit its privileged flag from its sandbox, and
will be run by the privileged runtime only if it's set to true.
In other words, the privileged runtime (when defined) will be
when one of the below conditions is true:
- The sandbox will be asked to run at least one privileged container.
- The sandbox requires access to either the host IPC or networking
namespaces.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
kubelet sends a request to create a container with an image ID (as
opposed as an image name). That ID comes from the ImageStatus response.
This patch fixes that by setting the image ID as well as the image name
and fix the login to lookup for image ID as well.
Found while running `make test-e2e-node`.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Use containers/storage to store images, pod sandboxes, and containers.
A pod sandbox's infrastructure container has the same ID as the pod to
which it belongs, and all containers also keep track of their pod's ID.
The container configuration that we build using the data in a
CreateContainerRequest is stored in the container's ContainerDirectory
and ContainerRunDirectory.
We catch SIGTERM and SIGINT, and when we receive either, we gracefully
exit the grpc loop. If we also think that there aren't any container
filesystems in use, we attempt to do a clean shutdown of the storage
driver.
The test harness now waits for ocid to exit before attempting to delete
the storage root directory.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Vendor updated containers/image and containers/storage, along
with any new dependencies they drag in, and updated versions of other
dependencies that happen to get pulled in.
github.com/coreos/go-systemd/daemon/SdNotify() now takes a boolean to
control whether or not it unsets the NOTIFY_SOCKET variable from the
calling process's environment. Adapt.
github.com/opencontainers/runtime-tools/generate/Generator.AddProcessEnv()
now takes the environment variable name and value as two arguments, not
one. Adapt.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Because they need to prepare the hypervisor networking interfaces
and have them match the ones created in the pod networking
namespace (typically to bridge TAP and veth interfaces), hypervisor
based container runtimes need the sandbox pod networking namespace
to be set up before it's created. They can then prepare and start
the hypervisor interfaces when creating the pod virtual machine.
In order to do so, we need to create per pod persitent networking
namespaces that we pass to the CNI plugin. This patch leverages
the CNI ns package to create such namespaces under /var/run/netns,
and assign them to all pod containers.
The persitent namespace is removed when either the pod is stopped
or removed.
Since the StopPodSandbox() API can be called multiple times from
kubelet, we track the pod networking namespace state (closed or
not) so that we don't get a containernetworking/ns package error
when calling its Close() routine multiple times as well.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>