129 commits

Author SHA1 Message Date
Mrunal Patel
e49dd34657 Add support for container pids limit
We add a daemon level setting and will add a container
override once it is supported in CRI.

Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-07-11 14:59:52 -07:00
Mrunal Patel
d40883d88c container: Use ImageVolumes setting at container creation
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-07-10 13:46:14 -07:00
Andrew Pilloud
1a01ca7251 server: inherit rlimits from server
Signed-off-by: Andrew Pilloud <andrewpilloud@igneoussystems.com>
2017-07-03 14:49:34 -07:00
Mrunal Patel
975347b874 container: Add containerID to annotations for the container
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-23 09:31:13 -07:00
Antonio Murdaca
6035cff9e4
server: standardize on naming
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-06-22 11:55:03 +02:00
Mrunal Patel
bd40bbc30b Add missing error checks and simplify bool check
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-16 15:49:16 -07:00
Samuel Ortiz
0e51bbb778 oci: Support mixing trusted and untrusted workloads
Container runtimes provide different levels of isolation, from kernel
namespaces to hardware virtualization. When starting a specific
container, one may want to decide which level of isolation to use
depending on how much we trust the container workload. Fully verified
and signed containers may not need the hardware isolation layer but e.g.
CI jobs pulling packages from many untrusted sources should probably not
run only on a kernel namespace isolation layer.

Here we allow CRI-O users to define a container runtime for trusted
containers and another one for untrusted containers, and also to define
a general, default trust level. This anticipates future kubelet
implementations that would be able to tag containers as trusted or
untrusted. When missing a kubelet hint, containers are trusted by
default.

A container becomes untrusted if we get a hint in that direction from
kubelet or if the default trust level is set to "untrusted" and the
container is not privileged. In both cases CRI-O will try to use the
untrusted container runtime. For any other cases, it will switch to the
trusted one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-06-15 10:04:36 +02:00
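The trust decision described in commit 0e51bbb778 above can be sketched roughly as follows. This is only one literal reading of the commit message, not the CRI-O implementation: the `TrustLevel` type, the `pickRuntime` function, and the runtime names are hypothetical, and falling back to the trusted runtime when no untrusted runtime is configured is an assumption drawn from the phrase "will try to use".

```go
package main

import "fmt"

// TrustLevel and its values are hypothetical names used only in this sketch.
type TrustLevel string

const (
	Trusted   TrustLevel = "trusted"
	Untrusted TrustLevel = "untrusted"
)

// pickRuntime returns the runtime to use for a container.
// hint is the kubelet hint ("" when kubelet gave none), defaultLevel is the
// configured default trust level.
func pickRuntime(hint, defaultLevel TrustLevel, privileged bool, trustedRT, untrustedRT string) string {
	// Untrusted if kubelet says so, or if the default is "untrusted"
	// and the container is not privileged.
	untrusted := hint == Untrusted || (defaultLevel == Untrusted && !privileged)
	// Assumption: with no untrusted runtime configured, fall back to the trusted one.
	if untrusted && untrustedRT != "" {
		return untrustedRT
	}
	return trustedRT
}

func main() {
	// "runc" and "vm-runtime" are placeholder runtime names.
	fmt.Println(pickRuntime("", Trusted, false, "runc", "vm-runtime"))
	fmt.Println(pickRuntime(Untrusted, Trusted, false, "runc", "vm-runtime"))
	fmt.Println(pickRuntime("", Untrusted, true, "runc", "vm-runtime"))
}
```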
Mrunal Patel
7b9032bac7 Merge pull request from alexlarsson/non-terminal-attach
Implement non-terminal attach
2017-06-14 21:45:44 -07:00
Alexander Larsson
7bb957bf75 Implement non-terminal attach
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever gets the
data from one write operation. We use this for framing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket use a buffer at least as large as the one
on the write side, because otherwise the extra data in the message
will be dropped.

This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.

This fixes https://github.com/kubernetes-incubator/cri-o/issues/569

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-14 22:59:50 +02:00
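The framing described in commit 7bb957bf75 above (one message per write, first byte naming the pipe) can be illustrated without a real socket. This is a minimal sketch: the pipe numbers, the `maxPayload` size, and the function names are assumptions made for the example and are not taken from conmon or CRI-O.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	pipeStdout byte = 1    // hypothetical pipe identifiers
	pipeStderr byte = 2    //
	maxPayload      = 8192 // assumed writer-side buffer size
)

// encodeFrame builds one message: a pipe byte followed by the payload.
// On a SOCK_SEQPACKET socket each such message is delivered by one read.
func encodeFrame(pipe byte, payload []byte) ([]byte, error) {
	if len(payload) > maxPayload {
		return nil, errors.New("payload too large for one message")
	}
	msg := make([]byte, 0, 1+len(payload))
	msg = append(msg, pipe)
	return append(msg, payload...), nil
}

// decodeFrame splits one received message into its pipe byte and payload.
func decodeFrame(msg []byte) (byte, []byte, error) {
	if len(msg) < 1 {
		return 0, nil, errors.New("empty message")
	}
	return msg[0], msg[1:], nil
}

func main() {
	msg, err := encodeFrame(pipeStderr, []byte("oops\n"))
	if err != nil {
		panic(err)
	}
	// The reader's buffer must hold the largest possible message,
	// otherwise the tail of the message would be dropped by the kernel.
	buf := make([]byte, 1+maxPayload)
	n := copy(buf, msg) // stands in for a single read from the socket
	pipe, data, err := decodeFrame(buf[:n])
	if err != nil {
		panic(err)
	}
	fmt.Printf("pipe=%d data=%q\n", pipe, data)
}
```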
Antonio Murdaca
0dfec710f2
container_create: net files must be ro when rootfs is ro
We were blindly applying RO mount options, but net addons like calico
modify those files.
This patch sets RO only when the container's rootfs is RO, the same
behavior as docker.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-06-14 15:31:34 +02:00
Antonio Murdaca
d2e1d559b7
container_create: just mkdir on image's volumes
tmpfs'es can override whatever is on the container rootfs. We just
mkdir the volume as we're confident kube manages volumes in container.
We don't need any tmpfs nor any complex volume handling for now.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-06-14 15:31:31 +02:00
Antonio Murdaca
65d4ac8fc2
container_create: fix OCI processArgs assembly
This patch fixes the following command:

kubectl run -i --tty centos --image=centos -- sh

The command above used to fail with:

/usr/bin/sh: /usr/bin/sh: cannot execute binary file

That's because we were wrongly assembling the OCI processArgs.

Thanks @alexlarsson for spotting this.

This patch basically replicates what docker does when merging container
config and image config. It also replicates how docker sets processArgs
for the OCI runtime.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-06-08 11:48:11 +02:00
Dan Walsh
4c48e13619 Need to be consistent in our naming of Oci.
It should always be capitalized.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2017-06-05 15:11:03 -04:00
Samuel Ortiz
f15859c79f pkg/annotations: Export CRI-O annotations namespace
Some runtimes like Clear Containers need to interpret the CRI-O
annotations, to distinguish the infra container from the regular one.
Here we export those annotations and use a more standard dotted
namespace for them.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-06-01 23:45:44 +02:00
Antonio Murdaca
f3650533f0
create src dir for bind mounts
match docker behavior for bind mounts

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-06-01 17:37:20 +02:00
Samuel Ortiz
e23d986cf2 container: Do not restrict path access for privileged containers
Privileged containers should see and reach all host paths.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-05-31 02:06:47 +02:00
Antonio Murdaca
089cb88f17
server: container_create: make the spec hostspecific
node-e2e tests were failing in RHEL because, if running a privileged
container, we get all capabilities in the spec. The spec generator wasn't
filtering caps based on actual host caps, it was just adding _everything_.
This patch makes spec generator host specific.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-30 18:30:26 +02:00
Antonio Murdaca
b4f1cee2a2
server: store and use image's stop signal to stop containers
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-27 10:21:04 +02:00
Antonio Murdaca
02f3828283
server: workaround images with Config.Volumes
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-22 18:01:06 +02:00
Antonio Murdaca
da0b8a6157
server: store containers state on disk
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-18 21:19:50 +02:00
Antonio Murdaca
790c6d891a
server: store creation in containers
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-18 18:49:54 +02:00
Antonio Murdaca
1f4a4742cb
oci: add container directory to Container struct
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-18 18:49:54 +02:00
Mrunal Patel
7ea255fcea Merge pull request from rhatdan/rename
Rename ocid to crio
2017-05-15 11:27:28 -07:00
Dan Walsh
4493b6f176 Rename ocid to crio.
The ocid project was renamed to CRI-O months ago; it is time that we moved
all of the code to the new name.  We want to eliminate the name ocid from use.
Move fully to crio.

Also cric is being renamed to crioctl for the time being.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2017-05-12 09:56:06 -04:00
Antonio Murdaca
1d455a31a9
server: add RO and masked paths on container creation
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-12 12:47:40 +02:00
Mrunal Patel
23cf1a6fdb Add devices to OCI config
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-05-09 14:37:01 -07:00
Mrunal Patel
f7e5e24a05 Add helper for adding devices to OCI spec
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-05-09 14:36:55 -07:00
Antonio Murdaca
139b16bac2
server: fix set caps on container create
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-05 16:31:52 +02:00
Antonio Murdaca
275a5a1ff2
server: remove Update calls
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-27 14:01:37 +02:00
Samuel Ortiz
3b691d085c container: Bind mount hosts file for host networking containers
Fixes 

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-04-22 04:04:38 +02:00
f401adffa9
server: readable fields
`git grep -w images` or `git grep -w storage` needs to be more useful.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2017-04-20 08:22:50 -04:00
Mrunal Patel
24df2538db Update runtime-spec to v1.0.0.rc5
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-04-12 19:15:53 -07:00
Antonio Murdaca
09d2a6b519 Merge pull request from mrunalp/fix_nil_config
Check for case when image config isn't present
2017-04-05 12:42:58 +02:00
Mrunal Patel
aac24e1715 Check for case when image config isn't present
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-04-04 15:42:58 -07:00
Mrunal Patel
2b5dca3950 Merge pull request from runcom/fix-schema1-config
read image config from docker v2s1 manifests
2017-04-04 14:49:54 -07:00
Antonio Murdaca
3c7f3ab2ec Merge pull request from sameo/topic/fat-lock
Serialize Update and Sandbox/Container creation operations
2017-04-04 23:23:19 +02:00
Aleksa Sarai
7679a84c6d
server: issues.k8s.io/44043 workaround
Because kubelet will create broken symlinks for logPath it is necessary
to remove those symlinks before we attempt to write to them. This is a
temporary workaround while the issue is fixed upstream.

Ref: https://issues.k8s.io/44043
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-05 02:45:58 +10:00
Aleksa Sarai
c290c0d9c3
conmon: implement logging to logPath
This adds a very simple implementation of logging within conmon, where
every buffer read from the masterfd of the container is also written to
the log file (with errors during writing to the log file ignored).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-05 02:45:57 +10:00
Samuel Ortiz
be5084387c server: Serialize container/pod creation with updates
Interleaving asynchronous updates with pod or container creations can
lead to unrecoverable races and corruptions of the pod or container hash
tables. This is fixed by serializing update against pod or container
creation operations, while pod and container creation operations can
run in parallel.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-04-04 18:43:21 +02:00
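One common way to get the behavior described in commit be5084387c above (creations run in parallel with each other, but never alongside an update) is a reader/writer lock where creations take the read side and the update takes the write side. The sketch below assumes that approach; the `server` type, its field names, and the inner mutex guarding the map are illustrative and not taken from the CRI-O source.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the daemon state.
type server struct {
	updateLock sync.RWMutex      // creations: RLock, update: Lock
	tableLock  sync.Mutex        // protects the map between parallel creations
	containers map[string]string // container ID -> state
}

func newServer() *server {
	return &server{containers: make(map[string]string)}
}

// createContainer may run concurrently with other creations,
// but never while update holds the write lock.
func (s *server) createContainer(id, state string) {
	s.updateLock.RLock()
	defer s.updateLock.RUnlock()

	s.tableLock.Lock()
	s.containers[id] = state
	s.tableLock.Unlock()
}

// update gets exclusive access: it waits for in-flight creations
// and blocks new ones until it returns.
func (s *server) update(fn func(table map[string]string)) {
	s.updateLock.Lock()
	defer s.updateLock.Unlock()
	fn(s.containers)
}

func main() {
	s := newServer()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.createContainer(fmt.Sprintf("ctr-%d", i), "running")
		}(i)
	}
	wg.Wait()
	s.update(func(table map[string]string) {
		for id := range table {
			table[id] = "stopped"
		}
	})
	fmt.Println(len(s.containers))
}
```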
Samuel Ortiz
c89cc876d2
server: Remove Image Config hack
Now that the image package has fixes to support docker images v2s1,
we can remove our buildOCIProcessARgs() hack for empty image configs
and simplify this routine.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-04 17:52:40 +02:00
Mrunal Patel
fd435256e7 Set default working directory to /
runc/runtime-spec doesn't allow empty working dir
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-31 14:04:16 -07:00
Mrunal Patel
fa467a30f1 Merge pull request from mrunalp/fix_hostname
Set the container hostname same as pod hostname
2017-03-31 07:35:49 -07:00
Mrunal Patel
c6897b5f62 Set the uid, gid and groups from container user
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-30 10:58:57 -07:00
Mrunal Patel
4ccc5bbe7c Set the container hostnames same as pod hostname
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-29 16:11:57 -07:00
Mrunal Patel
505bc2cbd5 Add function to lookup user in container /etc/{passwd,group}
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-29 11:18:35 -07:00
Mrunal Patel
f422235b3e Add function to safely open a file in container rootfs
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-29 11:16:53 -07:00
Mrunal Patel
8709f1b5bb Apply working dir and env from image config
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-03-27 16:41:43 -07:00
Samuel Ortiz
48a297ed7b container: Propagate the pod sandbox resolv.conf mount point
When a pod sandbox comes with DNS settings, the resulting resolv.conf
file needs to be bind mounted in all pod containers under
/etc/resolv.conf.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-03-24 15:32:16 +01:00
Antonio Murdaca
673b6e4c51 Merge pull request from sameo/topic/oci-process
server: Fix the OCI process arguments build routine
2017-03-24 09:07:00 +01:00
Daniel J Walsh
19620f3d1e Switch to using opencontainers/selinux
We have moved selinux support out of opencontainers/runc into its
own package.  This patch moves to using the new selinux go bindings.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2017-03-23 15:53:09 -04:00
Samuel Ortiz
21afa1a975 server: Fix the OCI process arguments build routine
We need to support a 2x2 matrix of use cases with both
kubelet giving us (command, args) slices and the OCI
image config file giving us (ENTRYPOINT, CMD) slices.

Here we always prioritize the kubelet information over
the OCI image one, and use the latter when the former
is incomplete.

Note that this routine will be slightly simpler when
issue  is fixed.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-03-23 12:59:26 +01:00
Samuel Ortiz
4ac92d73e4 container: Fix the OCI Process Args string build
The way we build the OCI Process Args slice is incorrect.
With the current implementation we may for example end up building this
slice with only the entry point arguments, if the kubelet passed
information is missing the Command slice.
We also will end up building the Args slice with the Image config
process arguments, without the defined entry point, if kubelet does not
tell us anything about the container process command to be run.

This patch fixes that by favoring the kubelet ContainerConfig
information. If that is missing, we try to complete it with the
container image information. We always use ContainerConfig.Command[] or
ImageConfig.EntryPoint[] as the first OCI Process Args slice entries.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-03-20 15:17:34 +01:00
Mrunal Patel
8c0ff7d904 Run conmon under cgroups (systemd)
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-06 15:08:46 -08:00
Samuel Ortiz
f7eee71792 server: Reduce createSandboxContainer complexity
By factorizing the bind mounts generation code.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-03-03 19:06:29 +01:00
Samuel Ortiz
2ec696be41 server: Set sandbox and container privileged flags
The sandbox privileged flag is set to true only if either the
pod configuration privileged flag is set to true or when any
of the pod namespaces are the host ones.

A container inherits its privileged flag from its sandbox, and
will be run by the privileged runtime only if it's set to true.
In other words, the privileged runtime (when defined) will be
used when one of the below conditions is true:

- The sandbox will be asked to run at least one privileged container.
- The sandbox requires access to either the host IPC or networking
  namespaces.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-03-03 19:06:04 +01:00
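The rule in commit 2ec696be41 above can be written down as a small predicate plus a runtime choice. This is a sketch under assumptions: the `podConfig` struct and function names are hypothetical, and only the host IPC and host networking namespaces are checked because those are the two the bullet list names.

```go
package main

import "fmt"

// podConfig is a hypothetical, reduced view of a pod sandbox request.
type podConfig struct {
	Privileged  bool // pod asks to run at least one privileged container
	HostNetwork bool // pod uses the host networking namespace
	HostIPC     bool // pod uses the host IPC namespace
}

// sandboxPrivileged reports whether the sandbox counts as privileged.
func sandboxPrivileged(p podConfig) bool {
	return p.Privileged || p.HostNetwork || p.HostIPC
}

// runtimeFor picks the privileged runtime only when the flag is set
// and such a runtime is actually defined.
func runtimeFor(privileged bool, defaultRT, privilegedRT string) string {
	if privileged && privilegedRT != "" {
		return privilegedRT
	}
	return defaultRT
}

func main() {
	pod := podConfig{HostNetwork: true}
	// Containers inherit the flag from their sandbox.
	priv := sandboxPrivileged(pod)
	// "runc" and "priv-runtime" are placeholder runtime names.
	fmt.Println(priv, runtimeFor(priv, "runc", "priv-runtime"))
}
```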
Michał Żyłowski
5c81217e09 Applying k8s.io v3 API for ocic and ocid
Signed-off-by: Michał Żyłowski <michal.zylowski@intel.com>
2017-02-06 13:05:10 +01:00
Antonio Murdaca
2202c1a460
storage: fix image retrieval by id
kubelet sends a request to create a container with an image ID (as
opposed to an image name). That ID comes from the ImageStatus response.
This patch fixes that by setting the image ID as well as the image name
and fixes the logic to look up by image ID as well.

Found while running `make test-e2e-node`.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-01-31 16:32:30 +01:00
Nalin Dahyabhai
c0333b102b Integrate containers/storage
Use containers/storage to store images, pod sandboxes, and containers.
A pod sandbox's infrastructure container has the same ID as the pod to
which it belongs, and all containers also keep track of their pod's ID.

The container configuration that we build using the data in a
CreateContainerRequest is stored in the container's ContainerDirectory
and ContainerRunDirectory.

We catch SIGTERM and SIGINT, and when we receive either, we gracefully
exit the grpc loop.  If we also think that there aren't any container
filesystems in use, we attempt to do a clean shutdown of the storage
driver.

The test harness now waits for ocid to exit before attempting to delete
the storage root directory.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2017-01-18 10:23:30 -05:00
Nalin Dahyabhai
caee4a99c9 Vendor containers/image and containers/storage
Vendor updated containers/image and containers/storage, along
with any new dependencies they drag in, and updated versions of other
dependencies that happen to get pulled in.

github.com/coreos/go-systemd/daemon/SdNotify() now takes a boolean to
control whether or not it unsets the NOTIFY_SOCKET variable from the
calling process's environment.  Adapt.

github.com/opencontainers/runtime-tools/generate/Generator.AddProcessEnv()
now takes the environment variable name and value as two arguments, not
one.  Adapt.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2017-01-18 10:21:59 -05:00
Mrunal Patel
6df58df215 Add support for systemd cgroups
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2016-12-19 16:31:29 -08:00
Harry Zhang
02dfe877e4 Add container to pod qos cgroup
Signed-off-by: Harry Zhang <harryz@hyper.sh>
2016-12-15 14:42:59 +08:00
Mrunal Patel
4cb5af00f6 Merge pull request from runcom/fix-commands
Read command from ContainerCreateRequest
2016-12-13 10:13:38 -08:00
Antonio Murdaca
f99c0a089c
Read command from ContainerCreateRequest
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-12-13 16:59:16 +01:00
Antonio Murdaca
4bb0830c37 Merge pull request from xlgao-zju/reload-apparmor-profile
reload default apparmor profile if it is unloaded
2016-12-13 11:10:26 +01:00
Samuel Ortiz
4cab8ed06a
sandbox: Use persistent networking namespace
Because they need to prepare the hypervisor networking interfaces
and have them match the ones created in the pod networking
namespace (typically to bridge TAP and veth interfaces), hypervisor
based container runtimes need the sandbox pod networking namespace
to be set up before it's created. They can then prepare and start
the hypervisor interfaces when creating the pod virtual machine.

In order to do so, we need to create per pod persistent networking
namespaces that we pass to the CNI plugin. This patch leverages
the CNI ns package to create such namespaces under /var/run/netns,
and assign them to all pod containers.
The persistent namespace is removed when either the pod is stopped
or removed.

Since the StopPodSandbox() API can be called multiple times from
kubelet, we track the pod networking namespace state (closed or
not) so that we don't get a containernetworking/ns package error
when calling its Close() routine multiple times as well.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-12-12 19:48:23 +01:00
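The closed-state tracking mentioned at the end of commit 4cab8ed06a above is essentially an idempotent close wrapper. The sketch below shows the idea with a stand-in close function instead of the real namespace handle; the `sandboxNetNs` type and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sandboxNetNs wraps a namespace handle and remembers whether it was closed.
type sandboxNetNs struct {
	mu      sync.Mutex
	closed  bool
	closeFn func() error // stands in for the underlying handle's Close()
}

// Close closes the namespace once; later calls are no-ops, so repeated
// StopPodSandbox() calls do not trigger an error from the underlying handle.
func (n *sandboxNetNs) Close() error {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.closed {
		return nil
	}
	if err := n.closeFn(); err != nil {
		return err
	}
	n.closed = true
	return nil
}

func main() {
	calls := 0
	ns := &sandboxNetNs{closeFn: func() error { calls++; return nil }}
	for i := 0; i < 3; i++ {
		if err := ns.Close(); err != nil {
			panic(err)
		}
	}
	fmt.Println("underlying close calls:", calls)
}
```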
Samuel Ortiz
70ede1a5fe
container: Store annotations under ocid/annotations
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-12-12 19:16:05 +01:00
Antonio Murdaca
430297dd81
store annotations and image for a container
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-12-12 11:12:03 +01:00
Xianglin Gao
ca7d5c77c2 Do not load ocid-default if configured apparmor profile is set up.
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-12 15:55:17 +08:00
Antonio Murdaca
67055e20bc
server: fix call to logrus.Warnf
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-12-10 19:21:52 +01:00
Mrunal Patel
be29524ba4 Add support for pod /dev/shm that is shared by the pod ctrs
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2016-12-08 15:32:17 -08:00
Xianglin Gao
cb5ed1ce9d reload default apparmor profile if it is unloaded
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-07 20:19:29 +08:00
Xianglin Gao
4f323377ee add apparmor build tag and update readme
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-06 11:51:15 +08:00
Xianglin Gao
26645c90ac Make the profile configurable
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-01 13:26:59 +08:00
Xianglin Gao
1f863846f5 add default apparmor profile
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-01 13:26:59 +08:00
Xianglin Gao
71b80591e3 support apparmor
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-12-01 13:26:59 +08:00
Samuel Ortiz
60123a77ce server: Export more container metadata for VM containers
VM-based container runtimes (e.g. Clear Containers) will run each pod
in a VM and will create containers within that pod VM. Unfortunately
those runtimes will get called by ocid with the same commands
(create and start) for both the pause containers and subsequent
containers to be added to the pod namespace. Unless they work around
that by e.g. inferring that a container whose rootfs is under
"/pause" would represent a pod, they have no way to decide if they
need to create/start a VM or if they need to add a container to an
already running VM pod.

This patch tries to formalize this difference through pod
annotations. When starting a container or a sandbox, we now add 2
annotations for the container type (Infrastructure or not) and the
sandbox name. This will allow VM based container runtimes to handle
2 things:

- Decide if they need to create a pod VM or not.
- Keep track of which pod ID runs in a given VM, so that they
  know to which sandbox they have to add containers.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-11-29 10:24:33 +01:00
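How a VM-based runtime might consume the two annotations from commit 60123a77ce above can be sketched as follows. The annotation keys and values here are placeholders invented for the example (the commit message does not list them), as are the function names.

```go
package main

import "fmt"

const (
	// Hypothetical annotation keys and values, for illustration only.
	annContainerType = "example/container_type"
	annSandboxName   = "example/sandbox_name"
	typeInfra        = "infra"
	typeRegular      = "container"
)

// annotate builds the two annotations added at create time.
func annotate(isInfra bool, sandboxName string) map[string]string {
	t := typeRegular
	if isInfra {
		t = typeInfra
	}
	return map[string]string{
		annContainerType: t,
		annSandboxName:   sandboxName,
	}
}

// needsNewVM is what a VM-based runtime could ask on "create":
// an infrastructure container means a new pod VM must be started.
func needsNewVM(annotations map[string]string) bool {
	return annotations[annContainerType] == typeInfra
}

func main() {
	pause := annotate(true, "pod-a")
	app := annotate(false, "pod-a")
	fmt.Println(needsNewVM(pause), pause[annSandboxName])
	fmt.Println(needsNewVM(app), app[annSandboxName])
}
```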
Antonio Murdaca
78ee03a8fc
add seccomp support
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-11-28 22:05:34 +01:00
Antonio Murdaca
70481bc5af
*: bump opencontainers/runtime-tools
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-11-24 12:26:18 +01:00
Antonio Murdaca
61bb04c87c
server: split containers actions
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-11-22 18:38:05 +01:00