Found out that during OpenShift testing, node was trying to remove
containers (probably in a bad state) and was failing the removal with
this kind of error:
E0828 13:19:46.082710 1235 kuberuntime_gc.go:127] Failed to remove
container
"e907f0f46b969e0dc83ca82c03ae7dd072cfe4155341e4521223d9fe3dec5afb": rpc
error: code = 2 desc = failed to remove container exit file
e907f0f46b969e0dc83ca82c03ae7dd072cfe4155341e4521223d9fe3dec5afb: remove
/var/run/crio/exits/e907f0f46b969e0dc83ca82c03ae7dd072cfe4155341e4521223d9fe3dec5afb:
no such file or directory
I believe it's ok to ignore this error as it may happen conmon will
fail early before exit file is written.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
SetMaxThreads from runtime/debug in Golang is called to set max threads
value to 90% of /proc/sys/kernel/threads-max
Should really help performance.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Also, we distinguish between container and a pod infra
container in the exit monitor as pod infra containers
aren't stored in the main container index.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
We calculate these values at container creation time and store
them in the container object as they are requested during container
status. This avoids re-calculation and speeds up container status.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
We get notified of container exits by inotify so we already
have updated status of the container in memory state.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Containers running in kubernetes currently do not specify options
for mount propagation and whether to bind or rbind the mount point.
Since docker defaults to bind and rbind, we should match their
behavious, since this is what admins expect
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This allows the container list API to return updated status
for exited container without having to call container status first.
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
Need to mv to latest released and supported version of logrus
switch github.com/Sirupsen/logrus github.com/sirupsen/logrus
Also vendor in latest containers/storage and containers/image
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The storage library uses github.com/pkg/errors to wrap errors that it
returns from many of its functions, so when passing them to
os.IsNotExist() or comparing them to specific errors defined in the
storage library, unwrap them using errors.Cause().
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Make getStore() take a config struct from which it pulls the store
options, then update the kpod commands so that they call getConfig()
and pass the config into getStore()
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
update libkpod's New() function to use a config struct, and update
server.New() to call into libkpod.New()
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
If sandbox is in the same package as server, there will be a circular dependency when
kpod create is implemented
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
Move container state data to libkpod, separate from the sandbox
data in server. However, the move was structured such that sandbox
data could easily be moved over into libkpod in the future
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
Move non-kubernetes-dependent portions of server struct to libkpod.
So far, only the struct fields have been moved and not their dependent
functions
Signed-off-by: Ryan Cole <rcyoalne@gmail.com>
Use unix.Prctl() instead of manually reimplementing it using
unix.RawSyscall. Also use unix.SECCOMP_MODE_FILTER instead of locally
defining it.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall where possible (leave
syscall.SysProcAttr and syscall.Stat_t).
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow to get updates and fixes just by re-vendoring
golang.org/x/sys/unix instead of having to update to a new go version.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This matches the current kube behavior. This will probably
be provided over the CRI at which point we won't have to
define a constant in cri-o code.
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
If we get a kubelet annotation about the sandbox trust level, we use it
to toggle our sandbox trust flag.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Container runtimes provide different levels of isolation, from kernel
namespaces to hardware virtualization. When starting a specific
container, one may want to decide which level of isolation to use
depending on how much we trust the container workload. Fully verified
and signed containers may not need the hardware isolation layer but e.g.
CI jobs pulling packages from many untrusted sources should probably not
run only on a kernel namespace isolation layer.
Here we allow CRI-O users to define a container runtime for trusted
containers and another one for untrusted containers, and also to define
a general, default trust level. This anticipates future kubelet
implementations that would be able to tag containers as trusted or
untrusted. When missing a kubelet hint, containers are trusted by
default.
A container becomes untrusted if we get a hint in that direction from
kubelet or if the default trust level is set to "untrusted" and the
container is not privileged. In both cases CRI-O will try to use the
untrusted container runtime. For any other cases, it will switch to the
trusted one.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever get the
data from one write operation. We use this for frameing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket are using at least the same size of buffer
as the write side, because otherwise the extra data in the message
will be dropped.
This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.
This fixes https://github.com/kubernetes-incubator/cri-o/issues/569
Signed-off-by: Alexander Larsson <alexl@redhat.com>
we were blindly applying RO mount options but net addons like calico
modify those files.
This patch sets RO only when container's rootfs is RO, same behavior as
docker.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
tmpfs'es can override whatever there's on the container rootfs. We just
mkdir the volume as we're confident kube manages volumes in container.
We don't need any tmpfs nor any complex volume handling for now.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Vendor and use docker/pkg/pools.
pools are used to lower the number of memory allocations and reuse buffers when
processing large streams operations..
The use of pools.Copy avoids io.Copy's internal buffer allocation.
This commit replaces io.Copy with pools.Copy to avoid the allocation of
buffers in io.Copy.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This was cluttering the logs on my clusters. The log should be just in
debug mode as we do for every request/response flow.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This is an optimization of our image pull code path. It's basically
how docker handles pulls as well. Let's be smart and check the image in
pull code path as well.
This also matches docker behavior which first checks whether we're
allowed to actually pull an image before looking into local storage.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This patch fixes the following command:
kubectl run -i --tty centos --image=centos -- sh
The command above use to fail with:
/usr/bin/sh: /usr/bin/sh: cannot execute binary file
That's because we were wrongly assembling the OCI processArgs.
Thanks @alexlarsson for spotting this.
This patch basically replicates what docker does when merging container
config and image config. It also replicates how docker sets processArgs
for the OCI runtime.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
A goroutine is started to forward terminal resize requests
from the resize channel. Also, data is copied back/forth
between stdin, stdout, stderr streams and the attach socket
for the container.
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
The bug is silly if you have a master/node cluster where node is on a
different machine than the master.
The current behavior is to give our addresses like "0.0.0.0:10101". If
you run "kubectl exec ..." from another host, that's not going to work
since on a different host 0.0.0.0 resolves to localhost and kubectl
exec fails with:
error: unable to upgrade connection: 404 page not found
This patch fixes the above by giving our correct addresses for reaching
from outside.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Some runtimes like Clear Containers need to interpret the CRI-O
annotations, to distinguish the infra container from the regular one.
Here we export those annotations and use a more standard dotted
namespace for them.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
node-e2e tests were failing in RHEL because, if running a privileged
container, we get all capability in the spec. The spec generator wasn't
filtering caps based on actual host caps, it was just adding _everything_.
This patch makes spec generator host specific.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
`containerdID` is overridden in `s.ctrIDIndex.Get()`, if the ctr is not
found it's overridden by an empty string making the error return
totally unusable.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
During "Port forwarding" e2e tests, the following panic happened:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x64981d]
goroutine 52788 [running]:
panic(0x1830ee0, 0xc4200100c0)
/usr/lib/golang/src/runtime/panic.go:500 +0x1a1
github.com/kubernetes-incubator/cri-o/oci.(*Runtime).UpdateStatus(0xc4202afc00,
0x0, 0x0, 0x0)
/home/amurdaca/go/src/github.com/kubernetes-incubator/cri-o/oci/oci.go:549
+0x7d
github.com/kubernetes-incubator/cri-o/server.streamService.PortForward(0xc42026e000,
0x0, 0x0, 0x0, 0x0, 0xc420d9af40, 0x40, 0xc400000050, 0x7fe660659a28,
0xc4201cd0e0, ...)
```
The issue is `streamService.PortForward` assumed the first argument to
be the sandbox's infra container ID, thus trying to get it from memory
store using `.state.containers.Get`. Since that ID is of the sandbox
itself, it fails to get the container object from memory and panics in
`UpdateStatus`.
Fix it by looking for the sandbox's infra container ID starting from a
sandbox ID.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
If we create a container using the image ID like
771cd5947d5ea4bf8e8f4900dd357dbb67e7b16486c270f8274087d182d457c6, then
a call to container_status will return that same ID for the "Image"
field in ContainerStatusResponse.
This patch matches dockershim behavior and return the first tagged name
if available from the image store.
This is also needed to fix a failure in k8s e2d tests.
Reference:
https://github.com/kubernetes/kubernetes/pull/39298/files#diff-c7dd39479fd733354254e70845075db5R369
Reference:
67a5bf8454/test/e2e/framework/util.go (L1941)
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The ocid project was renamed to CRI-O, months ago, it is time that we moved
all of the code to the new name. We want to elminate the name ocid from use.
Move fully to crio.
Also cric is being renamed to crioctl for the time being.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Two issues:
1) pod Namespace was always set to "", which prevents plugins from figuring out
what the actual pod is, and from getting more info about that pod from the
runtime via out-of-band mechanisms
2) the pod Name and ID arguments were switched, further preventing #1
Signed-off-by: Dan Williams <dcbw@redhat.com>