Instead of waiting until stderr/out is closed and then waiting for
the container to exit we wait for the container to exit in the
gmainloop, in addition to everything else, exiting only when
the child dies.
We then drain any output in stderr/out after the child has exited.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
We build paths using g_build_filename and g_strdup_printf() instead
which means we don't have any arbitrary pathname lenght issue, and
the code becomes cleaner.
We also convert asprintf to g_strdup_printf so that we can use
the glib OOM checker instead of open coding it everywhere.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This moves the timeout handling from the go code to conmon, whic
removes some of the complexity from criod, and additionally it will
makes it possible to do the double-fork in the exec case too.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Currently, when creating containers we never call Wait on the
conmon exec.Command, which means that the child hangs around
forever as a zombie after it dies.
However, instead of doing this waitpid() in the parent we instead
do a double-fork in conmon, to daemonize it. That makes a lot of
sense, as conmon really is not tied to the launcher, but needs
to outlive it if e.g. the cri-o daemon restarts.
However, this makes even more obvious a race condition which we
already have. When crio-d puts the conmon pid in a cgroup there
is a race where conmon could already have spawned a child, and
it would then not be part of the cgroup. In order to fix this
we add another synchronization pipe to conmon, which we block
on before we create any children. The parent then makes sure the
pid is in the cgroup before letting it continue.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever get the
data from one write operation. We use this for frameing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket are using at least the same size of buffer
as the write side, because otherwise the extra data in the message
will be dropped.
This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.
This fixes https://github.com/kubernetes-incubator/cri-o/issues/569
Signed-off-by: Alexander Larsson <alexl@redhat.com>
We don't want to block on accepting the terminal fd, because then
we can't detect if runc died before calling out to pass the terminal
fd. To handle this we spin the glib mainloop listening to both the
terminal accept fd and a child pid watch.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
conmon.c fails to build on Ubuntu:
cc -std=c99 -Os -Wall -Wextra -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -c -o conmon.o conmon.c
In file included from /usr/include/fcntl.h:289:0,
from conmon.c:4:
In function ‘open’,
inlined from ‘main’ at conmon.c:519:10:
/usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:4: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
__open_missing_mode ();
^
<builtin>: recipe for target 'conmon.o' failed
make[1]: *** [conmon.o] Error 1
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is not actually read uninitialized, its just that the compiler
can't detect this, but we initilize it anyway to silence the compiler.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
json-glib is a fine library for parsing json. However, all we need
to do is generate some trivial json output, so it is not needed.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
We want to avoid inheriting these into the child. Doing so is both
confusing for the child, and a potential security issue if the
container has access to FDs that are from the outside of the
container.
Some of these are created after we fork for the child, so they
are not technically necessary. However, its best to do this as
we may change the code in the future and forget about this.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This means we don't have to spawn via a shell, but it also
means we do the right thing for any input that would have
needed to be escaped. For instance if the container name had
a $ in i, or even worse, a back-quote!
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The buffer is used to read from the stderr/stdout stream, which
can easily be larger than 256 bytes. With a larger buffer we will
do fewer, larger reads, which is more efficient. And 8k more stack
size use is not really a problem.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The code as is doesn't handle merged controllers.
For instance, I have this in my /proc/self/cgrous:
4:cpu,cpuacct:/user.slice/user-0.slice/session-4.scope
The current code fails to match "cpuacct" wit this line, and
additionally it just does a prefix match so if you were looking
for say "cpu", it would match this:
2:cpuset:/
I also removed some ninfo spew that didn't seem very useful.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Rather than writing the logs with one write per line, use writev()
to write multiple lines in one call. Additionally, this avoids
using dprintf() when writing to the log, which is nice because that
doesn't correctly handle partial writes or ENOINTR.
This also changes set_k8s_timestamp to add the pipe to the reused
buffer so that we don't have to append it on each line.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Any write could be interupted by EINTR if we get some kind of signal,
which means we could be either reporting a EINTR error or a partial
write (if some data was written). Its also generally good to handle
partial writes correctly, as they can happen e.g. when writing to
full pipes.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
conmon has many flags that are parsed when it's executed, one of them
is "-c". During PR #510 where we vendor latest kube master code,
upstream has changed a test to call a "ctr execsync" with a command of
"sh -c commmand ...".
Turns out:
a) conmon has a "-c" flag which refers to the container name/id
b) the exec command has a "-c" flags but it's for "sh"
That leads to conmon parsing the second "-c" flags from the exec
command causing an error. The executed command looks like:
conmon -c [..other flags..] CONTAINERID -e sh -c echo hello world
This patch rewrites the exec sync code to not pass down to conmon the
exec command via command line. Rather, we're now creating an OCI runtime
process spec in a temp file, pass _the path_ down to conmon, and have
runc exec the command using "runc exec --process
/path/to/process-spec.json CONTAINERID". This is far better in which we
don't need to bother anymore about conflicts with flags in conmon.
Added and fixed some tests also.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The ocid project was renamed to CRI-O, months ago, it is time that we moved
all of the code to the new name. We want to elminate the name ocid from use.
Move fully to crio.
Also cric is being renamed to crioctl for the time being.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Now that conmon splits std{out,err} for !terminal containers, ExecSync
can parse that output to return the correct std{out,err} split to the
kubelet. Invalid log lines are ignored but complained about.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
While it's not currently possible to do this for terminal=true
containers, for !terminal containers we can create separate pipes for
stdout and stderr, and then log them separately. This is required for
k8s's conformance tests.
Signed-off-by: Aleksa Sarai <asarai@suse.de>