Commit graph

62 commits

Author SHA1 Message Date
Mrunal Patel
88037b143b Merge pull request from alexlarsson/conmon-reap-zombies
conmon: Don't leave zombies and fix cgroup race
2017-06-20 07:53:52 -07:00
Alexander Larsson
72686c78b4 fixup! conmon: Don't leave zombies and fix cgroup race
Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-20 12:18:07 +02:00
Antonio Murdaca
2014f0e14f Merge pull request from mrunalp/fix_terminal_settings
conmon: Set ONLCR for console
2017-06-16 10:17:15 +02:00
Mrunal Patel
bfd1b83f51 conmon: Modify console terminal settings to match kube settings
We enable ONLCR on the console to match kube's terminal settings.

Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-15 07:54:12 -07:00
Alexander Larsson
af4fbcd942 conmon: Don't leave zombies and fix cgroup race
Currently, when creating containers we never call Wait on the
conmon exec.Command, which means that the child hangs around
forever as a zombie after it dies.

However, instead of doing this waitpid() in the parent we instead
do a double-fork in conmon, to daemonize it. That makes a lot of
sense, as conmon really is not tied to the launcher, but needs
to outlive it if e.g. the cri-o daemon restarts.

However, this makes even more obvious a race condition which we
already have. When crio-d puts the conmon pid in a cgroup there
is a race where conmon could already have spawned a child, and
it would then not be part of the cgroup. In order to fix this
we add another synchronization pipe to conmon, which we block
on before we create any children. The parent then makes sure the
pid is in the cgroup before letting it continue.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-15 14:20:40 +02:00
Alexander Larsson
7bb957bf75 Implement non-terminal attach
We use a SOCK_SEQPACKET socket for the attach unix domain socket, which
means the kernel will ensure that the reading side only ever get the
data from one write operation. We use this for frameing, where the
first byte is the pipe that the next bytes are for. We have to make sure
that all reads from the socket are using at least the same size of buffer
as the write side, because otherwise the extra data in the message
will be dropped.

This also adds a stdin pipe for the container, similar to the ones we
use for stdout/err, because we need a way for an attached client
to write to stdin, even if not using a tty.

This fixes https://github.com/kubernetes-incubator/cri-o/issues/569

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-14 22:59:50 +02:00
Alexander Larsson
fcac68bf27 conmon: Handle runc exiting before passing terminal fd
We don't want to block on accepting the terminal fd, because then
we can't detect if runc died before calling out to pass the terminal
fd. To handle this we spin the glib mainloop listening to both the
terminal accept fd and a child pid watch.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-08 19:29:52 +02:00
Alexander Larsson
4494d82cfe conmon: Use glib mainloop instead of epoll
Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-08 16:21:20 +02:00
Mrunal Patel
6e53568d15 conmon: Close client on zero read from attach client
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-06 20:49:54 -07:00
Mrunal Patel
1a6825758c conmon: Add control fifo for terminal resize handling
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-06 07:36:52 -07:00
Mrunal Patel
065f12490c conmon: Add unix domain socket for attach
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-06 07:36:52 -07:00
Mrunal Patel
5c383d13d2 conmon: Add info/warn to syslog as well
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-06-06 07:36:52 -07:00
Samuel Ortiz
23ca7307e4 conmon: Fix Ubuntu build failure
conmon.c fails to build on Ubuntu:

cc -std=c99 -Os -Wall -Wextra -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include   -c -o conmon.o conmon.c
In file included from /usr/include/fcntl.h:289:0,
                 from conmon.c:4:
In function ‘open’,
    inlined from ‘main’ at conmon.c:519:10:
/usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:4: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
    __open_missing_mode ();
    ^
<builtin>: recipe for target 'conmon.o' failed
make[1]: *** [conmon.o] Error 1

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-06-03 01:37:24 +02:00
Mrunal Patel
5d9dcc8431 Add missing include for writev
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-06-02 10:29:50 -07:00
Alexander Larsson
2507ba6453 Remove json-glib in the remaining places
Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:18:27 +02:00
Alexander Larsson
f4b3e90141 conmon: Make console socket mode 0700
It doesn't make sense for other users to connect to this, so
lets make sure of this.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:26 +02:00
Alexander Larsson
f1b0f542e1 conmon: Silence uninitialized read compiler warning
This is not actually read uninitialized, its just that the compiler
can't detect this, but we initilize it anyway to silence the compiler.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:21 +02:00
Alexander Larsson
fe6f1f4786 conmon: Add -Os flag
This is what the other C code uses, and its nice to have as adding
any optimization flags enables a bunch of more warnings.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:15 +02:00
Alexander Larsson
1a168cb196 conmon: Drop json-glib dependency
json-glib is a fine library for parsing json. However, all we need
to do is generate some trivial json output, so it is not needed.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:10 +02:00
Alexander Larsson
f3408cbb5c conmon: Make all file descriptors CLOEXEC
We want to avoid inheriting these into the child. Doing so is both
confusing for the child, and a potential security issue if the
container has access to FDs that are from the outside of the
container.

Some of these are created after we fork for the child, so they
are not technically necessary. However, its best to do this as
we may change the code in the future and forget about this.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:06 +02:00
Alexander Larsson
829ec7f351 conmon: Build argv instead of commandline to spawn runtime
This means we don't have to spawn via a shell, but it also
means we do the right thing for any input that would have
needed to be escaped. For instance if the container name had
a $ in i, or even worse, a back-quote!

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:11:01 +02:00
Alexander Larsson
d2f09ef483 conmon: Increase buffer size
The buffer is used to read from the stderr/stdout stream, which
can easily be larger than 256 bytes. With a larger buffer we will
do fewer, larger reads, which is more efficient. And 8k more stack
size use is not really a problem.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:10:56 +02:00
Alexander Larsson
fe80f857ca conmon: Fix cgroup subsystem parsing
The code as is doesn't handle merged controllers.
For instance, I have this in my /proc/self/cgrous:

4:cpu,cpuacct:/user.slice/user-0.slice/session-4.scope

The current code fails to match "cpuacct" wit this line, and
additionally it just does a prefix match so if you were looking
for say "cpu", it would match this:

2:cpuset:/

I also removed some ninfo spew that didn't seem very useful.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:10:36 +02:00
Alexander Larsson
d34c5829f8 conmon: Write log in larger chunks
Rather than writing the logs with one write per line, use writev()
to write multiple lines in one call. Additionally, this avoids
using dprintf() when writing to the log, which is nice because that
doesn't correctly handle partial writes or ENOINTR.

This also changes set_k8s_timestamp to add the pipe to the reused
buffer so that we don't have to append it on each line.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:10:30 +02:00
Alexander Larsson
ae933d0d03 conmon: Handle EINTR and partial writes when writing
Any write could be interupted by EINTR if we get some kind of signal,
which means we could be either reporting a EINTR error or a partial
write (if some data was written). Its also generally good to handle
partial writes correctly, as they can happen e.g. when writing to
full pipes.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
2017-06-02 16:09:48 +02:00
Antonio Murdaca
b4251aebd8
execsync: rewrite to fix a bug in conmon
conmon has many flags that are parsed when it's executed, one of them
is "-c". During PR  where we vendor latest kube master code,
upstream has changed a test to call a "ctr execsync" with a command of
"sh -c commmand ...".
Turns out:

a) conmon has a "-c" flag which refers to the container name/id
b) the exec command has a "-c" flags but it's for "sh"

That leads to conmon parsing the second "-c" flags from the exec
command causing an error. The executed command looks like:

conmon -c [..other flags..] CONTAINERID -e sh -c echo hello world

This patch rewrites the exec sync code to not pass down to conmon the
exec command via command line. Rather, we're now creating an OCI runtime
process spec in a temp file, pass _the path_ down to conmon, and have
runc exec the command using "runc exec --process
/path/to/process-spec.json CONTAINERID". This is far better in which we
don't need to bother anymore about conflicts with flags in conmon.

Added and fixed some tests also.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-05-25 22:36:33 +02:00
Mrunal Patel
52b27da680 conmon: Disable OOM handling if cgroups not setup
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
7700a62347 conmon: Create oom file for container on OOM notification
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
46f6248e42 conmon: Add OOM eventfd to epoll monitoring list
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
ddb54bf614 conmon: Setup cgroups for container pid OOM notification
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
04ddb57ed7 conmon: Add helper function to get pid cgroup subsystem path
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
8e60251b29 conmon: Add helper for closing C stdlib FILEs
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-05-25 11:30:58 -07:00
Mrunal Patel
0a0533cdfc Capture errors from runtime create failures
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-05-15 13:35:18 -07:00
Dan Walsh
4493b6f176 Rename ocid to crio.
The ocid project was renamed to CRI-O, months ago, it is time that we moved
all of the code to the new name.  We want to elminate the name ocid from use.
Move fully to crio.

Also cric is being renamed to crioctl for the time being.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2017-05-12 09:56:06 -04:00
Mrunal Patel
84424d3829 Add nanoseconds to timestamp to make it RFC3339Nano
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2017-04-25 16:29:56 -07:00
Mrunal Patel
e395afe093 conmon: Fix logic for enabling systemd cgroups
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-04-21 14:20:17 -07:00
Aleksa Sarai
87faf98447
oci: make ExecSync handle split std{out,err}
Now that conmon splits std{out,err} for !terminal containers, ExecSync
can parse that output to return the correct std{out,err} split to the
kubelet. Invalid log lines are ignored but complained about.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-12 21:59:25 +10:00
Aleksa Sarai
d4c9f3e6dc
conmon: split std{out,err} pipe for !terminal containers
While it's not currently possible to do this for terminal=true
containers, for !terminal containers we can create separate pipes for
stdout and stderr, and then log them separately. This is required for
k8s's conformance tests.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-12 21:59:24 +10:00
Aleksa Sarai
35a6403604
*: build with C99
It's 2017, let's not stick with C89 (also for some reason the Travis
environment has a different -std= default value than my local machine).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-12 21:59:16 +10:00
Aleksa Sarai
afadd0aab9
conmon: handle multi-line logging
The CRI requires us to prepend (timestamp, stream) to every line of the
output, and it's quite likely (especially in the !terminal case) that we
will read more than one line of output in the read loop.

So, we need to write out each line separately with the prepended
timestamps. Doing this the simple way (the final part of the buffer is
written partially if it doesn't end in a newline) makes the code much
simpler, with the downside that if we ever switch to multiple streams
for output we'll have to rewrite parts of this.

In addition, drop the debugging output of cri-o for each chunk read so
we stop spamming stderr. We can do this now because 8a928d06e7
("oci: make ExecSync with ExitCode != 0 act properly") actually fixed
how ExecSync was being handled (especially in regards to this patch).

Fixes: 1dc4c87c93 ("conmon: add timestamps to logs")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-11 20:32:24 +10:00
Mrunal Patel
1dc4c87c93
conmon: add timestamps to logs
CRI requires us to timestamp our logs line-by-line by specifying whether
the line came from std{out,err} and the time at which the log was
recieved. This is a preliminary implementation of said behaviour
(without explicit newline handling at the moment).

Signed-off-by: Mrunal Patel <mpatel@redhat.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-05 02:45:57 +10:00
Aleksa Sarai
14a37fb407
conmon: use pipes rather than socketpairs for !terminal
While pipes have their downsides, it turns out that socketpair(2) will
break any program that tries to open /dev/std{out,err} for writing
(because they're symlinked to /proc/1/fd/{1,2} which will cause lots of
fun issues with sockets).

Signed-off-by: Mrunal Patel <mpatel@redhat.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-05 02:45:57 +10:00
Aleksa Sarai
c290c0d9c3
conmon: implement logging to logPath
This adds a very simple implementation of logging within conmon, where
every buffer read from the masterfd of the container is also written to
the log file (with errors during writing to the log file ignored).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-04-05 02:45:57 +10:00
Samuel Ortiz
d60d0ac0c3
conmon: Use conmon for exec'ing a command
Some OCI container runtimes (in particular the hypervisor
based ones) will typically create a shim process between
the hypervisor and the runtime caller, in order to not
rely on the hypervisor process for e.g. forwarding the
output streams or getting a command exit code.

With these runtimes we need to monitor a different process
than the runtime one when executing a command inside a
running container. The natural place to do so is conmon
and thus we add a new option to conmon for calling the
runtime exec command, monitor the PID and then return the
running command exit code through the sync pipe to the
parent.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-01-14 02:02:40 +01:00
Samuel Ortiz
468746aa28
conmon: Use the full PID file path
And not a hardcoded "pidfile".

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-01-14 02:02:37 +01:00
Samuel Ortiz
9a4a1092fe
conmon: Return the exit status code
waitpid fills its second argument with a value that
contains the process exit code in the 8 least significant
bits. Instead of returning the complete value and then
convert it from ocid, return the exit status directly
by using WEXITSTATUS from conmon.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-01-14 02:00:45 +01:00
Mrunal Patel
6df58df215 Add support for systemd cgroups
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2016-12-19 16:31:29 -08:00
Mrunal Patel
e790094f23 Merge pull request from sameo/master
Conmon fixes
2016-11-15 09:47:13 -08:00
Samuel Ortiz
b14bae4869 conmon: Add --bundle and --pidfile command line options
We need to be able pass both the bundle path and the pid file
paths to conmon from ocid.
The former is mandatory when creating an OCI container:

https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#create

And it makes sense to provide a full path for the latter as the
current hardcoded relative path may lead to errors if e.g. the
runtime chdir() before creating the PID file.

In both cases we try to create default reasonable values when
they are left empty by the caller.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-11-15 14:11:42 +01:00
Mrunal Patel
562f8ca684 Add syslog support
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2016-11-14 16:02:03 -08:00