It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful. As such, the shadowing
of /proc/kcore is still enabled with both LXC and native drivers.
Special care has to be taken with /proc/1/attr, which still needs to
be mounted read-write in order to enable the AppArmor profile. It is
bind-mounted from a private read-write mount of procfs.
All that enforcement is done in dockerinit. The code doing the real
work is in libcontainer. The init function for the LXC driver calls
the function from libcontainer to avoid code duplication.
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <jerome@docker.com> (github: jpetazzo)
Kernel capabilities for privileged syslog operations are currently splitted into
CAP_SYS_ADMIN and CAP_SYSLOG since the following commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ce6ada35bdf710d16582cc4869c26722547e6f11
This patch drops CAP_SYSLOG to prevent containers from messing with
host's syslog (e.g. `dmesg -c` clears up host's printk ring buffer).
Closes#5491
Docker-DCO-1.1-Signed-off-by: Eiichi Tsukata <devel@etsukata.com> (github: Etsukata)
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
No need to duplicate this information when we already have a
container.json file in the root of libcontainer
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This duplicates some of the Exec code but I think it it worth it because
the native driver is more straight forward and does not have the
complexity have handling the type issues for now.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This temp. expands the Exec method's signature but adds a more robust
way to know when the container's process is actually released and begins
to run. The network interfaces are not guaranteed to be up yet but this
provides a more accurate view with a single callback at this time.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Without this patch, containers inherit the open file descriptors of the daemon, so my "exec 42>&2" allows us to "echo >&42 some nasty error with some bad advice" directly into the daemon log. :)
Also, "hack/dind" was already doing this due to issues caused by the inheritance, so I'm removing that hack too since this patch obsoletes it by generalizing it for all containers.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)
This has every container using the docker daemon's pid for the processes
label so it does not work correctly.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
These are unnecessary since the user package handles these cases properly already (as evidenced by the LXC backend not having these special cases).
I also updated the errors returned to match the other libcontainer error messages in this same file.
Also, switching from Setresuid to Setuid directly isn't a problem, because the "setuid" system call will automatically do that if our own effective UID is root currently: (from `man 2 setuid`)
setuid() sets the effective user ID of the calling process. If the
effective UID of the caller is root, the real UID and saved set-user-
ID are also set.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)
The current load script does alot of things. If it does not find the
parser loaded on the system it will just exit 0 and not load the
profile. We think it should fail loudly if it cannot load the profile
and apparmor is enabled on the system.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
libcontainer.GetNamespace returns nil on FreeBSD because
libcontainer.namespaceList is empty. In this case, Namespaces#Get should
return nil instead of being panic.
Docker-DCO-1.1-Signed-off-by: Kato Kazuyoshi <kato.kazuyoshi@gmail.com> (github: kzys)
It seems that netlink in older kernels, including RHEL6, does not
support RTM_SETLINK with IFLA_MASTER. It just silently ignores it, reporting
no error, causing netlink.NetworkSetMaster() to not do anything yet
return no error.
We fix this by introducing and using AddToBridge() in a very similar manner
to CreateBridge(), which use the old ioctls directly.
This fixes https://github.com/dotcloud/docker/issues/4668
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
The variables that were defined at the top of the apparmor profile are best
pulled in via the <tunables/global> include.
Docker-DCO-1.1-Signed-off-by: Michael Brown <michael.brown@discourse.org> (github: Supermathie)
Add 'pid' variable pointing to 'self' to allow parsing of profile to succeed
Docker-DCO-1.1-Signed-off-by: Michael Brown <michael.brown@discourse.org> (github: Supermathie)
Encountered problems on 14.04 relating to signals between container
processes being blocked by apparmor. The base abstraction contains
appropriate rules to allow this communication.
Docker-DCO-1.1-Signed-off-by: Michael Brown <michael.brown@discourse.org> (github: Supermathie)
When the code attempts to set the ProcessLabel, it checks if SELinux Is
enabled. We have seen a case with some of our patches where the code
is fooled by the container to think that SELinux is not enabled. Calling
label.Init before setting up the rest of the container, tells the library that
SELinux is enabled and everything works fine.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This leaves only the generic cgroup helper functions in cgroups.go and
will allow easy implementations of other cgroup managers.
This also wires up the call to Cleanup the cgroup which was missing
before.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
docker will run the process(es) within the container with an SELinux label and will label
all of the content within the container with mount label. Any temporary file systems
created within the container need to be mounted with the same mount label.
The user can override the process label by specifying
-Z With a string of space separated options.
-Z "user=unconfined_u role=unconfined_r type=unconfined_t level=s0"
Would cause the process label to run with unconfined_u:unconfined_r:unconfined_t:s0"
By default the processes will run execute within the container as svirt_lxc_net_t.
All of the content in the container as svirt_sandbox_file_t.
The process mcs level is based of the PID of the docker process that is creating the container.
If you run the container in --priv mode, the labeling will be disabled.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
We need to change it to read only at the very end so that bound,
copy dev nodes and other ops do not fail.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This also adds an enabled field to the types so that they
can be easily toggled.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Remove loopback code from veth strategy
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Looback strategy: Get rid of uneeded code in Create
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Use append when building network strategy list
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Swap loopback and veth strategies in Networks list
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Revert "Swap loopback and veth strategies in Networks list"
This reverts commit 3b8b2c8454171d79bed5e9a80165172617e92fc7.
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
When initializing networks, only return from the loop if there is an error
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Someone probably got really used to typing er on the end of contain :)
Docker-DCO-1.1-Signed-off-by: Brandon Philips <brandon.philips@coreos.com> (github: philips)
The Capabilities field on libcontainer is actually used as a mask.
Rename the field so that this is more clear.
Docker-DCO-1.1-Signed-off-by: Brandon Philips <brandon.philips@coreos.com> (github: philips)
This moves the bind mounts like /.dockerinit, /etc/hostname, volumes,
etc into the container namespace, by setting them up using lxc.
This is useful to avoid littering the global namespace with a lot of
mounts that are internal to each container and are not generally
needed on the outside. In particular, it seems that having a lot of
mounts is problematic wrt scaling to a lot of containers on systems
where the root filesystem is mounted --rshared.
Note that the "private" option is only supported by the native driver, as
lxc doesn't support setting this. This is not a huge problem, but it does
mean that some mounts are unnecessarily shared inside the container if you're
using the lxc driver.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Use the DOCKER_RAMDISK env var to tell the native driver not to use
a pivot root when setting up the rootfs of a container.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)