update vendor

Signed-off-by: Jess Frazelle <acidburn@microsoft.com>
2018-09-25 12:27:46 -04:00 · 2018-09-25 12:27:46 -04:00 · 94d1cfbfbf
commit 94d1cfbfbf
parent 19a32db84d
10501 changed files with 2307943 additions and 29279 deletions
--- a/vendor/github.com/opencontainers/runc/libcontainer/nsenter/README.md
+++ b/vendor/github.com/opencontainers/runc/libcontainer/nsenter/README.md
@ -10,8 +10,8 @@ The `nsenter` package will `import "C"` and it uses [cgo](https://golang.org/cmd
 package. In cgo, if the import of "C" is immediately preceded by a comment, that comment, 
 called the preamble, is used as a header when compiling the C parts of the package.
 So every time we  import package `nsenter`, the C code function `nsexec()` would be 
-called. And package `nsenter` is now only imported in `main_unix.go`, so every time
-before we call `cmd.Start` on linux, that C code would run.
+called. And package `nsenter` is only imported in `init.go`, so every time the runc
+`init` command is invoked, that C code is run.

 Because `nsexec()` must be run before the Go runtime in order to use the
 Linux kernel namespace, you must `import` this library into a package if
@ -37,7 +37,7 @@ the parent `nsexec()` will exit and the child `nsexec()` process will
 return to allow the Go runtime take over.

 NOTE: We do both `setns(2)` and `clone(2)` even if we don't have any
-CLONE_NEW* clone flags because we must fork a new process in order to
+`CLONE_NEW*` clone flags because we must fork a new process in order to
 enter the PID namespace.


--- a/vendor/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c
+++ b/vendor/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c
@ -211,7 +211,7 @@ static int try_mapping_tool(const char *app, int pid, char *map, size_t map_len)

 	/*
 	 * If @app is NULL, execve will segfault. Just check it here and bail (if
-	 * we're in this path, the caller is already getting desparate and there
+	 * we're in this path, the caller is already getting desperate and there
 	 * isn't a backup to this failing). This usually would be a configuration
 	 * or programming issue.
 	 */
@ -505,7 +505,8 @@ void join_namespaces(char *nslist)

 		ns->fd = fd;
 		ns->ns = nsflag(namespace);
-		strncpy(ns->path, path, PATH_MAX);
+		strncpy(ns->path, path, PATH_MAX - 1);
+		ns->path[PATH_MAX - 1] = '\0';
 	} while ((namespace = strtok_r(NULL, ",", &saveptr)) != NULL);

 	/*
@ -678,17 +679,15 @@ void nsexec(void)
 					/*
 					 * Enable setgroups(2) if we've been asked to. But we also
 					 * have to explicitly disable setgroups(2) if we're
-					 * creating a rootless container (this is required since
-					 * Linux 3.19).
+					 * creating a rootless container for single-entry mapping.
+					 * i.e. config.is_setgroup == false.
+					 * (this is required since Linux 3.19).
+					 *
+					 * For rootless multi-entry mapping, config.is_setgroup shall be true and
+					 * newuidmap/newgidmap shall be used.
 					 */
-					if (config.is_rootless && config.is_setgroup) {
-						kill(child, SIGKILL);
-						bail("cannot allow setgroup in an unprivileged user namespace setup");
-					}

-					if (config.is_setgroup)
-						update_setgroups(child, SETGROUPS_ALLOW);
-					if (config.is_rootless)
+					if (config.is_rootless && !config.is_setgroup)
 						update_setgroups(child, SETGROUPS_DENY);

 					/* Set up mappings. */
@ -809,25 +808,30 @@ void nsexec(void)
 			if (config.namespaces)
 				join_namespaces(config.namespaces);

-			/*
-			 * Unshare all of the namespaces. Now, it should be noted that this
-			 * ordering might break in the future (especially with rootless
-			 * containers). But for now, it's not possible to split this into
-			 * CLONE_NEWUSER + [the rest] because of some RHEL SELinux issues.
-			 *
-			 * Note that we don't merge this with clone() because there were
-			 * some old kernel versions where clone(CLONE_PARENT | CLONE_NEWPID)
-			 * was broken, so we'll just do it the long way anyway.
-			 */
-			if (unshare(config.cloneflags) < 0)
-				bail("failed to unshare namespaces");
-
 			/*
 			 * Deal with user namespaces first. They are quite special, as they
 			 * affect our ability to unshare other namespaces and are used as
 			 * context for privilege checks.
+			 *
+			 * We don't unshare all namespaces in one go. The reason for this
+			 * is that, while the kernel documentation may claim otherwise,
+			 * there are certain cases where unsharing all namespaces at once
+			 * will result in namespace objects being owned incorrectly.
+			 * Ideally we should just fix these kernel bugs, but it's better to
+			 * be safe than sorry, and fix them separately.
+			 *
+			 * A specific case of this is that the SELinux label of the
+			 * internal kern-mount that mqueue uses will be incorrect if the
+			 * UTS namespace is cloned before the USER namespace is mapped.
+			 * I've also heard of similar problems with the network namespace
+			 * in some scenarios. This also mirrors how LXC deals with this
+			 * problem.
 			 */
 			if (config.cloneflags & CLONE_NEWUSER) {
+				if (unshare(CLONE_NEWUSER) < 0)
+					bail("failed to unshare user namespace");
+				config.cloneflags &= ~CLONE_NEWUSER;
+
 				/*
 				 * We don't have the privileges to do any mapping here (see the
 				 * clone_parent rant). So signal our parent to hook us up.
@ -853,8 +857,21 @@ void nsexec(void)
 					if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) < 0)
 						bail("failed to set process as dumpable");
 				}
+
+				/* Become root in the namespace proper. */
+				if (setresuid(0, 0, 0) < 0)
+					bail("failed to become root in user namespace");
 			}

+			/*
+			 * Unshare all of the namespaces. Note that we don't merge this
+			 * with clone() because there were some old kernel versions where
+			 * clone(CLONE_PARENT | CLONE_NEWPID) was broken, so we'll just do
+			 * it the long way.
+			 */
+			if (unshare(config.cloneflags) < 0)
+				bail("failed to unshare namespaces");
+
 			/*
 			 * TODO: What about non-namespace clone flags that we're dropping here?
 			 *