linux-stable/security/commoncap.c
David Howells d84f4f992c CRED: Inaugurate COW credentials
Inaugurate copy-on-write credentials management.  This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const.  The purpose of this is compile-time
discouragement of altering credentials through those pointers.  Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

  (1) Its reference count may incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials and
     temporarily override the ones currently on the acting thread, whilst
     preventing interference from other threads by holding cred_replace_mutex
     on the thread being dumped.

     This will be replaced in a future patch by something that hands down the
     credentials directly to the functions being called, rather than altering
     the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

     	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

     	 New.  This is passed a pointer to the new creds, a pointer to the old
     	 creds and the proposed capability sets.  It should fill in the new
     	 creds or return an error.  All pointers, barring the pointer to the
     	 new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

     	 Changed; now returns a value, which will cause the process to be
     	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

     	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

     	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

     	 New. Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

     	 New. Apply any security effects for the upcoming installation of new
     	 security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

     	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

     	 Fix up the proposed new credentials for setuid().  This is used by
     	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
     	 setuid() changes.  Changes are made to the new credentials, rather
     	 than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

     	 Removed.  Instead the task being reparented to init is referred
     	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

     	 Changed.  These now take cred pointers rather than task pointers to
     	 refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply using
     commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently applicable
     user_struct which it then passes into the sigqueue struct it returns if
     successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before applying
     it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it doesn't
     want to handle a function, or otherwise return the return value directly
     rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials, even
     if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
     	 all been dropped and built in to the credentials functions directly.
     	 They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
     	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
     	 thread keyring if its parent had one, otherwise it discards the thread
     	 keyring.

     (d) The authorisation key now points directly to the credentials to extend
     	 the search into rather pointing to the task that carries them.

     (e) Installing thread, process or session keyrings causes a new set of
     	 credentials to be created, even though it's not strictly necessary for
     	 process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This set
     of credentials is derived from init_cred and installed on the new process
     after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been used.  A
     special cred function (prepare_usermodeinfo_creds()) is provided
     specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
     	 current ptracer can access processes with the new SID inside the lock
     	 that covers getting the ptracer's SID.  Whilst this lock ensures that
     	 the check is done with the ptracer pinned, the result is only valid
     	 until the lock is released, so there's no point doing it inside the
     	 lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put into
     a file of its own in the lib/ directory as join_session_keyring() now
     wants to use it too.

     The code in SELinux just checked to see whether a task shared mm_structs
     with other tasks (CLONE_VM), but that isn't good enough.  We really want
     to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the credentials
     down to the functions it calls, but it can't do that until other patches
     in this series have been applied.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:23 +11:00

754 lines
18 KiB
C

/* Common capabilities, needed by capability.o and root_plug.o
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
*/
#include <linux/capability.h>
#include <linux/audit.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/security.h>
#include <linux/file.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
#include <linux/ptrace.h>
#include <linux/xattr.h>
#include <linux/hugetlb.h>
#include <linux/mount.h>
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
{
NETLINK_CB(skb).eff_cap = current_cap();
return 0;
}
int cap_netlink_recv(struct sk_buff *skb, int cap)
{
if (!cap_raised(NETLINK_CB(skb).eff_cap, cap))
return -EPERM;
return 0;
}
EXPORT_SYMBOL(cap_netlink_recv);
/*
* NOTE WELL: cap_capable() cannot be used like the kernel's capable()
* function. That is, it has the reverse semantics: cap_capable()
* returns 0 when a task has a capability, but the kernel's capable()
* returns 1 for this case.
*/
int cap_capable(struct task_struct *tsk, int cap, int audit)
{
__u32 cap_raised;
/* Derived from include/linux/sched.h:capable. */
rcu_read_lock();
cap_raised = cap_raised(__task_cred(tsk)->cap_effective, cap);
rcu_read_unlock();
return cap_raised ? 0 : -EPERM;
}
int cap_settime(struct timespec *ts, struct timezone *tz)
{
if (!capable(CAP_SYS_TIME))
return -EPERM;
return 0;
}
int cap_ptrace_may_access(struct task_struct *child, unsigned int mode)
{
int ret = 0;
rcu_read_lock();
if (!cap_issubset(__task_cred(child)->cap_permitted,
current_cred()->cap_permitted) &&
!capable(CAP_SYS_PTRACE))
ret = -EPERM;
rcu_read_unlock();
return ret;
}
int cap_ptrace_traceme(struct task_struct *parent)
{
int ret = 0;
rcu_read_lock();
if (!cap_issubset(current_cred()->cap_permitted,
__task_cred(parent)->cap_permitted) &&
!has_capability(parent, CAP_SYS_PTRACE))
ret = -EPERM;
rcu_read_unlock();
return ret;
}
int cap_capget (struct task_struct *target, kernel_cap_t *effective,
kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
const struct cred *cred;
/* Derived from kernel/capability.c:sys_capget. */
rcu_read_lock();
cred = __task_cred(target);
*effective = cred->cap_effective;
*inheritable = cred->cap_inheritable;
*permitted = cred->cap_permitted;
rcu_read_unlock();
return 0;
}
#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
static inline int cap_inh_is_capped(void)
{
/*
* Return 1 if changes to the inheritable set are limited
* to the old permitted set. That is, if the current task
* does *not* possess the CAP_SETPCAP capability.
*/
return cap_capable(current, CAP_SETPCAP, SECURITY_CAP_AUDIT) != 0;
}
static inline int cap_limit_ptraced_target(void) { return 1; }
#else /* ie., ndef CONFIG_SECURITY_FILE_CAPABILITIES */
static inline int cap_inh_is_capped(void) { return 1; }
static inline int cap_limit_ptraced_target(void)
{
return !capable(CAP_SETPCAP);
}
#endif /* def CONFIG_SECURITY_FILE_CAPABILITIES */
int cap_capset(struct cred *new,
const struct cred *old,
const kernel_cap_t *effective,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted)
{
if (cap_inh_is_capped() &&
!cap_issubset(*inheritable,
cap_combine(old->cap_inheritable,
old->cap_permitted)))
/* incapable of using this inheritable set */
return -EPERM;
if (!cap_issubset(*inheritable,
cap_combine(old->cap_inheritable,
old->cap_bset)))
/* no new pI capabilities outside bounding set */
return -EPERM;
/* verify restrictions on target's new Permitted set */
if (!cap_issubset(*permitted, old->cap_permitted))
return -EPERM;
/* verify the _new_Effective_ is a subset of the _new_Permitted_ */
if (!cap_issubset(*effective, *permitted))
return -EPERM;
new->cap_effective = *effective;
new->cap_inheritable = *inheritable;
new->cap_permitted = *permitted;
return 0;
}
static inline void bprm_clear_caps(struct linux_binprm *bprm)
{
cap_clear(bprm->cap_post_exec_permitted);
bprm->cap_effective = false;
}
#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
int cap_inode_need_killpriv(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
int error;
if (!inode->i_op || !inode->i_op->getxattr)
return 0;
error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
if (error <= 0)
return 0;
return 1;
}
int cap_inode_killpriv(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
if (!inode->i_op || !inode->i_op->removexattr)
return 0;
return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS);
}
static inline int bprm_caps_from_vfs_caps(struct cpu_vfs_cap_data *caps,
struct linux_binprm *bprm)
{
unsigned i;
int ret = 0;
if (caps->magic_etc & VFS_CAP_FLAGS_EFFECTIVE)
bprm->cap_effective = true;
else
bprm->cap_effective = false;
CAP_FOR_EACH_U32(i) {
__u32 permitted = caps->permitted.cap[i];
__u32 inheritable = caps->inheritable.cap[i];
/*
* pP' = (X & fP) | (pI & fI)
*/
bprm->cap_post_exec_permitted.cap[i] =
(current->cred->cap_bset.cap[i] & permitted) |
(current->cred->cap_inheritable.cap[i] & inheritable);
if (permitted & ~bprm->cap_post_exec_permitted.cap[i]) {
/*
* insufficient to execute correctly
*/
ret = -EPERM;
}
}
/*
* For legacy apps, with no internal support for recognizing they
* do not have enough capabilities, we return an error if they are
* missing some "forced" (aka file-permitted) capabilities.
*/
return bprm->cap_effective ? ret : 0;
}
int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps)
{
struct inode *inode = dentry->d_inode;
__u32 magic_etc;
unsigned tocopy, i;
int size;
struct vfs_cap_data caps;
memset(cpu_caps, 0, sizeof(struct cpu_vfs_cap_data));
if (!inode || !inode->i_op || !inode->i_op->getxattr)
return -ENODATA;
size = inode->i_op->getxattr((struct dentry *)dentry, XATTR_NAME_CAPS, &caps,
XATTR_CAPS_SZ);
if (size == -ENODATA || size == -EOPNOTSUPP) {
/* no data, that's ok */
return -ENODATA;
}
if (size < 0)
return size;
if (size < sizeof(magic_etc))
return -EINVAL;
cpu_caps->magic_etc = magic_etc = le32_to_cpu(caps.magic_etc);
switch ((magic_etc & VFS_CAP_REVISION_MASK)) {
case VFS_CAP_REVISION_1:
if (size != XATTR_CAPS_SZ_1)
return -EINVAL;
tocopy = VFS_CAP_U32_1;
break;
case VFS_CAP_REVISION_2:
if (size != XATTR_CAPS_SZ_2)
return -EINVAL;
tocopy = VFS_CAP_U32_2;
break;
default:
return -EINVAL;
}
CAP_FOR_EACH_U32(i) {
if (i >= tocopy)
break;
cpu_caps->permitted.cap[i] = le32_to_cpu(caps.data[i].permitted);
cpu_caps->inheritable.cap[i] = le32_to_cpu(caps.data[i].inheritable);
}
return 0;
}
/* Locate any VFS capabilities: */
static int get_file_caps(struct linux_binprm *bprm)
{
struct dentry *dentry;
int rc = 0;
struct cpu_vfs_cap_data vcaps;
bprm_clear_caps(bprm);
if (!file_caps_enabled)
return 0;
if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
return 0;
dentry = dget(bprm->file->f_dentry);
rc = get_vfs_caps_from_disk(dentry, &vcaps);
if (rc < 0) {
if (rc == -EINVAL)
printk(KERN_NOTICE "%s: get_vfs_caps_from_disk returned %d for %s\n",
__func__, rc, bprm->filename);
else if (rc == -ENODATA)
rc = 0;
goto out;
}
rc = bprm_caps_from_vfs_caps(&vcaps, bprm);
out:
dput(dentry);
if (rc)
bprm_clear_caps(bprm);
return rc;
}
#else
int cap_inode_need_killpriv(struct dentry *dentry)
{
return 0;
}
int cap_inode_killpriv(struct dentry *dentry)
{
return 0;
}
static inline int get_file_caps(struct linux_binprm *bprm)
{
bprm_clear_caps(bprm);
return 0;
}
#endif
int cap_bprm_set_security (struct linux_binprm *bprm)
{
int ret;
ret = get_file_caps(bprm);
if (!issecure(SECURE_NOROOT)) {
/*
* To support inheritance of root-permissions and suid-root
* executables under compatibility mode, we override the
* capability sets for the file.
*
* If only the real uid is 0, we do not set the effective
* bit.
*/
if (bprm->e_uid == 0 || current_uid() == 0) {
/* pP' = (cap_bset & ~0) | (pI & ~0) */
bprm->cap_post_exec_permitted = cap_combine(
current->cred->cap_bset,
current->cred->cap_inheritable);
bprm->cap_effective = (bprm->e_uid == 0);
ret = 0;
}
}
return ret;
}
int cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
{
const struct cred *old = current_cred();
struct cred *new;
new = prepare_creds();
if (!new)
return -ENOMEM;
if (bprm->e_uid != old->uid || bprm->e_gid != old->gid ||
!cap_issubset(bprm->cap_post_exec_permitted,
old->cap_permitted)) {
set_dumpable(current->mm, suid_dumpable);
current->pdeath_signal = 0;
if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
if (!capable(CAP_SETUID)) {
bprm->e_uid = old->uid;
bprm->e_gid = old->gid;
}
if (cap_limit_ptraced_target()) {
bprm->cap_post_exec_permitted = cap_intersect(
bprm->cap_post_exec_permitted,
new->cap_permitted);
}
}
}
new->suid = new->euid = new->fsuid = bprm->e_uid;
new->sgid = new->egid = new->fsgid = bprm->e_gid;
/* For init, we want to retain the capabilities set
* in the init_task struct. Thus we skip the usual
* capability rules */
if (!is_global_init(current)) {
new->cap_permitted = bprm->cap_post_exec_permitted;
if (bprm->cap_effective)
new->cap_effective = bprm->cap_post_exec_permitted;
else
cap_clear(new->cap_effective);
}
/*
* Audit candidate if current->cap_effective is set
*
* We do not bother to audit if 3 things are true:
* 1) cap_effective has all caps
* 2) we are root
* 3) root is supposed to have all caps (SECURE_NOROOT)
* Since this is just a normal root execing a process.
*
* Number 1 above might fail if you don't have a full bset, but I think
* that is interesting information to audit.
*/
if (!cap_isclear(new->cap_effective)) {
if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
bprm->e_uid != 0 || new->uid != 0 ||
issecure(SECURE_NOROOT))
audit_log_bprm_fcaps(bprm, new, old);
}
new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
return commit_creds(new);
}
int cap_bprm_secureexec (struct linux_binprm *bprm)
{
const struct cred *cred = current_cred();
if (cred->uid != 0) {
if (bprm->cap_effective)
return 1;
if (!cap_isclear(bprm->cap_post_exec_permitted))
return 1;
}
return (cred->euid != cred->uid ||
cred->egid != cred->gid);
}
int cap_inode_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags)
{
if (!strcmp(name, XATTR_NAME_CAPS)) {
if (!capable(CAP_SETFCAP))
return -EPERM;
return 0;
} else if (!strncmp(name, XATTR_SECURITY_PREFIX,
sizeof(XATTR_SECURITY_PREFIX) - 1) &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
}
int cap_inode_removexattr(struct dentry *dentry, const char *name)
{
if (!strcmp(name, XATTR_NAME_CAPS)) {
if (!capable(CAP_SETFCAP))
return -EPERM;
return 0;
} else if (!strncmp(name, XATTR_SECURITY_PREFIX,
sizeof(XATTR_SECURITY_PREFIX) - 1) &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
}
/* moved from kernel/sys.c. */
/*
* cap_emulate_setxuid() fixes the effective / permitted capabilities of
* a process after a call to setuid, setreuid, or setresuid.
*
* 1) When set*uiding _from_ one of {r,e,s}uid == 0 _to_ all of
* {r,e,s}uid != 0, the permitted and effective capabilities are
* cleared.
*
* 2) When set*uiding _from_ euid == 0 _to_ euid != 0, the effective
* capabilities of the process are cleared.
*
* 3) When set*uiding _from_ euid != 0 _to_ euid == 0, the effective
* capabilities are set to the permitted capabilities.
*
* fsuid is handled elsewhere. fsuid == 0 and {r,e,s}uid!= 0 should
* never happen.
*
* -astor
*
* cevans - New behaviour, Oct '99
* A process may, via prctl(), elect to keep its capabilities when it
* calls setuid() and switches away from uid==0. Both permitted and
* effective sets will be retained.
* Without this change, it was impossible for a daemon to drop only some
* of its privilege. The call to setuid(!=0) would drop all privileges!
* Keeping uid 0 is not an option because uid 0 owns too many vital
* files..
* Thanks to Olaf Kirch and Peter Benie for spotting this.
*/
static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
{
if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
(new->uid != 0 && new->euid != 0 && new->suid != 0) &&
!issecure(SECURE_KEEP_CAPS)) {
cap_clear(new->cap_permitted);
cap_clear(new->cap_effective);
}
if (old->euid == 0 && new->euid != 0)
cap_clear(new->cap_effective);
if (old->euid != 0 && new->euid == 0)
new->cap_effective = new->cap_permitted;
}
int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
{
switch (flags) {
case LSM_SETID_RE:
case LSM_SETID_ID:
case LSM_SETID_RES:
/* Copied from kernel/sys.c:setreuid/setuid/setresuid. */
if (!issecure(SECURE_NO_SETUID_FIXUP))
cap_emulate_setxuid(new, old);
break;
case LSM_SETID_FS:
/* Copied from kernel/sys.c:setfsuid. */
/*
* FIXME - is fsuser used for all CAP_FS_MASK capabilities?
* if not, we might be a bit too harsh here.
*/
if (!issecure(SECURE_NO_SETUID_FIXUP)) {
if (old->fsuid == 0 && new->fsuid != 0) {
new->cap_effective =
cap_drop_fs_set(new->cap_effective);
}
if (old->fsuid != 0 && new->fsuid == 0) {
new->cap_effective =
cap_raise_fs_set(new->cap_effective,
new->cap_permitted);
}
}
break;
default:
return -EINVAL;
}
return 0;
}
#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
/*
* Rationale: code calling task_setscheduler, task_setioprio, and
* task_setnice, assumes that
* . if capable(cap_sys_nice), then those actions should be allowed
* . if not capable(cap_sys_nice), but acting on your own processes,
* then those actions should be allowed
* This is insufficient now since you can call code without suid, but
* yet with increased caps.
* So we check for increased caps on the target process.
*/
static int cap_safe_nice(struct task_struct *p)
{
int is_subset;
rcu_read_lock();
is_subset = cap_issubset(__task_cred(p)->cap_permitted,
current_cred()->cap_permitted);
rcu_read_unlock();
if (!is_subset && !capable(CAP_SYS_NICE))
return -EPERM;
return 0;
}
int cap_task_setscheduler (struct task_struct *p, int policy,
struct sched_param *lp)
{
return cap_safe_nice(p);
}
int cap_task_setioprio (struct task_struct *p, int ioprio)
{
return cap_safe_nice(p);
}
int cap_task_setnice (struct task_struct *p, int nice)
{
return cap_safe_nice(p);
}
/*
* called from kernel/sys.c for prctl(PR_CABSET_DROP)
* done without task_capability_lock() because it introduces
* no new races - i.e. only another task doing capget() on
* this task could get inconsistent info. There can be no
* racing writer bc a task can only change its own caps.
*/
static long cap_prctl_drop(struct cred *new, unsigned long cap)
{
if (!capable(CAP_SETPCAP))
return -EPERM;
if (!cap_valid(cap))
return -EINVAL;
cap_lower(new->cap_bset, cap);
return 0;
}
#else
int cap_task_setscheduler (struct task_struct *p, int policy,
struct sched_param *lp)
{
return 0;
}
int cap_task_setioprio (struct task_struct *p, int ioprio)
{
return 0;
}
int cap_task_setnice (struct task_struct *p, int nice)
{
return 0;
}
#endif
int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5)
{
struct cred *new;
long error = 0;
new = prepare_creds();
if (!new)
return -ENOMEM;
switch (option) {
case PR_CAPBSET_READ:
error = -EINVAL;
if (!cap_valid(arg2))
goto error;
error = !!cap_raised(new->cap_bset, arg2);
goto no_change;
#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
case PR_CAPBSET_DROP:
error = cap_prctl_drop(new, arg2);
if (error < 0)
goto error;
goto changed;
/*
* The next four prctl's remain to assist with transitioning a
* system from legacy UID=0 based privilege (when filesystem
* capabilities are not in use) to a system using filesystem
* capabilities only - as the POSIX.1e draft intended.
*
* Note:
*
* PR_SET_SECUREBITS =
* issecure_mask(SECURE_KEEP_CAPS_LOCKED)
* | issecure_mask(SECURE_NOROOT)
* | issecure_mask(SECURE_NOROOT_LOCKED)
* | issecure_mask(SECURE_NO_SETUID_FIXUP)
* | issecure_mask(SECURE_NO_SETUID_FIXUP_LOCKED)
*
* will ensure that the current process and all of its
* children will be locked into a pure
* capability-based-privilege environment.
*/
case PR_SET_SECUREBITS:
error = -EPERM;
if ((((new->securebits & SECURE_ALL_LOCKS) >> 1)
& (new->securebits ^ arg2)) /*[1]*/
|| ((new->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/
|| (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/
|| (cap_capable(current, CAP_SETPCAP, SECURITY_CAP_AUDIT) != 0) /*[4]*/
/*
* [1] no changing of bits that are locked
* [2] no unlocking of locks
* [3] no setting of unsupported bits
* [4] doing anything requires privilege (go read about
* the "sendmail capabilities bug")
*/
)
/* cannot change a locked bit */
goto error;
new->securebits = arg2;
goto changed;
case PR_GET_SECUREBITS:
error = new->securebits;
goto no_change;
#endif /* def CONFIG_SECURITY_FILE_CAPABILITIES */
case PR_GET_KEEPCAPS:
if (issecure(SECURE_KEEP_CAPS))
error = 1;
goto no_change;
case PR_SET_KEEPCAPS:
error = -EINVAL;
if (arg2 > 1) /* Note, we rely on arg2 being unsigned here */
goto error;
error = -EPERM;
if (issecure(SECURE_KEEP_CAPS_LOCKED))
goto error;
if (arg2)
new->securebits |= issecure_mask(SECURE_KEEP_CAPS);
else
new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
goto changed;
default:
/* No functionality available - continue with default */
error = -ENOSYS;
goto error;
}
/* Functionality provided */
changed:
return commit_creds(new);
no_change:
error = 0;
error:
abort_creds(new);
return error;
}
int cap_syslog (int type)
{
if ((type != 3 && type != 10) && !capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
}
int cap_vm_enough_memory(struct mm_struct *mm, long pages)
{
int cap_sys_admin = 0;
if (cap_capable(current, CAP_SYS_ADMIN, SECURITY_CAP_NOAUDIT) == 0)
cap_sys_admin = 1;
return __vm_enough_memory(mm, pages, cap_sys_admin);
}