Commit Graph

17 Commits

Author SHA1 Message Date
Linus Torvalds cdab10bf32 selinux/stable-5.16 PR 20211101
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGANbAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNaMBAAg+9gZr0F7xiafu8JFZqZfx/AQdJ2
 G2cn3le+/tXGZmF8m/+82lOaR6LeQLatgSDJNSkXWkKr0nRwseQJDbtRfvYJdn0t
 Ax05/Fmz6OGxQ2wgRYgaFiSrKpE5p3NhDtiLFVdkCJaQNe/8DZOc7NhBl6EjZf3x
 ubhl2hUiJ4AmiXGwcYhr4uKgP4nhW8OM1/OkskVi+bBMmLA8KTY9kslmIDP5E3BW
 29W4qhqeLNQupY5dGMEMVcyxY9ZUWpO39q4uOaQVZrUGE7xABkj/jhnxT5gFTSlI
 pu8VhsYXm9KuRVveIsv0L5SZfadwoM9YAl7ki1wD3W5rHqOAte3rBTm6VmNlQwfU
 MqxP65Jiyxudxet5Be3/dCRH/+MDQuwBxivgmZXbeVxor2SeznVb0GDaEUC5FSHu
 CJIgWtQzsPJMxgAEGXN4F3QGP0htTTJni56GUPOsrf4TIBW02TT+oLTLFRIokQQL
 INNOfwVSRXElnCsvxsHR4oB+JZ9pJyBaAmeupcQ6jmcKiWlbLj4s+W0U0pM5h91v
 hmMpz7KMxrX6gVL4gB2Jj4aN3r5YRbq26NBu6D+wdwwBTeTTocaHSpAqkv4buClf
 uNk3cG8Hkp8TTg9cM8jYgpxMyzKH/AI/Uw3VhEa1xCiq2Ck3DgfnZvnvcRRaZevU
 FPgmwgqePJXGi60=
 =sb8J
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Add LSM/SELinux/Smack controls and auditing for io-uring.

   As usual, the individual commit descriptions have more detail, but we
   were basically missing two things which we're adding here:

      + establishment of a proper audit context so that auditing of
        io-uring ops works similarly to how it does for syscalls (with
        some io-uring additions because io-uring ops are *not* syscalls)

      + additional LSM hooks to enable access control points for some of
        the more unusual io-uring features, e.g. credential overrides.

   The additional audit callouts and LSM hooks were done in conjunction
   with the io-uring folks, based on conversations and RFC patches
   earlier in the year.

 - Fixup the binder credential handling so that the proper credentials
   are used in the LSM hooks; the commit description and the code
   comment which is removed in these patches are helpful to understand
   the background and why this is the proper fix.

 - Enable SELinux genfscon policy support for securityfs, allowing
   improved SELinux filesystem labeling for other subsystems which make
   use of securityfs, e.g. IMA.

* tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  security: Return xattr name from security_dentry_init_security()
  selinux: fix a sock regression in selinux_ip_postroute_compat()
  binder: use cred instead of task for getsecid
  binder: use cred instead of task for selinux checks
  binder: use euid from cred instead of using task
  LSM: Avoid warnings about potentially unused hook variables
  selinux: fix all of the W=1 build warnings
  selinux: make better use of the nf_hook_state passed to the NF hooks
  selinux: fix race condition when computing ocontext SIDs
  selinux: remove unneeded ipv6 hook wrappers
  selinux: remove the SELinux lockdown implementation
  selinux: enable genfscon labeling for securityfs
  Smack: Brutalist io_uring support
  selinux: add support for the io_uring access controls
  lsm,io_uring: add LSM hooks to io_uring
  io_uring: convert io_uring to the secure anon inode interface
  fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure()
  audit: add filtering for io_uring records
  audit,io_uring,io-wq: add some basic audit support to io_uring
  audit: prepare audit_context for use in calling contexts beyond syscalls
2021-11-01 21:06:18 -07:00
Todd Kjos 29bc22ac5e binder: use euid from cred instead of using task
Save the 'struct cred' associated with a binder process
at initial open to avoid potential race conditions
when converting to an euid.

Set a transaction's sender_euid from the 'struct cred'
saved at binder_open() instead of looking up the euid
from the binder proc's 'struct task'. This ensures
the euid is associated with the security context that
of the task that opened binder.

Cc: stable@vger.kernel.org # 4.4+
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Signed-off-by: Todd Kjos <tkjos@google.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Jann Horn <jannh@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-14 17:19:40 -04:00
Li Li b564171ade binder: fix freeze race
Currently cgroup freezer is used to freeze the application threads, and
BINDER_FREEZE is used to freeze the corresponding binder interface.
There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
existing transactions to drain out before actually freezing the binder
interface.

But freezing an app requires 2 steps, freezing the binder interface with
ioctl(BINDER_FREEZE) and then freezing the application main threads with
cgroupfs. This is not an atomic operation. The following race issue
might happen.

1) Binder interface is frozen by ioctl(BINDER_FREEZE);
2) Main thread A initiates a new sync binder transaction to process B;
3) Main thread A is frozen by "echo 1 > cgroup.freeze";
4) The response from process B reaches the frozen thread, which will
unexpectedly fail.

This patch provides a mechanism to check if there's any new pending
transaction happening between ioctl(BINDER_FREEZE) and freezing the
main thread. If there's any, the main thread freezing operation can
be rolled back to finish the pending transaction.

Furthermore, the response might reach the binder driver before the
rollback actually happens. That will still cause failed transaction.

As the other process doesn't wait for another response of the response,
the response transaction failure can be fixed by treating the response
transaction like an oneway/async one, allowing it to reach the frozen
thread. And it will be consumed when the thread gets unfrozen later.

NOTE: This patch reuses the existing definition of struct
binder_frozen_status_info but expands the bit assignments of __u32
member sync_recv.

To ensure backward compatibility, bit 0 of sync_recv still indicates
there's an outstanding sync binder transaction. This patch adds new
information to bit 1 of sync_recv, indicating the binder transaction
happens exactly when there's a race.

If an existing userspace app runs on a new kernel, a sync binder call
will set bit 0 of sync_recv so ioctl(BINDER_GET_FROZEN_INFO) still
return the expected value (true). The app just doesn't check bit 1
intentionally so it doesn't have the ability to tell if there's a race.
This behavior is aligned with what happens on an old kernel which
doesn't set bit 1 at all.

A new userspace app can 1) check bit 0 to know if there's a sync binder
transaction happened when being frozen - same as before; and 2) check
bit 1 to know if that sync binder transaction happened exactly when
there's a race - a new information for rollback decision.

the same time, confirmed the pending transactions succeeded.

Fixes: 432ff1e916 ("binder: BINDER_FREEZE ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Li Li <dualli@google.com>
Test: stress test with apps being frozen and initiating binder calls at
Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-14 08:46:08 +02:00
Hang Lu a7dc1e6f99 binder: tell userspace to dump current backtrace when detected oneway spamming
When async binder buffer got exhausted, some normal oneway transactions
will also be discarded and may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause. And this issue is difficult to debug if without the
backtrace of the thread sending spam.

This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, request to dump current backtrace. Oneway spamming
will be reported only once when exceeding the threshold (target process
dips below 80% of its oneway space, and current process is responsible for
either more than 50 transactions, or more than 50% of the oneway space).
And the detection will restart when the async buffer has returned to a
healthy state.

Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:52:04 +02:00
Hang Lu 0051691574 binder: fix the missing BR_FROZEN_REPLY in binder_return_strings
Add BR_FROZEN_REPLY in binder_return_strings to support stat function.

Fixes: ae28c1be1e ("binder: BINDER_GET_FROZEN_INFO ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-2-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:52:04 +02:00
Marco Ballesio ae28c1be1e binder: BINDER_GET_FROZEN_INFO ioctl
User space needs to know if binder transactions occurred to frozen
processes. Introduce a new BINDER_GET_FROZEN ioctl and keep track of
transactions occurring to frozen proceses.

Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-4-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24 08:26:31 +01:00
Marco Ballesio 432ff1e916 binder: BINDER_FREEZE ioctl
Frozen tasks can't process binder transactions, so a way is required to
inform transmitting ends of communication failures due to the frozen
state of their receiving counterparts. Additionally, races are possible
between transitions to frozen state and binder transactions enqueued to
a specific process.

Implement BINDER_FREEZE ioctl for user space to inform the binder driver
about the intention to freeze or unfreeze a process. When the ioctl is
called, block the caller until any pending binder transactions toward
the target process are flushed. Return an error to transactions to
processes marked as frozen.

Co-developed-by: Todd Kjos <tkjos@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Li Li <dualli@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24 08:26:31 +01:00
Frankie.Chang 421518a274 binder: move structs from core file to header file
Moving all structs to header file makes module more
extendable, and makes all these structs to be defined
in the same file.

Signed-off-by: Frankie.Chang <Frankie.Chang@mediatek.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/1605063764-12930-2-git-send-email-Frankie.Chang@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-11 08:20:34 +01:00
Christian Brauner f0fe2c0f05 binder: prevent UAF for binderfs devices II
This is a necessary follow up to the first fix I proposed and we merged
in 2669b8b0c7 ("binder: prevent UAF for binderfs devices"). I have been
overly optimistic that the simple fix I proposed would work. But alas,
ihold() + iput() won't work since the inodes won't survive the
destruction of the superblock.
So all we get with my prior fix is a different race with a tinier
race-window but it doesn't solve the issue. Fwiw, the problem lies with
generic_shutdown_super(). It even has this cozy Al-style comment:

          if (!list_empty(&sb->s_inodes)) {
                  printk("VFS: Busy inodes after unmount of %s. "
                     "Self-destruct in 5 seconds.  Have a nice day...\n",
                     sb->s_id);
          }

On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is
called which punts the actual cleanup operation to a workqueue. At some
point, binder_deferred_func() will be called which will end up calling
binder_deferred_release() which will retrieve and cleanup the
binder_context attach to this struct binder_proc.

If we trace back where this binder_context is attached to binder_proc we
see that it is set in binder_open() and is taken from the struct
binder_device it is associated with. This obviously assumes that the
struct binder_device that context is attached to is _never_ freed. While
that might be true for devtmpfs binder devices it is most certainly
wrong for binderfs binder devices.

So, assume binder_open() is called on a binderfs binder devices. We now
stash away the struct binder_context associated with that struct
binder_devices:
	proc->context = &binder_dev->context;
	/* binderfs stashes devices in i_private */
	if (is_binderfs_device(nodp)) {
		binder_dev = nodp->i_private;
		info = nodp->i_sb->s_fs_info;
		binder_binderfs_dir_entry_proc = info->proc_log_dir;
	} else {
	.
	.
	.
	proc->context = &binder_dev->context;

Now let's assume that the binderfs instance for that binder devices is
shutdown via umount() and/or the mount namespace associated with it goes
away. As long as there is still an fd open for that binderfs binder
device things are fine. But let's assume we now close the last fd for
that binderfs binder device. Now binder_release() is called and punts to
the workqueue. Assume that the workqueue has quite a bit of stuff to do
and doesn't get to cleaning up the struct binder_proc and the associated
struct binder_context with it for that binderfs binder device right
away. In the meantime, the VFS is killing the super block and is
ultimately calling sb->evict_inode() which means it will call
binderfs_evict_inode() which does:

static void binderfs_evict_inode(struct inode *inode)
{
	struct binder_device *device = inode->i_private;
	struct binderfs_info *info = BINDERFS_I(inode);

	clear_inode(inode);

	if (!S_ISCHR(inode->i_mode) || !device)
		return;

	mutex_lock(&binderfs_minors_mutex);
	--info->device_count;
	ida_free(&binderfs_minors, device->miscdev.minor);
	mutex_unlock(&binderfs_minors_mutex);

	kfree(device->context.name);
	kfree(device);
}

thereby freeing the struct binder_device including struct
binder_context.

Now the workqueue finally has time to get around to cleaning up struct
binder_proc and is now trying to access the associate struct
binder_context. Since it's already freed it will OOPs.

Fix this by introducing a refounct on binder devices.

This is an alternative fix to 51d8a7eca6 ("binder: prevent UAF read in
print_binder_transaction_log_entry()").

Fixes: 3ad20fe393 ("binder: implement binderfs")
Fixes: 2669b8b0c7 ("binder: prevent UAF for binderfs devices")
Fixes: 03e2e07e38 ("binder: Make transaction_log available in binderfs")
Related : 51d8a7eca6 ("binder: prevent UAF read in print_binder_transaction_log_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20200303164340.670054-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03 19:58:37 +01:00
Christian Brauner 2669b8b0c7 binder: prevent UAF for binderfs devices
On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is
called which punts the actual cleanup operation to a workqueue. At some
point, binder_deferred_func() will be called which will end up calling
binder_deferred_release() which will retrieve and cleanup the
binder_context attach to this struct binder_proc.

If we trace back where this binder_context is attached to binder_proc we
see that it is set in binder_open() and is taken from the struct
binder_device it is associated with. This obviously assumes that the
struct binder_device that context is attached to is _never_ freed. While
that might be true for devtmpfs binder devices it is most certainly
wrong for binderfs binder devices.

So, assume binder_open() is called on a binderfs binder devices. We now
stash away the struct binder_context associated with that struct
binder_devices:
	proc->context = &binder_dev->context;
	/* binderfs stashes devices in i_private */
	if (is_binderfs_device(nodp)) {
		binder_dev = nodp->i_private;
		info = nodp->i_sb->s_fs_info;
		binder_binderfs_dir_entry_proc = info->proc_log_dir;
	} else {
	.
	.
	.
	proc->context = &binder_dev->context;

Now let's assume that the binderfs instance for that binder devices is
shutdown via umount() and/or the mount namespace associated with it goes
away. As long as there is still an fd open for that binderfs binder
device things are fine. But let's assume we now close the last fd for
that binderfs binder device. Now binder_release() is called and punts to
the workqueue. Assume that the workqueue has quite a bit of stuff to do
and doesn't get to cleaning up the struct binder_proc and the associated
struct binder_context with it for that binderfs binder device right
away. In the meantime, the VFS is killing the super block and is
ultimately calling sb->evict_inode() which means it will call
binderfs_evict_inode() which does:

static void binderfs_evict_inode(struct inode *inode)
{
	struct binder_device *device = inode->i_private;
	struct binderfs_info *info = BINDERFS_I(inode);

	clear_inode(inode);

	if (!S_ISCHR(inode->i_mode) || !device)
		return;

	mutex_lock(&binderfs_minors_mutex);
	--info->device_count;
	ida_free(&binderfs_minors, device->miscdev.minor);
	mutex_unlock(&binderfs_minors_mutex);

	kfree(device->context.name);
	kfree(device);
}

thereby freeing the struct binder_device including struct
binder_context.

Now the workqueue finally has time to get around to cleaning up struct
binder_proc and is now trying to access the associate struct
binder_context. Since it's already freed it will OOPs.

Fix this by holding an additional reference to the inode that is only
released once the workqueue is done cleaning up struct binder_proc. This
is an easy alternative to introducing separate refcounting on struct
binder_device which we can always do later if it becomes necessary.

This is an alternative fix to 51d8a7eca6 ("binder: prevent UAF read in
print_binder_transaction_log_entry()").

Fixes: 3ad20fe393 ("binder: implement binderfs")
Fixes: 03e2e07e38 ("binder: Make transaction_log available in binderfs")
Related : 51d8a7eca6 ("binder: prevent UAF read in print_binder_transaction_log_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03 08:02:57 +01:00
Christian Brauner 51d8a7eca6 binder: prevent UAF read in print_binder_transaction_log_entry()
When a binder transaction is initiated on a binder device coming from a
binderfs instance, a pointer to the name of the binder device is stashed
in the binder_transaction_log_entry's context_name member. Later on it
is used to print the name in print_binder_transaction_log_entry(). By
the time print_binder_transaction_log_entry() accesses context_name
binderfs_evict_inode() might have already freed the associated memory
thereby causing a UAF. Do the simple thing and prevent this by copying
the name of the binder device instead of stashing a pointer to it.

Reported-by: Jann Horn <jannh@google.com>
Fixes: 03e2e07e38 ("binder: Make transaction_log available in binderfs")
Link: https://lore.kernel.org/r/CAG48ez14Q0-F8LqsvcNbyR2o6gPW8SHXsm4u5jmD9MpsteM2Tw@mail.gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Hridya Valsaraju <hridya@google.com>
Link: https://lore.kernel.org/r/20191008130159.10161-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-10 14:39:22 +02:00
Hridya Valsaraju 4feb80faf4 binder: Add binder_proc logging to binderfs
Currently /sys/kernel/debug/binder/proc contains
the debug data for every binder_proc instance.
This patch makes this information also available
in a binderfs instance mounted with a mount option
"stats=global" in addition to debugfs. The patch does
not affect the presence of the file in debugfs.

If a binderfs instance is mounted at path /dev/binderfs,
this file would be present at /dev/binderfs/binder_logs/proc.
This change provides an alternate way to access this file when debugfs
is not mounted.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Link: https://lore.kernel.org/r/20190903161655.107408-5-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:31:26 +02:00
Hridya Valsaraju 03e2e07e38 binder: Make transaction_log available in binderfs
Currently, the binder transaction log files 'transaction_log'
and 'failed_transaction_log' live in debugfs at the following locations:

/sys/kernel/debug/binder/failed_transaction_log
/sys/kernel/debug/binder/transaction_log

This patch makes these files also available in a binderfs instance
mounted with the mount option "stats=global".
It does not affect the presence of these files in debugfs.
If a binderfs instance is mounted at path /dev/binderfs, the location of
these files will be as follows:

/dev/binderfs/binder_logs/failed_transaction_log
/dev/binderfs/binder_logs/transaction_log

This change provides an alternate option to access these files when
debugfs is not mounted.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Link: https://lore.kernel.org/r/20190903161655.107408-4-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:31:22 +02:00
Hridya Valsaraju 0e13e452da binder: Add stats, state and transactions files
The following binder stat files currently live in debugfs.

/sys/kernel/debug/binder/state
/sys/kernel/debug/binder/stats
/sys/kernel/debug/binder/transactions

This patch makes these files available in a binderfs instance
mounted with the mount option 'stats=global'. For example, if a binderfs
instance is mounted at path /dev/binderfs, the above files will be
available at the following locations:

/dev/binderfs/binder_logs/state
/dev/binderfs/binder_logs/stats
/dev/binderfs/binder_logs/transactions

This provides a way to access them even when debugfs is not mounted.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20190903161655.107408-3-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:31:18 +02:00
Hridya Valsaraju ca2864c6e8 binder: Add default binder devices through binderfs when configured
Currently, since each binderfs instance needs its own
private binder devices, every time a binderfs instance is
mounted, all the default binder devices need to be created
via the BINDER_CTL_ADD IOCTL. This patch aims to
add a solution to automatically create the default binder
devices for each binderfs instance that gets mounted.
To achieve this goal, when CONFIG_ANDROID_BINDERFS is set,
the default binder devices specified by CONFIG_ANDROID_BINDER_DEVICES
are created in each binderfs instance instead of global devices
being created by the binder driver.

Co-developed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190808222727.132744-2-hridya@google.com
Link: https://lore.kernel.org/r/20190904110704.8606-2-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:17:35 +02:00
Christian Brauner 5b9633af29 binderfs: remove separate device_initcall()
binderfs should not have a separate device_initcall(). When a kernel is
compiled with CONFIG_ANDROID_BINDERFS register the filesystem alongside
CONFIG_ANDROID_IPC. This use-case is especially sensible when users specify
CONFIG_ANDROID_IPC=y, CONFIG_ANDROID_BINDERFS=y and
ANDROID_BINDER_DEVICES="".
When CONFIG_ANDROID_BINDERFS=n then this always succeeds so there's no
regression potential for legacy workloads.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-01 15:50:26 +01:00
Christian Brauner 3ad20fe393 binder: implement binderfs
As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
implementation of binderfs.

/* Abstract */
binderfs is a backwards-compatible filesystem for Android's binder ipc
mechanism. Each ipc namespace will mount a new binderfs instance. Mounting
binderfs multiple times at different locations in the same ipc namespace
will not cause a new super block to be allocated and hence it will be the
same filesystem instance.
Each new binderfs mount will have its own set of binder devices only
visible in the ipc namespace it has been mounted in. All devices in a new
binderfs mount will follow the scheme binder%d and numbering will always
start at 0.

/* Backwards compatibility */
Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the
initial ipc namespace will work as before. They will be registered via
misc_register() and appear in the devtmpfs mount. Specifically, the
standard devices binder, hwbinder, and vndbinder will all appear in their
standard locations in /dev. Mounting or unmounting the binderfs mount in
the initial ipc namespace will have no effect on these devices, i.e. they
will neither show up in the binderfs mount nor will they disappear when the
binderfs mount is gone.

/* binder-control */
Each new binderfs instance comes with a binder-control device. No other
devices will be present at first. The binder-control device can be used to
dynamically allocate binder devices. All requests operate on the binderfs
mount the binder-control device resides in.
Assuming a new instance of binderfs has been mounted at /dev/binderfs
via mount -t binderfs binderfs /dev/binderfs. Then a request to create a
new binder device can be made as illustrated in [2].
Binderfs devices can simply be removed via unlink().

/* Implementation details */
- dynamic major number allocation:
  When binderfs is registered as a new filesystem it will dynamically
  allocate a new major number. The allocated major number will be returned
  in struct binderfs_device when a new binder device is allocated.
- global minor number tracking:
  Minor are tracked in a global idr struct that is capped at
  BINDERFS_MAX_MINOR. The minor number tracker is protected by a global
  mutex. This is the only point of contention between binderfs mounts.
- struct binderfs_info:
  Each binderfs super block has its own struct binderfs_info that tracks
  specific details about a binderfs instance:
  - ipc namespace
  - dentry of the binder-control device
  - root uid and root gid of the user namespace the binderfs instance
    was mounted in
- mountable by user namespace root:
  binderfs can be mounted by user namespace root in a non-initial user
  namespace. The devices will be owned by user namespace root.
- binderfs binder devices without misc infrastructure:
  New binder devices associated with a binderfs mount do not use the
  full misc_register() infrastructure.
  The misc_register() infrastructure can only create new devices in the
  host's devtmpfs mount. binderfs does however only make devices appear
  under its own mountpoint and thus allocates new character device nodes
  from the inode of the root dentry of the super block. This will have
  the side-effect that binderfs specific device nodes do not appear in
  sysfs. This behavior is similar to devpts allocated pts devices and
  has no effect on the functionality of the ipc mechanism itself.

[1]: https://goo.gl/JL2tfX
[2]: program to allocate a new binderfs binder device:

     #define _GNU_SOURCE
     #include <errno.h>
     #include <fcntl.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <sys/ioctl.h>
     #include <sys/stat.h>
     #include <sys/types.h>
     #include <unistd.h>
     #include <linux/android/binder_ctl.h>

     int main(int argc, char *argv[])
     {
             int fd, ret, saved_errno;
             size_t len;
             struct binderfs_device device = { 0 };

             if (argc < 2)
                     exit(EXIT_FAILURE);

             len = strlen(argv[1]);
             if (len > BINDERFS_MAX_NAME)
                     exit(EXIT_FAILURE);

             memcpy(device.name, argv[1], len);

             fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC);
             if (fd < 0) {
                     printf("%s - Failed to open binder-control device\n",
                            strerror(errno));
                     exit(EXIT_FAILURE);
             }

             ret = ioctl(fd, BINDER_CTL_ADD, &device);
             saved_errno = errno;
             close(fd);
             errno = saved_errno;
             if (ret < 0) {
                     printf("%s - Failed to allocate new binder device\n",
                            strerror(errno));
                     exit(EXIT_FAILURE);
             }

             printf("Allocated new binder device with major %d, minor %d, and "
                    "name %s\n", device.major, device.minor,
                    device.name);

             exit(EXIT_SUCCESS);
     }

Cc: Martijn Coenen <maco@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19 09:40:13 +01:00