Fixes sparse issues reported by the kbuild test robot running
on https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
char-misc-testing: bde4a19fc0 ("binder: use userspace pointer as base
of buffer space")
Error output (drivers/android/binder_alloc_selftest.c):
sparse: warning: incorrect type in assignment (different address spaces)
sparse: expected void *page_addr
sparse: got void [noderef] <asn:1> *user_data
sparse: error: subtraction of different types can't work
Fixed by adding necessary "__user" tags.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that alloc->buffer points to the userspace vm_area
rename buffer->data to buffer->user_data and rename
local pointers that hold user addresses. Also use the
"__user" tag to annotate all user pointers so sparse
can flag cases where user pointer vaues are copied to
kernel pointers. Refactor code to use offsets instead
of user pointers.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove user_buffer_offset since there is no kernel
buffer pointer anymore.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the kernel's vm_area and the code that maps
buffer pages into it.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Refactor the functions to validate and fixup struct
binder_buffer pointer objects to avoid using vm_area
pointers. Instead copy to/from kernel space using
binder_alloc_copy_to_buffer() and
binder_alloc_copy_from_buffer(). The following
functions were refactored:
refactor binder_validate_ptr()
binder_validate_fixup()
binder_fixup_parent()
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When creating or tearing down a transaction, the binder driver
examines objects in the buffer and takes appropriate action.
To do this without needing to dereference pointers into the
buffer, the local copies of the objects are needed. This patch
introduces a function to validate and copy binder objects
from the buffer to a local structure.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid vm_area when copying to or from binder buffers.
Instead, new copy functions are added that copy from
kernel space to binder buffer space. These use
kmap_atomic() and kunmap_atomic() to create temporary
mappings and then memcpy() is used to copy within
that page.
Also, kmap_atomic() / kunmap_atomic() use the appropriate
cache flushing to support VIVT cache architectures.
Allow binder to build if CPU_CACHE_VIVT is defined.
Several uses of the new functions are added here. More
to follow in subsequent patches.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binder driver uses a vm_area to map the per-process
binder buffer space. For 32-bit android devices, this is
now taking too much vmalloc space. This patch removes
the use of vm_area when copying the transaction data
from the sender to the buffer space. Instead of using
copy_from_user() for multi-page copies, it now uses
binder_alloc_copy_user_to_buffer() which uses kmap()
and kunmap() to map each page, and uses copy_from_user()
for copying to that page.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binderfs should not have a separate device_initcall(). When a kernel is
compiled with CONFIG_ANDROID_BINDERFS register the filesystem alongside
CONFIG_ANDROID_IPC. This use-case is especially sensible when users specify
CONFIG_ANDROID_IPC=y, CONFIG_ANDROID_BINDERFS=y and
ANDROID_BINDER_DEVICES="".
When CONFIG_ANDROID_BINDERFS=n then this always succeeds so there's no
regression potential for legacy workloads.
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently adhere to the reserved devices limit when creating new
binderfs devices in binderfs instances not located in the inital ipc
namespace. But it is still possible to rob the host instances of their 4
reserved devices by creating the maximum allowed number of devices in a
single binderfs instance located in a non-initial ipc namespace and then
mounting 4 separate binderfs instances in non-initial ipc namespaces. That
happens because the limit is currently not respected for the creation of
the initial binder-control device node. Block this nonsense by performing
the same check in binderfs_binder_ctl_create() that we perform in
binderfs_binder_device_create().
Fixes: 36bdf3cae0 ("binderfs: reserve devices for initial mount")
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To allow servers to verify client identity, allow a node
flag to be set that causes the sender's security context
to be delivered with the transaction. The BR_TRANSACTION
command is extended in BR_TRANSACTION_SEC_CTX to
contain a pointer to the security context string.
Signed-off-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binderfs_binder_ctl_create() call is a no-op on subsequent calls and
the first call is done before we unlock the suberblock. Hence, there is no
need to take inode_lock() in there. Let's remove it.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al pointed out that first calling kill_litter_super() before cleaning up
info is more correct since destroying info doesn't depend on the state of
the dentries and inodes. That the opposite remains true is not guaranteed.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- switch from d_alloc_name() + d_lookup() to lookup_one_len():
Instead of using d_alloc_name() and then doing a d_lookup() with the
allocated dentry to find whether a device with the name we're trying to
create already exists switch to using lookup_one_len(). The latter will
either return the existing dentry or a new one.
- switch from kmalloc() + strscpy() to kmemdup():
Use a more idiomatic way to copy the name for the new dentry that
userspace gave us.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al pointed out that on binderfs_fill_super() error
deactivate_locked_super() will call binderfs_kill_super() so all of the
freeing and putting we currently do in binderfs_fill_super() is unnecessary
and buggy. Let's simply return errors and let binderfs_fill_super() take
care of cleaning up on error.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- make binderfs control dentry immutable:
We don't allow to unlink it since it is crucial for binderfs to be
useable but if we allow to rename it we make the unlink trivial to
bypass. So prevent renaming too and simply treat the control dentry as
immutable.
- add is_binderfs_control_device() helper:
Take the opportunity and turn the check for the control dentry into a
separate helper is_binderfs_control_device() since it's now used in two
places.
- simplify binderfs_rename():
Instead of hand-rolling our custom version of simple_rename() just dumb
the whole function down to first check whether we're trying to rename the
control dentry. If we do EPERM the caller and if not call simple_rename().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The comment stems from an early version of that patchset and is just
confusing now.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix to return a negative error code -ENOMEM from the new_inode() and
d_make_root() error handling cases instead of 0, as done elsewhere in
this function.
Fixes: 849d540ddf ("binderfs: implement "max" mount option")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kbuild reported a build faile in [1]. This is triggered when CONFIG_IPC_NS
is not set. So let's make the use of init_ipc_ns conditional on
CONFIG_IPC_NS being set.
[1]: https://lists.01.org/pipermail/kbuild-all/2019-January/056903.html
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binderfs instance in the initial ipc namespace will always have a
reserve of 4 binder devices unless explicitly capped by specifying a lower
value via the "max" mount option.
This ensures when binder devices are removed (on accident or on purpose)
they can always be recreated without risking that all minor numbers have
already been used up.
Cc: Todd Kjos <tkjos@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It doesn't make sense to call the header binder_ctl.h when its sole
existence is tied to binderfs. So give it a sensible name. Users will far
more easily remember binderfs.h than binder_ctl.h.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since binderfs can be mounted by userns root in non-initial user namespaces
some precautions are in order. First, a way to set a maximum on the number
of binder devices that can be allocated per binderfs instance and second, a
way to reserve a reasonable chunk of binderfs devices for the initial ipc
namespace.
A first approach as seen in [1] used sysctls similiar to devpts but was
shown to be flawed (cf. [2] and [3]) since some aspects were unneeded. This
is an alternative approach which avoids sysctls completely and instead
switches to a single mount option.
Starting with this commit binderfs instances can be mounted with a limit on
the number of binder devices that can be allocated. The max=<count> mount
option serves as a per-instance limit. If max=<count> is set then only
<count> number of binder devices can be allocated in this binderfs
instance.
This allows to safely bind-mount binderfs instances into unprivileged user
namespaces since userns root in a non-initial user namespace cannot change
the mount option as long as it does not own the mount namespace the
binderfs mount was created in and hence cannot drain the host of minor
device numbers
[1]: https://lore.kernel.org/lkml/20181221133909.18794-1-christian@brauner.io/
[2]; https://lore.kernel.org/lkml/20181221163316.GA8517@kroah.com/
[3]: https://lore.kernel.org/lkml/CAHRSSEx+gDVW4fKKK8oZNAir9G5icJLyodO8hykv3O0O1jt2FQ@mail.gmail.com/
[4]: https://lore.kernel.org/lkml/20181221192044.5yvfnuri7gdop4rs@brauner.io/
Cc: Todd Kjos <tkjos@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When currently mounting binderfs in the same ipc namespace twice:
mount -t binder binder /A
mount -t binder binder /B
then the binderfs instances mounted on /A and /B will be the same, i.e.
they will have the same superblock. This was the first approach that seemed
reasonable. However, this leads to some problems and inconsistencies:
/* private binderfs instance in same ipc namespace */
There is no way for a user to request a private binderfs instance in the
same ipc namespace.
This request has been made in a private mail to me by two independent
people.
/* bind-mounts */
If users want the same binderfs instance to appear in multiple places they
can use bind mounts. So there is no value in having a request for a new
binderfs mount giving them the same instance.
/* unexpected behavior */
It's surprising that request to mount binderfs is not giving the user a new
instance like tmpfs, devpts, ramfs, and others do.
/* past mistakes */
Other pseudo-filesystems once made the same mistakes of giving back the
same superblock when actually requesting a new mount (cf. devpts's
deprecated "newinstance" option).
We should not make the same mistake. Once we've committed to always giving
back the same superblock in the same IPC namespace with the next kernel
release we will not be able to make that change so better to do it now.
/* kdbusfs */
It was pointed out to me that kdbusfs - which is conceptually closely
related to binderfs - also allowed users to get a private kdbusfs instance
in the same IPC namespace by making each mount of kdbusfs a separate
instance. I think that makes a lot of sense.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binderfs filesystem never needs to be mounted by the kernel itself.
This is conceptually wrong and should never have been done in the first
place.
Fixes: 3ad20fe393 ("binder: implement binderfs")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
implementation of binderfs.
/* Abstract */
binderfs is a backwards-compatible filesystem for Android's binder ipc
mechanism. Each ipc namespace will mount a new binderfs instance. Mounting
binderfs multiple times at different locations in the same ipc namespace
will not cause a new super block to be allocated and hence it will be the
same filesystem instance.
Each new binderfs mount will have its own set of binder devices only
visible in the ipc namespace it has been mounted in. All devices in a new
binderfs mount will follow the scheme binder%d and numbering will always
start at 0.
/* Backwards compatibility */
Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the
initial ipc namespace will work as before. They will be registered via
misc_register() and appear in the devtmpfs mount. Specifically, the
standard devices binder, hwbinder, and vndbinder will all appear in their
standard locations in /dev. Mounting or unmounting the binderfs mount in
the initial ipc namespace will have no effect on these devices, i.e. they
will neither show up in the binderfs mount nor will they disappear when the
binderfs mount is gone.
/* binder-control */
Each new binderfs instance comes with a binder-control device. No other
devices will be present at first. The binder-control device can be used to
dynamically allocate binder devices. All requests operate on the binderfs
mount the binder-control device resides in.
Assuming a new instance of binderfs has been mounted at /dev/binderfs
via mount -t binderfs binderfs /dev/binderfs. Then a request to create a
new binder device can be made as illustrated in [2].
Binderfs devices can simply be removed via unlink().
/* Implementation details */
- dynamic major number allocation:
When binderfs is registered as a new filesystem it will dynamically
allocate a new major number. The allocated major number will be returned
in struct binderfs_device when a new binder device is allocated.
- global minor number tracking:
Minor are tracked in a global idr struct that is capped at
BINDERFS_MAX_MINOR. The minor number tracker is protected by a global
mutex. This is the only point of contention between binderfs mounts.
- struct binderfs_info:
Each binderfs super block has its own struct binderfs_info that tracks
specific details about a binderfs instance:
- ipc namespace
- dentry of the binder-control device
- root uid and root gid of the user namespace the binderfs instance
was mounted in
- mountable by user namespace root:
binderfs can be mounted by user namespace root in a non-initial user
namespace. The devices will be owned by user namespace root.
- binderfs binder devices without misc infrastructure:
New binder devices associated with a binderfs mount do not use the
full misc_register() infrastructure.
The misc_register() infrastructure can only create new devices in the
host's devtmpfs mount. binderfs does however only make devices appear
under its own mountpoint and thus allocates new character device nodes
from the inode of the root dentry of the super block. This will have
the side-effect that binderfs specific device nodes do not appear in
sysfs. This behavior is similar to devpts allocated pts devices and
has no effect on the functionality of the ipc mechanism itself.
[1]: https://goo.gl/JL2tfX
[2]: program to allocate a new binderfs binder device:
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/android/binder_ctl.h>
int main(int argc, char *argv[])
{
int fd, ret, saved_errno;
size_t len;
struct binderfs_device device = { 0 };
if (argc < 2)
exit(EXIT_FAILURE);
len = strlen(argv[1]);
if (len > BINDERFS_MAX_NAME)
exit(EXIT_FAILURE);
memcpy(device.name, argv[1], len);
fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC);
if (fd < 0) {
printf("%s - Failed to open binder-control device\n",
strerror(errno));
exit(EXIT_FAILURE);
}
ret = ioctl(fd, BINDER_CTL_ADD, &device);
saved_errno = errno;
close(fd);
errno = saved_errno;
if (ret < 0) {
printf("%s - Failed to allocate new binder device\n",
strerror(errno));
exit(EXIT_FAILURE);
}
printf("Allocated new binder device with major %d, minor %d, and "
"name %s\n", device.major, device.minor,
device.name);
exit(EXIT_SUCCESS);
}
Cc: Martijn Coenen <maco@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
44d8047f1d ("binder: use standard functions to allocate fds")
exposed a pre-existing issue in the binder driver.
fdget() is used in ksys_ioctl() as a performance optimization.
One of the rules associated with fdget() is that ksys_close() must
not be called between the fdget() and the fdput(). There is a case
where this requirement is not met in the binder driver which results
in the reference count dropping to 0 when the device is still in
use. This can result in use-after-free or other issues.
If userpace has passed a file-descriptor for the binder driver using
a BINDER_TYPE_FDA object, then kys_close() is called on it when
handling a binder_ioctl(BC_FREE_BUFFER) command. This violates
the assumptions for using fdget().
The problem is fixed by deferring the close using task_work_add(). A
new variant of __close_fd() was created that returns a struct file
with a reference. The fput() is deferred instead of using ksys_close().
Fixes: 44d8047f1d ("binder: use standard functions to allocate fds")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When dumping out binder transactions via a debug node,
the output is too verbose if a process has many nodes.
Change the output for transaction dumps to only display
nodes with pending async transactions.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We already have the DEFINE_SHOW_ATTRIBUTE.There is no need to define
such a macro,so remove BINDER_DEBUG_ENTRY.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Todd Kjos <tkjos@android.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add __acquire()/__release() annnotations to fix warnings
in sparse context checking
There is one case where the warning was due to a lack of
a "default:" case in a switch statement where a lock was
being released in each of the cases, so the default
case was added.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Malicious code can attempt to free buffers using the BC_FREE_BUFFER
ioctl to binder. There are protections against a user freeing a buffer
while in use by the kernel, however there was a window where
BC_FREE_BUFFER could be used to free a recently allocated buffer that
was not completely initialized. This resulted in a use-after-free
detected by KASAN with a malicious test program.
This window is closed by setting the buffer's allow_user_free attribute
to 0 when the buffer is allocated or when the user has previously freed
it instead of waiting for the caller to set it. The problem was that
when the struct buffer was recycled, allow_user_free was stale and set
to 1 allowing a free to go through.
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Arve Hjønnevåg <arve@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes the following sparse warning:
drivers/android/binder.c:3312:1: warning:
symbol 'binder_free_buf' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.
Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Binder uses internal fs interfaces to allocate and install fds:
__alloc_fd
__fd_install
__close_fd
get_files_struct
put_files_struct
These were used to support the passing of fds between processes
as part of a transaction. The actual allocation and installation
of the fds in the target process was handled by the sending
process so the standard functions, alloc_fd() and fd_install()
which assume task==current couldn't be used.
This patch refactors this mechanism so that the fds are
allocated and installed by the target process allowing the
standard functions to be used.
The sender now creates a list of fd fixups that contains the
struct *file and the address to fixup with the new fd once
it is allocated. This list is processed by the target process
when the transaction is dequeued.
A new error case is introduced by this change. If an async
transaction with file descriptors cannot allocate new
fds in the target (probably due to out of file descriptors),
the transaction is discarded with a log message. In the old
implementation this would have been detected in the sender
context and failed prior to sending.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a process dies, failed reply is sent to the sender of any transaction
queued on a dead thread's todo list. The sender asserts that the
received failed reply corresponds to the head of the transaction stack.
This assert can fail if the dead thread is allowed to send outgoing
transactions when there is already a transaction on its todo list,
because this new transaction can end up on the transaction stack of the
original sender. The following steps illustrate how this assertion can
fail.
1. Thread1 sends txn19 to Thread2
(T1->transaction_stack=txn19, T2->todo+=txn19)
2. Without processing todo list, Thread2 sends txn20 to Thread1
(T1->todo+=txn20, T2->transaction_stack=txn20)
3. T1 processes txn20 on its todo list
(T1->transaction_stack=txn20->txn19, T1->todo=<empty>)
4. T2 dies, T2->todo cleanup attempts to send failed reply for txn19, but
T1->transaction_stack points to txn20 -- assertion failes
Step 2. is the incorrect behavior. When there is a transaction on a
thread's todo list, this thread should not be able to send any outgoing
synchronous transactions. Only the head of the todo list needs to be
checked because only threads that are waiting for proc work can directly
receive work from another thread, and no work is allowed to be queued
on such a thread without waking up the thread. This patch also enforces
that a thread is not waiting for proc work when a work is directly
enqueued to its todo list.
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Reviewed-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is RaceFuzzer report like below because we have no lock to close
below the race between binder_mmap and binder_alloc_new_buf_locked.
To close the race, let's use memory barrier so that if someone see
alloc->vma is not NULL, alloc->vma_vm_mm should be never NULL.
(I didn't add stable mark intentionallybecause standard android
userspace libraries that interact with binder (libbinder & libhwbinder)
prevent the mmap/ioctl race. - from Todd)
"
Thread interleaving:
CPU0 (binder_alloc_mmap_handler) CPU1 (binder_alloc_new_buf_locked)
===== =====
// drivers/android/binder_alloc.c
// #L718 (v4.18-rc3)
alloc->vma = vma;
// drivers/android/binder_alloc.c
// #L346 (v4.18-rc3)
if (alloc->vma == NULL) {
...
// alloc->vma is not NULL at this point
return ERR_PTR(-ESRCH);
}
...
// #L438
binder_update_page_range(alloc, 0,
(void *)PAGE_ALIGN((uintptr_t)buffer->data),
end_page_addr);
// In binder_update_page_range() #L218
// But still alloc->vma_vm_mm is NULL here
if (need_mm && mmget_not_zero(alloc->vma_vm_mm))
alloc->vma_vm_mm = vma->vm_mm;
Crash Log:
==================================================================
BUG: KASAN: null-ptr-deref in __atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline]
BUG: KASAN: null-ptr-deref in atomic_add_unless include/linux/atomic.h:533 [inline]
BUG: KASAN: null-ptr-deref in mmget_not_zero include/linux/sched/mm.h:75 [inline]
BUG: KASAN: null-ptr-deref in binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218
Write of size 4 at addr 0000000000000058 by task syz-executor0/11184
CPU: 1 PID: 11184 Comm: syz-executor0 Not tainted 4.18.0-rc3 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x16e/0x22c lib/dump_stack.c:113
kasan_report_error mm/kasan/report.c:352 [inline]
kasan_report+0x163/0x380 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x140/0x1a0 mm/kasan/kasan.c:267
kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
__atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline]
atomic_add_unless include/linux/atomic.h:533 [inline]
mmget_not_zero include/linux/sched/mm.h:75 [inline]
binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218
binder_alloc_new_buf_locked drivers/android/binder_alloc.c:443 [inline]
binder_alloc_new_buf+0x467/0xc30 drivers/android/binder_alloc.c:513
binder_transaction+0x125b/0x4fb0 drivers/android/binder.c:2957
binder_thread_write+0xc08/0x2770 drivers/android/binder.c:3528
binder_ioctl_write_read.isra.39+0x24f/0x8e0 drivers/android/binder.c:4456
binder_ioctl+0xa86/0xf34 drivers/android/binder.c:4596
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x154/0xd40 fs/ioctl.c:686
ksys_ioctl+0x94/0xb0 fs/ioctl.c:701
__do_sys_ioctl fs/ioctl.c:708 [inline]
__se_sys_ioctl fs/ioctl.c:706 [inline]
__x64_sys_ioctl+0x43/0x50 fs/ioctl.c:706
do_syscall_64+0x167/0x4b0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
"
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If asm/cacheflush.h is included first, the following build warnings are
seen with sparc32 builds.
In file included from arch/sparc/include/asm/cacheflush.h:11:0,
from drivers/android/binder.c:54:
arch/sparc/include/asm/cacheflush_32.h:40:37: warning:
'struct page' declared inside parameter list will not be visible
outside of this definition or declaration
Moving the asm/ include after linux/ includes solves the problem.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If asm/cacheflush.h is included first, the following build warnings are
seen with sparc32 builds.
In file included from ./arch/sparc/include/asm/cacheflush.h:11:0,
from drivers/android/binder_alloc.c:20:
./arch/sparc/include/asm/cacheflush_32.h:40:37: warning:
'struct page' declared inside parameter list
Moving the asm/ include after linux/ includes fixes the problem.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As of commit 7124330dab ("m68k/uaccess: Revive 64-bit
get_user()"), the 64-bit Android binder interface builds fine on m68k.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use new return type vm_fault_t for fault handler in
struct vm_operations_struct. For now, this is just
documenting that the function returns a VM_FAULT
value rather than an errno. Once all instances are
converted, vm_fault_t will become a distinct type.
Reference id -> 1c8f422059 ("mm: change return type
to vm_fault_t")
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_update_page_range needs down_write of mmap_sem because
vm_insert_page need to change vma->vm_flags to VM_MIXEDMAP unless
it is set. However, when I profile binder working, it seems
every binder buffers should be mapped in advance by binder_mmap.
It means we could set VM_MIXEDMAP in binder_mmap time which is
already hold a mmap_sem as down_write so binder_update_page_range
doesn't need to hold a mmap_sem as down_write.
Please use proper API down_read. It would help mmap_sem contention
problem as well as fixing down_write abuse.
Ganesh Mahendran tested app launching and binder throughput test
and he said he couldn't find any problem and I did binder latency
test per Greg KH request(Thanks Martijn to teach me how I can do)
I cannot find any problem, too.
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@google.com>
Reviewed-by: Martijn Coenen <maco@android.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When to execute binder_stat_br the e->cmd has been modifying as BR_OK
instead of the original return error cmd, in fact we want to know the
original return error, such as BR_DEAD_REPLY or BR_FAILED_REPLY, etc.
instead of always BR_OK, in order to avoid the value of the e->cmd is
always BR_OK, so we need assign the value of the e->cmd to cmd before
e->cmd = BR_OK.
Signed-off-by: songjinshi <songjinshi@xiaomi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
New devices launching with Android P need to use the 64-bit
binder interface, even on 32-bit SoCs [0].
This change removes the Kconfig option to select the 32-bit
binder interface. We don't think this will affect existing
userspace for the following reasons:
1) The latest Android common tree is 4.14, so we don't
believe any Android devices are on kernels >4.14.
2) Android devices launch on an LTS release and stick with
it, so we wouldn't expect devices running on <= 4.14 now
to upgrade to 4.17 or later. But even if they did, they'd
rebuild the world (kernel + userspace) anyway.
3) Other userspaces like 'anbox' are already using the
64-bit interface.
Note that this change doesn't remove the 32-bit UAPI
itself; the reason for that is that Android userspace
always uses the latest UAPI headers from upstream, and
userspace retains 32-bit support for devices that are
upgrading. This will be removed as well in 2-3 years,
at which point we can remove the code from the UAPI
as well.
Finally, this change introduces build errors on archs where
64-bit get_user/put_user is not supported, so make binder
unavailable on m68k (which wouldn't want it anyway).
[0]: https://android-review.googlesource.com/c/platform/build/+/595193
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It doesn't make any difference to runtime but I've switched these two
checks to make my static checker happy.
The problem is that "buffer->data_size" is user controlled and if it's
less than "sizeo(*hdr)" then that means "offset" can be more than
"buffer->data_size". It's just cleaner to check it in the other order.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This can't happen with normal nodes (because you can't get a ref
to a node you own), but it could happen with the context manager;
to make the behavior consistent with regular nodes, reject
transactions into the context manager by the process owning it.
Reported-by: syzbot+09e05aba06723a94d43d@syzkaller.appspotmail.com
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To prevent races with ep_remove_waitqueue() removing the
waitqueue at the same time.
Reported-by: syzbot+a2a3c4909716e271487e@syzkaller.appspotmail.com
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The format specifier "%p" can leak kernel addresses. Use
"%pK" instead. There were 4 remaining cases in binder.c.
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_send_failed_reply() is called when a synchronous
transaction fails. It reports an error to the thread that
is waiting for the completion. Given that the transaction
is synchronous, there should never be more than 1 error
response to that thread -- this was being asserted with
a WARN().
However, when exercising the driver with syzbot tests, cases
were observed where multiple "synchronous" requests were
sent without waiting for responses, so it is possible that
multiple errors would be reported to the thread. This testing
was conducted with panic_on_warn set which forced the crash.
This is easily reproduced by sending back-to-back
"synchronous" transactions without checking for any
response (eg, set read_size to 0):
bwr.write_buffer = (uintptr_t)&bc1;
bwr.write_size = sizeof(bc1);
bwr.read_buffer = (uintptr_t)&br;
bwr.read_size = 0;
ioctl(fd, BINDER_WRITE_READ, &bwr);
sleep(1);
bwr2.write_buffer = (uintptr_t)&bc2;
bwr2.write_size = sizeof(bc2);
bwr2.read_buffer = (uintptr_t)&br;
bwr2.read_size = 0;
ioctl(fd, BINDER_WRITE_READ, &bwr2);
sleep(1);
The first transaction is sent to the servicemanager and the reply
fails because no VMA is set up by this client. After
binder_send_failed_reply() is called, the BINDER_WORK_RETURN_ERROR
is sitting on the thread's todo list since the read_size was 0 and
the client is not waiting for a response.
The 2nd transaction is sent and the BINDER_WORK_RETURN_ERROR has not
been consumed, so the thread's reply_error.cmd is still set (normally
cleared when the BINDER_WORK_RETURN_ERROR is handled). Therefore
when the servicemanager attempts to reply to the 2nd failed
transaction, the error is already set and it triggers this warning.
This is a user error since it is not waiting for the synchronous
transaction to complete. If it ever does check, it will see an
error.
Changed the WARN() to a pr_warn().
Signed-off-by: Todd Kjos <tkjos@android.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the kzalloc() in binder_get_thread() fails, binder_poll()
dereferences the resulting NULL pointer.
Fix it by returning POLLERR if the memory allocation failed.
This bug was found by syzkaller using fault injection.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android binder
fixes, the usual handful of hyper-v updates, and lots of other smaller
driver updates.
All of these have been in linux-next for a long time, with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLuZw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynS4QCcCrPmwfD5PJwaF+q2dPfyKaflkQMAn0x6Wd+u
Gw3Z2scgjETUpwJ9ilnL
=xcQ0
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android
binder fixes, the usual handful of hyper-v updates, and lots of other
smaller driver updates.
All of these have been in linux-next for a long time, with no reported
issues"
* tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits)
char: lp: use true or false for boolean values
android: binder: use VM_ALLOC to get vm area
android: binder: Use true and false for boolean values
lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN
EISA: Delete error message for a failed memory allocation in eisa_probe()
EISA: Whitespace cleanup
misc: remove AVR32 dependencies
virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES
soundwire: Fix a signedness bug
uio_hv_generic: fix new type mismatch warnings
uio_hv_generic: fix type mismatch warnings
auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
uio_hv_generic: add rescind support
uio_hv_generic: check that host supports monitor page
uio_hv_generic: create send and receive buffers
uio: document uio_hv_generic regions
doc: fix documentation about uio_hv_generic
vmbus: add monitor_id and subchannel_id to sysfs per channel
vmbus: fix ABI documentation
uio_hv_generic: use ISR callback method
...
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
VM_IOREMAP is used to access hardware through a mechanism called
I/O mapped memory. Android binder is a IPC machanism which will
not access I/O memory.
And VM_IOREMAP has alignment requiement which may not needed in
binder.
__get_vm_area_node()
{
...
if (flags & VM_IOREMAP)
align = 1ul << clamp_t(int, fls_long(size),
PAGE_SHIFT, IOREMAP_MAX_ORDER);
...
}
This patch will save some kernel vm area, especially for 32bit os.
In 32bit OS, kernel vm area is only 240MB. We may got below
error when launching a app:
<3>[ 4482.440053] binder_alloc: binder_alloc_mmap_handler: 15728 8ce67000-8cf65000 get_vm_area failed -12
<3>[ 4483.218817] binder_alloc: binder_alloc_mmap_handler: 15745 8ce67000-8cf65000 get_vm_area failed -12
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Martijn Coenen <maco@android.com>
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
----
V3: update comments
V2: update comments
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
checkpatch warns against the use of symbolic permissions,
this patch migrates all symbolic permissions in the binder
driver to octal permissions.
Test: debugfs nodes created by binder have the same unix
permissions prior to and after this patch was applied.
Signed-off-by: Harsh Shandilya <harsh@prjkt.io>
Cc: "Arve Hjønnevåg" <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_poll() passes the thread->wait waitqueue that
can be slept on for work. When a thread that uses
epoll explicitly exits using BINDER_THREAD_EXIT,
the waitqueue is freed, but it is never removed
from the corresponding epoll data structure. When
the process subsequently exits, the epoll cleanup
code tries to access the waitlist, which results in
a use-after-free.
Prevent this by using POLLFREE when the thread exits.
Signed-off-by: Martijn Coenen <maco@android.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function binder_alloc_new_buf_locked() is only used in this file, so
make it static. Also clean up sparse warning:
drivers/android/binder_alloc.c:330:23: warning: no previous prototype
for ‘binder_alloc_new_buf_locked’ [-Wmissing-prototypes]
In addition, the line of the function name exceeds 80 characters when
add static for this function, hence indent its arguments anew.
Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both list_lru_init() and register_shrinker() might return an error.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sherry Yang <sherryy@android.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
proc->files cleanup is initiated by binder_vma_close. Therefore
a reference on the binder_proc is not enough to prevent the
files_struct from being released while the binder_proc still has
a reference. This can lead to an attempt to dereference the
stale pointer obtained from proc->files prior to proc->files
cleanup. This has been seen once in task_get_unused_fd_flags()
when __alloc_fd() is called with a stale "files".
The fix is to protect proc->files with a mutex to prevent cleanup
while in use.
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a call to put_user() fails, we failed to
properly free a transaction and send a failed
reply (if necessary).
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This flag determines whether the thread should currently
process the work in the thread->todo worklist.
The prime usecase for this is improving the performance
of synchronous transactions: all synchronous transactions
post a BR_TRANSACTION_COMPLETE to the calling thread,
but there's no reason to return that command to userspace
right away - userspace anyway needs to wait for the reply.
Likewise, a synchronous transaction that contains a binder
object can cause a BC_ACQUIRE/BC_INCREFS to be returned to
userspace; since the caller must anyway hold a strong/weak
ref for the duration of the call, postponing these commands
until the reply comes in is not a problem.
Note that this flag is not used to determine whether a
thread can handle process work; a thread should never pick
up process work when thread work is still pending.
Before patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 45959 ns 20288 ns 34351
BM_sendVec_binderize/8 45603 ns 20080 ns 34909
BM_sendVec_binderize/16 45528 ns 20113 ns 34863
BM_sendVec_binderize/32 45551 ns 20122 ns 34881
BM_sendVec_binderize/64 45701 ns 20183 ns 34864
BM_sendVec_binderize/128 45824 ns 20250 ns 34576
BM_sendVec_binderize/256 45695 ns 20171 ns 34759
BM_sendVec_binderize/512 45743 ns 20211 ns 34489
BM_sendVec_binderize/1024 46169 ns 20430 ns 34081
After patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 42939 ns 17262 ns 40653
BM_sendVec_binderize/8 42823 ns 17243 ns 40671
BM_sendVec_binderize/16 42898 ns 17243 ns 40594
BM_sendVec_binderize/32 42838 ns 17267 ns 40527
BM_sendVec_binderize/64 42854 ns 17249 ns 40379
BM_sendVec_binderize/128 42881 ns 17288 ns 40427
BM_sendVec_binderize/256 42917 ns 17297 ns 40429
BM_sendVec_binderize/512 43184 ns 17395 ns 40411
BM_sendVec_binderize/1024 43119 ns 17357 ns 40432
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Show the high watermark of the index into the alloc->pages
array, to facilitate sizing the buffer on a per-process
basis.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here is the big set of char/misc and other driver subsystem patches for
4.15-rc1.
There are small changes all over here, hyperv driver updates, pcmcia
driver updates, w1 driver updats, vme driver updates, nvmem driver
updates, and lots of other little one-off driver updates as well. The
shortlog has the full details.
Note, there will be a merge conflict in drivers/misc/lkdtm_core.c when
merging to your tree as one lkdtm patch came in through the perf tree as
well as this one. The resolution is to take the const change that this
tree provides.
All of these have been in linux-next for quite a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWg2Lnw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymTUwCgwp46+I8yPlgDH8oe5TxyyJnpdHQAn1XW0i+a
sBi6WS87In5v1QO1Rgfc
=dH2a
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
"Here is the big set of char/misc and other driver subsystem patches
for 4.15-rc1.
There are small changes all over here, hyperv driver updates, pcmcia
driver updates, w1 driver updats, vme driver updates, nvmem driver
updates, and lots of other little one-off driver updates as well. The
shortlog has the full details.
All of these have been in linux-next for quite a while with no
reported issues"
* tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
VME: Return -EBUSY when DMA list in use
w1: keep balance of mutex locks and refcnts
MAINTAINERS: Update VME subsystem tree.
nvmem: sunxi-sid: add support for A64/H5's SID controller
nvmem: imx-ocotp: Update module description
nvmem: imx-ocotp: Enable i.MX7D OTP write support
nvmem: imx-ocotp: Add i.MX7D timing write clock setup support
nvmem: imx-ocotp: Move i.MX6 write clock setup to dedicated function
nvmem: imx-ocotp: Add support for banked OTP addressing
nvmem: imx-ocotp: Pass parameters via a struct
nvmem: imx-ocotp: Restrict OTP write to IMX6 processors
nvmem: uniphier: add UniPhier eFuse driver
dt-bindings: nvmem: add description for UniPhier eFuse
nvmem: set nvmem->owner to nvmem->dev->driver->owner if unset
nvmem: qfprom: fix different address space warnings of sparse
nvmem: mtk-efuse: fix different address space warnings of sparse
nvmem: mtk-efuse: use stack for nvmem_config instead of malloc'ing it
nvmem: imx-iim: use stack for nvmem_config instead of malloc'ing it
thunderbolt: tb: fix use after free in tb_activate_pcie_devices
MAINTAINERS: Add git tree for Thunderbolt development
...
Summary of modules changes for the 4.15 merge window:
- Treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- Minor code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
fbGX8juZWP4ypbK+irTB
=c8vb
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
We want the driver fixes in here and this resolves a merge issue with
the binder driver.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't access next->data in kernel debug message when the
next buffer is null.
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use binder_alloc struct's mm_struct rather than getting
a reference to the mm struct through get_task_mm to
avoid a potential deadlock between lru lock, task lock and
dentry lock, since a thread can be holding the task lock
and the dentry lock while trying to acquire the lru lock.
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_shrinker struct is not used anywhere outside of
binder_alloc.c and should be static.
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vma argument in update_page_range is no longer
used after 74310e06 ("android: binder: Move buffer
out of area shared with user space"), since mmap_handler
no longer calls update_page_range with a vma.
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because we're not guaranteed that subsequent calls
to poll() will have a poll_table_struct parameter
with _qproc set. When _qproc is not set, poll_wait()
is a noop, and we won't be woken up correctly.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here are 4 patches to resolve some char/misc driver issues found these
past weeks.
One of them is a mei bugfix and another is a new mei device id. There
is also a hyper-v fix for a reported issue, and a binder issue fix for a
problem reported by a few people.
All of these have been in my tree for a while, I don't know if
linux-next is really testing much this month. But 0-day is happy with
them :)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWeNCTw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykNOQCgzskhoji+7Fsl+xc3SX/3ntJwoecAnAz5xQIA
grpH4IOHq/0aPbQj8g3Y
=F3xc
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are 4 patches to resolve some char/misc driver issues found these
past weeks.
One of them is a mei bugfix and another is a new mei device id. There
is also a hyper-v fix for a reported issue, and a binder issue fix for
a problem reported by a few people.
All of these have been in my tree for a while, I don't know if
linux-next is really testing much this month. But 0-day is happy with
them :)"
* tag 'char-misc-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: fix use-after-free in binder_transaction()
Drivers: hv: vmbus: Fix bugs in rescind handling
mei: me: add gemini lake devices id
mei: always use domain runtime pm callbacks.
User-space normally keeps the node alive when creating a transaction
since it has a reference to the target. The local strong ref keeps it
alive if the sending process dies before the target process processes
the transaction. If the source process is malicious or has a reference
counting bug, this can fail.
In this case, when we attempt to decrement the node in the failure
path, the node has already been freed.
This is fixed by taking a tmpref on the node while constructing
the transaction. To avoid re-acquiring the node lock and inner
proc lock to increment the proc's tmpref, a helper is used that
does the ref increments on both the node and proc.
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drop the global lru lock in isolate callback before calling
zap_page_range which calls cond_resched, and re-acquire the global lru
lock before returning. Also change return code to LRU_REMOVED_RETRY.
Use mmput_async when fail to acquire mmap sem in an atomic context.
Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
Also restore mmput_async, which was initially introduced in commit
ec8d7c14ea ("mm, oom_reaper: do not mmput synchronously from the oom
reaper context"), and was removed in commit 2129258024 ("mm: oom: let
oom_reap_task and exit_mmap run concurrently").
Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com
Fixes: f2517eb76f ("android: binder: Add global lru shrinker to binder")
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Kyle Yan <kyan@codeaurora.org>
Acked-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Martijn Coenen <maco@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Riley Andrews <riandrews@android.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hoeun Ryu <hoeun.ryu@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 7a4408c6bd ("binder: make sure accesses to proc/thread are
safe") made a change to enqueue tcomplete to thread->todo before
enqueuing the transaction. However, in err_dead_proc_or_thread case,
the tcomplete is directly freed, without dequeued. It may cause the
thread->todo list to be corrupted.
So, dequeue it before freeing.
Fixes: 7a4408c6bd ("binder: make sure accesses to proc/thread are safe")
Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 372e3147df ("binder: guarantee txn complete / errors delivered
in-order") incorrectly defined a local ret value. This ret value will
be invalid when out of the if block
Fixes: 372e3147df ("binder: refactor binder ref inc/dec for thread safety")
Signed-off-by: Xu YiPing <xuyiping@hislicon.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allowing binder to expose the 64-bit API on 32-bit kernels caused a
build warning:
drivers/android/binder.c: In function 'binder_transaction_buffer_release':
drivers/android/binder.c:2220:15: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
fd_array = (u32 *)(parent_buffer + fda->parent_offset);
^
drivers/android/binder.c: In function 'binder_translate_fd_array':
drivers/android/binder.c:2445:13: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
fd_array = (u32 *)(parent_buffer + fda->parent_offset);
^
drivers/android/binder.c: In function 'binder_fixup_parent':
drivers/android/binder.c:2511:18: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
This adds extra type casts to avoid the warning.
However, there is another problem with the Kconfig option: turning
it on or off creates two incompatible ABI versions, a kernel that
has this enabled cannot run user space that was built without it
or vice versa. A better solution might be to leave the option hidden
until the binder code is fixed to deal with both ABI versions.
Fixes: e8d2ed7db7 ("Revert "staging: Fix build issues with new binder API"")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This can cause issues with processes using the poll()
interface:
1) client sends two oneway transactions
2) the second one gets queued on async_todo
(because the server didn't handle the first one
yet)
3) server returns from poll(), picks up the
first transaction and does transaction work
4) server is done with the transaction, sends
BC_FREE_BUFFER, and the second transaction gets
moved to thread->todo
5) libbinder's handlePolledCommands() only handles
the commands in the current data buffer, so
doesn't see the new transaction
6) the server continues running and issues a new
outgoing transaction. Now, it suddenly finds
the incoming oneway transaction on its thread
todo, and returns that to userspace.
7) userspace does not expect this to happen; it
may be holding a lock while making the outgoing
transaction, and if handling the incoming
trasnaction requires taking the same lock,
userspace will deadlock.
By queueing the async transaction to the proc
workqueue, we make sure it's only picked up when
a thread is ready for proc work.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows userspace to request death notifications without
having to worry about getting an immediate callback on the same
thread; one scenario where this would be problematic is if the
death recipient handler grabs a lock that was already taken
earlier (eg as part of a nested transaction).
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because is_spin_locked() always returns false on UP
systems.
Use assert_spin_locked() instead, and remove the
WARN_ON() instances, since those were easy to verify.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The BINDER_GET_NODE_DEBUG_INFO ioctl will return debug info on
a node. Each successive call reusing the previous return value
will return the next node. The data will be used by
libmemunreachable to mark the pointers with kernel references
as reachable.
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of pushing new transactions to the process
waitqueue, select a thread that is waiting on proc
work to handle the transaction. This will make it
easier to improve priority inheritance in future
patches, by setting the priority before we wake up
a thread.
If we can't find a waiting thread, submit the work
to the proc waitqueue instead as we did previously.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Removes the process waitqueue, so that threads
can only wait on the thread waitqueue. Whenever
there is process work to do, pick a thread and
wake it up. Having the caller pick a thread is
helpful for things like priority inheritance.
This also fixes an issue with using epoll(),
since we no longer have to block on different
waitqueues.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the number of active, lru, and free pages for
each binder process in binder stats
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix crash introduced by 74310e06be
(android: binder: Move buffer out of area shared with user space)
when close is called after open without mmap in between.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Fixes: 74310e06be ("android: binder: Move buffer out of area shared with user space")
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add tracepoints in binder transaction allocator to
record lru hits and alloc/free page.
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hold on to the pages allocated and mapped for transaction
buffers until the system is under memory pressure. When
that happens, use linux shrinker to free pages. Without
using shrinker, patch "android: binder: Move buffer out
of area shared with user space" will cause a significant
slow down for small transactions that fit into the first
page because free list buffer header used to be inlined
with buffer data.
In addition to prevent the performance regression for
small transactions, this patch improves the performance
for transactions that take up more than one page.
Modify alloc selftest to work with the shrinker change.
Test: Run memory intensive applications (Chrome and Camera)
to trigger shrinker callbacks. Binder frees memory as expected.
Test: Run binderThroughputTest with high memory pressure
option enabled.
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Binder driver allocates buffer meta data in a region that is mapped
in user space. These meta data contain pointers in the kernel.
This patch allocates buffer meta data on the kernel heap that is
not mapped in user space, and uses a pointer to refer to the data mapped.
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_alloc_selftest tests that alloc_new_buf handles page allocation and
deallocation properly when allocate and free buffers. The test allocates 5
buffers of various sizes to cover all possible page alignment cases, and
frees the buffers using a list of exhaustive freeing order.
Test: boot the device with ANDROID_BINDER_IPC_SELFTEST config option
enabled. Allocator selftest passes.
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use helper functions buffer_next and buffer_prev instead
of list_entry to get the next and previous buffers.
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit d0bdff0db8 ("staging: Fix build issues with new
binder API"), because commit e38361d032 ("ARM: 8091/2: add get_user()
support for 8 byte types") has added the 64bit __get_user_asm_*
implementation.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26549d1774 ("binder: guarantee txn complete / errors delivered
in-order") passed the locally declared and undefined cmd
to binder_stat_br() which results in a bogus cmd field in a trace
event and BR stats are incremented incorrectly.
Change to use e->cmd which has been initialized.
Signed-off-by: Todd Kjos <tkjos@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 26549d1774 ("binder: guarantee txn complete / errors delivered in-order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On binder_init() the devices string is duplicated and smashed into individual
device names which are passed along. However, the original duplicated string
wasn't freed in case binder_init() failed. Let's free it on error.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was never used since addition of binder to linux mainstream tree.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Arve Hjønnevåg" <arve@android.com>
Cc: Riley Andrews <riandrews@android.com>
Cc: devel@driverdev.osuosl.org
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use rlimit() helper instead of manually writing whole
chain from current task to rlim_cur
Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A race existed where one thread could register
a death notification for a node, while another
thread was cleaning up that node and sending
out death notifications for its references,
causing simultaneous access to ref->death
because different locks were held.
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When printing transactions there were several race conditions
that could cause a stale pointer to be deferenced. Fixed by
reading the pointer once and using it if valid (which is
safe). The transaction buffer also needed protection via proc
lock, so it is only printed if we are holding the correct lock.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use proc->outer_lock to protect the binder_ref structure.
The outer lock allows functions operating on the binder_ref
to do nested acquires of node and inner locks as necessary
to attach refs to nodes atomically.
Binder refs must never be accesssed without holding the
outer lock.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use the inner lock to protect thread accounting fields in
proc structure: max_threads, requested_threads,
requested_threads_started and ready_threads.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This makes future changes to priority inheritance
easier, since we want to be able to look at a thread's
transaction stack when selecting a thread to inherit
priority for.
It also allows us to take just a single lock in a
few paths, where we used to take two in succession.
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
proc->threads will need to be accessed with higher
locks of other processes held so use proc->inner_lock
to protect it. proc->tmp_ref now needs to be protected
by proc->inner_lock.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When locks for binder_ref handling are added, proc->nodes
will need to be modified while holding the outer lock
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
node->node_lock is used to protect elements of node. No
need to acquire for fields that are invariant: debug_id,
ptr, cookie.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The todo lists in the proc, thread, and node structures
are accessed by other procs/threads to place work
items on the queue.
The todo lists are protected by the new proc->inner_lock.
No locks should ever be nested under these locks. As the
name suggests, an outer lock will be introduced in
a later patch.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For correct behavior we need to hold the inner lock when
dequeuing and processing node work in binder_thread_read.
We now hold the inner lock when we enter the switch statement
and release it after processing anything that might be
affected by other threads.
We also need to hold the inner lock to protect the node
weak/strong ref tracking fields as long as node->proc
is non-NULL (if it is NULL then we are guaranteed that
we don't have any node work queued).
This means that other functions that manipulate these fields
must hold the inner lock. Refactored these functions to use
the inner lock.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are 3 main spinlocks which must be acquired in this
order:
1) proc->outer_lock : protects most fields of binder_proc,
binder_thread, and binder_ref structures. binder_proc_lock()
and binder_proc_unlock() are used to acq/rel.
2) node->lock : protects most fields of binder_node.
binder_node_lock() and binder_node_unlock() are
used to acq/rel
3) proc->inner_lock : protects the thread and node lists
(proc->threads, proc->nodes) and all todo lists associated
with the binder_proc (proc->todo, thread->todo,
proc->delivered_death and node->async_todo).
binder_inner_proc_lock() and binder_inner_proc_unlock()
are used to acq/rel
Any lock under procA must never be nested under any lock at the same
level or below on procB.
Functions that require a lock held on entry indicate which lock
in the suffix of the function name:
foo_olocked() : requires node->outer_lock
foo_nlocked() : requires node->lock
foo_ilocked() : requires proc->inner_lock
foo_iolocked(): requires proc->outer_lock and proc->inner_lock
foo_nilocked(): requires node->lock and proc->inner_lock
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When obtaining a node via binder_get_node(),
binder_get_node_from_ref() or binder_new_node(),
increment node->tmp_refs to take a
temporary reference on the node to ensure the node
persists while being used. binder_put_node() must
be called to remove the temporary reference.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once locks are added, binder_ref's will only be accessed
safely with the proc lock held. Refactor the inc/dec paths
to make them atomic with the binder_get_ref* paths and
node inc/dec. For example, instead of:
ref = binder_get_ref(proc, handle, strong);
...
binder_dec_ref(ref, strong);
we now have:
ret = binder_dec_ref_for_handle(proc, handle, strong, &rdata);
Since the actual ref is no longer exposed to callers, a
new struct binder_ref_data is introduced which can be used
to return a copy of ref state.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_thread and binder_proc may be accessed by other
threads when processing transaction. Therefore they
must be prevented from being freed while a transaction
is in progress that references them.
This is done by introducing a temporary reference
counter for threads and procs that indicates that the
object is in use and must not be freed. binder_thread_dec_tmpref()
and binder_proc_dec_tmpref() are used to decrement
the temporary reference.
It is safe to free a binder_thread if there
is no reference and it has been released
(indicated by thread->is_dead).
It is safe to free a binder_proc if it has no
remaining threads and no reference.
A spinlock is added to the binder_transaction
to safely access and set references for t->from
and for debug code to safely access t->to_thread
and t->to_proc.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When initiating a transaction, the target_node must
have a strong ref on it. Then we take a second
strong ref to make sure the node survives until the
transaction is complete.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since errors are tracked in the return_error/return_error2
fields of the binder_thread object and BR_TRANSACTION_COMPLETEs
can be tracked either in those fields or via the thread todo
work list, it is possible for errors to be reported ahead
of the associated txn complete.
Use the thread todo work list for errors to guarantee
order. Also changed binder_send_failed_reply to pop
the transaction even if it failed to send a reply.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binder_pop_transaction needs to be split into 2 pieces to
to allow the proc lock to be held on entry to dequeue the
transaction stack, but no lock when kfree'ing the transaction.
Split into binder_pop_transaction_locked and binder_free_transaction
(the actual locks are still to be added).
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The log->next index for the transaction log was
not protected when incremented. This led to a
case where log->next++ resulted in an index
larger than ARRAY_SIZE(log->entry) and eventually
a bad access to memory.
Fixed by making the log index an atomic64 and
converting to an array by using "% ARRAY_SIZE(log->entry)"
Also added "complete" field to the log entry which is
written last to tell the print code whether the
entry is complete
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Display information about allocated/free space whenever
binder buffer allocation fails on synchronous
transactions.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Siqi Lin <siqilin@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adds protection against malicious user code freeing
the same buffer at the same time which could cause
a crash. Cannot happen under normal use.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
node is always non-NULL in binder_get_ref_for_node so the
conditional and else clause are not needed
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The looper member of struct binder_thread is a bitmask
of control bits. All of the existing bits are modified
by the affected thread except for BINDER_LOOPER_STATE_NEED_RETURN
which can be modified in binder_deferred_flush() by
another thread.
To avoid adding a spinlock around all read-mod-writes to
modify a bit, the BINDER_LOOPER_STATE_NEED_RETURN flag
is replaced by a separate field in struct binder_thread.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the transaction complete work item is queued
after the transaction. This means that it is possible
for the transaction to be handled and a reply to be
enqueued in the current thread before the transaction
complete is enqueued, which violates the protocol
with userspace who may not expect the transaction
complete. Fixed by always enqueing the transaction
complete first.
Also, once the transaction is enqueued, it is unsafe
to access since it might be freed. Currently,
t->flags is accessed to determine whether a sync
wake is needed. Changed to access tr->flags
instead.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In binder_thread_read, the BINDER_WORK_NODE command is used
to communicate the references on the node to userspace. It
can take a couple of iterations in the loop to construct
the list of commands for user space. When locking is added,
the lock would need to be release on each iteration which
means the state could change. The work item is not dequeued
during this process which prevents a simpler queue management
that can just dequeue up front and handle the work item.
Fixed by changing the BINDER_WORK_NODE algorithm in
binder_thread_read to determine which commands to send
to userspace atomically in 1 pass so it stays consistent
with the kernel view.
The work item is now dequeued immediately since only
1 pass is needed.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add additional information to determine the cause of binder
failures. Adds the following to failed transaction log and
kernel messages:
return_error : value returned for transaction
return_error_param : errno returned by binder allocator
return_error_line : line number where error detected
Also, return BR_DEAD_REPLY if an allocation error indicates
a dead proc (-ESRCH)
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use an atomic for binder_last_id to avoid locking it
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use atomics for stats to avoid needing to lock for
increments/decrements
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add binder_dead_nodes_lock, binder_procs_lock, and
binder_context_mgr_node_lock to protect the associated global lists
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the global lock, there was a mechanism to access
binder driver debugging information with the global
lock disabled to debug deadlocks or other issues.
This mechanism is rarely (if ever) used anymore
and wasn't needed during the development of
fine-grained locking in the binder driver.
Removing it.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the binder allocator functionality to its own file
Continuation of splitting the binder allocator from the binder
driver. Split binder_alloc functions from normal binder functions.
Add kernel doc comments to functions declared extern in
binder_alloc.h
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Continuation of splitting the binder allocator from the binder
driver. Separate binder_alloc functions from normal binder
functions. Protect the allocator with a separate mutex.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The buffer's transaction has already been freed before
binder_deferred_release. No need to do it again.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binder allocator is logically separate from the rest
of the binder drivers. Separating the data structures
to prepare for splitting into separate file with separate
locking.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use wake_up_interruptible_sync() to hint to the scheduler binder
transactions are synchronous wakeups. Disable preemption while waking
to avoid ping-ponging on the binder lock.
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
Cc: stable <stable@vger.kernel.org> # 4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>