mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2024-11-01 08:58:07 +00:00
36806 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
David S. Miller
|
84103209ba |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says: ==================== pull-request: bpf 2021-08-07 The following pull-request contains BPF updates for your *net* tree. We've added 4 non-merge commits during the last 9 day(s) which contain a total of 4 files changed, 8 insertions(+), 7 deletions(-). The main changes are: 1) Fix integer overflow in htab's lookup + delete batch op, from Tatsuhiko Yasumatsu. 2) Fix invalid fd 0 close in libbpf if BTF parsing failed, from Daniel Xu. 3) Fix libbpf feature probe for BPF_PROG_TYPE_CGROUP_SOCKOPT, from Robin Gögge. 4) Fix minor libbpf doc warning regarding code-block language, from Randy Dunlap. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Tatsuhiko Yasumatsu
|
c4eb1f4032 |
bpf: Fix integer overflow involving bucket_size
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated
over to count the number of elements in each bucket (bucket_size).
If bucket_size is large enough, the multiplication to calculate
kvmalloc() size could overflow, resulting in out-of-bounds write
as reported by KASAN:
[...]
[ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112
[ 104.986889]
[ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13
[ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 104.988104] Call Trace:
[ 104.988410] dump_stack_lvl+0x34/0x44
[ 104.988706] print_address_description.constprop.0+0x21/0x140
[ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.989622] kasan_report.cold+0x7f/0x11b
[ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.990239] kasan_check_range+0x17c/0x1e0
[ 104.990467] memcpy+0x39/0x60
[ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60
[ 104.990982] ? __wake_up_common+0x4d/0x230
[ 104.991256] ? htab_of_map_free+0x130/0x130
[ 104.991541] bpf_map_do_batch+0x1fb/0x220
[...]
In hashtable, if the elements' keys have the same jhash() value, the
elements will be put into the same bucket. By putting a lot of elements
into a single bucket, the value of bucket_size can be increased to
trigger the integer overflow.
Triggering the overflow is possible for both callers with CAP_SYS_ADMIN
and callers without CAP_SYS_ADMIN.
It will be trivial for a caller with CAP_SYS_ADMIN to intentionally
reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set
the random seed passed to jhash() to 0, it will be easy for the caller
to prepare keys which will be hashed into the same value, and thus put
all the elements into the same bucket.
If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be
used. However, it will be still technically possible to trigger the
overflow, by guessing the random seed value passed to jhash() (32bit)
and repeating the attempt to trigger the overflow. In this case,
the probability to trigger the overflow will be low and will take
a very long time.
Fix the integer overflow by calling kvmalloc_array() instead of
kvmalloc() to allocate memory.
Fixes:
|
||
Linus Torvalds
|
2c4b1ec683 |
Fix tracepoint race between static_call and callback data
As callbacks to a tracepoint are paired with the data that is passed in when the callback is registered to the tracepoint, it must have that data passed to the callback when the tracepoint is triggered, else bad things will happen. To keep the two together, they are both assigned to a tracepoint structure and added to an array. The tracepoint call site will dereference the structure (via RCU) and call the callback in that structure along with the data in that structure. This keeps the callback and data tightly coupled. Because of the overhead that retpolines have on tracepoint callbacks, if there's only one callback attached to a tracepoint (a common case), then it is called via a static call (code modified to do a direct call instead of an indirect call). But to implement this, the data had to be decoupled from the callback, as now the callback is implemented via a direct call from the static call and not an indirect call from the dereferenced structure. Note, the static call only calls a callback used when there's a single callback attached to the tracepoint. If more than one callback is attached to the same tracepoint, then the static call will call an iterator function that goes back to dereferencing the structure keeping the callback and its data tightly coupled again. Issues can arise when going from 0 callbacks to one, as the static call is assigned to the callback, and it must take care that the data passed to it is loaded before the static call calls the callback. Going from 1 to 2 callbacks is not an issue, as long as the static call is updated to the iterator before the tracepoint structure array is updated via RCU. Going from 2 to more or back down to 2 is not an issue as the iterator can handle all theses cases. But going from 2 to 1, care must be taken as the static call is now calling a callback and the data that is loaded must be the data for that callback. Care was taken to ensure the callback and data would be in-sync, but after a bug was reported, it became clear that not enough was done to make sure that was the case. These changes address this. The first change is to compare the old and new data instead of the old and new callback, as it's the data that can corrupt the callback, even if the callback is the same (something getting freed). The next change is to convert these transitions into states, to make it easier to know when a synchronization is needed, and to perform those synchronizations. The problem with this patch is that it slows down disabling all events from under a second, to making it take over 10 seconds to do the same work. But that is addressed in the final patch. The final patch uses the RCU state functions to keep track of the RCU state between the transitions, and only needs to perform the synchronization if an RCU synchronization hasn't been done already. This brings the performance of disabling all events back to its original value. That's because no synchronization is required between disabling tracepoints but is required when enabling a tracepoint after its been disabled. If an RCU synchronization happens after the tracepoint is disabled, and before it is re-enabled, there's no need to do the synchronization again. Both the second and third patch have subtle complexities that they are separated into two patches. But because the second patch causes such a regression in performance, the third patch adds a "Fixes" tag to the second patch, such that the two must be backported together and not just the second patch. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYQ15TBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qnmmAP4hoA34CDr5hrd8mYLeKptW63f5Nd1w fVZjprfa1wJhZAEAq39OeRCT4Fb2hIeZNBNUnLU90f+J6NH5QFDEhW+CkAI= =JcZS -----END PGP SIGNATURE----- Merge tag 'trace-v5.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Fix tracepoint race between static_call and callback data As callbacks to a tracepoint are paired with the data that is passed in when the callback is registered to the tracepoint, it must have that data passed to the callback when the tracepoint is triggered, else bad things will happen. To keep the two together, they are both assigned to a tracepoint structure and added to an array. The tracepoint call site will dereference the structure (via RCU) and call the callback in that structure along with the data in that structure. This keeps the callback and data tightly coupled. Because of the overhead that retpolines have on tracepoint callbacks, if there's only one callback attached to a tracepoint (a common case), then it is called via a static call (code modified to do a direct call instead of an indirect call). But to implement this, the data had to be decoupled from the callback, as now the callback is implemented via a direct call from the static call and not an indirect call from the dereferenced structure. Note, the static call only calls a callback used when there's a single callback attached to the tracepoint. If more than one callback is attached to the same tracepoint, then the static call will call an iterator function that goes back to dereferencing the structure keeping the callback and its data tightly coupled again. Issues can arise when going from 0 callbacks to one, as the static call is assigned to the callback, and it must take care that the data passed to it is loaded before the static call calls the callback. Going from 1 to 2 callbacks is not an issue, as long as the static call is updated to the iterator before the tracepoint structure array is updated via RCU. Going from 2 to more or back down to 2 is not an issue as the iterator can handle all theses cases. But going from 2 to 1, care must be taken as the static call is now calling a callback and the data that is loaded must be the data for that callback. Care was taken to ensure the callback and data would be in-sync, but after a bug was reported, it became clear that not enough was done to make sure that was the case. These changes address this. The first change is to compare the old and new data instead of the old and new callback, as it's the data that can corrupt the callback, even if the callback is the same (something getting freed). The next change is to convert these transitions into states, to make it easier to know when a synchronization is needed, and to perform those synchronizations. The problem with this patch is that it slows down disabling all events from under a second, to making it take over 10 seconds to do the same work. But that is addressed in the final patch. The final patch uses the RCU state functions to keep track of the RCU state between the transitions, and only needs to perform the synchronization if an RCU synchronization hasn't been done already. This brings the performance of disabling all events back to its original value. That's because no synchronization is required between disabling tracepoints but is required when enabling a tracepoint after its been disabled. If an RCU synchronization happens after the tracepoint is disabled, and before it is re-enabled, there's no need to do the synchronization again. Both the second and third patch have subtle complexities that they are separated into two patches. But because the second patch causes such a regression in performance, the third patch adds a "Fixes" tag to the second patch, such that the two must be backported together and not just the second patch" * tag 'trace-v5.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracepoint: Use rcu get state and cond sync for static call updates tracepoint: Fix static call function vs data state mismatch tracepoint: static call: Compare data on transition from 2->1 callees |
||
Mathieu Desnoyers
|
7b40066c97 |
tracepoint: Use rcu get state and cond sync for static call updates
State transitions from 1->0->1 and N->2->1 callbacks require RCU synchronization. Rather than performing the RCU synchronization every time the state change occurs, which is quite slow when many tracepoints are registered in batch, instead keep a snapshot of the RCU state on the most recent transitions which belong to a chain, and conditionally wait for a grace period on the last transition of the chain if one g.p. has not elapsed since the last snapshot. This applies to both RCU and SRCU. This brings the performance regression caused by commit |
||
Mathieu Desnoyers
|
231264d692 |
tracepoint: Fix static call function vs data state mismatch
On a 1->0->1 callbacks transition, there is an issue with the new
callback using the old callback's data.
Considering __DO_TRACE_CALL:
do { \
struct tracepoint_func *it_func_ptr; \
void *__data; \
it_func_ptr = \
rcu_dereference_raw((&__tracepoint_##name)->funcs); \
if (it_func_ptr) { \
__data = (it_func_ptr)->data; \
----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ]
static_call(tp_func_##name)(__data, args); \
} \
} while (0)
It has loaded the tp->funcs of the old callback, so it will try to use the old
data. This can be fixed by adding a RCU sync anywhere in the 1->0->1
transition chain.
On a N->2->1 transition, we need an rcu-sync because you may have a
sequence of 3->2->1 (or 1->2->1) where the element 0 data is unchanged
between 2->1, but was changed from 3->2 (or from 1->2), which may be
observed by the static call. This can be fixed by adding an
unconditional RCU sync in transition 2->1.
Note, this fixes a correctness issue at the cost of adding a tremendous
performance regression to the disabling of tracepoints.
Before this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m0.778s
user 0m0.000s
sys 0m0.061s
After this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m10.593s
user 0m0.017s
sys 0m0.259s
A follow up fix will introduce a more lightweight scheme based on RCU
get_state and cond_sync, that will return the performance back to what it
was. As both this change and the lightweight versions are complex on their
own, for bisecting any issues that this may cause, they are kept as two
separate changes.
Link: https://lkml.kernel.org/r/20210805132717.23813-3-mathieu.desnoyers@efficios.com
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Stefan Metzmacher <metze@samba.org>
Fixes:
|
||
Mathieu Desnoyers
|
f7ec412125 |
tracepoint: static call: Compare data on transition from 2->1 callees
On transition from 2->1 callees, we should be comparing .data rather
than .func, because the same callback can be registered twice with
different data, and what we care about here is that the data of array
element 0 is unchanged to skip rcu sync.
Link: https://lkml.kernel.org/r/20210805132717.23813-2-mathieu.desnoyers@efficios.com
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Stefan Metzmacher <metze@samba.org>
Fixes:
|
||
Linus Torvalds
|
6209049ecf |
Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman: "Fix a subtle locking versus reference counting bug in the ucount changes, found by syzbot" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Fix race condition between alloc_ucounts and put_ucounts |
||
Linus Torvalds
|
3c3e902707 |
Various tracing fixes:
- Fix NULL pointer dereference caused by an error path - Give histogram calculation fields a size, otherwise it breaks synthetic creation based on them. - Reject strings being used for number calculations. - Fix recordmcount.pl warning on llvm building RISC-V allmodconfig - Fix the draw_functrace.py script to handle the new trace output - Fix warning of smp_processor_id() in preemptible code -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYQwR+xQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qtHOAQD7gBn1cRK0T3Eolf5HRd14PLDVUZ1B iMZuTJZzJUWLSAD/ec3ezcOafNlPKmG1ta8UxrWP5VzHOC5qTIAJYc1d5AA= =7FNB -----END PGP SIGNATURE----- Merge tag 'trace-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix NULL pointer dereference caused by an error path - Give histogram calculation fields a size, otherwise it breaks synthetic creation based on them. - Reject strings being used for number calculations. - Fix recordmcount.pl warning on llvm building RISC-V allmodconfig - Fix the draw_functrace.py script to handle the new trace output - Fix warning of smp_processor_id() in preemptible code" * tag 'trace-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Quiet smp_processor_id() use in preemptable warning in hwlat scripts/tracing: fix the bug that can't parse raw_trace_func scripts/recordmcount.pl: Remove check_objcopy() and $can_use_local tracing: Reject string operand in the histogram expression tracing / histogram: Give calculation hist_fields a size tracing: Fix NULL pointer dereference in start_creating |
||
Steven Rostedt (VMware)
|
51397dc6f2 |
tracing: Quiet smp_processor_id() use in preemptable warning in hwlat
The hardware latency detector (hwlat) has a mode that it runs one thread
across CPUs. The logic to move from the currently running CPU to the next
one in the list does a smp_processor_id() to find where it currently is.
Unfortunately, it's done with preemption enabled, and this triggers a
warning for using smp_processor_id() in a preempt enabled section.
As it is only using smp_processor_id() to get information on where it
currently is in order to simply move it to the next CPU, it doesn't really
care if it got moved in the mean time. It will simply balance out later if
such a case arises.
Switch smp_processor_id() to raw_smp_processor_id() to quiet that warning.
Link: https://lkml.kernel.org/r/20210804141848.79edadc0@oasis.local.home
Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Fixes:
|
||
Masami Hiramatsu
|
a9d10ca498 |
tracing: Reject string operand in the histogram expression
Since the string type can not be the target of the addition / subtraction
operation, it must be rejected. Without this fix, the string type silently
converted to digits.
Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes:
|
||
Steven Rostedt (VMware)
|
2c05caa7ba |
tracing / histogram: Give calculation hist_fields a size
When working on my user space applications, I found a bug in the synthetic
event code where the automated synthetic event field was not matching the
event field calculation it was attached to. Looking deeper into it, it was
because the calculation hist_field was not given a size.
The synthetic event fields are matched to their hist_fields either by
having the field have an identical string type, or if that does not match,
then the size and signed values are used to match the fields.
The problem arose when I tried to match a calculation where the fields
were "unsigned int". My tool created a synthetic event of type "u32". But
it failed to match. The string was:
diff=field1-field2:onmatch(event).trace(synth,$diff)
Adding debugging into the kernel, I found that the size of "diff" was 0.
And since it was given "unsigned int" as a type, the histogram fallback
code used size and signed. The signed matched, but the size of u32 (4) did
not match zero, and the event failed to be created.
This can be worse if the field you want to match is not one of the
acceptable fields for a synthetic event. As event fields can have any type
that is supported in Linux, this can cause an issue. For example, if a
type is an enum. Then there's no way to use that with any calculations.
Have the calculation field simply take on the size of what it is
calculating.
Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
Peter Zijlstra
|
f558c2b834 |
sched/rt: Fix double enqueue caused by rt_effective_prio
Double enqueues in rt runqueues (list) have been reported while running
a simple test that spawns a number of threads doing a short sleep/run
pattern while being concurrently setscheduled between rt and fair class.
WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360
CPU: 3 PID: 2825 Comm: setsched__13
RIP: 0010:enqueue_task_rt+0x355/0x360
Call Trace:
__sched_setscheduler+0x581/0x9d0
_sched_setscheduler+0x63/0xa0
do_sched_setscheduler+0xa0/0x150
__x64_sys_sched_setscheduler+0x1a/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40,
next=ffff98679fc67ca0.
kernel BUG at lib/list_debug.c:31!
invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI
CPU: 3 PID: 2825 Comm: setsched__13
RIP: 0010:__list_add_valid+0x41/0x50
Call Trace:
enqueue_task_rt+0x291/0x360
__sched_setscheduler+0x581/0x9d0
_sched_setscheduler+0x63/0xa0
do_sched_setscheduler+0xa0/0x150
__x64_sys_sched_setscheduler+0x1a/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is however called twice per each
__sched_setscheduler() call: first directly by __sched_setscheduler()
before dequeuing the task and then by __setscheduler() to actually do
the priority change. If the priority of the pi_top_task is concurrently
being changed however, it might happen that the two calls return
different results. If, for example, the first call returned the same rt
priority the task was running at and the second one a fair priority, the
task won't be removed by the rt list (on_list still set) and then
enqueued in the fair runqueue. When eventually setscheduled back to rt
it will be seen as enqueued already and the WARNING/BUG be issued.
Fix this by calling rt_effective_prio() only once and then reusing the
return value. While at it refactor code as well for clarity. Concurrent
priority inheritance handling is still safe and will eventually converge
to a new state by following the inheritance chain(s).
Fixes:
|
||
Linus Torvalds
|
c7d1022326 |
Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211)
and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEEWm8ACgkQMUZtbf5S Irv84A//V/nn9VRdpDpmodwBWVEc9SA00M/nmziRBLwRyG+fRMtnePY4Ha40TPbh LL6orth08hZKOjVmMc6Ea4EjZbV5E3iAKtAnaX6wi1HpEXVxKtFYnWxu9ydwTEd9 An1fltDtWYkNi3kiq7il+Tp1/yZAQ+NYv5zQZCWJ47kkN3jkjULdAEBqODA2A6Ul 0PQgS1rKzXukE19PlXDuaNuEekhTiEfaTwzHjdBJZkj1toGJGfHsvdQ/YJjixzB9 44SjE4PfxIaMWP0BVaD6hwzaVQhaZETXhZZufdIDdQd7sDbmd6CPODX6mXfLEq4u JaWylgobsK+5ScHE6siVI+ZlW7stq9l1Ynm10ADiwsZVzKEoP745484aEFOLO6Z+ Ln/IqDQCP/yJQmnl2i0+TfqVDh6BKYoIfUUK/+nzHw4Otycy0m3kj4P+74aYfjOv Q+cUgbXUemcrpq6wGUK+zK0NyNHVILvdPDnHPMMypwqPk18y5ZmFvaJAVUPSavD9 N7t9LoLyGwK3i/Ir4l+JJZ1KgAv1+TbmyNBWvY1Yk/r/vHU3nBPIv26s7YarNAwD 094vJEJ0+mqO4h+Xj1Nc7HEBFi46JfpN2L8uYoM7gpwziIRMdmpXVLmpEk43WmFi UMwWJWqabPEXaozC2UFcFLSk+jS7DiD+G5eG+Fd5HecmKzd7RI0= =sKPI -----END PGP SIGNATURE----- Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211) and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine" * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) gve: Update MAINTAINERS list can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Introduce BPF nospec instruction for mitigating Spectre v4 sis900: Fix missing pci_disable_device() in probe and remove net: let flow have same hash in two directions nfc: nfcsim: fix use after free during module unload tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sctp: fix return value check in __sctp_rcv_asconf_lookup nfc: s3fwrn5: fix undefined parameter values in dev_err() net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() net/mlx5: Unload device upon firmware fatal error net/mlx5e: Fix page allocation failure for ptp-RQ over SF net/mlx5e: Fix page allocation failure for trap-RQ over SF ... |
||
Kamal Agrawal
|
ff41c28c4b |
tracing: Fix NULL pointer dereference in start_creating
The event_trace_add_tracer() can fail. In this case, it leads to a crash
in start_creating with below call stack. Handle the error scenario
properly in trace_array_create_dir.
Call trace:
down_write+0x7c/0x204
start_creating.25017+0x6c/0x194
tracefs_create_file+0xc4/0x2b4
init_tracer_tracefs+0x5c/0x940
trace_array_create_dir+0x58/0xb4
trace_array_create+0x1bc/0x2b8
trace_array_get_by_name+0xdc/0x18c
Link: https://lkml.kernel.org/r/1627651386-21315-1-git-send-email-kamaagra@codeaurora.org
Cc: stable@vger.kernel.org
Fixes:
|
||
David S. Miller
|
fc16a5322e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says: ==================== pull-request: bpf 2021-07-29 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 14 day(s) which contain a total of 20 files changed, 446 insertions(+), 138 deletions(-). The main changes are: 1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer. 2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann, Piotr Krysiuk and Benedict Schlueter. 3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Daniel Borkmann
|
2039f26f3a |
bpf: Fix leakage due to insufficient speculative store bypass mitigation
Spectre v4 gadgets make use of memory disambiguation, which is a set of techniques that execute memory access instructions, that is, loads and stores, out of program order; Intel's optimization manual, section 2.4.4.5: A load instruction micro-op may depend on a preceding store. Many microarchitectures block loads until all preceding store addresses are known. The memory disambiguator predicts which loads will not depend on any previous stores. When the disambiguator predicts that a load does not have such a dependency, the load takes its data from the L1 data cache. Eventually, the prediction is verified. If an actual conflict is detected, the load and all succeeding instructions are re-executed. |
||
Daniel Borkmann
|
f5e81d1117 |
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org> |
||
Alexey Gladkov
|
345daff2e9 |
ucounts: Fix race condition between alloc_ucounts and put_ucounts
The race happens because put_ucounts() doesn't use spinlock and
get_ucounts is not under spinlock:
CPU0 CPU1
---- ----
alloc_ucounts() put_ucounts()
spin_lock_irq(&ucounts_lock);
ucounts = find_ucounts(ns, uid, hashent);
atomic_dec_and_test(&ucounts->count))
spin_unlock_irq(&ucounts_lock);
spin_lock_irqsave(&ucounts_lock, flags);
hlist_del_init(&ucounts->node);
spin_unlock_irqrestore(&ucounts_lock, flags);
kfree(ucounts);
ucounts = get_ucounts(ucounts);
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:152 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:150 [inline]
BUG: KASAN: use-after-free in alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
Write of size 4 at addr ffff88802821e41c by task syz-executor.4/16785
CPU: 1 PID: 16785 Comm: syz-executor.4 Not tainted 5.14.0-rc1-next-20210712-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
print_address_description.constprop.0.cold+0x6c/0x309 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:436
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
get_ucounts kernel/ucount.c:152 [inline]
get_ucounts kernel/ucount.c:150 [inline]
alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
__sys_setuid+0x285/0x400 kernel/sys.c:623
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fde54097188 EFLAGS: 00000246 ORIG_RAX: 0000000000000069
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000ff
RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffc8655740f R14: 00007fde54097300 R15: 0000000000022000
Allocated by task 16784:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
____kasan_kmalloc mm/kasan/common.c:513 [inline]
____kasan_kmalloc mm/kasan/common.c:472 [inline]
__kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:522
kmalloc include/linux/slab.h:591 [inline]
kzalloc include/linux/slab.h:721 [inline]
alloc_ucounts+0x23d/0x5b0 kernel/ucount.c:169
set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
__sys_setuid+0x285/0x400 kernel/sys.c:623
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 16785:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xfb/0x130 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:229 [inline]
slab_free_hook mm/slub.c:1650 [inline]
slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1675
slab_free mm/slub.c:3235 [inline]
kfree+0xeb/0x650 mm/slub.c:4295
put_ucounts kernel/ucount.c:200 [inline]
put_ucounts+0x117/0x150 kernel/ucount.c:192
put_cred_rcu+0x27a/0x520 kernel/cred.c:124
rcu_do_batch kernel/rcu/tree.c:2550 [inline]
rcu_core+0x7ab/0x1380 kernel/rcu/tree.c:2785
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
insert_work+0x48/0x370 kernel/workqueue.c:1332
__queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
queue_work_on+0xee/0x110 kernel/workqueue.c:1525
queue_work include/linux/workqueue.h:507 [inline]
call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
netdev_queue_add_kobject net/core/net-sysfs.c:1621 [inline]
netdev_queue_update_kobjects+0x374/0x450 net/core/net-sysfs.c:1655
register_queue_kobjects net/core/net-sysfs.c:1716 [inline]
netdev_register_kobject+0x35a/0x430 net/core/net-sysfs.c:1959
register_netdevice+0xd33/0x1500 net/core/dev.c:10331
nsim_init_netdevsim drivers/net/netdevsim/netdev.c:317 [inline]
nsim_create+0x381/0x4d0 drivers/net/netdevsim/netdev.c:364
__nsim_dev_port_add+0x32e/0x830 drivers/net/netdevsim/dev.c:1295
nsim_dev_port_add_all+0x53/0x150 drivers/net/netdevsim/dev.c:1355
nsim_dev_probe+0xcb5/0x1190 drivers/net/netdevsim/dev.c:1496
call_driver_probe drivers/base/dd.c:517 [inline]
really_probe+0x23c/0xcd0 drivers/base/dd.c:595
__driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
__device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
__device_attach+0x228/0x4a0 drivers/base/dd.c:965
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
device_add+0xc2f/0x2180 drivers/base/core.c:3356
nsim_bus_dev_new drivers/net/netdevsim/bus.c:431 [inline]
new_device_store+0x436/0x710 drivers/net/netdevsim/bus.c:298
bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
call_write_iter include/linux/fs.h:2152 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x75a/0xa40 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
insert_work+0x48/0x370 kernel/workqueue.c:1332
__queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
queue_work_on+0xee/0x110 kernel/workqueue.c:1525
queue_work include/linux/workqueue.h:507 [inline]
call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
kobject_synth_uevent+0x701/0x850 lib/kobject_uevent.c:208
uevent_store+0x20/0x50 drivers/base/core.c:2371
dev_attr_store+0x50/0x80 drivers/base/core.c:2072
sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
call_write_iter include/linux/fs.h:2152 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x75a/0xa40 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88802821e400
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 28 bytes inside of
192-byte region [ffff88802821e400, ffff88802821e4c0)
The buggy address belongs to the page:
page:ffffea0000a08780 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2821e
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff888010841a00
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 12874702440, free_ts 12637793385
prep_new_page mm/page_alloc.c:2433 [inline]
get_page_from_freelist+0xa72/0x2f80 mm/page_alloc.c:4166
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5374
alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2119
alloc_pages+0x238/0x2a0 mm/mempolicy.c:2242
alloc_slab_page mm/slub.c:1713 [inline]
allocate_slab+0x32b/0x4c0 mm/slub.c:1853
new_slab mm/slub.c:1916 [inline]
new_slab_objects mm/slub.c:2662 [inline]
___slab_alloc+0x4ba/0x820 mm/slub.c:2825
__slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2865
slab_alloc_node mm/slub.c:2947 [inline]
slab_alloc mm/slub.c:2989 [inline]
__kmalloc+0x312/0x330 mm/slub.c:4133
kmalloc include/linux/slab.h:596 [inline]
kzalloc include/linux/slab.h:721 [inline]
__register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
rds_tcp_init_net+0x1db/0x4f0 net/rds/tcp.c:551
ops_init+0xaf/0x470 net/core/net_namespace.c:140
__register_pernet_operations net/core/net_namespace.c:1137 [inline]
register_pernet_operations+0x35a/0x850 net/core/net_namespace.c:1214
register_pernet_device+0x26/0x70 net/core/net_namespace.c:1301
rds_tcp_init+0x77/0xe0 net/rds/tcp.c:717
do_one_initcall+0x103/0x650 init/main.c:1285
do_initcall_level init/main.c:1360 [inline]
do_initcalls init/main.c:1376 [inline]
do_basic_setup init/main.c:1396 [inline]
kernel_init_freeable+0x6b8/0x741 init/main.c:1598
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1343 [inline]
free_pcp_prepare+0x312/0x7d0 mm/page_alloc.c:1394
free_unref_page_prepare mm/page_alloc.c:3329 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3408
__vunmap+0x783/0xb70 mm/vmalloc.c:2587
free_work+0x58/0x70 mm/vmalloc.c:82
process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Memory state around the buggy address:
ffff88802821e300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88802821e380: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff88802821e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88802821e480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88802821e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
- The race fix has two parts.
* Changing the code to guarantee that ucounts->count is only decremented
when ucounts_lock is held. This guarantees that find_ucounts
will never find a structure with a zero reference count.
* Changing alloc_ucounts to increment ucounts->count while
ucounts_lock is held. This guarantees the reference count on the
found data structure will not be decremented to zero (and the data
structure freed) before the reference count is incremented.
-- Eric Biederman
Reported-by: syzbot+01985d7909f9468f013c@syzkaller.appspotmail.com
Reported-by: syzbot+59dd63761094a80ad06d@syzkaller.appspotmail.com
Reported-by: syzbot+6cd79f45bb8fa1c9eeae@syzkaller.appspotmail.com
Reported-by: syzbot+b6e65bd125a05f803d6b@syzkaller.appspotmail.com
Fixes:
|
||
Tejun Heo
|
c3df5fb57f |
cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync
|
||
Linus Torvalds
|
51bbe7ebac |
Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo: "Fix leak of filesystem context root which is triggered by LTP. Not too likely to be a problem in non-testing environments" * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup1: fix leaked context root causing sporadic NULL deref in LTP |
||
Linus Torvalds
|
82d712f6d1 |
Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo: "Fix a use-after-free in allocation failure handling path" * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix UAF in pwq_unbound_release_workfn() |
||
Thomas Gleixner
|
bb7262b295 |
timers: Move clearing of base::timer_running under base:: Lock
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:
int data = 0;
void timer_func(struct timer_list *t)
{
data = 1;
}
CPU 0 CPU 1
------------------------------ --------------------------
base = lock_timer_base(timer, &flags); raw_spin_unlock(&base->lock);
if (base->running_timer != timer) call_timer_fn(timer, fn, baseclk);
ret = detach_if_pending(timer, base, true); base->running_timer = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags); raw_spin_lock(&base->lock);
x = data;
If the timer has previously executed on CPU 1 and then CPU 0 can observe
base->running_timer == NULL and returns, assuming the timer has completed,
but it's not guaranteed on all architectures. The comment for
del_timer_sync() makes that guarantee. Moving the assignment under
base->lock prevents this.
For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes:
|
||
Linus Torvalds
|
a1833a5403 |
smpboot: fix duplicate and misplaced inlining directive
gcc doesn't care, but clang quite reasonably pointed out that the recent
commit
|
||
Linus Torvalds
|
12e9bd168c |
A samll set of timer related fixes:
- Plug a race between rearm and process tick in the posix CPU timers code - Make the optimization to avoid recalculation of the next timer interrupt work correctly when there are no timers pending. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmD9LLUTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYocqMD/9ciaE5J4gbsamavrPfMaQNo3c3mom1 3xoskHFKiz9fnYL/f3yNQ2OjXl+lxV0VLSwjJnc1TfhQM5g+X7uI4P/sCZzHtEmP F3jTCUX99DSlTEB4Fc3ssr/hZPQab9nXearek9eAEpLKQDe1U1q2p314Mr4dA+Kj awJUuOlkLt+NwRCajuZK6Ur0D1Zte56C/Nl3PeppgJY1U5tLCKTE8ZmRbdLo11zM BEMFL95on1a/wKzTkuYqhfSQn4JWZo4lLP9jVHueF2nbPbOGdC9VwvegRMtRKG/k 0I8n7mcXvzi9pDP08o96rzjdZ9KBpG2hLkL0PgCDjrXwlOBH7tDkOBajJ+x5AgAf 71Zi/XOfjaHbkm37uLTTeerG2pKilds7ukjnQ3VS3t/XfPw6Nr6r9ig5AyBxpnjk 8Shw8kfEOApLEnmYwKAXHCewjxppp9pDsR4J7sYIfiYoDZRfF56Xyr/pKa8cpZge 9ByK8Pul4J9RhTOgIgJNMBb78gmFxsKi5CMt6kj8O89omc4pIUVpEK0z3kWmbjrd m1mtcO2kS/ry+7TgAjkxHcrcm/QX+y/2SrvvLoqVLAJQfIrffsiGGehaIXS8rAKg jlCd3s0NET6yULyQwI7qUdS+ZgtYSQKfIkU38VVcXaplUOABWcCwNXRh0rk6+gBs +HK8UxQnAYRGrw== =Q6oh -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of timer related fixes: - Plug a race between rearm and process tick in the posix CPU timers code - Make the optimization to avoid recalculation of the next timer interrupt work correctly when there are no timers pending" * tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix get_next_timer_interrupt() with no timers pending posix-cpu-timers: Fix rearm racing against process tick |
||
Linus Torvalds
|
9041a4d2ee |
A single update for the boot code to prevent aggressive un-inlining which
causes a section mismatch. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmD9KsYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYof1HEACfpcjQbezuNjuoSYQNpAz9jz/W63iH FXTBAiC8IVzloRgNjlbVAVGgJlYFgbgkke2ulNwsRIwwVhhyDYDPio88U498LdB5 bQdw/jsylHY+5ipvGCxf2uvMI8xIk6Gj3bbgIsVxBqpCTaJOsFE8E69UUVZjxPMB kpHQ2GzCLe0FxXC7reFoIIPhY9U1cYXNHUso4w2um6enr10uEwMouXOGkhNAboOl bmi97wWd/wgfSJNUGrsGpK+f4yTAhfdXvPv79t0Dzbg7KqTxFdU2dmrHQYinuw96 YTZLW32o5JJzW/AjPcDTTixXStbwtrAS6GPqoAL65n2rvwit64SYfhPjJAie3+Ly eHzNUyZX6+zczfe7zj8nXiutMc/KI5aRcwf1S2PCMefi1IoJuVOdbe9Pgj/iFtCt GRGSZ+gob7R4Onvs2Yaw8iS32KjetmGTN7V9BuPTIgPgZ9UjQvB9pnDWcICJnT9U 6HLvKBhVBfDVOSe2YDNEIMFVJdacb0EKegyLXU0S/E0CUPitXM9TivazRrSprEiJ nZxH5sM+9rFtpHWA6lxKYmRP5e0BoMCZ6kerHmOMNa6dW4ha6D3J1KUJPYgJ4YH0 wQ9dF4ppHgBGsBtkZ7YVwXHkNnKJnr8GN8Xcf6CUFh1Yu61QMDCDyGR8cEVFc8Ai TtZDCYJLBBA5bA== =t8VJ -----END PGP SIGNATURE----- Merge tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fix from Thomas Gleixner: "A single update for the boot code to prevent aggressive un-inlining which causes a section mismatch" * tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining |
||
Linus Torvalds
|
04ca88d056 |
dma-mapping fix for Lonux 5.14
- handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman Skakun) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmD8/l8LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYNBzhAAuj1XSAfWr+me5bnSX2uxDJuD4aW7fPpmRo2VxePT uvAL5IY76urO68vnQgPsL5h1fJEsPpaUbztb4FtLb/xEKy22IOtWq5xV4igSzqMd 1c1FBCsaNXrVUtEfpZinQW5kXi82X0L1AI+52gnN847D/ZH8crPKkBajMUk2bQ+v tqR9OxQoHfRmRB3ACWSgo0RifA0131lnWCxtc/lhbZV4XvajKzZn95mx0M3kG20K AtTEhIDYJNkPB4rbHn0HbkG6MG8tnNHAoOhPrfySUlZnrb/WM6el/9FlBf1Hks/u g8MZuZM7ChRLLDGkQtRUDiFwCmQm7TPkj9N9OQ6pyr0sO6P0o2L0Ligv4jiaX2Qg O84/VIskJSY/tix3aLCOrqRIAK8JK1H6sShraDBpLRki0SJo7wtrml1SIYCmG2kV WsYgVqYWvXP+Z0imck9UFmA/LEgbMk3y3k8op8wxSUe3fRRuAHYW14Nsko7nLUoI NKyXw1/kcvqHfo2Bwg4BQFpjddLKGIDcBGulY8+1WyxdmckkKyWV7HL7TG+wIc1n TJc7LedsG4fEUdUuPpjzQNGT6IQQWlwn40bVnFH1hU5n9dQeiv9s73+rAENnWlCB SFtRbDd9i1l2D9qpyEQehLvEchKYOi2AeplDPO5AWum5vo9/LmuoiUgXLxHb1UO9 Qsg= =i9u8 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman Skakun) * tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} |
||
Linus Torvalds
|
05daae0fb0 |
Tracing fixes for 5.14-rc2:
- Fix deadloop in ring buffer because of using stale "read" variable - Fix synthetic event use of field_pos as boolean and not an index - Fixed histogram special var "cpu" overriding event fields called "cpu" - Cleaned up error prone logic in alloc_synth_event() - Removed call to synchronize_rcu_tasks_rude() when not needed - Removed redundant initialization of a local variable "ret" - Fixed kernel crash when updating tracepoint callbacks of different priorities. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYPrGTxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qusoAQDZkMo/vBFZgNGcL8GNCFpOF9HcV7QI JtBU+UG5GY2LagD/SEFEoG1o9UwKnIaBwr7qxGvrPgg8jKWtK/HEVFU94wk= =EVfM -----END PGP SIGNATURE----- Merge tag 'trace-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Fix deadloop in ring buffer because of using stale "read" variable - Fix synthetic event use of field_pos as boolean and not an index - Fixed histogram special var "cpu" overriding event fields called "cpu" - Cleaned up error prone logic in alloc_synth_event() - Removed call to synchronize_rcu_tasks_rude() when not needed - Removed redundant initialization of a local variable "ret" - Fixed kernel crash when updating tracepoint callbacks of different priorities. * tag 'trace-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracepoints: Update static_call before tp_funcs when adding a tracepoint ftrace: Remove redundant initialization of variable ret ftrace: Avoid synchronize_rcu_tasks_rude() call when not necessary tracing: Clean up alloc_synth_event() tracing/histogram: Rename "cpu" to "common_cpu" tracing: Synthetic event field_pos is an index not a boolean tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. |
||
Steven Rostedt (VMware)
|
352384d5c8 |
tracepoints: Update static_call before tp_funcs when adding a tracepoint
Because of the significant overhead that retpolines pose on indirect
calls, the tracepoint code was updated to use the new "static_calls" that
can modify the running code to directly call a function instead of using
an indirect caller, and this function can be changed at runtime.
In the tracepoint code that calls all the registered callbacks that are
attached to a tracepoint, the following is done:
it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
if (it_func_ptr) {
__data = (it_func_ptr)->data;
static_call(tp_func_##name)(__data, args);
}
If there's just a single callback, the static_call is updated to just call
that callback directly. Once another handler is added, then the static
caller is updated to call the iterator, that simply loops over all the
funcs in the array and calls each of the callbacks like the old method
using indirect calling.
The issue was discovered with a race between updating the funcs array and
updating the static_call. The funcs array was updated first and then the
static_call was updated. This is not an issue as long as the first element
in the old array is the same as the first element in the new array. But
that assumption is incorrect, because callbacks also have a priority
field, and if there's a callback added that has a higher priority than the
callback on the old array, then it will become the first callback in the
new array. This means that it is possible to call the old callback with
the new callback data element, which can cause a kernel panic.
static_call = callback1()
funcs[] = {callback1,data1};
callback2 has higher priority than callback1
CPU 1 CPU 2
----- -----
new_funcs = {callback2,data2},
{callback1,data1}
rcu_assign_pointer(tp->funcs, new_funcs);
/*
* Now tp->funcs has the new array
* but the static_call still calls callback1
*/
it_func_ptr = tp->funcs [ new_funcs ]
data = it_func_ptr->data [ data2 ]
static_call(callback1, data);
/* Now callback1 is called with
* callback2's data */
[ KERNEL PANIC ]
update_static_call(iterator);
To prevent this from happening, always switch the static_call to the
iterator before assigning the tp->funcs to the new array. The iterator will
always properly match the callback with its data.
To trigger this bug:
In one terminal:
while :; do hackbench 50; done
In another terminal
echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
while :; do
echo 1 > /sys/kernel/tracing/set_event_pid;
sleep 0.5
echo 0 > /sys/kernel/tracing/set_event_pid;
sleep 0.5
done
And it doesn't take long to crash. This is because the set_event_pid adds
a callback to the sched_waking tracepoint with a high priority, which will
be called before the sched_waking trace event callback is called.
Note, the removal to a single callback updates the array first, before
changing the static_call to single callback, which is the proper order as
the first element in the array is the same as what the static_call is
being changed to.
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
Cc: stable@vger.kernel.org
Fixes:
|
||
Colin Ian King
|
3b1a8f457f |
ftrace: Remove redundant initialization of variable ret
The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lkml.kernel.org/r/20210721120915.122278-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
Nicolas Saenz Julienne
|
68e83498cb |
ftrace: Avoid synchronize_rcu_tasks_rude() call when not necessary
synchronize_rcu_tasks_rude() triggers IPIs and forces rescheduling on all CPUs. It is a costly operation and, when targeting nohz_full CPUs, very disrupting (hence the name). So avoid calling it when 'old_hash' doesn't need to be freed. Link: https://lkml.kernel.org/r/20210721114726.1545103-1-nsaenzju@redhat.com Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
Steven Rostedt (VMware)
|
9528c19507 |
tracing: Clean up alloc_synth_event()
alloc_synth_event() currently has the following code to initialize the event fields and dynamic_fields: for (i = 0, j = 0; i < n_fields; i++) { event->fields[i] = fields[i]; if (fields[i]->is_dynamic) { event->dynamic_fields[j] = fields[i]; event->dynamic_fields[j]->field_pos = i; event->dynamic_fields[j++] = fields[i]; event->n_dynamic_fields++; } } 1) It would make more sense to have all fields keep track of their field_pos. 2) event->dynmaic_fields[j] is assigned twice for no reason. 3) We can move updating event->n_dynamic_fields outside the loop, and just assign it to j. This combination makes the code much cleaner. Link: https://lkml.kernel.org/r/20210721195341.29bb0f77@oasis.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
Steven Rostedt (VMware)
|
1e3bac71c5 |
tracing/histogram: Rename "cpu" to "common_cpu"
Currently the histogram logic allows the user to write "cpu" in as an
event field, and it will record the CPU that the event happened on.
The problem with this is that there's a lot of events that have "cpu"
as a real field, and using "cpu" as the CPU it ran on, makes it
impossible to run histograms on the "cpu" field of events.
For example, if I want to have a histogram on the count of the
workqueue_queue_work event on its cpu field, running:
># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
Gives a misleading and wrong result.
Change the command to "common_cpu" as no event should have "common_*"
fields as that's a reserved name for fields used by all events. And
this makes sense here as common_cpu would be a field used by all events.
Now we can even do:
># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
># cat events/workqueue/workqueue_queue_work/hist
# event histogram
#
# trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
#
{ common_cpu: 0, cpu: 2 } hitcount: 1
{ common_cpu: 0, cpu: 4 } hitcount: 1
{ common_cpu: 7, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 1 } hitcount: 1
{ common_cpu: 0, cpu: 6 } hitcount: 2
{ common_cpu: 0, cpu: 5 } hitcount: 2
{ common_cpu: 1, cpu: 1 } hitcount: 4
{ common_cpu: 6, cpu: 6 } hitcount: 4
{ common_cpu: 5, cpu: 5 } hitcount: 14
{ common_cpu: 4, cpu: 4 } hitcount: 26
{ common_cpu: 0, cpu: 0 } hitcount: 39
{ common_cpu: 2, cpu: 2 } hitcount: 184
Now for backward compatibility, I added a trick. If "cpu" is used, and
the field is not found, it will fall back to "common_cpu" and work as
it did before. This way, it will still work for old programs that use
"cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
will get that event's "cpu" field, which is probably what it wants
anyway.
I updated the tracefs/README to include documentation about both the
common_timestamp and the common_cpu. This way, if that text is present in
the README, then an application can know that common_cpu is supported over
just plain "cpu".
Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
Steven Rostedt (VMware)
|
3b13911a2f |
tracing: Synthetic event field_pos is an index not a boolean
Performing the following:
># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs' > events/sched/sched_waking/trigger
># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1:onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,prev_comm)'\
> events/sched/sched_switch/trigger
># echo 1 > events/synthetic/enable
Crashed the kernel:
BUG: kernel NULL pointer dereference, address: 000000000000001b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.13.0-rc5-test+ #104
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:strlen+0x0/0x20
Code: f6 82 80 2b 0b bc 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2b 0b bc
20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10
48 89 f8 48 83 c0 01 80 38 9 f8 c3 31
RSP: 0018:ffffaa75000d79d0 EFLAGS: 00010046
RAX: 0000000000000002 RBX: ffff9cdb55575270 RCX: 0000000000000000
RDX: ffff9cdb58c7a320 RSI: ffffaa75000d7b40 RDI: 000000000000001b
RBP: ffffaa75000d7b40 R08: ffff9cdb40a4f010 R09: ffffaa75000d7ab8
R10: ffff9cdb4398c700 R11: 0000000000000008 R12: ffff9cdb58c7a320
R13: ffff9cdb55575270 R14: ffff9cdb58c7a000 R15: 0000000000000018
FS: 0000000000000000(0000) GS:ffff9cdb5aa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001b CR3: 00000000c0612006 CR4: 00000000001706e0
Call Trace:
trace_event_raw_event_synth+0x90/0x1d0
action_trace+0x5b/0x70
event_hist_trigger+0x4bd/0x4e0
? cpumask_next_and+0x20/0x30
? update_sd_lb_stats.constprop.0+0xf6/0x840
? __lock_acquire.constprop.0+0x125/0x550
? find_held_lock+0x32/0x90
? sched_clock_cpu+0xe/0xd0
? lock_release+0x155/0x440
? update_load_avg+0x8c/0x6f0
? enqueue_entity+0x18a/0x920
? __rb_reserve_next+0xe5/0x460
? ring_buffer_lock_reserve+0x12a/0x3f0
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x1ae/0x240
trace_event_raw_event_sched_switch+0x114/0x170
__traceiter_sched_switch+0x39/0x50
__schedule+0x431/0xb00
schedule_idle+0x28/0x40
do_idle+0x198/0x2e0
cpu_startup_entry+0x19/0x20
secondary_startup_64_no_verify+0xc2/0xcb
The reason is that the dynamic events array keeps track of the field
position of the fields array, via the field_pos variable in the
synth_field structure. Unfortunately, that field is a boolean for some
reason, which means any field_pos greater than 1 will be a bug (in this
case it was 2).
Link: https://lkml.kernel.org/r/20210721191008.638bce34@oasis.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
Linus Torvalds
|
4784dc99c7 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller: 1) Fix type of bind option flag in af_xdp, from Baruch Siach. 2) Fix use after free in bpf_xdp_link_release(), from Xuan Zhao. 3) PM refcnt imbakance in r8152, from Takashi Iwai. 4) Sign extension ug in liquidio, from Colin Ian King. 5) Mising range check in s390 bpf jit, from Colin Ian King. 6) Uninit value in caif_seqpkt_sendmsg(), from Ziyong Xuan. 7) Fix skb page recycling race, from Ilias Apalodimas. 8) Fix memory leak in tcindex_partial_destroy_work, from Pave Skripkin. 9) netrom timer sk refcnt issues, from Nguyen Dinh Phi. 10) Fix data races aroun tcp's tfo_active_disable_stamp, from Eric Dumazet. 11) act_skbmod should only operate on ethernet packets, from Peilin Ye. 12) Fix slab out-of-bpunds in fib6_nh_flush_exceptions(),, from Psolo Abeni. 13) Fix sparx5 dependencies, from Yajun Deng. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits) dpaa2-switch: seed the buffer pool after allocating the swp net: sched: cls_api: Fix the the wrong parameter net: sparx5: fix unmet dependencies warning net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum net: dsa: ensure linearized SKBs in case of tail taggers ravb: Remove extra TAB ravb: Fix a typo in comment net: dsa: sja1105: make VID 4095 a bridge VLAN too tcp: disable TFO blackhole logic by default sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set net: ixp46x: fix ptp build failure ibmvnic: Remove the proper scrq flush selftests: net: add ESP-in-UDP PMTU test udp: check encap socket in __udp_lib_err sctp: update active_key for asoc when old key is being replaced r8169: Avoid duplicate sysfs entry creation error ixgbe: Fix packet corruption due to missing DMA sync Revert "qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()" ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions fsl/fman: Add fibre support ... |
||
Haoran Luo
|
67f0d6d988 |
tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
The "rb_per_cpu_empty()" misinterpret the condition (as not-empty) when "head_page" and "commit_page" of "struct ring_buffer_per_cpu" points to the same buffer page, whose "buffer_data_page" is empty and "read" field is non-zero. An error scenario could be constructed as followed (kernel perspective): 1. All pages in the buffer has been accessed by reader(s) so that all of them will have non-zero "read" field. 2. Read and clear all buffer pages so that "rb_num_of_entries()" will return 0 rendering there's no more data to read. It is also required that the "read_page", "commit_page" and "tail_page" points to the same page, while "head_page" is the next page of them. 3. Invoke "ring_buffer_lock_reserve()" with large enough "length" so that it shot pass the end of current tail buffer page. Now the "head_page", "commit_page" and "tail_page" points to the same page. 4. Discard current event with "ring_buffer_discard_commit()", so that "head_page", "commit_page" and "tail_page" points to a page whose buffer data page is now empty. When the error scenario has been constructed, "tracing_read_pipe" will be trapped inside a deadloop: "trace_empty()" returns 0 since "rb_per_cpu_empty()" returns 0 when it hits the CPU containing such constructed ring buffer. Then "trace_find_next_entry_inc()" always return NULL since "rb_num_of_entries()" reports there's no more entry to read. Finally "trace_seq_to_user()" returns "-EBUSY" spanking "tracing_read_pipe" back to the start of the "waitagain" loop. I've also written a proof-of-concept script to construct the scenario and trigger the bug automatically, you can use it to trace and validate my reasoning above: https://github.com/aegistudio/RingBufferDetonator.git Tests has been carried out on linux kernel 5.14-rc2 ( |
||
Yang Yingliang
|
b42b0bddcb |
workqueue: fix UAF in pwq_unbound_release_workfn()
I got a UAF report when doing fuzz test:
[ 152.880091][ T8030] ==================================================================
[ 152.881240][ T8030] BUG: KASAN: use-after-free in pwq_unbound_release_workfn+0x50/0x190
[ 152.882442][ T8030] Read of size 4 at addr ffff88810d31bd00 by task kworker/3:2/8030
[ 152.883578][ T8030]
[ 152.883932][ T8030] CPU: 3 PID: 8030 Comm: kworker/3:2 Not tainted 5.13.0+ #249
[ 152.885014][ T8030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 152.886442][ T8030] Workqueue: events pwq_unbound_release_workfn
[ 152.887358][ T8030] Call Trace:
[ 152.887837][ T8030] dump_stack_lvl+0x75/0x9b
[ 152.888525][ T8030] ? pwq_unbound_release_workfn+0x50/0x190
[ 152.889371][ T8030] print_address_description.constprop.10+0x48/0x70
[ 152.890326][ T8030] ? pwq_unbound_release_workfn+0x50/0x190
[ 152.891163][ T8030] ? pwq_unbound_release_workfn+0x50/0x190
[ 152.891999][ T8030] kasan_report.cold.15+0x82/0xdb
[ 152.892740][ T8030] ? pwq_unbound_release_workfn+0x50/0x190
[ 152.893594][ T8030] __asan_load4+0x69/0x90
[ 152.894243][ T8030] pwq_unbound_release_workfn+0x50/0x190
[ 152.895057][ T8030] process_one_work+0x47b/0x890
[ 152.895778][ T8030] worker_thread+0x5c/0x790
[ 152.896439][ T8030] ? process_one_work+0x890/0x890
[ 152.897163][ T8030] kthread+0x223/0x250
[ 152.897747][ T8030] ? set_kthread_struct+0xb0/0xb0
[ 152.898471][ T8030] ret_from_fork+0x1f/0x30
[ 152.899114][ T8030]
[ 152.899446][ T8030] Allocated by task 8884:
[ 152.900084][ T8030] kasan_save_stack+0x21/0x50
[ 152.900769][ T8030] __kasan_kmalloc+0x88/0xb0
[ 152.901416][ T8030] __kmalloc+0x29c/0x460
[ 152.902014][ T8030] alloc_workqueue+0x111/0x8e0
[ 152.902690][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0
[ 152.903459][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0
[ 152.904198][ T8030] scrub_workers_get+0x1e8/0x490
[ 152.904929][ T8030] btrfs_scrub_dev+0x1b9/0x9c0
[ 152.905599][ T8030] btrfs_ioctl+0x122c/0x4e50
[ 152.906247][ T8030] __x64_sys_ioctl+0x137/0x190
[ 152.906916][ T8030] do_syscall_64+0x34/0xb0
[ 152.907535][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 152.908365][ T8030]
[ 152.908688][ T8030] Freed by task 8884:
[ 152.909243][ T8030] kasan_save_stack+0x21/0x50
[ 152.909893][ T8030] kasan_set_track+0x20/0x30
[ 152.910541][ T8030] kasan_set_free_info+0x24/0x40
[ 152.911265][ T8030] __kasan_slab_free+0xf7/0x140
[ 152.911964][ T8030] kfree+0x9e/0x3d0
[ 152.912501][ T8030] alloc_workqueue+0x7d7/0x8e0
[ 152.913182][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0
[ 152.913949][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0
[ 152.914703][ T8030] scrub_workers_get+0x1e8/0x490
[ 152.915402][ T8030] btrfs_scrub_dev+0x1b9/0x9c0
[ 152.916077][ T8030] btrfs_ioctl+0x122c/0x4e50
[ 152.916729][ T8030] __x64_sys_ioctl+0x137/0x190
[ 152.917414][ T8030] do_syscall_64+0x34/0xb0
[ 152.918034][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 152.918872][ T8030]
[ 152.919203][ T8030] The buggy address belongs to the object at ffff88810d31bc00
[ 152.919203][ T8030] which belongs to the cache kmalloc-512 of size 512
[ 152.921155][ T8030] The buggy address is located 256 bytes inside of
[ 152.921155][ T8030] 512-byte region [ffff88810d31bc00, ffff88810d31be00)
[ 152.922993][ T8030] The buggy address belongs to the page:
[ 152.923800][ T8030] page:ffffea000434c600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10d318
[ 152.925249][ T8030] head:ffffea000434c600 order:2 compound_mapcount:0 compound_pincount:0
[ 152.926399][ T8030] flags: 0x57ff00000010200(slab|head|node=1|zone=2|lastcpupid=0x7ff)
[ 152.927515][ T8030] raw: 057ff00000010200 dead000000000100 dead000000000122 ffff888009c42c80
[ 152.928716][ T8030] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
[ 152.929890][ T8030] page dumped because: kasan: bad access detected
[ 152.930759][ T8030]
[ 152.931076][ T8030] Memory state around the buggy address:
[ 152.931851][ T8030] ffff88810d31bc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 152.932967][ T8030] ffff88810d31bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 152.934068][ T8030] >ffff88810d31bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 152.935189][ T8030] ^
[ 152.935763][ T8030] ffff88810d31bd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 152.936847][ T8030] ffff88810d31be00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 152.937940][ T8030] ==================================================================
If apply_wqattrs_prepare() fails in alloc_workqueue(), it will call put_pwq()
which invoke a work queue to call pwq_unbound_release_workfn() and use the 'wq'.
The 'wq' allocated in alloc_workqueue() will be freed in error path when
apply_wqattrs_prepare() fails. So it will lead a UAF.
CPU0 CPU1
alloc_workqueue()
alloc_and_link_pwqs()
apply_wqattrs_prepare() fails
apply_wqattrs_cleanup()
schedule_work(&pwq->unbound_release_work)
kfree(wq)
worker_thread()
pwq_unbound_release_workfn() <- trigger uaf here
If apply_wqattrs_prepare() fails, the new pwq are not linked, it doesn't
hold any reference to the 'wq', 'wq' is invalid to access in the worker,
so add check pwq if linked to fix this.
Fixes:
|
||
Paul Gortmaker
|
1e7107c5ef |
cgroup1: fix leaked context root causing sporadic NULL deref in LTP
Richard reported sporadic (roughly one in 10 or so) null dereferences and other strange behaviour for a set of automated LTP tests. Things like: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_sop_show_path+0x1b/0x60 ...or these others: RIP: 0010:do_mkdirat+0x6a/0xf0 RIP: 0010:d_alloc_parallel+0x98/0x510 RIP: 0010:do_readlinkat+0x86/0x120 There were other less common instances of some kind of a general scribble but the common theme was mount and cgroup and a dubious dentry triggering the NULL dereference. I was only able to reproduce it under qemu by replicating Richard's setup as closely as possible - I never did get it to happen on bare metal, even while keeping everything else the same. In commit |
||
Thomas Gleixner
|
ff5a6a3550 |
Merge branch 'timers/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent
Pull dyntick fixes from Frederic Weisbecker: - Fix a rearm race in the posix cpu timer code - Handle get_next_timer_interrupt() correctly when no timers are pending Link: https://lore.kernel.org/r/20210715104218.81276-1-frederic@kernel.org |
||
Linus Torvalds
|
3fdacf402b |
tracing: Fix the histogram logic from possibly crashing the kernel
Working on the histogram code, I found that if you dereference a char pointer in a trace event that happens to point to user space, it can crash the kernel, as it does no checks of that pointer. I have code coming that will do this better, so just remove this ability to treat character pointers in trace events as stings in the histogram. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYPH9FRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qsyhAQDKiQzVJtjfsNbIWliDQOaUwJMO9tNl Qu5TUDmPbAA4fwD+MgYsnITPL+o/YcKQ+aMdj/wLLMKfIjhNkFY8wqdLvwg= =CN97 -----END PGP SIGNATURE----- Merge tag 'trace-v5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix the histogram logic from possibly crashing the kernel Working on the histogram code, I found that if you dereference a char pointer in a trace event that happens to point to user space, it can crash the kernel, as it does no checks of that pointer. I have code coming that will do this better, so just remove this ability to treat character pointers in trace events as stings in the histogram" * tag 'trace-v5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not reference char * as a string in histograms |
||
Linus Torvalds
|
6e442d0662 |
Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fixes from Paul McKenney: - fix regressions induced by a merge-window change in scheduler semantics, which means that smp_processor_id() can no longer be used in kthreads using simple affinity to bind themselves to a specific CPU. - fix a bug in Tasks Trace RCU that was thought to be strictly theoretical. However, production workloads have started hitting this, so these fixes need to be merged sooner rather than later. - fix a minor printk()-format-mismatch issue introduced during the merge window. * 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix pr_info() formats and values in show_rcu_gp_kthreads() rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader() rcu-tasks: Don't delete holdouts within trc_inspect_reader() refscale: Avoid false-positive warnings in ref_scale_reader() scftorture: Avoid false-positive warnings in scftorture_invoker() |
||
xuyehan
|
d4e5076c35 |
locking/rwsem: Remove an unused parameter of rwsem_wake()
The 2nd parameter 'count' is not used in this function. The places where the function is called are also modified. Signed-off-by: xuyehan <xuyehan@xiaomi.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/1625547043-28103-1-git-send-email-yehanxu1@gmail.com |
||
Marco Elver
|
b068fc04de |
perf: Refactor permissions check into perf_check_permission()
Refactor the permission check in perf_event_open() into a helper perf_check_permission(). This makes the permission check logic more readable (because we no longer have a negated disjunction). Add a comment mentioning the ptrace check also checks the uid. No functional change intended. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20210705084453.2151729-2-elver@google.com |
||
Marco Elver
|
9d7a6c95f6 |
perf: Fix required permissions if sigtrap is requested
If perf_event_open() is called with another task as target and
perf_event_attr::sigtrap is set, and the target task's user does not
match the calling user, also require the CAP_KILL capability or
PTRACE_MODE_ATTACH permissions.
Otherwise, with the CAP_PERFMON capability alone it would be possible
for a user to send SIGTRAP signals via perf events to another user's
tasks. This could potentially result in those tasks being terminated if
they cannot handle SIGTRAP signals.
Note: The check complements the existing capability check, but is not
supposed to supersede the ptrace_may_access() check. At a high level we
now have:
capable of CAP_PERFMON and (CAP_KILL if sigtrap)
OR
ptrace_may_access(...) // also checks for same thread-group and uid
Fixes:
|
||
Daniel Borkmann
|
e042aa532c |
bpf: Fix pointer arithmetic mask tightening under state pruning
In |
||
Daniel Borkmann
|
59089a189e |
bpf: Remove superfluous aux sanitation on subprog rejection
Follow-up to
|
||
Roman Skakun
|
40ac971eab |
dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
xen-swiotlb can use vmalloc backed addresses for dma coherent allocations
and uses the common helpers. Properly handle them to unbreak Xen on
ARM platforms.
Fixes:
|
||
David S. Miller
|
20192d9c9f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Andrii Nakryiko says: ==================== pull-request: bpf 2021-07-15 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 5 day(s) which contain a total of 9 files changed, 37 insertions(+), 15 deletions(-). The main changes are: 1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and BPF_XDP_CPUMAP programs, from Xuan Zhuo. 2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo. 3) Follow-up fix to subprog poke descriptor use-after-free problem, from Daniel Borkmann and John Fastabend. 4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King. 5) Fix memory leak in BPF sockmap, from John Fastabend. 6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend and Jakub Sitnicki. 7) Fix NULL pointer dereference in bpftool, from Tobias Klauser. 8) AF_XDP documentation fixes, from Baruch Siach. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Steven Rostedt (VMware)
|
704adfb5a9 |
tracing: Do not reference char * as a string in histograms
The histogram logic was allowing events with char * pointers to be used as normal strings. But it was easy to crash the kernel with: # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger And open some files, and boom! BUG: unable to handle page fault for address: 00007f2ced0c3280 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280 FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0 Call Trace: event_hist_trigger+0x463/0x5f0 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? kernel_init_free_pages+0x6d/0x90 ? preempt_count_sub+0x9b/0xd0 ? kernel_init_free_pages+0x6d/0x90 ? get_page_from_freelist+0x12c4/0x1680 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 ftrace_syscall_enter+0x264/0x2c0 syscall_trace_enter.constprop.0+0x1ee/0x210 do_syscall_64+0x1c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Where it triggered a fault on strlen(key) where key was the filename. The reason is that filename is a char * to user space, and the histogram code just blindly dereferenced it, with obvious bad results. I originally tried to use strncpy_from_user/kernel_nofault() but found that there's other places that its dereferenced and not worth the effort. Just do not allow "char *" to act like strings. Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Tom Zanussi <zanussi@kernel.org> Fixes: |
||
Linus Torvalds
|
e9338abf0e |
fallthrough fixes for Clang for 5.14-rc2
Hi Linus, Please, pull the following patches that fix many fall-through warnings when building with Clang and -Wimplicit-fallthrough. This pull-request also contains the patch for Makefile that enables -Wimplicit-fallthrough for Clang, globally. It's also important to notice that since we have adopted the use of the pseudo-keyword macro fallthrough; we also want to avoid having more /* fall through */ comments being introduced. Notice that contrary to GCC, Clang doesn't recognize any comments as implicit fall-through markings when the -Wimplicit-fallthrough option is enabled. So, in order to avoid having more comments being introduced, we have to use the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will cause a warning in case a code comment is intended to be used as a fall-through marking. The patch for Makefile also enforces this. We had almost 4,000 of these issues for Clang in the beginning, and there might be a couple more out there when building some architectures with certain configurations. However, with the recent fixes I think we are in good shape and it is now possible to enable -Wimplicit-fallthrough for Clang. :) Thanks! -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmDvPV0ACgkQRwW0y0cG 2zGJ1xAAsbSN8I+ESxFCKi4UwQC71988isb0rHGL6rQPBbEYD2bbI+82aGu6WH8z 2ZVQZa5x3tUoIsqR2GMK5Npn1dFvuNWGZkYzlgjAp+FgzaULCLUc4NW9DEWU/Kbd UBz8ilXL7tUAlDalUYjXAVVTxMnRTZ+BDCbEmIdRZ5X/pMlD2xHeMGG4kh8/cHjJ +mqFwCFPZoyLJzPBnm37OEfIr8JxrLZAo1gfmz5sxfmBdYDY3hjV/BVgP8h0bkY9 ITaq0LMEAPMXvU+4DP3DxuqzRr9COYMcddmbaWwsXPd7UAuoXwGLvqx7JE0QqY3g J80w4/hXqEa4o4/fUAsfjYEQoNTEnhl0iGskBJD2xy5Lj17/m4aOSL44ibr2Ag67 yfJPWi9tooYELEGNobPVHPuqk4ts6amKTOKqHWMBEcTEOIGzaGWPEjRoyaMivfZ3 G3ZbEPEYERcJRizm63UbciLeslGsxb6tdYQEy5VI8Wa7caz9Ci9HT+jjS7Vm2fuq 1zg+vIgwuSxvwxrSV/PDTRcQm9EM/MoRL0D4jbr7gtYD6KNwH8Vo6M21CoaB9gH3 FgMiCT4zy9KiONjVDJKkjtzuk+9OwpI5AAg0/fAQ4Ak9TzUN3Xfg7HpytETCIJ0Z Ii3Jqwe+3IRTNq94ZTIiX/0hT7i5GsPxUfPzI2vwttCJn9ctUcM= =XVNN -----END PGP SIGNATURE----- Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "This fixes many fall-through warnings when building with Clang and -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for Clang, globally. It's also important to notice that since we have adopted the use of the pseudo-keyword macro fallthrough, we also want to avoid having more /* fall through */ comments being introduced. Contrary to GCC, Clang doesn't recognize any comments as implicit fall-through markings when the -Wimplicit-fallthrough option is enabled. So, in order to avoid having more comments being introduced, we use the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will cause a warning in case a code comment is intended to be used as a fall-through marking. The patch for Makefile also enforces this. We had almost 4,000 of these issues for Clang in the beginning, and there might be a couple more out there when building some architectures with certain configurations. However, with the recent fixes I think we are in good shape and it is now possible to enable the warning for Clang" * tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) Makefile: Enable -Wimplicit-fallthrough for Clang powerpc/smp: Fix fall-through warning for Clang dmaengine: mpc512x: Fix fall-through warning for Clang usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang powerpc/powernv: Fix fall-through warning for Clang MIPS: Fix unreachable code issue MIPS: Fix fall-through warnings for Clang ASoC: Mediatek: MT8183: Fix fall-through warning for Clang power: supply: Fix fall-through warnings for Clang dmaengine: ti: k3-udma: Fix fall-through warning for Clang s390: Fix fall-through warnings for Clang dmaengine: ipu: Fix fall-through warning for Clang iommu/arm-smmu-v3: Fix fall-through warning for Clang mmc: jz4740: Fix fall-through warning for Clang PCI: Fix fall-through warning for Clang scsi: libsas: Fix fall-through warning for Clang video: fbdev: Fix fall-through warning for Clang math-emu: Fix fall-through warning cpufreq: Fix fall-through warning for Clang drm/msm: Fix fall-through warning in msm_gem_new_impl() ... |
||
Nicolas Saenz Julienne
|
aebacb7f6c |
timers: Fix get_next_timer_interrupt() with no timers pending
|
||
Frederic Weisbecker
|
1a3402d93c |
posix-cpu-timers: Fix rearm racing against process tick
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:
CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)
handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)
tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A
Fix this with simply locking process wide cputime counting start and
timer arm in the same block.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Fixes:
|
||
Linus Torvalds
|
8096acd744 |
Networking fixes for 5.14-rc2, including fixes from bpf and netfilter.
Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state - netfilter: conntrack: do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0 vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw 4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs= =QFnb -----END PGP SIGNATURE----- Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ... |
||
Christian Brauner
|
d1d488d813 |
fs: add vfs_parse_fs_param_source() helper
Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks. Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Christian Brauner
|
3b0462726e |
cgroup: verify that source is a string
The following sequence can be used to trigger a UAF:
int fscontext_fd = fsopen("cgroup");
int fd_null = open("/dev/null, O_RDONLY);
int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null);
close_range(3, ~0U, 0);
The cgroup v1 specific fs parser expects a string for the "source"
parameter. However, it is perfectly legitimate to e.g. specify a file
descriptor for the "source" parameter. The fs parser doesn't know what
a filesystem allows there. So it's a bug to assume that "source" is
always of type fs_value_is_string when it can reasonably also be
fs_value_is_file.
This assumption in the cgroup code causes a UAF because struct
fs_parameter uses a union for the actual value. Access to that union is
guarded by the param->type member. Since the cgroup paramter parser
didn't check param->type but unconditionally moved param->string into
fc->source a close on the fscontext_fd would trigger a UAF during
put_fs_context() which frees fc->source thereby freeing the file stashed
in param->file causing a UAF during a close of the fd_null.
Fix this by verifying that param->type is actually a string and report
an error if not.
In follow up patches I'll add a new generic helper that can be used here
and by other filesystems instead of this error-prone copy-pasta fix.
But fixing it in here first makes backporting a it to stable a lot
easier.
Fixes:
|
||
Daniel Borkmann
|
5dd0a6b858 |
bpf: Fix tail_call_reachable rejection for interpreter when jit failed
During testing of |
||
Ingo Molnar
|
e9ba16e68c |
smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
While this function is a static inline, and is only used once in local scope, certain Kconfig variations may cause it to be compiled as a standalone function: 89231bf0 <idle_init>: 89231bf0: 83 05 60 d9 45 89 01 addl $0x1,0x8945d960 89231bf7: 55 push %ebp Resulting in this build failure: WARNING: modpost: vmlinux.o(.text.unlikely+0x7fd5): Section mismatch in reference from the function idle_init() to the function .init.text:fork_idle() The function idle_init() references the function __init fork_idle(). This is often because idle_init lacks a __init annotation or the annotation of fork_idle is wrong. ERROR: modpost: Section mismatches detected. Certain USBSAN options x86-32 builds with CONFIG_CC_OPTIMIZE_FOR_SIZE=y seem to be causing this. So mark idle_init() as __always_inline to work around this compiler bug/feature. Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Gustavo A. R. Silva
|
1adee589cd |
kernel: debug: Fix unreachable code in gdb_serial_stub()
Fix the following warning: kernel/debug/gdbstub.c:1049:4: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] fallthrough; ^ include/linux/compiler_attributes.h:210:41: note: expanded from macro 'fallthrough' # define fallthrough __attribute__((__fallthrough__) by placing the fallthrough; statement inside ifdeffery. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> |
||
Linus Torvalds
|
98f7fdced2 |
Two fixes:
- Fix a MIPS IRQ handling RCU bug - Remove a DocBook annotation for a parameter that doesn't exist anymore Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDq9IcRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gOnA//U6VcTbivf1QG05Omof+378iFZWWSdDHP /iF12DrtcalqCH73o4iuiEg3W8PdISqvLh5QI9G1WvKNXOmX/RRoFPpd1W24uM/D DM9m3hEjjEX3AmRZoQ7yrA2Jzd435DKQOHjzClV/5ykKiyBw533twHm7L9sI6WCl fsP5yiCMXUP2wS2Go/wWgRjrIIb5iQezXGGxeKhP4zW/lam86ntPcdm8I6EuRD7r bT4TXmJkjk98FL+A6QSk8cP+ZlzauHkNa0HYiSz1pvOnwPDYs8IXacCEeOwog3Wr hH0IXbX2NXJIUPDQc/BvWzOMxRNBgMZ1ixnj8B2kaDfIM5YfaGAV3/GuogFCUSns 1u9E+LkLohvE61w7m49PjnYIlWSkuQFrMlJL+69O2wiqmbFX9aQa/T9i7Mq7bmsv Q5qTe935QRVcraUPQ5h0uJnKmBL8xteHpQSSrF5ppsC8cxEDX/CC/GsqAn0xG15Q VZ91ocXr58zCXDpLbesULY7LbGyB6c1Kt55WYxhp/IOtRhEpVzD5Y9ukXYdckIpq 25TMFsPyTUTaS3r8nD8G0QeW3CLVj8Lzi/r0P3oa8GvgyqxlX7jzXI8aGPRb7QIi JtX6iY2nH0DZfp600VMJNUS67yMDuDXJ2Me5WY3XkvpYuFtPGRm41HOs3dEV+Ay1 nxBMxUC/6Ug= =xghF -----END PGP SIGNATURE----- Merge tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Two fixes: - Fix a MIPS IRQ handling RCU bug - Remove a DocBook annotation for a parameter that doesn't exist anymore" * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry genirq/irqdesc: Drop excess kernel-doc entry @lookup |
||
Linus Torvalds
|
877029d921 |
Three fixes:
- Fix load tracking bug/inconsistency - Fix a sporadic CFS bandwidth constraints enforcement bug - Fix a uclamp utilization tracking bug for newly woken tasks Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDq8scRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1i1nQ/+PsOimY0+2kW2EUp/a1a8h1ZHfyG1RjDo Kl/CCUDdEP0OuWC/cdnvYgo3hNPTVCpZvsIoR/t7CFinuR/ubfsFtLZ5juOzjk5v 2RjmSyOr0DbhUcO3GFKo+ABlrmiiJ9de89+oal+W/t/Zks/xqVHQ+f7HqSGvSd1f Lhem4U9UakbO3yjT/+VwSvNZgP8trtPQ6rrKw+yrwrxfjSo4D0Y+/3u/HCaYthIW 5i1+uBEXGnZaU7QhDfqqzbGcAKLRA+i2vBmNfbOyUeCcyKTsKlLwX9L1DlgNPLRP XvxVJWJcxwTsLbwVG1F3TvWw93iSLi34jPO//2ZnNppEhA4fjxmLSYV3uIsm8PUY /YmDdZ6fTW7ZIO/nhfcf3nS8Sp0UlfHXL9dV3mn2EzeMLGOKZY6vgAKdvEd+Fj+y J+VB01MgmVzGvFr9o1/ez3vWyk03CLDQuYMUo0yVcqAi4OLaArAz5vxXR/cF4PsB r69CCdSinMj2finaR39Eq0431Tpv71NDDfyqjVJEOk88Weszu6IACIOJCvpy0ZLQ LOA5kl2I8/mYEevnXgg9NPX8XO2iUFS1cVVNsRHUe4zqQZPPoBD6Oppb+kmfUQUe gABCZK217nkqFH4GdC9RCtRdnb4HO+6H15cLlDHjilECgGOPeJ8CPaK3pRvzv0g3 N2m4KcFI2j0= =FBcX -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes: - Fix load tracking bug/inconsistency - Fix a sporadic CFS bandwidth constraints enforcement bug - Fix a uclamp utilization tracking bug for newly woken tasks" * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/uclamp: Ignore max aggregation if rq is idle sched/fair: Fix CFS bandwidth hrtimer expiry type sched/fair: Sync load_sum with load_avg after dequeue |
||
Linus Torvalds
|
301c8b1d7c |
Locking fixes:
- Fix a Sparc crash - Fix a number of objtool warnings - Fix /proc/lockdep output on certain configs - Restore a kprobes fail-safe Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDq8DMRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jZcA//aYhW8gm3rjtXeRme6H5vLF3fehxw9xoC g6RTAHStHd9xJyctsFYR7Fx7o1l2G05jf5tv4MWoAYMtnjz6OKfPQu7b8eTD3Z+3 n0AAfsrrVaK4f8AgGZ+bj4kw/BCJL0Xx8HyRXjDWODVZVY+yUEo2c5vsw02inQeW 3AQ1m4ZhQBYvl7r4pD0oi6BrL0ruvC0NN5kRYuh1Ib4I8GtF1h9ACPFICxsV6Glx 4SKqzsvaQbV+9EiiLpKqEpi/EJqMmAE5sr4EUnQxWsMeuOKavzETck1ZxWTO5iIh gXI2yTuLS6++yBPCQer/8eePsP3bAiQeNJ+71xpfFdmwx9osA7DFe3aV3f5Ug+Bq f4yswcw1Y1jZhvNp3AV9kE+h2mrSUEWGKAj9LCIV6VqNfOeKKrAyrxSfLRYiB1Ek M9+ML+lN3M2c4n5P7qxx1ZUOZ1It19Nx6HNEeTPkfKhlI+57hpmvPvKIjqZQRdAD oE9exVRssFxDQLIHWoshoDQ7JVR7fsqn7I6ExejnAIpl6veFAAQ457gOHmFyi+jo aLeCTAie0hA18TrMqWtp/ftnpTTJvRJKtHPQXIYmqEkp8S85ryd7Co/9sMRHDS8e XhQRFPSfp4MHqucmoyUIlbRkv16f/0RsC0gv10U0T/WUkjQGMBL5/dvZLpJILtDm DOmYxoe0UP8= =WvwL -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Fix a Sparc crash - Fix a number of objtool warnings - Fix /proc/lockdep output on certain configs - Restore a kprobes fail-safe * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: sparc: Fix arch_cmpxchg64_local() kprobe/static_call: Restore missing static_call_text_reserved() static_call: Fix static_call_text_reserved() vs __init jump_label: Fix jump_label_text_reserved() vs __init locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING |
||
Linus Torvalds
|
81361b837a |
Kbuild updates for v5.14
- Increase the -falign-functions alignment for the debug option. - Remove ugly libelf checks from the top Makefile. - Make the silent build (-s) more silent. - Re-compile the kernel if KBUILD_BUILD_TIMESTAMP is specified. - Various script cleanups -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmDon90VHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGWFUP/RGNwlGD/YV1xg0ZmM0/ynBzzOy2 3dcr3etJZpipQDeqnHy3jt0esgMVlbkTdrHvP+2hpNaeXFwjF1fDHjhur9m8ZkVD efOA6nugOnNwhy2G3BvtCJv+Vhb+KZ0nNLB27z3Bl0LGP6LJdMRNAxFBJMv4k3aR F3sABugwCpnT2/YtuprxRl2/3/CyLur5NjY24FD+ugON3JIWfl6ETbHeFmxr1JE4 mE+zaN5AwYuSuH9LpdRy85XVCcW/FFqP/DwOFllVvCCCNvvS0KWYSNHWfEsKdR75 hmAAaS/rpi2eaL0vp88sNhAtYnhMSf+uFu0fyfYeWZuJqMt4Xz5xZKAzDsifCdif aQ6UEPDjiKABh9gpX26BMd2CXzkGR+L4qZ7iBPfO586Iy7opajrFX9kIj5U7ZtCl wsPat/9+18xpVJOTe0sss3idId7Ft4cRoW5FQMEAW2EWJ9fXAG1yDxEREj1V5gFx sMXtpmCoQag968qjfARvP08s3MB1P4Ij6tXcioGqHuEWeJLxOMK/KWyafQUg611d 0kSWNO0OMo+odBj6j/vM+MIIaPhgwtZnPgw2q4uHGMcemzQxaEvGW+G/5a5qEpTv SKm8W24wXplNot4tuTGWq5/jANRJcMvVsyC48DYT81OZEOWrIc0kDV4v4qZToTxW 97jn1NKa2H6L0J1V =Za8V -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Increase the -falign-functions alignment for the debug option. - Remove ugly libelf checks from the top Makefile. - Make the silent build (-s) more silent. - Re-compile the kernel if KBUILD_BUILD_TIMESTAMP is specified. - Various script cleanups * tag 'kbuild-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (27 commits) scripts: add generic syscallnr.sh scripts: check duplicated syscall number in syscall table sparc: syscalls: use pattern rules to generate syscall headers parisc: syscalls: use pattern rules to generate syscall headers nds32: add arch/nds32/boot/.gitignore kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set kbuild: modpost: Explicitly warn about unprototyped symbols kbuild: remove trailing slashes from $(KBUILD_EXTMOD) kconfig.h: explain IS_MODULE(), IS_ENABLED() kconfig: constify long_opts scripts/setlocalversion: simplify the short version part scripts/setlocalversion: factor out 12-chars hash construction scripts/setlocalversion: add more comments to -dirty flag detection scripts/setlocalversion: remove workaround for old make-kpkg scripts/setlocalversion: remove mercurial, svn and git-svn supports kbuild: clean up ${quiet} checks in shell scripts kbuild: sink stdout from cmd for silent build init: use $(call cmd,) for generating include/generated/compile.h kbuild: merge scripts/mkmakefile to top Makefile sh: move core-y in arch/sh/Makefile to arch/sh/Kbuild ... |
||
Linus Torvalds
|
5a7f7fc5dd |
Tracing fix for histograms and a clean up in ftrace
- Fixed a bug that broke the .sym-offset modifier and added a test to make sure nothing breaks it again. - Replace a list_del/list_add() with a list_move() -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYOhMlxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6quYRAQCbciaL9gQwk6lS4Rn3NkC/i1xsxv0A q56XDPK3qyuHWQD/d6gcJ5n+Bl0OoNaDyG4UqnSGucZctYQeY4LQMzd4PQA= =wPyY -----END PGP SIGNATURE----- Merge tag 'trace-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix and cleanup from Steven Rostedt: "Tracing fix for histograms and a clean up in ftrace: - Fixed a bug that broke the .sym-offset modifier and added a test to make sure nothing breaks it again. - Replace a list_del/list_add() with a list_move()" * tag 'trace-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Use list_move instead of list_del/list_add tracing/selftests: Add tests to test histogram sym and sym-offset modifiers tracing/histograms: Fix parsing of "sym-offset" modifier |
||
Linus Torvalds
|
bd9c350603 |
Merge branch 'akpm' (patches from Andrew)
Pull yet more updates from Andrew Morton: "54 patches. Subsystems affected by this patch series: lib, mm (slub, secretmem, cleanups, init, pagemap, and mremap), and debug" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (54 commits) powerpc/mm: enable HAVE_MOVE_PMD support powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache mm/mremap: allow arch runtime override mm/mremap: hold the rmap lock in write mode when moving page table entries. mm/mremap: use pmd/pud_poplulate to update page table entries mm/mremap: don't enable optimized PUD move if page table levels is 2 mm/mremap: convert huge PUD move to separate helper selftest/mremap_test: avoid crash with static build selftest/mremap_test: update the test to handle pagesize other than 4K mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * kdump: use vmlinux_build_id to simplify buildid: fix kernel-doc notation buildid: mark some arguments const scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm scripts/decode_stacktrace.sh: support debuginfod x86/dumpstack: use %pSb/%pBb for backtrace printing arm64: stacktrace: use %pSb for backtrace printing module: add printk formats to add module build ID to stacktraces ... |
||
Thomas Gleixner
|
4840048356 |
irqchip fixes for 5.14, take #1
- Fix a MIPS bug where irqdomain loopkups could occur in a context where RCU is not allowed - Fix a documentation bug for handle_domain_irq -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDoFZwPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDkxgP/0IlcPoBfamLbzpjyrMm0359AIllKP7Vgpb6 IaDnoiWTCe1B6rFoP59j357iGCTvB04+oJ17DsB51R9RKaM2sdhi2dKxbEHRd+se v1EyU5gmVB7JJP/1lPnjYdPGnXz73jjEq5MRI6o4yLG2xnQ+ZcdHifuP/tD7glGZ UmJkAWrq53IbBCja9HbtbAQanuDtDu6xgYIweaCdf2oKzNS9jQVvjqFokMd0AcIA juY8xTFNzHoA/8AF8eBUE5TfLVcG8j3P31ffw4gVvzEkman77AP5DZ8qkSvi7plH wOdjlBrsGTfoti4kfAIvsfi6zBHyXJEaW0Vd38uaA+cXYI1QNiZ9qqzQPvjuK9gs VFORcWe2pDiI1q4mg1pz7dGBy0FoEswe4uVnr6xm+vn98KeAn/gS5Dc/K1JpmfN+ iOJt9H7hvDrUC/KnntzuRY82I8y7gDyyjGDJHhFlWgZBeONPhpOiE/d6xbxGtQm8 SpVBD9QZnMBxDY2eVNQp31SCdrwLCLczNeQrJHP6Oh9LLmjQq3VD4M3OpBLpEA8e 8WIiH9vJfXI5emU1wQRkscyA8tT2mAo3Kb192fpC5nDwTMSxbfk1l94oLGUldcV5 liQ78yd/12mJBMS5MJPsA39g1ww2vv5m6Bq0JDwDu33l1qKrlRJqveeS8UjDSXrD s3cVrxFO =eTw3 -----END PGP SIGNATURE----- Merge tag 'irqchip-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Fix a MIPS bug where irqdomain loopkups could occur in a context where RCU is not allowed - Fix a documentation bug for handle_domain_irq |
||
John Fastabend
|
f263a81451 |
bpf: Track subprog poke descriptors correctly and fix use-after-free
Subprograms are calling map_poke_track(), but on program release there is no hook to call map_poke_untrack(). However, on program release, the aux memory (and poke descriptor table) is freed even though we still have a reference to it in the element list of the map aux data. When we run map_poke_run(), we then end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run(): [...] [ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e [ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337 [ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399 [ 402.824715] Call Trace: [ 402.824719] dump_stack+0x93/0xc2 [ 402.824727] print_address_description.constprop.0+0x1a/0x140 [ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824744] kasan_report.cold+0x7c/0xd8 [ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824757] prog_array_map_poke_run+0xc2/0x34e [ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0 [...] The elements concerned are walked as follows: for (i = 0; i < elem->aux->size_poke_tab; i++) { poke = &elem->aux->poke_tab[i]; [...] The access to size_poke_tab is a 4 byte read, verified by checking offsets in the KASAN dump: [ 402.825004] The buggy address belongs to the object at ffff8881905a7800 which belongs to the cache kmalloc-1k of size 1024 [ 402.825008] The buggy address is located 320 bytes inside of 1024-byte region [ffff8881905a7800, ffff8881905a7c00) The pahole output of bpf_prog_aux: struct bpf_prog_aux { [...] /* --- cacheline 5 boundary (320 bytes) --- */ u32 size_poke_tab; /* 320 4 */ [...] In general, subprograms do not necessarily manage their own data structures. For example, BTF func_info and linfo are just pointers to the main program structure. This allows reference counting and cleanup to be done on the latter which simplifies their management a bit. The aux->poke_tab struct, however, did not follow this logic. The initial proposed fix for this use-after-free bug further embedded poke data tracking into the subprogram with proper reference counting. However, Daniel and Alexei questioned why we were treating these objects special; I agree, its unnecessary. The fix here removes the per subprogram poke table allocation and map tracking and instead simply points the aux->poke_tab pointer at the main programs poke table. This way, map tracking is simplified to the main program and we do not need to manage them per subprogram. This also means, bpf_prog_free_deferred(), which unwinds the program reference counting and kfrees objects, needs to ensure that we don't try to double free the poke_tab when free'ing the subprog structures. This is easily solved by NULL'ing the poke_tab pointer. The second detail is to ensure that per subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do this, we add a pointer in the poke structure to point at the subprogram value so JITs can easily check while walking the poke_tab structure if the current entry belongs to the current program. The aux pointer is stable and therefore suitable for such comparison. On the jit_subprogs() error path, we omit cleaning up the poke->aux field because these are only ever referenced from the JIT side, but on error we will never make it to the JIT, so its fine to leave them dangling. Removing these pointers would complicate the error path for no reason. However, we do need to untrack all poke descriptors from the main program as otherwise they could race with the freeing of JIT memory from the subprograms. Lastly, |
||
Stephen Boyd
|
44e8a5e912 |
kdump: use vmlinux_build_id to simplify
We can use the vmlinux_build_id array here now instead of open coding it. This mostly consolidates code. Link: https://lkml.kernel.org/r/20210511003845.2429846-14-swboyd@chromium.org Signed-off-by: Stephen Boyd <swboyd@chromium.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox <willy@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sasha Levin <sashal@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Stephen Boyd
|
9294523e37 |
module: add printk formats to add module build ID to stacktraces
Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space limited devices). Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussions on the list to put every module build ID into the "Modules linked in:" section of the stacktrace message but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it to multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed. Instead, we opt for the simpler approach of introducing new printk formats '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb' for "pointer backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format. Before: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm] direct_entry+0x16c/0x1b4 [lkdtm] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 After: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout] [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set] Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static] [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sashal@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
9a436f8ff6 |
PM: hibernate: disable when there are active secretmem users
It is unsafe to allow saving of secretmem areas to the hibernation snapshot as they would be visible after the resume and this essentially will defeat the purpose of secret memory mappings. Prevent hibernation whenever there are active secret memory users. Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
1507f51255 |
mm: introduce memfd_secret system call to create "secret" memory areas
Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.
The secretmem feature is off by default and the user must explicitly
enable it at the boot time.
Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call. The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.
Secretmem is designed to provide the following protections:
* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Seceretmem makes
"simple" ROP insufficient to perform exfiltration, which increases the
required complexity of the attack. Along with other protections like
the kernel stack size limit and address space layout randomization which
make finding gadgets is really hard, absence of any in-kernel primitive
for accessing secret memory means the one gadget ROP attack can't work.
Since the only way to access secret memory is to reconstruct the missing
mapping entry, the attacker has to recover the physical page and insert
a PTE pointing to it in the kernel and then retrieve the contents. That
takes at least three gadgets which is a level of difficulty beyond most
standard attacks.
* Prevent cross-process secret userspace memory exposures. Once the
secret memory is allocated, the user can't accidentally pass it into the
kernel to be transmitted somewhere. The secreremem pages cannot be
accessed via the direct map and they are disallowed in GUP.
* Harden against exploited kernel flaws. In order to access secretmem,
a kernel-side attack would need to either walk the page tables and
create new ones, or spawn a new privileged uiserspace process to perform
secrets exfiltration using ptrace.
The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File
descriptor approach allows explicit and controlled sharing of the memory
areas, it allows to seal the operations. Besides, file descriptor based
memory paves the way for VMMs to remove the secret memory range from the
userspace hipervisor process, for instance QEMU. Andy Lutomirski says:
"Getting fd-backed memory into a guest will take some possibly major
work in the kernel, but getting vma-backed memory into a guest without
mapping it in the host user address space seems much, much worse."
memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because it's purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory. Nowadays a new system call cost is negligible while it is way
simpler for userspace to deal with a clear-cut system calls than with a
multiplexer or an overloaded syscall. Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create() so there is no much sense in overloading memfd_create() to
begin with. If there will be a need for code sharing between these
implementation it can be easily achieved without a need to adjust user
visible APIs.
The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.
Once there will be a use case that will require exposing secretmem to the
kernel it will be an opt-in request in the system call flags so that user
would have to decide what data can be exposed to the kernel.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit
|
||
Alexey Gladkov
|
f3791f4df5 |
Fix UCOUNT_RLIMIT_SIGPENDING counter leak
We must properly handle an errors when we increase the rlimit counter
and the ucounts reference counter. We have to this with RCU protection
to prevent possible use-after-free that could occur due to concurrent
put_cred_rcu().
The following reproducer triggers the problem:
$ cat testcase.sh
case "${STEP:-0}" in
0)
ulimit -Si 1
ulimit -Hi 1
STEP=1 unshare -rU "$0"
killall sleep
;;
1)
for i in 1 2 3 4 5; do unshare -rU sleep 5 & done
;;
esac
with the KASAN report being along the lines of
BUG: KASAN: use-after-free in put_ucounts+0x17/0xa0
Write of size 4 at addr ffff8880045f031c by task swapper/2/0
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.13.0+ #19
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-alt4 04/01/2014
Call Trace:
<IRQ>
put_ucounts+0x17/0xa0
put_cred_rcu+0xd5/0x190
rcu_core+0x3bf/0xcb0
__do_softirq+0xe3/0x341
irq_exit_rcu+0xbe/0xe0
sysvec_apic_timer_interrupt+0x6a/0x90
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20
default_idle_call+0x53/0x130
do_idle+0x311/0x3c0
cpu_startup_entry+0x14/0x20
secondary_startup_64_no_verify+0xc2/0xcb
Allocated by task 127:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
alloc_ucounts+0x169/0x2b0
set_cred_ucounts+0xbb/0x170
ksys_unshare+0x24c/0x4e0
__x64_sys_unshare+0x16/0x20
do_syscall_64+0x37/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 0:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xeb/0x120
kfree+0xaa/0x460
put_cred_rcu+0xd5/0x190
rcu_core+0x3bf/0xcb0
__do_softirq+0xe3/0x341
The buggy address belongs to the object at ffff8880045f0300
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 28 bytes inside of
192-byte region [ffff8880045f0300, ffff8880045f03c0)
The buggy address belongs to the page:
page:000000008de0a388 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880045f0000 pfn:0x45f0
flags: 0x100000000000200(slab|node=0|zone=1)
raw: 0100000000000200 ffffea00000f4640 0000000a0000000a ffff888001042a00
raw: ffff8880045f0000 000000008010000d 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880045f0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880045f0280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880045f0300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880045f0380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8880045f0400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
Fixes:
|
||
Baokun Li
|
3ecda64475 |
ftrace: Use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add(). Link: https://lkml.kernel.org/r/20210608031108.2820996-1-libaokun1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
Linus Torvalds
|
aef4226f91 |
More power management updates for 5.14-rc1
- Drop the ->stop_cpu() (not really useful) and ->resolve_freq() (unused) cpufreq driver callbacks and modify the users of the former accordingly (Viresh Kumar, Rafael Wysocki). - Add frequency invariance support to the ACPI CPPC cpufreq driver again along with the related fixes and cleanups (Viresh Kumar). - Update the Meditak, qcom and SCMI ARM cpufreq drivers (Fabien Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET). - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar). - Add generic performance domains support to the dvfs DT bindings (Sudeep Holla). - Refine locking in the generic power domains (genpd) support code to avoid lock dependency issues (Stephen Boyd). - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak). - Simplify the PM core debug code by using ktime_us_delta() to compute time interval lengths (Mark-PK Tsai). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmDl+NcSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxyggQALTT/4fFZYlAG/2jpu0wl/3OeR5EPS/O /KrfU7NfXOlNDiuZzVVrMsu1YhkAMedvhHl+9bvveYJBgNuMz7Y1WSGY4PAbW9j+ htnvPd8UdpNODAhT3MDV9sdpXglkWAjivCAM7nr57nl3vwrRgAhBEBkA0/OKiij7 dlTSRy+Doq0AI3ceOHlMFHGnL+/PBx3YJcOuKBwK9gHrbCQxoBQJ5QJnm2Sa8NEy QGoQWWvSI6mDntsAYL4g7mbUdOqGSN04KdXnf0nA9Ywe/Lb/Rp/Il7zCfc13iBSV Nv1ala2NFtp/W+nQan8hxXOgbET+ybbduG65bB33McTuaZ3VIV5sSxxAfEWV+DiM PGEGwDJ/vZqoJiiTKmYij07/sIUXe1Q2YsxTvDh0tqzFmO78reDfXlLdfrrOjRsd Ybw3dkI/XQH7/sTawCxn/TdFRj2hx7wg7yDqkLJXnA037BN+EBFqSckyABoBPpmA TwcacjyotwnfwOpoxbxmjCX1VwIIZ2Mk7Q3h+v9Ej5aNIuyDfR0DzybqCLqIan/4 vETtz+OCO5H1BT6g/ctIi13e7MvWzZXudNWTRTS8ZVXzuzo1hO3okHWuXiQbKGOA Dh2sjkJBBgr9WPFjkF5mgZZp8SM0D4S5SSAEglTKSwxcaMcqXVlgEN8beyMa5FuD y/UeopwhL66a =Un7e -----END PGP SIGNATURE----- Merge tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These include cpufreq core simplifications and fixes, cpufreq driver updates, cpuidle driver update, a generic power domains (genpd) locking fix and a debug-related simplification of the PM core. Specifics: - Drop the ->stop_cpu() (not really useful) and ->resolve_freq() (unused) cpufreq driver callbacks and modify the users of the former accordingly (Viresh Kumar, Rafael Wysocki). - Add frequency invariance support to the ACPI CPPC cpufreq driver again along with the related fixes and cleanups (Viresh Kumar). - Update the Meditak, qcom and SCMI ARM cpufreq drivers (Fabien Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET). - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar). - Add generic performance domains support to the dvfs DT bindings (Sudeep Holla). - Refine locking in the generic power domains (genpd) support code to avoid lock dependency issues (Stephen Boyd). - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak). - Simplify the PM core debug code by using ktime_us_delta() to compute time interval lengths (Mark-PK Tsai)" * tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits) PM: domains: Shrink locking area of the gpd_list_lock PM: sleep: Use ktime_us_delta() in initcall_debug_report() cpufreq: CPPC: Add support for frequency invariance arch_topology: Avoid use-after-free for scale_freq_data cpufreq: CPPC: Pass structure instance by reference cpufreq: CPPC: Fix potential memleak in cppc_cpufreq_cpu_init cpufreq: Remove ->resolve_freq() cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target() cpufreq: Remove the ->stop_cpu() driver callback cpufreq: powernv: Migrate to ->exit() callback instead of ->stop_cpu() cpufreq: CPPC: Migrate to ->exit() callback instead of ->stop_cpu() cpufreq: intel_pstate: Combine ->stop_cpu() and ->offline() cpuidle: qcom: Add SPM register data for MSM8226 dt-bindings: arm: msm: Add SAW2 for MSM8226 dt-bindings: cpufreq: update cpu type and clock name for MT8173 SoC clk: mediatek: remove deprecated CLK_INFRA_CA57SEL for MT8173 SoC cpufreq: dt: Rename black/white-lists cpufreq: scmi: Fix an error message cpufreq: mediatek: add support for mt8365 dt-bindings: dvfs: Add support for generic performance domains ... |
||
Linus Torvalds
|
a931dd33d3 |
Modules updates for v5.14
Summary of modules changes for the 5.14 merge window: - Fix incorrect logic in module_kallsyms_on_each_symbol() - Fix for a Coccinelle warning Signed-off-by: Jessica Yu <jeyu@kernel.org> -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAmDltvoQHGpleXVAa2Vy bmVsLm9yZwAKCRDARX44zjvBcu74D/9+HDmNlEd92ongCVhod9+s4p/g7pArBPdW YrvjYM+icH0qpE58etIQOibvFL04WxelDqyrCewEhABwB5hbad+owShMl+R43C8P IRCYYN2sG7vHVPMiUgeXlS5OiBZeFOb+Mf9Ccnu5MuI/BHpsTX9uhG+rbHQg+lXb eOWu0QmmCXgtZy0NVeCNtdOJV/PwgXC3ZHE2zgWukf1QWtwm1+0VciZwui162eyN TL/XLSH5pNZNr2MSxi5b2rdwrUVdxb0enE4L4cwCdALipooVEsrxuVvugcYPBHrC 38b7cJBfdYpVuS2iK9tSqSMdRkqX2LKZrfiGFCAxn2TA55Jc6WJhbzL0nEPGyGX1 fRR9tGAuWe1ihQHRIajLMtmdfA/HHJMmuPx8c0JDIAeTj3SdDch/gEhcKcXj6kP2 eXRoQS3RD7RSz6jKKRTFxJsn8lYJhSYN8IGwfjQzPwruLRynDQaCpnN+7wfGsdCf rKb7NxdsbNdQKjkaewu2zb/Qk7U7TOVjJhAv2gN0u/RbGSJ3i+MJTLfk3d/LLJcD ncLaVh2zRHIdKNQj/dtmIeXiSuJ23NjNQszOiaI8xZ7YKexhJCimAWPP/8f4dCuB 2tL/Xkg1GVy9+WZQXkApyOPzvtXXE9dDfQR+HiNTQaTlZpw5SxYg0ZPBBL22yv1J xDD9Hfl1jw== =/zzN -----END PGP SIGNATURE----- Merge tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: - Fix incorrect logic in module_kallsyms_on_each_symbol() - Fix for a Coccinelle warning * tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 kernel/module: Use BUG_ON instead of if condition followed by BUG |
||
Steven Rostedt (VMware)
|
26c5637310 |
tracing/histograms: Fix parsing of "sym-offset" modifier
With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").
Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.
Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
Peter Zijlstra
|
e6b4457b05 |
locking/mutex: Add MUTEX_WARN_ON
Cleanup some #ifdef'fery. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154115.020298650@infradead.org |
||
Peter Zijlstra
|
ad90880dc9 |
locking/mutex: Introduce __mutex_trylock_or_handoff()
Yanfei reported that it is possible to loose HANDOFF when we race with mutex_unlock() and end up setting HANDOFF on an unlocked mutex. At that point anybody can steal it, losing HANDOFF in the process. If this happens often enough, we can in fact starve the top waiter. Solve this by folding the 'set HANDOFF' operation into the trylock operation, such that either we acquire the lock, or it gets HANDOFF set. This avoids having HANDOFF set on an unlocked mutex. Reported-by: Yanfei Xu <yanfei.xu@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.958507900@infradead.org |
||
Peter Zijlstra
|
048661a1f9 |
locking/mutex: Fix HANDOFF condition
Yanfei reported that setting HANDOFF should not depend on recomputing @first, only on @first state. Which would then give: if (ww_ctx || !first) first = __mutex_waiter_is_first(lock, &waiter); if (first) __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); But because 'ww_ctx || !first' is basically 'always' and the test for first is relatively cheap, omit that first branch entirely. Reported-by: Yanfei Xu <yanfei.xu@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org |
||
Peter Zijlstra
|
ab4e4d9f79 |
locking/mutex: Use try_cmpxchg()
For simpler and better code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.834438545@infradead.org |
||
Paul E. McKenney
|
2a2ed5618a |
rcu: Fix pr_info() formats and values in show_rcu_gp_kthreads()
This commit changes from "%lx" to "%x" and from "0x1ffffL" to "0x1ffff" to match the change in type between the old field ->state (unsigned long) and the new field ->__state (unsigned int). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Paul E. McKenney
|
a9ab9cce93 |
rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()
Invoking trc_del_holdout() from within trc_wait_for_one_reader() is only a performance optimization because the RCU Tasks Trace grace-period kthread will eventually do this within check_all_holdout_tasks_trace(). But it is not a particularly important performance optimization because it only applies to the grace-period kthread, of which there is but one. This commit therefore removes this invocation of trc_del_holdout() in favor of the one in check_all_holdout_tasks_trace() in the grace-period kthread. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Paul E. McKenney
|
1d10bf55d8 |
rcu-tasks: Don't delete holdouts within trc_inspect_reader()
As Yanfei pointed out, although invoking trc_del_holdout() is safe from the viewpoint of the integrity of the holdout list itself, the put_task_struct() invoked by trc_del_holdout() can result in use-after-free errors due to later accesses to this task_struct structure by the RCU Tasks Trace grace-period kthread. This commit therefore removes this call to trc_del_holdout() from trc_inspect_reader() in favor of the grace-period thread's existing call to trc_del_holdout(), thus eliminating that particular class of use-after-free errors. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Paul E. McKenney
|
05bc276cf2 |
refscale: Avoid false-positive warnings in ref_scale_reader()
If the call to set_cpus_allowed_ptr() in ref_scale_reader()
fails, a later WARN_ONCE() complains. But with the advent of
|
||
Paul E. McKenney
|
22b6d14992 |
scftorture: Avoid false-positive warnings in scftorture_invoker()
If the call to set_cpus_allowed_ptr() in scftorture_invoker()
fails, a later WARN_ONCE() complains. But with the advent of
|
||
Linus Torvalds
|
df8ba5f160 |
kgdb patches for 5.14
This was a extremely quiet cycle for kgdb. This PR consists of two patches that between them address spelling errors and a switch fallthrough warning. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmDkLFQACgkQfOMlXTn3 iKEXqA/8CjjiWc7Tr9GrXuSKf2QjcPxUEQwzEVCqalk/X7bQwdJg69nKVulW4Ixm db4fANz90w9E8CCMNJlc/Wfev43kcQprn6cO1AkBUyhtDn6F/ql8E/oUWzniGJiU ko+2Hkd9Vfr6lYsDeMNPMe4bjYYUcfBIkJivPA6auX2iyJq7WJgUAh9iea2VoaNB MVjy8F7uUA/TUK4Y/4YiCK4lBILT0KnUvr50n67ZxhMlE3nzFfI0DtIx+7D1vcor w6COVZSkDeyuj+gYq7BFrnOaXPqmEnwesva8VI3UHBOY8+bXdzAld0MrzpyktXTh HPTRIR1KPdqxr4P+wOJc/D8HRIQZ2oKXFXaziUvy6rSxyfBWpJZ1ZCy/uUbPhmyx d620xwHa9R4g0VeA6NGCHhLv7T7vCItBap+bTVAzFJ1LfBYWF4Y3NUsnWxkFNBjM URv9lfXSD6wXt8k/CtkyYUCilT9RsTrQUwJio3yd+kmrbUm3iPfDMWWHW78Jdp0v JIR3/47YNCh53O37QzRGgiB4/cVBzhk5hI/pRnkfRFttUqIjeaSYXvT4mmTUox+f euZESQbwsi8XmvqNSIkDuNjru6JmrfEiiAotjBY6lZKhRP0dhldWPGNT5x1XCh8i p1Dx42So4BNaPJSf3eTiFWGgtYKqu/gqkns4D4ejtiQUe60pzC0= =dnV5 -----END PGP SIGNATURE----- Merge tag 'kgdb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "This was a extremely quiet cycle for kgdb. This consists of two patches that between them address spelling errors and a switch fallthrough warning" * tag 'kgdb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: Fix fall-through warning for Clang kgdb: Fix spelling mistakes |
||
Peter Zijlstra
|
fa68bd09fc |
kprobe/static_call: Restore missing static_call_text_reserved()
Restore two hunks from commit: |
||
Peter Zijlstra
|
2bee6d16e4 |
static_call: Fix static_call_text_reserved() vs __init
It turns out that static_call_text_reserved() was reporting __init
text as being reserved past the time when the __init text was freed
and re-used.
This is mostly harmless and will at worst result in refusing a kprobe.
Fixes:
|
||
Peter Zijlstra
|
9e667624c2 |
jump_label: Fix jump_label_text_reserved() vs __init
It turns out that jump_label_text_reserved() was reporting __init text as being reserved past the time when the __init text was freed and re-used. For a long time, this resulted in, at worst, not being able to kprobe text that happened to land at the re-used address. However a recent commit |
||
Xiongwei Song
|
4840ce2267 |
locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING
When enabling CONFIG_LOCK_STAT=y, then CONFIG_LOCKDEP=y is forcedly enabled, but CONFIG_PROVE_LOCKING is disabled. We can get output from /proc/lockdep, which currently includes usages of lock classes. But the usages are meaningless, see the output below: / # cat /proc/lockdep all lock classes: ffffffff9af63350 ....: cgroup_mutex ffffffff9af54eb8 ....: (console_sem).lock ffffffff9af54e60 ....: console_lock ffffffff9ae74c38 ....: console_owner_lock ffffffff9ae74c80 ....: console_owner ffffffff9ae66e60 ....: cpu_hotplug_lock Only one usage context for each lock, this is because each usage is only changed in mark_lock() that is in the CONFIG_PROVE_LOCKING=y section, however in the test situation, it's not. The fix is to move the usages reading and seq_print from the !CONFIG_PROVE_LOCKING section to its defined section. Also, locks_after list of lock_class is empty when !CONFIG_PROVE_LOCKING, so do the same thing as what have done for usages of lock classes. With this patch with !CONFIG_PROVE_LOCKING we can get the results below: / # cat /proc/lockdep all lock classes: ffffffff85163290: cgroup_mutex ffffffff85154dd8: (console_sem).lock ffffffff85154d80: console_lock ffffffff85074b58: console_owner_lock ffffffff85074ba0: console_owner ffffffff85066d60: cpu_hotplug_lock ... a class key and the relevant class name each line. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20210629135916.308210-1-sxwjean@me.com |
||
Linus Torvalds
|
28e92f9903 |
Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ... |
||
Linus Torvalds
|
b97efd5e98 |
Merge branch 'kcsan.2021.05.18a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney. * 'kcsan.2021.05.18a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: Use URL link for pointing access-marking.txt kcsan: Document "value changed" line kcsan: Report observed value changes kcsan: Remove kcsan_report_type kcsan: Remove reporting indirection kcsan: Refactor access_info initialization kcsan: Fold panic() call into print_report() kcsan: Refactor passing watchpoint/other_info kcsan: Distinguish kcsan_report() calls kcsan: Simplify value change detection kcsan: Add pointer to access-marking.txt to data_race() bullet |
||
Linus Torvalds
|
58ec9059b3 |
Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs name lookup updates from Al Viro: "Small namei.c patch series, mostly to simplify the rules for nameidata state. It's actually from the previous cycle - but I didn't post it for review in time... Changes visible outside of fs/namei.c: file_open_root() calling conventions change, some freed bits in LOOKUP_... space" * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namei: make sure nd->depth is always valid teach set_nameidata() to handle setting the root as well take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space switch file_open_root() to struct path |
||
Linus Torvalds
|
757fa80f4e |
Tracing updates for 5.14:
- Added option for per CPU threads to the hwlat tracer - Have hwlat tracer handle hotplug CPUs - New tracer: osnoise, that detects latency caused by interrupts, softirqs and scheduling of other tasks. - Added timerlat tracer that creates a thread and measures in detail what sources of latency it has for wake ups. - Removed the "success" field of the sched_wakeup trace event. This has been hardcoded as "1" since 2015, no tooling should be looking at it now. If one exists, we can revert this commit, fix that tool and try to remove it again in the future. - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids. - New boot command line option "tp_printk_stop", as tp_printk causes trace events to write to console. When user space starts, this can easily live lock the system. Having a boot option to stop just after boot up is useful to prevent that from happening. - Have ftrace_dump_on_oops boot command line option take numbers that match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops. - Bootconfig clean ups, fixes and enhancements. - New ktest script that tests bootconfig options. - Add tracepoint_probe_register_may_exist() to register a tracepoint without triggering a WARN*() if it already exists. BPF has a path from user space that can do this. All other paths are considered a bug. - Small clean ups and fixes -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYN8YPhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qhxLAP9Mo5hHv7Hg6W7Ddv77rThm+qclsMR/ yW0P+eJpMm4+xAD8Cq03oE1DimPK+9WZBKU5rSqAkqG6CjgDRw6NlIszzQQ= =WEPR -----END PGP SIGNATURE----- Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Added option for per CPU threads to the hwlat tracer - Have hwlat tracer handle hotplug CPUs - New tracer: osnoise, that detects latency caused by interrupts, softirqs and scheduling of other tasks. - Added timerlat tracer that creates a thread and measures in detail what sources of latency it has for wake ups. - Removed the "success" field of the sched_wakeup trace event. This has been hardcoded as "1" since 2015, no tooling should be looking at it now. If one exists, we can revert this commit, fix that tool and try to remove it again in the future. - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids. - New boot command line option "tp_printk_stop", as tp_printk causes trace events to write to console. When user space starts, this can easily live lock the system. Having a boot option to stop just after boot up is useful to prevent that from happening. - Have ftrace_dump_on_oops boot command line option take numbers that match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops. - Bootconfig clean ups, fixes and enhancements. - New ktest script that tests bootconfig options. - Add tracepoint_probe_register_may_exist() to register a tracepoint without triggering a WARN*() if it already exists. BPF has a path from user space that can do this. All other paths are considered a bug. - Small clean ups and fixes * tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits) tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT tracing: Simplify & fix saved_tgids logic treewide: Add missing semicolons to __assign_str uses tracing: Change variable type as bool for clean-up trace/timerlat: Fix indentation on timerlat_main() trace/osnoise: Make 'noise' variable s64 in run_osnoise() tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing tracing: Fix spelling in osnoise tracer "interferences" -> "interference" Documentation: Fix a typo on trace/osnoise-tracer trace/osnoise: Fix return value on osnoise_init_hotplug_support trace/osnoise: Make interval u64 on osnoise_main trace/osnoise: Fix 'no previous prototype' warnings tracing: Have osnoise_main() add a quiescent state for task rcu seq_buf: Make trace_seq_putmem_hex() support data longer than 8 seq_buf: Fix overflow in seq_buf_putmem_hex() trace/osnoise: Support hotplug operations trace/hwlat: Support hotplug operations trace/hwlat: Protect kdata->kthread with get/put_online_cpus trace: Add timerlat tracer trace: Add osnoise tracer ... |
||
Linus Torvalds
|
bd31b9efbf |
SCSI misc on 20210702
This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYN7I6iYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXpRAQCkngYZ 35yQrqOxgOk2pfrysE95tHrV1MfJm2U49NFTwAEAuZutEvBUTfBF+sbcJ06r6q7i H0hkJN/Io7enFs5v3WA= =zwIa -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits) scsi: aha1740: Avoid over-read of sense buffer scsi: arcmsr: Avoid over-read of sense buffer scsi: ips: Avoid over-read of sense buffer scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe() scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame() scsi: elx: libefc: Fix less than zero comparison of a unsigned int scsi: elx: efct: Fix pointer error checking in debugfs init scsi: elx: efct: Fix is_originator return code type scsi: elx: efct: Fix link error for _bad_cmpxchg scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel() scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session() scsi: elx: efct: Fix error handling in efct_hw_init() scsi: elx: efct: Remove redundant initialization of variable lun scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected" scsi: lpfc: Fix build error in lpfc_scsi.c scsi: target: iscsi: Remove redundant continue statement scsi: qla4xxx: Remove redundant continue statement scsi: ppa: Switch to use module_parport_driver() scsi: imm: Switch to use module_parport_driver() scsi: mpt3sas: Fix error return value in _scsih_expander_add() ... |
||
Linus Torvalds
|
e72b069609 |
dma-mapping updates for Linux 5.14
- a trivivial whitespace fix (Zhen Lei) - report -EEXIST errors in add_dma_entry (Hamza Mahfooz) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmDfOS4LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYP8sBAAq1qOMdxAsOtfdl4MrUP3/LrxL9FSN6ThMeaZy3QV jUgcIEZ8To8KAXoEAu0LzIDhI6xGfAcwzavNviCPzoyT4OPzTOZQvHK4AOotruu0 qQ/Xa9iSputWNA/eV8eMLRYKRZJv0MB0su4mqn9h/ruxRENm/iBKHj30Q1JrCERQ bhbEm1Bnw6BZqjQrhn4hEKq7UVVTUeVOITDXmj4TiYQJPLsr2RFgZa1UzOx1xbvu x1HzzRNeBRjri2yIn562eIuIPno+d6NtLDPlCXD46dqd6VUclV0hYy6irooZY0uY 0gnXo35NKB1DG45he6TIm/LTKMjUh7glGLeNY6Id1ZcZJWOTf3zCSs3Z/BiX0aXK ituMdk56FjFK2aXExioUJ3heZ2lrn2WHNMeZ9wNcZ38UPJNo/dfFXXfYbsGp+TGa XsS6QbPW2CmaVz6nktM6ra0VaUNWF/dHY2IpAatMnPfGJbrvmeYLoRQZSvkGwB64 k95Vu+Q8BAydb9cjLqs5EIZLAPGPsVT6bWZWYUPkc74I7Ke8c03r6K81IwAky9mS 5EtitaDFQPOlUFwI79RLgg4I+xPEMw3u/mmYLP6JH4WhPjHWk7SeOP1p6EBLCQHE 7OJLVJxmsrG7VLK+YW7N9q7pSokij0ghpNAeYv7CtxA+tiEPuEYx39Td7+btiP6f 0KQ= =QIFu -----END PGP SIGNATURE----- Merge tag 'dma-mapping-5.14' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - a trivivial whitespace fix (Zhen Lei) - report -EEXIST errors in add_dma_entry (Hamza Mahfooz) * tag 'dma-mapping-5.14' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: report -EEXIST errors in add_dma_entry dma-mapping: remove a trailing space |
||
Linus Torvalds
|
a48ad6e7a3 |
linux-kselftest-kunit-fixes-5.14-rc1
This KUnit update for Linux 5.14-rc1 consists of fixes and features: -- add support for skipped tests -- introduce kunit_kmalloc_array/kunit_kcalloc() helpers -- add gnu_printf specifiers -- add kunit_shutdown -- add unit test for filtering suites by names -- convert lib/test_list_sort.c to use KUnit -- code organization moving default config to tools/testing/kunit -- refactor of internal parser input handling -- cleanups and updates to documentation -- code cleanup related to casts -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmDfNHgACgkQCwJExA0N QxzU5g/+ORbcbE5jhumo82xYUCGTScgIqPllNO000Vk9xydMqwx6tpY5puV0ig2z 5X7pxKEQTnype58yzY5mS1p336ryIx3q+AwEownxTW3YurXo2naQ59ZPjEvlAV+E h+DMYLzEjrxzcyJATw02uC9YLUZ3w6FPJfiXViIv93YYrtcnM0u6JpwG0yfBlI4v 4KKB2Xu4K7T90C9/ADFYFKX3mjXQl5fQwvIdtA7wS90Cgq52LKp2mvg1XEiZE5d+ 0dkTZ4Zo8TxxHt665o7vfnUjQQNmh45iGlW65wONxfAPb8BPoneGjVKWQTnN0Hor W+93ZPbMuFMSSKJuoHY9U7sP5VySKvaiIYaGdi6prnZZu0zUabKnLZ6FOy7kEdfs v09ulCBTVLslixVgNcp/kD9T+G/SXwCF5YBMAiMDQ0GNfUqlFtBkEA3gd44KwMI0 KwCcOgUSiaCkqyzOz/VeQsu/nhA5jdMO0KjiAs7Z3e7r7O/qKFs/ll7hZgDNCWSC q8eIrcBkSL0EGgXR1iZ4AtGm8op6KKd4ACBM8NdtTyoGFl1npZOgZnHoIsy35G9K 9mhc7eXSoaDGqy9dONL1Tc8Neg7qLTXQNp2radqsnAAgNPUrJuC7+8YC+DdIsjBH W7OyMjpfbwPws5rP4CS+JdwL+nQprKXZvFIhWGYhkDK44MbOngw= =5QAv -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-kunit-fixes-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit update from Shuah Khan: "Fixes and features: - add support for skipped tests - introduce kunit_kmalloc_array/kunit_kcalloc() helpers - add gnu_printf specifiers - add kunit_shutdown - add unit test for filtering suites by names - convert lib/test_list_sort.c to use KUnit - code organization moving default config to tools/testing/kunit - refactor of internal parser input handling - cleanups and updates to documentation - code cleanup related to casts" * tag 'linux-kselftest-kunit-fixes-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits) kunit: add unit test for filtering suites by names kasan: test: make use of kunit_skip() kunit: test: Add example tests which are always skipped kunit: tool: Support skipped tests in kunit_tool kunit: Support skipped tests thunderbolt: test: Reinstate a few casts of bitfields kunit: tool: internal refactor of parser input handling lib/test: convert lib/test_list_sort.c to use KUnit kunit: introduce kunit_kmalloc_array/kunit_kcalloc() helpers kunit: Remove the unused all_tests.config kunit: Move default config from arch/um -> tools/testing/kunit kunit: arch/um/configs: Enable KUNIT_ALL_TESTS by default kunit: Add gnu_printf specifiers lib/cmdline_kunit: Remove a cast which are no-longer required kernel/sysctl-test: Remove some casts which are no-longer required thunderbolt: test: Remove some casts which are no longer required mmc: sdhci-of-aspeed: Remove some unnecessary casts from KUnit tests iio: Remove a cast in iio-test-format which is no longer required device property: Remove some casts in property-entry-test Documentation: kunit: Clean up some string casts in examples ... |
||
Linus Torvalds
|
019b3fd94b |
powerpc updates for 5.14
- A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, Zhen Lei. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7 VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+ 7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+ 8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC cttP43kHZ7ljTw== =zo4d -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ... |
||
Linus Torvalds
|
71bd934101 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ... |
||
Xuewen Yan
|
3e1493f463 |
sched/uclamp: Ignore max aggregation if rq is idle
When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with rq value. But since there is no task enqueued yet, the
values are stale based on the last task that was running. When the new
task actually wakes up and enqueued, then the rq uclamp values should
reflect that of the newly woken up task effective uclamp values.
This is a problem particularly for uclamp_max because it default to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.
Fix that by ignoring max aggregation if the rq is idle since in that
case the effective uclamp value of the rq will be the ones of the task
that will wake up.
Fixes:
|
||
Odin Ugedal
|
72d0ad7cb5 |
sched/fair: Fix CFS bandwidth hrtimer expiry type
The time remaining until expiry of the refresh_timer can be negative. Casting the type to an unsigned 64-bit value will cause integer underflow, making the runtime_refresh_within return false instead of true. These situations are rare, but they do happen. This does not cause user-facing issues or errors; other than possibly unthrottling cfs_rq's using runtime from the previous period(s), making the CFS bandwidth enforcement less strict in those (special) situations. Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al |
||
Vincent Guittot
|
ceb6ba45dc |
sched/fair: Sync load_sum with load_avg after dequeue
commit |