Commit graph

31902 commits

Author SHA1 Message Date
Ingo Molnar
1e5f8a3085 Linux 5.5-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS
 ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1
 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y
 oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG
 j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I
 jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf
 33WGSyI=
 =iEwx
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc3' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:41:37 +01:00
Linus Torvalds
78bac77b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

 7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

 8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
    RCU grace period. From Eric Dumazet.

 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

16) Reject invalid MTU values in stmmac, from Jose Abreu.

17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

20) Disable hardware GRO when XDP is attached to qede, frm Manish
    Chopra.

21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
  sfc: Include XDP packet headroom in buffer step size.
  sfc: fix channel allocation with brute force
  net: dst: Force 4-byte alignment of dst_metrics
  selftests: pmtu: fix init mtu value in description
  hv_netvsc: Fix unwanted rx_table reset
  net: phy: ensure that phy IDs are correctly typed
  mod_devicetable: fix PHY module format
  qede: Disable hardware gro when xdp prog is installed
  net: ena: fix issues in setting interrupt moderation params in ethtool
  net: ena: fix default tx interrupt moderation interval
  net/smc: unregister ib devices in reboot_event
  net: stmmac: platform: Fix MDIO init for platforms without PHY
  llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
  net: hisilicon: Fix a BUG trigered by wrong bytes_compl
  net: dsa: ksz: use common define for tag len
  s390/qeth: don't return -ENOTSUPP to userspace
  s390/qeth: fix promiscuous mode after reset
  s390/qeth: handle error due to unsupported transport mode
  cxgb4: fix refcount init for TC-MQPRIO offload
  tc-testing: initial tdc selftests for cls_u32
  ...
2019-12-22 09:54:33 -08:00
Linus Torvalds
b8e382a185 Various tracing fixes:
- Fix memory leak on error path of process_system_preds()
  - Lock inversion fix with updating tgid recording option
  - Fix histogram compare function on big endian machines
  - Fix histogram trigger function on big endian machines
  - Make trace_printk() irq sync on init for kprobe selftest correctness
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXf6MRxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlw6AQCny2YeASymmOjDqh9/G53UdhO539Y2
 oL/2nQ8B9T9KWgD6AmmohhbX+TS9l5Nwy2/bKmRgADZ7u+2XLM2f2mYR2Ag=
 =D7hI
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix memory leak on error path of process_system_preds()

 - Lock inversion fix with updating tgid recording option

 - Fix histogram compare function on big endian machines

 - Fix histogram trigger function on big endian machines

 - Make trace_printk() irq sync on init for kprobe selftest correctness

* tag 'trace-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix endianness bug in histogram trigger
  samples/trace_printk: Wait for IRQ work to finish
  tracing: Fix lock inversion in trace_event_enable_tgid_record()
  tracing: Have the histogram compare functions convert to u64 first
  tracing: Avoid memory leak in process_system_preds()
2019-12-21 15:16:56 -08:00
Sven Schnelle
fe6e096a5b tracing: Fix endianness bug in histogram trigger
At least on PA-RISC and s390 synthetic histogram triggers are failing
selftests because trace_event_raw_event_synth() always writes a 64 bit
values, but the reader expects a field->size sized value. On little endian
machines this doesn't hurt, but on big endian this makes the reader always
read zero values.

Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com

Cc: stable@vger.kernel.org
Fixes: 4b147936fa ("tracing: Add support for 'synthetic' events")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-21 16:08:59 -05:00
Prateek Sood
3a53acf1d9 tracing: Fix lock inversion in trace_event_enable_tgid_record()
Task T2                             Task T3
trace_options_core_write()            subsystem_open()

 mutex_lock(trace_types_lock)           mutex_lock(event_mutex)

 set_tracer_flag()

   trace_event_enable_tgid_record()       mutex_lock(trace_types_lock)

    mutex_lock(event_mutex)

This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.

Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com

Cc: stable@vger.kernel.org
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-21 16:05:13 -05:00
Linus Torvalds
fd7a6d2b8f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: a (rare) PSI crash fix, a CPU affinity related balancing
  fix, and a toning down of active migration attempts"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cfs: fix spurious active migration
  sched/fair: Fix find_idlest_group() to handle CPU affinity
  psi: Fix a division error in psi poll()
  sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
2019-12-21 10:52:10 -08:00
Linus Torvalds
c4ff10efe8 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: a BTS fix, a PT NMI handling fix, a PMU sysfs fix and an
  SRCU annotation"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Add SRCU annotation for pmus list walk
  perf/x86/intel: Fix PT PMI handling
  perf/x86/intel/bts: Fix the use of page_private()
  perf/x86: Fix potential out-of-bounds access
2019-12-21 10:51:00 -08:00
Steven Rostedt (VMware)
106f41f5a3 tracing: Have the histogram compare functions convert to u64 first
The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.

To fix this, first derference the value as a u64 then convert it to the type
being compared.

Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 08d43a5fa0 ("tracing: Add lock-free tracing_map")
Acked-by: Tom Zanussi <zanussi@kernel.org>
Reported-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-19 18:26:00 -05:00
Keita Suzuki
79e65c27f0 tracing: Avoid memory leak in process_system_preds()
When failing in the allocation of filter_item, process_system_preds()
goes to fail_mem, where the allocated filter is freed.

However, this leads to memory leak of filter->filter_string and
filter->prog, which is allocated before and in process_preds().
This bug has been detected by kmemleak as well.

Fix this by changing kfree to __free_fiter.

unreferenced object 0xffff8880658007c0 (size 32):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30  common_pid  > 10
    00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00  ........es......
  backtrace:
    [<0000000067441602>] kstrdup+0x2d/0x60
    [<00000000141cf7b7>] apply_subsystem_event_filter+0x378/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888060c22d00 (size 64):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff  ...........A....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000b8c1b109>] process_preds+0x243/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888041d7e800 (size 512):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00  p...............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001e04af34>] process_preds+0x71a/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20191211091258.11310-1-keitasuzuki.park@sslab.ics.keio.ac.jp

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 404a3add43 ("tracing: Only add filter list when needed")
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-19 18:24:17 -05:00
Daniel Borkmann
cc52d9140a bpf: Fix record_func_key to perform backtracking on r3
While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort
implementation, I noticed a strange BPF SNAT engine behavior from time to
time. In some cases it would do the correct SNAT/DNAT service translation,
but at a random point in time it would just stop and perform an unexpected
translation after SYN, SYN/ACK and stack would send a RST back. While initially
assuming that there is some sort of a race condition in BPF code, adding
trace_printk()s for debugging purposes at some point seemed to have resolved
the issue auto-magically.

Digging deeper on this Heisenbug and reducing the trace_printk() calls to
an absolute minimum, it turns out that a single call would suffice to
trigger / not trigger the seen RST issue, even though the logic of the
program itself remains unchanged. Turns out the single call changed verifier
pruning behavior to get everything to work. Reconstructing a minimal test
case, the incorrect JIT dump looked as follows:

  # bpftool p d j i 11346
  0xffffffffc0cba96c:
  [...]
    21:   movzbq 0x30(%rdi),%rax
    26:   cmp    $0xd,%rax
    2a:   je     0x000000000000003a
    2c:   xor    %edx,%edx
    2e:   movabs $0xffff89cc74e85800,%rsi
    38:   jmp    0x0000000000000049
    3a:   mov    $0x2,%edx
    3f:   movabs $0xffff89cc74e85800,%rsi
    49:   mov    -0x224(%rbp),%eax
    4f:   cmp    $0x20,%eax
    52:   ja     0x0000000000000062
    54:   add    $0x1,%eax
    57:   mov    %eax,-0x224(%rbp)
    5d:   jmpq   0xffffffffffff6911
    62:   mov    $0x1,%eax
  [...]

Hence, unexpectedly, JIT emitted a direct jump even though retpoline based
one would have been needed since in line 2c and 3a we have different slot
keys in BPF reg r3. Verifier log of the test case reveals what happened:

  0: (b7) r0 = 14
  1: (73) *(u8 *)(r1 +48) = r0
  2: (71) r0 = *(u8 *)(r1 +48)
  3: (15) if r0 == 0xd goto pc+4
   R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (b7) r3 = 0
  5: (18) r2 = 0xffff89cc74d54a00
  7: (05) goto pc+3
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0
  8: (b7) r3 = 2
  9: (18) r2 = 0xffff89cc74d54a00
  11: safe
  processed 13 insns (limit 1000000) [...]

Second branch is pruned by verifier since considered safe, but issue is that
record_func_key() couldn't have seen the index in line 3a and therefore
decided that emitting a direct jump at this location was okay.

Fix this by reusing our backtracking logic for precise scalar verification
in order to prevent pruning on the slot key. This means verifier will track
content of r3 all the way backwards and only prune if both scalars were
unknown in state equivalence check and therefore poisoned in the first place
in record_func_key(). The range is [x,x] in record_func_key() case since
the slot always would have to be constant immediate. Correct verification
after fix:

  0: (b7) r0 = 14
  1: (73) *(u8 *)(r1 +48) = r0
  2: (71) r0 = *(u8 *)(r1 +48)
  3: (15) if r0 == 0xd goto pc+4
   R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (b7) r3 = 0
  5: (18) r2 = 0x0
  7: (05) goto pc+3
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0
  8: (b7) r3 = 2
  9: (18) r2 = 0x0
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  processed 15 insns (limit 1000000) [...]

And correct corresponding JIT dump:

  # bpftool p d j i 11
  0xffffffffc0dc34c4:
  [...]
    21:	  movzbq 0x30(%rdi),%rax
    26:	  cmp    $0xd,%rax
    2a:	  je     0x000000000000003a
    2c:	  xor    %edx,%edx
    2e:	  movabs $0xffff9928b4c02200,%rsi
    38:	  jmp    0x0000000000000049
    3a:	  mov    $0x2,%edx
    3f:	  movabs $0xffff9928b4c02200,%rsi
    49:	  cmp    $0x4,%rdx
    4d:	  jae    0x0000000000000093
    4f:	  and    $0x3,%edx
    52:	  mov    %edx,%edx
    54:	  cmp    %edx,0x24(%rsi)
    57:	  jbe    0x0000000000000093
    59:	  mov    -0x224(%rbp),%eax
    5f:	  cmp    $0x20,%eax
    62:	  ja     0x0000000000000093
    64:	  add    $0x1,%eax
    67:	  mov    %eax,-0x224(%rbp)
    6d:	  mov    0x110(%rsi,%rdx,8),%rax
    75:	  test   %rax,%rax
    78:	  je     0x0000000000000093
    7a:	  mov    0x30(%rax),%rax
    7e:	  add    $0x19,%rax
    82:   callq  0x000000000000008e
    87:   pause
    89:   lfence
    8c:   jmp    0x0000000000000087
    8e:   mov    %rax,(%rsp)
    92:   retq
    93:   mov    $0x1,%eax
  [...]

Also explicitly adding explicit env->allow_ptr_leaks to fixup_bpf_calls() since
backtracking is enabled under former (direct jumps as well, but use different
test). In case of only tracking different map pointers as in c93552c443 ("bpf:
properly enforce index mask to prevent out-of-bounds speculation"), pruning
cannot make such short-cuts, neither if there are paths with scalar and non-scalar
types as r3. mark_chain_precision() is only needed after we know that
register_is_const(). If it was not the case, we already poison the key on first
path and non-const key in later paths are not matching the scalar range in regsafe()
either. Cilium NodePort testing passes fine as well now. Note, released kernels
not affected.

Fixes: d2e4c1e6c2 ("bpf: Constant map key tracking for prog array pokes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net
2019-12-19 13:39:22 -08:00
Linus Torvalds
5f096c0ecd Power management fix for 5.5-rc3
Fix a problem related to CPU offline/online and cpufreq governors
 that in some system configurations may lead to a system-wide
 deadlock during CPU online.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl37lO4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxrUoP+wfiXQ8k3GncyD8NXY1/GhEmqB95v/f4
 clbn0xNu2WaQB3UdO/LkouL0+IaVw/i8PAt0cdeuEjKSgbPT8HHCkN28J0oia02H
 HD7JzdiUZh7ONG1eq9Z/7ckSXBflZaUIjzTi6C1axX8reEzGVVuy5LNhc+0iWjsh
 +mr9hRymgsRcGHPTN+CKi8Qhb29PPvVRt4YbghL0moQUDYewYENb/JBYJIjhgChG
 vWpHX6Kra99uveTMkAN5GVcgZP5b/RiM5E+cCpLEZDTSUnCIuTPM38ATGDTpadpW
 DSDuu+vEEmFu7RHO/lheN92n2fnTgjGpl5d6L5qwGCSzm0GeYZNo84RDEFCWwXZh
 5sY8oz+1wA2MIXV3f1bXYTDMWWQSitSVQ3A9OeKLlprGcZhG/66T2QB7aTut/D/R
 devyNt+xjMoqKcA7AaeVZ6XqUSHMTSCak88okXbKapJq6qkA6QkVsga+LArlRa0c
 xdA6lma2ICPG7Q2ta2G4nHekHd9mDSaR7aFkcKoApOkIDKUY9j47pI3KWSgVFCu3
 D6by7F7CCWHfp0Vw22eGuCQokBsLvhMsa7qwFlxKoxC6iJADANzBVkRzaH70wu2w
 QP2Xu9+WndyRJrrmIQS5iTrClUfgverOgXTJ5OH2jFm+Oi4r6quTKF83rturnDBr
 J8OK4odeh6E9
 =+MQE
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a problem related to CPU offline/online and cpufreq governors that
  in some system configurations may lead to a system-wide deadlock
  during CPU online"

* tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Avoid leaving stale IRQ work items during CPU offline
2019-12-19 08:09:43 -08:00
Linus Torvalds
9e8a0d5ff8 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Tone down mutex debugging complaints, and annotate/fix spinlock
  debugging data accesses for KCSAN"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"
  locking/spinlock/debug: Fix various data races
2019-12-17 11:00:46 -08:00
Daniel Borkmann
e47304232b bpf: Fix cgroup local storage prog tracking
Recently noticed that we're tracking programs related to local storage maps
through their prog pointer. This is a wrong assumption since the prog pointer
can still change throughout the verification process, for example, whenever
bpf_patch_insn_single() is called.

Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
and the map would therefore remain in busy state forever. Fix this by using
the prog's aux pointer which is stable throughout verification and beyond.

Fixes: de9cbbaadb ("bpf: introduce cgroup storage maps")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net
2019-12-17 08:58:02 -08:00
Yangtao Li
a5e37de90e stop_machine: remove try_stop_cpus helper
try_stop_cpus is not used after this:

commit c190c3b16c ("rcu: Switch synchronize_sched_expedited() to
stop_one_cpu()")

So remove it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191214195107.26480-1-tiny.windzz@gmail.com
2019-12-17 13:32:51 +01:00
Peng Wang
d040e0734f schied/fair: Skip calculating @contrib without load
Because of the:

	if (!load)
		runnable = running = 0;

clause in ___update_load_sum(), all the actual users of @contrib in
accumulate_sum():

	if (load)
		sa->load_sum += load * contrib;
	if (runnable)
		sa->runnable_load_sum += runnable * contrib;
	if (running)
		sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;

don't happen, and therefore we don't care what @contrib actually is and
calculating it is pointless.

If we count the times when @load equals zero and not as below:

	if (load) {
		load_is_not_zero_count++;
		contrib = __accumulate_pelt_segments(periods,
				1024 - sa->period_contrib,delta);
	} else
		load_is_zero_count++;

As we can see, load_is_zero_count is much bigger than
load_is_zero_count, and the gap is gradually widening:

	load_is_zero_count:            6016044 times
	load_is_not_zero_count:         244316 times
	19:50:43 up 1 min,  1 user,  load average: 0.09, 0.06, 0.02

	load_is_zero_count:            7956168 times
	load_is_not_zero_count:         261472 times
	19:51:42 up 2 min,  1 user,  load average: 0.03, 0.05, 0.01

	load_is_zero_count:           10199896 times
	load_is_not_zero_count:         278364 times
	19:52:51 up 3 min,  1 user,  load average: 0.06, 0.05, 0.01

	load_is_zero_count:           14333700 times
	load_is_not_zero_count:         318424 times
	19:54:53 up 5 min,  1 user,  load average: 0.01, 0.03, 0.00

Perhaps we can gain some performance advantage by saving these
unnecessary calculation.

Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot < vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1576208740-35609-1-git-send-email-rocking@linux.alibaba.com
2019-12-17 13:32:51 +01:00
Cheng Jian
60588bfa22 sched/fair: Optimize select_idle_cpu
select_idle_cpu() will scan the LLC domain for idle CPUs,
it's always expensive. so the next commit :

	1ad3aaf3fc ("sched/core: Implement new approach to scale select_idle_cpu()")

introduces a way to limit how many CPUs we scan.

But it consume some CPUs out of 'nr' that are not allowed
for the task and thus waste our attempts. The function
always return nr_cpumask_bits, and we can't find a CPU
which our task is allowed to run.

Cpumask may be too big, similar to select_idle_core(), use
per_cpu_ptr 'select_idle_mask' to prevent stack overflow.

Fixes: 1ad3aaf3fc ("sched/core: Implement new approach to scale select_idle_cpu()")
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com
2019-12-17 13:32:51 +01:00
Peter Zijlstra
45178ac0ce cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.

Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).

It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:

	BP				AP

	bringup_cpu()
	  __cpu_up(cpu, idle)	 -->	start_secondary()
					...
					cpu_startup_entry()
	  bringup_wait_for_ap()
	    wait_for_ap_thread() <--	  cpuhp_online_idle()
					  while (1)
					    do_idle()

					... available to run kthreads ...

	    stop_machine_unpark()
	      stopper->enable = true;

Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.

Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
2019-12-17 13:32:50 +01:00
Oleg Nesterov
cde6519450 sched/wait: fix ___wait_var_event(exclusive)
init_wait_var_entry() forgets to initialize wq_entry->flags.

Currently not a problem, we don't have wait_var_event_exclusive().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20191210191902.GB14449@redhat.com
2019-12-17 13:32:50 +01:00
Frederic Weisbecker
5443a0be61 sched: Use fair:prio_changed() instead of ad-hoc implementation
set_user_nice() implements its own version of fair::prio_changed() and
therefore misses a specific optimization towards nohz_full CPUs that
avoid sending an resched IPI to a reniced task running alone. Use the
proper callback instead.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20191203160106.18806-3-frederic@kernel.org
2019-12-17 13:32:50 +01:00
Frederic Weisbecker
7c2e8bbd87 sched: Spare resched IPI when prio changes on a single fair task
The runqueue of a fair task being remotely reniced is going to get a
resched IPI in order to reassess which task should be the current
running on the CPU. However that evaluation is useless if the fair task
is running alone, in which case we can spare that IPI, preventing
nohz_full CPUs from being disturbed.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20191203160106.18806-2-frederic@kernel.org
2019-12-17 13:32:50 +01:00
Vincent Guittot
6cf82d559e sched/cfs: fix spurious active migration
The load balance can fail to find a suitable task during the periodic check
because  the imbalance is smaller than half of the load of the waiting
tasks. This results in the increase of the number of failed load balance,
which can end up to start an active migration. This active migration is
useless because the current running task is not a better choice than the
waiting ones. In fact, the current task was probably not running but
waiting for the CPU during one of the previous attempts and it had already
not been selected.

When load balance fails too many times to migrate a task, we should relax
the contraint on the maximum load of the tasks that can be migrated
similarly to what is done with cache hotness.

Before the rework, load balance used to set the imbalance to the average
load_per_task in order to mitigate such situation. This increased the
likelihood of migrating a task but also of selecting a larger task than
needed while more appropriate ones were in the list.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1575036287-6052-1-git-send-email-vincent.guittot@linaro.org
2019-12-17 13:32:48 +01:00
Vincent Guittot
7ed735c331 sched/fair: Fix find_idlest_group() to handle CPU affinity
Because of CPU affinity, the local group can be skipped which breaks the
assumption that statistics are always collected for local group. With
uninitialized local_sgs, the comparison is meaningless and the behavior
unpredictable. This can even end up to use local pointer which is to
NULL in this case.

If the local group has been skipped because of CPU affinity, we return
the idlest group.

Fixes: 57abff067a ("sched/fair: Rework find_idlest_group()")
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: mingo@redhat.com
Cc: mgorman@suse.de
Cc: juri.lelli@redhat.com
Cc: dietmar.eggemann@arm.com
Cc: bsegall@google.com
Cc: qais.yousef@arm.com
Link: https://lkml.kernel.org/r/1575483700-22153-1-git-send-email-vincent.guittot@linaro.org
2019-12-17 13:32:48 +01:00
Johannes Weiner
c3466952ca psi: Fix a division error in psi poll()
The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.

Use div64_u64.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-3-hannes@cmpxchg.org
2019-12-17 13:32:48 +01:00
Johannes Weiner
3dfbe25c27 sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
Jingfeng reports rare div0 crashes in psi on systems with some uptime:

[58914.066423] divide error: 0000 [#1] SMP
[58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O)
[58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 #1
[58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019
[58914.102728] Workqueue: events psi_update_work
[58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000
[58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330
[58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246
[58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640
[58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e
[58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000
[58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000
[58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78
[58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000
[58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0
[58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58914.200360] PKRU: 55555554
[58914.203221] Stack:
[58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8
[58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000
[58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[58914.228734] Call Trace:
[58914.231337] [] psi_update_work+0x22/0x60
[58914.237067] [] process_one_work+0x189/0x420
[58914.243063] [] worker_thread+0x4e/0x4b0
[58914.248701] [] ? process_one_work+0x420/0x420
[58914.254869] [] kthread+0xe6/0x100
[58914.259994] [] ? kthread_park+0x60/0x60
[58914.265640] [] ret_from_fork+0x39/0x50
[58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1
[58914.279691] RIP [] psi_update_stats+0x1c1/0x330

The crashing instruction is trying to divide the observed stall time
by the sampling period. The period, stored in R8, is not 0, but we are
dividing by the lower 32 bits only, which are all 0 in this instance.

We could switch to a 64-bit division, but the period shouldn't be that
big in the first place. It's the time between the last update and the
next scheduled one, and so should always be around 2s and comfortably
fit into 32 bits.

The bug is in the initialization of new cgroups: we schedule the first
sampling event in a cgroup as an offset of sched_clock(), but fail to
initialize the last_update timestamp, and it defaults to 0. That
results in a bogusly large sampling period the first time we run the
sampling code, and consequently we underreport pressure for the first
2s of a cgroup's life. But worse, if sched_clock() is sufficiently
advanced on the system, and the user gets unlucky, the period's lower
32 bits can all be 0 and the sampling division will crash.

Fix this by initializing the last update timestamp to the creation
time of the cgroup, thus correctly marking the start of the first
pressure sampling period in a new cgroup.

Reported-by: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-2-hannes@cmpxchg.org
2019-12-17 13:32:47 +01:00
Sebastian Andrzej Siewior
9f0bff1180 perf/core: Add SRCU annotation for pmus list walk
Since commit
   28875945ba ("rcu: Add support for consolidated-RCU reader checking")

there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.

Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20191119121429.zhcubzdhm672zasg@linutronix.de
2019-12-17 13:32:46 +01:00
Daniel Borkmann
a2ea07465c bpf: Fix missing prog untrack in release_maps
Commit da765a2f59 ("bpf: Add poke dependency tracking for prog array
maps") wrongly assumed that in case of prog load errors, we're cleaning
up all program tracking via bpf_free_used_maps().

However, it can happen that we're still at the point where we didn't copy
map pointers into the prog's aux section such that env->prog->aux->used_maps
is still zero, running into a UAF. In such case, the verifier has similar
release_maps() helper that drops references to used maps from its env.

Consolidate the release code into __bpf_free_used_maps() and call it from
all sides to fix it.

Fixes: da765a2f59 ("bpf: Add poke dependency tracking for prog array maps")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1c2909484ca524ae9f55109b06f22b6213e76376.1576514756.git.daniel@iogearbox.net
2019-12-16 10:59:29 -08:00
Linus Torvalds
22ff311af9 treewide conversion from FIELD_SIZEOF() to sizeof_field()
-----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl3umDgWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlvsD/49R12HK7UzTxNTrcpvbadJ4t7j
 j/qJvjMerW7iVNAPOoNAOePUa21+y3rI1AZPvoPyzIqp1Bf2eOICf5SdisG2cG+O
 X0A8EKWvS0SSQWSKaT6udUKJ3nBJItwvOvQ5B58KQzcOj3S4X7B9iVBWgieMHrzz
 urkZm7pqowrZB3wuF8keRtli5IZaoiCwzApy48Qrn70G3OeXymknFbpHTDwIAiGw
 RiE5Xh0R4EzQdsYyCgjR8U56gBchadAmj8BUJU0ppMnOFMyIAG670hNLrs0L3roP
 8TOIeyb993ZC5GZaMlnR8mz0jfibfkPa3Z85VAsVyQSPaOQldwc9j8TGBqD5Gfat
 1PjOU5RVwma0pH5xTPOeevWPQpIK9KovQpQYqMMN9GMxOEx96IOUjwTrnNK2xWoN
 UGyOVlESFGoniClhCiKYzPSrYOjlIBk5ovf15PdTe+bwyUDMfyfy5CZV88OS2DHz
 ZBZvpLrH/EMW9zJ+FqMTp0C4s4wa2Ioid3bSh6XuNUTtltKSjp71eUja8ZEz+2sd
 5AGstCC+hYqxaEk+6/851pfkQ9sbBjwuGtNrtX+pqreiLUvWLhQ0yUj6cLXlEQNH
 aucjCukCjI+4lMzofeaQ2LbNhtff4YsfO4b1Ye8maoDdHjzUVL57n3bTOxKhdzbt
 y6FM3lApOjk3OyaTJQ==
 =YU4A
 -----END PGP SIGNATURE-----

Merge tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull FIELD_SIZEOF conversion from Kees Cook:
 "A mostly mechanical treewide conversion from FIELD_SIZEOF() to
  sizeof_field(). This avoids the redundancy of having 2 macros
  (actually 3) doing the same thing, and consolidates on sizeof_field().
  While "field" is not an accurate name, it is the common name used in
  the kernel, and doesn't result in any unintended innuendo.

  As there are still users of FIELD_SIZEOF() in -next, I will clean up
  those during this coming development cycle and send the final old
  macro removal patch at that time"

* tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use sizeof_field() macro
  MIPS: OCTEON: Replace SIZEOF_FIELD() macro
2019-12-13 14:02:12 -08:00
Rafael J. Wysocki
85572c2c4a cpufreq: Avoid leaving stale IRQ work items during CPU offline
The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.

Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.

If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point.  Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online and that may not happen forever.  In particular, a system-wide
deadlock may occur during CPU online as a result of that.

The failing scenario is as follows.  CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu).  It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline.  The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above.  Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3.  When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.

To address this problem notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to), should also
be prevented from running that code on behalf of the other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so
in that case the cpufreq_update_util_data pointer of the CPU running
the code must not be NULL as well as for the CPU which is the target
of the cpufreq utilization update in progress.

Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.

Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.

Fixes: 99d14d0e16 ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-12 17:59:43 +01:00
Alexei Starovoitov
b91e014f07 bpf: Make BPF trampoline use register_ftrace_direct() API
Make BPF trampoline attach its generated assembly code to kernel functions via
register_ftrace_direct() API. It helps ftrace-based tracers co-exist with BPF
trampoline on the same kernel function. It also switches attaching logic from
arch specific text_poke to generic ftrace that is available on many
architectures. text_poke is still necessary for bpf-to-bpf attach and for
bpf_tail_call optimization.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191209000114.1876138-3-ast@kernel.org
2019-12-11 15:18:08 -08:00
Linus Torvalds
6674fdb25a This contains 3 changes:
- Removal of code I accidentally applied when doing a minor fix up
    to a patch, and then using "git commit -a --amend", which pulled
    in some other changes I was playing with.
 
  - Remove an used variable in trace_events_inject code
 
  - Fix to function graph tracer when it traces a ftrace direct function.
    It will now ignore tracing a function that has a ftrace direct
    tramploine attached. This is needed for eBPF to use the ftrace direct
    code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXfD/thQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoo2AP4j7ONw7BTmMyo+GdYqPPntBeDnClHK
 vfMKrgK1j5BxYgEA7LgkwuUT9bcyLjfJVcyfeW67rB2PtmovKTWnKihFOwI=
 =DZ6N
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Remove code I accidentally applied when doing a minor fix up to a
   patch, and then using "git commit -a --amend", which pulled in some
   other changes I was playing with.

 - Remove an used variable in trace_events_inject code

 - Fix function graph tracer when it traces a ftrace direct function.
   It will now ignore tracing a function that has a ftrace direct
   tramploine attached. This is needed for eBPF to use the ftrace direct
   code.

* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix function_graph tracer interaction with BPF trampoline
  tracing: remove set but not used variable 'buffer'
  module: Remove accidental change of module_enable_x()
2019-12-11 12:22:38 -08:00
Arnd Bergmann
4c80c7bc58 bpf: Fix build in minimal configurations, again
Building with -Werror showed another failure:

kernel/bpf/btf.c: In function 'btf_get_prog_ctx_type.isra.31':
kernel/bpf/btf.c:3508:63: error: array subscript 0 is above array bounds of 'u8[0]' {aka 'unsigned char[0]'} [-Werror=array-bounds]
  ctx_type = btf_type_member(conv_struct) + bpf_ctx_convert_map[prog_type] * 2;

I don't actually understand why the array is empty, but a similar
fix has addressed a related problem, so I suppose we can do the
same thing here.

Fixes: ce27709b81 ("bpf: Fix build in minimal configurations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191210203553.2941035-1-arnd@arndb.de
2019-12-11 13:57:26 +01:00
Davidlohr Bueso
c571b72e2b Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"
This ended up causing some noise in places such as rxrpc running in softirq.

The warning is misleading in this case as the mutex trylock and unlock
operations are done within the same context; and therefore we need not
worry about the PI-boosting issues that comes along with no single-owner
lock guarantees.

While we don't want to support this in mutexes, there is no way out of
this yet; so lets get rid of the WARNs for now, as it is only fair to
code that has historically relied on non-preemptible softirq guarantees.
In addition, changing the lock type is also unviable: exclusive rwsems
have the same issue (just not the WARN_ON) and counting semaphores
would introduce a performance hit as mutexes are a lot more optimized.

This reverts:

    a0855d24fc: ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")

Fixes: a0855d24fc: ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-afs@lists.infradead.org
Cc: linux-fsdevel@vger.kernel.org
Cc: will@kernel.org
Link: https://lkml.kernel.org/r/20191210220523.28540-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-11 00:27:43 +01:00
Alexei Starovoitov
ff205766db ftrace: Fix function_graph tracer interaction with BPF trampoline
Depending on type of BPF programs served by BPF trampoline it can call original
function. In such case the trampoline will skip one stack frame while
returning. That will confuse function_graph tracer and will cause crashes with
bad RIP. Teach graph tracer to skip functions that have BPF trampoline attached.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10 13:53:59 -05:00
YueHaibing
a61f810567 tracing: remove set but not used variable 'buffer'
kernel/trace/trace_events_inject.c: In function trace_inject_entry:
kernel/trace/trace_events_inject.c:20:22: warning: variable buffer set but not used [-Wunused-but-set-variable]

It is never used, so remove it.

Link: http://lkml.kernel.org/r/20191207034409.25668-1-yuehaibing@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10 13:53:51 -05:00
Steven Rostedt (VMware)
af74262337 module: Remove accidental change of module_enable_x()
When pulling in Divya Indi's patch, I made a minor fix to remove unneeded
braces. I commited my fix up via "git commit -a --amend". Unfortunately, I
didn't realize I had some changes I was testing in the module code, and
those changes were applied to Divya's patch as well.

This reverts the accidental updates to the module code.

Cc: Jessica Yu <jeyu@kernel.org>
Cc: Divya Indi <divya.indi@oracle.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e585e6469d ("tracing: Verify if trace array exists before destroying it.")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10 13:53:43 -05:00
Linus Torvalds
184b8f7f91 pr_warning() removal for 5.5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl3uHtsACgkQUqAMR0iA
 lPKN3g/9HG6k7NIVtArCy/0kdxLOCr1JZp0EhOCexvCFLSOXInJ82izQVblOA+XE
 +1AceBqF5Akg23u/oLH9tSXBjMULemTyhm7Pnnopn1+bX/bfF+nN6027ltH8ncPY
 25oOP22ulUko3smI2yj2/gmRG5J6z/bUxe/4BMBATsj5YKJs3XGOHtfj2rgVd2qc
 HzPPirH82uOj8sBYxTMeq/+QpCzfJ5HdXXkcJGDuzEW078vb8eRgpcXIqscTBB5Z
 pTsG+Io5/RzfKIDWLs7Eqgg9qSzcGrpXZxPsWpCXXL9nArdeA78ZBLIreDSZIpEt
 QBQMT8Yxy34oW1npw1tFgulxZFjF6np2eg+3a6VX3vqo7DIIAU0gqlTRCde30tNW
 RedFSZIKrlfKfVUJXcBV9sNU57vUx7WwURPnlQxyMmbp0ryf6vhxT3ybPhf8yFXk
 WIKf7PngxLxMRtkRL5rZRDAA6z3/SPg6WkWFzDa/jZKCRRob/uM/35GxWzBk4xR0
 MhCesVDCpM1oB+qgJkJRhyHhddzu3nafxtpjBnrKOUHke+qF5u36BXFDSZZHUSkz
 VXshmaaPcWfHOl2DBGL8SJmYartr/ASvd5TPUbfnqYM1h7+wkB1gZkX8MgSHzc9b
 EYPfmipWh9LsZ1OjSJoFkMIB5bOIaYefOdSUyPDlJwhRuliboWE=
 =d5HL
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.5-pr-warning-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull pr_warning() removal from Petr Mladek.

 - Final removal of the unused pr_warning() alias.

You're supposed to use just "pr_warn()" in the kernel.

* tag 'printk-for-5.5-pr-warning-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  checkpatch: Drop pr_warning check
  printk: Drop pr_warning definition
  Fix up for "printk: Drop pr_warning definition"
  workqueue: Use pr_warn instead of pr_warning
2019-12-09 11:48:21 -08:00
Pankaj Bharadiya
c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Linus Torvalds
95e6ba5133 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) More jumbo frame fixes in r8169, from Heiner Kallweit.

 2) Fix bpf build in minimal configuration, from Alexei Starovoitov.

 3) Use after free in slcan driver, from Jouni Hogander.

 4) Flower classifier port ranges don't work properly in the HW offload
    case, from Yoshiki Komachi.

 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.

 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.

 7) Fix flow dissection in dsa TX path, from Alexander Lobakin.

 8) Stale syncookie timestampe fixes from Guillaume Nault.

[ Did an evil merge to silence a warning introduced by this pull - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  r8169: fix rtl_hw_jumbo_disable for RTL8168evl
  net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
  r8169: add missing RX enabling for WoL on RTL8125
  vhost/vsock: accept only packets with the right dst_cid
  net: phy: dp83867: fix hfs boot in rgmii mode
  net: ethernet: ti: cpsw: fix extra rx interrupt
  inet: protect against too small mtu values.
  gre: refetch erspan header from skb->data after pskb_may_pull()
  pppoe: remove redundant BUG_ON() check in pppoe_pernet
  tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
  tcp: tighten acceptance of ACKs not matching a child socket
  tcp: fix rejected syncookies due to stale timestamps
  lpc_eth: kernel BUG on remove
  tcp: md5: fix potential overestimation of TCP option space
  net: sched: allow indirect blocks to bind to clsact in TC
  net: core: rename indirect block ingress cb function
  net-sysfs: Call dev_hold always in netdev_queue_add_kobject
  net: dsa: fix flow dissection on Tx path
  net/tls: Fix return values to avoid ENOTSUPP
  net: avoid an indirect call in ____sys_recvmsg()
  ...
2019-12-08 13:28:11 -08:00
Sebastian Andrzej Siewior
025f50f386 sched/rt, workqueue: Use PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Update the comment to use PREEMPTION because it is true for both
preemption models.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20191015191821.11479-35-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:37 +01:00
Sebastian Andrzej Siewior
1b40cd56f3 sched/rt, locking: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the Kconfig dependency to use CONFIG_PREEMPTION.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20191015191821.11479-32-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:36 +01:00
Ingo Molnar
4f797f56c3 Merge branch 'linus' into sched/urgent, to pick up the latest before merging new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:10 +01:00
Stephen Rothwell
ee19545220 Fix up for "printk: Drop pr_warning definition"
Link: http://lkml.kernel.org/r/20191206092503.303d6a57@canb.auug.org.au
Cc: Linux Next Mailing List <linux-next@vger.kernel.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-12-06 10:02:35 +01:00
Kefeng Wang
1d9a6159bd workqueue: Use pr_warn instead of pr_warning
Use pr_warn() instead of the remaining pr_warning() calls.

Link: http://lkml.kernel.org/r/20191128004752.35268-2-wangkefeng.wang@huawei.com
To: joe@perches.com
To: linux-kernel@vger.kernel.org
Cc: gregkh@linuxfoundation.org
Cc: tj@kernel.org
Cc: arnd@arndb.de
Cc: sergey.senozhatsky@gmail.com
Cc: rostedt@goodmis.org
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-12-06 09:59:30 +01:00
Linus Torvalds
0f13741624 Modules updates for v5.5
Summary of modules changes for the 5.5 merge window:
 
 - Refactor include/linux/export.h and remove code duplication between
   EXPORT_SYMBOL and EXPORT_SYMBOL_NS to make it more readable. The most
   notable change is that no namespace is represented by an empty string ""
   rather than NULL.
 
 - Fix a module load/unload race where waiter(s) trying to load the same
   module weren't being woken up when a module finally goes away.
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJd6V3yAAoJEMBFfjjOO8FyeGEP/0Ue4uNehrDYQ6wHaLJOeSA3
 GEcraILbsT4v/9HqVbIaMH2idwwCI5xY6XlyDADaimYkEvs3jLOSsVEpjTvnjt0s
 DnNFR2vm+JsWVmS4jsmij2T6RgVfZq062RWJA1HvxtCsZWHFOttQe3gh9s/ycFAv
 UwGk0FUr4E78pUYNj+zQ35j4/L/C3Va2vC3VwSV4ND0kVTBrqcVHV6g3K409vgb8
 /ZD8/cFwVvOvGSK47M4r+Xt2X/57A/Cb0RgjvKHRfvONfyranKv9WlqM6Y6DXlZ0
 Su7eIo5kAH40/LUR2ludTSHLNcr/PWM4W2q8q81+gqF4h3KitYXXARWjKSLDwo/8
 nEq/rxJzEDX0bIgnSyU3t+ZqK2JonAF0a1D53otPPaSvTMPe1Gz48//cD6TGc3np
 xxLDZEPne/vbNUy3z2K1tXoWbxdThAhtCb8qOilVZBitPtnQpmUt2eyn1/2snoBR
 uerB/S8B48YI1TGxuK6Ksy5QIuJk9DG2o33nD5PPHe5dKEZQPAmSJDEwVaLzpW9b
 t9JoHo+H6BefKj0Sexf+1jlK9WKJEwGpqhZqfRkosACelxJJ3Ap3nuMsdNuZY+6U
 rAG8N322HV5x50weIId+t8AP8cdS+vRfh8PgvpvHY8YYXXeagOK49+snkJLAMgw0
 9Px3j20sNSmFfYloNUzW
 =XDci
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "Summary of modules changes for the 5.5 merge window:

   - Refactor include/linux/export.h and remove code duplication between
     EXPORT_SYMBOL and EXPORT_SYMBOL_NS to make it more readable.

     The most notable change is that no namespace is represented by an
     empty string "" rather than NULL.

   - Fix a module load/unload race where waiter(s) trying to load the
     same module weren't being woken up when a module finally goes away"

* tag 'modules-for-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  kernel/module.c: wakeup processes in module_wq on module unload
  moduleparam: fix parameter description mismatch
  export: avoid code duplication in include/linux/export.h
2019-12-05 12:27:16 -08:00
Linus Torvalds
fb3da48a86 Merge branch 'thermal/next' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal management updates from Zhang Rui:

 - Fix a deadlock regression in thermal core framework, which was
   introduced in 5.3 (Wei Wang)

 - Initialize thermal control framework earlier to enable thermal
   mitigation during boot (Amit Kucheria)

 - Convert the Intelligent Power Allocator (IPA) thermal governor to
   follow the generic PM_EM instead of its own Energy Model (Quentin
   Perret)

 - Introduce a new Amlogic soc thermal driver (Guillaume La Roque)

 - Add interrupt support for tsens thermal driver (Amit Kucheria)

 - Add support for MSM8956/8976 in tsens thermal driver
   (AngeloGioacchino Del Regno)

 - Add support for r8a774b1 in rcar thermal driver (Biju Das)

 - Add support for Thermal Monitor Unit v2 in qoriq thermal driver
   (Yuantian Tang)

 - Some other fixes/cleanups on thermal core framework and soc thermal
   drivers (Colin Ian King, Daniel Lezcano, Hsin-Yi Wang, Tian Tao)

* 'thermal/next' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (32 commits)
  thermal: Fix deadlock in thermal thermal_zone_device_check
  thermal: cpu_cooling: Migrate to using the EM framework
  thermal: cpu_cooling: Make the power-related code depend on IPA
  PM / EM: Declare EM data types unconditionally
  arm64: defconfig: Enable CONFIG_ENERGY_MODEL
  drivers: thermal: tsens: fix potential integer overflow on multiply
  thermal: cpu_cooling: Reorder the header file
  thermal: cpu_cooling: Remove pointless dependency on CONFIG_OF
  thermal: no need to set .owner when using module_platform_driver
  thermal: qcom: tsens-v1: Fix kfree of a non-pointer value
  cpufreq: qcom-hw: Move driver initialization earlier
  clk: qcom: Initialize clock drivers earlier
  cpufreq: Initialize cpufreq-dt driver earlier
  cpufreq: Initialize the governors in core_initcall
  thermal: Initialize thermal subsystem earlier
  thermal: Remove netlink support
  dt: thermal: tsens: Document compatible for MSM8976/56
  thermal: qcom: tsens-v1: Add support for MSM8956 and MSM8976
  MAINTAINERS: add entry for Amlogic Thermal driver
  thermal: amlogic: Add thermal driver to support G12 SoCs
  ...
2019-12-05 11:21:24 -08:00
Linus Torvalds
5ecc9d15f7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Most of the rest of MM and various other things. Some Kconfig rework
  still awaits merges of dependent trees from linux-next.

  Subsystems affected by this patch series: mm/hotfixes, mm/memcg,
  mm/vmstat, mm/thp, procfs, sysctl, misc, notifiers, core-kernel,
  bitops, lib, checkpatch, epoll, binfmt, init, rapidio, uaccess, kcov,
  ubsan, ipc, bitmap, mm/pagemap"

* akpm: (86 commits)
  mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h
  um: add support for folded p4d page tables
  um: remove unused pxx_offset_proc() and addr_pte() functions
  sparc32: use pgtable-nopud instead of 4level-fixup
  parisc/hugetlb: use pgtable-nopXd instead of 4level-fixup
  parisc: use pgtable-nopXd instead of 4level-fixup
  nds32: use pgtable-nopmd instead of 4level-fixup
  microblaze: use pgtable-nopmd instead of 4level-fixup
  m68k: mm: use pgtable-nopXd instead of 4level-fixup
  m68k: nommu: use pgtable-nopud instead of 4level-fixup
  c6x: use pgtable-nopud instead of 4level-fixup
  arm: nommu: use pgtable-nopud instead of 4level-fixup
  alpha: use pgtable-nopud instead of 4level-fixup
  gpio: pca953x: tighten up indentation
  gpio: pca953x: convert to use bitmap API
  gpio: pca953x: use input from regs structure in pca953x_irq_pending()
  gpio: pca953x: remove redundant variable and check in IRQ handler
  lib/bitmap: introduce bitmap_replace() helper
  lib/test_bitmap: fix comment about this file
  lib/test_bitmap: move exp1 and exp2 upper for others to use
  ...
2019-12-05 09:46:26 -08:00
Yonghong Song
e9eeec58c9 bpf: Fix a bug when getting subprog 0 jited image in check_attach_btf_id
For jited bpf program, if the subprogram count is 1, i.e.,
there is no callees in the program, prog->aux->func will be NULL
and prog->bpf_func points to image address of the program.

If there is more than one subprogram, prog->aux->func is populated,
and subprogram 0 can be accessed through either prog->bpf_func or
prog->aux->func[0]. Other subprograms should be accessed through
prog->aux->func[subprog_id].

This patch fixed a bug in check_attach_btf_id(), where
prog->aux->func[subprog_id] is used to access any subprogram which
caused a segfault like below:
  [79162.619208] BUG: kernel NULL pointer dereference, address:
  0000000000000000
  ......
  [79162.634255] Call Trace:
  [79162.634974]  ? _cond_resched+0x15/0x30
  [79162.635686]  ? kmem_cache_alloc_trace+0x162/0x220
  [79162.636398]  ? selinux_bpf_prog_alloc+0x1f/0x60
  [79162.637111]  bpf_prog_load+0x3de/0x690
  [79162.637809]  __do_sys_bpf+0x105/0x1740
  [79162.638488]  do_syscall_64+0x5b/0x180
  [79162.639147]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ......

Fixes: 5b92a28aae ("bpf: Support attaching tracing BPF program to other BPF programs")
Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191205010606.177774-1-yhs@fb.com
2019-12-04 21:20:07 -08:00
Andrey Konovalov
eec028c938 kcov: remote coverage support
Patch series " kcov: collect coverage from usb and vhost", v3.

This patchset extends kcov to allow collecting coverage from backgound
kernel threads.  This extension requires custom annotations for each of
the places where coverage collection is desired.  This patchset
implements this for hub events in the USB subsystem and for vhost
workers.  See the first patch description for details about the kcov
extension.  The other two patches apply this kcov extension to USB and
vhost.

Examples of other subsystems that might potentially benefit from this
when custom annotations are added (the list is based on
process_one_work() callers for bugs recently reported by syzbot):

1. fs: writeback wb_workfn() worker,
2. net: addrconf_dad_work()/addrconf_verify_work() workers,
3. net: neigh_periodic_work() worker,
4. net/p9: p9_write_work()/p9_read_work() workers,
5. block: blk_mq_run_work_fn() worker.

These patches have been used to enable coverage-guided USB fuzzing with
syzkaller for the last few years, see the details here:

  https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md

This patchset has been pushed to the public Linux kernel Gerrit
instance:

  https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524

This patch (of 3):

Add background thread coverage collection ability to kcov.

With KCOV_ENABLE coverage is collected only for syscalls that are issued
from the current process.  With KCOV_REMOTE_ENABLE it's possible to
collect coverage for arbitrary parts of the kernel code, provided that
those parts are annotated with kcov_remote_start()/kcov_remote_stop().

This allows to collect coverage from two types of kernel background
threads: the global ones, that are spawned during kernel boot in a
limited number of instances (e.g.  one USB hub_event() worker thread is
spawned per USB HCD); and the local ones, that are spawned when a user
interacts with some kernel interface (e.g.  vhost workers).

To enable collecting coverage from a global background thread, a unique
global handle must be assigned and passed to the corresponding
kcov_remote_start() call.  Then a userspace process can pass a list of
such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
of the kcov_remote_arg struct.  This will attach the used kcov device to
the code sections, that are referenced by those handles.

Since there might be many local background threads spawned from
different userspace processes, we can't use a single global handle per
annotation.  Instead, the userspace process passes a non-zero handle
through the common_handle field of the kcov_remote_arg struct.  This
common handle gets saved to the kcov_handle field in the current
task_struct and needs to be passed to the newly spawned threads via
custom annotations.  Those threads should in turn be annotated with
kcov_remote_start()/kcov_remote_stop().

Internally kcov stores handles as u64 integers.  The top byte of a
handle is used to denote the id of a subsystem that this handle belongs
to, and the lower 4 bytes are used to denote the id of a thread instance
within that subsystem.  A reserved value 0 is used as a subsystem id for
common handles as they don't belong to a particular subsystem.  The
bytes 4-7 are currently reserved and must be zero.  In the future the
number of bytes used for the subsystem or handle ids might be increased.

When a particular userspace process collects coverage by via a common
handle, kcov will collect coverage for each code section that is
annotated to use the common handle obtained as kcov_handle from the
current task_struct.  However non common handles allow to collect
coverage selectively from different subsystems.

Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Huang Shijie
964975ac66 lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr
Follow the kernel conventions, rename addr_in_gen_pool to
gen_pool_has_addr.

[sjhuang@iluvatar.ai: fix Documentation/ too]
 Link: http://lkml.kernel.org/r/20181229015914.5573-1-sjhuang@iluvatar.ai
Link: http://lkml.kernel.org/r/20181228083950.20398-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:13 -08:00
Joe Perches
5e1aada08c kernel/sys.c: avoid copying possible padding bytes in copy_to_user
Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.

Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:12 -08:00