Commit graph

37296 commits

Author SHA1 Message Date
David S. Miller
bdfa75ad70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.

With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22 11:41:16 +01:00
Linus Torvalds
9d235ac01f Merge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fixes from Eric Biederman:
 "There has been one very hard to track down bug in the ucount code that
  we have been tracking since roughly v5.14 was released. Alex managed
  to find a reliable reproducer a few days ago and then I was able to
  instrument the code and figure out what the issue was.

  It turns out the sigqueue_alloc single atomic operation optimization
  did not play nicely with ucounts multiple level rlimits. It turned out
  that either sigqueue_alloc or sigqueue_free could be operating on
  multiple levels and trigger the conditions for the optimization on
  more than one level at the same time.

  To deal with that situation I have introduced inc_rlimit_get_ucounts
  and dec_rlimit_put_ucounts that just focuses on the optimization and
  the rlimit and ucount changes.

  While looking into the big bug I found I couple of other little issues
  so I am including those fixes here as well.

  When I have time I would very much like to dig into process ownership
  of the shared signal queue and see if we could pick a single owner for
  the entire queue so that all of the rlimits can count to that owner.
  That should entirely remove the need to call get_ucounts and
  put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult
  because Linux unlike POSIX supports setuid that works on a single
  thread"

* 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
  ucounts: Proper error handling in set_cred_ucounts
  ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
  ucounts: Fix signal ucount refcounting
2021-10-21 17:27:17 -10:00
Linus Torvalds
515dcc2e02 dma-mapping fixes for Linux 5.15
- fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz)
  - fix a kerneldoc warning (Logan Gunthorpe)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmFwTiELHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOhaA/+MfIeaeQB0PcBxBET9fj63sWvL/VKvMv0is0RoARE
 c/Y/g8VzYY/LvAUjru3zON+WJzVulzmxRiNA5ogYS1v2yG80/ztLHfUdZVoOlJ9j
 sndUqtGHbOC1oc97zvuRC+6jTVIpse90lo5yfM/5pgb5dG2Dfgs/tpbOVUX4eR2H
 UkCvBqhTeE2jFgOxCVZoeeLovMYAupDhIcfWuRZXnYbcC3XblY29/vyppvqsTE1J
 TpycunPOmoC1U1V+Mc0nGcqLC+U+2Bn46cYAvGsfqLw1LhaAMDSlW+SCyEAMQdUf
 sgcOhtKG/Hv2Sg0CyD0ZTei4uv0NFjFkSwGVkEPak7YwarZV3LSna1vGCi6igwzU
 M/N5C7vI3zlS4mATitDm27FWTZxiJsGJPysJOJdktzWPOczKFixR9dl0JgTSIuXQ
 B38BJpZ4RNQdM77CserFmuvMVXWGFCneL9Rk4cq/InCFyNE6fnbmrY0EKuN0Vohy
 /OS6tttQiMOcDdHAdcvl7NAMs+mQVHLunXtO1MlVOD8/ZVFicoTFZXL8ZeB56XXd
 qMZEAovB4jIrh04hbMfwyomlHc185DzIBODtZx1VkGkxB8N1FiP8DR1Hk72VAjh/
 /qu2cVlia9NUSRiHUqfiCDTTvog7702H8Gd4hEhc8fhvMQ3m65fF1yrzwU1TYvyN
 iXE=
 =eJpY
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz)

 - fix a kerneldoc warning (Logan Gunthorpe)

* tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC
  dma-debug: fix sg checks in debug_dma_map_sg()
  dma-mapping: fix the kerneldoc for dma_map_sgtable()
2021-10-20 10:16:51 -10:00
Linus Torvalds
6da52dead8 audit/stable-5.15 PR 20211019
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmFvbsAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOMsBAAkf8ZrL5UHq+0F60g+dEJHq7kwet7
 QKWjDS1UOQK9Cvtt7T6Ggwu+kUMYG/HNlWMBkiv8+6SSy9o4KaftjpEoZDkZIO0F
 lujGhgPsZdfadRUZgvhC0NrEmXwYQxdGqxWiE00tRYRBMviP2vb2Bf1Z/iXQZGeI
 JZOENUCT6fridU9gNkUKP+qcV7/eZLaOTUItPyd8spYGtl4k85kKsHDHlF1C4MHM
 ByRVAjuTbubGU8m5RF4tjju+f7CWBiiAXQFev/qouBzRXp+bk//WCxmc8e414Uwh
 /QScy4wbRplIpq+iWTIcii8jwo0uJke7rPMetDik3VtqLCBu9hhh4Np0umfwjnOt
 Fwis0H/2VoikJE8G7lC/0qd2ya3DgGtBbr+QMePh3QK8iUTkTlDTKiAf4b8JYm3x
 MNXV/XSYIdlSoYUsSZXx9IciSCKnEa5TayY/N60CLFeyKOgmyxdtRA/Mql30bLc5
 a141pVF+hNnovdpgcoIfCvA/oXhxsPYAL/Rh1OLPhwhTG+fKPrJfM8qsIZsvNUAV
 Kg0UJRWxr5mkmYFv7vlPJSK+ZrJ/LlbNGskr2RuAPQ9QOsAQjf/3z/WJe2nfOQgH
 oMLij2M0/sq1YuEP1yfQP4k2Du/Vqy4z+Ls1kKovlxldZEwk3TAEKND5voPUXVVv
 h6rtxUWwGl+j0m4=
 =tBGu
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "One small audit patch to add a pointer NULL check"

* tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix possible null-pointer dereference in audit_filter_rules
2021-10-20 06:11:17 -10:00
Linus Torvalds
fc9b289344 tracing recursion fix:
- While cleaning up some of the tracing recursion protection logic,
    I discovered a scenario that the current design would miss, and
    would allow an infinite recursion. Removing an optimization trick
    that opened the hole, fixes the issue and cleans up the code as well.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYW7CqRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmVWAQDnboKAUVmB/3D/L1T9XdWEq4AzCS6W
 51QpzWff0pBVkwEAuc1af2gqDZ6/N9sQjN9kGxikY6luVs3CSQ1yHkcanQw=
 =1zGP
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Recursion fix for tracing.

  While cleaning up some of the tracing recursion protection logic, I
  discovered a scenario that the current design would miss, and would
  allow an infinite recursion. Removing an optimization trick that
  opened the hole fixes the issue and cleans up the code as well"

* tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have all levels of checks prevent recursion
2021-10-20 06:02:58 -10:00
Eric W. Biederman
5ebcbe342b ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
Setting cred->ucounts in cred_alloc_blank does not make sense.  The
uid and user_ns are deliberately not set in cred_alloc_blank but
instead the setting is delayed until key_change_session_keyring.

So move dealing with ucounts into key_change_session_keyring as well.

Unfortunately that movement of get_ucounts adds a new failure mode to
key_change_session_keyring.  I do not see anything stopping the parent
process from calling setuid and changing the relevant part of it's
cred while keyctl_session_to_parent is running making it fundamentally
necessary to call get_ucounts in key_change_session_keyring.  Which
means that the new failure mode cannot be avoided.

A failure of key_change_session_keyring results in a single threaded
parent keeping it's existing credentials.  Which results in the parent
process not being able to access the session keyring and whichever
keys are in the new keyring.

Further get_ucounts is only expected to fail if the number of bits in
the refernece count for the structure is too few.

Since the code has no other way to report the failure of get_ucounts
and because such failures are not expected to be common add a WARN_ONCE
to report this problem to userspace.

Between the WARN_ONCE and the parent process not having access to
the keys in the new session keyring I expect any failure of get_ucounts
will be noticed and reported and we can find another way to handle this
condition.  (Possibly by just making ucounts->count an atomic_long_t).

Cc: stable@vger.kernel.org
Fixes: 905ae01c4a ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-20 10:34:20 -05:00
Eric W. Biederman
34dc2fd6e6 ucounts: Proper error handling in set_cred_ucounts
Instead of leaking the ucounts in new if alloc_ucounts fails, store
the result of alloc_ucounts into a temporary variable, which is later
assigned to new->ucounts.

Cc: stable@vger.kernel.org
Fixes: 905ae01c4a ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/87pms2s0v8.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-19 11:04:25 -05:00
Eric W. Biederman
629715adc6 ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
The purpose of inc_rlimit_ucounts and dec_rlimit_ucounts in commit_creds
is to change which rlimit counter is used to track a process when the
credentials changes.

Use the same test for both to guarantee the tracking is correct.

Cc: stable@vger.kernel.org
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Link: https://lkml.kernel.org/r/87v91us0w4.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-19 11:01:52 -05:00
Gaosheng Cui
6e3ee990c9 audit: fix possible null-pointer dereference in audit_filter_rules
Fix  possible null-pointer dereference in audit_filter_rules.

audit_filter_rules() error: we previously assumed 'ctx' could be null

Cc: stable@vger.kernel.org
Fixes: bf361231c2 ("audit: add saddr_fam filter field")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-18 18:27:47 -04:00
Steven Rostedt (VMware)
ed65df63a3 tracing: Have all levels of checks prevent recursion
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.

The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.

Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.

Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.

Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.

If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.

Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.

This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.

i.e.

 traced_function_1: [ more than one callback tracing it ]
   call loop_func

 loop_func:
   trace_recursion set internal bit
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 traced_function_2: [ only traced by above callback ]
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 [ wash, rinse, repeat, BOOM! out of shampoo! ]

Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.

Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.

 [*] There is on layer of recursion that is allowed, and that is to allow
     for the transition between interrupt context (normal -> softirq ->
     irq -> NMI), because a trace may occur before the context update is
     visible to the trace recursion logic.

Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-18 18:12:09 -04:00
Eric W. Biederman
15bc01effe ucounts: Fix signal ucount refcounting
In commit fda31c5029 ("signal: avoid double atomic counter
increments for user accounting") Linus made a clever optimization to
how rlimits and the struct user_struct.  Unfortunately that
optimization does not work in the obvious way when moved to nested
rlimits.  The problem is that the last decrement of the per user
namespace per user sigpending counter might also be the last decrement
of the sigpending counter in the parent user namespace as well.  Which
means that simply freeing the leaf ucount in __free_sigqueue is not
enough.

Maintain the optimization and handle the tricky cases by introducing
inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.

By moving the entire optimization into functions that perform all of
the work it becomes possible to ensure that every level is handled
properly.

The new function inc_rlimit_get_ucounts returns 0 on failure to
increment the ucount.  This is different than inc_rlimit_ucounts which
increments the ucounts and returns LONG_MAX if the ucount counter has
exceeded it's maximum or it wrapped (to indicate the counter needs to
decremented).

I wish we had a single user to account all pending signals to across
all of the threads of a process so this complexity was not necessary

Cc: stable@vger.kernel.org
Fixes: d646969055 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133
Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Tested-by: Rune Kleveland <rune.kleveland@infomedia.dk>
Tested-by: Yu Zhao <yuzhao@google.com>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-18 16:02:30 -05:00
Hamza Mahfooz
c2bbf9d1e9 dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC
Mapping something twice should be possible as long as,
DMA_ATTR_SKIP_CPU_SYNC is passed to the strictly speaking second relevant
mapping operation (that attempts to map the same thing). So, don't issue a
warning if the specified condition is met in add_dma_entry().

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-18 12:46:45 +02:00
Linus Torvalds
368a978cc5 Tracing fixes for 5.15:
- Fix defined but not use warning/error for osnoise function
 
  - Fix memory leak in event probe
 
  - Fix memblock leak in bootconfig
 
  - Fix the API of event probes to be like kprobes
 
  - Added test to check removal of event probe API
 
  - Fix recordmcount.pl for nds32 failed build
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYWpB6BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnu/AQD1eYekS43uCDyzzpvjsz0tZ6tzVH8z
 ainpgtcAd11q4AD8CHLvhBsEyo99Yna2Mvir6nCkafm2Y2IVGvVbnDofnAA=
 =yvDo
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Tracing fixes for 5.15:

 - Fix defined but not use warning/error for osnoise function

 - Fix memory leak in event probe

 - Fix memblock leak in bootconfig

 - Fix the API of event probes to be like kprobes

 - Added test to check removal of event probe API

 - Fix recordmcount.pl for nds32 failed build

* tag 'trace-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  nds32/ftrace: Fix Error: invalid operands (*UND* and *UND* sections) for `^'
  selftests/ftrace: Update test for more eprobe removal process
  tracing: Fix event probe removal from dynamic events
  tracing: Fix missing * in comment block
  bootconfig: init: Fix memblock leak in xbc_make_cmdline()
  tracing: Fix memory leak in eprobe_register()
  tracing: Fix missing osnoise tracer on max_latency
2021-10-16 10:51:41 -07:00
Jakub Kicinski
e15f5972b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/ioam6.sh
  7b1700e009 ("selftests: net: modify IOAM tests for undef bits")
  bf77b1400a ("selftests: net: Test for the IOAM encapsulation with IPv6")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14 16:50:14 -07:00
Steven Rostedt (VMware)
7d5fda1c84 tracing: Fix event probe removal from dynamic events
When an event probe is to be removed via the API that created it via the
dynamic events, an -ENOENT error is returned.

This is because the removal of the event probe does not expect to see the
event system and name that the event probe is attached to, even though
that's part of the API to create it. As the removal of probes is to use
the same API as they are created.

In fact, the removal is not consistent with the kprobes and uprobes
removal. Fix that by allowing various ways to remove the eprobe.

The eprobe is created with:

 e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS]

Have it get removed by echoing in the following into dynamic_events:

 # Remove all eprobes with NAME
 echo '-:NAME' >> dynamic_events

 # Remove a specific eprobe
 echo '-:GROUP/NAME' >> dynamic_events
 echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events
 echo '-:NAME SYSTEM/EVENT' >> dynamic_events
 echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events
 echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events

Link: https://lkml.kernel.org/r/20211012081925.0e19cc4f@gandalf.local.home
Link: https://lkml.kernel.org/r/20211013205533.630722129@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-13 19:26:57 -04:00
Linus Torvalds
348949d9a4 Modules fix for v5.15-rc6
- Build fix for cfi_init() when CONFIG_MODULE_UNLOAD=n
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAmFmkNkQHGpleXVAa2Vy
 bmVsLm9yZwAKCRDARX44zjvBchKnD/9OKi+4mSS/08mZ2OCGI8nBq/uIF6jh2cPx
 3AqLSdN1LudTiRPFOll6mBso7GV5JAP22Gw1yjUtiXzweu4zvkorLZTZXzKLMRFV
 niTeZ11xeU3lnEJWyvQWZhpM6Jd0s2YQfwTEsPidE8SEDWMSLzpW/OkPzGW3DcK/
 Hs/UBDl9Jp2bROn7w5bOZwGtq9yNg8N5jLBoMLCHcU3hz+Nrb9d3nXGN4Na2AC+C
 vAtZL3rO27CXJrU1jF1GSdi35sGyIBJm8V/m9aWe7SZomvrjnQIL/LMI4OumIYlB
 NglmmoH88bpUsy7WYuMCTbKoBhQB1m+DZ4IYIRMxGPS5O78O+Cc3e6T3LUiWMaYo
 fBad4KOBBaAaRGSLFBPVju2Kim2Q3p+In+8DEQid2dwdz26vmdiC+QBl7LHgG9JL
 pzeaPpPVYCFv9J4PhXVPFOC7gqA0Kil8J+BEGj2Jvm9lIpjHF6uhWu/+wUl4hmTJ
 UdL3KjTS6bT+RMXx1zFc/cQD6oqTWaD4ABDP4wqyBYzfGIDWYTrHPA8M+2CbisTN
 yY5ZVqawVUr4FK0I8oq7mkUWdxr5qkOY0/pZqiWGBoREG9q24Q2vVxDXBHaiW31g
 7Z6emavdOOLaHcwto4Zj0O4rJ683CUx9fGZqkc6S26icOqYbiVfgUjBWpYE+BtDa
 c3A0OeKI6g==
 =ijIH
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull modules fix from Jessica Yu:

 - Build fix for cfi_init() when CONFIG_MODULE_UNLOAD=n

* tag 'modules-for-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: fix clang CFI with MODULE_UNLOAD=n
2021-10-13 07:42:07 -07:00
Linus Torvalds
459ea72c6c Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "All documentation / comment updates"

* 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroupv2, docs: fix misinformation in "device controller" section
  cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
  docs/cgroup: remove some duplicate words
2021-10-11 17:16:41 -07:00
Linus Torvalds
0a5d6c641b Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "One patch to add a missing __printf annotation and the other to enable
  deferred printing for debug dumps to avoid deadlocks when triggered
  from some contexts (e.g. console drivers)"

* 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix state-dump console deadlock
  workqueue: annotate alloc_workqueue() as printf
2021-10-11 16:59:49 -07:00
Johan Hovold
57116ce17b workqueue: fix state-dump console deadlock
Console drivers often queue work while holding locks also taken in their
console write paths, something which can lead to deadlocks on SMP when
dumping workqueue state (e.g. sysrq-t or on suspend failures).

For serial console drivers this could look like:

	CPU0				CPU1
	----				----

	show_workqueue_state();
	  lock(&pool->lock);		<IRQ>
	  				  lock(&port->lock);
					  schedule_work();
					    lock(&pool->lock);
	  printk();
	    lock(console_owner);
	    lock(&port->lock);

where workqueues are, for example, used to push data to the line
discipline, process break signals and handle modem-status changes. Line
disciplines and serdev drivers can also queue work on write-wakeup
notifications, etc.

Reworking every console driver to avoid queuing work while holding locks
also taken in their write paths would complicate drivers and is neither
desirable or feasible.

Instead use the deferred-printk mechanism to avoid printing while
holding pool locks when dumping workqueue state.

Note that there are a few WARN_ON() assertions in the workqueue code
which could potentially also trigger a deadlock. Hopefully the ongoing
printk rework will provide a general solution for this eventually.

This was originally reported after a lockdep splat when executing
sysrq-t with the imx serial driver.

Fixes: 3494fc3084 ("workqueue: dump workqueues on sysrq-t")
Cc: stable@vger.kernel.org	# 4.0
Reported-by: Fabio Estevam <festevam@denx.de>
Tested-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-11 06:50:28 -10:00
Gerald Schaefer
293d92cbbd dma-debug: fix sg checks in debug_dma_map_sg()
The following warning occurred sporadically on s390:
DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or rodata [addr=0000000048cc5e2f] [len=131072]
WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 check_for_illegal_area+0xa8/0x138

It is a false-positive warning, due to broken logic in debug_dma_map_sg().
check_for_illegal_area() checks for overlay of sg elements with kernel text
or rodata. It is called with sg_dma_len(s) instead of s->length as
parameter. After the call to ->map_sg(), sg_dma_len() will contain the
length of possibly combined sg elements in the DMA address space, and not
the individual sg element length, which would be s->length.

The check will then use the physical start address of an sg element, and
add the DMA length for the overlap check, which could result in the false
warning, because the DMA length can be larger than the actual single sg
element length.

In addition, the call to check_for_illegal_area() happens in the iteration
over mapped_ents, which will not include all individual sg elements if
any of them were combined in ->map_sg().

Fix this by using s->length instead of sg_dma_len(s). Also put the call to
check_for_illegal_area() in a separate loop, iterating over all the
individual sg elements ("nents" instead of "mapped_ents").

While at it, as suggested by Robin Murphy, also move check_for_stack()
inside the new loop, as it is similarly concerned with validating the
individual sg elements.

Link: https://lore.kernel.org/lkml/20210705185252.4074653-1-gerald.schaefer@linux.ibm.com
Fixes: 884d05970b ("dma-debug: use sg_dma_len accessor")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-11 13:43:31 +02:00
Logan Gunthorpe
011a9ce807 dma-mapping: fix the kerneldoc for dma_map_sgtable()
htmldocs began producing the following warnings:

  kernel/dma/mapping.c:256: WARNING: Definition list ends without a
             blank line; unexpected unindent.
  kernel/dma/mapping.c:257: WARNING: Bullet list ends without a blank
             line; unexpected unindent.

Reformatting the list without hyphens fixes the warnings and produces
both a readable text and HTML output.

Fixes: fffe3cc8c2 ("dma-mapping: allow map_sg() ops to return negative error code")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-11 13:43:04 +02:00
Colin Ian King
b26503b156 tracing: Fix missing * in comment block
There is a missing * in a comment block, add it in.

Link: https://lkml.kernel.org/r/20211006172830.1025336-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10 22:27:40 -04:00
Vamshi K Sthambamkadi
6675880fc4 tracing: Fix memory leak in eprobe_register()
kmemleak report:
unreferenced object 0xffff900a70ec7ec0 (size 32):
  comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s)
  hex dump (first 32 bytes):
    c8 31 23 45 0a 90 ff ff 40 85 c7 6e 0a 90 ff ff  .1#E....@..n....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
    [<0000000088b8124b>] eprobe_register+0x1e3/0x350
    [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
    [<0000000019109321>] event_enable_write+0x93/0xe0
    [<000000007d85b320>] vfs_write+0xb9/0x260
    [<00000000b94c5e41>] ksys_write+0x67/0xe0
    [<000000005a08c81d>] __x64_sys_write+0x1a/0x20
    [<00000000240bf576>] do_syscall_64+0x3b/0xc0
    [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae

unreferenced object 0xffff900a56bbf280 (size 128):
  comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s)
  hex dump (first 32 bytes):
    ff ff ff ff ff ff ff ff 00 00 00 00 01 00 00 00  ................
    80 69 3b b2 ff ff ff ff 20 69 3b b2 ff ff ff ff  .i;..... i;.....
  backtrace:
    [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
    [<00000000c4e90fad>] eprobe_register+0x1fc/0x350
    [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
    [<0000000019109321>] event_enable_write+0x93/0xe0
    [<000000007d85b320>] vfs_write+0xb9/0x260
    [<00000000b94c5e41>] ksys_write+0x67/0xe0
    [<000000005a08c81d>] __x64_sys_write+0x1a/0x20
    [<00000000240bf576>] do_syscall_64+0x3b/0xc0
    [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae

In new_eprobe_trigger(), allocated edata and trigger variables are
never freed.

To fix, free memory in disable_eprobe().

Link: https://lkml.kernel.org/r/20211008071802.GA2098@cosmos

Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10 22:26:55 -04:00
Jakub Kicinski
9fe1155233 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07 15:24:06 -07:00
Linus Torvalds
4a16df549d Networking fixes for 5.15-rc5, including fixes from xfrm, bpf,
netfilter, and wireless.
 
 Current release - regressions:
 
  - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting
    a new value in the middle of an enum
 
  - unix: fix an issue in unix_shutdown causing the other end
    read/write failures
 
  - phy: mdio: fix memory leak
 
 Current release - new code bugs:
 
  - mlx5e: improve MQPRIO resiliency against bad configs
 
 Previous releases - regressions:
 
  - bpf: fix integer overflow leading to OOB access in map element
    pre-allocation
 
  - stmmac: dwmac-rk: fix ethernet on rk3399 based devices
 
  - netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1
 
  - brcmfmac: revert using ISO3166 country code and 0 rev as fallback
 
  - i40e: fix freeing of uninitialized misc IRQ vector
 
  - iavf: fix double unlock of crit_lock
 
 Previous releases - always broken:
 
  - bpf, arm: fix register clobbering in div/mod implementation
 
  - netfilter: nf_tables: correct issues in netlink rule change
    event notifications
 
  - dsa: tag_dsa: fix mask for trunked packets
 
  - usb: r8152: don't resubmit rx immediately to avoid soft lockup
    on device unplug
 
  - i40e: fix endless loop under rtnl if FW fails to correctly
    respond to capability query
 
  - mlx5e: fix rx checksum offload coexistence with ipsec offload
 
  - mlx5: force round second at 1PPS out start time and allow it
    only in supported clock modes
 
  - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
    sequence
 
 Misc:
 
  - xfrm: slightly rejig the new policy uAPI to make it less cryptic
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFfEpUACgkQMUZtbf5S
 Iru3vQ//fgm+pdDE860BXmLEgrbJTHU4rq/YD1vwZBcWw/i5wI1vnLr6BzZsdPNX
 DkhcKFGOZUTj+8ctuRDuqrkqoDjva6uRjwM0vcPWh5i9sGqJpKjxB3dFksyxJELR
 SnXM3Jmlk7YiGAw9Bi+66OuIwt2ouRLR/bNIwg8/qCnFI1efIF7IPeCpuvKCw/yd
 SOiSBIfuSPD1qcs5Sy4aHZqA8Xr9qbwDNwWQfFLXgNDKEiY7XOSdo3CoCddSxdR+
 2nmpOiz4w68wspapdZn3GSZHYrQh5kjz7b0Aru0Jvw86M79mKp3b9AfJ9uXTcJhp
 4cQBralLnQvLKanvKi1z5CI6NjXx+r6rXI43N6NjHOtjRUPoFMqxZEH0d7o11aT1
 sN3UDgtFtuE9Pfrhnc5ZHuHqFCCyxA6NWD6nt1dUoSEo0oWt9mOHfeoFT4+45fO0
 5no5+1q3EkYdH4jiJlavnM2DMvVzMd6FbxDzWpXJ2j4W1vM6TLkexEJIK4MLGxPV
 76lxeXzcvbM9a0vq5BabR4QbPIAv+A9qYPmXJwPVrvjo+zynwuWc3gMO5hc4EaOf
 FXF2Ka5Jn97jW8/JS7i7Gj6M8GKdyIxaFHgS4MLtJNs6pt3h7m6bSgcIOQZ5psBZ
 dKRYjM2lxeVkvDDmy5Gztkw2asbofYQP004tgagc+jwXP7DwXaY=
 =xzZg
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from xfrm, bpf, netfilter, and wireless.

  Current release - regressions:

   - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
     value in the middle of an enum

   - unix: fix an issue in unix_shutdown causing the other end
     read/write failures

   - phy: mdio: fix memory leak

  Current release - new code bugs:

   - mlx5e: improve MQPRIO resiliency against bad configs

  Previous releases - regressions:

   - bpf: fix integer overflow leading to OOB access in map element
     pre-allocation

   - stmmac: dwmac-rk: fix ethernet on rk3399 based devices

   - netfilter: conntrack: fix boot failure with
     nf_conntrack.enable_hooks=1

   - brcmfmac: revert using ISO3166 country code and 0 rev as fallback

   - i40e: fix freeing of uninitialized misc IRQ vector

   - iavf: fix double unlock of crit_lock

  Previous releases - always broken:

   - bpf, arm: fix register clobbering in div/mod implementation

   - netfilter: nf_tables: correct issues in netlink rule change event
     notifications

   - dsa: tag_dsa: fix mask for trunked packets

   - usb: r8152: don't resubmit rx immediately to avoid soft lockup on
     device unplug

   - i40e: fix endless loop under rtnl if FW fails to correctly respond
     to capability query

   - mlx5e: fix rx checksum offload coexistence with ipsec offload

   - mlx5: force round second at 1PPS out start time and allow it only
     in supported clock modes

   - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
     sequence

  Misc:

   - xfrm: slightly rejig the new policy uAPI to make it less cryptic"

* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  net: prefer socket bound to interface when not in VRF
  iavf: fix double unlock of crit_lock
  i40e: Fix freeing of uninitialized misc IRQ vector
  i40e: fix endless loop under rtnl
  dt-bindings: net: dsa: marvell: fix compatible in example
  ionic: move filter sync_needed bit set
  gve: report 64bit tx_bytes counter from gve_handle_report_stats()
  gve: fix gve_get_stats()
  rtnetlink: fix if_nlmsg_stats_size() under estimation
  gve: Properly handle errors in gve_assign_qpl
  gve: Avoid freeing NULL pointer
  gve: Correct available tx qpl check
  unix: Fix an issue in unix_shutdown causing the other end read/write failures
  net: stmmac: trigger PCS EEE to turn off on link down
  net: pcs: xpcs: fix incorrect steps on disable EEE
  netlink: annotate data races around nlk->bound
  net: pcs: xpcs: fix incorrect CL37 AN sequence
  net: sfp: Fix typo in state machine debug string
  net/sched: sch_taprio: properly cancel timer from taprio_destroy()
  net: bridge: fix under estimation in br_get_linkxstats_size()
  ...
2021-10-07 09:50:31 -07:00
Jakub Kicinski
7671b026bb Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-10-07

We've added 7 non-merge commits during the last 8 day(s) which contain
a total of 8 files changed, 38 insertions(+), 21 deletions(-).

The main changes are:

1) Fix ARM BPF JIT to preserve caller-saved regs for DIV/MOD JIT-internal
   helper call, from Johan Almbladh.

2) Fix integer overflow in BPF stack map element size calculation when
   used with preallocation, from Tatsuhiko Yasumatsu.

3) Fix an AF_UNIX regression due to added BPF sockmap support related
   to shutdown handling, from Jiang Wang.

4) Fix a segfault in libbpf when generating light skeletons from objects
   without BTF, from Kumar Kartikeya Dwivedi.

5) Fix a libbpf memory leak in strset to free the actual struct strset
   itself, from Andrii Nakryiko.

6) Dual-license bpf_insn.h similarly as we did for libbpf and bpftool,
   with ACKs from all contributors, from Luca Boccassi.
====================

Link: https://lore.kernel.org/r/20211007135010.21143-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07 07:11:33 -07:00
Jackie Liu
424b650f35 tracing: Fix missing osnoise tracer on max_latency
The compiler warns when the data are actually unused:

  kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function]
   1712 | static void trace_create_maxlat_file(struct trace_array *tr,
        |             ^~~~~~~~~~~~~~~~~~~~~~~~

[Why]
CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y
gcc report warns.

[How]
Now trace_create_maxlat_file will only take effect when
CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after
adding osnoise trace, it also needs to take effect.

Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/
Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev

Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-07 09:41:47 -04:00
Linus Torvalds
777feabaea - Tell the compiler to always inline is_percpu_thread()
- Make sure tunable_scaling buffer is null-terminated after an update in sysfs
 
 - Fix LTP named regression due to cgroup list ordering
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFZhRgACgkQEsHwGGHe
 VUq0lRAAqazsUL+j8UL5URX28/gVNtQye25pkVFvg16VRGk0qqHRyBikA165InhE
 btTbQb/Bd/YTU2ykKeBt+Cw4Zl+kWnvplXu2gxreQQhCjukSqe6OlQEjC0midnL0
 haOjIFERoxY5dTnwa8JuFJpCiKT5ltnLmudiOmCrUAVoANCg5ra8Yl+z6WJwsnFL
 EOXMj7tSRHZpinMjLzoF7kbID43dil4OjailrM6iR6+7/JzJNaXPG/IuVfEDnZM7
 2+WY+qijveD1PhZMHz5tIkDjYmfOSn1uyfeIklUKM/9dpp82TdwA2yptJ+/IRmvj
 vjn9ZCfpzWqXn0bf4RIvcK49MQQkJBXrg4EqZOJFXgT/6tXqr5BxSdTKnIkRBs+V
 zOc+zF/Y9OH5NH0DlELFtn7k2HMOO8DLOVrbOb+TixNPxbUFrbCJR5i9T7yecuQJ
 s+B+5aBmOZclNYehx1Hv8A9bjixWvp+UkGixLj1F4i6RYONS11M15bM94Cwyh1K9
 koHcZtVt1iQGkpYtV5fQAvxiKtIrAwkX4oniZzWbYr8ORQ269T/l5oeNNdzFxo7y
 qWkHWWIAG+A3dMUdJGbc8c7cInc4EK8tUzD8ec23cF74UO16cBgl/6tTGazyUrdA
 oxNTCuWPtvqp49eef3y02QONkRARfmEBnFYCqfFrEvEEh9OUQPU=
 =Kj5B
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Tell the compiler to always inline is_percpu_thread()

 - Make sure tunable_scaling buffer is null-terminated after an update
   in sysfs

 - Fix LTP named regression due to cgroup list ordering

* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Always inline is_percpu_thread()
  sched/fair: Null terminate buffer when updating tunable_scaling
  sched/fair: Add ancestors of unthrottled undecayed cfs_rq
2021-10-03 10:49:00 -07:00
Linus Torvalds
3a399a2bc4 - Make sure the destroy callback is reset when a event initialization fails
- Update the event constraints for Icelake
 
 - Make sure the active time of an event is updated even for inactive events
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFZfZoACgkQEsHwGGHe
 VUqonA//SskDtPRTrbHoc1pDZEBOoYmuYZpsVIXMTcjxt8DkdGZcmgSRzJ53QdGT
 aOv+Oc7qIsOa7qcRPaUxlyjnqbWEynOmeHaIN2PtKebOKsXsvdf0ehrrOYjgoD5g
 1ENCisvV6TyyPtQPDv9QkYkGScZXg5/D8VyrlV5GsRj+MslWsfVWaJvUCHQtJcnz
 YPbcOFw1MHCBtUuJXtXuzA2JDp0Lfr6aX5Zl6hXqP8k69Y+Ob0bjA3jopo29l5rU
 lp/y4MQcFHuEAbQ7AdVeKg3Yi/RnSEI0w3HT1xDeKyXlMGp+Xz5rDBg1I+WREDri
 u7QuT9Mx6XSzGHGeg/Y5XbMN+M5WCQUxH5gJMLiduG0YYULWhq/j4f5tA0U0cQ80
 qVKRH8luh+0R+RLZtqno9pNg0pSdB47X8lqXOX0/DyAA7f6xPm17VBvyDeuF75AL
 vYuhdtu0vB5Iwsccj+h5cXGsaMWCzcriGABWZL/1JZFwRT8kRadiybJkv0n6R77F
 qDqUMu6cLM9kDhTeggDffoQZoxLiEL6pfeWFuWcaWdVXa8veX4lZZNnPFbqNhnJm
 zRm758r3jdHie1BAgFXym6Gt5EhkBhtofbUBaw3T2SdxFTs0csKzF5tzCXIvIyz2
 LH3Is3ce0pw+A72barhbLua2IJ2h895DGE529sQBiaSibB53tYk=
 =TQvw
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure the destroy callback is reset when a event initialization
   fails

 - Update the event constraints for Icelake

 - Make sure the active time of an event is updated even for inactive
   events

* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: fix userpage->time_enabled of inactive events
  perf/x86/intel: Update event constraints for ICX
  perf/x86: Reset destroy callback on event init failure
2021-10-03 10:32:27 -07:00
Jakub Kicinski
6b7b0c3091 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2021-10-02

We've added 85 non-merge commits during the last 15 day(s) which contain
a total of 132 files changed, 13779 insertions(+), 6724 deletions(-).

The main changes are:

1) Massive update on test_bpf.ko coverage for JITs as preparatory work for
   an upcoming MIPS eBPF JIT, from Johan Almbladh.

2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool,
   with driver support for i40e and ice from Magnus Karlsson.

3) Add legacy uprobe support to libbpf to complement recently merged legacy
   kprobe support, from Andrii Nakryiko.

4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky.

5) Support saving the register state in verifier when spilling <8byte bounded
   scalar to the stack, from Martin Lau.

6) Add libbpf opt-in for stricter BPF program section name handling as part
   of libbpf 1.0 effort, from Andrii Nakryiko.

7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov.

8) Fix skel_internal.h to propagate errno if the loader indicates an internal
   error, from Kumar Kartikeya Dwivedi.

9) Fix build warnings with -Wcast-function-type so that the option can later
   be enabled by default for the kernel, from Kees Cook.

10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it
    otherwise errors out when encountering them, from Toke Høiland-Jørgensen.

11) Teach libbpf to recognize specialized maps (such as for perf RB) and
    internally remove BTF type IDs when creating them, from Hengqi Chen.

12) Various fixes and improvements to BPF selftests.
====================

Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 19:58:02 -07:00
Mel Gorman
703066188f sched/fair: Null terminate buffer when updating tunable_scaling
This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not return failure and checks the value is valid.

Before:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 > /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1

After:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 > /sys/kernel/debug/sched/tunable_scaling
  $ cat /sys/kernel/debug/sched/tunable_scaling
  0
  $ echo 3 > /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument

Fixes: 8a99b6833c ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
2021-10-01 13:57:57 +02:00
Michal Koutný
2630cde267 sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Since commit a7b359fc6a ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8
("sched/fair: Ensure that the CFS parent is added after unthrottling").

I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.

Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.

Fixes: a7b359fc6a ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
2021-10-01 13:57:57 +02:00
Song Liu
f792565326 perf/core: fix userpage->time_enabled of inactive events
Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage->time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage->time_enabled. This can
be reproduced with something like:

   /* open 20 task perf_event "cycles", to create multiplexing */

   fd = perf_event_open();  /* open task perf_event "cycles" */
   userpage = mmap(fd);     /* use mmap and rdmpc */

   while (true) {
     time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
     time_enabled_read = read(fd).time_enabled;
     if (time_enabled_mmap > time_enabled_read)
         BUG();
   }

Fix this by updating userpage for inactive events in merge_sched_in.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-and-tested-by: Lucian Grijincu <lucian@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
2021-10-01 13:57:54 +02:00
Jakub Kicinski
dd9a887b35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/phy/bcm7xxx.c
  d88fd1b546 ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac6 ("net: prevent user from passing illegal stab size")
  69508d4333 ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 14:49:21 -07:00
Linus Torvalds
4de593fb96 Networking fixes for 5.15-rc4, including fixes from mac80211, netfilter
and bpf.
 
 Current release - regressions:
 
  - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
    interrupt
 
  - mdio: revert mechanical patches which broke handling of optional
    resources
 
  - dev_addr_list: prevent address duplication
 
 Previous releases - regressions:
 
  - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
    (NULL deref)
 
  - Revert "mac80211: do not use low data rates for data frames with no
    ack flag", fixing broadcast transmissions
 
  - mac80211: fix use-after-free in CCMP/GCMP RX
 
  - netfilter: include zone id in tuple hash again, minimize collisions
 
  - netfilter: nf_tables: unlink table before deleting it (race -> UAF)
 
  - netfilter: log: work around missing softdep backend module
 
  - mptcp: don't return sockets in foreign netns
 
  - sched: flower: protect fl_walk() with rcu (race -> UAF)
 
  - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup
 
  - smsc95xx: fix stalled rx after link change
 
  - enetc: fix the incorrect clearing of IF_MODE bits
 
  - ipv4: fix rtnexthop len when RTA_FLOW is present
 
  - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU
 
  - e100: fix length calculation & buffer overrun in ethtool::get_regs
 
 Previous releases - always broken:
 
  - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx
 
  - mac80211: drop frames from invalid MAC address in ad-hoc mode
 
  - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
    (race -> UAF)
 
  - bpf, x86: Fix bpf mapping of atomic fetch implementation
 
  - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
 
  - netfilter: ip6_tables: zero-initialize fragment offset
 
  - mhi: fix error path in mhi_net_newlink
 
  - af_unix: return errno instead of NULL in unix_create1() when
    over the fs.file-max limit
 
 Misc:
 
  - bpf: exempt CAP_BPF from checks against bpf_jit_limit
 
  - netfilter: conntrack: make max chain length random, prevent guessing
    buckets by attackers
 
  - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
    generic, defer conntrack walk to work queue (prevent hogging RTNL lock)
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFV5KYACgkQMUZtbf5S
 Irs3cRAAqNsgaQXSVOXcPKsndeDDKHIv1Ktes8aOP9tgEXZw5rpsbct7g9Yxc0os
 Oolyt6HThjWr3u1/e3HHJO9I5Klr/J8eQReoRZnKW+6TYZflmmzfuf8u1nx6SLP/
 tliz5y8wKbp8BqqNuTMdRpm+R1QQcNkXTeruUoR1PgREcY4J7bC2BRrqeZhBGHSR
 Z5yPOIietFN3nITxNwbe4AYJXlesMc6QCWhBtXjMPGQ4Zc4/sjDNfqi7eHJi2H2y
 kW2dHeXG86gnlgFllOBBWP85ptxynyxoNQJuhrxgC9T+/FpSVST7cwKbtmkwDI3M
 5WGmeE6B3yfF8iOQuR8fbKQmsnLgQlYhjpbbhgN0GxzkyI7RpGYOFroX0Pht4IVZ
 mwprDOtvoLs4UeDjULRMB0JZfRN75PCtVlhfUkhhJxXGCCmnhGYaxG/pE+6OQWlr
 +n8RXYYMoOzPaHIYTS9NGSGqT0r32IUy/W5Yfv3rEzSeehy2/fxzGr2fOyBGs+q7
 xrnqpsOnM8cODDwGMy3TclCI4Dd72WoHNCHPhA/bk/ZMjHpBd4CSEZPm8IROY3Ja
 g1t68cncgL8fB7TSD9WLFgYu67Lg5j0gC/BHOzUQDQMX5IbhOq/fj1xRy5Lc6SYp
 mqW1f7LdnixBe4W61VjDAYq5jJRqrwEWedx+rvV/ONLvr77KULk=
 =rSti
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from mac80211, netfilter and bpf.

  Current release - regressions:

   - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
     interrupt

   - mdio: revert mechanical patches which broke handling of optional
     resources

   - dev_addr_list: prevent address duplication

  Previous releases - regressions:

   - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
     (NULL deref)

   - Revert "mac80211: do not use low data rates for data frames with no
     ack flag", fixing broadcast transmissions

   - mac80211: fix use-after-free in CCMP/GCMP RX

   - netfilter: include zone id in tuple hash again, minimize collisions

   - netfilter: nf_tables: unlink table before deleting it (race -> UAF)

   - netfilter: log: work around missing softdep backend module

   - mptcp: don't return sockets in foreign netns

   - sched: flower: protect fl_walk() with rcu (race -> UAF)

   - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup

   - smsc95xx: fix stalled rx after link change

   - enetc: fix the incorrect clearing of IF_MODE bits

   - ipv4: fix rtnexthop len when RTA_FLOW is present

   - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
     SKU

   - e100: fix length calculation & buffer overrun in ethtool::get_regs

  Previous releases - always broken:

   - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx

   - mac80211: drop frames from invalid MAC address in ad-hoc mode

   - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
     -> UAF)

   - bpf, x86: Fix bpf mapping of atomic fetch implementation

   - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

   - netfilter: ip6_tables: zero-initialize fragment offset

   - mhi: fix error path in mhi_net_newlink

   - af_unix: return errno instead of NULL in unix_create1() when over
     the fs.file-max limit

  Misc:

   - bpf: exempt CAP_BPF from checks against bpf_jit_limit

   - netfilter: conntrack: make max chain length random, prevent
     guessing buckets by attackers

   - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
     generic, defer conntrack walk to work queue (prevent hogging RTNL
     lock)"

* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
  net: stmmac: fix EEE init issue when paired with EEE capable PHYs
  net: dev_addr_list: handle first address in __hw_addr_add_ex
  net: sched: flower: protect fl_walk() with rcu
  net: introduce and use lock_sock_fast_nested()
  net: phy: bcm7xxx: Fixed indirect MMD operations
  net: hns3: disable firmware compatible features when uninstall PF
  net: hns3: fix always enable rx vlan filter problem after selftest
  net: hns3: PF enable promisc for VF when mac table is overflow
  net: hns3: fix show wrong state when add existing uc mac address
  net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
  net: hns3: don't rollback when destroy mqprio fail
  net: hns3: remove tc enable checking
  net: hns3: do not allow call hns3_nic_net_open repeatedly
  ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
  net: bridge: mcast: Associate the seqcount with its protecting lock.
  net: mdio-ipq4019: Fix the error for an optional regs resource
  net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
  net: mdio: mscc-miim: Fix the mdio controller
  af_unix: Return errno instead of NULL in unix_create1().
  ...
2021-09-30 14:28:05 -07:00
Tatsuhiko Yasumatsu
30e29a9a2b bpf: Fix integer overflow in prealloc_elems_and_freelist()
In prealloc_elems_and_freelist(), the multiplication to calculate the
size passed to bpf_map_area_alloc() could lead to an integer overflow.
As a result, out-of-bounds write could occur in pcpu_freelist_populate()
as reported by KASAN:

[...]
[   16.968613] BUG: KASAN: slab-out-of-bounds in pcpu_freelist_populate+0xd9/0x100
[   16.969408] Write of size 8 at addr ffff888104fc6ea0 by task crash/78
[   16.970038]
[   16.970195] CPU: 0 PID: 78 Comm: crash Not tainted 5.15.0-rc2+ #1
[   16.970878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   16.972026] Call Trace:
[   16.972306]  dump_stack_lvl+0x34/0x44
[   16.972687]  print_address_description.constprop.0+0x21/0x140
[   16.973297]  ? pcpu_freelist_populate+0xd9/0x100
[   16.973777]  ? pcpu_freelist_populate+0xd9/0x100
[   16.974257]  kasan_report.cold+0x7f/0x11b
[   16.974681]  ? pcpu_freelist_populate+0xd9/0x100
[   16.975190]  pcpu_freelist_populate+0xd9/0x100
[   16.975669]  stack_map_alloc+0x209/0x2a0
[   16.976106]  __sys_bpf+0xd83/0x2ce0
[...]

The possibility of this overflow was originally discussed in [0], but
was overlooked.

Fix the integer overflow by changing elem_size to u64 from u32.

  [0] https://lore.kernel.org/bpf/728b238e-a481-eb50-98e9-b0f430ab01e7@gmail.com/

Fixes: 557c0c6e7d ("bpf: convert stackmap to pre-allocation")
Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210930135545.173698-1-th.yasumatsu@gmail.com
2021-09-30 16:17:23 +02:00
Kees Cook
102acbacfd bpf: Replace callers of BPF_CAST_CALL with proper function typedef
In order to keep ahead of cases in the kernel where Control Flow
Integrity (CFI) may trip over function call casts, enabling
-Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
various warnings and is one of the last places in the kernel
triggering this warning.

For actual function calls, replace BPF_CAST_CALL() with a typedef, which
captures the same details about the given function pointers.

This change results in no object code difference.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://github.com/KSPP/linux/issues/20
Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org
2021-09-28 16:27:18 -07:00
Kees Cook
3d717fad50 bpf: Replace "want address" users of BPF_CAST_CALL with BPF_CALL_IMM
In order to keep ahead of cases in the kernel where Control Flow
Integrity (CFI) may trip over function call casts, enabling
-Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
various warnings and is one of the last places in the kernel triggering
this warning.

Most places using BPF_CAST_CALL actually just want a void * to perform
math on. It's not actually performing a call, so just use a different
helper to get the void *, by way of the new BPF_CALL_IMM() helper, which
can clean up a common copy/paste idiom as well.

This change results in no object code difference.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://github.com/KSPP/linux/issues/20
Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210928230946.4062144-2-keescook@chromium.org
2021-09-28 16:27:18 -07:00
David S. Miller
4ccb9f03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-09-28

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 14 day(s) which contain
a total of 11 files changed, 139 insertions(+), 53 deletions(-).

The main changes are:

1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.

2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.

3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.

4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.

5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.

6) Fix return value handling for struct_ops BPF programs, from Hou Tao.

7) Various fixes to BPF selftests, from Jiri Benc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,
2021-09-28 13:52:46 +01:00
Arnd Bergmann
0d67e332e6 module: fix clang CFI with MODULE_UNLOAD=n
When CONFIG_MODULE_UNLOAD is disabled, the module->exit member
is not defined, causing a build failure:

kernel/module.c:4493:8: error: no member named 'exit' in 'struct module'
                mod->exit = *exit;

add an #ifdef block around this.

Fixes: cf68fffb66 ("add support for Clang CFI")
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2021-09-28 12:56:26 +02:00
Daniel Borkmann
78cc316e95 bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt
If cgroup_sk_alloc() is called from interrupt context, then just assign the
root cgroup to skcd->cgroup. Prior to commit 8520e224f5 ("bpf, cgroups:
Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path, and
iff indeed NULL returning the root cgroup (v ?: &cgrp_dfl_root.cgrp). Rather
than re-adding the NULL-test to the fast-path we can just assign it once from
cgroup_sk_alloc() given v1/v2 handling has been simplified. The migration from
NULL test with returning &cgrp_dfl_root.cgrp to assigning &cgrp_dfl_root.cgrp
directly does /not/ change behavior for callers of sock_cgroup_ptr().

syzkaller was able to trigger a splat in the legacy netrom code base, where
the RX handler in nr_rx_frame() calls nr_make_new() which calls sk_alloc()
and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus the NULL
skcd->cgroup, where it trips over on cgroup_sk_free() side given it expects
a non-NULL object. There are a few other candidates aside from netrom which
have similar pattern where in their accept-like implementation, they just call
to sk_alloc() and thus cgroup_sk_alloc() instead of sk_clone_lock() with the
corresponding cgroup_sk_clone() which then inherits the cgroup from the parent
socket. None of them are related to core protocols where BPF cgroup programs
are running from. However, in future, they should follow to implement a similar
inheritance mechanism.

Additionally, with a !CONFIG_CGROUP_NET_PRIO and !CONFIG_CGROUP_NET_CLASSID
configuration, the same issue was exposed also prior to 8520e224f5 due to
commit e876ecc67d ("cgroup: memcg: net: do not associate sock with unrelated
cgroup") which added the early in_interrupt() return back then.

Fixes: 8520e224f5 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
Fixes: e876ecc67d ("cgroup: memcg: net: do not associate sock with unrelated cgroup")
Reported-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Reported-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Tested-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210927123921.21535-1-daniel@iogearbox.net
2021-09-28 09:29:19 +02:00
Lorenz Bauer
8a98ae12fb bpf: Exempt CAP_BPF from checks against bpf_jit_limit
When introducing CAP_BPF, bpf_jit_charge_modmem() was not changed to treat
programs with CAP_BPF as privileged for the purpose of JIT memory allocation.
This means that a program without CAP_BPF can block a program with CAP_BPF
from loading a program.

Fix this by checking bpf_capable() in bpf_jit_charge_modmem().

Fixes: 2c78ee898d ("bpf: Implement CAP_BPF")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922111153.19843-1-lmb@cloudflare.com
2021-09-28 09:28:37 +02:00
Martin KaFai Lau
354e8f1970 bpf: Support <8-byte scalar spill and refill
The verifier currently does not save the reg state when
spilling <8byte bounded scalar to the stack.  The bpf program
will be incorrectly rejected when this scalar is refilled to
the reg and then used to offset into a packet header.
The later patch has a simplified bpf prog from a real use case
to demonstrate this case.  The current work around is
to reparse the packet again such that this offset scalar
is close to where the packet data will be accessed to
avoid the spill.  Thus, the header is parsed twice.

The llvm patch [1] will align the <8bytes spill to
the 8-byte stack address.  This can simplify the verifier
support by avoiding to store multiple reg states for
each 8 byte stack slot.

This patch changes the verifier to save the reg state when
spilling <8bytes scalar to the stack.  This reg state saving
is limited to spill aligned to the 8-byte stack address.
The current refill logic has already called coerce_reg_to_size(),
so coerce_reg_to_size() is not called on state->stack[spi].spilled_ptr
during spill.

When refilling in check_stack_read_fixed_off(),  it checks
the refill size is the same as the number of bytes marked with
STACK_SPILL before restoring the reg state.  When restoring
the reg state to state->regs[dst_regno], it needs
to avoid the state->regs[dst_regno].subreg_def being
over written because it has been marked by the check_reg_arg()
earlier [check_mem_access() is called after check_reg_arg() in
do_check()].  Reordering check_mem_access() and check_reg_arg()
will need a lot of changes in test_verifier's tests because
of the difference in verifier's error message.  Thus, the
patch here is to save the state->regs[dst_regno].subreg_def
first in check_stack_read_fixed_off().

There are cases that the verifier needs to scrub the spilled slot
from STACK_SPILL to STACK_MISC.  After this patch the spill is not always
in 8 bytes now, so it can no longer assume the other 7 bytes are always
marked as STACK_SPILL.  In particular, the scrub needs to avoid marking
an uninitialized byte from STACK_INVALID to STACK_MISC.  Otherwise, the
verifier will incorrectly accept bpf program reading uninitialized bytes
from the stack.  A new helper scrub_spilled_slot() is created for this
purpose.

[1]: https://reviews.llvm.org/D109073

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004941.625398-1-kafai@fb.com
2021-09-26 13:07:27 -07:00
Martin KaFai Lau
27113c59b6 bpf: Check the other end of slot_type for STACK_SPILL
Every 8 bytes of the stack is tracked by a bpf_stack_state.
Within each bpf_stack_state, there is a 'u8 slot_type[8]' to track
the type of each byte.  Verifier tests slot_type[0] == STACK_SPILL
to decide if the spilled reg state is saved.  Verifier currently only
saves the reg state if the whole 8 bytes are spilled to the stack,
so checking the slot_type[7] is the same as checking slot_type[0].

The later patch will allow verifier to save the bounded scalar
reg also for <8 bytes spill.  There is a llvm patch [1] to ensure
the <8 bytes spill will be 8-byte aligned,  so checking
slot_type[7] instead of slot_type[0] is required.

While at it, this patch refactors the slot_type[0] == STACK_SPILL
test into a new function is_spilled_reg() and change the
slot_type[0] check to slot_type[7] check in there also.

[1] https://reviews.llvm.org/D109073

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004934.624194-1-kafai@fb.com
2021-09-26 13:07:27 -07:00
Linus Torvalds
3a398acc56 A single fix for the recently introduced regression in posix CPU timers
which failed to stop the timer when requested. That caused unexpected
 signals to be sent to the process/thread causing malfunction.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmFQPvkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodUWEACIRQpIAGQBstcsW+Zhi0CgH+uQ+yvo
 r5X7pekETGynvr4xUNqnQe9ajWh81xGoOX59xvQws0ZS/EGD03DNpThDfGGeojZw
 bH97FoXOUVrDa3AZIJ7/lf056h5HbyatUr0rYyCt43rJ+6hCkJtrRMixhm2i2Aun
 ph+d7fyPP0wqHxgYVVUIlOzZbHptGz6rwdKM4SgzZXCH2CAFL2hoF+R4WpFjkiVm
 HgN0MrZMldKgnRtBqiD1ZAupJB8smTRV443tqu0Qcn1bRNSa8mhEHeMCKkZGE7VY
 oBaF6cifqsAY6nHVgeOAo7tV8mTEwmP65igQA4Ff/Ad7cXIKuEVTQH4ytX4ZMQeE
 C6sPd9awpsQ4eLJYYU9njAf8U99FenacI2qZ5dM+Fm8xsZUIFmL6Kh8144C0J91N
 g7c0u4xITV5zg17ZQSLC+7v8R1kvSrP12h6HURTVfSpK5nzVvEgK4LZgEOyJP8lR
 kVbVHXVmbdWeBPW1ce2wNqWAjFg1MzVJ4eFVxXeOOkK0607TxyGriGMzI7k8eHce
 EpabYusTw+1xX6bxvSmXyJKDGwi/caVpZWZ5I0COVOoSjuG+OzWsN7tCghzdD03P
 Lx15+wiT5BbFvU90OaoRQoJPkOXkPRec0JvZTMw9Rm6khXQ6SpEBhi7YBOD4jBh8
 dGiIHLBNNCwLLQ==
 =7QjG
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the recently introduced regression in posix CPU
  timers which failed to stop the timer when requested. That caused
  unexpected signals to be sent to the process/thread causing
  malfunction"

* tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Prevent spuriously armed 0-value itimer
2021-09-26 10:00:16 -07:00
Linus Torvalds
dc0f97c261 A set of fixes for interrupt chip drivers:
- Work around a bad GIC integration on a Renesas platform which can't
    handle byte-sized MMIO access
 
  - Plug a potential memory leak in the GICv4 driver
 
  - Fix a regression in the Armada 370-XP IPI code which was caused by
    issuing EOI instack of ACK.
 
  - A couple of small fixes here and there
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmFQPksTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQ8OD/sEW4qSg+c78Awn2oURyYi7iB4YLbVi
 YlVxgTGPyKEo6W/VC6YG/WtC02i8Jo44VurrgRTQ2f8HGwpbxrZYhaCHfh7gPeTV
 HoSYNQN5OIArtZYefctPpndJXMcUDFbJwEK0TN9G7ZOL+Rb0CB5gLIxe5BumWWUH
 j9yciIdXtklnhNnEDPiZnT3dUPIAYNdbl8mrr11kO0Ifr5vEAHh7qEE0xzyspgO0
 pYACv7DoeyqR97XXgjn/GD7HFKCIFoZbfeT2FEAEK6uEp4bCYTfo9XPS6YNFoAA9
 ywrSf7Daf9IoU7NhA88iUNBnEspCkgaQB+iJZUQvcZSaSiSns8IqiQSIJMVPjsfw
 IQ+0i9mbYv2XvI27K4nJmJTCjiHdbV3xFGj4Nh8jEg94/SwD4MSpv7kRsbuQfYYo
 EEJwUpDsPDfDIyCVCAm59rKJdo6BSeTRlHjBbLhn5du3jyy02qRx7C2l67HRQxzz
 PuySfSk89wURrqljvkOF9ec0sTyNslUysS+K3doRhAZq4L9nGGfckTbZkYW6h3H3
 oJPK9HKWEBmcrXyLGooyu4DeWn0ZPSML/uRKuJEqYRSisRj2eixJOPhoQREGvIb5
 En2xiw03STVtXfCrAKwkWcyrIGpyhBHWDyhvksaXdrb4aQKaq/9f4uVB95nzZHJD
 wZ9JlMls9zjSrw==
 =QSr2
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Work around a bad GIC integration on a Renesas platform which can't
     handle byte-sized MMIO access

   - Plug a potential memory leak in the GICv4 driver

   - Fix a regression in the Armada 370-XP IPI code which was caused by
     issuing EOI instack of ACK.

   - A couple of small fixes here and there"

* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic: Work around broken Renesas integration
  irqchip/renesas-rza1: Use semicolons instead of commas
  irqchip/gic-v3-its: Fix potential VPE leak on error
  irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
  irqchip/mbigen: Repair non-kernel-doc notation
  irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
  irqchip/armada-370-xp: Fix ack/eoi breakage
  Documentation: Fix irq-domain.rst build warning
2021-09-26 09:55:22 -07:00
Linus Torvalds
2d70de4ee5 block-5.15-2021-09-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFPKxoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpo5YEACLm8JHNZKC/PGY4dn3QN6VKbUCqNMkJSXs
 GPOkpcZQsA0G9CiDp00HdarB/besjG2S5OAzuYz+YkUN9dmTPYHrVdIW+4NMkLGQ
 axIL0778wIFz2rO1donpCwHJd09YuKL5uEnbpaBdLEOPf0xPh387UjYk31KAZSzQ
 dD1dgHUuObDYgS2vL4J0jgoRn5fAChB5hRCPywh7+4XfU3ZprjVrW4lRMOGQvqNH
 KwNR+ZMjRtaydHISHH4u/VlS4vdt1XP/Syu9qWhjPizWR/GmUyvZUd5PIa/bE0pQ
 RmTkH6tFy/W76VUhudlgbYQe2Y28WgHafFkFciAyQxJAS90dSvHpxlLhu3rglab0
 4xb4mlhLQmdmdPnt9hF/kl1pXCqnddeL96d7wwqrlWGwv+e3ls/Zn1Q7pgQX8X0A
 Q2Fofqq+PxxyyX4PguugpTLSd3Rti2O2KqpMWPUSb4DZsZiHBmWHvLXqwfPuWv/W
 OQM2vgoeNQa3QJAGghEf1R1fRZrCXsltqcGWbbSpTLNerSByrRDahfZalacM1m7Y
 pcCcuxswCa1XP42x6q/gCm2HMlY5H+kfufhBQAscZHRTY6bAoXJciUiTgHIWEiIf
 MBBjvEeMuksurGk0SGFrtrAsFqyETNo53KLtOvzX4frt0vUHy8851oxBrBYzvfs/
 aVzdvZN0HA==
 =ZbMW
 -----END PGP SIGNATURE-----

Merge tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - keep ctrl->namespaces ordered (Christoph Hellwig)
      - fix incorrect h2cdata pdu offset accounting in nvme-tcp (Sagi
        Grimberg)
      - handled updated hw_queues in nvme-fc more carefully (Daniel
        Wagner, James Smart)

 - md lock order fix (Christoph)

 - fallocate locking fix (Ming)

 - blktrace UAF fix (Zhihao)

 - rq-qos bio tracking fix (Ming)

* tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
  block: hold ->invalidate_lock in blkdev_fallocate
  blktrace: Fix uaf in blk_trace access after removing by sysfs
  block: don't call rq_qos_ops->done_bio if the bio isn't tracked
  md: fix a lock order reversal in md_alloc
  nvme: keep ctrl->namespaces ordered
  nvme-tcp: fix incorrect h2cdata pdu offset accounting
  nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
  nvme-fc: avoid race between time out and tear down
  nvme-fc: update hardware queues before using them
2021-09-25 15:44:05 -07:00
Zhihao Cheng
5afedf670c blktrace: Fix uaf in blk_trace access after removing by sysfs
There is an use-after-free problem triggered by following process:

      P1(sda)				P2(sdb)
			echo 0 > /sys/block/sdb/trace/enable
			  blk_trace_remove_queue
			    synchronize_rcu
			    blk_trace_free
			      relay_close
rcu_read_lock
__blk_add_trace
  trace_note_tsk
  (Iterate running_trace_list)
			        relay_close_buf
				  relay_destroy_buf
				    kfree(buf)
    trace_note(sdb's bt)
      relay_reserve
        buf->offset <- nullptr deference (use-after-free) !!!
rcu_read_unlock

[  502.714379] BUG: kernel NULL pointer dereference, address:
0000000000000010
[  502.715260] #PF: supervisor read access in kernel mode
[  502.715903] #PF: error_code(0x0000) - not-present page
[  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
[  502.717252] Oops: 0000 [#1] SMP
[  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
[  502.732872] Call Trace:
[  502.733193]  __blk_add_trace.cold+0x137/0x1a3
[  502.733734]  blk_add_trace_rq+0x7b/0xd0
[  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
[  502.734755]  blk_mq_start_request+0xde/0x1b0
[  502.735287]  scsi_queue_rq+0x528/0x1140
...
[  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
[  502.747501]  sg_ioctl+0x466/0x1100

Reproduce method:
  ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
  ioctl(/dev/sda, BLKTRACESTART)
  ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
  ioctl(/dev/sdb, BLKTRACESTART)

  echo 0 > /sys/block/sdb/trace/enable &
  // Add delay(mdelay/msleep) before kernel enters blk_trace_free()

  ioctl$SG_IO(/dev/sda, SG_IO, ...)
  // Enters trace_note_tsk() after blk_trace_free() returned
  // Use mdelay in rcu region rather than msleep(which may schedule out)

Remove blk_trace from running_list before calling blk_trace_free() by
sysfs if blk_trace is at Blktrace_running state.

Fixes: c71a896154 ("blktrace: add ftrace plugin")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 11:06:15 -06:00
Thomas Gleixner
f9bfed3ad5 irqchip fixes for 5.15, take #1
- Work around a bad GIC integration on a Renesas platform, where the
   interconnect cannot deal with byte-sized MMIO accesses
 
 - Cleanup another Renesas driver abusing the comma operator
 
 - Fix a potential GICv4 memory leak on an error path
 
 - Make the type of 'size' consistent with the rest of the code in
   __irq_domain_add()
 
 - Fix a regression in the Armada 370-XP IPI path
 
 - Fix the build for the obviously unloved goldfish-pic
 
 - Some documentation fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmFNk10PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD59kP/A4Al80ndT4GhIlj1b+LolpBctOl3OxNpoYm
 uCsf/LjmNjEQ62F3wd0lMe/qgioU+MKssA94/4pp9IkySNSxToHpaD5WwScaGKP4
 twATEs3cdAmrvE8YTiq+bjuX8mJ7toqhwRWjc2ZTlql4l3DbHzMoeywwnULza/A8
 ZGLJZ4SdvBQPUnMtEXJa9jHwtxRd0irinUApO5XpfRMiGAfCaCD2XfOMVmeBX3TP
 OFtpsxSluIURaAhEBsr60saagqftGrCABr8m19zGynutnosbVvDYq4HUIlIYxeRm
 7BWOskyGw1CZ9beylIO7v2Vp5pNx5KR4t/5wL7+tZXhY7VrgPPQjFf1CbJwB8NUz
 p8ad7n9yHJvzc90mzgqZfuAr7GBZt5wFXj1vKw5hDxlTDo4LfaMD+2Qkp2KOESqi
 ejX3vdrVgLCadzgDqpkjBRpsqjjG+1x+rjji4dpaADEUYxUoyX5lYObiImOznTeS
 9NitgJe5aGFOo0y7DOFYNSc4e2ODfGxTwVl4NTwd4NGVJ+CeBYHlow1B8+5NfoKo
 rqMgo6dgyKjfwyN6YxVo6RDvDe+e/xTKk7s1kaVzYVgQeDh5GeMd9SJ0xms3Dbhe
 pjZLsAmnnIOoHWqcvQOOFJPkhqQBpuY8Gbtw0X3JVrj/C/6HoAAS0FyqhYw53dSC
 gVC3Im4R
 =Fn7y
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-fixes-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

 - Work around a bad GIC integration on a Renesas platform, where the
   interconnect cannot deal with byte-sized MMIO accesses

 - Cleanup another Renesas driver abusing the comma operator

 - Fix a potential GICv4 memory leak on an error path

 - Make the type of 'size' consistent with the rest of the code in
   __irq_domain_add()

 - Fix a regression in the Armada 370-XP IPI path

 - Fix the build for the obviously unloved goldfish-pic

 - Some documentation fixes

Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
2021-09-24 14:11:04 +02:00
Jakub Kicinski
2fcd14d0f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/mptcp/protocol.c
  977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
  efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")

same patch merged in both trees, keep net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23 11:19:49 -07:00