From ef0324b6415db6742bd632dc0dfbb8fbc111473b Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Sat, 26 Mar 2022 20:40:28 +0100 Subject: [PATCH 001/121] ARM: dts: lan966x: fix sys_clk frequency The sys_clk frequency is 165.625MHz. The register reference of the Generic Clock controller lists the CPU clock as 600MHz, the DDR clock as 300MHz and the SYS clock as 162.5MHz. This is wrong. It was first noticed during the fan driver development and it was measured and verified via the CLK_MON output of the SoC which can be configured to output sys_clk/64. The core PLL settings (which drive the SYS clock) seem to be as follows: DIVF = 52 DIVQ = 3 DIVR = 1 With a reference clock of 25MHz, this means we have a post divider clock Fpfd = Fref / (DIVR + 1) = 25MHz / (1 + 1) = 12.5MHz The resulting VCO frequency is then Fvco = Fpfd * (DIVF + 1) * 2 = 12.5MHz * (52 + 1) * 2 = 1325MHz And the output frequency is Fout = Fvco / 2^DIVQ = 1325MHz / 2^3 = 165.625MHz This all adds up to the constraints of the PLL: 10MHz <= Fpfd <= 200MHz 20MHz <= Fout <= 1000MHz 1000MHz <= Fvco <= 2000MHz Fixes: 290deaa10c50 ("ARM: dts: add DT for lan966 SoC and 2-port board pcb8291") Signed-off-by: Michael Walle Reviewed-by: Kavyasree Kotagiri Signed-off-by: Claudiu Beznea Link: https://lore.kernel.org/r/20220326194028.2945985-1-michael@walle.cc --- arch/arm/boot/dts/lan966x.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/lan966x.dtsi b/arch/arm/boot/dts/lan966x.dtsi index 3cb02fffe716..38e90a31d2dd 100644 --- a/arch/arm/boot/dts/lan966x.dtsi +++ b/arch/arm/boot/dts/lan966x.dtsi @@ -38,7 +38,7 @@ clocks { sys_clk: sys_clk { compatible = "fixed-clock"; #clock-cells = <0>; - clock-frequency = <162500000>; + clock-frequency = <165625000>; }; cpu_clk: cpu_clk { From be640317a1d0b9cf42fedb2debc2887a7cfa38de Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Mon, 18 Jul 2022 23:44:18 +1000 Subject: [PATCH 002/121] powerpc/64s: Disable stack variable initialisation for prom_init With GCC 12 allmodconfig prom_init fails to build: Error: External symbol 'memset' referenced from prom_init.c make[2]: *** [arch/powerpc/kernel/Makefile:204: arch/powerpc/kernel/prom_init_check] Error 1 The allmodconfig build enables KASAN, so all calls to memset in prom_init should be converted to __memset by the #ifdefs in asm/string.h, because prom_init must use the non-KASAN instrumented versions. The build failure happens because there's a call to memset that hasn't been caught by the pre-processor and converted to __memset. Typically that's because it's a memset generated by the compiler itself, and that is the case here. With GCC 12, allmodconfig enables CONFIG_INIT_STACK_ALL_PATTERN, which causes the compiler to emit memset calls to initialise on-stack variables with a pattern. Because prom_init is non-user-facing boot-time only code, as a workaround just disable stack variable initialisation to unbreak the build. 
Reported-by: Sudip Mukherjee Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220718134418.354114-1-mpe@ellerman.id.au --- arch/powerpc/kernel/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index f91f0f29a566..c8cf924bf9c0 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -20,6 +20,7 @@ CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_prom_init.o += -fno-stack-protector CFLAGS_prom_init.o += -DDISABLE_BRANCH_PROFILING CFLAGS_prom_init.o += -ffreestanding +CFLAGS_prom_init.o += $(call cc-option, -ftrivial-auto-var-init=uninitialized) ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code From 7849f5cf7639cd1125a3546a31675af4ab54278f Mon Sep 17 00:00:00 2001 From: Baolin Wang Date: Wed, 20 Jul 2022 15:03:58 +0800 Subject: [PATCH 003/121] mailmap: update Baolin Wang's email I recently switched to my Alibaba email address. So add aliases for my previous email addresses. Signed-off-by: Baolin Wang Signed-off-by: Arnd Bergmann --- .mailmap | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.mailmap b/.mailmap index 13e4f504e17f..a5c395dbb507 100644 --- a/.mailmap +++ b/.mailmap @@ -60,6 +60,10 @@ Arnd Bergmann Atish Patra Axel Dyks Axel Lin +Baolin Wang +Baolin Wang +Baolin Wang +Baolin Wang Bart Van Assche Bart Van Assche Ben Gardner From 9b31e60800d8fa69027baf9ec7f03a0c5b145079 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Fri, 15 Jul 2022 11:55:49 -0700 Subject: [PATCH 004/121] tools: Fixed MIPS builds due to struct flock re-definition Building perf for MIPS failed after 9f79b8b72339 ("uapi: simplify __ARCH_FLOCK{,64}_PAD a little") with the following error: CC /home/fainelli/work/buildroot/output/bmips/build/linux-custom/tools/perf/trace/beauty/fcntl.o In file included from ../../../../host/mipsel-buildroot-linux-gnu/sysroot/usr/include/asm/fcntl.h:77, from ../include/uapi/linux/fcntl.h:5, from trace/beauty/fcntl.c:10: ../include/uapi/asm-generic/fcntl.h:188:8: error: redefinition of 'struct flock' struct flock { ^~~~~ In file included from ../include/uapi/linux/fcntl.h:5, from trace/beauty/fcntl.c:10: ../../../../host/mipsel-buildroot-linux-gnu/sysroot/usr/include/asm/fcntl.h:63:8: note: originally defined here struct flock { ^~~~~ This is due to the local copy under tools/include/uapi/asm-generic/fcntl.h including the toolchain's kernel headers which already define 'struct flock' and define HAVE_ARCH_STRUCT_FLOCK to let future inclusions make a decision as to whether re-defining 'struct flock' is appropriate or not. Make sure we do not re-define 'struct flock' when HAVE_ARCH_STRUCT_FLOCK is already defined. 
Fixes: 9f79b8b72339 ("uapi: simplify __ARCH_FLOCK{,64}_PAD a little") Signed-off-by: Florian Fainelli Reviewed-by: Christoph Hellwig [arnd: sync with include/uapi/asm-generic/fcntl.h as well] Signed-off-by: Arnd Bergmann --- include/uapi/asm-generic/fcntl.h | 2 ++ tools/include/uapi/asm-generic/fcntl.h | 11 ++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index f13d37b60775..1ecdb911add8 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -192,6 +192,7 @@ struct f_owner_ex { #define F_LINUX_SPECIFIC_BASE 1024 +#ifndef HAVE_ARCH_STRUCT_FLOCK struct flock { short l_type; short l_whence; @@ -216,5 +217,6 @@ struct flock64 { __ARCH_FLOCK64_PAD #endif }; +#endif /* HAVE_ARCH_STRUCT_FLOCK */ #endif /* _ASM_GENERIC_FCNTL_H */ diff --git a/tools/include/uapi/asm-generic/fcntl.h b/tools/include/uapi/asm-generic/fcntl.h index 0197042b7dfb..1ecdb911add8 100644 --- a/tools/include/uapi/asm-generic/fcntl.h +++ b/tools/include/uapi/asm-generic/fcntl.h @@ -1,3 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ #ifndef _ASM_GENERIC_FCNTL_H #define _ASM_GENERIC_FCNTL_H @@ -90,7 +91,7 @@ /* a horrid kludge trying to make sure that this will fail on old kernels */ #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) -#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) +#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) #ifndef O_NDELAY #define O_NDELAY O_NONBLOCK @@ -115,11 +116,13 @@ #define F_GETSIG 11 /* for sockets. */ #endif +#if __BITS_PER_LONG == 32 || defined(__KERNEL__) #ifndef F_GETLK64 #define F_GETLK64 12 /* using 'struct flock64' */ #define F_SETLK64 13 #define F_SETLKW64 14 #endif +#endif /* __BITS_PER_LONG == 32 || defined(__KERNEL__) */ #ifndef F_SETOWN_EX #define F_SETOWN_EX 15 @@ -178,6 +181,10 @@ struct f_owner_ex { blocking */ #define LOCK_UN 8 /* remove lock */ +/* + * LOCK_MAND support has been removed from the kernel. We leave the symbols + * here to not break legacy builds, but these should not be used in new code. + */ #define LOCK_MAND 32 /* This is a mandatory flock ... */ #define LOCK_READ 64 /* which allows concurrent read operations */ #define LOCK_WRITE 128 /* which allows concurrent write operations */ @@ -185,6 +192,7 @@ struct f_owner_ex { #define F_LINUX_SPECIFIC_BASE 1024 +#ifndef HAVE_ARCH_STRUCT_FLOCK struct flock { short l_type; short l_whence; @@ -209,5 +217,6 @@ struct flock64 { __ARCH_FLOCK64_PAD #endif }; +#endif /* HAVE_ARCH_STRUCT_FLOCK */ #endif /* _ASM_GENERIC_FCNTL_H */ From 27161db0904ee48e59140aa8d0835939a666c1f1 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 20 Jul 2022 14:20:57 +0300 Subject: [PATCH 005/121] net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii While phylink_pcs_ops :: pcs_get_state does return void, xpcs_get_state() does check for a non-zero return code from xpcs_get_state_c37_sgmii() and prints that as a message to the kernel log. However, a non-zero return code from xpcs_read() is translated into "return false" (i.e. zero as int) and the I/O error is therefore not printed. Fix that. 
Fixes: b97b5331b8ab ("net: pcs: add C37 SGMII AN support for intel mGbE controller") Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20220720112057.3504398-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/pcs/pcs-xpcs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/pcs/pcs-xpcs.c b/drivers/net/pcs/pcs-xpcs.c index 4cfd05c15aee..d25fbb9caeba 100644 --- a/drivers/net/pcs/pcs-xpcs.c +++ b/drivers/net/pcs/pcs-xpcs.c @@ -896,7 +896,7 @@ static int xpcs_get_state_c37_sgmii(struct dw_xpcs *xpcs, */ ret = xpcs_read(xpcs, MDIO_MMD_VEND2, DW_VR_MII_AN_INTR_STS); if (ret < 0) - return false; + return ret; if (ret & DW_VR_MII_C37_ANSGM_SP_LNKSTS) { int speed_value; From ebbbe23fdf6070e31509638df3321688358cc211 Mon Sep 17 00:00:00 2001 From: Liang He Date: Wed, 20 Jul 2022 21:10:03 +0800 Subject: [PATCH 006/121] net: sungem_phy: Add of_node_put() for reference returned by of_get_parent() In bcm5421_init(), we should call of_node_put() for the reference returned by of_get_parent() which has increased the refcount. Fixes: 3c326fe9cb7a ("[PATCH] ppc64: Add new PHY to sungem") Signed-off-by: Liang He Link: https://lore.kernel.org/r/20220720131003.1287426-1-windhl@126.com Signed-off-by: Jakub Kicinski --- drivers/net/sungem_phy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/sungem_phy.c b/drivers/net/sungem_phy.c index ff22b6b1c686..36803d932dff 100644 --- a/drivers/net/sungem_phy.c +++ b/drivers/net/sungem_phy.c @@ -450,6 +450,7 @@ static int bcm5421_init(struct mii_phy* phy) int can_low_power = 1; if (np == NULL || of_get_property(np, "no-autolowpower", NULL)) can_low_power = 0; + of_node_put(np); if (can_low_power) { /* Enable automatic low-power */ sungem_phy_write(phy, 0x1c, 0x9002); From 58ebb1c8b35a8ef38cd6927431e0fa7b173a632d Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:12 -0700 Subject: [PATCH 007/121] tcp: Fix data-races around sysctl_tcp_dsack. While reading sysctl_tcp_dsack, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 07dbcbae7782..6fdad9505396 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4426,7 +4426,7 @@ static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq) { struct tcp_sock *tp = tcp_sk(sk); - if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { int mib_idx; if (before(seq, tp->rcv_nxt)) @@ -4473,7 +4473,7 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb) NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); - if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { u32 end_seq = TCP_SKB_CB(skb)->end_seq; tcp_rcv_spurious_retrans(sk, skb); From 02ca527ac5581cf56749db9fd03d854e842253dd Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:13 -0700 Subject: [PATCH 008/121] tcp: Fix a data-race around sysctl_tcp_app_win. While reading sysctl_tcp_app_win, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6fdad9505396..8cf4fbd349ab 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -534,7 +534,7 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb, */ static void tcp_init_buffer_space(struct sock *sk) { - int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win; + int tcp_app_win = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_app_win); struct tcp_sock *tp = tcp_sk(sk); int maxwin; From 36eeee75ef0157e42fb6593dcc65daab289b559e Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:14 -0700 Subject: [PATCH 009/121] tcp: Fix a data-race around sysctl_tcp_adv_win_scale. While reading sysctl_tcp_adv_win_scale, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- include/net/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 071735e10872..78a64e1b33a7 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1419,7 +1419,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, static inline int tcp_win_from_space(const struct sock *sk, int space) { - int tcp_adv_win_scale = sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale; + int tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale); return tcp_adv_win_scale <= 0 ? (space>>(-tcp_adv_win_scale)) : From 706c6202a3589f290e1ef9be0584a8f4a3cc0507 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:15 -0700 Subject: [PATCH 010/121] tcp: Fix a data-race around sysctl_tcp_frto. While reading sysctl_tcp_frto, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 8cf4fbd349ab..af376d7423d1 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2175,7 +2175,7 @@ void tcp_enter_loss(struct sock *sk) * loss recovery is underway except recurring timeout(s) on * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing */ - tp->frto = net->ipv4.sysctl_tcp_frto && + tp->frto = READ_ONCE(net->ipv4.sysctl_tcp_frto) && (new_recovery || icsk->icsk_retransmits) && !inet_csk(sk)->icsk_mtup.probe_size; } From 8499a2454d9e8a55ce616ede9f9580f36fd5b0f3 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:16 -0700 Subject: [PATCH 011/121] tcp: Fix a data-race around sysctl_tcp_nometrics_save. While reading sysctl_tcp_nometrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. 
Miller --- net/ipv4/tcp_metrics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index a501150deaa3..9dcc418a26f2 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -329,7 +329,7 @@ void tcp_update_metrics(struct sock *sk) int m; sk_dst_confirm(sk); - if (net->ipv4.sysctl_tcp_nometrics_save || !dst) + if (READ_ONCE(net->ipv4.sysctl_tcp_nometrics_save) || !dst) return; rcu_read_lock(); From ab1ba21b523ab496b1a4a8e396333b24b0a18f9a Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:17 -0700 Subject: [PATCH 012/121] tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save. While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 65e6d90168f3 ("net-tcp: Disable TCP ssthresh metrics cache by default") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_metrics.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 9dcc418a26f2..d58e672be31c 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -385,7 +385,7 @@ void tcp_update_metrics(struct sock *sk) if (tcp_in_initial_slowstart(tp)) { /* Slow start still did not finish. */ - if (!net->ipv4.sysctl_tcp_no_ssthresh_metrics_save && + if (!READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) && !tcp_metric_locked(tm, TCP_METRIC_SSTHRESH)) { val = tcp_metric_get(tm, TCP_METRIC_SSTHRESH); if (val && (tcp_snd_cwnd(tp) >> 1) > val) @@ -401,7 +401,7 @@ void tcp_update_metrics(struct sock *sk) } else if (!tcp_in_slow_start(tp) && icsk->icsk_ca_state == TCP_CA_Open) { /* Cong. avoidance phase, cwnd is reliable. */ - if (!net->ipv4.sysctl_tcp_no_ssthresh_metrics_save && + if (!READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) && !tcp_metric_locked(tm, TCP_METRIC_SSTHRESH)) tcp_metric_set(tm, TCP_METRIC_SSTHRESH, max(tcp_snd_cwnd(tp) >> 1, tp->snd_ssthresh)); @@ -418,7 +418,7 @@ void tcp_update_metrics(struct sock *sk) tcp_metric_set(tm, TCP_METRIC_CWND, (val + tp->snd_ssthresh) >> 1); } - if (!net->ipv4.sysctl_tcp_no_ssthresh_metrics_save && + if (!READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) && !tcp_metric_locked(tm, TCP_METRIC_SSTHRESH)) { val = tcp_metric_get(tm, TCP_METRIC_SSTHRESH); if (val && tp->snd_ssthresh > val) @@ -463,7 +463,7 @@ void tcp_init_metrics(struct sock *sk) if (tcp_metric_locked(tm, TCP_METRIC_CWND)) tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND); - val = net->ipv4.sysctl_tcp_no_ssthresh_metrics_save ? + val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ? 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH); if (val) { tp->snd_ssthresh = val; From 780476488844e070580bfc9e3bc7832ec1cea883 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:18 -0700 Subject: [PATCH 013/121] tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf. While reading sysctl_tcp_moderate_rcvbuf, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. 
Miller --- net/ipv4/tcp_input.c | 2 +- net/mptcp/protocol.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index af376d7423d1..debfff94f3af 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -724,7 +724,7 @@ void tcp_rcv_space_adjust(struct sock *sk) * */ - if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf && + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { int rcvmem, rcvbuf; u64 rcvwin, grow; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 21a3ed64226e..9bbd8cbe0acb 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1908,7 +1908,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) if (msk->rcvq_space.copied <= msk->rcvq_space.space) goto new_measure; - if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf && + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { int rcvmem, rcvbuf; u64 rcvwin, grow; From 0f1e4d06591d0a7907c71f7b6d1c79f8a4de8098 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:19 -0700 Subject: [PATCH 014/121] tcp: Fix data-races around sysctl_tcp_workaround_signed_windows. While reading sysctl_tcp_workaround_signed_windows, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 15d99e02baba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_output.c | 4 ++-- net/mptcp/options.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c38e07b50639..88f7d51e6691 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -230,7 +230,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, * which we interpret as a sign the remote TCP is not * misinterpreting the window field as a signed quantity. */ - if (sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows) + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)) (*rcv_wnd) = min(space, MAX_TCP_WINDOW); else (*rcv_wnd) = min_t(u32, space, U16_MAX); @@ -285,7 +285,7 @@ static u16 tcp_select_window(struct sock *sk) * scaled window. */ if (!tp->rx_opt.rcv_wscale && - sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows) + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)) new_win = min(new_win, MAX_TCP_WINDOW); else new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale)); diff --git a/net/mptcp/options.c b/net/mptcp/options.c index bd8f0f425be4..30d289044e71 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1271,7 +1271,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th) if (unlikely(th->syn)) new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale; if (!tp->rx_opt.rcv_wscale && - sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows) + READ_ONCE(sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)) new_win = min(new_win, MAX_TCP_WINDOW); else new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale)); From 9fb90193fbd66b4c5409ef729fd081861f8b6351 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:20 -0700 Subject: [PATCH 015/121] tcp: Fix a data-race around sysctl_tcp_limit_output_bytes. While reading sysctl_tcp_limit_output_bytes, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. 
Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 88f7d51e6691..80c9bed73337 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2507,7 +2507,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb, sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift)); if (sk->sk_pacing_status == SK_PACING_NONE) limit = min_t(unsigned long, limit, - sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes)); limit <<= factor; if (static_branch_unlikely(&tcp_tx_delay_enabled) && From db3815a2fa691da145cfbe834584f31ad75df9ff Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:21 -0700 Subject: [PATCH 016/121] tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit. While reading sysctl_tcp_challenge_ack_limit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index debfff94f3af..d386a69b1c05 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3629,7 +3629,7 @@ static void tcp_send_challenge_ack(struct sock *sk) /* Then check host-wide RFC 5961 rate limit. */ now = jiffies / HZ; if (now != challenge_timestamp) { - u32 ack_limit = net->ipv4.sysctl_tcp_challenge_ack_limit; + u32 ack_limit = READ_ONCE(net->ipv4.sysctl_tcp_challenge_ack_limit); u32 half = (ack_limit + 1) >> 1; challenge_timestamp = now; From e0bb4ab9dfddd872622239f49fb2bd403b70853b Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:22 -0700 Subject: [PATCH 017/121] tcp: Fix a data-race around sysctl_tcp_min_tso_segs. While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 80c9bed73337..6e29cf391a64 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1995,7 +1995,7 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now) min_tso = ca_ops->min_tso_segs ? ca_ops->min_tso_segs(sk) : - sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs; + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs); tso_segs = tcp_tso_autosize(sk, mss_now, min_tso); return min_t(u32, tso_segs, sk->sk_gso_max_segs); From 2455e61b85e9c99af38cd889a7101f1d48b33cb4 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:23 -0700 Subject: [PATCH 018/121] tcp: Fix a data-race around sysctl_tcp_tso_rtt_log. While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. 
Miller --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 6e29cf391a64..cf6713c9567e 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1976,7 +1976,7 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now, bytes = sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift); - r = tcp_min_rtt(tcp_sk(sk)) >> sock_net(sk)->ipv4.sysctl_tcp_tso_rtt_log; + r = tcp_min_rtt(tcp_sk(sk)) >> READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tso_rtt_log); if (r < BITS_PER_TYPE(sk->sk_gso_max_size)) bytes += sk->sk_gso_max_size >> r; From 1330ffacd05fc9ac4159d19286ce119e22450ed2 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:24 -0700 Subject: [PATCH 019/121] tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen. While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d386a69b1c05..96d11f8ab729 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3058,7 +3058,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una, static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag) { - u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ; + u32 wlen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen) * HZ; struct tcp_sock *tp = tcp_sk(sk); if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) { From 85225e6f0a76e6745bc841c9f25169c509b573d8 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:25 -0700 Subject: [PATCH 020/121] tcp: Fix a data-race around sysctl_tcp_autocorking. While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2faaaaf540ac..a11e5de3a4c3 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -686,7 +686,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb, int size_goal) { return skb->len < size_goal && - sock_net(sk)->ipv4.sysctl_tcp_autocorking && + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_autocorking) && !tcp_rtx_queue_empty(sk) && refcount_read(&sk->sk_wmem_alloc) > skb->truesize && tcp_skb_can_collapse_to(skb); From 2afdbe7b8de84c28e219073a6661080e1b3ded48 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Jul 2022 09:50:26 -0700 Subject: [PATCH 021/121] tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit. While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. 
Miller --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 96d11f8ab729..c799f39cb774 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3581,7 +3581,8 @@ static bool __tcp_oow_rate_limited(struct net *net, int mib_idx, if (*last_oow_ack_time) { s32 elapsed = (s32)(tcp_jiffies32 - *last_oow_ack_time); - if (0 <= elapsed && elapsed < net->ipv4.sysctl_tcp_invalid_ratelimit) { + if (0 <= elapsed && + elapsed < READ_ONCE(net->ipv4.sysctl_tcp_invalid_ratelimit)) { NET_INC_STATS(net, mib_idx); return true; /* rate-limited: don't send yet! */ } From 17161c341de0b02788b0428cb253a35b9a3c89b3 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 19 Jul 2022 15:50:59 -0600 Subject: [PATCH 022/121] dt-bindings: net: ethernet-controller: Rework 'fixed-link' schema While the if/then schemas mostly work, there's a few issues. The 'allOf' schema will also be true if 'fixed-link' is not an array or object as a false 'if' schema (without an 'else') will be true. In the array case doesn't set the type (uint32-array) in the 'then' clause. In the node case, 'additionalProperties' is missing. Rework the schema to use oneOf with each possible type. Signed-off-by: Rob Herring Signed-off-by: David S. Miller --- .../bindings/net/ethernet-controller.yaml | 105 +++++++++--------- 1 file changed, 50 insertions(+), 55 deletions(-) diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml index 4f15463611f8..170cd201adc2 100644 --- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml @@ -167,70 +167,65 @@ properties: - in-band-status fixed-link: - allOf: - - if: - type: array - then: - deprecated: true - items: - - minimum: 0 - maximum: 31 - description: - Emulated PHY ID, choose any but unique to the all - specified fixed-links + oneOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + deprecated: true + items: + - minimum: 0 + maximum: 31 + description: + Emulated PHY ID, choose any but unique to the all + specified fixed-links - - enum: [0, 1] - description: - Duplex configuration. 0 for half duplex or 1 for - full duplex + - enum: [0, 1] + description: + Duplex configuration. 0 for half duplex or 1 for + full duplex - - enum: [10, 100, 1000, 2500, 10000] - description: - Link speed in Mbits/sec. + - enum: [10, 100, 1000, 2500, 10000] + description: + Link speed in Mbits/sec. - - enum: [0, 1] - description: - Pause configuration. 0 for no pause, 1 for pause + - enum: [0, 1] + description: + Pause configuration. 0 for no pause, 1 for pause - - enum: [0, 1] - description: - Asymmetric pause configuration. 0 for no asymmetric - pause, 1 for asymmetric pause + - enum: [0, 1] + description: + Asymmetric pause configuration. 0 for no asymmetric + pause, 1 for asymmetric pause + - type: object + additionalProperties: false + properties: + speed: + description: + Link speed. + $ref: /schemas/types.yaml#/definitions/uint32 + enum: [10, 100, 1000, 2500, 10000] + full-duplex: + $ref: /schemas/types.yaml#/definitions/flag + description: + Indicates that full-duplex is used. When absent, half + duplex is assumed. - - if: - type: object - then: - properties: - speed: - description: - Link speed. 
- $ref: /schemas/types.yaml#/definitions/uint32 - enum: [10, 100, 1000, 2500, 10000] + pause: + $ref: /schemas/types.yaml#definitions/flag + description: + Indicates that pause should be enabled. - full-duplex: - $ref: /schemas/types.yaml#/definitions/flag - description: - Indicates that full-duplex is used. When absent, half - duplex is assumed. + asym-pause: + $ref: /schemas/types.yaml#/definitions/flag + description: + Indicates that asym_pause should be enabled. - pause: - $ref: /schemas/types.yaml#definitions/flag - description: - Indicates that pause should be enabled. + link-gpios: + maxItems: 1 + description: + GPIO to determine if the link is up - asym-pause: - $ref: /schemas/types.yaml#/definitions/flag - description: - Indicates that asym_pause should be enabled. - - link-gpios: - maxItems: 1 - description: - GPIO to determine if the link is up - - required: - - speed + required: + - speed additionalProperties: true From 030f21ba2ab14c221ff31cf22a16c78963328f6f Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 19 Jul 2022 15:51:08 -0600 Subject: [PATCH 023/121] dt-bindings: net: fsl,fec: Add missing types to phy-reset-* properties The phy-reset-* properties are missing type definitions and are not common properties. Even though they are deprecated, a type is needed. Signed-off-by: Rob Herring Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/fsl,fec.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/net/fsl,fec.yaml b/Documentation/devicetree/bindings/net/fsl,fec.yaml index daa2f79a294f..1b1853062cd3 100644 --- a/Documentation/devicetree/bindings/net/fsl,fec.yaml +++ b/Documentation/devicetree/bindings/net/fsl,fec.yaml @@ -183,6 +183,7 @@ properties: Should specify the gpio for phy reset. phy-reset-duration: + $ref: /schemas/types.yaml#/definitions/uint32 deprecated: true description: Reset duration in milliseconds. Should present only if property @@ -191,12 +192,14 @@ properties: and 1 millisecond will be used instead. phy-reset-active-high: + type: boolean deprecated: true description: If present then the reset sequence using the GPIO specified in the "phy-reset-gpios" property is reversed (H=reset state, L=operation state). phy-reset-post-delay: + $ref: /schemas/types.yaml#/definitions/uint32 deprecated: true description: Post reset delay in milliseconds. If present then a delay of phy-reset-post-delay From 8ee18e2a9e7b0e97730549f58dd618433c15811b Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Wed, 20 Jul 2022 21:49:46 +0200 Subject: [PATCH 024/121] caif: Fix bitmap data type in "struct caifsock" Bitmap are "unsigned long", so use it instead of a "u32" to make things more explicit. While at it, remove some useless cast (and leading spaces) when using the bitmap API. Signed-off-by: Christophe JAILLET Signed-off-by: David S. 
Miller --- net/caif/caif_socket.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c index 251e666ba9a2..748be7253248 100644 --- a/net/caif/caif_socket.c +++ b/net/caif/caif_socket.c @@ -47,7 +47,7 @@ enum caif_states { struct caifsock { struct sock sk; /* must be first member */ struct cflayer layer; - u32 flow_state; + unsigned long flow_state; struct caif_connect_request conn_req; struct mutex readlock; struct dentry *debugfs_socket_dir; @@ -56,38 +56,32 @@ struct caifsock { static int rx_flow_is_on(struct caifsock *cf_sk) { - return test_bit(RX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + return test_bit(RX_FLOW_ON_BIT, &cf_sk->flow_state); } static int tx_flow_is_on(struct caifsock *cf_sk) { - return test_bit(TX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + return test_bit(TX_FLOW_ON_BIT, &cf_sk->flow_state); } static void set_rx_flow_off(struct caifsock *cf_sk) { - clear_bit(RX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + clear_bit(RX_FLOW_ON_BIT, &cf_sk->flow_state); } static void set_rx_flow_on(struct caifsock *cf_sk) { - set_bit(RX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + set_bit(RX_FLOW_ON_BIT, &cf_sk->flow_state); } static void set_tx_flow_off(struct caifsock *cf_sk) { - clear_bit(TX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + clear_bit(TX_FLOW_ON_BIT, &cf_sk->flow_state); } static void set_tx_flow_on(struct caifsock *cf_sk) { - set_bit(TX_FLOW_ON_BIT, - (void *) &cf_sk->flow_state); + set_bit(TX_FLOW_ON_BIT, &cf_sk->flow_state); } static void caif_read_lock(struct sock *sk) From be76ceaf03bc04e74be5e28f608316b73c2b04ad Mon Sep 17 00:00:00 2001 From: Sherry Sun Date: Wed, 27 Apr 2022 09:51:36 +0800 Subject: [PATCH 025/121] EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw v3.x Synopsys EDAC DDR doesn't have the QOS Interrupt register. Use the ECC Clear Register to disable the error interrupts instead. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun Signed-off-by: Borislav Petkov Reviewed-by: Shubhrajyoti Datta Acked-by: Michal Simek Cc: Link: https://lore.kernel.org/r/20220427015137.8406-2-sherry.sun@nxp.com --- drivers/edac/synopsys_edac.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c index 1cee64b80a7e..1e38b677d8fd 100644 --- a/drivers/edac/synopsys_edac.c +++ b/drivers/edac/synopsys_edac.c @@ -852,8 +852,11 @@ static void enable_intr(struct synps_edac_priv *priv) static void disable_intr(struct synps_edac_priv *priv) { /* Disable UE/CE Interrupts */ - writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, - priv->baseaddr + DDR_QOS_IRQ_DB_OFST); + if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) + writel(0x0, priv->baseaddr + ECC_CLR_OFST); + else + writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, + priv->baseaddr + DDR_QOS_IRQ_DB_OFST); } static int setup_irq(struct mem_ctl_info *mci, From 4bcffe941758ee17becb43af3b25487f848f6512 Mon Sep 17 00:00:00 2001 From: Sherry Sun Date: Wed, 27 Apr 2022 09:51:37 +0800 Subject: [PATCH 026/121] EDAC/synopsys: Re-enable the error interrupts on v3 hw zynqmp_get_error_info() writes 0 to the ECC_CLR_OFST register after an interrupt for a {un-,}correctable error is raised, which disables the error interrupts. Then the interrupt handler will be called only once. Therefore, re-enable the error interrupt line at the end of intr_handler() for v3.x Synopsys EDAC DDR. 
Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun Signed-off-by: Borislav Petkov Reviewed-by: Shubhrajyoti Datta Acked-by: Michal Simek Cc: Link: https://lore.kernel.org/r/20220427015137.8406-3-sherry.sun@nxp.com --- drivers/edac/synopsys_edac.c | 47 +++++++++++++++++++----------------- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c index 1e38b677d8fd..f7d37c282819 100644 --- a/drivers/edac/synopsys_edac.c +++ b/drivers/edac/synopsys_edac.c @@ -514,6 +514,28 @@ static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p) memset(p, 0, sizeof(*p)); } +static void enable_intr(struct synps_edac_priv *priv) +{ + /* Enable UE/CE Interrupts */ + if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) + writel(DDR_UE_MASK | DDR_CE_MASK, + priv->baseaddr + ECC_CLR_OFST); + else + writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, + priv->baseaddr + DDR_QOS_IRQ_EN_OFST); + +} + +static void disable_intr(struct synps_edac_priv *priv) +{ + /* Disable UE/CE Interrupts */ + if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) + writel(0x0, priv->baseaddr + ECC_CLR_OFST); + else + writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, + priv->baseaddr + DDR_QOS_IRQ_DB_OFST); +} + /** * intr_handler - Interrupt Handler for ECC interrupts. * @irq: IRQ number. @@ -555,6 +577,9 @@ static irqreturn_t intr_handler(int irq, void *dev_id) /* v3.0 of the controller does not have this register */ if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST); + else + enable_intr(priv); + return IRQ_HANDLED; } @@ -837,28 +862,6 @@ static void mc_init(struct mem_ctl_info *mci, struct platform_device *pdev) init_csrows(mci); } -static void enable_intr(struct synps_edac_priv *priv) -{ - /* Enable UE/CE Interrupts */ - if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) - writel(DDR_UE_MASK | DDR_CE_MASK, - priv->baseaddr + ECC_CLR_OFST); - else - writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, - priv->baseaddr + DDR_QOS_IRQ_EN_OFST); - -} - -static void disable_intr(struct synps_edac_priv *priv) -{ - /* Disable UE/CE Interrupts */ - if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) - writel(0x0, priv->baseaddr + ECC_CLR_OFST); - else - writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, - priv->baseaddr + DDR_QOS_IRQ_DB_OFST); -} - static int setup_irq(struct mem_ctl_info *mci, struct platform_device *pdev) { From e2a619ca0b38f2114347b7078b8a67d72d457a3d Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Fri, 22 Jul 2022 13:07:11 +0200 Subject: [PATCH 027/121] asm-generic: remove a broken and needless ifdef conditional Commit 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB in the reference) in ./include/asm-generic/io.h. Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs: GENERIC_DEVMEM_IS_ALLOWED Referencing files: include/asm-generic/io.h The actual fix, though, is simply not to make this function declaration dependent on any kernel config. For architectures that intend to use the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will lead to picking the function definition, and for other architectures, this function is simply defined elsewhere. 
The wrong '#ifndef' on a non-existing config symbol also always had the same effect (although more by mistake than by intent). So, there is no functional change. Remove this broken and needless ifdef conditional. Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Lukas Bulwahn Signed-off-by: Arnd Bergmann --- include/asm-generic/io.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index 7ce93aaf69f8..98954dda5734 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -1125,9 +1125,7 @@ static inline void memcpy_toio(volatile void __iomem *addr, const void *buffer, } #endif -#ifndef CONFIG_GENERIC_DEVMEM_IS_ALLOWED extern int devmem_is_allowed(unsigned long pfn); -#endif #endif /* __KERNEL__ */ From c5cdb9286913aa5a5ebb81bcca0c17df3b0e2c79 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 22 Jul 2022 13:46:11 +0200 Subject: [PATCH 028/121] ARM: pxa2xx: Fix GPIO descriptor tables Laurence reports: "Kernel >5.18 on Zaurus has a bug where the power management code can't talk to devices, emitting the following errors: sharpsl-pm sharpsl-pm: Error: AC check failed: voltage -22. sharpsl-pm sharpsl-pm: Charging Error! sharpsl-pm sharpsl-pm: Warning: Cannot read main battery! Looking at the recent changes, I found that commit 31455bbda208 ("spi: pxa2xx_spi: Convert to use GPIO descriptors") replaced the deprecated SPI chip select platform device code with a gpiod lookup table. However, this didn't seem to work until I changed the `dev_id` member from the device name to the bus id. I'm not entirely sure why this is necessary, but I suspect it is related to the fact that in sysfs SPI devices are attached under /sys/devices/.../dev_name/spi_master/spiB/spiB.C, rather than directly to the device." After reviewing the change I conclude that the same fix is needed for all affected boards. 
Fixes: 31455bbda208 ("spi: pxa2xx_spi: Convert to use GPIO descriptors") Reported-by: Laurence de Bruxelles Signed-off-by: Linus Walleij Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220722114611.1517414-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann --- arch/arm/mach-pxa/corgi.c | 2 +- arch/arm/mach-pxa/hx4700.c | 2 +- arch/arm/mach-pxa/icontrol.c | 4 ++-- arch/arm/mach-pxa/littleton.c | 2 +- arch/arm/mach-pxa/magician.c | 2 +- arch/arm/mach-pxa/spitz.c | 2 +- arch/arm/mach-pxa/z2.c | 4 ++-- 7 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm/mach-pxa/corgi.c b/arch/arm/mach-pxa/corgi.c index c546356d0f02..5738496717e2 100644 --- a/arch/arm/mach-pxa/corgi.c +++ b/arch/arm/mach-pxa/corgi.c @@ -549,7 +549,7 @@ static struct pxa2xx_spi_controller corgi_spi_info = { }; static struct gpiod_lookup_table corgi_spi_gpio_table = { - .dev_id = "pxa2xx-spi.1", + .dev_id = "spi1", .table = { GPIO_LOOKUP_IDX("gpio-pxa", CORGI_GPIO_ADS7846_CS, "cs", 0, GPIO_ACTIVE_LOW), GPIO_LOOKUP_IDX("gpio-pxa", CORGI_GPIO_LCDCON_CS, "cs", 1, GPIO_ACTIVE_LOW), diff --git a/arch/arm/mach-pxa/hx4700.c b/arch/arm/mach-pxa/hx4700.c index 2ae06edf413c..2fd665944103 100644 --- a/arch/arm/mach-pxa/hx4700.c +++ b/arch/arm/mach-pxa/hx4700.c @@ -635,7 +635,7 @@ static struct pxa2xx_spi_controller pxa_ssp2_master_info = { }; static struct gpiod_lookup_table pxa_ssp2_gpio_table = { - .dev_id = "pxa2xx-spi.2", + .dev_id = "spi2", .table = { GPIO_LOOKUP_IDX("gpio-pxa", GPIO88_HX4700_TSC2046_CS, "cs", 0, GPIO_ACTIVE_LOW), { }, diff --git a/arch/arm/mach-pxa/icontrol.c b/arch/arm/mach-pxa/icontrol.c index 753fe166ab68..624088257cfc 100644 --- a/arch/arm/mach-pxa/icontrol.c +++ b/arch/arm/mach-pxa/icontrol.c @@ -140,7 +140,7 @@ struct platform_device pxa_spi_ssp4 = { }; static struct gpiod_lookup_table pxa_ssp3_gpio_table = { - .dev_id = "pxa2xx-spi.3", + .dev_id = "spi3", .table = { GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS1, "cs", 0, GPIO_ACTIVE_LOW), GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS2, "cs", 1, GPIO_ACTIVE_LOW), @@ -149,7 +149,7 @@ static struct gpiod_lookup_table pxa_ssp3_gpio_table = { }; static struct gpiod_lookup_table pxa_ssp4_gpio_table = { - .dev_id = "pxa2xx-spi.4", + .dev_id = "spi4", .table = { GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS3, "cs", 0, GPIO_ACTIVE_LOW), GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS4, "cs", 1, GPIO_ACTIVE_LOW), diff --git a/arch/arm/mach-pxa/littleton.c b/arch/arm/mach-pxa/littleton.c index f98dc61e87af..98423a96f440 100644 --- a/arch/arm/mach-pxa/littleton.c +++ b/arch/arm/mach-pxa/littleton.c @@ -207,7 +207,7 @@ static struct spi_board_info littleton_spi_devices[] __initdata = { }; static struct gpiod_lookup_table littleton_spi_gpio_table = { - .dev_id = "pxa2xx-spi.2", + .dev_id = "spi2", .table = { GPIO_LOOKUP_IDX("gpio-pxa", LITTLETON_GPIO_LCD_CS, "cs", 0, GPIO_ACTIVE_LOW), { }, diff --git a/arch/arm/mach-pxa/magician.c b/arch/arm/mach-pxa/magician.c index 20456a55c4c5..0827ebca1d38 100644 --- a/arch/arm/mach-pxa/magician.c +++ b/arch/arm/mach-pxa/magician.c @@ -994,7 +994,7 @@ static struct pxa2xx_spi_controller magician_spi_info = { }; static struct gpiod_lookup_table magician_spi_gpio_table = { - .dev_id = "pxa2xx-spi.2", + .dev_id = "spi2", .table = { /* NOTICE must be GPIO, incompatibility with hw PXA SPI framing */ GPIO_LOOKUP_IDX("gpio-pxa", GPIO14_MAGICIAN_TSC2046_CS, "cs", 0, GPIO_ACTIVE_LOW), diff --git a/arch/arm/mach-pxa/spitz.c b/arch/arm/mach-pxa/spitz.c index dd88953adc9d..9964729cd428 100644 --- 
a/arch/arm/mach-pxa/spitz.c +++ b/arch/arm/mach-pxa/spitz.c @@ -578,7 +578,7 @@ static struct pxa2xx_spi_controller spitz_spi_info = { }; static struct gpiod_lookup_table spitz_spi_gpio_table = { - .dev_id = "pxa2xx-spi.2", + .dev_id = "spi2", .table = { GPIO_LOOKUP_IDX("gpio-pxa", SPITZ_GPIO_ADS7846_CS, "cs", 0, GPIO_ACTIVE_LOW), GPIO_LOOKUP_IDX("gpio-pxa", SPITZ_GPIO_LCDCON_CS, "cs", 1, GPIO_ACTIVE_LOW), diff --git a/arch/arm/mach-pxa/z2.c b/arch/arm/mach-pxa/z2.c index d03520555497..c4d4162a7e6e 100644 --- a/arch/arm/mach-pxa/z2.c +++ b/arch/arm/mach-pxa/z2.c @@ -623,7 +623,7 @@ static struct pxa2xx_spi_controller pxa_ssp2_master_info = { }; static struct gpiod_lookup_table pxa_ssp1_gpio_table = { - .dev_id = "pxa2xx-spi.1", + .dev_id = "spi1", .table = { GPIO_LOOKUP_IDX("gpio-pxa", GPIO24_ZIPITZ2_WIFI_CS, "cs", 0, GPIO_ACTIVE_LOW), { }, @@ -631,7 +631,7 @@ static struct gpiod_lookup_table pxa_ssp1_gpio_table = { }; static struct gpiod_lookup_table pxa_ssp2_gpio_table = { - .dev_id = "pxa2xx-spi.2", + .dev_id = "spi2", .table = { GPIO_LOOKUP_IDX("gpio-pxa", GPIO88_ZIPITZ2_LCD_CS, "cs", 0, GPIO_ACTIVE_LOW), { }, From 88bd24d73d5bfa1b7b97a9221ff320fc44ef401a Mon Sep 17 00:00:00 2001 From: Emil Renner Berthing Date: Sat, 25 Jun 2022 16:42:07 +0100 Subject: [PATCH 029/121] riscv: compat: vdso: Fix vdso_install target When CONFIG_COMPAT=y the vdso_install target fails: $ make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- vdso_install INSTALL vdso.so make[1]: *** No rule to make target 'vdso_install'. Stop. make: *** [arch/riscv/Makefile:112: vdso_install] Error 2 The problem is that arch/riscv/kernel/compat_vdso/Makefile doesn't have a vdso_install target, but instead calls it compat_vdso_install. Signed-off-by: Emil Renner Berthing Link: https://lore.kernel.org/r/20220625154207.80972-1-emil.renner.berthing@canonical.com Fixes: 0715372a06ce ("riscv: compat: vdso: Add COMPAT_VDSO base code implementation") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt --- arch/riscv/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index a4c46a03d2e2..81029d40a672 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -111,7 +111,7 @@ PHONY += vdso_install vdso_install: $(Q)$(MAKE) $(build)=arch/riscv/kernel/vdso $@ $(if $(CONFIG_COMPAT),$(Q)$(MAKE) \ - $(build)=arch/riscv/kernel/compat_vdso $@) + $(build)=arch/riscv/kernel/compat_vdso compat_$@) ifeq ($(KBUILD_EXTMOD),) ifeq ($(CONFIG_MMU),y) From 4d8f24eeedc58d5f87b650ddda73c16e8ba56559 Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Thu, 21 Jul 2022 20:44:04 +0000 Subject: [PATCH 030/121] Revert "tcp: change pingpong threshold to 3" This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. 
Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang Suggested-by: Neal Cardwell Signed-off-by: Wei Wang Acked-by: Neal Cardwell Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski --- include/net/inet_connection_sock.h | 10 +--------- net/ipv4/tcp_output.c | 15 ++++++--------- 2 files changed, 7 insertions(+), 18 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 85cd695e7fd1..ee88f0f1350f 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -321,7 +321,7 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb, struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu); -#define TCP_PINGPONG_THRESH 3 +#define TCP_PINGPONG_THRESH 1 static inline void inet_csk_enter_pingpong_mode(struct sock *sk) { @@ -338,14 +338,6 @@ static inline bool inet_csk_in_pingpong_mode(struct sock *sk) return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH; } -static inline void inet_csk_inc_pingpong_cnt(struct sock *sk) -{ - struct inet_connection_sock *icsk = inet_csk(sk); - - if (icsk->icsk_ack.pingpong < U8_MAX) - icsk->icsk_ack.pingpong++; -} - static inline bool inet_csk_has_ulp(struct sock *sk) { return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index cf6713c9567e..2efe41c84ee8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -167,16 +167,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp, if (tcp_packets_in_flight(tp) == 0) tcp_ca_event(sk, CA_EVENT_TX_START); - /* If this is the first data packet sent in response to the - * previous received data, - * and it is a reply for ato after last received packet, - * increase pingpong count. - */ - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) - inet_csk_inc_pingpong_cnt(sk); - tp->lsndtime = now; + + /* If it is a reply for ato after last received + * packet, enter pingpong mode. + */ + if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) + inet_csk_enter_pingpong_mode(sk); } /* Account for an ACK we sent. */ From f6336724a4d4220c89a4ec38bca84b03b178b1a3 Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Thu, 21 Jul 2022 12:11:27 +0300 Subject: [PATCH 031/121] net/tls: Remove the context from the list in tls_device_down tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Signed-off-by: David S. 
Miller --- net/tls/tls_device.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 879b9024678e..9975df34d9c2 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -1376,8 +1376,13 @@ static int tls_device_down(struct net_device *netdev) * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW. * Now release the ref taken above. */ - if (refcount_dec_and_test(&ctx->refcount)) + if (refcount_dec_and_test(&ctx->refcount)) { + /* sk_destruct ran after tls_device_down took a ref, and + * it returned early. Complete the destruction here. + */ + list_del(&ctx->list); tls_device_free_ctx(ctx); + } } up_write(&device_offload_lock); From aa709da0e032cee7c202047ecd75f437bb0126ed Mon Sep 17 00:00:00 2001 From: Xin Long Date: Thu, 21 Jul 2022 10:35:46 -0400 Subject: [PATCH 032/121] Documentation: fix sctp_wmem in ip-sysctl.rst Since commit 1033990ac5b2 ("sctp: implement memory accounting on tx path"), SCTP has supported memory accounting on tx path where 'sctp_wmem' is used by sk_wmem_schedule(). So we should fix the description for this option in ip-sysctl.rst accordingly. v1->v2: - Improve the description as Marcelo suggested. Fixes: 1033990ac5b2 ("sctp: implement memory accounting on tx path") Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.rst | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 66c72230eaad..d7a1bf1a55b5 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2866,7 +2866,14 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max Default: 4K sctp_wmem - vector of 3 INTEGERs: min, default, max - Currently this tunable has no effect. + Only the first value ("min") is used, "default" and "max" are + ignored. + + min: Minimum size of send buffer that can be used by SCTP sockets. + It is guaranteed to each SCTP socket (but not association) even + under moderate memory pressure. + + Default: 4K addr_scope_policy - INTEGER Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 From d6c52fa3e955b97f8eb3ac824d2a3e0af147b3ce Mon Sep 17 00:00:00 2001 From: Tobias Gruetzmacher Date: Fri, 22 Jul 2022 19:05:57 +0200 Subject: [PATCH 033/121] nvme-pci: Crucial P2 has bogus namespace ids This adds a quirk for the Crucial P2. Signed-off-by: Tobias Gruetzmacher Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 58c72d55769a..73d9fcba3b1c 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3515,6 +3515,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1e49, 0x0041), /* ZHITAI TiPro7000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, + { PCI_DEVICE(0xc0a9, 0x540a), /* Crucial P2 */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061), .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065), From af35f95aca69a86058d480a63f4e096f0220905c Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Fri, 22 Jul 2022 16:20:27 +0800 Subject: [PATCH 034/121] nfp: bpf: Fix typo 'the the' in comment Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao Acked-by: Simon Horman Signed-off-by: David S. 
Miller --- drivers/net/ethernet/netronome/nfp/bpf/jit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c index e31f8fbbc696..df2ab5cbd49b 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c @@ -4233,7 +4233,7 @@ static void nfp_bpf_opt_ldst_gather(struct nfp_prog *nfp_prog) } /* If the chain is ended by an load/store pair then this - * could serve as the new head of the the next chain. + * could serve as the new head of the next chain. */ if (curr_pair_is_memcpy(meta1, meta2)) { head_ld_meta = meta1; From 2540d3c99926c234718e058acdd956d7c614eddd Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Fri, 22 Jul 2022 16:22:27 +0800 Subject: [PATCH 035/121] net: ipa: Fix typo 'the the' in comment Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao Signed-off-by: David S. Miller --- drivers/net/ipa/ipa_qmi_msg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ipa/ipa_qmi_msg.h b/drivers/net/ipa/ipa_qmi_msg.h index 3233d145fd87..495e85abe50b 100644 --- a/drivers/net/ipa/ipa_qmi_msg.h +++ b/drivers/net/ipa/ipa_qmi_msg.h @@ -214,7 +214,7 @@ struct ipa_init_modem_driver_req { /* The response to a IPA_QMI_INIT_DRIVER request begins with a standard * QMI response, but contains other information as well. Currently we - * simply wait for the the INIT_DRIVER transaction to complete and + * simply wait for the INIT_DRIVER transaction to complete and * ignore any other data that might be returned. */ struct ipa_init_modem_driver_rsp { From 1aaa62c4838a140d0592935c51985158963d5971 Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Fri, 22 Jul 2022 17:38:34 +0800 Subject: [PATCH 036/121] s390/qeth: Fix typo 'the the' in comment Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao Signed-off-by: David S. Miller --- drivers/s390/net/qeth_core_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 9e54fe76a9b2..35d4b398c197 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -3565,7 +3565,7 @@ static void qeth_flush_buffers(struct qeth_qdio_out_q *queue, int index, if (!atomic_read(&queue->set_pci_flags_count)) { /* * there's no outstanding PCI any more, so we - * have to request a PCI to be sure the the PCI + * have to request a PCI to be sure the PCI * will wake at some time in the future then we * can flush packed buffers that might still be * hanging around, which can happen if no From f46040eeaf2e523a4096199fd93a11e794818009 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 22 Jul 2022 11:16:27 +0200 Subject: [PATCH 037/121] macsec: fix NULL deref in macsec_add_rxsa Commit 48ef50fa866a added a test on tb_sa[MACSEC_SA_ATTR_PN], but nothing guarantees that it's not NULL at this point. The same code was added to macsec_add_txsa, but there it's not a problem because validate_add_txsa checks that the MACSEC_SA_ATTR_PN attribute is present. Note: it's not possible to reproduce with iproute, because iproute doesn't allow creating an SA without specifying the PN. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208315 Reported-by: Frantisek Sumsal Signed-off-by: Sabrina Dubroca Signed-off-by: David S. 
Miller --- drivers/net/macsec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 817577e713d7..769a1eca6bd8 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1753,7 +1753,8 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info) } pn_len = secy->xpn ? MACSEC_XPN_PN_LEN : MACSEC_DEFAULT_PN_LEN; - if (nla_len(tb_sa[MACSEC_SA_ATTR_PN]) != pn_len) { + if (tb_sa[MACSEC_SA_ATTR_PN] && + nla_len(tb_sa[MACSEC_SA_ATTR_PN]) != pn_len) { pr_notice("macsec: nl: add_rxsa: bad pn length: %d != %d\n", nla_len(tb_sa[MACSEC_SA_ATTR_PN]), pn_len); rtnl_unlock(); From 3240eac4ff20e51b87600dbd586ed814daf313db Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 22 Jul 2022 11:16:28 +0200 Subject: [PATCH 038/121] macsec: fix error message in macsec_add_rxsa and _txsa The expected length is MACSEC_SALT_LEN, not MACSEC_SA_ATTR_SALT. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller --- drivers/net/macsec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 769a1eca6bd8..634452d3ecc5 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1770,7 +1770,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info) if (nla_len(tb_sa[MACSEC_SA_ATTR_SALT]) != MACSEC_SALT_LEN) { pr_notice("macsec: nl: add_rxsa: bad salt length: %d != %d\n", nla_len(tb_sa[MACSEC_SA_ATTR_SALT]), - MACSEC_SA_ATTR_SALT); + MACSEC_SALT_LEN); rtnl_unlock(); return -EINVAL; } @@ -2012,7 +2012,7 @@ static int macsec_add_txsa(struct sk_buff *skb, struct genl_info *info) if (nla_len(tb_sa[MACSEC_SA_ATTR_SALT]) != MACSEC_SALT_LEN) { pr_notice("macsec: nl: add_txsa: bad salt length: %d != %d\n", nla_len(tb_sa[MACSEC_SA_ATTR_SALT]), - MACSEC_SA_ATTR_SALT); + MACSEC_SALT_LEN); rtnl_unlock(); return -EINVAL; } From b07a0e2044057f201d694ab474f5c42a02b6465b Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 22 Jul 2022 11:16:29 +0200 Subject: [PATCH 039/121] macsec: limit replay window size with XPN IEEE 802.1AEbw-2013 (section 10.7.8) specifies that the maximum value of the replay window is 2^30-1, to help with recovery of the upper bits of the PN. To avoid leaving the existing macsec device in an inconsistent state if this test fails during changelink, reuse the cleanup mechanism introduced for HW offload. This wasn't needed until now because macsec_changelink_common could not fail during changelink, as modifying the cipher suite was not allowed. Finally, this must happen after handling IFLA_MACSEC_CIPHER_SUITE so that secy->xpn is set. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. 
Miller --- drivers/net/macsec.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 634452d3ecc5..b3834e353c22 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -243,6 +243,7 @@ static struct macsec_cb *macsec_skb_cb(struct sk_buff *skb) #define DEFAULT_SEND_SCI true #define DEFAULT_ENCRYPT false #define DEFAULT_ENCODING_SA 0 +#define MACSEC_XPN_MAX_REPLAY_WINDOW (((1 << 30) - 1)) static bool send_sci(const struct macsec_secy *secy) { @@ -3746,9 +3747,6 @@ static int macsec_changelink_common(struct net_device *dev, secy->operational = tx_sa && tx_sa->active; } - if (data[IFLA_MACSEC_WINDOW]) - secy->replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]); - if (data[IFLA_MACSEC_ENCRYPT]) tx_sc->encrypt = !!nla_get_u8(data[IFLA_MACSEC_ENCRYPT]); @@ -3794,6 +3792,16 @@ static int macsec_changelink_common(struct net_device *dev, } } + if (data[IFLA_MACSEC_WINDOW]) { + secy->replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]); + + /* IEEE 802.1AEbw-2013 10.7.8 - maximum replay window + * for XPN cipher suites */ + if (secy->xpn && + secy->replay_window > MACSEC_XPN_MAX_REPLAY_WINDOW) + return -EINVAL; + } + return 0; } @@ -3823,7 +3831,7 @@ static int macsec_changelink(struct net_device *dev, struct nlattr *tb[], ret = macsec_changelink_common(dev, data); if (ret) - return ret; + goto cleanup; /* If h/w offloading is available, propagate to the device */ if (macsec_is_offloaded(macsec)) { From c630d1fe6219769049c87d1a6a0e9a6de55328a1 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 22 Jul 2022 11:16:30 +0200 Subject: [PATCH 040/121] macsec: always read MACSEC_SA_ATTR_PN as a u64 Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a u32, sometimes forced into a u64 without checking the actual length of the attribute. Instead, we can use nla_get_u64 everywhere, which will read up to 64 bits into a u64, capped by the actual length of the attribute coming from userspace. This fixes several issues: - the check in validate_add_rxsa doesn't work with 32-bit attributes - the checks in validate_add_txsa and validate_upd_sa incorrectly reject X << 32 (with X != 0) Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. 
Miller --- drivers/net/macsec.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index b3834e353c22..95578f04f212 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1698,7 +1698,7 @@ static bool validate_add_rxsa(struct nlattr **attrs) return false; if (attrs[MACSEC_SA_ATTR_PN] && - *(u64 *)nla_data(attrs[MACSEC_SA_ATTR_PN]) == 0) + nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0) return false; if (attrs[MACSEC_SA_ATTR_ACTIVE]) { @@ -1941,7 +1941,7 @@ static bool validate_add_txsa(struct nlattr **attrs) if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN) return false; - if (nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0) + if (nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0) return false; if (attrs[MACSEC_SA_ATTR_ACTIVE]) { @@ -2295,7 +2295,7 @@ static bool validate_upd_sa(struct nlattr **attrs) if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN) return false; - if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0) + if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0) return false; if (attrs[MACSEC_SA_ATTR_ACTIVE]) { From c7b205fbbf3cffa374721bb7623f7aa8c46074f1 Mon Sep 17 00:00:00 2001 From: Jianglei Nie Date: Fri, 22 Jul 2022 17:29:02 +0800 Subject: [PATCH 041/121] net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa() init_rx_sa() allocates relevant resource for rx_sa->stats and rx_sa-> key.tfm with alloc_percpu() and macsec_alloc_tfm(). When some error occurs after init_rx_sa() is called in macsec_add_rxsa(), the function released rx_sa with kfree() without releasing rx_sa->stats and rx_sa-> key.tfm, which will lead to a resource leak. We should call macsec_rxsa_put() instead of kfree() to decrease the ref count of rx_sa and release the relevant resource if the refcount is 0. The same bug exists in macsec_add_txsa() for tx_sa as well. This patch fixes the above two bugs. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Jianglei Nie Signed-off-by: David S. Miller --- drivers/net/macsec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 95578f04f212..f354fad05714 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1844,7 +1844,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info) return 0; cleanup: - kfree(rx_sa); + macsec_rxsa_put(rx_sa); rtnl_unlock(); return err; } @@ -2087,7 +2087,7 @@ static int macsec_add_txsa(struct sk_buff *skb, struct genl_info *info) cleanup: secy->operational = was_operational; - kfree(tx_sa); + macsec_txsa_put(tx_sa); rtnl_unlock(); return err; } From 3e7d18b9dca388940a19cae30bfc1f76dccd8c28 Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Fri, 22 Jul 2022 17:06:35 +0000 Subject: [PATCH 042/121] net: mld: fix reference count leak in mld_{query | report}_work() mld_{query | report}_work() processes queued events. If there are too many events in the queue, it re-queue a work. And then, it returns without in6_dev_put(). But if queuing is failed, it should call in6_dev_put(), but it doesn't. So, a reference count leak would occur. 
THREAD0 THREAD1 mld_report_work() spin_lock_bh() if (!mod_delayed_work()) in6_dev_hold(); spin_unlock_bh() spin_lock_bh() schedule_delayed_work() spin_unlock_bh() Script to reproduce(by Hangbin Liu): ip netns add ns1 ip netns add ns2 ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip -n ns1 link add veth0 type veth peer name veth0 netns ns2 ip -n ns1 link set veth0 up ip -n ns2 link set veth0 up for i in `seq 50`; do for j in `seq 100`; do ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0 ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0 done done modprobe -r veth ip -a netns del splat looks like: unregister_netdevice: waiting for veth0 to become free. Usage count = 2 leaked reference. ipv6_add_dev+0x324/0xec0 addrconf_notify+0x481/0xd10 raw_notifier_call_chain+0xe3/0x120 call_netdevice_notifiers+0x106/0x160 register_netdevice+0x114c/0x16b0 veth_newlink+0x48b/0xa50 [veth] rtnl_newlink+0x11a2/0x1a40 rtnetlink_rcv_msg+0x63f/0xc00 netlink_rcv_skb+0x1df/0x3e0 netlink_unicast+0x5de/0x850 netlink_sendmsg+0x6c9/0xa90 ____sys_sendmsg+0x76a/0x780 __sys_sendmsg+0x27c/0x340 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Tested-by: Hangbin Liu Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Taehee Yoo Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/mcast.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 7f695c39d9a8..87c699d57b36 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1522,7 +1522,6 @@ static void mld_query_work(struct work_struct *work) if (++cnt >= MLD_MAX_QUEUE) { rework = true; - schedule_delayed_work(&idev->mc_query_work, 0); break; } } @@ -1533,8 +1532,10 @@ static void mld_query_work(struct work_struct *work) __mld_query_work(skb); mutex_unlock(&idev->mc_lock); - if (!rework) - in6_dev_put(idev); + if (rework && queue_delayed_work(mld_wq, &idev->mc_query_work, 0)) + return; + + in6_dev_put(idev); } /* called with rcu_read_lock() */ @@ -1624,7 +1625,6 @@ static void mld_report_work(struct work_struct *work) if (++cnt >= MLD_MAX_QUEUE) { rework = true; - schedule_delayed_work(&idev->mc_report_work, 0); break; } } @@ -1635,8 +1635,10 @@ static void mld_report_work(struct work_struct *work) __mld_report_work(skb); mutex_unlock(&idev->mc_lock); - if (!rework) - in6_dev_put(idev); + if (rework && queue_delayed_work(mld_wq, &idev->mc_report_work, 0)) + return; + + in6_dev_put(idev); } static bool is_in(struct ifmcaddr6 *pmc, struct ip6_sf_list *psf, int type, From 59bf6c65a09fff74215517aecffbbdcd67df76e3 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:21:59 -0700 Subject: [PATCH 043/121] tcp: Fix data-races around sk_pacing_rate. While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 43e122b014c9 ("tcp: refine pacing rate determination") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c799f39cb774..dd05238f79f6 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -910,9 +910,9 @@ static void tcp_update_pacing_rate(struct sock *sk) * end of slow start and should slow down. 
*/ if (tcp_snd_cwnd(tp) < tp->snd_ssthresh / 2) - rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio; + rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio); else - rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio; + rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio); rate *= max(tcp_snd_cwnd(tp), tp->packets_out); From 02739545951ad4c1215160db7fbf9b7a918d3c0b Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:00 -0700 Subject: [PATCH 044/121] net: Fix data-races around sysctl_[rw]mem(_offset)?. While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- include/net/sock.h | 8 ++++---- net/decnet/af_decnet.c | 4 ++-- net/ipv4/tcp.c | 6 +++--- net/ipv4/tcp_input.c | 13 +++++++------ net/ipv4/tcp_output.c | 2 +- net/mptcp/protocol.c | 6 +++--- net/tipc/socket.c | 2 +- 7 files changed, 21 insertions(+), 20 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 9fa54762e077..7a48991cdb19 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2843,18 +2843,18 @@ static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ if (proto->sysctl_wmem_offset) - return *(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset); + return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset)); - return *proto->sysctl_wmem; + return READ_ONCE(*proto->sysctl_wmem); } static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_rmem ? 
*/ if (proto->sysctl_rmem_offset) - return *(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset); + return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset)); - return *proto->sysctl_rmem; + return READ_ONCE(*proto->sysctl_rmem); } /* Default TCP Small queue budget is ~1 ms of data (1sec >> 10) diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index dc92a67baea3..7d542eb46172 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -480,8 +480,8 @@ static struct sock *dn_alloc_sock(struct net *net, struct socket *sock, gfp_t gf sk->sk_family = PF_DECnet; sk->sk_protocol = 0; sk->sk_allocation = gfp; - sk->sk_sndbuf = sysctl_decnet_wmem[1]; - sk->sk_rcvbuf = sysctl_decnet_rmem[1]; + sk->sk_sndbuf = READ_ONCE(sysctl_decnet_wmem[1]); + sk->sk_rcvbuf = READ_ONCE(sysctl_decnet_rmem[1]); /* Initialization of DECnet Session Control Port */ scp = DN_SK(sk); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a11e5de3a4c3..002a4a04efbe 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -452,8 +452,8 @@ void tcp_init_sock(struct sock *sk) icsk->icsk_sync_mss = tcp_sync_mss; - WRITE_ONCE(sk->sk_sndbuf, sock_net(sk)->ipv4.sysctl_tcp_wmem[1]); - WRITE_ONCE(sk->sk_rcvbuf, sock_net(sk)->ipv4.sysctl_tcp_rmem[1]); + WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1])); + WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1])); sk_sockets_allocated_inc(sk); } @@ -1724,7 +1724,7 @@ int tcp_set_rcvlowat(struct sock *sk, int val) if (sk->sk_userlocks & SOCK_RCVBUF_LOCK) cap = sk->sk_rcvbuf >> 1; else - cap = sock_net(sk)->ipv4.sysctl_tcp_rmem[2] >> 1; + cap = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1; val = min(val, cap); WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index dd05238f79f6..ff2e0d87aee4 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -426,7 +426,7 @@ static void tcp_sndbuf_expand(struct sock *sk) if (sk->sk_sndbuf < sndmem) WRITE_ONCE(sk->sk_sndbuf, - min(sndmem, sock_net(sk)->ipv4.sysctl_tcp_wmem[2])); + min(sndmem, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[2]))); } /* 2. Tuning advertised window (window_clamp, rcv_ssthresh) @@ -461,7 +461,7 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb, struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! 
*/ int truesize = tcp_win_from_space(sk, skbtruesize) >> 1; - int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1; + int window = tcp_win_from_space(sk, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])) >> 1; while (tp->rcv_ssthresh <= window) { if (truesize <= skb->len) @@ -574,16 +574,17 @@ static void tcp_clamp_window(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); struct net *net = sock_net(sk); + int rmem2; icsk->icsk_ack.quick = 0; + rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]); - if (sk->sk_rcvbuf < net->ipv4.sysctl_tcp_rmem[2] && + if (sk->sk_rcvbuf < rmem2 && !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) && !tcp_under_memory_pressure(sk) && sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) { WRITE_ONCE(sk->sk_rcvbuf, - min(atomic_read(&sk->sk_rmem_alloc), - net->ipv4.sysctl_tcp_rmem[2])); + min(atomic_read(&sk->sk_rmem_alloc), rmem2)); } if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss); @@ -745,7 +746,7 @@ void tcp_rcv_space_adjust(struct sock *sk) do_div(rcvwin, tp->advmss); rcvbuf = min_t(u64, rcvwin * rcvmem, - sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); if (rcvbuf > sk->sk_rcvbuf) { WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 2efe41c84ee8..4c376b6d8764 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -238,7 +238,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, *rcv_wscale = 0; if (wscale_ok) { /* Set window scaling on max possible window */ - space = max_t(u32, space, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); space = max_t(u32, space, sysctl_rmem_max); space = min_t(u32, space, *window_clamp); *rcv_wscale = clamp_t(int, ilog2(space) - 15, diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 9bbd8cbe0acb..7e1518bb6115 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1926,7 +1926,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) do_div(rcvwin, advmss); rcvbuf = min_t(u64, rcvwin * rcvmem, - sock_net(sk)->ipv4.sysctl_tcp_rmem[2]); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); if (rcvbuf > sk->sk_rcvbuf) { u32 window_clamp; @@ -2669,8 +2669,8 @@ static int mptcp_init_sock(struct sock *sk) mptcp_ca_reset(sk); sk_sockets_allocated_inc(sk); - sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1]; - sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[1]; + sk->sk_rcvbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]); + sk->sk_sndbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]); return 0; } diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 43509c7e90fc..f1c3b8eb4b3d 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -517,7 +517,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock, timer_setup(&sk->sk_timer, tipc_sk_timeout, 0); sk->sk_shutdown = 0; sk->sk_backlog_rcv = tipc_sk_backlog_rcv; - sk->sk_rcvbuf = sysctl_tipc_rmem[1]; + sk->sk_rcvbuf = READ_ONCE(sysctl_tipc_rmem[1]); sk->sk_data_ready = tipc_data_ready; sk->sk_write_space = tipc_write_space; sk->sk_destruct = tipc_sock_destruct; From 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:01 -0700 Subject: [PATCH 045/121] tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns. 
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ff2e0d87aee4..813744180d87 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5542,7 +5542,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) if (tp->srtt_us && tp->srtt_us < rtt) rtt = tp->srtt_us; - delay = min_t(unsigned long, sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns, + delay = min_t(unsigned long, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns), rtt * (NSEC_PER_USEC >> 3)/20); sock_hold(sk); hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay), From 22396941a7f343d704738360f9ef0e6576489d43 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:02 -0700 Subject: [PATCH 046/121] tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns. While reading sysctl_tcp_comp_sack_slack_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 813744180d87..a4d8851d83ff 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5547,7 +5547,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) rtt * (NSEC_PER_USEC >> 3)/20); sock_hold(sk); hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay), - sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns), HRTIMER_MODE_REL_PINNED_SOFT); } From 79f55473bfc8ac51bd6572929a679eeb4da22251 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:03 -0700 Subject: [PATCH 047/121] tcp: Fix a data-race around sysctl_tcp_comp_sack_nr. While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a4d8851d83ff..b1637990d570 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5521,7 +5521,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) } if (!tcp_is_sack(tp) || - tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr) + tp->compressed_ack >= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)) goto send_now; if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) { From 870e3a634b6a6cb1543b359007aca73fe6a03ac5 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:04 -0700 Subject: [PATCH 048/121] tcp: Fix data-races around sysctl_tcp_reflect_tos. While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima Acked-by: Wei Wang Signed-off-by: David S. 
Miller --- net/ipv4/tcp_ipv4.c | 4 ++-- net/ipv6/tcp_ipv6.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index d16e6e40f47b..586c102ce152 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1006,7 +1006,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst, if (skb) { __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr); - tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ? + tos = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ? (tcp_rsk(req)->syn_tos & ~INET_ECN_MASK) | (inet_sk(sk)->tos & INET_ECN_MASK) : inet_sk(sk)->tos; @@ -1526,7 +1526,7 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, /* Set ToS of the new socket based upon the value of incoming SYN. * ECT bits are set later in tcp_init_transfer(). */ - if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)) newinet->tos = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK; if (!dst) { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 9d3ede293258..be09941fe6d9 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -546,7 +546,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, if (np->repflow && ireq->pktopts) fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts)); - tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ? + tclass = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ? (tcp_rsk(req)->syn_tos & ~INET_ECN_MASK) | (np->tclass & INET_ECN_MASK) : np->tclass; @@ -1314,7 +1314,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * /* Set ToS of the new socket based upon the value of incoming SYN. * ECT bits are set later in tcp_init_transfer(). */ - if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)) newnp->tclass = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK; /* Clone native IPv6 options from listening socket (if any) From 96b9bd8c6d125490f9adfb57d387ef81a55a103e Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 22 Jul 2022 11:22:05 -0700 Subject: [PATCH 049/121] ipv4: Fix data-races around sysctl_fib_notify_on_flag_change. While reading sysctl_fib_notify_on_flag_change, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/fib_trie.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 46e8a5125853..452ff177e4da 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1042,6 +1042,7 @@ fib_find_matching_alias(struct net *net, const struct fib_rt_info *fri) void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri) { + u8 fib_notify_on_flag_change; struct fib_alias *fa_match; struct sk_buff *skb; int err; @@ -1063,14 +1064,16 @@ void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri) WRITE_ONCE(fa_match->offload, fri->offload); WRITE_ONCE(fa_match->trap, fri->trap); + fib_notify_on_flag_change = READ_ONCE(net->ipv4.sysctl_fib_notify_on_flag_change); + /* 2 means send notifications only if offload_failed was changed. 
*/ - if (net->ipv4.sysctl_fib_notify_on_flag_change == 2 && + if (fib_notify_on_flag_change == 2 && READ_ONCE(fa_match->offload_failed) == fri->offload_failed) goto out; WRITE_ONCE(fa_match->offload_failed, fri->offload_failed); - if (!net->ipv4.sysctl_fib_notify_on_flag_change) + if (!fib_notify_on_flag_change) goto out; skb = nlmsg_new(fib_nlmsg_size(fa_match->fa_info), GFP_ATOMIC); From a7a47a5dfa9a9692a41764ee9ab4054f12924a42 Mon Sep 17 00:00:00 2001 From: Umesh Nerlige Ramappa Date: Tue, 21 Jun 2022 12:21:05 -0700 Subject: [PATCH 050/121] drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend For execlists backend, current implementation of Wa_22011802037 is to stop the CS before doing a reset of the engine. This WA was further extended to wait for any pending MI FORCE WAKEUPs before issuing a reset. Add the extended steps in the execlist path of reset. In addition, extend the WA to gen11. v2: (Tvrtko) - Clarify comments, commit message, fix typos - Use IS_GRAPHICS_VER for gen 11/12 checks v3: (Daneile) - Drop changes to intel_ring_submission since WA does not apply to it - Log an error if MSG IDLE is not defined for an engine Signed-off-by: Umesh Nerlige Ramappa Fixes: f6aa0d713c88 ("drm/i915: Add Wa_22011802037 force cs halt") Acked-by: Tvrtko Ursulin Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: John Harrison Link: https://patchwork.freedesktop.org/patch/msgid/20220621192105.2100585-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 0667429ce68e0b08f9f1fec8fd0b1f57228f605e) Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 2 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 88 ++++++++++++++++++- .../drm/i915/gt/intel_execlists_submission.c | 7 ++ drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ++--------------- 5 files changed, 103 insertions(+), 79 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 1431f1e9dbee..04e435bce79b 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -201,6 +201,8 @@ int intel_ring_submission_setup(struct intel_engine_cs *engine); int intel_engine_stop_cs(struct intel_engine_cs *engine); void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine); +void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs *engine); + void intel_engine_set_hwsp_writemask(struct intel_engine_cs *engine, u32 mask); u64 intel_engine_get_active_head(const struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 14c6ddbbfde8..5b6ce10cb158 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1282,10 +1282,10 @@ static int __intel_engine_stop_cs(struct intel_engine_cs *engine, intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING)); /* - * Wa_22011802037 : gen12, Prior to doing a reset, ensure CS is + * Wa_22011802037 : gen11, gen12, Prior to doing a reset, ensure CS is * stopped, set ring stop bit and prefetch disable bit to halt CS */ - if (GRAPHICS_VER(engine->i915) == 12) + if (IS_GRAPHICS_VER(engine->i915, 11, 12)) intel_uncore_write_fw(uncore, RING_MODE_GEN7(engine->mmio_base), _MASKED_BIT_ENABLE(GEN12_GFX_PREFETCH_DISABLE)); @@ -1308,6 +1308,18 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine) return -ENODEV; ENGINE_TRACE(engine, "\n"); + /* + * TODO: Find out why occasionally stopping the CS 
times out. Seen + * especially with gem_eio tests. + * + * Occasionally trying to stop the cs times out, but does not adversely + * affect functionality. The timeout is set as a config parameter that + * defaults to 100ms. In most cases the follow up operation is to wait + * for pending MI_FORCE_WAKES. The assumption is that this timeout is + * sufficient for any pending MI_FORCEWAKEs to complete. Once root + * caused, the caller must check and handle the return from this + * function. + */ if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) { ENGINE_TRACE(engine, "timed out on STOP_RING -> IDLE; HEAD:%04x, TAIL:%04x\n", @@ -1334,6 +1346,78 @@ void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine) ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING)); } +static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine) +{ + static const i915_reg_t _reg[I915_NUM_ENGINES] = { + [RCS0] = MSG_IDLE_CS, + [BCS0] = MSG_IDLE_BCS, + [VCS0] = MSG_IDLE_VCS0, + [VCS1] = MSG_IDLE_VCS1, + [VCS2] = MSG_IDLE_VCS2, + [VCS3] = MSG_IDLE_VCS3, + [VCS4] = MSG_IDLE_VCS4, + [VCS5] = MSG_IDLE_VCS5, + [VCS6] = MSG_IDLE_VCS6, + [VCS7] = MSG_IDLE_VCS7, + [VECS0] = MSG_IDLE_VECS0, + [VECS1] = MSG_IDLE_VECS1, + [VECS2] = MSG_IDLE_VECS2, + [VECS3] = MSG_IDLE_VECS3, + [CCS0] = MSG_IDLE_CS, + [CCS1] = MSG_IDLE_CS, + [CCS2] = MSG_IDLE_CS, + [CCS3] = MSG_IDLE_CS, + }; + u32 val; + + if (!_reg[engine->id].reg) { + drm_err(&engine->i915->drm, + "MSG IDLE undefined for engine id %u\n", engine->id); + return 0; + } + + val = intel_uncore_read(engine->uncore, _reg[engine->id]); + + /* bits[29:25] & bits[13:9] >> shift */ + return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT; +} + +static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32 fw_mask) +{ + int ret; + + /* Ensure GPM receives fw up/down after CS is stopped */ + udelay(1); + + /* Wait for forcewake request to complete in GPM */ + ret = __intel_wait_for_register_fw(gt->uncore, + GEN9_PWRGT_DOMAIN_STATUS, + fw_mask, fw_mask, 5000, 0, NULL); + + /* Ensure CS receives fw ack from GPM */ + udelay(1); + + if (ret) + GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret); +} + +/* + * Wa_22011802037:gen12: In addition to stopping the cs, we need to wait for any + * pending MI_FORCE_WAKEUP requests that the CS has initiated to complete. The + * pending status is indicated by bits[13:9] (masked by bits[29:25]) in the + * MSG_IDLE register. There's one MSG_IDLE register per reset domain. Since we + * are concerned only with the gt reset here, we use a logical OR of pending + * forcewakeups from all reset domains and then wait for them to complete by + * querying PWRGT_DOMAIN_STATUS. 
+ */ +void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs *engine) +{ + u32 fw_pending = __cs_pending_mi_force_wakes(engine); + + if (fw_pending) + __gpm_wait_for_fw_complete(engine->gt, fw_pending); +} + static u32 read_subslice_reg(const struct intel_engine_cs *engine, int slice, int subslice, i915_reg_t reg) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 2b0266cab66b..0627fa10d2dc 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2968,6 +2968,13 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) ring_set_paused(engine, 1); intel_engine_stop_cs(engine); + /* + * Wa_22011802037:gen11/gen12: In addition to stopping the cs, we need + * to wait for any pending mi force wakeups + */ + if (IS_GRAPHICS_VER(engine->i915, 11, 12)) + intel_engine_wait_for_pending_mi_fw(engine); + engine->execlists.reset_ccid = active_ccid(engine); } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 2c4ad4a65089..8c6885f43d1a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -310,8 +310,8 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc) if (IS_DG2(gt->i915)) flags |= GUC_WA_DUAL_QUEUE; - /* Wa_22011802037: graphics version 12 */ - if (GRAPHICS_VER(gt->i915) == 12) + /* Wa_22011802037: graphics version 11/12 */ + if (IS_GRAPHICS_VER(gt->i915, 11, 12)) flags |= GUC_WA_PRE_PARSER; /* Wa_16011777198:dg2 */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9ffb343d0f79..2d9f5f1c79d3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1578,87 +1578,18 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub) lrc_update_regs(ce, engine, head); } -static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine) -{ - static const i915_reg_t _reg[I915_NUM_ENGINES] = { - [RCS0] = MSG_IDLE_CS, - [BCS0] = MSG_IDLE_BCS, - [VCS0] = MSG_IDLE_VCS0, - [VCS1] = MSG_IDLE_VCS1, - [VCS2] = MSG_IDLE_VCS2, - [VCS3] = MSG_IDLE_VCS3, - [VCS4] = MSG_IDLE_VCS4, - [VCS5] = MSG_IDLE_VCS5, - [VCS6] = MSG_IDLE_VCS6, - [VCS7] = MSG_IDLE_VCS7, - [VECS0] = MSG_IDLE_VECS0, - [VECS1] = MSG_IDLE_VECS1, - [VECS2] = MSG_IDLE_VECS2, - [VECS3] = MSG_IDLE_VECS3, - [CCS0] = MSG_IDLE_CS, - [CCS1] = MSG_IDLE_CS, - [CCS2] = MSG_IDLE_CS, - [CCS3] = MSG_IDLE_CS, - }; - u32 val; - - if (!_reg[engine->id].reg) - return 0; - - val = intel_uncore_read(engine->uncore, _reg[engine->id]); - - /* bits[29:25] & bits[13:9] >> shift */ - return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT; -} - -static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32 fw_mask) -{ - int ret; - - /* Ensure GPM receives fw up/down after CS is stopped */ - udelay(1); - - /* Wait for forcewake request to complete in GPM */ - ret = __intel_wait_for_register_fw(gt->uncore, - GEN9_PWRGT_DOMAIN_STATUS, - fw_mask, fw_mask, 5000, 0, NULL); - - /* Ensure CS receives fw ack from GPM */ - udelay(1); - - if (ret) - GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret); -} - -/* - * Wa_22011802037:gen12: In addition to stopping the cs, we need to wait for any - * pending MI_FORCE_WAKEUP requests that the CS has initiated to complete. 
The - * pending status is indicated by bits[13:9] (masked by bits[ 29:25]) in the - * MSG_IDLE register. There's one MSG_IDLE register per reset domain. Since we - * are concerned only with the gt reset here, we use a logical OR of pending - * forcewakeups from all reset domains and then wait for them to complete by - * querying PWRGT_DOMAIN_STATUS. - */ static void guc_engine_reset_prepare(struct intel_engine_cs *engine) { - u32 fw_pending; - - if (GRAPHICS_VER(engine->i915) != 12) + if (!IS_GRAPHICS_VER(engine->i915, 11, 12)) return; - /* - * Wa_22011802037 - * TODO: Occasionally trying to stop the cs times out, but does not - * adversely affect functionality. The timeout is set as a config - * parameter that defaults to 100ms. Assuming that this timeout is - * sufficient for any pending MI_FORCEWAKEs to complete, ignore the - * timeout returned here until it is root caused. - */ intel_engine_stop_cs(engine); - fw_pending = __cs_pending_mi_force_wakes(engine); - if (fw_pending) - __gpm_wait_for_fw_complete(engine->gt, fw_pending); + /* + * Wa_22011802037:gen11/gen12: In addition to stopping the cs, we need + * to wait for any pending mi force wakeups + */ + intel_engine_wait_for_pending_mi_fw(engine); } static void guc_reset_nop(struct intel_engine_cs *engine) From d295ad34f236c3518634fb6403d4c0160456e470 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Sat, 23 Jul 2022 15:59:32 -0400 Subject: [PATCH 051/121] intel_idle: Fix false positive RCU splats due to incorrect hardirqs state Commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE") uses raw_local_irq_enable/local_irq_disable() around call to __intel_idle() in intel_idle_irq(). With interrupt enabled, timer tick interrupt can happen and a subsequently call to __do_softirq() may change the lockdep hardirqs state of a debug kernel back to 'on'. This will result in a mismatch between the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing a number of false positive "WARNING: suspicious RCU usage" splats. Fix that by using local_irq_disable() to disable interrupt in intel_idle_irq(). Fixes: 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE") Signed-off-by: Waiman Long Cc: 5.16+ # 5.16+ Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index f5c6802aa6c3..907700d1e78e 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -162,7 +162,13 @@ static __cpuidle int intel_idle_irq(struct cpuidle_device *dev, raw_local_irq_enable(); ret = __intel_idle(dev, drv, index); - raw_local_irq_disable(); + + /* + * The lockdep hardirqs state may be changed to 'on' with timer + * tick interrupt followed by __do_softirq(). Use local_irq_disable() + * to keep the hardirqs state correct. + */ + local_irq_disable(); return ret; } From c653c591789b3acfa4bf6ae45d5af4f330e50a91 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Mon, 25 Jul 2022 14:37:29 +1000 Subject: [PATCH 052/121] drm/amdgpu: Re-enable DCN for 64-bit powerpc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine") disabled the DCN driver for all of powerpc due to unresolved build failures with some compilers. Further digging shows that the build failures only occur with compilers that default to 64-bit long double. 
Both the ppc64 and ppc64le ABIs define long double to be 128-bits, but there are compilers in the wild that default to 64-bits. The compilers provided by the major distros (Fedora, Ubuntu) default to 128-bits and are not affected by the build failure. There is a compiler flag to force 128-bit long double, which may be the correct long term fix, but as an interim fix only allow building the DCN driver if long double is 128-bits by default. The bisection in commit d11219ad53dc must have gone off the rails at some point, the build failure occurs all the way back to the original commit that enabled DCN support on powerpc, at least with some toolchains. Depends-on: d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine") Fixes: 16a9dea110a6 ("amdgpu: Enable initial DCN support on POWER") Signed-off-by: Michael Ellerman Acked-by: Alex Deucher Reviewed-by: Dan Horák Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2100 Link: https://lore.kernel.org/r/20220725123918.1903255-1-mpe@ellerman.id.au --- arch/powerpc/Kconfig | 4 ++++ drivers/gpu/drm/amd/display/Kconfig | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 7aa12e88c580..287cc2d4a4b3 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -281,6 +281,10 @@ config PPC # Please keep this list sorted alphabetically. # +config PPC_LONG_DOUBLE_128 + depends on PPC64 + def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1) + config PPC_BARRIER_NOSPEC bool default y diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 0ba0598eba20..ec6771e87e73 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -6,7 +6,7 @@ config DRM_AMD_DC bool "AMD DC - Enable new display engine" default y select SND_HDA_COMPONENT if SND_HDA_CORE - select DRM_AMD_DC_DCN if X86 && !(KCOV_INSTRUMENT_ALL && KCOV_ENABLE_COMPARISONS) + select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128) && !(KCOV_INSTRUMENT_ALL && KCOV_ENABLE_COMPARISONS) help Choose this option if you want to use the new display engine support for AMDGPU. This adds required support for Vega and From 5fcbb711024aac6d4db385623e6f2fdf019f7782 Mon Sep 17 00:00:00 2001 From: Michal Maloszewski Date: Fri, 22 Jul 2022 10:54:01 -0700 Subject: [PATCH 053/121] i40e: Fix interface init with MSI interrupts (no MSI-X) Fix the inability to bring an interface up on a setup with only MSI interrupts enabled (no MSI-X). Solution is to add a default number of QPs = 1. This is enough, since without MSI-X support driver enables only a basic feature set. 
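In short, with this change the queue count selection in i40e_vsi_setup_queue_map() reduces to roughly the following (a condensed sketch of the logic in the hunk below, not the literal driver code):

	if (vsi->req_queue_pairs > 0)
		vsi->num_queue_pairs = vsi->req_queue_pairs;
	else if (pf->flags & I40E_FLAG_MSIX_ENABLED)
		vsi->num_queue_pairs = pf->num_lan_msix;
	else
		vsi->num_queue_pairs = 1;	/* MSI/legacy interrupts: one queue pair */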
Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use") Signed-off-by: Dawid Lukwinski Signed-off-by: Michal Maloszewski Tested-by: Dave Switzer Signed-off-by: Tony Nguyen Link: https://lore.kernel.org/r/20220722175401.112572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 7f1a0d90dc51..685556e968f2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1925,11 +1925,15 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi, * non-zero req_queue_pairs says that user requested a new * queue count via ethtool's set_channels, so use this * value for queues distribution across traffic classes + * We need at least one queue pair for the interface + * to be usable as we see in else statement. */ if (vsi->req_queue_pairs > 0) vsi->num_queue_pairs = vsi->req_queue_pairs; else if (pf->flags & I40E_FLAG_MSIX_ENABLED) vsi->num_queue_pairs = pf->num_lan_msix; + else + vsi->num_queue_pairs = 1; } /* Number of queues per enabled TC */ From c7560d1203b7a1ea0b99a5c575547e95d564b2a8 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 23 Jul 2022 04:24:11 +0300 Subject: [PATCH 054/121] net: dsa: fix reference counting for LAG FDBs Due to an invalid conflict resolution on my side while working on 2 different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add() does not store the database associated with a dsa_mac_addr structure. So after adding an FDB entry associated with a LAG, dsa_mac_addr_find() fails to find it while deleting it, because &a->db is zeroized memory for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del() returns -ENOENT rather than deleting the entry. Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/switch.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/dsa/switch.c b/net/dsa/switch.c index 2b56218fc57c..4dfd68cf61c5 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -344,6 +344,7 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag, ether_addr_copy(a->addr, addr); a->vid = vid; + a->db = db; refcount_set(&a->refcount, 1); list_add_tail(&a->list, &lag->fdbs); From b89fc26f741d9f9efb51cba3e9b241cf1380ec5a Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Sat, 23 Jul 2022 09:58:09 +0800 Subject: [PATCH 055/121] sctp: fix sleep in atomic context bug in timer handlers There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. 
One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou Acked-by: Marcelo Ricardo Leitner Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski --- net/sctp/stream_sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c index 518b1b9bf89d..1ad565ed5627 100644 --- a/net/sctp/stream_sched.c +++ b/net/sctp/stream_sched.c @@ -160,7 +160,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc, if (!SCTP_SO(&asoc->stream, i)->ext) continue; - ret = n->init_sid(&asoc->stream, i, GFP_KERNEL); + ret = n->init_sid(&asoc->stream, i, GFP_ATOMIC); if (ret) goto err; } From b354eaeec8637d87003945439209251d76a2bb95 Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Sun, 24 Jul 2022 13:51:13 +0530 Subject: [PATCH 056/121] octeontx2-pf: cn10k: Fix egress ratelimit configuration NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2 to CN10K. CN10K supports larger burst size. Fix burst exponent and burst mantissa configuration for CN10K. Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps' passed by stack is also u64. Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload") Signed-off-by: Sunil Goutham Signed-off-by: Subbaraya Sundeep Signed-off-by: Paolo Abeni --- .../ethernet/marvell/octeontx2/nic/otx2_tc.c | 76 ++++++++++++++----- 1 file changed, 55 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c index 28b19945d716..fa83cf2c9c63 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c @@ -28,6 +28,9 @@ #define MAX_RATE_EXPONENT 0x0FULL #define MAX_RATE_MANTISSA 0xFFULL +#define CN10K_MAX_BURST_MANTISSA 0x7FFFULL +#define CN10K_MAX_BURST_SIZE 8453888ULL + /* Bitfields in NIX_TLX_PIR register */ #define TLX_RATE_MANTISSA GENMASK_ULL(8, 1) #define TLX_RATE_EXPONENT GENMASK_ULL(12, 9) @@ -35,6 +38,9 @@ #define TLX_BURST_MANTISSA GENMASK_ULL(36, 29) #define TLX_BURST_EXPONENT GENMASK_ULL(40, 37) +#define CN10K_TLX_BURST_MANTISSA GENMASK_ULL(43, 29) +#define CN10K_TLX_BURST_EXPONENT GENMASK_ULL(47, 44) + struct otx2_tc_flow_stats { u64 bytes; u64 pkts; @@ -77,33 +83,42 @@ int otx2_tc_alloc_ent_bitmap(struct otx2_nic *nic) } EXPORT_SYMBOL(otx2_tc_alloc_ent_bitmap); -static void otx2_get_egress_burst_cfg(u32 burst, u32 *burst_exp, - u32 *burst_mantissa) +static void otx2_get_egress_burst_cfg(struct otx2_nic *nic, u32 burst, + u32 *burst_exp, u32 *burst_mantissa) { + int max_burst, max_mantissa; unsigned int tmp; + if (is_dev_otx2(nic->pdev)) { + max_burst = MAX_BURST_SIZE; + max_mantissa = MAX_BURST_MANTISSA; + } else { + max_burst = CN10K_MAX_BURST_SIZE; + max_mantissa = CN10K_MAX_BURST_MANTISSA; + } + /* Burst is calculated as * ((256 + BURST_MANTISSA) << (1 + BURST_EXPONENT)) / 256 * Max supported burst size is 130,816 bytes. 
*/ - burst = min_t(u32, burst, MAX_BURST_SIZE); + burst = min_t(u32, burst, max_burst); if (burst) { *burst_exp = ilog2(burst) ? ilog2(burst) - 1 : 0; tmp = burst - rounddown_pow_of_two(burst); - if (burst < MAX_BURST_MANTISSA) + if (burst < max_mantissa) *burst_mantissa = tmp * 2; else *burst_mantissa = tmp / (1ULL << (*burst_exp - 7)); } else { *burst_exp = MAX_BURST_EXPONENT; - *burst_mantissa = MAX_BURST_MANTISSA; + *burst_mantissa = max_mantissa; } } -static void otx2_get_egress_rate_cfg(u32 maxrate, u32 *exp, +static void otx2_get_egress_rate_cfg(u64 maxrate, u32 *exp, u32 *mantissa, u32 *div_exp) { - unsigned int tmp; + u64 tmp; /* Rate calculation by hardware * @@ -132,21 +147,44 @@ static void otx2_get_egress_rate_cfg(u32 maxrate, u32 *exp, } } -static int otx2_set_matchall_egress_rate(struct otx2_nic *nic, u32 burst, u32 maxrate) +static u64 otx2_get_txschq_rate_regval(struct otx2_nic *nic, + u64 maxrate, u32 burst) +{ + u32 burst_exp, burst_mantissa; + u32 exp, mantissa, div_exp; + u64 regval = 0; + + /* Get exponent and mantissa values from the desired rate */ + otx2_get_egress_burst_cfg(nic, burst, &burst_exp, &burst_mantissa); + otx2_get_egress_rate_cfg(maxrate, &exp, &mantissa, &div_exp); + + if (is_dev_otx2(nic->pdev)) { + regval = FIELD_PREP(TLX_BURST_EXPONENT, (u64)burst_exp) | + FIELD_PREP(TLX_BURST_MANTISSA, (u64)burst_mantissa) | + FIELD_PREP(TLX_RATE_DIVIDER_EXPONENT, div_exp) | + FIELD_PREP(TLX_RATE_EXPONENT, exp) | + FIELD_PREP(TLX_RATE_MANTISSA, mantissa) | BIT_ULL(0); + } else { + regval = FIELD_PREP(CN10K_TLX_BURST_EXPONENT, (u64)burst_exp) | + FIELD_PREP(CN10K_TLX_BURST_MANTISSA, (u64)burst_mantissa) | + FIELD_PREP(TLX_RATE_DIVIDER_EXPONENT, div_exp) | + FIELD_PREP(TLX_RATE_EXPONENT, exp) | + FIELD_PREP(TLX_RATE_MANTISSA, mantissa) | BIT_ULL(0); + } + + return regval; +} + +static int otx2_set_matchall_egress_rate(struct otx2_nic *nic, + u32 burst, u64 maxrate) { struct otx2_hw *hw = &nic->hw; struct nix_txschq_config *req; - u32 burst_exp, burst_mantissa; - u32 exp, mantissa, div_exp; int txschq, err; /* All SQs share the same TL4, so pick the first scheduler */ txschq = hw->txschq_list[NIX_TXSCH_LVL_TL4][0]; - /* Get exponent and mantissa values from the desired rate */ - otx2_get_egress_burst_cfg(burst, &burst_exp, &burst_mantissa); - otx2_get_egress_rate_cfg(maxrate, &exp, &mantissa, &div_exp); - mutex_lock(&nic->mbox.lock); req = otx2_mbox_alloc_msg_nix_txschq_cfg(&nic->mbox); if (!req) { @@ -157,11 +195,7 @@ static int otx2_set_matchall_egress_rate(struct otx2_nic *nic, u32 burst, u32 ma req->lvl = NIX_TXSCH_LVL_TL4; req->num_regs = 1; req->reg[0] = NIX_AF_TL4X_PIR(txschq); - req->regval[0] = FIELD_PREP(TLX_BURST_EXPONENT, burst_exp) | - FIELD_PREP(TLX_BURST_MANTISSA, burst_mantissa) | - FIELD_PREP(TLX_RATE_DIVIDER_EXPONENT, div_exp) | - FIELD_PREP(TLX_RATE_EXPONENT, exp) | - FIELD_PREP(TLX_RATE_MANTISSA, mantissa) | BIT_ULL(0); + req->regval[0] = otx2_get_txschq_rate_regval(nic, maxrate, burst); err = otx2_sync_mbox_msg(&nic->mbox); mutex_unlock(&nic->mbox.lock); @@ -230,7 +264,7 @@ static int otx2_tc_egress_matchall_install(struct otx2_nic *nic, struct netlink_ext_ack *extack = cls->common.extack; struct flow_action *actions = &cls->rule->action; struct flow_action_entry *entry; - u32 rate; + u64 rate; int err; err = otx2_tc_validate_flow(nic, actions, extack); @@ -256,7 +290,7 @@ static int otx2_tc_egress_matchall_install(struct otx2_nic *nic, } /* Convert bytes per second to Mbps */ rate = entry->police.rate_bytes_ps * 8; - rate = max_t(u32, rate / 
1000000, 1); + rate = max_t(u64, rate / 1000000, 1); err = otx2_set_matchall_egress_rate(nic, entry->police.burst, rate); if (err) return err; From 59e1be6f83b928a04189bbf3ab683a1fc6248db3 Mon Sep 17 00:00:00 2001 From: Subbaraya Sundeep Date: Sun, 24 Jul 2022 13:51:14 +0530 Subject: [PATCH 057/121] octeontx2-pf: Fix UDP/TCP src and dst port tc filters Check the mask for non-zero value before installing tc filters for L4 source and destination ports. Otherwise installing a filter for source port installs destination port too and vice-versa. Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic") Signed-off-by: Subbaraya Sundeep Signed-off-by: Sunil Goutham Signed-off-by: Paolo Abeni --- .../ethernet/marvell/octeontx2/nic/otx2_tc.c | 30 +++++++++++-------- 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c index fa83cf2c9c63..e64318c110fd 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c @@ -648,21 +648,27 @@ static int otx2_tc_prepare_flow(struct otx2_nic *nic, struct otx2_tc_flow *node, flow_spec->dport = match.key->dst; flow_mask->dport = match.mask->dst; - if (ip_proto == IPPROTO_UDP) - req->features |= BIT_ULL(NPC_DPORT_UDP); - else if (ip_proto == IPPROTO_TCP) - req->features |= BIT_ULL(NPC_DPORT_TCP); - else if (ip_proto == IPPROTO_SCTP) - req->features |= BIT_ULL(NPC_DPORT_SCTP); + + if (flow_mask->dport) { + if (ip_proto == IPPROTO_UDP) + req->features |= BIT_ULL(NPC_DPORT_UDP); + else if (ip_proto == IPPROTO_TCP) + req->features |= BIT_ULL(NPC_DPORT_TCP); + else if (ip_proto == IPPROTO_SCTP) + req->features |= BIT_ULL(NPC_DPORT_SCTP); + } flow_spec->sport = match.key->src; flow_mask->sport = match.mask->src; - if (ip_proto == IPPROTO_UDP) - req->features |= BIT_ULL(NPC_SPORT_UDP); - else if (ip_proto == IPPROTO_TCP) - req->features |= BIT_ULL(NPC_SPORT_TCP); - else if (ip_proto == IPPROTO_SCTP) - req->features |= BIT_ULL(NPC_SPORT_SCTP); + + if (flow_mask->sport) { + if (ip_proto == IPPROTO_UDP) + req->features |= BIT_ULL(NPC_SPORT_UDP); + else if (ip_proto == IPPROTO_TCP) + req->features |= BIT_ULL(NPC_SPORT_TCP); + else if (ip_proto == IPPROTO_SCTP) + req->features |= BIT_ULL(NPC_SPORT_SCTP); + } } return otx2_tc_parse_actions(nic, &rule->action, req, f, node); From 9b134b1694ec8926926ba6b7b80884ea829245a0 Mon Sep 17 00:00:00 2001 From: Benjamin Poirier Date: Mon, 25 Jul 2022 09:12:36 +0900 Subject: [PATCH 058/121] bridge: Do not send empty IFLA_AF_SPEC attribute After commit b6c02ef54913 ("bridge: Netlink interface fix."), br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a bridge vlan dump is requested but an interface does not have any vlans configured. iproute2 ignores such an empty attribute since commit b262a9becbcb ("bridge: Fix output with empty vlan lists") but older iproute2 versions as well as other utilities have their output changed by the cited kernel commit, resulting in failed test cases. Regardless, emitting an empty attribute is pointless and inefficient. Avoid this change by canceling the attribute if no AF_SPEC data was added. 
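The netlink idiom applied by the fix is to open the nest and cancel it if nothing was added, roughly as follows (a condensed sketch of the pattern in the hunk below, not the full br_fill_ifinfo() flow):

	af = nla_nest_start(skb, IFLA_AF_SPEC);
	/* ... per-AF attributes may or may not be emitted here ... */
	if (nlmsg_get_pos(skb) - (void *)af > nla_attr_size(0))
		nla_nest_end(skb, af);		/* nest has content, keep it */
	else
		nla_nest_cancel(skb, af);	/* nothing was added, drop the empty nest */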
Fixes: b6c02ef54913 ("bridge: Netlink interface fix.") Reviewed-by: Ido Schimmel Signed-off-by: Benjamin Poirier Acked-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni --- net/bridge/br_netlink.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index bb01776d2d88..c96509c442a5 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -589,9 +589,13 @@ static int br_fill_ifinfo(struct sk_buff *skb, } done: + if (af) { + if (nlmsg_get_pos(skb) - (void *)af > nla_attr_size(0)) + nla_nest_end(skb, af); + else + nla_nest_cancel(skb, af); + } - if (af) - nla_nest_end(skb, af); nlmsg_end(skb, nlh); return 0; From 0c09bc33aa8e9dc867300acaadc318c2f0d85a1e Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Mon, 25 Jul 2022 16:36:29 -0700 Subject: [PATCH 059/121] drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When booting a kernel compiled with clang's CFI protection (CONFIG_CFI_CLANG), there is a CFI failure in drm_simple_kms_crtc_mode_valid() when trying to call simpledrm_simple_display_pipe_mode_valid() through ->mode_valid(): [ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8): ... [ 0.324928] Call trace: [ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60 [ 0.325053] __cfi_check_fail+0x3c/0x44 [ 0.325120] __cfi_slowpath_diag+0x178/0x200 [ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80 [ 0.325279] __drm_helper_update_and_validate+0x31c/0x464 ... The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs' expects a return type of 'enum drm_mode_status', not 'int'. Correct it to fix the CFI failure. Cc: stable@vger.kernel.org Fixes: 11e8f5fd223b ("drm: Add simpledrm driver") Link: https://github.com/ClangBuiltLinux/linux/issues/1647 Reported-by: Tomasz PaweÅ‚ Gajc Signed-off-by: Nathan Chancellor Signed-off-by: Thomas Zimmermann Reviewed-by: Sami Tolvanen Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-nathan@kernel.org --- drivers/gpu/drm/tiny/simpledrm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c index 768242a78e2b..5422363690e7 100644 --- a/drivers/gpu/drm/tiny/simpledrm.c +++ b/drivers/gpu/drm/tiny/simpledrm.c @@ -627,7 +627,7 @@ static const struct drm_connector_funcs simpledrm_connector_funcs = { .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, }; -static int +static enum drm_mode_status simpledrm_simple_display_pipe_mode_valid(struct drm_simple_display_pipe *pipe, const struct drm_display_mode *mode) { From 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 26 Jul 2022 12:42:06 +0200 Subject: [PATCH 060/121] netfilter: nf_queue: do not allow packet truncation below transport header offset Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). 
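Stated in isolation (a simplified sketch of the constraint only, not the full nfqnl_mangle() logic in the hunk below): a shrinking verdict payload is acceptable only while it still covers the headers the stack has already pulled.

	#include <linux/skbuff.h>

	/* Sketch: @data_len is the user-supplied payload length for a packet
	 * the IP/IPv6 input path has already parsed up to the transport header. */
	static bool nfq_trim_len_ok(const struct sk_buff *skb, unsigned int data_len)
	{
		/* skb_transport_offset() = bytes from skb->data to the transport
		 * header; trimming below that would leave a malformed skb. */
		return data_len >= (unsigned int)skb_transport_offset(skb);
	}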
Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano Signed-off-by: Florian Westphal Reviewed-by: Pablo Neira Ayuso --- net/netfilter/nfnetlink_queue.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index a364f8e5e698..87a9009d5234 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -843,11 +843,16 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) } static int -nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff) +nfqnl_mangle(void *data, unsigned int data_len, struct nf_queue_entry *e, int diff) { struct sk_buff *nskb; if (diff < 0) { + unsigned int min_len = skb_transport_offset(e->skb); + + if (data_len < min_len) + return -EINVAL; + if (pskb_trim(e->skb, data_len)) return -ENOMEM; } else if (diff > 0) { From 81ea010667417ef3f218dfd99b69769fe66c2b67 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 26 Jul 2022 12:44:35 +0200 Subject: [PATCH 061/121] netfilter: nf_tables: add rescheduling points during loop detection walks Add explicit rescheduling points during ruleset walk. Switching to a faster algorithm is possible but this is a much smaller change, suitable for nf tree. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460 Signed-off-by: Florian Westphal Acked-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 646d5fd53604..9f976b11d896 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3340,6 +3340,8 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain) if (err < 0) return err; } + + cond_resched(); } return 0; @@ -9367,9 +9369,13 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx, break; } } + + cond_resched(); } list_for_each_entry(set, &ctx->table->sets, list) { + cond_resched(); + if (!nft_is_active_next(ctx->net, set)) continue; if (!(set->flags & NFT_SET_MAP) || From 47f4f510ad586032b85c89a0773fbb011d412425 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 26 Jul 2022 19:49:00 +0200 Subject: [PATCH 062/121] netfilter: nft_queue: only allow supported familes and hooks Trying to use 'queue' statement in ingress (for example) triggers a splat on reinject: WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291 ... because nf_reinject cannot find the ruleset head. The netdev family doesn't support async resume at the moment anyway, so disallow loading such rulesets with a more appropriate error message. v2: add 'validate' callback and also check hook points, v1 did allow ingress use in 'table inet', but that doesn't work either. 
(Pablo) Signed-off-by: Florian Westphal Reviewed-by: Pablo Neira Ayuso --- net/netfilter/nft_queue.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c index 15e4b7640dc0..da29e92c03e2 100644 --- a/net/netfilter/nft_queue.c +++ b/net/netfilter/nft_queue.c @@ -68,6 +68,31 @@ static void nft_queue_sreg_eval(const struct nft_expr *expr, regs->verdict.code = ret; } +static int nft_queue_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + static const unsigned int supported_hooks = ((1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING)); + + switch (ctx->family) { + case NFPROTO_IPV4: + case NFPROTO_IPV6: + case NFPROTO_INET: + case NFPROTO_BRIDGE: + break; + case NFPROTO_NETDEV: /* lacks okfn */ + fallthrough; + default: + return -EOPNOTSUPP; + } + + return nft_chain_validate_hooks(ctx->chain, supported_hooks); +} + static const struct nla_policy nft_queue_policy[NFTA_QUEUE_MAX + 1] = { [NFTA_QUEUE_NUM] = { .type = NLA_U16 }, [NFTA_QUEUE_TOTAL] = { .type = NLA_U16 }, @@ -164,6 +189,7 @@ static const struct nft_expr_ops nft_queue_ops = { .eval = nft_queue_eval, .init = nft_queue_init, .dump = nft_queue_dump, + .validate = nft_queue_validate, .reduce = NFT_REDUCE_READONLY, }; @@ -173,6 +199,7 @@ static const struct nft_expr_ops nft_queue_sreg_ops = { .eval = nft_queue_sreg_eval, .init = nft_queue_sreg_init, .dump = nft_queue_sreg_dump, + .validate = nft_queue_validate, .reduce = NFT_REDUCE_READONLY, }; From 1e308c6fb7127371f48a0fb9770ea0b30a6b5698 Mon Sep 17 00:00:00 2001 From: Przemyslaw Patynowski Date: Mon, 4 Jul 2022 15:46:12 +0200 Subject: [PATCH 063/121] ice: Fix max VLANs available for VF Legacy VLAN implementation allows for untrusted VF to have 8 VLAN filters, not counting VLAN 0 filters. Current VLAN_V2 implementation lowers available filters for VF, by counting in VLAN 0 filter for both TPIDs. Fix this by counting only non zero VLAN filters. Without this patch, untrusted VF would not be able to access 8 VLAN filters. Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski Signed-off-by: Mateusz Palczewski Tested-by: Marek Szlosek Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 4547bc1f7cee..24188ec594d5 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -2948,7 +2948,8 @@ ice_vc_validate_add_vlan_filter_list(struct ice_vsi *vsi, struct virtchnl_vlan_filtering_caps *vfc, struct virtchnl_vlan_filter_list_v2 *vfl) { - u16 num_requested_filters = vsi->num_vlan + vfl->num_elements; + u16 num_requested_filters = ice_vsi_num_non_zero_vlans(vsi) + + vfl->num_elements; if (num_requested_filters > vfc->max_filters) return false; From 01658aeeada6f93c2924af94d895ff28d559690c Mon Sep 17 00:00:00 2001 From: Przemyslaw Patynowski Date: Mon, 18 Jul 2022 13:34:27 +0200 Subject: [PATCH 064/121] ice: Fix tunnel checksum offload with fragmented traffic Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. 
Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Offload params: td_offset, cd_tunnel_params were corrupted, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: 69e66c04c672 ("ice: Add mpls+tso support") Signed-off-by: Przemyslaw Patynowski Signed-off-by: Mateusz Palczewski Signed-off-by: Jedrzej Jagielski Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_txrx.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 3f8b7274ed2f..836dce840712 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -1751,11 +1751,13 @@ int ice_tx_csum(struct ice_tx_buf *first, struct ice_tx_offload_params *off) protocol = vlan_get_protocol(skb); - if (eth_p_mpls(protocol)) + if (eth_p_mpls(protocol)) { ip.hdr = skb_inner_network_header(skb); - else + l4.hdr = skb_checksum_start(skb); + } else { ip.hdr = skb_network_header(skb); - l4.hdr = skb_checksum_start(skb); + l4.hdr = skb_transport_header(skb); + } /* compute outer L2 header size */ l2_len = ip.hdr - skb->data; From 5c8e3c7ff3e7bd7b938659be704f75cc746b697f Mon Sep 17 00:00:00 2001 From: Anirudh Venkataramanan Date: Thu, 21 Jul 2022 10:03:09 +0200 Subject: [PATCH 065/121] ice: Fix VSIs unable to share unicast MAC The driver currently does not allow two VSIs in the same PF domain to have the same unicast MAC address. This is incorrect in the sense that a policy decision is being made in the driver when it must be left to the user. This approach was causing issues when rebooting the system with VFs spawned not being able to change their MAC addresses. Such errors were present in dmesg: [ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1 Fix that by removing this restriction. Doing this also allows us to remove some additional code that's checking if a unicast MAC filter already exists. 
Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC") Signed-off-by: Anirudh Venkataramanan Signed-off-by: Sylwester Dziedziuch Signed-off-by: Mateusz Palczewski Signed-off-by: Jedrzej Jagielski Tested-by: Marek Szlosek Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 2 ++ drivers/net/ethernet/intel/ice/ice_sriov.c | 40 ---------------------- 2 files changed, 2 insertions(+), 40 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index ff2eac2f8c64..b41a45c03d22 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4656,6 +4656,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) ice_set_safe_mode_caps(hw); } + hw->ucast_shared = true; + err = ice_init_pf(pf); if (err) { dev_err(dev, "ice_init_pf failed: %d\n", err); diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c index bb1721f1321d..f4907a3c2d19 100644 --- a/drivers/net/ethernet/intel/ice/ice_sriov.c +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c @@ -1309,39 +1309,6 @@ ice_get_vf_cfg(struct net_device *netdev, int vf_id, struct ifla_vf_info *ivi) return ret; } -/** - * ice_unicast_mac_exists - check if the unicast MAC exists on the PF's switch - * @pf: PF used to reference the switch's rules - * @umac: unicast MAC to compare against existing switch rules - * - * Return true on the first/any match, else return false - */ -static bool ice_unicast_mac_exists(struct ice_pf *pf, u8 *umac) -{ - struct ice_sw_recipe *mac_recipe_list = - &pf->hw.switch_info->recp_list[ICE_SW_LKUP_MAC]; - struct ice_fltr_mgmt_list_entry *list_itr; - struct list_head *rule_head; - struct mutex *rule_lock; /* protect MAC filter list access */ - - rule_head = &mac_recipe_list->filt_rules; - rule_lock = &mac_recipe_list->filt_rule_lock; - - mutex_lock(rule_lock); - list_for_each_entry(list_itr, rule_head, list_entry) { - u8 *existing_mac = &list_itr->fltr_info.l_data.mac.mac_addr[0]; - - if (ether_addr_equal(existing_mac, umac)) { - mutex_unlock(rule_lock); - return true; - } - } - - mutex_unlock(rule_lock); - - return false; -} - /** * ice_set_vf_mac * @netdev: network interface device structure @@ -1376,13 +1343,6 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) if (ret) goto out_put_vf; - if (ice_unicast_mac_exists(pf, mac)) { - netdev_err(netdev, "Unicast MAC %pM already exists on this PF. Preventing setting VF %u unicast MAC address to %pM\n", - mac, vf_id, mac); - ret = -EINVAL; - goto out_put_vf; - } - mutex_lock(&vf->cfg_lock); /* VF is notified of its new MAC via the PF's response to the From 283d736ff7c7e96ac5b32c6c0de40372f8eb171e Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 7 Jul 2022 12:20:42 +0200 Subject: [PATCH 066/121] ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS) Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. 
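As a generic sketch of the completion-side convention (it assumes the ice flex descriptor layout used in the hunk below and is not the driver's actual loop): software consumes a descriptor only once hardware has written the DD status bit back; EOF additionally confirms the frame fits in a single buffer.

	/* Sketch: depends on the ice driver's union ice_32b_rx_flex_desc layout. */
	static bool rx_desc_done(const union ice_32b_rx_flex_desc *rx_desc)
	{
		__le16 dd = cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S));

		/* DD set: ownership has passed from hardware to software;
		 * EOF (checked together with DD in the hunk below) additionally
		 * marks the frame as unfragmented. */
		return (rx_desc->wb.status_error0 & dd) == dd;
	}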
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski Tested-by: George Kuruvinakunnel Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index 70335f6e8524..4efa5e5846e0 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -658,7 +658,8 @@ static int ice_lbtest_receive_frames(struct ice_rx_ring *rx_ring) rx_desc = ICE_RX_DESC(rx_ring, i); if (!(rx_desc->wb.status_error0 & - cpu_to_le16(ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS))) + (cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S)) | + cpu_to_le16(BIT(ICE_RX_FLEX_DESC_STATUS0_EOF_S))))) continue; rx_buf = &rx_ring->rx_buf[i]; From cc019545a238518fa9da1e2a889f6e1bb1005a63 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 7 Jul 2022 12:20:43 +0200 Subject: [PATCH 067/121] ice: do not setup vlan for loopback VSI Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski Tested-by: George Kuruvinakunnel Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index b41a45c03d22..9f02b60459f1 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6013,10 +6013,12 @@ int ice_vsi_cfg(struct ice_vsi *vsi) if (vsi->netdev) { ice_set_rx_mode(vsi->netdev); - err = ice_vsi_vlan_setup(vsi); + if (vsi->type != ICE_VSI_LB) { + err = ice_vsi_vlan_setup(vsi); - if (err) - return err; + if (err) + return err; + } } ice_vsi_cfg_dcb_rings(vsi); From aa40d5a43526cca9439a2b45fcfdcd016594dece Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sun, 17 Jul 2022 21:21:52 +0900 Subject: [PATCH 068/121] wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). 
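Reduced to the generic pattern (the struct, flag name and debug call here are illustrative, not mac80211 symbols): the writer clears the state bit and then waits in synchronize_rcu(), which returns only after every reader that might still observe the old state has left its rcu_read_lock() section, so no extra lock is needed.

	#include <linux/bitops.h>
	#include <linux/printk.h>
	#include <linux/rcupdate.h>

	struct dev_state { unsigned long state; };	/* illustrative */
	#define STATE_RUNNING	0			/* illustrative bit */

	/* Reader side, e.g. a TX-wake path, already runs under RCU. */
	static void wake_path(struct dev_state *d)
	{
		rcu_read_lock();
		if (test_bit(STATE_RUNNING, &d->state))
			pr_debug("still running, waking queues\n");
		rcu_read_unlock();
	}

	/* Teardown side: flip the bit, then flush in-flight readers. */
	static void do_stop(struct dev_state *d)
	{
		clear_bit(STATE_RUNNING, &d->state);
		synchronize_rcu();	/* all wake_path() readers have finished */
		/* now safe to tear down what wake_path() may have used */
	}

The trade-off is that synchronize_rcu() can sleep, so this only fits contexts that may block, which the interface-stop path is.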
Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot Signed-off-by: Tetsuo Handa Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot Acked-by: Toke Høiland-Jørgensen Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski --- net/mac80211/iface.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 15a73b7fdd75..1a9ada411879 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -377,9 +377,8 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, bool going_do bool cancel_scan; struct cfg80211_nan_func *func; - spin_lock_bh(&local->fq.lock); clear_bit(SDATA_STATE_RUNNING, &sdata->state); - spin_unlock_bh(&local->fq.lock); + synchronize_rcu(); /* flush _ieee80211_wake_txqs() */ cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata; if (cancel_scan) From 4b2f4e072fb2599b6a2e5e277f0d2b5705eaa630 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 26 Jul 2022 18:13:44 +0300 Subject: [PATCH 069/121] Bluetooth: mgmt: Fix double free on error path Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/mgmt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index ae758ab1b558..2f91a8c2b678 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -4723,7 +4723,6 @@ static int __add_adv_patterns_monitor(struct sock *sk, struct hci_dev *hdev, else status = MGMT_STATUS_FAILED; - mgmt_pending_remove(cmd); goto unlock; } From ef61b6ea154464fefd8a6712d7a3b43b445c3d4a Mon Sep 17 00:00:00 2001 From: Abhishek Pandit-Subedi Date: Mon, 25 Jul 2022 15:34:21 -0700 Subject: [PATCH 070/121] Bluetooth: Always set event mask on suspend When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_sync.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 1739e8cb3291..c17021642234 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -4973,6 +4973,9 @@ int hci_suspend_sync(struct hci_dev *hdev) return err; } + /* Update event mask so only the allowed event can wakeup the host */ + hci_set_event_mask_sync(hdev); + /* Only configure accept list if disconnect succeeded and wake * isn't being prevented. 
*/ @@ -4984,9 +4987,6 @@ int hci_suspend_sync(struct hci_dev *hdev) /* Unpause to take care of updating scanning params */ hdev->scanning_paused = false; - /* Update event mask so only the allowed event can wakeup the host */ - hci_set_event_mask_sync(hdev); - /* Enable event filter for paired devices */ hci_update_event_filter_sync(hdev); From d0be8347c623e0ac4202a1d4e0373882821f56b0 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Thu, 21 Jul 2022 09:10:50 -0700 Subject: [PATCH 071/121] Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones Signed-off-by: Luiz Augusto von Dentz Tested-by: Lee Jones Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/l2cap.h | 1 + net/bluetooth/l2cap_core.c | 61 +++++++++++++++++++++++++++-------- 2 files changed, 49 insertions(+), 13 deletions(-) diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h index 3c4f550e5a8b..2f766e3437ce 100644 --- a/include/net/bluetooth/l2cap.h +++ b/include/net/bluetooth/l2cap.h @@ -847,6 +847,7 @@ enum { }; void l2cap_chan_hold(struct l2cap_chan *c); +struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c); void l2cap_chan_put(struct l2cap_chan *c); static inline void l2cap_chan_lock(struct l2cap_chan *chan) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index ae78490ecd3d..52668662ae8d 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -111,7 +111,8 @@ static struct l2cap_chan *__l2cap_get_chan_by_scid(struct l2cap_conn *conn, } /* Find channel with given SCID. - * Returns locked channel. */ + * Returns a reference locked channel. + */ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn, u16 cid) { @@ -119,15 +120,19 @@ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn, mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_scid(conn, cid); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock); return c; } /* Find channel with given DCID. - * Returns locked channel. + * Returns a reference locked channel. 
*/ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn, u16 cid) @@ -136,8 +141,12 @@ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn, mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_dcid(conn, cid); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock); return c; @@ -162,8 +171,12 @@ static struct l2cap_chan *l2cap_get_chan_by_ident(struct l2cap_conn *conn, mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_ident(conn, ident); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock); return c; @@ -497,6 +510,16 @@ void l2cap_chan_hold(struct l2cap_chan *c) kref_get(&c->kref); } +struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c) +{ + BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref)); + + if (!kref_get_unless_zero(&c->kref)) + return NULL; + + return c; +} + void l2cap_chan_put(struct l2cap_chan *c) { BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref)); @@ -1968,7 +1991,10 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm, src_match = !bacmp(&c->src, src); dst_match = !bacmp(&c->dst, dst); if (src_match && dst_match) { - l2cap_chan_hold(c); + c = l2cap_chan_hold_unless_zero(c); + if (!c) + continue; + read_unlock(&chan_list_lock); return c; } @@ -1983,7 +2009,7 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm, } if (c1) - l2cap_chan_hold(c1); + c1 = l2cap_chan_hold_unless_zero(c1); read_unlock(&chan_list_lock); @@ -4463,6 +4489,7 @@ static inline int l2cap_config_req(struct l2cap_conn *conn, unlock: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return err; } @@ -4577,6 +4604,7 @@ static inline int l2cap_config_rsp(struct l2cap_conn *conn, done: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return err; } @@ -5304,6 +5332,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn, l2cap_send_move_chan_rsp(chan, result); l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return 0; } @@ -5396,6 +5425,7 @@ static void l2cap_move_continue(struct l2cap_conn *conn, u16 icid, u16 result) } l2cap_chan_unlock(chan); + l2cap_chan_put(chan); } static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid, @@ -5425,6 +5455,7 @@ static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid, l2cap_send_move_chan_cfm(chan, L2CAP_MC_UNCONFIRMED); l2cap_chan_unlock(chan); + l2cap_chan_put(chan); } static int l2cap_move_channel_rsp(struct l2cap_conn *conn, @@ -5488,6 +5519,7 @@ static int l2cap_move_channel_confirm(struct l2cap_conn *conn, l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid); l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return 0; } @@ -5523,6 +5555,7 @@ static inline int l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn, } l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return 0; } @@ -5895,12 +5928,11 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn, if (credits > max_credits) { BT_ERR("LE credits overflow"); l2cap_send_disconn_req(chan, ECONNRESET); - l2cap_chan_unlock(chan); /* Return 0 so that we don't trigger an unnecessary * command reject packet. 
*/ - return 0; + goto unlock; } chan->tx_credits += credits; @@ -5911,7 +5943,9 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn, if (chan->tx_credits) chan->ops->resume(chan); +unlock: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return 0; } @@ -7597,6 +7631,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid, done: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); } static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm, @@ -8085,7 +8120,7 @@ static struct l2cap_chan *l2cap_global_fixed_chan(struct l2cap_chan *c, if (src_type != c->src_type) continue; - l2cap_chan_hold(c); + c = l2cap_chan_hold_unless_zero(c); read_unlock(&chan_list_lock); return c; } From 0fde22c5420ed258ee538a760291c2f3935f6a01 Mon Sep 17 00:00:00 2001 From: David Jeffery Date: Fri, 22 Jul 2022 10:24:48 -0400 Subject: [PATCH 072/121] scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown During system shutdown or reboot, mpt3sas will reset the firmware back to ready state. However, the driver leaves running a watchdog work item intended to keep the firmware in operational state. This causes a second, unneeded reset on shutdown and moves the firmware back to operational instead of in ready state as intended. And if the mpt3sas_fwfault_debug module parameter is set, this extra reset also panics the system. mpt3sas's scsih_shutdown needs to stop the watchdog before resetting the firmware back to ready state. Link: https://lore.kernel.org/r/20220722142448.6289-1-djeffery@redhat.com Fixes: fae21608c31c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown") Tested-by: Laurence Oberman Acked-by: Sreekanth Reddy Signed-off-by: David Jeffery Signed-off-by: Martin K. Petersen --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index b519f4b59d30..5e8887fa02c8 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -11386,6 +11386,7 @@ scsih_shutdown(struct pci_dev *pdev) _scsih_ir_shutdown(ioc); _scsih_nvme_shutdown(ioc); mpt3sas_base_mask_interrupts(ioc); + mpt3sas_base_stop_watchdog(ioc); ioc->shost_recovery = 1; mpt3sas_base_make_ioc_ready(ioc, SOFT_RESET); ioc->shost_recovery = 0; From a3435afba87dc6cd83f5595e7607f3c40f93ef01 Mon Sep 17 00:00:00 2001 From: Liang He Date: Tue, 19 Jul 2022 15:15:29 +0800 Subject: [PATCH 073/121] scsi: ufs: host: Hold reference returned by of_parse_phandle() In ufshcd_populate_vreg(), we should hold the reference returned by of_parse_phandle() and then use it to call of_node_put() for refcount balance. Link: https://lore.kernel.org/r/20220719071529.1081166-1-windhl@126.com Fixes: aa4976130934 ("ufs: Add regulator enable support") Reviewed-by: Bart Van Assche Signed-off-by: Liang He Signed-off-by: Martin K. 
Petersen --- drivers/ufs/host/ufshcd-pltfrm.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/ufs/host/ufshcd-pltfrm.c b/drivers/ufs/host/ufshcd-pltfrm.c index e7332cc65b1f..173aea8e9997 100644 --- a/drivers/ufs/host/ufshcd-pltfrm.c +++ b/drivers/ufs/host/ufshcd-pltfrm.c @@ -108,9 +108,20 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba) return ret; } +static bool phandle_exists(const struct device_node *np, + const char *phandle_name, int index) +{ + struct device_node *parse_np = of_parse_phandle(np, phandle_name, index); + + if (parse_np) + of_node_put(parse_np); + + return parse_np != NULL; +} + #define MAX_PROP_SIZE 32 static int ufshcd_populate_vreg(struct device *dev, const char *name, - struct ufs_vreg **out_vreg) + struct ufs_vreg **out_vreg) { char prop_name[MAX_PROP_SIZE]; struct ufs_vreg *vreg = NULL; @@ -122,7 +133,7 @@ static int ufshcd_populate_vreg(struct device *dev, const char *name, } snprintf(prop_name, MAX_PROP_SIZE, "%s-supply", name); - if (!of_parse_phandle(np, prop_name, 0)) { + if (!phandle_exists(np, prop_name, 0)) { dev_info(dev, "%s: Unable to find %s regulator, assuming enabled\n", __func__, prop_name); goto out; From d9a434fa0c12ed5f7afe1e9dd30003ab5d059b85 Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Wed, 20 Jul 2022 10:51:20 +0800 Subject: [PATCH 074/121] scsi: core: Fix warning in scsi_alloc_sgtables() As explained in SG_IO howto[1]: "If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of iov_len lengths. If not, the minimum of the two is the transfer length." When iovec_count is non-zero and dxfer_len is zero, the sg_io() just genarated a null bio, and finally caused a warning below. To fix it, skip generating a bio for this request if dxfer_len is zero. 
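For reference, a hedged userspace sketch of the well-formed case the howto (link [1] below) describes; the command buffer and direction are placeholders, and only the relationship between dxfer_len and the iovec lengths matters: when iovec_count is non-zero, dxfer_len is set to the sum of the iov_len fields rather than left at zero.

	#include <string.h>
	#include <sys/uio.h>
	#include <scsi/sg.h>

	static void fill_sg_io(struct sg_io_hdr *hdr, struct iovec *iov, int niov,
			       unsigned char *cdb, unsigned char cdb_len)
	{
		size_t total = 0;
		int i;

		for (i = 0; i < niov; i++)
			total += iov[i].iov_len;

		memset(hdr, 0, sizeof(*hdr));
		hdr->interface_id = 'S';
		hdr->dxfer_direction = SG_DXFER_FROM_DEV;
		hdr->cmdp = cdb;
		hdr->cmd_len = cdb_len;
		hdr->dxferp = iov;		/* array of struct iovec */
		hdr->iovec_count = niov;
		hdr->dxfer_len = total;		/* must equal the summed iov_len */
	}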
[1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032 Modules linked in: CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014 RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032 Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8 93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41 bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb RSP: 0018:ffffc90000d07558 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000 RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000 R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000 FS: 00007effdec8e700(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline] scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline] scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730 blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851 __blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299 blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332 __blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968 __blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045 blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096 blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451 blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231 sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485 scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline] scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921 sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576 blkdev_ioctl+0x37a/0x800 block/ioctl.c:588 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7effdecdc5d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9 RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004 RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8 Link: https://lore.kernel.org/r/20220720025120.3226770-1-yanaijie@huawei.com Fixes: 25636e282fe9 ("block: fix SG_IO vector request data length handling") Reported-by: syzbot+d44b35ecfb807e5af0b5@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Signed-off-by: Jason Yan Signed-off-by: Martin K. 
Petersen --- drivers/scsi/scsi_ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c index a480c4d589f5..729e309e6034 100644 --- a/drivers/scsi/scsi_ioctl.c +++ b/drivers/scsi/scsi_ioctl.c @@ -450,7 +450,7 @@ static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr, fmode_t mode) goto out_put_request; ret = 0; - if (hdr->iovec_count) { + if (hdr->iovec_count && hdr->dxfer_len) { struct iov_iter i; struct iovec *iov = NULL; From f5c2976e0cb0f6236013bfb479868531b04f61d4 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 20 Jul 2022 10:02:23 -0700 Subject: [PATCH 075/121] scsi: ufs: core: Fix a race condition related to device management If a device management command completion happens after wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is called, then the completion code may crash on the complete() call in __ufshcd_transfer_req_compl(). Fix the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x64/0x178 __ufshcd_transfer_req_compl+0x30c/0x9c0 ufshcd_poll+0xf0/0x208 ufshcd_sl_intr+0xb8/0xf0 ufshcd_intr+0x168/0x2f4 __handle_irq_event_percpu+0xa0/0x30c handle_irq_event+0x84/0x178 handle_fasteoi_irq+0x150/0x2e8 __handle_domain_irq+0x114/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 efi_header_end+0x110/0x680 __irq_exit_rcu+0x108/0x124 __handle_domain_irq+0x118/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 cpuidle_enter_state+0x3ac/0x8c4 do_idle+0x2fc/0x55c cpu_startup_entry+0x84/0x90 kernel_init+0x0/0x310 start_kernel+0x0/0x608 start_kernel+0x4ec/0x608 Link: https://lore.kernel.org/r/20220720170228.1598842-1-bvanassche@acm.org Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU") Cc: Adrian Hunter Cc: Avri Altman Cc: Bean Huo Cc: Stanley Chu Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 58 +++++++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 18 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index c7b337480e3e..3d367be71728 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -2953,37 +2953,59 @@ ufshcd_dev_cmd_completion(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, int max_timeout) { - int err = 0; - unsigned long time_left; + unsigned long time_left = msecs_to_jiffies(max_timeout); unsigned long flags; + bool pending; + int err; +retry: time_left = wait_for_completion_timeout(hba->dev_cmd.complete, - msecs_to_jiffies(max_timeout)); + time_left); - spin_lock_irqsave(hba->host->host_lock, flags); - hba->dev_cmd.complete = NULL; if (likely(time_left)) { + /* + * The completion handler called complete() and the caller of + * this function still owns the @lrbp tag so the code below does + * not trigger any race conditions. 
+ */ + hba->dev_cmd.complete = NULL; err = ufshcd_get_tr_ocs(lrbp); if (!err) err = ufshcd_dev_cmd_completion(hba, lrbp); - } - spin_unlock_irqrestore(hba->host->host_lock, flags); - - if (!time_left) { + } else { err = -ETIMEDOUT; dev_dbg(hba->dev, "%s: dev_cmd request timedout, tag %d\n", __func__, lrbp->task_tag); - if (!ufshcd_clear_cmds(hba, 1U << lrbp->task_tag)) + if (ufshcd_clear_cmds(hba, 1U << lrbp->task_tag) == 0) { /* successfully cleared the command, retry if needed */ err = -EAGAIN; - /* - * in case of an error, after clearing the doorbell, - * we also need to clear the outstanding_request - * field in hba - */ - spin_lock_irqsave(&hba->outstanding_lock, flags); - __clear_bit(lrbp->task_tag, &hba->outstanding_reqs); - spin_unlock_irqrestore(&hba->outstanding_lock, flags); + /* + * Since clearing the command succeeded we also need to + * clear the task tag bit from the outstanding_reqs + * variable. + */ + spin_lock_irqsave(&hba->outstanding_lock, flags); + pending = test_bit(lrbp->task_tag, + &hba->outstanding_reqs); + if (pending) { + hba->dev_cmd.complete = NULL; + __clear_bit(lrbp->task_tag, + &hba->outstanding_reqs); + } + spin_unlock_irqrestore(&hba->outstanding_lock, flags); + + if (!pending) { + /* + * The completion handler ran while we tried to + * clear the command. + */ + time_left = 1; + goto retry; + } + } else { + dev_err(hba->dev, "%s: failed to clear tag %d\n", + __func__, lrbp->task_tag); + } } return err; From b5177ed92bf6f9d90a2493ed51c1327e088be1df Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Mon, 25 Jul 2022 13:52:31 -0700 Subject: [PATCH 076/121] mptcp: Do not return EINPROGRESS when subflow creation succeeds New subflows are created within the kernel using O_NONBLOCK, so EINPROGRESS is the expected return value from kernel_connect(). __mptcp_subflow_connect() has the correct logic to consider EINPROGRESS to be a successful case, but it has also used that error code as its return value. Before v5.19 this was benign: all the callers ignored the return value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic netlink command that does use the return value, so the EINPROGRESS gets propagated to userspace. Make __mptcp_subflow_connect() always return 0 on success instead. Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Acked-by: Paolo Abeni Signed-off-by: Mat Martineau Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski --- net/mptcp/subflow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 63e8892ec807..af28f3b60389 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1533,7 +1533,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, mptcp_sock_graft(ssk, sk->sk_socket); iput(SOCK_INODE(sf)); WRITE_ONCE(msk->allow_infinite_fallback, false); - return err; + return 0; failed_unlink: list_del(&subflow->node); From 5e2805d5379619c4a2e3ae4994e73b36439f4bad Mon Sep 17 00:00:00 2001 From: Toshi Kani Date: Thu, 21 Jul 2022 12:05:03 -0600 Subject: [PATCH 077/121] EDAC/ghes: Set the DIMM label unconditionally The commit cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") enforced that both the bank and device strings passed to dimm_setup_label() are not NULL. 
However, there are BIOSes, for example on a HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 which don't populate both strings: Handle 0x0020, DMI type 17, 84 bytes Memory Device Array Handle: 0x0013 Error Information Handle: Not Provided Total Width: 72 bits Data Width: 64 bits Size: 32 GB Form Factor: DIMM Set: None Locator: PROC 1 DIMM 1 <===== device Bank Locator: Not Specified <===== bank This results in a buffer overflow because ghes_edac_register() calls strlen() on an uninitialized label, which had non-zero values left over from krealloc_array(): detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:983! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/0 Tainted: G I 5.18.6-200.fc36.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 RIP: 0010:fortify_panic ... Call Trace: ghes_edac_register.cold ghes_probe platform_probe really_probe __driver_probe_device driver_probe_device __driver_attach ? __device_attach_driver bus_for_each_dev bus_add_driver driver_register acpi_ghes_init acpi_init ? acpi_sleep_proc_init do_one_initcall The label contains garbage because the commit in Fixes reallocs the DIMMs array while scanning the system but doesn't clear the newly allocated memory. Change dimm_setup_label() to always initialize the label to fix the issue. Set it to the empty string in case BIOS does not provide both bank and device so that ghes_edac_register() can keep the default label given by edac_mc_alloc_dimms(). [ bp: Rewrite commit message. ] Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init") Co-developed-by: Robert Richter Signed-off-by: Robert Richter Signed-off-by: Toshi Kani Signed-off-by: Borislav Petkov Tested-by: Robert Elliott Cc: Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com --- drivers/edac/ghes_edac.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 59b0bedc9c24..c8fa7dcfdbd0 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -103,9 +103,14 @@ static void dimm_setup_label(struct dimm_info *dimm, u16 handle) dmi_memdev_name(handle, &bank, &device); - /* both strings must be non-zero */ - if (bank && *bank && device && *device) - snprintf(dimm->label, sizeof(dimm->label), "%s %s", bank, device); + /* + * Set to a NULL string when both bank and device are zero. In this case, + * the label assigned by default will be preserved. + */ + snprintf(dimm->label, sizeof(dimm->label), "%s%s%s", + (bank && *bank) ? bank : "", + (bank && *bank && device && *device) ? " " : "", + (device && *device) ? device : ""); } static void assign_dmi_dimm_info(struct dimm_info *dimm, struct memdev_dmi_entry *entry) From 5a159128faff151b7fe5f4eb0f310b1e0a2d56bf Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Mon, 25 Jul 2022 15:21:59 +0800 Subject: [PATCH 078/121] virtio-net: fix the race between refill work and close We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means an NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). 
So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] ================================================================== BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Reviewed-by: Xuan Zhuo Signed-off-by: David S. Miller --- drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 356cf8dd4164..ec8e1b3108c3 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -242,9 +242,15 @@ struct virtnet_info { /* Packet virtio header size */ u8 hdr_len; - /* Work struct for refilling if we run low on memory. */ + /* Work struct for delayed refilling if we run low on memory. */ struct delayed_work refill; + /* Is delayed refill enabled? */ + bool refill_enabled; + + /* The lock to synchronize the access to refill_enabled */ + spinlock_t refill_lock; + /* Work struct for config space updates */ struct work_struct config_work; @@ -348,6 +354,20 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static void enable_delayed_refill(struct virtnet_info *vi) +{ + spin_lock_bh(&vi->refill_lock); + vi->refill_enabled = true; + spin_unlock_bh(&vi->refill_lock); +} + +static void disable_delayed_refill(struct virtnet_info *vi) +{ + spin_lock_bh(&vi->refill_lock); + vi->refill_enabled = false; + spin_unlock_bh(&vi->refill_lock); +} + static void virtqueue_napi_schedule(struct napi_struct *napi, struct virtqueue *vq) { @@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget, } if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { - if (!try_fill_recv(vi, rq, GFP_ATOMIC)) - schedule_delayed_work(&vi->refill, 0); + if (!try_fill_recv(vi, rq, GFP_ATOMIC)) { + spin_lock(&vi->refill_lock); + if (vi->refill_enabled) + schedule_delayed_work(&vi->refill, 0); + spin_unlock(&vi->refill_lock); + } } u64_stats_update_begin(&rq->stats.syncp); @@ -1651,6 +1675,8 @@ static int virtnet_open(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); int i, err; + enable_delayed_refill(vi); + for (i = 0; i < vi->max_queue_pairs; i++) { if (i < vi->curr_queue_pairs) /* Make sure we have some buffers: if oom use wq. */ @@ -2033,6 +2059,8 @@ static int virtnet_close(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); int i; + /* Make sure NAPI doesn't schedule refill work */ + disable_delayed_refill(vi); /* Make sure refill_work doesn't re-enable napi! 
*/ cancel_delayed_work_sync(&vi->refill); @@ -2792,6 +2820,8 @@ static int virtnet_restore_up(struct virtio_device *vdev) virtio_device_ready(vdev); + enable_delayed_refill(vi); + if (netif_running(vi->dev)) { err = virtnet_open(vi->dev); if (err) @@ -3535,6 +3565,7 @@ static int virtnet_probe(struct virtio_device *vdev) vdev->priv = vi; INIT_WORK(&vi->config_work, virtnet_config_changed_work); + spin_lock_init(&vi->refill_lock); /* If we can receive ANY GSO packets, we must allocate large ones. */ if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) || From 553de6e1157df63fc6cdfe4573e04c8edcbe68f2 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Thu, 1 Jul 2021 13:39:15 -0300 Subject: [PATCH 079/121] tools headers cpufeatures: Sync with the kernel sources To pick the changes from: 28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter Cc: Borislav Petkov Cc: Ian Rogers Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra --- tools/arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index 00f5227c8459..a77b915d36a8 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -302,6 +302,7 @@ #define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */ #define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */ #define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */ +#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ From b226521923aee7051f4b24df9be5bf07d53f0a2b Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Mon, 25 Jul 2022 18:42:20 +0800 Subject: [PATCH 080/121] perf scripts python: Let script to be python2 compliant The mainline kernel can be used for relative old distros, e.g. RHEL 7. The distro doesn't upgrade from python2 to python3, this causes the building error that the python script is not python2 compliant. To fix the building failure, this patch changes from the python f-string format to traditional string format. 
Fixes: 12fdd6c009da0d02 ("perf scripts python: Support Arm CoreSight trace data disassembly") Reported-by: Akemi Yagi Signed-off-by: Leo Yan Cc: Alexander Shishkin Cc: ElRepo Cc: Ian Rogers Cc: Ingo Molnar Cc: Jiri Olsa Cc: Leo Yan Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220725104220.1106663-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo --- .../scripts/python/arm-cs-trace-disasm.py | 34 ++++++++++--------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py index 5f57d9829956..4339692a8d0b 100755 --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -61,7 +61,7 @@ def get_optional(perf_dict, field): def get_offset(perf_dict, field): if field in perf_dict: - return f"+0x{perf_dict[field]:x}" + return "+%#x" % perf_dict[field] return "" def get_dso_file_path(dso_name, dso_build_id): @@ -76,7 +76,7 @@ def get_dso_file_path(dso_name, dso_build_id): else: append = "/elf" - dso_path = f"{os.environ['PERF_BUILDID_DIR']}/{dso_name}/{dso_build_id}{append}" + dso_path = os.environ['PERF_BUILDID_DIR'] + "/" + dso_name + "/" + dso_build_id + append; # Replace duplicate slash chars to single slash char dso_path = dso_path.replace('//', '/', 1) return dso_path @@ -94,8 +94,8 @@ def read_disam(dso_fname, dso_start, start_addr, stop_addr): start_addr = start_addr - dso_start; stop_addr = stop_addr - dso_start; disasm = [ options.objdump_name, "-d", "-z", - f"--start-address=0x{start_addr:x}", - f"--stop-address=0x{stop_addr:x}" ] + "--start-address="+format(start_addr,"#x"), + "--stop-address="+format(stop_addr,"#x") ] disasm += [ dso_fname ] disasm_output = check_output(disasm).decode('utf-8').split('\n') disasm_cache[addr_range] = disasm_output @@ -109,12 +109,14 @@ def print_disam(dso_fname, dso_start, start_addr, stop_addr): m = disasm_re.search(line) if m is None: continue - print(f"\t{line}") + print("\t" + line) def print_sample(sample): - print(f"Sample = {{ cpu: {sample['cpu']:04} addr: 0x{sample['addr']:016x} " \ - f"phys_addr: 0x{sample['phys_addr']:016x} ip: 0x{sample['ip']:016x} " \ - f"pid: {sample['pid']} tid: {sample['tid']} period: {sample['period']} time: {sample['time']} }}") + print("Sample = { cpu: %04d addr: 0x%016x phys_addr: 0x%016x ip: 0x%016x " \ + "pid: %d tid: %d period: %d time: %d }" % \ + (sample['cpu'], sample['addr'], sample['phys_addr'], \ + sample['ip'], sample['pid'], sample['tid'], \ + sample['period'], sample['time'])) def trace_begin(): print('ARM CoreSight Trace Data Assembler Dump') @@ -131,7 +133,7 @@ def common_start_str(comm, sample): cpu = sample["cpu"] pid = sample["pid"] tid = sample["tid"] - return f"{comm:>16} {pid:>5}/{tid:<5} [{cpu:04}] {sec:9}.{ns:09} " + return "%16s %5u/%-5u [%04u] %9u.%09u " % (comm, pid, tid, cpu, sec, ns) # This code is copied from intel-pt-events.py for printing source code # line and symbols. 
@@ -171,7 +173,7 @@ def print_srccode(comm, param_dict, sample, symbol, dso): glb_line_number = line_number glb_source_file_name = source_file_name - print(f"{start_str}{src_str}") + print(start_str, src_str) def process_event(param_dict): global cache_size @@ -188,7 +190,7 @@ def process_event(param_dict): symbol = get_optional(param_dict, "symbol") if (options.verbose == True): - print(f"Event type: {name}") + print("Event type: %s" % name) print_sample(sample) # If cannot find dso so cannot dump assembler, bail out @@ -197,7 +199,7 @@ def process_event(param_dict): # Validate dso start and end addresses if ((dso_start == '[unknown]') or (dso_end == '[unknown]')): - print(f"Failed to find valid dso map for dso {dso}") + print("Failed to find valid dso map for dso %s" % dso) return if (name[0:12] == "instructions"): @@ -244,15 +246,15 @@ def process_event(param_dict): # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4 if (start_addr == 0 and stop_addr == 4): - print(f"CPU{cpu}: CS_ETM_TRACE_ON packet is inserted") + print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu) return if (start_addr < int(dso_start) or start_addr > int(dso_end)): - print(f"Start address 0x{start_addr:x} is out of range [ 0x{dso_start:x} .. 0x{dso_end:x} ] for dso {dso}") + print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso)) return if (stop_addr < int(dso_start) or stop_addr > int(dso_end)): - print(f"Stop address 0x{stop_addr:x} is out of range [ 0x{dso_start:x} .. 0x{dso_end:x} ] for dso {dso}") + print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso)) return if (options.objdump_name != None): @@ -267,6 +269,6 @@ def process_event(param_dict): if path.exists(dso_fname): print_disam(dso_fname, dso_vm_start, start_addr, stop_addr) else: - print(f"Failed to find dso {dso} for address range [ 0x{start_addr:x} .. 0x{stop_addr:x} ]") + print("Failed to find dso %s for address range [ 0x%x .. 0x%x ]" % (dso, start_addr, stop_addr)) print_srccode(comm, param_dict, sample, symbol, dso) From 2d86612aacb7805f72873691a2644d7279ed0630 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Sun, 24 Jul 2022 14:00:12 +0800 Subject: [PATCH 081/121] perf symbol: Correct address for bss symbols When using 'perf mem' and 'perf c2c', an issue is observed that tool reports the wrong offset for global data symbols. This is a common issue on both x86 and Arm64 platforms. Let's see an example, for a test program, below is the disassembly for its .bss section which is dumped with objdump: ... Disassembly of section .bss: 0000000000004040 : ... 0000000000004080 : ... 00000000000040c0 : ... 0000000000004100 : ... First we used 'perf mem record' to run the test program and then used 'perf --debug verbose=4 mem report' to observe what's the symbol info for 'buf1' and 'buf2' structures. # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8 # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf2 0x30a8-0x30e8 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf1 0x3068-0x30a8 ... 
The perf tool relies on libelf to parse symbols, in executable and shared object files, 'st_value' holds a virtual address; 'sh_addr' is the address at which section's first byte should reside in memory, and 'sh_offset' is the byte offset from the beginning of the file to the first byte in the section. The perf tool uses below formula to convert a symbol's memory address to a file address: file_address = st_value - sh_addr + sh_offset ^ ` Memory address We can see the final adjusted address ranges for buf1 and buf2 are [0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is incorrect, in the code, the structure for 'buf1' and 'buf2' specifies compiler attribute with 64-byte alignment. The problem happens for 'sh_offset', libelf returns it as 0x3028 which is not 64-byte aligned, combining with disassembly, it's likely libelf doesn't respect the alignment for .bss section, therefore, it doesn't return the aligned value for 'sh_offset'. Suggested by Fangrui Song, ELF file contains program header which contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD segments contain the execution info. A better choice for converting memory address to file address is using the formula: file_address = st_value - p_vaddr + p_offset This patch introduces elf_read_program_header() which returns the program header based on the passed 'st_value', then it uses the formula above to calculate the symbol file address; and the debugging log is updated respectively. After applying the change: # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf2 0x30c0-0x3100 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf1 0x3080-0x30c0 ... Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing") Reported-by: Chang Rui Suggested-by: Fangrui Song Signed-off-by: Leo Yan Acked-by: Namhyung Kim Cc: Alexander Shishkin Cc: Ian Rogers Cc: Ingo Molnar Cc: Jiri Olsa Cc: Mark Rutland Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/symbol-elf.c | 45 ++++++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index ecd377938eea..ef6ced5c5746 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -233,6 +233,33 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep, return NULL; } +static int elf_read_program_header(Elf *elf, u64 vaddr, GElf_Phdr *phdr) +{ + size_t i, phdrnum; + u64 sz; + + if (elf_getphdrnum(elf, &phdrnum)) + return -1; + + for (i = 0; i < phdrnum; i++) { + if (gelf_getphdr(elf, i, phdr) == NULL) + return -1; + + if (phdr->p_type != PT_LOAD) + continue; + + sz = max(phdr->p_memsz, phdr->p_filesz); + if (!sz) + continue; + + if (vaddr >= phdr->p_vaddr && (vaddr < phdr->p_vaddr + sz)) + return 0; + } + + /* Not found any valid program header */ + return -1; +} + static bool want_demangle(bool is_kernel_sym) { return is_kernel_sym ? 
symbol_conf.demangle_kernel : symbol_conf.demangle; @@ -1209,6 +1236,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss, sym.st_value); used_opd = true; } + /* * When loading symbols in a data mapping, ABS symbols (which * has a value of SHN_ABS in its st_shndx) failed at @@ -1262,11 +1290,20 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss, goto out_elf_end; } else if ((used_opd && runtime_ss->adjust_symbols) || (!used_opd && syms_ss->adjust_symbols)) { + GElf_Phdr phdr; + + if (elf_read_program_header(syms_ss->elf, + (u64)sym.st_value, &phdr)) { + pr_warning("%s: failed to find program header for " + "symbol: %s st_value: %#" PRIx64 "\n", + __func__, elf_name, (u64)sym.st_value); + continue; + } pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " " - "sh_addr: %#" PRIx64 " sh_offset: %#" PRIx64 "\n", __func__, - (u64)sym.st_value, (u64)shdr.sh_addr, - (u64)shdr.sh_offset); - sym.st_value -= shdr.sh_addr - shdr.sh_offset; + "p_vaddr: %#" PRIx64 " p_offset: %#" PRIx64 "\n", + __func__, (u64)sym.st_value, (u64)phdr.p_vaddr, + (u64)phdr.p_offset); + sym.st_value -= phdr.p_vaddr - phdr.p_offset; } demangled = demangle_sym(dso, kmodule, elf_name); From 882528d2e77687c3ef26abb9c490f77a9c1f6e1a Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Sun, 24 Jul 2022 14:00:13 +0800 Subject: [PATCH 082/121] perf symbol: Skip symbols if SHF_ALLOC flag is not set Some symbols are observed with the 'st_value' field zeroed. E.g. libc.so.6 in Ubuntu contains a symbol '__evoke_link_warning_getwd' which resides in the '.gnu.warning.getwd' section. Unlike normal sections, such kind of sections are used for linker warning when a file calls deprecated functions, but they are not part of memory images, the symbols in these sections should be dropped. This patch checks the section attribute SHF_ALLOC bit, if the bit is not set, it skips symbols to avoid spurious ones. Suggested-by: Fangrui Song Signed-off-by: Leo Yan Acked-by: Namhyung Kim Cc: Alexander Shishkin Cc: Chang Rui Cc: Ian Rogers Cc: Ingo Molnar Cc: Jiri Olsa Cc: Mark Rutland Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220724060013.171050-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/symbol-elf.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index ef6ced5c5746..b3be5b1d9dbb 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -1255,6 +1255,17 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss, gelf_getshdr(sec, &shdr); + /* + * If the attribute bit SHF_ALLOC is not set, the section + * doesn't occupy memory during process execution. + * E.g. ".gnu.warning.*" section is used by linker to generate + * warnings when calling deprecated functions, the symbols in + * the section aren't loaded to memory during process execution, + * so skip them. + */ + if (!(shdr.sh_flags & SHF_ALLOC)) + continue; + secstrs = secstrs_sym; /* From 9a241805673ec0a826b7ddf84b00f4e03adb0a5e Mon Sep 17 00:00:00 2001 From: Ian Rogers Date: Tue, 26 Jul 2022 15:09:21 -0700 Subject: [PATCH 083/121] perf bpf: Remove undefined behavior from bpf_perf_object__next() bpf_perf_object__next() folded the last element in the list test with the empty list test. However, this meant that offsets were computed against null and that a struct list_head was compared against a 'struct bpf_perf_object'. 
Working around this with clang's undefined behavior sanitizer required -fno-sanitize=null and -fno-sanitize=object-size. Remove the undefined behavior by using the regular Linux list APIs and handling the starting case separately from the end testing case. Looking at uses like bpf_perf_object__for_each(), as the constant NULL or non-NULL argument can be constant propagated, the code is no less efficient. Signed-off-by: Ian Rogers Cc: Alexander Shishkin Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Christy Lee Cc: Ingo Molnar Cc: Jiri Olsa Cc: Mark Rutland Cc: Miaoqian Lin Cc: Namhyung Kim Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Rix Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20220726220921.2567761-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/bpf-loader.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index f8ad581ea247..cdd6463a5b68 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -63,20 +63,16 @@ static struct hashmap *bpf_map_hash; static struct bpf_perf_object * bpf_perf_object__next(struct bpf_perf_object *prev) { - struct bpf_perf_object *next; + if (!prev) { + if (list_empty(&bpf_objects_list)) + return NULL; - if (!prev) - next = list_first_entry(&bpf_objects_list, - struct bpf_perf_object, - list); - else - next = list_next_entry(prev, list); - - /* Empty list is noticed here so don't need checking on entry. */ - if (&next->list == &bpf_objects_list) + return list_first_entry(&bpf_objects_list, struct bpf_perf_object, list); + } + if (list_is_last(&prev->list, &bpf_objects_list)) return NULL; - return next; + return list_next_entry(prev, list); } #define bpf_perf_object__for_each(perf_obj, tmp) \ From 871808fd6981bcc6bb48f71032f983ca77748e96 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Fri, 22 Jul 2022 14:18:15 +0200 Subject: [PATCH 084/121] x86/configs: Update configs in x86_debug.config Commit 4675ff05de2d ("kmemcheck: rip it out") removed kmemcheck and its corresponding build config KMEMCHECK. Commit 0f620cefd775 ("objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"") renamed the debug config option. Adjust x86_debug.config to those changes in debug configs. Signed-off-by: Lukas Bulwahn Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220722121815.27535-1-lukas.bulwahn@gmail.com --- kernel/configs/x86_debug.config | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/configs/x86_debug.config b/kernel/configs/x86_debug.config index dcd86f32f4ed..6fac5b405334 100644 --- a/kernel/configs/x86_debug.config +++ b/kernel/configs/x86_debug.config @@ -7,12 +7,11 @@ CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_KMEMLEAK=y CONFIG_DEBUG_PAGEALLOC=y CONFIG_SLUB_DEBUG_ON=y -CONFIG_KMEMCHECK=y CONFIG_DEBUG_OBJECTS=y CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1 CONFIG_GCOV_KERNEL=y CONFIG_LOCKDEP=y CONFIG_PROVE_LOCKING=y CONFIG_SCHEDSTATS=y -CONFIG_VMLINUX_VALIDATION=y +CONFIG_NOINSTR_VALIDATION=y CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y From 5bb6c1d1126ebcbcd6314f80d82f50b021a9e351 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Wed, 27 Jul 2022 13:24:21 +0200 Subject: [PATCH 085/121] Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 007faec014cb5d26983c1f86fd08c6539b41392e. 
Now that hyperv does its own protocol negotiation: 49d6a3c062a1 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM") revert this exposure of the sev_es_ghcb_hv_call() helper. Cc: Wei Liu Signed-off-by: Borislav Petkov Reviewed-by:Tianyu Lan Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com --- arch/x86/include/asm/sev.h | 7 +------ arch/x86/kernel/sev-shared.c | 25 +++++++++---------------- arch/x86/kernel/sev.c | 17 ++++++++--------- 3 files changed, 18 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 19514524f0f8..4a23e52fe0ee 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -72,7 +72,6 @@ static inline u64 lower_bits(u64 val, unsigned int bits) struct real_mode_header; enum stack_type; -struct ghcb; /* Early IDT entry points for #VC handler */ extern void vc_no_ghcb(void); @@ -156,11 +155,7 @@ static __always_inline void sev_es_nmi_complete(void) __sev_es_nmi_complete(); } extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd); -extern enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, - bool set_ghcb_msr, - struct es_em_ctxt *ctxt, - u64 exit_code, u64 exit_info_1, - u64 exit_info_2); + static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { int rc; diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index b478edf43bec..3a5b0c9c4fcc 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -219,9 +219,10 @@ static enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt return ES_VMM_ERROR; } -enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, bool set_ghcb_msr, - struct es_em_ctxt *ctxt, u64 exit_code, - u64 exit_info_1, u64 exit_info_2) +static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, + struct es_em_ctxt *ctxt, + u64 exit_code, u64 exit_info_1, + u64 exit_info_2) { /* Fill in protocol and format specifiers */ ghcb->protocol_version = ghcb_version; @@ -231,14 +232,7 @@ enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, bool set_ghcb_msr, ghcb_set_sw_exit_info_1(ghcb, exit_info_1); ghcb_set_sw_exit_info_2(ghcb, exit_info_2); - /* - * Hyper-V unenlightened guests use a paravisor for communicating and - * GHCB pages are being allocated and set up by that paravisor. Linux - * should not change the GHCB page's physical address. 
- */ - if (set_ghcb_msr) - sev_es_wr_ghcb_msr(__pa(ghcb)); - + sev_es_wr_ghcb_msr(__pa(ghcb)); VMGEXIT(); return verify_exception_info(ghcb, ctxt); @@ -795,7 +789,7 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt) */ sw_scratch = __pa(ghcb) + offsetof(struct ghcb, shared_buffer); ghcb_set_sw_scratch(ghcb, sw_scratch); - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_IOIO, + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_IOIO, exit_info_1, exit_info_2); if (ret != ES_OK) return ret; @@ -837,8 +831,7 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt) ghcb_set_rax(ghcb, rax); - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, - SVM_EXIT_IOIO, exit_info_1, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_IOIO, exit_info_1, 0); if (ret != ES_OK) return ret; @@ -894,7 +887,7 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb, /* xgetbv will cause #GP - use reset value for xcr0 */ ghcb_set_xcr0(ghcb, 1); - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_CPUID, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0); if (ret != ES_OK) return ret; @@ -919,7 +912,7 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb, bool rdtscp = (exit_code == SVM_EXIT_RDTSCP); enum es_result ret; - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, exit_code, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, 0, 0); if (ret != ES_OK) return ret; diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index c05f0124c410..63dc626627a0 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -786,7 +786,7 @@ static int vmgexit_psc(struct snp_psc_desc *desc) ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); /* This will advance the shared buffer data points to. */ - ret = sev_es_ghcb_hv_call(ghcb, true, &ctxt, SVM_VMGEXIT_PSC, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0); /* * Page State Change VMGEXIT can pass error code through @@ -1212,8 +1212,7 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt) ghcb_set_rdx(ghcb, regs->dx); } - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_MSR, - exit_info_1, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, exit_info_1, 0); if ((ret == ES_OK) && (!exit_info_1)) { regs->ax = ghcb->save.rax; @@ -1452,7 +1451,7 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt, ghcb_set_sw_scratch(ghcb, ghcb_pa + offsetof(struct ghcb, shared_buffer)); - return sev_es_ghcb_hv_call(ghcb, true, ctxt, exit_code, exit_info_1, exit_info_2); + return sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_2); } /* @@ -1628,7 +1627,7 @@ static enum es_result vc_handle_dr7_write(struct ghcb *ghcb, /* Using a value of 0 for ExitInfo1 means RAX holds the value */ ghcb_set_rax(ghcb, val); - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_WRITE_DR7, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WRITE_DR7, 0, 0); if (ret != ES_OK) return ret; @@ -1658,7 +1657,7 @@ static enum es_result vc_handle_dr7_read(struct ghcb *ghcb, static enum es_result vc_handle_wbinvd(struct ghcb *ghcb, struct es_em_ctxt *ctxt) { - return sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_WBINVD, 0, 0); + return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0); } static enum es_result vc_handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt *ctxt) @@ -1667,7 +1666,7 @@ static enum es_result vc_handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt *ctxt ghcb_set_rcx(ghcb, ctxt->regs->cx); - ret = 
sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_RDPMC, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDPMC, 0, 0); if (ret != ES_OK) return ret; @@ -1708,7 +1707,7 @@ static enum es_result vc_handle_vmmcall(struct ghcb *ghcb, if (x86_platform.hyper.sev_es_hcall_prepare) x86_platform.hyper.sev_es_hcall_prepare(ghcb, ctxt->regs); - ret = sev_es_ghcb_hv_call(ghcb, true, ctxt, SVM_EXIT_VMMCALL, 0, 0); + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0); if (ret != ES_OK) return ret; @@ -2197,7 +2196,7 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned ghcb_set_rbx(ghcb, input->data_npages); } - ret = sev_es_ghcb_hv_call(ghcb, true, &ctxt, exit_code, input->req_gpa, input->resp_gpa); + ret = sev_es_ghcb_hv_call(ghcb, &ctxt, exit_code, input->req_gpa, input->resp_gpa); if (ret) goto e_put; From fb0fd3469ead5b937293c213daa1f589b4b7ce46 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 19 Jul 2022 17:33:21 +0100 Subject: [PATCH 086/121] ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") added a check to determine whether arm_dma_zone_size is exceeding the amount of kernel virtual address space available between the upper 4GB virtual address limit and PAGE_OFFSET in order to provide a suitable definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual address space. The quantity used for comparison was off by a missing trailing 0, leading to MAX_DMA_ADDRESS to be overflowing a 32-bit quantity. This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being 0x1_0000_0000 which overflows the unsigned long type used throughout __pa() and then __virt_addr_valid(). Because the virtual address passed to __virt_addr_valid() would now be 0, the function would loudly warn and flood the kernel log, thus making the platform unable to boot properly. Fixes: 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") Signed-off-by: Florian Fainelli Reviewed-by: Linus Walleij Signed-off-by: Russell King (Oracle) --- arch/arm/include/asm/dma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/include/asm/dma.h b/arch/arm/include/asm/dma.h index a81dda65c576..45180a2cc47c 100644 --- a/arch/arm/include/asm/dma.h +++ b/arch/arm/include/asm/dma.h @@ -10,7 +10,7 @@ #else #define MAX_DMA_ADDRESS ({ \ extern phys_addr_t arm_dma_zone_size; \ - arm_dma_zone_size && arm_dma_zone_size < (0x10000000 - PAGE_OFFSET) ? \ + arm_dma_zone_size && arm_dma_zone_size < (0x100000000ULL - PAGE_OFFSET) ? \ (PAGE_OFFSET + arm_dma_zone_size) : 0xffffffffUL; }) #endif From e62d2e110356093c034998e093675df83057e511 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 26 Jul 2022 11:57:43 +0000 Subject: [PATCH 087/121] tcp: md5: fix IPv4-mapped support After the blamed commit, IPv4 SYN packets handled by a dual stack IPv6 socket are dropped, even if perfectly valid. $ nstat | grep MD5 TcpExtTCPMD5Failure 5 0.0 For a dual stack listener, an incoming IPv4 SYN packet would call tcp_inbound_md5_hash() with @family == AF_INET, while tp->af_specific is pointing to tcp_sock_ipv6_specific. Only later when an IPv4-mapped child is created, tp->af_specific is changed to tcp_sock_ipv6_mapped_specific. 
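For illustration, a minimal user-space sketch of the set-up that exercises this path: an AF_INET6 listener with IPV6_V6ONLY off and an MD5 key installed for an IPv4 peer. The port, peer address and key below are made up and error handling is omitted; this only shows where an AF_INET SYN meets the IPv6 af_specific ops.

    #include <string.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/tcp.h>		/* TCP_MD5SIG, struct tcp_md5sig */

    static int dual_stack_md5_listener(void)
    {
    	struct sockaddr_in6 addr = { .sin6_family = AF_INET6,
    				     .sin6_port = htons(4242) };
    	struct sockaddr_in peer = { .sin_family = AF_INET };
    	struct tcp_md5sig md5 = { .tcpm_keylen = 6 };
    	int off = 0;
    	int fd = socket(AF_INET6, SOCK_STREAM, 0);

    	/* Dual stack: also accept IPv4 peers as v4-mapped connections. */
    	setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off));

    	/* MD5 key for an IPv4 peer, installed on the AF_INET6 socket. */
    	peer.sin_addr.s_addr = htonl(0xc0a80001);	/* 192.168.0.1 */
    	memcpy(&md5.tcpm_addr, &peer, sizeof(peer));
    	memcpy(md5.tcpm_key, "secret", 6);
    	setsockopt(fd, IPPROTO_TCP, TCP_MD5SIG, &md5, sizeof(md5));

    	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    	listen(fd, 1);

    	/*
    	 * A SYN from 192.168.0.1 now reaches tcp_inbound_md5_hash() with
    	 * family == AF_INET while tp->af_specific still points at
    	 * tcp_sock_ipv6_specific, which is the case the fix handles.
    	 */
    	return fd;
    }
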
Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks") Reported-by: Brian Vazquez Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Dmitry Safonov Tested-by: Leonard Crestez Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 002a4a04efbe..766881775abb 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4459,9 +4459,18 @@ tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, return SKB_DROP_REASON_TCP_MD5UNEXPECTED; } - /* check the signature */ - genhash = tp->af_specific->calc_md5_hash(newhash, hash_expected, - NULL, skb); + /* Check the signature. + * To support dual stack listeners, we need to handle + * IPv4-mapped case. + */ + if (family == AF_INET) + genhash = tcp_v4_md5_hash_skb(newhash, + hash_expected, + NULL, skb); + else + genhash = tp->af_specific->calc_md5_hash(newhash, + hash_expected, + NULL, skb); if (genhash || memcmp(hash_location, newhash, 16) != 0) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMD5FAILURE); From 8dc592c41f38735306d1f1dc0b183601379c6d94 Mon Sep 17 00:00:00 2001 From: Jernej Skrabec Date: Tue, 19 Jul 2022 20:37:25 +0200 Subject: [PATCH 088/121] clk: sunxi-ng: Fix H6 RTC clock definition While RTC clock was added in H616 ccu_common list, it was not in H6 list. That caused invalid pointer dereference like this: Unable to handle kernel NULL pointer dereference at virtual address 000000000000020c Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000004d574000 [000000000000020c] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 3 PID: 339 Comm: cat Tainted: G B 5.18.0-rc1+ #1352 Hardware name: Tanix TX6 (DT) pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ccu_gate_is_enabled+0x48/0x74 lr : ccu_gate_is_enabled+0x40/0x74 sp : ffff80000c0b76d0 x29: ffff80000c0b76d0 x28: 00000000016e3600 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000002 x24: ffff00000952fe08 x23: ffff800009611400 x22: ffff00000952fe79 x21: 0000000000000000 x20: 0000000000000001 x19: ffff80000aad6f08 x18: 0000000000000000 x17: 2d2d2d2d2d2d2d2d x16: 2d2d2d2d2d2d2d2d x15: 2d2d2d2d2d2d2d2d x14: 0000000000000000 x13: 00000000f2f2f2f2 x12: ffff700001816e89 x11: 1ffff00001816e88 x10: ffff700001816e88 x9 : dfff800000000000 x8 : ffff80000c0b7447 x7 : 0000000000000001 x6 : ffff700001816e88 x5 : ffff80000c0b7440 x4 : 0000000000000001 x3 : ffff800008935c50 x2 : dfff800000000000 x1 : 0000000000000000 x0 : 000000000000020c Call trace: ccu_gate_is_enabled+0x48/0x74 clk_core_is_enabled+0x7c/0x1c0 clk_summary_show_subtree+0x1dc/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show_subtree+0x250/0x334 clk_summary_show+0x90/0xdc seq_read_iter+0x248/0x6d4 seq_read+0x17c/0x1fc full_proxy_read+0x90/0xf0 vfs_read+0xdc/0x28c ksys_read+0xc8/0x174 __arm64_sys_read+0x44/0x5c invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x38/0xa0 el0_svc+0x68/0x160 el0t_64_sync_handler+0x10c/0x140 el0t_64_sync+0x18c/0x190 Code: d1006260 97e5c981 785e8260 8b0002a0 (b9400000) ---[ end trace 0000000000000000 ]--- Fix that by 
adding rtc clock to H6 ccu_common list too. Fixes: 38d321b61bda ("clk: sunxi-ng: h6-r: Add RTC gate clock") Signed-off-by: Jernej Skrabec Link: https://lore.kernel.org/r/20220719183725.2605141-1-jernej.skrabec@gmail.com Reviewed-by: Samuel Holland Signed-off-by: Stephen Boyd --- drivers/clk/sunxi-ng/ccu-sun50i-h6-r.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6-r.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6-r.c index 29a8c710ae06..b7962e5149a5 100644 --- a/drivers/clk/sunxi-ng/ccu-sun50i-h6-r.c +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6-r.c @@ -138,6 +138,7 @@ static struct ccu_common *sun50i_h6_r_ccu_clks[] = { &r_apb2_rsb_clk.common, &r_apb1_ir_clk.common, &r_apb1_w1_clk.common, + &r_apb1_rtc_clk.common, &ir_clk.common, &w1_clk.common, }; From 0c104556267242d922a3def60be8092b280e4fee Mon Sep 17 00:00:00 2001 From: Jonathan Lemon Date: Tue, 26 Jul 2022 15:06:04 -0700 Subject: [PATCH 089/121] ptp: ocp: Select CRC16 in the Kconfig. The crc16() function is used to check the firmware validity, but the library was not explicitly selected. Fixes: 3c3673bde50c ("ptp: ocp: Add firmware header checks") Reported-by: kernel test robot Signed-off-by: Jonathan Lemon Acked-by: Vadim Fedorenko Link: https://lore.kernel.org/r/20220726220604.1339972-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski --- drivers/ptp/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index 458218f88c5e..fe4971b65c64 100644 --- a/drivers/ptp/Kconfig +++ b/drivers/ptp/Kconfig @@ -176,6 +176,7 @@ config PTP_1588_CLOCK_OCP depends on !S390 depends on COMMON_CLK select NET_DEVLINK + select CRC16 help This driver adds support for an OpenCompute time card. From 67c3b611d92fc238c43734878bc3e232ab570c79 Mon Sep 17 00:00:00 2001 From: Alejandro Lucero Date: Tue, 26 Jul 2022 08:45:04 +0200 Subject: [PATCH 090/121] sfc: disable softirqs for ptp TX Sending a PTP packet can imply to use the normal TX driver datapath but invoked from the driver's ptp worker. The kernel generic TX code disables softirqs and preemption before calling specific driver TX code, but the ptp worker does not. Although current ptp driver functionality does not require it, there are several reasons for doing so: 1) The invoked code is always executed with softirqs disabled for non PTP packets. 2) Better if a ptp packet transmission is not interrupted by softirq handling which could lead to high latencies. 3) netdev_xmit_more used by the TX code requires preemption to be disabled. Indeed a solution for dealing with kernel preemption state based on static kernel configuration is not possible since the introduction of dynamic preemption level configuration at boot time using the static calls functionality. 
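For reference, the preemption requirement mentioned above comes from netdev_xmit_more() reading per-CPU state. Roughly paraphrased, not a verbatim quote of include/linux/netdevice.h:

    /* netdev_xmit_more() is essentially a raw per-CPU read */
    static inline bool netdev_xmit_more(void)
    {
    	return __this_cpu_read(softnet_data.xmit.more);
    }

A raw __this_cpu_read() is only meaningful while the task cannot migrate, and with CONFIG_DEBUG_PREEMPT it complains when preemption is enabled. local_bh_disable() in the PTP worker provides that guarantee, just as rcu_read_lock_bh() does for the normal path in __dev_queue_xmit().
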
Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper") Signed-off-by: Alejandro Lucero Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/sfc/ptp.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c index 4625f85acab2..10ad0b93d283 100644 --- a/drivers/net/ethernet/sfc/ptp.c +++ b/drivers/net/ethernet/sfc/ptp.c @@ -1100,7 +1100,29 @@ static void efx_ptp_xmit_skb_queue(struct efx_nic *efx, struct sk_buff *skb) tx_queue = efx_channel_get_tx_queue(ptp_data->channel, type); if (tx_queue && tx_queue->timestamping) { + /* This code invokes normal driver TX code which is always + * protected from softirqs when called from generic TX code, + * which in turn disables preemption. Look at __dev_queue_xmit + * which uses rcu_read_lock_bh disabling preemption for RCU + * plus disabling softirqs. We do not need RCU reader + * protection here. + * + * Although it is theoretically safe for current PTP TX/RX code + * running without disabling softirqs, there are three good + * reasond for doing so: + * + * 1) The code invoked is mainly implemented for non-PTP + * packets and it is always executed with softirqs + * disabled. + * 2) This being a single PTP packet, better to not + * interrupt its processing by softirqs which can lead + * to high latencies. + * 3) netdev_xmit_more checks preemption is disabled and + * triggers a BUG_ON if not. + */ + local_bh_disable(); efx_enqueue_skb(tx_queue, skb); + local_bh_enable(); } else { WARN_ONCE(1, "PTP channel has no timestamped tx queue\n"); dev_kfree_skb_any(skb); From 181d8d2066c000ba0a0e6940a7ad80f1a0e68e9d Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 25 Jul 2022 18:11:06 -0400 Subject: [PATCH 091/121] sctp: leave the err path free in sctp_stream_init to sctp_stream_free A NULL pointer dereference was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:__list_del_entry_valid+0x26/0x80 Call Trace: sctp_sched_dequeue_common+0x1c/0x90 sctp_sched_prio_dequeue+0x67/0x80 __sctp_outq_teardown+0x299/0x380 sctp_outq_free+0x15/0x20 sctp_association_free+0xc3/0x440 sctp_do_sm+0x1ca7/0x2210 sctp_assoc_bh_rcv+0x1f6/0x340 This happens when calling sctp_sendmsg without connecting to server first. In this case, a data chunk already queues up in send queue of client side when processing the INIT_ACK from server in sctp_process_init() where it calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in all stream_out will be freed in sctp_stream_init's err path. Then in the asoc freeing it will crash when dequeuing this data chunk as stream_out is missing. As we can't free stream out before dequeuing all data from send queue, and this patch is to fix it by moving the err path stream_out/in freeing in sctp_stream_init() to sctp_stream_free() which is eventually called when freeing the asoc in sctp_association_free(). This fix also makes the code in sctp_process_init() more clear. Note that in sctp_association_init() when it fails in sctp_stream_init(), sctp_association_free() will not be called, and in that case it should go to 'stream_free' err path to free stream instead of 'fail_init'. 
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Wei Chen Signed-off-by: Xin Long Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski --- net/sctp/associola.c | 5 ++--- net/sctp/stream.c | 19 +++---------------- 2 files changed, 5 insertions(+), 19 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index be29da09cc7a..3460abceba44 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -229,9 +229,8 @@ static struct sctp_association *sctp_association_init( if (!sctp_ulpq_init(&asoc->ulpq, asoc)) goto fail_init; - if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams, - 0, gfp)) - goto fail_init; + if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams, 0, gfp)) + goto stream_free; /* Initialize default path MTU. */ asoc->pathmtu = sp->pathmtu; diff --git a/net/sctp/stream.c b/net/sctp/stream.c index 6dc95dcc0ff4..ef9fceadef8d 100644 --- a/net/sctp/stream.c +++ b/net/sctp/stream.c @@ -137,7 +137,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, ret = sctp_stream_alloc_out(stream, outcnt, gfp); if (ret) - goto out_err; + return ret; for (i = 0; i < stream->outcnt; i++) SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN; @@ -145,22 +145,9 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, handle_in: sctp_stream_interleave_init(stream); if (!incnt) - goto out; + return 0; - ret = sctp_stream_alloc_in(stream, incnt, gfp); - if (ret) - goto in_err; - - goto out; - -in_err: - sched->free(stream); - genradix_free(&stream->in); -out_err: - genradix_free(&stream->out); - stream->outcnt = 0; -out: - return ret; + return sctp_stream_alloc_in(stream, incnt, gfp); } int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid) From 51a83391d77bb0f7ff0aef06ca4c7f5aa9e80b4c Mon Sep 17 00:00:00 2001 From: Dimitris Michailidis Date: Tue, 26 Jul 2022 14:59:23 -0700 Subject: [PATCH 092/121] net/funeth: Fix fun_xdp_tx() and XDP packet reclaim The current implementation of fun_xdp_tx(), used for XPD_TX, is incorrect in that it takes an address/length pair and later releases it with page_frag_free(). It is OK for XDP_TX but the same code is used by ndo_xdp_xmit. In that case it loses the XDP memory type and releases the packet incorrectly for some of the types. Assorted breakage follows. Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in reclaim. 
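For context on why a single free helper must be memory-type aware: an xdp_frame records how its buffer was allocated, and the generic xdp_return_frame() dispatches on that, whereas page_frag_free() only matches one of the cases. Roughly paraphrased from net/core/xdp.c, not verbatim:

    switch (xdpf->mem.type) {
    case MEM_TYPE_PAGE_SHARED:
    	page_frag_free(xdpf->data);	/* the only case the old code
    					   released correctly */
    	break;
    case MEM_TYPE_PAGE_ORDER0:
    	put_page(virt_to_page(xdpf->data));
    	break;
    case MEM_TYPE_PAGE_POOL:
    	/* handed back to the originating page pool */
    	break;
    case MEM_TYPE_XSK_BUFF_POOL:
    	/* returned to the AF_XDP buffer pool */
    	break;
    }

Frames arriving via ndo_xdp_xmit may have been redirected from another device's RX path and can use any of these models, which is why keeping the xdp_frame around and letting xdp_return_frame() pick the release path is the safe choice.
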
Fixes: db37bc177dae ("net/funeth: add the data path") Signed-off-by: Dimitris Michailidis Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com Signed-off-by: Paolo Abeni --- .../net/ethernet/fungible/funeth/funeth_rx.c | 5 ++++- .../net/ethernet/fungible/funeth/funeth_tx.c | 20 +++++++++---------- .../ethernet/fungible/funeth/funeth_txrx.h | 6 +++--- 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/fungible/funeth/funeth_rx.c b/drivers/net/ethernet/fungible/funeth/funeth_rx.c index 0f6a549b9f67..29a6c2ede43a 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_rx.c +++ b/drivers/net/ethernet/fungible/funeth/funeth_rx.c @@ -142,6 +142,7 @@ static void *fun_run_xdp(struct funeth_rxq *q, skb_frag_t *frags, void *buf_va, int ref_ok, struct funeth_txq *xdp_q) { struct bpf_prog *xdp_prog; + struct xdp_frame *xdpf; struct xdp_buff xdp; u32 act; @@ -163,7 +164,9 @@ static void *fun_run_xdp(struct funeth_rxq *q, skb_frag_t *frags, void *buf_va, case XDP_TX: if (unlikely(!ref_ok)) goto pass; - if (!fun_xdp_tx(xdp_q, xdp.data, xdp.data_end - xdp.data)) + + xdpf = xdp_convert_buff_to_frame(&xdp); + if (!xdpf || !fun_xdp_tx(xdp_q, xdpf)) goto xdp_error; FUN_QSTAT_INC(q, xdp_tx); q->xdp_flush |= FUN_XDP_FLUSH_TX; diff --git a/drivers/net/ethernet/fungible/funeth/funeth_tx.c b/drivers/net/ethernet/fungible/funeth/funeth_tx.c index ff6e29237253..2f6698b98b03 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_tx.c +++ b/drivers/net/ethernet/fungible/funeth/funeth_tx.c @@ -466,7 +466,7 @@ static unsigned int fun_xdpq_clean(struct funeth_txq *q, unsigned int budget) do { fun_xdp_unmap(q, reclaim_idx); - page_frag_free(q->info[reclaim_idx].vaddr); + xdp_return_frame(q->info[reclaim_idx].xdpf); trace_funeth_tx_free(q, reclaim_idx, 1, head); @@ -479,11 +479,11 @@ static unsigned int fun_xdpq_clean(struct funeth_txq *q, unsigned int budget) return npkts; } -bool fun_xdp_tx(struct funeth_txq *q, void *data, unsigned int len) +bool fun_xdp_tx(struct funeth_txq *q, struct xdp_frame *xdpf) { struct fun_eth_tx_req *req; struct fun_dataop_gl *gle; - unsigned int idx; + unsigned int idx, len; dma_addr_t dma; if (fun_txq_avail(q) < FUN_XDP_CLEAN_THRES) @@ -494,7 +494,8 @@ bool fun_xdp_tx(struct funeth_txq *q, void *data, unsigned int len) return false; } - dma = dma_map_single(q->dma_dev, data, len, DMA_TO_DEVICE); + len = xdpf->len; + dma = dma_map_single(q->dma_dev, xdpf->data, len, DMA_TO_DEVICE); if (unlikely(dma_mapping_error(q->dma_dev, dma))) { FUN_QSTAT_INC(q, tx_map_err); return false; @@ -514,7 +515,7 @@ bool fun_xdp_tx(struct funeth_txq *q, void *data, unsigned int len) gle = (struct fun_dataop_gl *)req->dataop.imm; fun_dataop_gl_init(gle, 0, 0, len, dma); - q->info[idx].vaddr = data; + q->info[idx].xdpf = xdpf; u64_stats_update_begin(&q->syncp); q->stats.tx_bytes += len; @@ -545,12 +546,9 @@ int fun_xdp_xmit_frames(struct net_device *dev, int n, if (unlikely(q_idx >= fp->num_xdpqs)) return -ENXIO; - for (q = xdpqs[q_idx], i = 0; i < n; i++) { - const struct xdp_frame *xdpf = frames[i]; - - if (!fun_xdp_tx(q, xdpf->data, xdpf->len)) + for (q = xdpqs[q_idx], i = 0; i < n; i++) + if (!fun_xdp_tx(q, frames[i])) break; - } if (unlikely(flags & XDP_XMIT_FLUSH)) fun_txq_wr_db(q); @@ -577,7 +575,7 @@ static void fun_xdpq_purge(struct funeth_txq *q) unsigned int idx = q->cons_cnt & q->mask; fun_xdp_unmap(q, idx); - page_frag_free(q->info[idx].vaddr); + xdp_return_frame(q->info[idx].xdpf); q->cons_cnt++; } } diff --git 
a/drivers/net/ethernet/fungible/funeth/funeth_txrx.h b/drivers/net/ethernet/fungible/funeth/funeth_txrx.h index 04c9f91b7489..8708e2895946 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_txrx.h +++ b/drivers/net/ethernet/fungible/funeth/funeth_txrx.h @@ -95,8 +95,8 @@ struct funeth_txq_stats { /* per Tx queue SW counters */ struct funeth_tx_info { /* per Tx descriptor state */ union { - struct sk_buff *skb; /* associated packet */ - void *vaddr; /* start address for XDP */ + struct sk_buff *skb; /* associated packet (sk_buff path) */ + struct xdp_frame *xdpf; /* associated XDP frame (XDP path) */ }; }; @@ -245,7 +245,7 @@ static inline int fun_irq_node(const struct fun_irq *p) int fun_rxq_napi_poll(struct napi_struct *napi, int budget); int fun_txq_napi_poll(struct napi_struct *napi, int budget); netdev_tx_t fun_start_xmit(struct sk_buff *skb, struct net_device *netdev); -bool fun_xdp_tx(struct funeth_txq *q, void *data, unsigned int len); +bool fun_xdp_tx(struct funeth_txq *q, struct xdp_frame *xdpf); int fun_xdp_xmit_frames(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags); From e0339f036ef4beb9b20f0b6532a1e0ece7f594c6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 28 Jul 2022 10:31:06 +0100 Subject: [PATCH 093/121] watch_queue: Fix missing rcu annotation Since __post_watch_notification() walks wlist->watchers with only the RCU read lock held, we need to use RCU methods to add to the list (we already use RCU methods to remove from the list). Fix add_watch_to_object() to use hlist_add_head_rcu() instead of hlist_add_head() for that list. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- kernel/watch_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c index bb9962b33f95..2c351765c409 100644 --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -494,7 +494,7 @@ int add_watch_to_object(struct watch *watch, struct watch_list *wlist) unlock_wqueue(wqueue); } - hlist_add_head(&watch->list_node, &wlist->watchers); + hlist_add_head_rcu(&watch->list_node, &wlist->watchers); return 0; } EXPORT_SYMBOL(add_watch_to_object); From e64ab2dbd882933b65cd82ff6235d705ad65dbb6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 28 Jul 2022 10:31:12 +0100 Subject: [PATCH 094/121] watch_queue: Fix missing locking in add_watch_to_object() If a watch is being added to a queue, it needs to guard against interference from addition of a new watch, manual removal of a watch and removal of a watch due to some other queue being destroyed. KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by holding the key->sem writelocked and by holding refs on both the key and the queue - but that doesn't prevent interaction from other {key,queue} pairs. While add_watch_to_object() does take the spinlock on the event queue, it doesn't take the lock on the source's watch list. The assumption was that the caller would prevent that (say by taking key->sem) - but that doesn't prevent interference from the destruction of another queue. Fix this by locking the watcher list in add_watch_to_object(). 
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com Signed-off-by: David Howells cc: keyrings@vger.kernel.org Signed-off-by: Linus Torvalds --- kernel/watch_queue.c | 58 +++++++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 22 deletions(-) diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c index 2c351765c409..59ddb00d6944 100644 --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -454,6 +454,33 @@ void init_watch(struct watch *watch, struct watch_queue *wqueue) rcu_assign_pointer(watch->queue, wqueue); } +static int add_one_watch(struct watch *watch, struct watch_list *wlist, struct watch_queue *wqueue) +{ + const struct cred *cred; + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + cred = current_cred(); + if (atomic_inc_return(&cred->user->nr_watches) > task_rlimit(current, RLIMIT_NOFILE)) { + atomic_dec(&cred->user->nr_watches); + return -EAGAIN; + } + + watch->cred = get_cred(cred); + rcu_assign_pointer(watch->watch_list, wlist); + + kref_get(&wqueue->usage); + kref_get(&watch->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + hlist_add_head_rcu(&watch->list_node, &wlist->watchers); + return 0; +} + /** * add_watch_to_object - Add a watch on an object to a watch list * @watch: The watch to add @@ -468,34 +495,21 @@ void init_watch(struct watch *watch, struct watch_queue *wqueue) */ int add_watch_to_object(struct watch *watch, struct watch_list *wlist) { - struct watch_queue *wqueue = rcu_access_pointer(watch->queue); - struct watch *w; + struct watch_queue *wqueue; + int ret = -ENOENT; - hlist_for_each_entry(w, &wlist->watchers, list_node) { - struct watch_queue *wq = rcu_access_pointer(w->queue); - if (wqueue == wq && watch->id == w->id) - return -EBUSY; - } - - watch->cred = get_current_cred(); - rcu_assign_pointer(watch->watch_list, wlist); - - if (atomic_inc_return(&watch->cred->user->nr_watches) > - task_rlimit(current, RLIMIT_NOFILE)) { - atomic_dec(&watch->cred->user->nr_watches); - put_cred(watch->cred); - return -EAGAIN; - } + rcu_read_lock(); + wqueue = rcu_access_pointer(watch->queue); if (lock_wqueue(wqueue)) { - kref_get(&wqueue->usage); - kref_get(&watch->usage); - hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_lock(&wlist->lock); + ret = add_one_watch(watch, wlist, wqueue); + spin_unlock(&wlist->lock); unlock_wqueue(wqueue); } - hlist_add_head_rcu(&watch->list_node, &wlist->watchers); - return 0; + rcu_read_unlock(); + return ret; } EXPORT_SYMBOL(add_watch_to_object); From e27326009a3d247b831eda38878c777f6f4eb3d1 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 27 Jul 2022 18:22:20 -0700 Subject: [PATCH 095/121] net: ping6: Fix memleak in ipv6_renew_options(). When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM. struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd; hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4; fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd); To fix memory leaks, let's add a destroy function. Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. 
The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it. $ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647 Also, there could be another path reported with these options, and some of them require CAP_NET_RAW. setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt) getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list) For the record, I left a different splat with syzbot's one. unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176 Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta Signed-off-by: Kuniyuki Iwashima Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski --- net/ipv6/ping.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index ecf3a553a0dc..8c6c2d82c1cd 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -22,6 +22,11 @@ #include #include +static void ping_v6_destroy(struct sock *sk) +{ + inet6_destroy_sock(sk); +} + /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -181,6 +186,7 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, + .destroy = ping_v6_destroy, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, .setsockopt = ipv6_setsockopt, From 85f0173df35e5462d89947135a6a5599c6c3ef6f Mon Sep 17 00:00:00 2001 From: Ziyang Xuan Date: Thu, 28 Jul 2022 09:33:07 +0800 Subject: [PATCH 096/121] ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr Change net device's MTU to smaller than IPV6_MIN_MTU or unregister device while matching route. That may trigger null-ptr-deref bug for ip6_ptr probability as following. 
========================================================= BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134 Read of size 4 at addr 0000000000000308 by task ping6/263 CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14 Call trace: dump_backtrace+0x1a8/0x230 show_stack+0x20/0x70 dump_stack_lvl+0x68/0x84 print_report+0xc4/0x120 kasan_report+0x84/0x120 __asan_load4+0x94/0xd0 find_match.part.0+0x70/0x134 __find_rr_leaf+0x408/0x470 fib6_table_lookup+0x264/0x540 ip6_pol_route+0xf4/0x260 ip6_pol_route_output+0x58/0x70 fib6_rule_lookup+0x1a8/0x330 ip6_route_output_flags_noref+0xd8/0x1a0 ip6_route_output_flags+0x58/0x160 ip6_dst_lookup_tail+0x5b4/0x85c ip6_dst_lookup_flow+0x98/0x120 rawv6_sendmsg+0x49c/0xc70 inet_sendmsg+0x68/0x94 Reproducer as following: Firstly, prepare conditions: $ip netns add ns1 $ip netns add ns2 $ip link add veth1 type veth peer name veth2 $ip link set veth1 netns ns1 $ip link set veth2 netns ns2 $ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1 $ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2 $ip netns exec ns1 ifconfig veth1 up $ip netns exec ns2 ifconfig veth2 up $ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1 $ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1 Secondly, execute the following two commands in two ssh windows respectively: $ip netns exec ns1 sh $while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done $ip netns exec ns1 sh $while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly, then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check. cpu0 cpu1 fib6_table_lookup __find_rr_leaf addrconf_notify [ NETDEV_CHANGEMTU ] addrconf_ifdown RCU_INIT_POINTER(dev->ip6_ptr, NULL) find_match ip6_ignore_linkdown So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to fix the null-ptr-deref bug. Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev") Signed-off-by: Ziyang Xuan Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski --- include/net/addrconf.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index f7506f08e505..c04f359655b8 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -405,6 +405,9 @@ static inline bool ip6_ignore_linkdown(const struct net_device *dev) { const struct inet6_dev *idev = __in6_dev_get(dev); + if (unlikely(!idev)) + return true; + return !!idev->cnf.ignore_routes_with_linkdown; } From 4d3d3a1b244fd54629a6b7047f39a7bbc8d11910 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 28 Jul 2022 14:52:09 +0300 Subject: [PATCH 097/121] stmmac: dwmac-mediatek: fix resource leak in probe If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt() before returning. Otherwise it is a resource leak. 
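The fix follows the usual probe unwind idiom: every failure after a successful step jumps to a label that undoes that step and everything before it, in reverse order. A generic sketch with hypothetical helper names, not the stmmac code:

    int example_probe(void)
    {
    	int ret;

    	ret = setup_config();			/* step 1 */
    	if (ret)
    		return ret;

    	ret = enable_clocks();			/* step 2 */
    	if (ret)
    		goto err_remove_config;		/* undo step 1 */

    	ret = register_device();		/* step 3 */
    	if (ret)
    		goto err_disable_clocks;	/* undo steps 2, then 1 */

    	return 0;

    err_disable_clocks:
    	disable_clocks();
    err_remove_config:
    	remove_config();
    	return ret;
    }
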
Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index ca8ab290013c..d42e1afb6521 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -688,18 +688,19 @@ static int mediatek_dwmac_probe(struct platform_device *pdev) ret = mediatek_dwmac_clks_config(priv_plat, true); if (ret) - return ret; + goto err_remove_config_dt; ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res); - if (ret) { - stmmac_remove_config_dt(pdev, plat_dat); + if (ret) goto err_drv_probe; - } return 0; err_drv_probe: mediatek_dwmac_clks_config(priv_plat, false); +err_remove_config_dt: + stmmac_remove_config_dt(pdev, plat_dat); + return ret; } From 66cee9097e2b74ff3c8cc040ce5717c521a0c3fa Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Wed, 20 Jul 2022 16:27:45 +1000 Subject: [PATCH 098/121] nouveau/svm: Fix to migrate all requested pages Users may request that pages from an OpenCL SVM allocation be migrated to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into nouveau_dmem_migrate_vma() to do the migration. If the total range to be migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the starting address means that only the first chunk will get migrated. Fix the calculation so that the entire range will get migrated if possible. Signed-off-by: Alistair Popple Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration") Reviewed-by: Ralph Campbell Reviewed-by: Lyude Paul Signed-off-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com Cc: # v5.8+ --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 7ba66ad68a8a..16356611b5b9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -680,7 +680,11 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, goto out_free_dma; for (i = 0; i < npages; i += max) { - args.end = start + (max << PAGE_SHIFT); + if (args.start + (max << PAGE_SHIFT) > end) + args.end = end; + else + args.end = args.start + (max << PAGE_SHIFT); + ret = migrate_vma_setup(&args); if (ret) goto out_free_pfns; From 571c30b1a88465a1c85a6f7762609939b9085a15 Mon Sep 17 00:00:00 2001 From: Thadeu Lima de Souza Cascardo Date: Thu, 28 Jul 2022 09:26:02 -0300 Subject: [PATCH 099/121] x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available Some cloud hypervisors do not provide IBPB on very recent CPU processors, including AMD processors affected by Retbleed. Using IBPB before firmware calls on such systems would cause a GPF at boot like the one below. Do not enable such calls when IBPB support is not present. 
EFI Variables Facility v0.08 2004-May-17 general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: efi_rts_wq efi_call_rts RIP: 0010:efi_call_rts Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48 RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049 RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300 RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78 R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0 Call Trace: ? __wake_up process_one_work worker_thread ? rescuer_thread kthread ? kthread_complete_and_exit ret_from_fork Modules linked in: Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls") Reported-by: Dimitri John Ledkov Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: Borislav Petkov Cc: Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com --- arch/x86/kernel/cpu/bugs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 6454bc767f0f..6761668100b9 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1520,6 +1520,7 @@ static void __init spectre_v2_select_mitigation(void) * enable IBRS around firmware calls. */ if (boot_cpu_has_bug(X86_BUG_RETBLEED) && + boot_cpu_has(X86_FEATURE_IBPB) && (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) { From ec85bd369fd2bfaed6f45dd678706429d4f75b48 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Tue, 26 Jul 2022 23:51:48 +0100 Subject: [PATCH 100/121] ARM: findbit: fix overflowing offset When offset is larger than the size of the bit array, we should not attempt to access the array as we can perform an access beyond the end of the array. Fix this by changing the pre-condition. Using "cmp r2, r1; bhs ..." covers us for the size == 0 case, since this will always take the branch when r1 is zero, irrespective of the value of r2. This means we can fix this bug without adding any additional code! 
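In C terms the new pre-condition is a single unsigned comparison that subsumes the old empty-array test (illustration only; the real code stays in assembly):

    /* maxbit = size of the bit array (r1), offset = starting bit (r2) */
    unsigned int find_next_bit_guard(unsigned int maxbit, unsigned int offset)
    {
    	/*
    	 * Old check: "maxbit == 0".  New check: "offset >= maxbit"
    	 * (cmp r2, r1; bhs).  When maxbit == 0 the comparison is always
    	 * true for an unsigned offset, so the empty-array case is still
    	 * caught, and an offset past the end can no longer be used to
    	 * read beyond the array.
    	 */
    	if (offset >= maxbit)
    		return maxbit;		/* nothing left to search */
    	return offset;			/* continue searching from offset */
    }
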
Tested-by: Guenter Roeck Signed-off-by: Russell King (Oracle) --- arch/arm/lib/findbit.S | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/lib/findbit.S b/arch/arm/lib/findbit.S index b5e8b9ae4c7d..7fd3600db8ef 100644 --- a/arch/arm/lib/findbit.S +++ b/arch/arm/lib/findbit.S @@ -40,8 +40,8 @@ ENDPROC(_find_first_zero_bit_le) * Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset) */ ENTRY(_find_next_zero_bit_le) - teq r1, #0 - beq 3b + cmp r2, r1 + bhs 3b ands ip, r2, #7 beq 1b @ If new byte, goto old routine ARM( ldrb r3, [r0, r2, lsr #3] ) @@ -81,8 +81,8 @@ ENDPROC(_find_first_bit_le) * Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset) */ ENTRY(_find_next_bit_le) - teq r1, #0 - beq 3b + cmp r2, r1 + bhs 3b ands ip, r2, #7 beq 1b @ If new byte, goto old routine ARM( ldrb r3, [r0, r2, lsr #3] ) @@ -115,8 +115,8 @@ ENTRY(_find_first_zero_bit_be) ENDPROC(_find_first_zero_bit_be) ENTRY(_find_next_zero_bit_be) - teq r1, #0 - beq 3b + cmp r2, r1 + bhs 3b ands ip, r2, #7 beq 1b @ If new byte, goto old routine eor r3, r2, #0x18 @ big endian byte ordering @@ -149,8 +149,8 @@ ENTRY(_find_first_bit_be) ENDPROC(_find_first_bit_be) ENTRY(_find_next_bit_be) - teq r1, #0 - beq 3b + cmp r2, r1 + bhs 3b ands ip, r2, #7 beq 1b @ If new byte, goto old routine eor r3, r2, #0x18 @ big endian byte ordering From d8e7f201a4cf148c3801cdc9603963061d28d64f Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:16 +0800 Subject: [PATCH 101/121] LoongArch: Use ABI names of registers where appropriate Some of the assembly in the LoongArch port seem to come from a prehistoric time, when the assembler didn't even have support for the ABI names we all come to know and love, thus used raw register numbers which hampered readability. The usages are found with a regex match inside arch/loongarch, then manually adjusted for those non-definitions. 
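For readers who do not have the LoongArch psABI memorised, the raw-number to ABI-name mapping being applied is roughly the following, summarised here as a C table for reference ($r21 is reserved by the ABI); only the $zero/$ra/$tp/$sp entries are directly confirmed by this patch, the rest follows the psABI naming:

    /* LoongArch integer registers: raw name ($rN) -> ABI name */
    static const char * const loongarch_abi_name[32] = {
    	"zero", "ra", "tp", "sp",		/* $r0  - $r3  */
    	"a0", "a1", "a2", "a3",			/* $r4  - $r7  */
    	"a4", "a5", "a6", "a7",			/* $r8  - $r11 */
    	"t0", "t1", "t2", "t3", "t4",		/* $r12 - $r16 */
    	"t5", "t6", "t7", "t8",			/* $r17 - $r20 */
    	"r21",					/* reserved     */
    	"fp",					/* $r22, aka s9 */
    	"s0", "s1", "s2", "s3", "s4",		/* $r23 - $r27 */
    	"s5", "s6", "s7", "s8",			/* $r28 - $r31 */
    };
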
Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/barrier.h | 4 +- arch/loongarch/include/asm/loongson.h | 4 +- arch/loongarch/include/asm/stacktrace.h | 12 ++-- arch/loongarch/include/asm/thread_info.h | 4 +- arch/loongarch/include/asm/uaccess.h | 2 +- arch/loongarch/mm/page.S | 4 +- arch/loongarch/mm/tlbex.S | 80 ++++++++++++------------ 7 files changed, 55 insertions(+), 55 deletions(-) diff --git a/arch/loongarch/include/asm/barrier.h b/arch/loongarch/include/asm/barrier.h index b6517eeeb141..cda977675854 100644 --- a/arch/loongarch/include/asm/barrier.h +++ b/arch/loongarch/include/asm/barrier.h @@ -48,9 +48,9 @@ static inline unsigned long array_index_mask_nospec(unsigned long index, __asm__ __volatile__( "sltu %0, %1, %2\n\t" #if (__SIZEOF_LONG__ == 4) - "sub.w %0, $r0, %0\n\t" + "sub.w %0, $zero, %0\n\t" #elif (__SIZEOF_LONG__ == 8) - "sub.d %0, $r0, %0\n\t" + "sub.d %0, $zero, %0\n\t" #endif : "=r" (mask) : "r" (index), "r" (size) diff --git a/arch/loongarch/include/asm/loongson.h b/arch/loongarch/include/asm/loongson.h index 6a8038725ba7..8522afafc24e 100644 --- a/arch/loongarch/include/asm/loongson.h +++ b/arch/loongarch/include/asm/loongson.h @@ -58,7 +58,7 @@ static inline void xconf_writel(u32 val, volatile void __iomem *addr) { asm volatile ( " st.w %[v], %[hw], 0 \n" - " ld.b $r0, %[hw], 0 \n" + " ld.b $zero, %[hw], 0 \n" : : [hw] "r" (addr), [v] "r" (val) ); @@ -68,7 +68,7 @@ static inline void xconf_writeq(u64 val64, volatile void __iomem *addr) { asm volatile ( " st.d %[v], %[hw], 0 \n" - " ld.b $r0, %[hw], 0 \n" + " ld.b $zero, %[hw], 0 \n" : : [hw] "r" (addr), [v] "r" (val64) ); diff --git a/arch/loongarch/include/asm/stacktrace.h b/arch/loongarch/include/asm/stacktrace.h index 26483e396ad1..6b5c2a7aa706 100644 --- a/arch/loongarch/include/asm/stacktrace.h +++ b/arch/loongarch/include/asm/stacktrace.h @@ -23,13 +23,13 @@ static __always_inline void prepare_frametrace(struct pt_regs *regs) { __asm__ __volatile__( - /* Save $r1 */ + /* Save $ra */ STORE_ONE_REG(1) - /* Use $r1 to save PC */ - "pcaddi $r1, 0\n\t" - STR_LONG_S " $r1, %0\n\t" - /* Restore $r1 */ - STR_LONG_L " $r1, %1, "STR_LONGSIZE"\n\t" + /* Use $ra to save PC */ + "pcaddi $ra, 0\n\t" + STR_LONG_S " $ra, %0\n\t" + /* Restore $ra */ + STR_LONG_L " $ra, %1, "STR_LONGSIZE"\n\t" STORE_ONE_REG(2) STORE_ONE_REG(3) STORE_ONE_REG(4) diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h index 99beb11c2fa8..b7dd9f19a5a9 100644 --- a/arch/loongarch/include/asm/thread_info.h +++ b/arch/loongarch/include/asm/thread_info.h @@ -44,14 +44,14 @@ struct thread_info { } /* How to get the thread information struct from C. 
*/ -register struct thread_info *__current_thread_info __asm__("$r2"); +register struct thread_info *__current_thread_info __asm__("$tp"); static inline struct thread_info *current_thread_info(void) { return __current_thread_info; } -register unsigned long current_stack_pointer __asm__("$r3"); +register unsigned long current_stack_pointer __asm__("$sp"); #endif /* !__ASSEMBLY__ */ diff --git a/arch/loongarch/include/asm/uaccess.h b/arch/loongarch/include/asm/uaccess.h index 217c6a3727b1..42da43211765 100644 --- a/arch/loongarch/include/asm/uaccess.h +++ b/arch/loongarch/include/asm/uaccess.h @@ -162,7 +162,7 @@ do { \ "2: \n" \ " .section .fixup,\"ax\" \n" \ "3: li.w %0, %3 \n" \ - " or %1, $r0, $r0 \n" \ + " or %1, $zero, $zero \n" \ " b 2b \n" \ " .previous \n" \ " .section __ex_table,\"a\" \n" \ diff --git a/arch/loongarch/mm/page.S b/arch/loongarch/mm/page.S index ddc78ab33c7b..270d509adbaa 100644 --- a/arch/loongarch/mm/page.S +++ b/arch/loongarch/mm/page.S @@ -32,7 +32,7 @@ SYM_FUNC_START(clear_page) st.d zero, a0, -8 bne t0, a0, 1b - jirl $r0, ra, 0 + jirl zero, ra, 0 SYM_FUNC_END(clear_page) EXPORT_SYMBOL(clear_page) @@ -79,6 +79,6 @@ SYM_FUNC_START(copy_page) st.d t7, a0, -8 bne t8, a0, 1b - jirl $r0, ra, 0 + jirl zero, ra, 0 SYM_FUNC_END(copy_page) EXPORT_SYMBOL(copy_page) diff --git a/arch/loongarch/mm/tlbex.S b/arch/loongarch/mm/tlbex.S index 7eee40271577..9e98afe7a67f 100644 --- a/arch/loongarch/mm/tlbex.S +++ b/arch/loongarch/mm/tlbex.S @@ -47,7 +47,7 @@ SYM_FUNC_START(handle_tlb_load) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, $r0, vmalloc_load + blt t0, zero, vmalloc_load csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_load: @@ -80,7 +80,7 @@ vmalloc_done_load: * see if we need to jump to huge tlb processing. 
*/ andi t0, ra, _PAGE_HUGE - bne t0, $r0, tlb_huge_update_load + bne t0, zero, tlb_huge_update_load csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -100,12 +100,12 @@ smp_pgtable_change_load: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, 1 - beq ra, $r0, nopage_tlb_load + beq ra, zero, nopage_tlb_load ori t0, t0, _PAGE_VALID #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, smp_pgtable_change_load + beq t0, zero, smp_pgtable_change_load #else st.d t0, t1, 0 #endif @@ -139,23 +139,23 @@ tlb_huge_update_load: #endif srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, 1 - beq ra, $r0, nopage_tlb_load + beq ra, zero, nopage_tlb_load tlbsrch ori t0, t0, _PAGE_VALID #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, tlb_huge_update_load + beq t0, zero, tlb_huge_update_load ld.d t0, t1, 0 #else st.d t0, t1, 0 #endif - addu16i.d t1, $r0, -(CSR_TLBIDX_EHINV >> 16) + addu16i.d t1, zero, -(CSR_TLBIDX_EHINV >> 16) addi.d ra, t1, 0 csrxchg ra, t1, LOONGARCH_CSR_TLBIDX tlbwr - csrxchg $r0, t1, LOONGARCH_CSR_TLBIDX + csrxchg zero, t1, LOONGARCH_CSR_TLBIDX /* * A huge PTE describes an area the size of the @@ -178,27 +178,27 @@ tlb_huge_update_load: addi.d t0, ra, 0 /* Convert to entrylo1 */ - addi.d t1, $r0, 1 + addi.d t1, zero, 1 slli.d t1, t1, (HPAGE_SHIFT - 1) add.d t0, t0, t1 csrwr t0, LOONGARCH_CSR_TLBELO1 /* Set huge page tlb entry size */ - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX tlbfill - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX nopage_tlb_load: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_0 - jirl $r0, t0, 0 + jirl zero, t0, 0 SYM_FUNC_END(handle_tlb_load) SYM_FUNC_START(handle_tlb_store) @@ -210,7 +210,7 @@ SYM_FUNC_START(handle_tlb_store) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, $r0, vmalloc_store + blt t0, zero, vmalloc_store csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_store: @@ -244,7 +244,7 @@ vmalloc_done_store: * see if we need to jump to huge tlb processing. 
*/ andi t0, ra, _PAGE_HUGE - bne t0, $r0, tlb_huge_update_store + bne t0, zero, tlb_huge_update_store csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -265,12 +265,12 @@ smp_pgtable_change_store: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) xori ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) - bne ra, $r0, nopage_tlb_store + bne ra, zero, nopage_tlb_store ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, smp_pgtable_change_store + beq t0, zero, smp_pgtable_change_store #else st.d t0, t1, 0 #endif @@ -306,24 +306,24 @@ tlb_huge_update_store: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) xori ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) - bne ra, $r0, nopage_tlb_store + bne ra, zero, nopage_tlb_store tlbsrch ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, tlb_huge_update_store + beq t0, zero, tlb_huge_update_store ld.d t0, t1, 0 #else st.d t0, t1, 0 #endif - addu16i.d t1, $r0, -(CSR_TLBIDX_EHINV >> 16) + addu16i.d t1, zero, -(CSR_TLBIDX_EHINV >> 16) addi.d ra, t1, 0 csrxchg ra, t1, LOONGARCH_CSR_TLBIDX tlbwr - csrxchg $r0, t1, LOONGARCH_CSR_TLBIDX + csrxchg zero, t1, LOONGARCH_CSR_TLBIDX /* * A huge PTE describes an area the size of the * configured huge page size. This is twice the @@ -345,28 +345,28 @@ tlb_huge_update_store: addi.d t0, ra, 0 /* Convert to entrylo1 */ - addi.d t1, $r0, 1 + addi.d t1, zero, 1 slli.d t1, t1, (HPAGE_SHIFT - 1) add.d t0, t0, t1 csrwr t0, LOONGARCH_CSR_TLBELO1 /* Set huge page tlb entry size */ - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX tlbfill /* Reset default page size */ - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX nopage_tlb_store: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_1 - jirl $r0, t0, 0 + jirl zero, t0, 0 SYM_FUNC_END(handle_tlb_store) SYM_FUNC_START(handle_tlb_modify) @@ -378,7 +378,7 @@ SYM_FUNC_START(handle_tlb_modify) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, $r0, vmalloc_modify + blt t0, zero, vmalloc_modify csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_modify: @@ -411,7 +411,7 @@ vmalloc_done_modify: * see if we need to jump to huge tlb processing. 
*/ andi t0, ra, _PAGE_HUGE - bne t0, $r0, tlb_huge_update_modify + bne t0, zero, tlb_huge_update_modify csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -431,12 +431,12 @@ smp_pgtable_change_modify: srli.d ra, t0, _PAGE_WRITE_SHIFT andi ra, ra, 1 - beq ra, $r0, nopage_tlb_modify + beq ra, zero, nopage_tlb_modify ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, smp_pgtable_change_modify + beq t0, zero, smp_pgtable_change_modify #else st.d t0, t1, 0 #endif @@ -471,14 +471,14 @@ tlb_huge_update_modify: srli.d ra, t0, _PAGE_WRITE_SHIFT andi ra, ra, 1 - beq ra, $r0, nopage_tlb_modify + beq ra, zero, nopage_tlb_modify tlbsrch ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, $r0, tlb_huge_update_modify + beq t0, zero, tlb_huge_update_modify ld.d t0, t1, 0 #else st.d t0, t1, 0 @@ -504,28 +504,28 @@ tlb_huge_update_modify: addi.d t0, ra, 0 /* Convert to entrylo1 */ - addi.d t1, $r0, 1 + addi.d t1, zero, 1 slli.d t1, t1, (HPAGE_SHIFT - 1) add.d t0, t0, t1 csrwr t0, LOONGARCH_CSR_TLBELO1 /* Set huge page tlb entry size */ - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX tlbwr /* Reset default page size */ - addu16i.d t0, $r0, (CSR_TLBIDX_PS >> 16) - addu16i.d t1, $r0, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) + addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) + addu16i.d t1, zero, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) csrxchg t1, t0, LOONGARCH_CSR_TLBIDX nopage_tlb_modify: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_1 - jirl $r0, t0, 0 + jirl zero, t0, 0 SYM_FUNC_END(handle_tlb_modify) SYM_FUNC_START(handle_tlb_refill) From 07b480695d24d1c9f27bb60fd4b980ae87e8bc1e Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:17 +0800 Subject: [PATCH 102/121] LoongArch: Use the "jr" pseudo-instruction where applicable Some of the assembly code in the LoongArch port likely originated from a time when the assembler did not support pseudo-instructions like "move" or "jr", so the desugared form was used and readability suffers (to a minor degree) as a result. As the upstream toolchain supports these pseudo-instructions from the beginning, migrate the existing few usages to them for better readability. 
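As a minimal illustration (not part of the diff), the pseudo-instruction and the desugared form it replaces assemble to the same encoding:

    jirl    zero, ra, 0        # desugared function return
    jr      ra                 # pseudo-instruction, identical encoding

    jirl    zero, t0, 0        # desugared indirect jump
    jr      t0                 # pseudo-instruction, identical encoding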
Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/kernel/fpu.S | 12 ++++++------ arch/loongarch/kernel/genex.S | 4 ++-- arch/loongarch/kernel/head.S | 4 ++-- arch/loongarch/mm/page.S | 4 ++-- arch/loongarch/mm/tlbex.S | 6 +++--- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S index a631a7137667..e14f096d40bd 100644 --- a/arch/loongarch/kernel/fpu.S +++ b/arch/loongarch/kernel/fpu.S @@ -153,7 +153,7 @@ SYM_FUNC_START(_save_fp) fpu_save_csr a0 t1 fpu_save_double a0 t1 # clobbers t1 fpu_save_cc a0 t1 t2 # clobbers t1, t2 - jirl zero, ra, 0 + jr ra SYM_FUNC_END(_save_fp) EXPORT_SYMBOL(_save_fp) @@ -164,7 +164,7 @@ SYM_FUNC_START(_restore_fp) fpu_restore_double a0 t1 # clobbers t1 fpu_restore_csr a0 t1 fpu_restore_cc a0 t1 t2 # clobbers t1, t2 - jirl zero, ra, 0 + jr ra SYM_FUNC_END(_restore_fp) /* @@ -216,7 +216,7 @@ SYM_FUNC_START(_init_fpu) movgr2fr.d $f30, t1 movgr2fr.d $f31, t1 - jirl zero, ra, 0 + jr ra SYM_FUNC_END(_init_fpu) /* @@ -229,7 +229,7 @@ SYM_FUNC_START(_save_fp_context) sc_save_fcsr a2 t1 sc_save_fp a0 li.w a0, 0 # success - jirl zero, ra, 0 + jr ra SYM_FUNC_END(_save_fp_context) /* @@ -242,10 +242,10 @@ SYM_FUNC_START(_restore_fp_context) sc_restore_fcc a1 t1 t2 sc_restore_fcsr a2 t1 li.w a0, 0 # success - jirl zero, ra, 0 + jr ra SYM_FUNC_END(_restore_fp_context) SYM_FUNC_START(fault) li.w a0, -EFAULT # failure - jirl zero, ra, 0 + jr ra SYM_FUNC_END(fault) diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S index 93496852b3cc..0df6d17dde23 100644 --- a/arch/loongarch/kernel/genex.S +++ b/arch/loongarch/kernel/genex.S @@ -28,7 +28,7 @@ SYM_FUNC_START(__arch_cpu_idle) nop idle 0 /* end of rollback region */ -1: jirl zero, ra, 0 +1: jr ra SYM_FUNC_END(__arch_cpu_idle) SYM_FUNC_START(handle_vint) @@ -91,5 +91,5 @@ SYM_FUNC_END(except_vec_cex) SYM_FUNC_START(handle_sys) la.abs t0, handle_syscall - jirl zero, t0, 0 + jr t0 SYM_FUNC_END(handle_sys) diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S index d01e62dd414f..e553c5fc17da 100644 --- a/arch/loongarch/kernel/head.S +++ b/arch/loongarch/kernel/head.S @@ -32,7 +32,7 @@ SYM_CODE_START(kernel_entry) # kernel entry point /* We might not get launched at the address the kernel is linked to, so we jump there. 
*/ la.abs t0, 0f - jirl zero, t0, 0 + jr t0 0: la t0, __bss_start # clear .bss st.d zero, t0, 0 @@ -86,7 +86,7 @@ SYM_CODE_START(smpboot_entry) ld.d tp, t0, CPU_BOOT_TINFO la.abs t0, 0f - jirl zero, t0, 0 + jr t0 0: bl start_secondary SYM_CODE_END(smpboot_entry) diff --git a/arch/loongarch/mm/page.S b/arch/loongarch/mm/page.S index 270d509adbaa..1e20dd5e3a4b 100644 --- a/arch/loongarch/mm/page.S +++ b/arch/loongarch/mm/page.S @@ -32,7 +32,7 @@ SYM_FUNC_START(clear_page) st.d zero, a0, -8 bne t0, a0, 1b - jirl zero, ra, 0 + jr ra SYM_FUNC_END(clear_page) EXPORT_SYMBOL(clear_page) @@ -79,6 +79,6 @@ SYM_FUNC_START(copy_page) st.d t7, a0, -8 bne t8, a0, 1b - jirl zero, ra, 0 + jr ra SYM_FUNC_END(copy_page) EXPORT_SYMBOL(copy_page) diff --git a/arch/loongarch/mm/tlbex.S b/arch/loongarch/mm/tlbex.S index 9e98afe7a67f..f1234a9c311f 100644 --- a/arch/loongarch/mm/tlbex.S +++ b/arch/loongarch/mm/tlbex.S @@ -198,7 +198,7 @@ nopage_tlb_load: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_0 - jirl zero, t0, 0 + jr t0 SYM_FUNC_END(handle_tlb_load) SYM_FUNC_START(handle_tlb_store) @@ -366,7 +366,7 @@ nopage_tlb_store: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_1 - jirl zero, t0, 0 + jr t0 SYM_FUNC_END(handle_tlb_store) SYM_FUNC_START(handle_tlb_modify) @@ -525,7 +525,7 @@ nopage_tlb_modify: dbar 0 csrrd ra, EXCEPTION_KS2 la.abs t0, tlb_do_page_fault_1 - jirl zero, t0, 0 + jr t0 SYM_FUNC_END(handle_tlb_modify) SYM_FUNC_START(handle_tlb_refill) From 57ce5d3eefacfaadfe2ed0a3a85713d1ae6287b9 Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:18 +0800 Subject: [PATCH 103/121] LoongArch: Use the "move" pseudo-instruction where applicable Some of the assembly code in the LoongArch port likely originated from a time when the assembler did not support pseudo-instructions like "move" or "jr", so the desugared form was used and readability suffers (to a minor degree) as a result. As the upstream toolchain supports these pseudo-instructions from the beginning, migrate the existing few usages to them for better readability. 
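Again as a minimal illustration (not part of the diff), the desugared form and the pseudo-instruction assemble identically:

    or      t0, t1, zero       # desugared register copy
    move    t0, t1             # pseudo-instruction, identical encoding

    or      u0, zero, zero     # desugared zeroing (as in head.S)
    move    u0, zero           # pseudo-instruction, identical encoding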
Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/atomic.h | 8 ++++---- arch/loongarch/include/asm/cmpxchg.h | 2 +- arch/loongarch/include/asm/futex.h | 2 +- arch/loongarch/include/asm/uaccess.h | 2 +- arch/loongarch/kernel/head.S | 2 +- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h index 979367ad4e2c..a0a33ee793d6 100644 --- a/arch/loongarch/include/asm/atomic.h +++ b/arch/loongarch/include/asm/atomic.h @@ -157,7 +157,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) __asm__ __volatile__( "1: ll.w %1, %2 # atomic_sub_if_positive\n" " addi.w %0, %1, %3 \n" - " or %1, %0, $zero \n" + " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.w %1, %2 \n" " beq $zero, %1, 1b \n" @@ -170,7 +170,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) __asm__ __volatile__( "1: ll.w %1, %2 # atomic_sub_if_positive\n" " sub.w %0, %1, %3 \n" - " or %1, %0, $zero \n" + " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.w %1, %2 \n" " beq $zero, %1, 1b \n" @@ -320,7 +320,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) __asm__ __volatile__( "1: ll.d %1, %2 # atomic64_sub_if_positive \n" " addi.d %0, %1, %3 \n" - " or %1, %0, $zero \n" + " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.d %1, %2 \n" " beq %1, $zero, 1b \n" @@ -333,7 +333,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) __asm__ __volatile__( "1: ll.d %1, %2 # atomic64_sub_if_positive \n" " sub.d %0, %1, %3 \n" - " or %1, %0, $zero \n" + " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.d %1, %2 \n" " beq %1, $zero, 1b \n" diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h index 75b3a4478652..9e9939196471 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++ b/arch/loongarch/include/asm/cmpxchg.h @@ -55,7 +55,7 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x, __asm__ __volatile__( \ "1: " ld " %0, %2 # __cmpxchg_asm \n" \ " bne %0, %z3, 2f \n" \ - " or $t0, %z4, $zero \n" \ + " move $t0, %z4 \n" \ " " st " $t0, %1 \n" \ " beq $zero, $t0, 1b \n" \ "2: \n" \ diff --git a/arch/loongarch/include/asm/futex.h b/arch/loongarch/include/asm/futex.h index 9de8231694ec..170ec9f97e58 100644 --- a/arch/loongarch/include/asm/futex.h +++ b/arch/loongarch/include/asm/futex.h @@ -82,7 +82,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newv "# futex_atomic_cmpxchg_inatomic \n" "1: ll.w %1, %3 \n" " bne %1, %z4, 3f \n" - " or $t0, %z5, $zero \n" + " move $t0, %z5 \n" "2: sc.w $t0, %2 \n" " beq $zero, $t0, 1b \n" "3: \n" diff --git a/arch/loongarch/include/asm/uaccess.h b/arch/loongarch/include/asm/uaccess.h index 42da43211765..2b44edc604a2 100644 --- a/arch/loongarch/include/asm/uaccess.h +++ b/arch/loongarch/include/asm/uaccess.h @@ -162,7 +162,7 @@ do { \ "2: \n" \ " .section .fixup,\"ax\" \n" \ "3: li.w %0, %3 \n" \ - " or %1, $zero, $zero \n" \ + " move %1, $zero \n" \ " b 2b \n" \ " .previous \n" \ " .section __ex_table,\"a\" \n" \ diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S index e553c5fc17da..fd6a62f17161 100644 --- a/arch/loongarch/kernel/head.S +++ b/arch/loongarch/kernel/head.S @@ -50,7 +50,7 @@ SYM_CODE_START(kernel_entry) # kernel entry point /* KSave3 used for percpu base, initialized as 0 */ csrwr zero, PERCPU_BASE_KS /* GPR21 used for percpu base (runtime), initialized as 0 */ - or u0, zero, zero + move u0, zero la tp, 
init_thread_union /* Set the SP after an empty pt_regs. */ From d47b2dc87c58154052daf8ac0f9229db5c7890cc Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:19 +0800 Subject: [PATCH 104/121] LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ While B{EQ,NE}Z and B{EQ,NE} are different instructions, and the vastly expanded range for branch destination does not really matter in the few cases touched, use the B{EQ,NE}Z where possible for shorter lines and better consistency (e.g. some places used "BEQ foo, zero", while some used "BEQ zero, foo"). Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/atomic.h | 8 ++++---- arch/loongarch/include/asm/cmpxchg.h | 2 +- arch/loongarch/include/asm/futex.h | 4 ++-- arch/loongarch/mm/tlbex.S | 30 ++++++++++++++-------------- 4 files changed, 22 insertions(+), 22 deletions(-) diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h index a0a33ee793d6..0869bec2c937 100644 --- a/arch/loongarch/include/asm/atomic.h +++ b/arch/loongarch/include/asm/atomic.h @@ -160,7 +160,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.w %1, %2 \n" - " beq $zero, %1, 1b \n" + " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB : "=&r" (result), "=&r" (temp), @@ -173,7 +173,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.w %1, %2 \n" - " beq $zero, %1, 1b \n" + " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB : "=&r" (result), "=&r" (temp), @@ -323,7 +323,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.d %1, %2 \n" - " beq %1, $zero, 1b \n" + " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB : "=&r" (result), "=&r" (temp), @@ -336,7 +336,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) " move %1, %0 \n" " blt %0, $zero, 2f \n" " sc.d %1, %2 \n" - " beq %1, $zero, 1b \n" + " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB : "=&r" (result), "=&r" (temp), diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h index 9e9939196471..0a9b0fac1eee 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++ b/arch/loongarch/include/asm/cmpxchg.h @@ -57,7 +57,7 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x, " bne %0, %z3, 2f \n" \ " move $t0, %z4 \n" \ " " st " $t0, %1 \n" \ - " beq $zero, $t0, 1b \n" \ + " beqz $t0, 1b \n" \ "2: \n" \ __WEAK_LLSC_MB \ : "=&r" (__ret), "=ZB"(*m) \ diff --git a/arch/loongarch/include/asm/futex.h b/arch/loongarch/include/asm/futex.h index 170ec9f97e58..837659335fb1 100644 --- a/arch/loongarch/include/asm/futex.h +++ b/arch/loongarch/include/asm/futex.h @@ -17,7 +17,7 @@ "1: ll.w %1, %4 # __futex_atomic_op\n" \ " " insn " \n" \ "2: sc.w $t0, %2 \n" \ - " beq $t0, $zero, 1b \n" \ + " beqz $t0, 1b \n" \ "3: \n" \ " .section .fixup,\"ax\" \n" \ "4: li.w %0, %6 \n" \ @@ -84,7 +84,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newv " bne %1, %z4, 3f \n" " move $t0, %z5 \n" "2: sc.w $t0, %2 \n" - " beq $zero, $t0, 1b \n" + " beqz $t0, 1b \n" "3: \n" __WEAK_LLSC_MB " .section .fixup,\"ax\" \n" diff --git a/arch/loongarch/mm/tlbex.S b/arch/loongarch/mm/tlbex.S index f1234a9c311f..4d16e27020e0 100644 --- a/arch/loongarch/mm/tlbex.S +++ b/arch/loongarch/mm/tlbex.S @@ -80,7 +80,7 @@ vmalloc_done_load: * see if we need to jump to huge tlb processing. 
*/ andi t0, ra, _PAGE_HUGE - bne t0, zero, tlb_huge_update_load + bnez t0, tlb_huge_update_load csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -100,12 +100,12 @@ smp_pgtable_change_load: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, 1 - beq ra, zero, nopage_tlb_load + beqz ra, nopage_tlb_load ori t0, t0, _PAGE_VALID #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, smp_pgtable_change_load + beqz t0, smp_pgtable_change_load #else st.d t0, t1, 0 #endif @@ -139,13 +139,13 @@ tlb_huge_update_load: #endif srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, 1 - beq ra, zero, nopage_tlb_load + beqz ra, nopage_tlb_load tlbsrch ori t0, t0, _PAGE_VALID #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, tlb_huge_update_load + beqz t0, tlb_huge_update_load ld.d t0, t1, 0 #else st.d t0, t1, 0 @@ -244,7 +244,7 @@ vmalloc_done_store: * see if we need to jump to huge tlb processing. */ andi t0, ra, _PAGE_HUGE - bne t0, zero, tlb_huge_update_store + bnez t0, tlb_huge_update_store csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -265,12 +265,12 @@ smp_pgtable_change_store: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) xori ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) - bne ra, zero, nopage_tlb_store + bnez ra, nopage_tlb_store ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, smp_pgtable_change_store + beqz t0, smp_pgtable_change_store #else st.d t0, t1, 0 #endif @@ -306,14 +306,14 @@ tlb_huge_update_store: srli.d ra, t0, _PAGE_PRESENT_SHIFT andi ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) xori ra, ra, ((_PAGE_PRESENT | _PAGE_WRITE) >> _PAGE_PRESENT_SHIFT) - bne ra, zero, nopage_tlb_store + bnez ra, nopage_tlb_store tlbsrch ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, tlb_huge_update_store + beqz t0, tlb_huge_update_store ld.d t0, t1, 0 #else st.d t0, t1, 0 @@ -411,7 +411,7 @@ vmalloc_done_modify: * see if we need to jump to huge tlb processing. */ andi t0, ra, _PAGE_HUGE - bne t0, zero, tlb_huge_update_modify + bnez t0, tlb_huge_update_modify csrrd t0, LOONGARCH_CSR_BADV srli.d t0, t0, (PAGE_SHIFT + PTE_ORDER) @@ -431,12 +431,12 @@ smp_pgtable_change_modify: srli.d ra, t0, _PAGE_WRITE_SHIFT andi ra, ra, 1 - beq ra, zero, nopage_tlb_modify + beqz ra, nopage_tlb_modify ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, smp_pgtable_change_modify + beqz t0, smp_pgtable_change_modify #else st.d t0, t1, 0 #endif @@ -471,14 +471,14 @@ tlb_huge_update_modify: srli.d ra, t0, _PAGE_WRITE_SHIFT andi ra, ra, 1 - beq ra, zero, nopage_tlb_modify + beqz ra, nopage_tlb_modify tlbsrch ori t0, t0, (_PAGE_VALID | _PAGE_DIRTY | _PAGE_MODIFIED) #ifdef CONFIG_SMP sc.d t0, t1, 0 - beq t0, zero, tlb_huge_update_modify + beqz t0, tlb_huge_update_modify ld.d t0, t1, 0 #else st.d t0, t1, 0 From d1bc75d7595b237f78b594509ea7cc159f98cae9 Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:20 +0800 Subject: [PATCH 105/121] LoongArch: Simplify "BLT foo, zero" with BLTZ Support for the syntactic sugar is present in upstream binutils port from the beginning. Use it for shorter lines and better consistency. Generated code should be identical. 
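A minimal illustrative pair (not part of the diff); both forms produce the same instruction:

    blt     t0, zero, vmalloc_load    # explicit compare against zero
    bltz    t0, vmalloc_load          # alias, identical encoding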
Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/atomic.h | 8 ++++---- arch/loongarch/mm/tlbex.S | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h index 0869bec2c937..dc2ae4f22c8e 100644 --- a/arch/loongarch/include/asm/atomic.h +++ b/arch/loongarch/include/asm/atomic.h @@ -158,7 +158,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) "1: ll.w %1, %2 # atomic_sub_if_positive\n" " addi.w %0, %1, %3 \n" " move %1, %0 \n" - " blt %0, $zero, 2f \n" + " bltz %0, 2f \n" " sc.w %1, %2 \n" " beqz %1, 1b \n" "2: \n" @@ -171,7 +171,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) "1: ll.w %1, %2 # atomic_sub_if_positive\n" " sub.w %0, %1, %3 \n" " move %1, %0 \n" - " blt %0, $zero, 2f \n" + " bltz %0, 2f \n" " sc.w %1, %2 \n" " beqz %1, 1b \n" "2: \n" @@ -321,7 +321,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) "1: ll.d %1, %2 # atomic64_sub_if_positive \n" " addi.d %0, %1, %3 \n" " move %1, %0 \n" - " blt %0, $zero, 2f \n" + " bltz %0, 2f \n" " sc.d %1, %2 \n" " beqz %1, 1b \n" "2: \n" @@ -334,7 +334,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) "1: ll.d %1, %2 # atomic64_sub_if_positive \n" " sub.d %0, %1, %3 \n" " move %1, %0 \n" - " blt %0, $zero, 2f \n" + " bltz %0, 2f \n" " sc.d %1, %2 \n" " beqz %1, 1b \n" "2: \n" diff --git a/arch/loongarch/mm/tlbex.S b/arch/loongarch/mm/tlbex.S index 4d16e27020e0..9ca1e3ff1ded 100644 --- a/arch/loongarch/mm/tlbex.S +++ b/arch/loongarch/mm/tlbex.S @@ -47,7 +47,7 @@ SYM_FUNC_START(handle_tlb_load) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, zero, vmalloc_load + bltz t0, vmalloc_load csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_load: @@ -210,7 +210,7 @@ SYM_FUNC_START(handle_tlb_store) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, zero, vmalloc_store + bltz t0, vmalloc_store csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_store: @@ -378,7 +378,7 @@ SYM_FUNC_START(handle_tlb_modify) * The vmalloc handling is not in the hotpath. */ csrrd t0, LOONGARCH_CSR_BADV - blt t0, zero, vmalloc_modify + bltz t0, vmalloc_modify csrrd t1, LOONGARCH_CSR_PGDL vmalloc_done_modify: From 1fdb9a92495a6b6996530d27781892796e22f08b Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:21 +0800 Subject: [PATCH 106/121] LoongArch: Simplify "BGT foo, zero" with BGTZ Support for the syntactic sugar is present in upstream binutils port from the beginning. Use it for shorter lines and better consistency. Generated code should be identical. 
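A minimal illustrative pair (not part of the diff); both spellings expand to the same "blt zero, a1, 1b" instruction:

    bgt     a1, zero, 1b       # old spelling (operands swapped by the assembler)
    bgtz    a1, 1b             # new spelling, identical generated code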
Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/lib/clear_user.S | 2 +- arch/loongarch/lib/copy_user.S | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/loongarch/lib/clear_user.S b/arch/loongarch/lib/clear_user.S index 25d9be5fbb19..16ba2b8dd68a 100644 --- a/arch/loongarch/lib/clear_user.S +++ b/arch/loongarch/lib/clear_user.S @@ -32,7 +32,7 @@ SYM_FUNC_START(__clear_user) 1: st.b zero, a0, 0 addi.d a0, a0, 1 addi.d a1, a1, -1 - bgt a1, zero, 1b + bgtz a1, 1b 2: move a0, a1 jr ra diff --git a/arch/loongarch/lib/copy_user.S b/arch/loongarch/lib/copy_user.S index 9ae507f851b5..97d20327a69e 100644 --- a/arch/loongarch/lib/copy_user.S +++ b/arch/loongarch/lib/copy_user.S @@ -35,7 +35,7 @@ SYM_FUNC_START(__copy_user) addi.d a0, a0, 1 addi.d a1, a1, 1 addi.d a2, a2, -1 - bgt a2, zero, 1b + bgtz a2, 1b 3: move a0, a2 jr ra From f5c3c22f21b6a002e371afdcc9180a2fa47dc267 Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:22 +0800 Subject: [PATCH 107/121] LoongArch: Re-tab the assembly files Reflow the *.S files for better stylistic consistency, namely hard tabs after mnemonic position, and vertical alignment of the first operand with hard tabs. Tab width is obviously 8. Some pre-existing intra-block vertical alignments are preserved. Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/kernel/entry.S | 4 +- arch/loongarch/kernel/fpu.S | 170 ++++++++++++++++----------------- arch/loongarch/kernel/genex.S | 8 +- arch/loongarch/kernel/head.S | 4 +- arch/loongarch/kernel/switch.S | 4 +- arch/loongarch/mm/page.S | 118 +++++++++++------------ arch/loongarch/mm/tlbex.S | 18 ++-- 7 files changed, 163 insertions(+), 163 deletions(-) diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S index d5b3dbcf5425..d53b631c9022 100644 --- a/arch/loongarch/kernel/entry.S +++ b/arch/loongarch/kernel/entry.S @@ -27,7 +27,7 @@ SYM_FUNC_START(handle_syscall) addi.d sp, sp, -PT_SIZE cfi_st t2, PT_R3 - cfi_rel_offset sp, PT_R3 + cfi_rel_offset sp, PT_R3 st.d zero, sp, PT_R0 csrrd t2, LOONGARCH_CSR_PRMD st.d t2, sp, PT_PRMD @@ -50,7 +50,7 @@ SYM_FUNC_START(handle_syscall) cfi_st a7, PT_R11 csrrd ra, LOONGARCH_CSR_ERA st.d ra, sp, PT_ERA - cfi_rel_offset ra, PT_ERA + cfi_rel_offset ra, PT_ERA cfi_st tp, PT_R2 cfi_st u0, PT_R21 diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S index e14f096d40bd..576b3370a296 100644 --- a/arch/loongarch/kernel/fpu.S +++ b/arch/loongarch/kernel/fpu.S @@ -27,78 +27,78 @@ .endm .macro sc_save_fp base - EX fst.d $f0, \base, (0 * FPU_REG_WIDTH) - EX fst.d $f1, \base, (1 * FPU_REG_WIDTH) - EX fst.d $f2, \base, (2 * FPU_REG_WIDTH) - EX fst.d $f3, \base, (3 * FPU_REG_WIDTH) - EX fst.d $f4, \base, (4 * FPU_REG_WIDTH) - EX fst.d $f5, \base, (5 * FPU_REG_WIDTH) - EX fst.d $f6, \base, (6 * FPU_REG_WIDTH) - EX fst.d $f7, \base, (7 * FPU_REG_WIDTH) - EX fst.d $f8, \base, (8 * FPU_REG_WIDTH) - EX fst.d $f9, \base, (9 * FPU_REG_WIDTH) - EX fst.d $f10, \base, (10 * FPU_REG_WIDTH) - EX fst.d $f11, \base, (11 * FPU_REG_WIDTH) - EX fst.d $f12, \base, (12 * FPU_REG_WIDTH) - EX fst.d $f13, \base, (13 * FPU_REG_WIDTH) - EX fst.d $f14, \base, (14 * FPU_REG_WIDTH) - EX fst.d $f15, \base, (15 * FPU_REG_WIDTH) - EX fst.d $f16, \base, (16 * FPU_REG_WIDTH) - EX fst.d $f17, \base, (17 * FPU_REG_WIDTH) - EX fst.d $f18, \base, (18 * FPU_REG_WIDTH) - EX fst.d $f19, \base, (19 * FPU_REG_WIDTH) - EX fst.d $f20, \base, (20 * FPU_REG_WIDTH) - EX fst.d $f21, \base, (21 * FPU_REG_WIDTH) - EX fst.d 
$f22, \base, (22 * FPU_REG_WIDTH) - EX fst.d $f23, \base, (23 * FPU_REG_WIDTH) - EX fst.d $f24, \base, (24 * FPU_REG_WIDTH) - EX fst.d $f25, \base, (25 * FPU_REG_WIDTH) - EX fst.d $f26, \base, (26 * FPU_REG_WIDTH) - EX fst.d $f27, \base, (27 * FPU_REG_WIDTH) - EX fst.d $f28, \base, (28 * FPU_REG_WIDTH) - EX fst.d $f29, \base, (29 * FPU_REG_WIDTH) - EX fst.d $f30, \base, (30 * FPU_REG_WIDTH) - EX fst.d $f31, \base, (31 * FPU_REG_WIDTH) + EX fst.d $f0, \base, (0 * FPU_REG_WIDTH) + EX fst.d $f1, \base, (1 * FPU_REG_WIDTH) + EX fst.d $f2, \base, (2 * FPU_REG_WIDTH) + EX fst.d $f3, \base, (3 * FPU_REG_WIDTH) + EX fst.d $f4, \base, (4 * FPU_REG_WIDTH) + EX fst.d $f5, \base, (5 * FPU_REG_WIDTH) + EX fst.d $f6, \base, (6 * FPU_REG_WIDTH) + EX fst.d $f7, \base, (7 * FPU_REG_WIDTH) + EX fst.d $f8, \base, (8 * FPU_REG_WIDTH) + EX fst.d $f9, \base, (9 * FPU_REG_WIDTH) + EX fst.d $f10, \base, (10 * FPU_REG_WIDTH) + EX fst.d $f11, \base, (11 * FPU_REG_WIDTH) + EX fst.d $f12, \base, (12 * FPU_REG_WIDTH) + EX fst.d $f13, \base, (13 * FPU_REG_WIDTH) + EX fst.d $f14, \base, (14 * FPU_REG_WIDTH) + EX fst.d $f15, \base, (15 * FPU_REG_WIDTH) + EX fst.d $f16, \base, (16 * FPU_REG_WIDTH) + EX fst.d $f17, \base, (17 * FPU_REG_WIDTH) + EX fst.d $f18, \base, (18 * FPU_REG_WIDTH) + EX fst.d $f19, \base, (19 * FPU_REG_WIDTH) + EX fst.d $f20, \base, (20 * FPU_REG_WIDTH) + EX fst.d $f21, \base, (21 * FPU_REG_WIDTH) + EX fst.d $f22, \base, (22 * FPU_REG_WIDTH) + EX fst.d $f23, \base, (23 * FPU_REG_WIDTH) + EX fst.d $f24, \base, (24 * FPU_REG_WIDTH) + EX fst.d $f25, \base, (25 * FPU_REG_WIDTH) + EX fst.d $f26, \base, (26 * FPU_REG_WIDTH) + EX fst.d $f27, \base, (27 * FPU_REG_WIDTH) + EX fst.d $f28, \base, (28 * FPU_REG_WIDTH) + EX fst.d $f29, \base, (29 * FPU_REG_WIDTH) + EX fst.d $f30, \base, (30 * FPU_REG_WIDTH) + EX fst.d $f31, \base, (31 * FPU_REG_WIDTH) .endm .macro sc_restore_fp base - EX fld.d $f0, \base, (0 * FPU_REG_WIDTH) - EX fld.d $f1, \base, (1 * FPU_REG_WIDTH) - EX fld.d $f2, \base, (2 * FPU_REG_WIDTH) - EX fld.d $f3, \base, (3 * FPU_REG_WIDTH) - EX fld.d $f4, \base, (4 * FPU_REG_WIDTH) - EX fld.d $f5, \base, (5 * FPU_REG_WIDTH) - EX fld.d $f6, \base, (6 * FPU_REG_WIDTH) - EX fld.d $f7, \base, (7 * FPU_REG_WIDTH) - EX fld.d $f8, \base, (8 * FPU_REG_WIDTH) - EX fld.d $f9, \base, (9 * FPU_REG_WIDTH) - EX fld.d $f10, \base, (10 * FPU_REG_WIDTH) - EX fld.d $f11, \base, (11 * FPU_REG_WIDTH) - EX fld.d $f12, \base, (12 * FPU_REG_WIDTH) - EX fld.d $f13, \base, (13 * FPU_REG_WIDTH) - EX fld.d $f14, \base, (14 * FPU_REG_WIDTH) - EX fld.d $f15, \base, (15 * FPU_REG_WIDTH) - EX fld.d $f16, \base, (16 * FPU_REG_WIDTH) - EX fld.d $f17, \base, (17 * FPU_REG_WIDTH) - EX fld.d $f18, \base, (18 * FPU_REG_WIDTH) - EX fld.d $f19, \base, (19 * FPU_REG_WIDTH) - EX fld.d $f20, \base, (20 * FPU_REG_WIDTH) - EX fld.d $f21, \base, (21 * FPU_REG_WIDTH) - EX fld.d $f22, \base, (22 * FPU_REG_WIDTH) - EX fld.d $f23, \base, (23 * FPU_REG_WIDTH) - EX fld.d $f24, \base, (24 * FPU_REG_WIDTH) - EX fld.d $f25, \base, (25 * FPU_REG_WIDTH) - EX fld.d $f26, \base, (26 * FPU_REG_WIDTH) - EX fld.d $f27, \base, (27 * FPU_REG_WIDTH) - EX fld.d $f28, \base, (28 * FPU_REG_WIDTH) - EX fld.d $f29, \base, (29 * FPU_REG_WIDTH) - EX fld.d $f30, \base, (30 * FPU_REG_WIDTH) - EX fld.d $f31, \base, (31 * FPU_REG_WIDTH) + EX fld.d $f0, \base, (0 * FPU_REG_WIDTH) + EX fld.d $f1, \base, (1 * FPU_REG_WIDTH) + EX fld.d $f2, \base, (2 * FPU_REG_WIDTH) + EX fld.d $f3, \base, (3 * FPU_REG_WIDTH) + EX fld.d $f4, \base, (4 * FPU_REG_WIDTH) + EX fld.d $f5, \base, (5 * 
FPU_REG_WIDTH) + EX fld.d $f6, \base, (6 * FPU_REG_WIDTH) + EX fld.d $f7, \base, (7 * FPU_REG_WIDTH) + EX fld.d $f8, \base, (8 * FPU_REG_WIDTH) + EX fld.d $f9, \base, (9 * FPU_REG_WIDTH) + EX fld.d $f10, \base, (10 * FPU_REG_WIDTH) + EX fld.d $f11, \base, (11 * FPU_REG_WIDTH) + EX fld.d $f12, \base, (12 * FPU_REG_WIDTH) + EX fld.d $f13, \base, (13 * FPU_REG_WIDTH) + EX fld.d $f14, \base, (14 * FPU_REG_WIDTH) + EX fld.d $f15, \base, (15 * FPU_REG_WIDTH) + EX fld.d $f16, \base, (16 * FPU_REG_WIDTH) + EX fld.d $f17, \base, (17 * FPU_REG_WIDTH) + EX fld.d $f18, \base, (18 * FPU_REG_WIDTH) + EX fld.d $f19, \base, (19 * FPU_REG_WIDTH) + EX fld.d $f20, \base, (20 * FPU_REG_WIDTH) + EX fld.d $f21, \base, (21 * FPU_REG_WIDTH) + EX fld.d $f22, \base, (22 * FPU_REG_WIDTH) + EX fld.d $f23, \base, (23 * FPU_REG_WIDTH) + EX fld.d $f24, \base, (24 * FPU_REG_WIDTH) + EX fld.d $f25, \base, (25 * FPU_REG_WIDTH) + EX fld.d $f26, \base, (26 * FPU_REG_WIDTH) + EX fld.d $f27, \base, (27 * FPU_REG_WIDTH) + EX fld.d $f28, \base, (28 * FPU_REG_WIDTH) + EX fld.d $f29, \base, (29 * FPU_REG_WIDTH) + EX fld.d $f30, \base, (30 * FPU_REG_WIDTH) + EX fld.d $f31, \base, (31 * FPU_REG_WIDTH) .endm .macro sc_save_fcc base, tmp0, tmp1 movcf2gr \tmp0, $fcc0 - move \tmp1, \tmp0 + move \tmp1, \tmp0 movcf2gr \tmp0, $fcc1 bstrins.d \tmp1, \tmp0, 15, 8 movcf2gr \tmp0, $fcc2 @@ -113,11 +113,11 @@ bstrins.d \tmp1, \tmp0, 55, 48 movcf2gr \tmp0, $fcc7 bstrins.d \tmp1, \tmp0, 63, 56 - EX st.d \tmp1, \base, 0 + EX st.d \tmp1, \base, 0 .endm .macro sc_restore_fcc base, tmp0, tmp1 - EX ld.d \tmp0, \base, 0 + EX ld.d \tmp0, \base, 0 bstrpick.d \tmp1, \tmp0, 7, 0 movgr2cf $fcc0, \tmp1 bstrpick.d \tmp1, \tmp0, 15, 8 @@ -138,11 +138,11 @@ .macro sc_save_fcsr base, tmp0 movfcsr2gr \tmp0, fcsr0 - EX st.w \tmp0, \base, 0 + EX st.w \tmp0, \base, 0 .endm .macro sc_restore_fcsr base, tmp0 - EX ld.w \tmp0, \base, 0 + EX ld.w \tmp0, \base, 0 movgr2fcsr fcsr0, \tmp0 .endm @@ -151,9 +151,9 @@ */ SYM_FUNC_START(_save_fp) fpu_save_csr a0 t1 - fpu_save_double a0 t1 # clobbers t1 + fpu_save_double a0 t1 # clobbers t1 fpu_save_cc a0 t1 t2 # clobbers t1, t2 - jr ra + jr ra SYM_FUNC_END(_save_fp) EXPORT_SYMBOL(_save_fp) @@ -161,10 +161,10 @@ EXPORT_SYMBOL(_save_fp) * Restore a thread's fp context. 
*/ SYM_FUNC_START(_restore_fp) - fpu_restore_double a0 t1 # clobbers t1 - fpu_restore_csr a0 t1 - fpu_restore_cc a0 t1 t2 # clobbers t1, t2 - jr ra + fpu_restore_double a0 t1 # clobbers t1 + fpu_restore_csr a0 t1 + fpu_restore_cc a0 t1 t2 # clobbers t1, t2 + jr ra SYM_FUNC_END(_restore_fp) /* @@ -225,11 +225,11 @@ SYM_FUNC_END(_init_fpu) * a2: fcsr */ SYM_FUNC_START(_save_fp_context) - sc_save_fcc a1 t1 t2 - sc_save_fcsr a2 t1 - sc_save_fp a0 - li.w a0, 0 # success - jr ra + sc_save_fcc a1 t1 t2 + sc_save_fcsr a2 t1 + sc_save_fp a0 + li.w a0, 0 # success + jr ra SYM_FUNC_END(_save_fp_context) /* @@ -238,11 +238,11 @@ SYM_FUNC_END(_save_fp_context) * a2: fcsr */ SYM_FUNC_START(_restore_fp_context) - sc_restore_fp a0 - sc_restore_fcc a1 t1 t2 - sc_restore_fcsr a2 t1 - li.w a0, 0 # success - jr ra + sc_restore_fp a0 + sc_restore_fcc a1 t1 t2 + sc_restore_fcsr a2 t1 + li.w a0, 0 # success + jr ra SYM_FUNC_END(_restore_fp_context) SYM_FUNC_START(fault) diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S index 0df6d17dde23..75e5be807a0d 100644 --- a/arch/loongarch/kernel/genex.S +++ b/arch/loongarch/kernel/genex.S @@ -35,16 +35,16 @@ SYM_FUNC_START(handle_vint) BACKUP_T0T1 SAVE_ALL la.abs t1, __arch_cpu_idle - LONG_L t0, sp, PT_ERA + LONG_L t0, sp, PT_ERA /* 32 byte rollback region */ ori t0, t0, 0x1f xori t0, t0, 0x1f bne t0, t1, 1f - LONG_S t0, sp, PT_ERA + LONG_S t0, sp, PT_ERA 1: move a0, sp move a1, sp la.abs t0, do_vint - jirl ra, t0, 0 + jirl ra, t0, 0 RESTORE_ALL_AND_RET SYM_FUNC_END(handle_vint) @@ -72,7 +72,7 @@ SYM_FUNC_END(except_vec_cex) build_prep_\prep move a0, sp la.abs t0, do_\handler - jirl ra, t0, 0 + jirl ra, t0, 0 RESTORE_ALL_AND_RET SYM_FUNC_END(handle_\exception) .endm diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S index fd6a62f17161..7062cdf0e33e 100644 --- a/arch/loongarch/kernel/head.S +++ b/arch/loongarch/kernel/head.S @@ -85,8 +85,8 @@ SYM_CODE_START(smpboot_entry) ld.d sp, t0, CPU_BOOT_STACK ld.d tp, t0, CPU_BOOT_TINFO - la.abs t0, 0f - jr t0 + la.abs t0, 0f + jr t0 0: bl start_secondary SYM_CODE_END(smpboot_entry) diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S index 53e2fa8e580e..37e84ac8ffc2 100644 --- a/arch/loongarch/kernel/switch.S +++ b/arch/loongarch/kernel/switch.S @@ -24,8 +24,8 @@ SYM_FUNC_START(__switch_to) move tp, a2 cpu_restore_nonscratch a1 - li.w t0, _THREAD_SIZE - 32 - PTR_ADD t0, t0, tp + li.w t0, _THREAD_SIZE - 32 + PTR_ADD t0, t0, tp set_saved_sp t0, t1, t2 ldptr.d t1, a1, THREAD_CSRPRMD diff --git a/arch/loongarch/mm/page.S b/arch/loongarch/mm/page.S index 1e20dd5e3a4b..4c874a7af0ad 100644 --- a/arch/loongarch/mm/page.S +++ b/arch/loongarch/mm/page.S @@ -10,75 +10,75 @@ .align 5 SYM_FUNC_START(clear_page) - lu12i.w t0, 1 << (PAGE_SHIFT - 12) - add.d t0, t0, a0 + lu12i.w t0, 1 << (PAGE_SHIFT - 12) + add.d t0, t0, a0 1: - st.d zero, a0, 0 - st.d zero, a0, 8 - st.d zero, a0, 16 - st.d zero, a0, 24 - st.d zero, a0, 32 - st.d zero, a0, 40 - st.d zero, a0, 48 - st.d zero, a0, 56 - addi.d a0, a0, 128 - st.d zero, a0, -64 - st.d zero, a0, -56 - st.d zero, a0, -48 - st.d zero, a0, -40 - st.d zero, a0, -32 - st.d zero, a0, -24 - st.d zero, a0, -16 - st.d zero, a0, -8 - bne t0, a0, 1b + st.d zero, a0, 0 + st.d zero, a0, 8 + st.d zero, a0, 16 + st.d zero, a0, 24 + st.d zero, a0, 32 + st.d zero, a0, 40 + st.d zero, a0, 48 + st.d zero, a0, 56 + addi.d a0, a0, 128 + st.d zero, a0, -64 + st.d zero, a0, -56 + st.d zero, a0, -48 + st.d zero, a0, -40 + st.d zero, a0, -32 + st.d zero, a0, 
-24 + st.d zero, a0, -16 + st.d zero, a0, -8 + bne t0, a0, 1b - jr ra + jr ra SYM_FUNC_END(clear_page) EXPORT_SYMBOL(clear_page) .align 5 SYM_FUNC_START(copy_page) - lu12i.w t8, 1 << (PAGE_SHIFT - 12) - add.d t8, t8, a0 + lu12i.w t8, 1 << (PAGE_SHIFT - 12) + add.d t8, t8, a0 1: - ld.d t0, a1, 0 - ld.d t1, a1, 8 - ld.d t2, a1, 16 - ld.d t3, a1, 24 - ld.d t4, a1, 32 - ld.d t5, a1, 40 - ld.d t6, a1, 48 - ld.d t7, a1, 56 + ld.d t0, a1, 0 + ld.d t1, a1, 8 + ld.d t2, a1, 16 + ld.d t3, a1, 24 + ld.d t4, a1, 32 + ld.d t5, a1, 40 + ld.d t6, a1, 48 + ld.d t7, a1, 56 - st.d t0, a0, 0 - st.d t1, a0, 8 - ld.d t0, a1, 64 - ld.d t1, a1, 72 - st.d t2, a0, 16 - st.d t3, a0, 24 - ld.d t2, a1, 80 - ld.d t3, a1, 88 - st.d t4, a0, 32 - st.d t5, a0, 40 - ld.d t4, a1, 96 - ld.d t5, a1, 104 - st.d t6, a0, 48 - st.d t7, a0, 56 - ld.d t6, a1, 112 - ld.d t7, a1, 120 - addi.d a0, a0, 128 - addi.d a1, a1, 128 + st.d t0, a0, 0 + st.d t1, a0, 8 + ld.d t0, a1, 64 + ld.d t1, a1, 72 + st.d t2, a0, 16 + st.d t3, a0, 24 + ld.d t2, a1, 80 + ld.d t3, a1, 88 + st.d t4, a0, 32 + st.d t5, a0, 40 + ld.d t4, a1, 96 + ld.d t5, a1, 104 + st.d t6, a0, 48 + st.d t7, a0, 56 + ld.d t6, a1, 112 + ld.d t7, a1, 120 + addi.d a0, a0, 128 + addi.d a1, a1, 128 - st.d t0, a0, -64 - st.d t1, a0, -56 - st.d t2, a0, -48 - st.d t3, a0, -40 - st.d t4, a0, -32 - st.d t5, a0, -24 - st.d t6, a0, -16 - st.d t7, a0, -8 + st.d t0, a0, -64 + st.d t1, a0, -56 + st.d t2, a0, -48 + st.d t3, a0, -40 + st.d t4, a0, -32 + st.d t5, a0, -24 + st.d t6, a0, -16 + st.d t7, a0, -8 - bne t8, a0, 1b - jr ra + bne t8, a0, 1b + jr ra SYM_FUNC_END(copy_page) EXPORT_SYMBOL(copy_page) diff --git a/arch/loongarch/mm/tlbex.S b/arch/loongarch/mm/tlbex.S index 9ca1e3ff1ded..de19fa2d7f0d 100644 --- a/arch/loongarch/mm/tlbex.S +++ b/arch/loongarch/mm/tlbex.S @@ -18,7 +18,7 @@ REG_S a2, sp, PT_BVADDR li.w a1, \write la.abs t0, do_page_fault - jirl ra, t0, 0 + jirl ra, t0, 0 RESTORE_ALL_AND_RET SYM_FUNC_END(tlb_do_page_fault_\write) .endm @@ -34,7 +34,7 @@ SYM_FUNC_START(handle_tlb_protect) csrrd a2, LOONGARCH_CSR_BADV REG_S a2, sp, PT_BVADDR la.abs t0, do_page_fault - jirl ra, t0, 0 + jirl ra, t0, 0 RESTORE_ALL_AND_RET SYM_FUNC_END(handle_tlb_protect) @@ -151,8 +151,8 @@ tlb_huge_update_load: st.d t0, t1, 0 #endif addu16i.d t1, zero, -(CSR_TLBIDX_EHINV >> 16) - addi.d ra, t1, 0 - csrxchg ra, t1, LOONGARCH_CSR_TLBIDX + addi.d ra, t1, 0 + csrxchg ra, t1, LOONGARCH_CSR_TLBIDX tlbwr csrxchg zero, t1, LOONGARCH_CSR_TLBIDX @@ -319,8 +319,8 @@ tlb_huge_update_store: st.d t0, t1, 0 #endif addu16i.d t1, zero, -(CSR_TLBIDX_EHINV >> 16) - addi.d ra, t1, 0 - csrxchg ra, t1, LOONGARCH_CSR_TLBIDX + addi.d ra, t1, 0 + csrxchg ra, t1, LOONGARCH_CSR_TLBIDX tlbwr csrxchg zero, t1, LOONGARCH_CSR_TLBIDX @@ -454,7 +454,7 @@ leave_modify: ertn #ifdef CONFIG_64BIT vmalloc_modify: - la.abs t1, swapper_pg_dir + la.abs t1, swapper_pg_dir b vmalloc_done_modify #endif @@ -512,14 +512,14 @@ tlb_huge_update_modify: /* Set huge page tlb entry size */ addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) addu16i.d t1, zero, (PS_HUGE_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) - csrxchg t1, t0, LOONGARCH_CSR_TLBIDX + csrxchg t1, t0, LOONGARCH_CSR_TLBIDX tlbwr /* Reset default page size */ addu16i.d t0, zero, (CSR_TLBIDX_PS >> 16) addu16i.d t1, zero, (PS_DEFAULT_SIZE << (CSR_TLBIDX_PS_SHIFT - 16)) - csrxchg t1, t0, LOONGARCH_CSR_TLBIDX + csrxchg t1, t0, LOONGARCH_CSR_TLBIDX nopage_tlb_modify: dbar 0 From ab6e57a69df515cc9231b578de5b820f9ba3d0be Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Tue, 26 Jul 2022 23:57:15 +0800 Subject: 
[PATCH 108/121] LoongArch: Remove several syntactic sugar macros for branches These syntactic sugars have been supported by upstream binutils from the beginning, so no need to patch them locally. Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/asmmacro.h | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/arch/loongarch/include/asm/asmmacro.h b/arch/loongarch/include/asm/asmmacro.h index a1a04083bd67..be037a40580d 100644 --- a/arch/loongarch/include/asm/asmmacro.h +++ b/arch/loongarch/include/asm/asmmacro.h @@ -274,16 +274,4 @@ nor \dst, \src, zero .endm -.macro bgt r0 r1 label - blt \r1, \r0, \label -.endm - -.macro bltz r0 label - blt \r0, zero, \label -.endm - -.macro bgez r0 label - bge \r0, zero, \label -.endm - #endif /* _ASM_ASMMACRO_H */ From f62b7626cb79dfbfe292145b7ebeee4dc63c9499 Mon Sep 17 00:00:00 2001 From: Jun Yi Date: Thu, 21 Jul 2022 19:10:49 +0800 Subject: [PATCH 109/121] LoongArch: Remove useless header compiler.h The content of LoongArch's compiler.h is trivial, with some unused anywhere, so inline the definitions and remove the header. Signed-off-by: Jun Yi Signed-off-by: Huacai Chen --- arch/loongarch/Kconfig | 1 - arch/loongarch/include/asm/atomic.h | 13 ++++--------- arch/loongarch/include/asm/compiler.h | 15 --------------- arch/loongarch/include/asm/futex.h | 5 ++--- arch/loongarch/include/asm/irqflags.h | 1 - arch/loongarch/include/asm/local.h | 1 - arch/loongarch/kernel/reset.c | 1 - arch/loongarch/lib/delay.c | 1 - 8 files changed, 6 insertions(+), 32 deletions(-) delete mode 100644 arch/loongarch/include/asm/compiler.h diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index b57daee98b89..62b5b07fa4e1 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -69,7 +69,6 @@ config LOONGARCH select GENERIC_TIME_VSYSCALL select GPIOLIB select HAVE_ARCH_AUDITSYSCALL - select HAVE_ARCH_COMPILER_H select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_TRACEHOOK diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h index dc2ae4f22c8e..6b9aca9ab6e9 100644 --- a/arch/loongarch/include/asm/atomic.h +++ b/arch/loongarch/include/asm/atomic.h @@ -10,7 +10,6 @@ #include #include #include -#include #if __SIZEOF_LONG__ == 4 #define __LL "ll.w " @@ -163,8 +162,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB - : "=&r" (result), "=&r" (temp), - "+" GCC_OFF_SMALL_ASM() (v->counter) + : "=&r" (result), "=&r" (temp), "+ZC" (v->counter) : "I" (-i)); } else { __asm__ __volatile__( @@ -176,8 +174,7 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB - : "=&r" (result), "=&r" (temp), - "+" GCC_OFF_SMALL_ASM() (v->counter) + : "=&r" (result), "=&r" (temp), "+ZC" (v->counter) : "r" (i)); } @@ -326,8 +323,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB - : "=&r" (result), "=&r" (temp), - "+" GCC_OFF_SMALL_ASM() (v->counter) + : "=&r" (result), "=&r" (temp), "+ZC" (v->counter) : "I" (-i)); } else { __asm__ __volatile__( @@ -339,8 +335,7 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v) " beqz %1, 1b \n" "2: \n" __WEAK_LLSC_MB - : "=&r" (result), "=&r" (temp), - "+" GCC_OFF_SMALL_ASM() (v->counter) + : "=&r" (result), "=&r" (temp), "+ZC" (v->counter) : "r" (i)); } diff --git a/arch/loongarch/include/asm/compiler.h 
b/arch/loongarch/include/asm/compiler.h deleted file mode 100644 index 657cebe70ace..000000000000 --- a/arch/loongarch/include/asm/compiler.h +++ /dev/null @@ -1,15 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright (C) 2020-2022 Loongson Technology Corporation Limited - */ -#ifndef _ASM_COMPILER_H -#define _ASM_COMPILER_H - -#define GCC_OFF_SMALL_ASM() "ZC" - -#define LOONGARCH_ISA_LEVEL "loongarch" -#define LOONGARCH_ISA_ARCH_LEVEL "arch=loongarch" -#define LOONGARCH_ISA_LEVEL_RAW loongarch -#define LOONGARCH_ISA_ARCH_LEVEL_RAW LOONGARCH_ISA_LEVEL_RAW - -#endif /* _ASM_COMPILER_H */ diff --git a/arch/loongarch/include/asm/futex.h b/arch/loongarch/include/asm/futex.h index 837659335fb1..feb6658c84ff 100644 --- a/arch/loongarch/include/asm/futex.h +++ b/arch/loongarch/include/asm/futex.h @@ -8,7 +8,6 @@ #include #include #include -#include #include #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ @@ -95,8 +94,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newv " "__UA_ADDR "\t1b, 4b \n" " "__UA_ADDR "\t2b, 4b \n" " .previous \n" - : "+r" (ret), "=&r" (val), "=" GCC_OFF_SMALL_ASM() (*uaddr) - : GCC_OFF_SMALL_ASM() (*uaddr), "Jr" (oldval), "Jr" (newval), + : "+r" (ret), "=&r" (val), "=ZC" (*uaddr) + : "ZC" (*uaddr), "Jr" (oldval), "Jr" (newval), "i" (-EFAULT) : "memory", "t0"); diff --git a/arch/loongarch/include/asm/irqflags.h b/arch/loongarch/include/asm/irqflags.h index 52121cd791fe..319a8c616f1f 100644 --- a/arch/loongarch/include/asm/irqflags.h +++ b/arch/loongarch/include/asm/irqflags.h @@ -9,7 +9,6 @@ #include #include -#include #include static inline void arch_local_irq_enable(void) diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h index 2052a2267337..65fbbae9fc4d 100644 --- a/arch/loongarch/include/asm/local.h +++ b/arch/loongarch/include/asm/local.h @@ -9,7 +9,6 @@ #include #include #include -#include typedef struct { atomic_long_t a; diff --git a/arch/loongarch/kernel/reset.c b/arch/loongarch/kernel/reset.c index 2b86469e4718..800c965a17ea 100644 --- a/arch/loongarch/kernel/reset.c +++ b/arch/loongarch/kernel/reset.c @@ -13,7 +13,6 @@ #include #include -#include #include #include #include diff --git a/arch/loongarch/lib/delay.c b/arch/loongarch/lib/delay.c index 5d856694fcfe..831d4761f385 100644 --- a/arch/loongarch/lib/delay.c +++ b/arch/loongarch/lib/delay.c @@ -7,7 +7,6 @@ #include #include -#include #include void __delay(unsigned long cycles) From 71610ab1d017e131a9888ef8acd035284fb0e1dd Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Wed, 20 Jul 2022 15:21:51 +0800 Subject: [PATCH 110/121] LoongArch: Remove clock setting during cpu hotplug stage On physical machine we can save power by disabling clock of hot removed cpu. However as different platforms require different methods to configure clocks, the code is platform-specific, and probably belongs to firmware/pmu or cpu regulator, rather than generic arch/loongarch code. Also, there is no such register on QEMU virt machine since the clock/frequency regulation is not emulated. This patch removes the hard-coded clock register accesses in generic LoongArch cpu hotplug flow. 
Reviewed-by: WANG Xuerui Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kernel/smp.c | 113 +++++------------------------------- include/linux/cpuhotplug.h | 1 - 2 files changed, 13 insertions(+), 101 deletions(-) diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c index 73cec62504fb..09743103d9b3 100644 --- a/arch/loongarch/kernel/smp.c +++ b/arch/loongarch/kernel/smp.c @@ -278,116 +278,29 @@ void loongson3_cpu_die(unsigned int cpu) mb(); } -/* - * The target CPU should go to XKPRANGE (uncached area) and flush - * ICache/DCache/VCache before the control CPU can safely disable its clock. - */ -static void loongson3_play_dead(int *state_addr) +void play_dead(void) { - register int val; - register void *addr; + register uint64_t addr; register void (*init_fn)(void); - __asm__ __volatile__( - " li.d %[addr], 0x8000000000000000\n" - "1: cacop 0x8, %[addr], 0 \n" /* flush ICache */ - " cacop 0x8, %[addr], 1 \n" - " cacop 0x8, %[addr], 2 \n" - " cacop 0x8, %[addr], 3 \n" - " cacop 0x9, %[addr], 0 \n" /* flush DCache */ - " cacop 0x9, %[addr], 1 \n" - " cacop 0x9, %[addr], 2 \n" - " cacop 0x9, %[addr], 3 \n" - " addi.w %[sets], %[sets], -1 \n" - " addi.d %[addr], %[addr], 0x40 \n" - " bnez %[sets], 1b \n" - " li.d %[addr], 0x8000000000000000\n" - "2: cacop 0xa, %[addr], 0 \n" /* flush VCache */ - " cacop 0xa, %[addr], 1 \n" - " cacop 0xa, %[addr], 2 \n" - " cacop 0xa, %[addr], 3 \n" - " cacop 0xa, %[addr], 4 \n" - " cacop 0xa, %[addr], 5 \n" - " cacop 0xa, %[addr], 6 \n" - " cacop 0xa, %[addr], 7 \n" - " cacop 0xa, %[addr], 8 \n" - " cacop 0xa, %[addr], 9 \n" - " cacop 0xa, %[addr], 10 \n" - " cacop 0xa, %[addr], 11 \n" - " cacop 0xa, %[addr], 12 \n" - " cacop 0xa, %[addr], 13 \n" - " cacop 0xa, %[addr], 14 \n" - " cacop 0xa, %[addr], 15 \n" - " addi.w %[vsets], %[vsets], -1 \n" - " addi.d %[addr], %[addr], 0x40 \n" - " bnez %[vsets], 2b \n" - " li.w %[val], 0x7 \n" /* *state_addr = CPU_DEAD; */ - " st.w %[val], %[state_addr], 0 \n" - " dbar 0 \n" - " cacop 0x11, %[state_addr], 0 \n" /* flush entry of *state_addr */ - : [addr] "=&r" (addr), [val] "=&r" (val) - : [state_addr] "r" (state_addr), - [sets] "r" (cpu_data[smp_processor_id()].dcache.sets), - [vsets] "r" (cpu_data[smp_processor_id()].vcache.sets)); - + idle_task_exit(); local_irq_enable(); - change_csr_ecfg(ECFG0_IM, ECFGF_IPI); + set_csr_ecfg(ECFGF_IPI); + __this_cpu_write(cpu_state, CPU_DEAD); - __asm__ __volatile__( - " idle 0 \n" - " li.w $t0, 0x1020 \n" - " iocsrrd.d %[init_fn], $t0 \n" /* Get init PC */ - : [init_fn] "=&r" (addr) - : /* No Input */ - : "a0"); - init_fn = __va(addr); + __smp_mb(); + do { + __asm__ __volatile__("idle 0\n\t"); + addr = iocsr_read64(LOONGARCH_IOCSR_MBUF0); + } while (addr == 0); + + init_fn = (void *)TO_CACHE(addr); + iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_CLEAR); init_fn(); unreachable(); } -void play_dead(void) -{ - int *state_addr; - unsigned int cpu = smp_processor_id(); - void (*play_dead_uncached)(int *s); - - idle_task_exit(); - play_dead_uncached = (void *)TO_UNCACHE(__pa((unsigned long)loongson3_play_dead)); - state_addr = &per_cpu(cpu_state, cpu); - mb(); - play_dead_uncached(state_addr); -} - -static int loongson3_enable_clock(unsigned int cpu) -{ - uint64_t core_id = cpu_data[cpu].core; - uint64_t package_id = cpu_data[cpu].package; - - LOONGSON_FREQCTRL(package_id) |= 1 << (core_id * 4 + 3); - - return 0; -} - -static int loongson3_disable_clock(unsigned int cpu) -{ - uint64_t core_id = cpu_data[cpu].core; - uint64_t package_id = 
cpu_data[cpu].package; - - LOONGSON_FREQCTRL(package_id) &= ~(1 << (core_id * 4 + 3)); - - return 0; -} - -static int register_loongson3_notifier(void) -{ - return cpuhp_setup_state_nocalls(CPUHP_LOONGARCH_SOC_PREPARE, - "loongarch/loongson:prepare", - loongson3_enable_clock, - loongson3_disable_clock); -} -early_initcall(register_loongson3_notifier); - #endif /* diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 19f0dbfdd7fe..b66c5f389159 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -130,7 +130,6 @@ enum cpuhp_state { CPUHP_ZCOMP_PREPARE, CPUHP_TIMERS_PREPARE, CPUHP_MIPS_SOC_PREPARE, - CPUHP_LOONGARCH_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20, CPUHP_BRINGUP_CPU, From 3a3a4f7a65e3ff7ad395afc8c41ac317c8667546 Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Wed, 20 Jul 2022 15:21:52 +0800 Subject: [PATCH 111/121] LoongArch: Remove unused variables There are some variables that are never used or referenced; this patch removes these variables and makes the code cleaner. Reviewed-by: WANG Xuerui Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/loongson.h | 12 ------------ arch/loongarch/kernel/env.c | 20 -------------------- 2 files changed, 32 deletions(-) diff --git a/arch/loongarch/include/asm/loongson.h b/arch/loongarch/include/asm/loongson.h index 8522afafc24e..6e8f6972ceb6 100644 --- a/arch/loongarch/include/asm/loongson.h +++ b/arch/loongarch/include/asm/loongson.h @@ -39,18 +39,6 @@ extern const struct plat_smp_ops loongson3_smp_ops; #define MAX_PACKAGES 16 -/* Chip Config register of each physical cpu package */ -extern u64 loongson_chipcfg[MAX_PACKAGES]; -#define LOONGSON_CHIPCFG(id) (*(volatile u32 *)(loongson_chipcfg[id])) - -/* Chip Temperature register of each physical cpu package */ -extern u64 loongson_chiptemp[MAX_PACKAGES]; -#define LOONGSON_CHIPTEMP(id) (*(volatile u32 *)(loongson_chiptemp[id])) - -/* Freq Control register of each physical cpu package */ -extern u64 loongson_freqctrl[MAX_PACKAGES]; -#define LOONGSON_FREQCTRL(id) (*(volatile u32 *)(loongson_freqctrl[id])) - #define xconf_readl(addr) readl(addr) #define xconf_readq(addr) readq(addr) diff --git a/arch/loongarch/kernel/env.c b/arch/loongarch/kernel/env.c index 467946ecf451..82b478a5c665 100644 --- a/arch/loongarch/kernel/env.c +++ b/arch/loongarch/kernel/env.c @@ -17,21 +17,6 @@ u64 efi_system_table; struct loongson_system_configuration loongson_sysconf; EXPORT_SYMBOL(loongson_sysconf); -u64 loongson_chipcfg[MAX_PACKAGES]; -u64 loongson_chiptemp[MAX_PACKAGES]; -u64 loongson_freqctrl[MAX_PACKAGES]; -unsigned long long smp_group[MAX_PACKAGES]; - -static void __init register_addrs_set(u64 *registers, const u64 addr, int num) -{ - u64 i; - - for (i = 0; i < num; i++) { - *registers = (i << 44) | addr; - registers++; - } -} - void __init init_environ(void) { int efi_boot = fw_arg0; @@ -50,11 +35,6 @@ void __init init_environ(void) efi_memmap_init_early(&data); memblock_reserve(data.phys_map & PAGE_MASK, PAGE_ALIGN(data.size + (data.phys_map & ~PAGE_MASK))); - - register_addrs_set(smp_group, TO_UNCACHE(0x1fe01000), 16); - register_addrs_set(loongson_chipcfg, TO_UNCACHE(0x1fe00180), 16); - register_addrs_set(loongson_chiptemp, TO_UNCACHE(0x1fe0019c), 16); - register_addrs_set(loongson_freqctrl, TO_UNCACHE(0x1fe001d0), 16); } static int __init init_cpu_fullname(void) From 317980e6b4d03884429f2cdaf51efd28f01b71b0 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 26 Jul 2022 20:43:11 +0800 Subject:
 [PATCH 112/121] LoongArch: Disable executable stack by default

Disable executable stack for LoongArch by default, as all modern
architectures do.

Reported-by: Andreas Schwab
Suggested-by: WANG Xuerui
Link: https://sourceware.org/pipermail/binutils/2022-July/121992.html
Tested-by: WANG Xuerui
Tested-by: Xi Ruoyao
Signed-off-by: Huacai Chen
---
 arch/loongarch/include/asm/elf.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
index f3960b18a90e..5f3ff4781fda 100644
--- a/arch/loongarch/include/asm/elf.h
+++ b/arch/loongarch/include/asm/elf.h
@@ -288,8 +288,6 @@ struct arch_elf_state {
 .interp_fp_abi = LOONGARCH_ABI_FP_ANY, \
 }
-#define elf_read_implies_exec(ex, exec_stk) (exec_stk == EXSTACK_DEFAULT)
-
 extern int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *elf,
 bool is_interp, struct arch_elf_state *state);

From 1aea29d7c3569e5b6c40e73c51e9f4b2142c96ef Mon Sep 17 00:00:00 2001
From: Huacai Chen
Date: Wed, 13 Jul 2022 18:00:41 +0800
Subject: [PATCH 113/121] LoongArch: Fix shared cache size calculation

The current calculation of the shared cache size is scoped to a node
(die), but we want 'lscpu' to show the shared cache size of the whole
package for multi-die chips (e.g., Loongson-3C5000L, which contains 4
dies in one package). So fix it by multiplying by nodes_per_package.

Signed-off-by: Huacai Chen
---
 arch/loongarch/kernel/cacheinfo.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/loongarch/kernel/cacheinfo.c b/arch/loongarch/kernel/cacheinfo.c
index b38f5489d094..4662b06269f4 100644
--- a/arch/loongarch/kernel/cacheinfo.c
+++ b/arch/loongarch/kernel/cacheinfo.c
@@ -4,8 +4,9 @@
 *
 * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
 */
-#include
 #include
+#include
+#include
 /* Populates leaf and increments to next leaf */
 #define populate_cache(cache, leaf, c_level, c_type) \
@@ -17,6 +18,8 @@ do { \
 leaf->ways_of_associativity = c->cache.ways; \
 leaf->size = c->cache.linesz * c->cache.sets * \
 c->cache.ways; \
+ if (leaf->level > 2) \
+ leaf->size *= nodes_per_package; \
 leaf++; \
 } while (0)
@@ -95,11 +98,15 @@ static void cache_cpumap_setup(unsigned int cpu)
 int populate_cache_leaves(unsigned int cpu)
 {
- int level = 1;
+ int level = 1, nodes_per_package = 1;
 struct cpuinfo_loongarch *c = &current_cpu_data;
 struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 struct cacheinfo *this_leaf = this_cpu_ci->info_list;
+ if (loongson_sysconf.nr_nodes > 1)
+ nodes_per_package = loongson_sysconf.cores_per_package
+ / loongson_sysconf.cores_per_node;
+
 if (c->icache.waysize) {
 populate_cache(dcache, this_leaf, level, CACHE_TYPE_DATA);
 populate_cache(icache, this_leaf, level++, CACHE_TYPE_INST);

From b0f3bdc00240fc9d7bf0f2a076943122d168c95e Mon Sep 17 00:00:00 2001
From: Qi Hu
Date: Thu, 14 Jul 2022 14:25:50 +0800
Subject: [PATCH 114/121] LoongArch: Fix missing fcsr in ptrace's fpr_set

In ptrace.c, the function fpr_set does not copy the fcsr data from ubuf
to kbuf. That is why the fcsr cannot be modified by ptrace. This patch
fixes the problem and allows users to modify the fcsr via ptrace.
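
As a rough userspace illustration of what this enables (an editor's
sketch, not part of the patch): a tracer can rewrite fcsr through the
NT_PRFPREG regset once the missing copy is in place. The struct layout
below is an assumption made for the example, mirroring the order
described above (32 FP registers, then fcc, then fcsr), and the tracee
is assumed to be ptrace-stopped.

  /* Hedged sketch: poke the fcsr of a stopped tracee via PTRACE_SETREGSET. */
  #include <elf.h>
  #include <stdint.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  struct la_fpregs {                /* assumed layout for illustration */
          uint64_t fpr[32];
          uint64_t fcc;
          uint32_t fcsr;
  };

  static int set_fcsr(pid_t pid, uint32_t new_fcsr)
  {
          struct la_fpregs fp;
          struct iovec iov = { .iov_base = &fp, .iov_len = sizeof(fp) };

          if (ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov) < 0)
                  return -1;
          fp.fcsr = new_fcsr;       /* only takes effect with this fix */
          return ptrace(PTRACE_SETREGSET, pid, NT_PRFPREG, &iov) < 0 ? -1 : 0;
  }

Before the fix, the PTRACE_SETREGSET call above would silently leave
fcsr unchanged in the tracee.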
Co-developed-by: Xu Li
Signed-off-by: Qi Hu
Signed-off-by: Huacai Chen
---
 arch/loongarch/kernel/ptrace.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index e6ab87948e1d..dc2b82ea894c 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -193,7 +193,7 @@ static int fpr_set(struct task_struct *target,
 const void *kbuf, const void __user *ubuf)
 {
 const int fcc_start = NUM_FPU_REGS * sizeof(elf_fpreg_t);
- const int fcc_end = fcc_start + sizeof(u64);
+ const int fcsr_start = fcc_start + sizeof(u64);
 int err;
 BUG_ON(count % sizeof(elf_fpreg_t));
@@ -209,10 +209,12 @@ static int fpr_set(struct task_struct *target,
 if (err)
 return err;
- if (count > 0)
- err |= user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu.fcc,
- fcc_start, fcc_end);
+ err |= user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpu.fcc, fcc_start,
+ fcc_start + sizeof(u64));
+ err |= user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpu.fcsr, fcsr_start,
+ fcsr_start + sizeof(u32));
 return err;
 }

From 45b53c9051770c0d9145083a328548745ee2e75b Mon Sep 17 00:00:00 2001
From: Tiezhu Yang
Date: Thu, 21 Jul 2022 17:53:01 +0800
Subject: [PATCH 115/121] LoongArch: Fix wrong "ROM Size" of boardinfo

We can see that the "ROM Size" differs between the following outputs:

  [root@linux loongson]# cat /sys/firmware/loongson/boardinfo
  BIOS Information
  Vendor : Loongson
  Version : vUDK2018-LoongArch-V2.0.pre-beta8
  ROM Size : 63 KB
  Release Date : 06/15/2022

  Board Information
  Manufacturer : Loongson
  Board Name : Loongson-LS3A5000-7A1000-1w-A2101
  Family : LOONGSON64

  [root@linux loongson]# dmidecode | head -11
  ...
  Handle 0x0000, DMI type 0, 26 bytes
  BIOS Information
  Vendor: Loongson
  Version: vUDK2018-LoongArch-V2.0.pre-beta8
  Release Date: 06/15/2022
  ROM Size: 4 MB

According to the "BIOS Information (Type 0) structure" in the SMBIOS
Reference Specification [1], 64K * (n+1) is the size of the physical
device containing the BIOS when the size is less than 16M.

Additionally, we can see the related code in dmidecode [2]:

  u64 s = { .l = (code1 + 1) << 6 };

So the output of dmidecode is correct and the output of boardinfo is
wrong; fix it.

By the way, there is currently no need to handle sizes of 16M or
greater on LoongArch, because the ROM is usually 4M or 8M, which is
enough.
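
To make the conversion concrete, here is a small stand-alone sketch
(editor's illustration, not from the patch) of the 64K * (n+1) rule for
sizes below 16M, as described in the specification cited in [1] below;
with the raw byte 63 read from the table above it yields the 4 MB that
dmidecode reports.

  #include <stdio.h>

  int main(void)
  {
          unsigned int n = 63;                  /* raw SMBIOS "ROM Size" byte */
          unsigned int size_kb = (n + 1) << 6;  /* 64K * (n + 1), expressed in KB */

          printf("ROM Size: %u KB (%u MB)\n", size_kb, size_kb >> 10);
          /* prints "ROM Size: 4096 KB (4 MB)", matching dmidecode */
          return 0;
  }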
[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.6.0.pdf
[2] https://git.savannah.nongnu.org/cgit/dmidecode.git/tree/dmidecode.c#n347

Fixes: 628c3bb40e9a ("LoongArch: Add boot and setup routines")
Reviewed-by: WANG Xuerui
Signed-off-by: Tiezhu Yang
Signed-off-by: Huacai Chen
---
 arch/loongarch/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index c74860b53375..8f5c2f9a1a83 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -126,7 +126,7 @@ static void __init parse_bios_table(const struct dmi_header *dm)
 char *dmi_data = (char *)dm;
 bios_extern = *(dmi_data + SMBIOS_BIOSEXTERN_OFFSET);
- b_info.bios_size = *(dmi_data + SMBIOS_BIOSSIZE_OFFSET);
+ b_info.bios_size = (*(dmi_data + SMBIOS_BIOSSIZE_OFFSET) + 1) << 6;
 if (bios_extern & LOONGSON_EFI_ENABLE)
 set_bit(EFI_BOOT, &efi.flags);

From 46a4d679ef88285ea17c3e1e4fed330be2044f21 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan
Date: Fri, 29 Jul 2022 17:44:38 +0800
Subject: [PATCH 116/121] workqueue: Avoid a false warning in unbind_workers()

Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can possibly fail
and trigger a false warning. Use cpu_possible_mask instead when
wq_unbound_cpumask has no active CPUs.

It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.

Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..aa8a82bc6738 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5001,7 +5001,10 @@ static void unbind_workers(int cpu)
 for_each_pool_worker(worker, pool) {
 kthread_set_per_cpu(worker->task, -1);
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
+ if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
+ else
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
 }
 mutex_unlock(&wq_pool_attach_mutex);

From 9282012fc0aa248b77a69f5eb802b67c5a16bb13 Mon Sep 17 00:00:00 2001
From: Jaewon Kim
Date: Mon, 25 Jul 2022 18:52:12 +0900
Subject: [PATCH 117/121] page_alloc: fix invalid watermark check on a negative value

There was a report that a task was stuck waiting at
throttle_direct_reclaim, and pgscan_direct_throttle in vmstat kept
increasing. This is a bug where zone_watermark_fast returns true even
when free pages are very low.

The commit f27ce0e14088 ("page_alloc: consider highatomic reserve in
watermark fast") changed the watermark fast path to consider the
highatomic reserve. But it did not handle the negative value case,
which can happen when the reserved_highatomic pageblock is bigger than
the actual free pages.

If the watermark is considered OK for a negative value, order-0
allocating contexts will consume all free pages without direct reclaim,
and eventually free pages may become depleted except for the highatomic
reserve. Then allocating contexts may fall into throttle_direct_reclaim.
This symptom may easily happen on a system where the min watermark is
low and other reclaimers such as kswapd do not make free pages
available quickly.

Handle the negative case by using min().
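
A rough sketch of the arithmetic (editor's illustration with made-up
numbers, assuming the watermark side of the comparison is unsigned as
in zone_watermark_fast()): when the reserve estimate exceeds the actual
free pages, the signed difference goes negative, is promoted to a huge
unsigned value in the comparison, and the old fast check wrongly
passes; clamping with min() keeps the usable estimate at zero so the
check fails and the slow path runs.

  #include <stdio.h>

  int main(void)
  {
          long free_pages = 1000;
          long reserved = 1500;        /* over-estimated high-atomic reserve */
          unsigned long mark = 2000;   /* watermark; unsigned as in the kernel */

          long fast_free = free_pages - reserved;   /* -500 */
          /* -500 is converted to a huge unsigned value in the signed/unsigned
           * comparison below, so the old-style check wrongly prints 1.
           */
          printf("old check: %d\n", fast_free > mark);

          long usable_free = free_pages;
          usable_free -= (usable_free < reserved) ? usable_free : reserved;
          printf("new check: %d\n", usable_free > mark);   /* prints 0 */
          return 0;
  }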
Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim
Reported-by: GyeongHwan Hong
Acked-by: Mel Gorman
Cc: Minchan Kim
Cc: Baoquan He
Cc: Vlastimil Babka
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Yong-Taek Lee
Cc:
Signed-off-by: Andrew Morton
---
 mm/page_alloc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..b5b14b78c4fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3968,11 +3968,15 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 * need to be calculated.
 */
 if (!order) {
- long fast_free;
+ long usable_free;
+ long reserved;
- fast_free = free_pages;
- fast_free -= __zone_watermark_unusable_free(z, 0, alloc_flags);
- if (fast_free > mark + z->lowmem_reserve[highest_zoneidx])
+ usable_free = free_pages;
+ reserved = __zone_watermark_unusable_free(z, 0, alloc_flags);
+
+ /* reserved may over estimate high-atomic reserves. */
+ usable_free -= min(usable_free, reserved);
+ if (usable_free > mark + z->lowmem_reserve[highest_zoneidx])
 return true;
 }

From 8a295dbbaf7292c582a40ce469c326f472d51f66 Mon Sep 17 00:00:00 2001
From: Ralph Campbell
Date: Mon, 25 Jul 2022 11:36:14 -0700
Subject: [PATCH 118/121] mm/hmm: fault non-owner device private entries

If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner field is
used to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram()
to fault in the page.

For example, if a page is migrated to GPU private memory and an RDMA
fault capable NIC tries to read the migrated page, without this patch
it will get an error. With this patch, the page will be migrated back
to system memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell
Reported-by: Felix Kuehling
Reviewed-by: Alistair Popple
Cc: Philip Yang
Cc: Jason Gunthorpe
Cc:
Signed-off-by: Andrew Morton
---
 mm/hmm.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 3fd3242c5e50..f2aa63b94d9b 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -212,14 +212,6 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
 unsigned long end, unsigned long hmm_pfns[], pmd_t pmd);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline bool hmm_is_device_private_entry(struct hmm_range *range,
- swp_entry_t entry)
-{
- return is_device_private_entry(entry) &&
- pfn_swap_entry_to_page(entry)->pgmap->owner ==
- range->dev_private_owner;
-}
-
 static inline unsigned long pte_to_hmm_pfn_flags(struct hmm_range *range,
 pte_t pte)
 {
@@ -252,10 +244,12 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 swp_entry_t entry = pte_to_swp_entry(pte);
 /*
- * Never fault in device private pages, but just report
- * the PFN even if not present.
+ * Don't fault in device private pages owned by the caller,
+ * just report the PFN.
 */
- if (hmm_is_device_private_entry(range, entry)) {
+ if (is_device_private_entry(entry) &&
+ pfn_swap_entry_to_page(entry)->pgmap->owner ==
+ range->dev_private_owner) {
 cpu_flags = HMM_PFN_VALID;
 if (is_writable_device_private_entry(entry))
 cpu_flags |= HMM_PFN_WRITE;
@@ -273,6 +267,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 if (!non_swap_entry(entry))
 goto fault;
+ if (is_device_private_entry(entry))
+ goto fault;
+
 if (is_device_exclusive_entry(entry))
 goto fault;

From ea304a8b89fd0d6cf94ee30cb139dc23d9f1a62f Mon Sep 17 00:00:00 2001
From: Eiichi Tsukata
Date: Thu, 28 Jul 2022 04:39:07 +0000
Subject: [PATCH 119/121] docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed

Update the descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.

Signed-off-by: Eiichi Tsukata
Signed-off-by: Borislav Petkov
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
---
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c0fdb04a0435..cc3ea8febc62 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3176,6 +3176,7 @@
 no_entry_flush [PPC]
 no_uaccess_flush [PPC]
 mmio_stale_data=off [X86]
+ retbleed=off [X86]

 Exceptions:
 This does not have any effect on
@@ -3198,6 +3199,7 @@
 mds=full,nosmt [X86]
 tsx_async_abort=full,nosmt [X86]
 mmio_stale_data=full,nosmt [X86]
+ retbleed=auto,nosmt [X86]

 mminit_loglevel= [KNL]
 When CONFIG_DEBUG_MEMORY_INIT is set, this

From 6eebd5fb20838f5971ba17df9f55cc4f84a31053 Mon Sep 17 00:00:00 2001
From: Waiman Long
Date: Wed, 22 Jun 2022 16:04:19 -0400
Subject: [PATCH 120/121] locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter

With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling
more consistent"), the writer that sets the handoff bit can be
interrupted out without clearing the bit if the wait queue isn't empty.
This disables reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This is not the
case before commit d257cc8cb8d5 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlocked state for a longer
time.

In some cases, this new behavior may cause lockups as shown in [1] and
[2].

This patch allows a non-first writer to ignore the handoff bit if it is
not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].
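
A simplified model of the new decision rule (editor's sketch with
invented names, not the kernel implementation): with the handoff bit
set, a slowpath writer now backs off only when the current first waiter
is the one that initiated the handoff, which restores lock stealing in
the problematic cases described above.

  #include <stdbool.h>

  struct waiter {
          bool handoff_set;     /* this waiter initiated the handoff */
  };

  /* Yield to the handoff only when the current first waiter set it;
   * any other combination keeps trying to take the lock.
   */
  static bool must_yield(bool handoff_bit_set, const struct waiter *first,
                         const struct waiter *me)
  {
          if (!handoff_bit_set)
                  return false;
          return first->handoff_set && me != first;
  }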
[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: John Donnelly
Tested-by: Mel Gorman
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
---
 kernel/locking/rwsem.c | 30 ++++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 9d1db4a54d34..65f0262f635e 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -335,8 +335,6 @@ struct rwsem_waiter {
 struct task_struct *task;
 enum rwsem_waiter_type type;
 unsigned long timeout;
-
- /* Writer only, not initialized in reader */
 bool handoff_set;
 };
 #define rwsem_first_waiter(sem) \
@@ -459,10 +457,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 * to give up the lock), request a HANDOFF to
 * force the issue.
 */
- if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
- time_after(jiffies, waiter->timeout)) {
- adjustment -= RWSEM_FLAG_HANDOFF;
- lockevent_inc(rwsem_rlock_handoff);
+ if (time_after(jiffies, waiter->timeout)) {
+ if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+ adjustment -= RWSEM_FLAG_HANDOFF;
+ lockevent_inc(rwsem_rlock_handoff);
+ }
+ waiter->handoff_set = true;
 }
 atomic_long_add(-adjustment, &sem->count);
@@ -599,7 +599,7 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 struct rwsem_waiter *waiter)
 {
- bool first = rwsem_first_waiter(sem) == waiter;
+ struct rwsem_waiter *first = rwsem_first_waiter(sem);
 long count, new;
 lockdep_assert_held(&sem->wait_lock);
@@ -609,11 +609,20 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 if (has_handoff) {
- if (!first)
+ /*
+ * Honor handoff bit and yield only when the first
+ * waiter is the one that set it. Otherwisee, we
+ * still try to acquire the rwsem.
+ */
+ if (first->handoff_set && (waiter != first))
 return false;
- /* First waiter inherits a previously set handoff bit */
- waiter->handoff_set = true;
+ /*
+ * First waiter can inherit a previously set handoff
+ * bit and spin on rwsem if lock acquisition fails.
+ */
+ if (waiter == first)
+ waiter->handoff_set = true;
 }
 new = count;
@@ -1027,6 +1036,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 waiter.task = current;
 waiter.type = RWSEM_WAITING_FOR_READ;
 waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+ waiter.handoff_set = false;
 raw_spin_lock_irq(&sem->wait_lock);
 if (list_empty(&sem->wait_list)) {

From 3d7cb6b04c3f3115719235cc6866b10326de34cd Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Sun, 31 Jul 2022 14:03:01 -0700
Subject: [PATCH 121/121] Linux 5.19
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index b79c1c18149d..df92892325ae 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 VERSION = 5
 PATCHLEVEL = 19
 SUBLEVEL = 0
-EXTRAVERSION = -rc8
+EXTRAVERSION =
 NAME = Superb Owl

 # *DOCUMENTATION*