Commit graph

1030219 commits

Author SHA1 Message Date
Nikolay Aleksandrov
58d913a326 net: bridge: multicast: add context support for host-joined groups
Adding bridge multicast context support for host-joined groups is easy
because we only need the proper timer value. We pass the already chosen
context and use its timer value.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:34:47 -07:00
Nikolay Aleksandrov
6567cb438a net: bridge: multicast: add mdb context support
Choose the proper bridge multicast context when user-spaces is adding
mdb entries. Currently we require the vlan to be configured on at least
one device (port or bridge) in order to add an mdb entry if vlan
mcast snooping is enabled (vlan snooping implies vlan filtering).
Note that we always allow deleting an entry, regardless of the vlan state.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:34:47 -07:00
Joakim Zhang
dabb5db17c ARM: dts: imx6qdl: move phy properties into phy device node
This patch fixes issues found by dtbs_check:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml

According to the Micrel PHY dt-binding:
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt,
Add clock delay in an Ethernet OF device node is deprecated, so move
these properties to PHY OF device node.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:31:22 -07:00
Joakim Zhang
649502a337 dt-bindings: net: fsl,fec: improve the binding a bit
This patch improves the yaml a bit according to Rob Herring comments:
1) normalize interrupt-names property, there is no reason to support
random order.
2) validate each string in clock-names property.
3) add constraints for fsl,num-tx-queues/fsl,num-rx-queues property.
4) change additionalProperties to false in order to do strict checking.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:31:22 -07:00
Xin Long
02dc2ee7c7 sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set
Currently, in sctp_packet_config(), sctp_transport_pmtu_check() is
called to update transport pathmtu with dst's mtu when dst's mtu
has been changed by non sctp stack like xfrm.

However, this should only happen when SPP_PMTUD_ENABLE is set, no
matter where dst's mtu changed. This patch is to fix by checking
SPP_PMTUD_ENABLE flag before calling sctp_transport_pmtu_check().

Thanks Jacek for reporting and looking into this issue.

v1->v2:
  - add the missing "{" to fix the build error.

Fixes: 69fec325a6 ('Revert "sctp: remove sctp_transport_pmtu_check"')
Reported-by: Jacek Szafraniec <jacek.szafraniec@nokia.com>
Tested-by: Jacek Szafraniec <jacek.szafraniec@nokia.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:17:58 -07:00
Linus Torvalds
3d5895cd35 s390 updates for 5.14-rc3
- fix / add expoline usage in "DMA" code
 
 - fix compat vdso Makefile to avoid permanent rebuild
 
 - fix ftrace_update_ftrace_func to avoid NULL pointer dereference
 
 - update defconfigs
 
 - trivial coding style fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmD4aBQACgkQIg7DeRsp
 bsJu3Q/9F5roAWqs+L6Us9tCd+rYvUsJYBhYeI7eQDM9eyb860IgJgCSmxj45WGH
 798C0yO01u7DkzN21XOfl8GnnGYSEX7yN1IZkXm7k8rDauxqbk0tc/XsPRJM9wOT
 WckqSuKJrgZYxnEuiMQQgxv3aBYsQBWAG3z1spDE/PlUyfcBKTYrUrR55t8WZgQp
 UPE21ArH8wzx2sycxet5+m3UwttbMxdhN0VY9yEg1tFesVKX0CcjpwhGcX1DUPY1
 +lJRCEjEvryJWg8xBdPXXBeUjV2U5h8035gi2um75IGe5LL+J3uwQlmkGgUXa635
 /oWJ6ST1F7TyJ2wM9Nmcn+MjvwMHs+Sd8vfSgpX040GidNGMlZYL/1494zxQ91XS
 h51JB1b9m/8kP+RAtoYKEpK0k4ELWXWaI5zAh9gsKGisfH25pMOSxsuQIBpTijRu
 o9VDUtRXHkmqoQ3wurxrFzWZLVDneBGv5EOK7SxoKGdkpg1PDqkO+ZLarppMsx9g
 IcLmQwV3mQYQrBttAa061UkgvDxUFhrfUANJraEhI9283aEc8tzc+dRJkNJu48kA
 kHWHhgKqVysKiyppQO2XbidRMp2vVlVHuvB8e0sMbDsC7a7EH5LW2pRq/B+RCB+1
 797j5B9EczsneISU2YNL10JWPMfHeZ8BYnkA5CWRqAbcYdDBAAI=
 =E8AN
 -----END PGP SIGNATURE-----

Merge tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - fix / add expoline usage in "DMA" code

 - fix compat vdso Makefile to avoid permanent rebuild

 - fix ftrace_update_ftrace_func to avoid NULL pointer dereference

 - update defconfigs

 - trivial coding style fix

* tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  s390/cpumf: fix semicolon.cocci warnings
  s390/boot: fix use of expolines in the DMA code
  s390/ftrace: fix ftrace_update_ftrace_func implementation
  s390/defconfig: allow early device mapper disks
  s390/vdso32: add vdso32.lds to targets
2021-07-21 13:51:26 -07:00
Linus Torvalds
7b6ae471e5 spi: Fixes for v5.14
A collection of driver specific fixes, there was a bit of a kerfuffle
 with some last minute review on hte spi-cadence-quadspi division by zero
 change but otherwise nothing terribly remarkable here - important fixes
 if you have the hardware but nothing with too wide an impact.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmD4WjwACgkQJNaLcl1U
 h9CofQf/exao6MVAh2aMFv4A0UjQ4jI/wOWzhDY84ooxtTpcSlmAcPNR/yXt7yvc
 2s7OcDOSRL8uO9agrnLINDzRrB/+Z8N7ra3sUwzzkNEe6YoOcLYW+GvFSYbyqqPQ
 w8Ij6xn05RINQ63WuwwNHNwxlNBcXAT/bkKUkuzAinQi91hehwVQrAgoaNmbDzvI
 B0NhSwVEYvlEsfvmVwOJN4VUDbFor31oE3hNvK685ATRvPEQssbQarloK4OsPajv
 hOKqDNY/UKggSEFSaeN8gExrylhqEsXk1r+p0S8kKfZnyoNkemPa9xK2QD5YjulE
 V9rDs90qjWoA1Rw90e61HvDbtDBgdQ==
 =74y9
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A collection of driver specific fixes, there was a bit of a kerfuffle
  with some last minute review on hte spi-cadence-quadspi division by
  zero change but otherwise nothing terribly remarkable here - important
  fixes if you have the hardware but nothing with too wide an impact"

* tag 'spi-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-bcm2835: Fix deadlock
  spi: cadence: Correct initialisation of runtime PM again
  spi: cadence-quadspi: Disable Auto-HW polling
  spi: spi-cadence-quadspi: Fix division by zero warning
  spi: spi-cadence-quadspi: Revert "Fix division by zero warning"
  spi: spi-cadence-quadspi: Fix division by zero warning
  spi: mediatek: move devm_spi_register_master position
  spi: mediatek: fix fifo rx mode
  spi: atmel: Fix CS and initialization bug
  spi: stm32: fixes pm_runtime calls in probe/remove
  spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay
  spi: stm32h7: fix full duplex irq handler handling
2021-07-21 12:41:41 -07:00
Linus Torvalds
7c3d49b0b5 regulator: Fixes for v5.14
A few driver specific fixes that came in since the merge window, plus a
 change to mark the regulator-fixed-domain DT binding as deprecated in
 order to try to to discourage any new users while a better solution is
 put in place.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmD4WckACgkQJNaLcl1U
 h9DltAgAgxwqrdGPLt/0rGlhpI183L6x81fbLcbvVYwWiyUDqU1rSqZhR+UnxkOf
 3tOX4B+4EJTNpArLRcoXD2uu1KP92AhOwv3uGKsJGibS0OC6AwV7adyK+qkBuZsS
 IJgyI63YfRGP1udWfklZle+CxK6JIRMILbT9oTMGoWrPnK3QfS2rNDapTtpfYRn1
 PTIu0TqMQ6xKHg8XPbtmD4INbrbhxP0H3848g0ZhohGozEwggKSEwB1c55cbQc2J
 I3HpBMVk3sO0UrG6NBpX+fSj0BT4MwjNdFDgmstgwEn/31gSSDjAxaHnfZqWLJqJ
 cW6rlhw0eSqRXAjrqh6WfK5QiojRXw==
 =+jG2
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A few driver specific fixes that came in since the merge window, plus
  a change to mark the regulator-fixed-domain DT binding as deprecated
  in order to try to to discourage any new users while a better solution
  is put in place"

* tag 'regulator-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: hi6421: Fix getting wrong drvdata
  regulator: mtk-dvfsrc: Fix wrong dev pointer for devm_regulator_register
  regulator: fixed: Mark regulator-fixed-domain as deprecated
  regulator: bd9576: Fix testing wrong flag in check_temp_flag_mismatch
  regulator: hi6421v600: Fix getting wrong drvdata that causes boot failure
  regulator: rt5033: Fix n_voltages settings for BUCK and LDO
  regulator: rtmv20: Fix wrong mask for strobe-polarity-high
2021-07-21 12:37:49 -07:00
Linus Torvalds
b4e62aaf95 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmD4LMkACgkQ+7dXa6fL
 C2t4vg//eOjVirl2L9NSwW27btAv3tySeX7zN78i4IIy1KesVTV4An/yNYnZ1XDg
 yackjIIgRaw/pX+W/dWiQalRtAcYno4vFam874lzBq+Yourk9j+7mU7WDtkofRh4
 UtuWAyARGVfLZdkJQp1Eb1YoZj0oro31Qa7eFeHQ/1Wlx6Wo1gdsRAdGfRpukLeC
 fTv/DXrpK+cC36ENZjP3VuhfsOAKIxKVqf7rLKiMBbeyAGW7jRs3fSTxgOJAgJN0
 opj/WTGZf8vskq+u3uTbAv8dwK9CakqB6QEkDq22kZSVh+LSxKzIB6E/3wSXfhAj
 1NAKUVQb4e6caXKk/vxFUhnyq8wWdzWW6ExVzH2DYSdNn442A6eo9g6d6qE7wIGK
 ui0sL1p7ilwV/1aBVA7imYM5bSS3HDpVFCW1+HCZpbVxnARDv87su8iOJGErWpz+
 5fDqifhm+Wrtrhoo05RnsYoygCf5elnz5iUu4MaOYKno1dFQrHBBJhFsv8PAduUq
 nZkr+q8HUFH7EYcUjidJiMR+NSiVpe8bCV4LoaGczY4j/AYc5rOfD1s3kW0tfiqG
 zqx+pGy5AXTwrZqzVf28I5T6cBCW1GxLnFLmSWZ4DsQdNGziYbPUTYDCJcJxfT0U
 cASaEXQSItfxlkVw+JIr+Hb/QXYEfoNEdwWLvUQ2LrPueJwH2V8=
 =Pn/w
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20210721' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:

 - Fix a tracepoint that causes one of the tracing subsystem query files
   to crash if the module is loaded

 - Fix afs_writepages() to take account of whether the storage rpc
   actually succeeded when updating the cyclic writeback counter

 - Fix some error code propagation/handling

 - Fix place where afs_writepages() was setting writeback_index to a
   file position rather than a page index

* tag 'afs-fixes-20210721' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Remove redundant assignment to ret
  afs: Fix setting of writeback_index
  afs: check function return
  afs: Fix tracepoint string placement with built-in AFS
2021-07-21 11:51:59 -07:00
Arnd Bergmann
161dcc0242 net: ixp46x: fix ptp build failure
The rework of the ixp46x cpu detection left the network driver in
a half broken state:

drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init':
drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function)
  290 |                 (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT;
      |                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/xscale/ptp_ixp46x.c: At top level:
drivers/net/ethernet/xscale/ptp_ixp46x.c:323:1: error: data definition has no type or storage class [-Werror]
  323 | module_init(ptp_ixp_init);

I have patches to complete the transition for a future release, but
for the moment, add the missing include statements to get it to build
again.

Fixes: 09aa9aabdc ("soc: ixp4xx: move cpu detection to linux/soc/ixp4xx/cpu.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:10:24 -07:00
Eric Dumazet
240bfd134c tcp: tweak len/truesize ratio for coalesce candidates
tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh
which has a direct impact on advertized window sizes.

We added TCP coalescing in linux-3.4 & linux-3.5:

Instead of storing skbs with one or two MSS in receive queue (or OFO queue),
we try to append segments together to reduce memory overhead.

High performance network drivers tend to cook skb with 3 parts :

1) sk_buff structure (256 bytes)
2) skb->head contains room to copy headers as needed, and skb_shared_info
3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU)

Once coalesced into a previous skb, 1) and 2) are freed.

We can therefore tweak the way we compute len/truesize ratio knowing
that skb->truesize is inflated by 1) and 2) soon to be freed.

This is done only for in-order skb, or skb coalesced into OFO queue.

The result is that low rate flows no longer pay the memory price of having
low GRO aggregation factor. Same result for drivers not using GRO.

This is critical to allow a big enough receiver window,
typically tcp_rmem[2] / 2.

We have been using this at Google for about 5 years, it is due time
to make it upstream.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:05:57 -07:00
Nikolay Aleksandrov
54cb43199e net: bridge: multicast: fix igmp/mld port context null pointer dereferences
With the recent change to use bridge/port multicast context pointers
instead of bridge/port I missed to convert two locations which pass the
port pointer as-is, but with the new model we need to verify the port
context is non-NULL first and retrieve the port from it. The first
location is when doing querier selection when a query is received, the
second location is when leaving a group. The port context will be null
if the packets originated from the bridge device (i.e. from the host).
The fix is simple just check if the port context exists and retrieve
the port pointer from it.

Fixes: adc47037a7 ("net: bridge: multicast: use multicast contexts instead of bridge or port")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:04:03 -07:00
Leon Romanovsky
524df92c19 ionic: drop useless check of PCI driver data validity
The driver core will call to .remove callback only if .probe succeeded
and it will ensure that driver data has pointer to struct ionic.

There is no need to check it again.

Fixes: fbfb803153 ("ionic: Add hardware init and device commands")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:03:07 -07:00
Eric Dumazet
739b2adf99 tcp: avoid indirect call in tcp_new_space()
For tcp sockets, sk->sk_write_space is most probably sk_stream_write_space().

Other sk->sk_write_space() calls in TCP are slow path and do not deserve
any change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:02:11 -07:00
Andy Shevchenko
7f8b20d0de net: wwan: iosm: Switch to use module_pci_driver() macro
Eliminate some boilerplate code by using module_pci_driver() instead of
init/exit, moving the salient bits from init into probe.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 09:01:05 -07:00
Dongliang Mu
dcb713d53e usb: hso: remove the bailout parameter
There are two invocation sites of hso_free_net_device. After
refactoring hso_create_net_device, this parameter is useless.
Remove the bailout in the hso_free_net_device and change the invocation
sites of this function.

Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:59:56 -07:00
Dongliang Mu
788e67f18d usb: hso: fix error handling code of hso_create_net_device
The current error handling code of hso_create_net_device is
hso_free_net_device, no matter which errors lead to. For example,
WARNING in hso_free_net_device [1].

Fix this by refactoring the error handling code of
hso_create_net_device by handling different errors by different code.

[1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76efe

Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com
Fixes: 5fcfb6d0bf ("hso: fix bailout in error case of probe")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:59:56 -07:00
Sukadev Bhattiprolu
bb55362bd6 ibmvnic: Remove the proper scrq flush
Commit 65d6470d13 ("ibmvnic: clean pending indirect buffs during reset")
intended to remove the call to ibmvnic_tx_scrq_flush() when the
->resetting flag is true and was tested that way. But during the final
rebase to net-next, the hunk got applied to a block few lines below
(which happened to have the same diff context) and the wrong call to
ibmvnic_tx_scrq_flush() got removed.

Fix that by removing the correct ibmvnic_tx_scrq_flush() and restoring
the one that was incorrectly removed.

Fixes: 65d6470d13 ("ibmvnic: clean pending indirect buffs during reset")
Reported-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:57:41 -07:00
Wei Liu
f5a11c69b6 Revert "x86/hyperv: fix logical processor creation"
This reverts commit 450605c28d.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-07-21 15:55:43 +00:00
Piotr Kwapulinski
1050713026 i40e: add support for PTP external synchronization clock
Add support for external synchronization clock via GPIOs.
1PPS signals are handled via the dedicated 3 GPIOs: SDP3_2,
SDP3_3 and GPIO_4.
Previously it was not possible to use the external PTP
synchronization clock.
All possible HW configurations are supported.
	SDP3_2,	SDP3_3,	GPIO_4
	off,	off,	off
	off,	in_A,	off
	off,	out_A,	off
	off,	in_B,	off
	off,	out_B,	off
	in_A,	off,	off
	in_A,	in_B,	off
	in_A,	out_B,	off
	out_A,	off,	off
	out_A,	in_B,	off
	in_B,	off,	off
	in_B,	in_A,	off
	in_B,	out_A,	off
	out_B,	off,	off
	out_B,	in_A,	off
	off,	off,	in_A
	off,	out_A,	in_A
	off,	in_B,	in_A
	off,	out_B,	in_A
	out_A,	off,	in_A
	out_A,	in_B,	in_A
	in_B,	off,	in_A
	in_B,	out_A,	in_A
	out_B,	off,	in_A
	off,	off,	out_A
	off,	in_A,	out_A
	off,	in_B,	out_A
	off,	out_B,	out_A
	in_A,	off,	out_A
	in_A,	in_B,	out_A
	in_A,	out_B,	out_A
	in_B,	off,	out_A
	in_B,	in_A,	out_A
	out_B,	off,	out_A
	out_B,	in_A,	out_A
	off,	off,	in_B
	off,	in_A,	in_B
	off,	out_A,	in_B
	off,	out_B,	in_B
	in_A,	off,	in_B
	in_A,	out_B,	in_B
	out_A,	off,	in_B
	out_B,	off,	in_B
	out_B,	in_A,	in_B
	off,	off,	out_B
	off,	in_A,	out_B
	off,	out_A,	out_B
	off,	in_B,	out_B
	in_A,	off,	out_B
	in_A,	in_B,	out_B
	out_A,	off,	out_B
	out_A,	in_B,	out_B
	in_B,	off,	out_B
	in_B,	in_A,	out_B
	in_B,	out_A,	out_B

Tested with oscilloscope, 1PPS generator and ts2phc.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Ashish K <ashishx.k@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:53:54 -07:00
David S. Miller
3ddaed6b09 Merge branch 'pmtu-esp'
Vadim Fedorenko ays:

====================
Fix PMTU for ESP-in-UDP encapsulation

Bug 213669 uncovered regression in PMTU discovery for UDP-encapsulated
routes and some incorrect usage in udp tunnel fields. This series fixes
problems and also adds such case for selftests

v3:
 - update checking logic to account SCTP use case
v2:
 - remove refactor code that was in first patch
 - move checking logic to __udp{4,6}_lib_err_encap
 - add more tests, especially routed configuration
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:50:52 -07:00
Vadim Fedorenko
ece1278a9b selftests: net: add ESP-in-UDP PMTU test
The case of ESP in UDP encapsulation was not covered before. Add
cases of local changes of MTU and difference on routed path.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:50:35 -07:00
Vadim Fedorenko
9bfce73c89 udp: check encap socket in __udp_lib_err
Commit d26796ae58 ("udp: check udp sock encap_type in __udp_lib_err")
added checks for encapsulated sockets but it broke cases when there is
no implementation of encap_err_lookup for encapsulation, i.e. ESP in
UDP encapsulation. Fix it by calling encap_err_lookup only if socket
implements this method otherwise treat it as legal socket.

Fixes: d26796ae58 ("udp: check udp sock encap_type in __udp_lib_err")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:49:31 -07:00
Xin Long
58acd10092 sctp: update active_key for asoc when old key is being replaced
syzbot reported a call trace:

  BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
  Call Trace:
   sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
   sctp_set_owner_w net/sctp/socket.c:131 [inline]
   sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865
   sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027
   inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821
   sock_sendmsg_nosec net/socket.c:703 [inline]
   sock_sendmsg+0xcf/0x120 net/socket.c:723

This is an use-after-free issue caused by not updating asoc->shkey after
it was replaced in the key list asoc->endpoint_shared_keys, and the old
key was freed.

This patch is to fix by also updating active_key for asoc when old key is
being replaced with a new one. Note that this issue doesn't exist in
sctp_auth_del_key_id(), as it's not allowed to delete the active_key
from the asoc.

Fixes: 1b1e0bc994 ("sctp: add refcnt support for sh_key")
Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:44:44 -07:00
Vadim Fedorenko
ac6627a28d net: ipv4: Consolidate ipv4_mtu and ip_dst_mtu_maybe_forward
Consolidate IPv4 MTU code the same way it is done in IPv6 to have code
aligned in both address families

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:22:03 -07:00
Vadim Fedorenko
427faee167 net: ipv6: introduce ip6_dst_mtu_maybe_forward
Replace ip6_dst_mtu_forward with ip6_dst_mtu_maybe_forward and
reuse this code in ip6_mtu. Actually these two functions were
almost duplicates, this change will simplify the maintaince of
mtu calculation code.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:22:02 -07:00
David S. Miller
7c804e91df Merge branch 'ipv6-ioam'
Justin Iurman says:

====================
Support for the IOAM Pre-allocated Trace with IPv6

v5:
 - Refine types, min/max and default values for new sysctls
 - Introduce a "_wide" sysctl for each "ioam6_id" sysctl
 - Add more validation on headers before processing data
 - RCU for sc <> ns pointers + appropriate accessors
 - Generic Netlink policies are now per op, not per family anymore
 - Address other comments/remarks from Jakub (thanks again)
 - Revert "__packed" to "__attribute__((packed))" for uapi headers
 - Add tests to cover the functionality added, as requested by David Ahern

v4:
 - Address warnings from checkpatch (ignore errors related to unnamed bitfields
   in the first patch)
 - Use of hweight32 (thanks Jakub)
 - Remove inline keyword from static functions in C files and let the compiler
   decide what to do (thanks Jakub)

v3:
 - Fix warning "unused label 'out_unregister_genl'" by adding conditional macro
 - Fix lwtunnel output redirect bug: dst cache useless in this case, use
   orig_output instead

v2:
 - Fix warning with static for __ioam6_fill_trace_data
 - Fix sparse warning with __force when casting __be64 to __be32
 - Fix unchecked dereference when removing IOAM namespaces or schemas
 - exthdrs.c: Don't drop by default (now: ignore) to match the act bits "00"
 - Add control plane support for the inline insertion (lwtunnel)
 - Provide uapi structures
 - Use __net_timestamp if skb->tstamp is empty
 - Add note about the temporary IANA allocation
 - Remove support for "removable" TLVs
 - Remove support for virtual/anonymous tunnel decapsulation

In-situ Operations, Administration, and Maintenance (IOAM) records
operational and telemetry information in a packet while it traverses
a path between two points in an IOAM domain. It is defined in
draft-ietf-ippm-ioam-data [1]. IOAM data fields can be encapsulated
into a variety of protocols. The IPv6 encapsulation is defined in
draft-ietf-ippm-ioam-ipv6-options [2], via extension headers. IOAM
can be used to complement OAM mechanisms based on e.g. ICMP or other
types of probe packets.

This patchset implements support for the Pre-allocated Trace, carried
by a Hop-by-Hop. Therefore, a new IPv6 Hop-by-Hop TLV option is
introduced, see IANA [3]. The three other IOAM options are not included
in this patchset (Incremental Trace, Proof-of-Transit and Edge-to-Edge).
The main idea behind the IOAM Pre-allocated Trace is that a node
pre-allocates some room in packets for IOAM data. Then, each IOAM node
on the path will insert its data. There exist several interesting use-
cases, e.g. Fast failure detection/isolation or Smart service selection.
Another killer use-case is what we have called Cross-Layer Telemetry,
see the demo video on its repository [4], that aims to make the entire
stack (L2/L3 -> L7) visible for distributed tracing tools (e.g. Jaeger),
instead of the current L5 -> L7 limited view. So, basically, this is a
nice feature for the Linux Kernel.

This patchset also provides support for the control plane part, but only for the
inline insertion (host-to-host use case), through lightweight tunnels. Indeed,
for in-transit traffic, the solution is to have an IPv6-in-IPv6 encapsulation,
which brings some difficulties and still requires a little bit of work and
discussion (ie anonymous tunnel decapsulation and multi egress resolution).

- Patch 1: IPv6 IOAM headers definition
- Patch 2: Data plane support for Pre-allocated Trace
- Patch 3: IOAM Generic Netlink API
- Patch 4: Support for IOAM injection with lwtunnels
- Patch 5: Documentation for new IOAM sysctls
- Patch 6: Test for the IOAM insertion with IPv6

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
  [4] https://github.com/iurmanj/cross-layer-telemetry
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:21:39 -07:00
Justin Iurman
968691c777 selftests: net: Test for the IOAM insertion with IPv6
This test evaluates the IOAM insertion for IPv6 by checking the IOAM data
integrity on the receiver.

The topology is formed by 3 nodes: Alpha (sender), Beta (router in-between)
and Gamma (receiver). An IOAM domain is configured from Alpha to Gamma only,
which means not on the reverse path. When Gamma is the destination, Alpha
adds an IOAM option (Pre-allocated Trace) inside a Hop-by-hop and fills the
trace with its own IOAM data. Beta and Gamma also fill the trace. The IOAM
data integrity is checked on Gamma, by comparing with the pre-defined IOAM
configuration (see below).

    +-------------------+            +-------------------+
    |                   |            |                   |
    |    alpha netns    |            |    gamma netns    |
    |                   |            |                   |
    |  +-------------+  |            |  +-------------+  |
    |  |    veth0    |  |            |  |    veth0    |  |
    |  |  db01::2/64 |  |            |  |  db02::2/64 |  |
    |  +-------------+  |            |  +-------------+  |
    |         .         |            |         .         |
    +-------------------+            +-------------------+
              .                                .
              .                                .
              .                                .
    +----------------------------------------------------+
    |         .                                .         |
    |  +-------------+                  +-------------+  |
    |  |    veth0    |                  |    veth1    |  |
    |  |  db01::1/64 | ................ |  db02::1/64 |  |
    |  +-------------+                  +-------------+  |
    |                                                    |
    |                      beta netns                    |
    |                                                    |
    +--------------------------+-------------------------+

~~~~~~~~~~~~~~~~~~~~~~
| IOAM configuration |
~~~~~~~~~~~~~~~~~~~~~~

Alpha
+-----------------------------------------------------------+
| Type                | Value                               |
+-----------------------------------------------------------+
| Node ID             | 1                                   |
+-----------------------------------------------------------+
| Node Wide ID        | 11111111                            |
+-----------------------------------------------------------+
| Ingress ID          | 0xffff (default value)              |
+-----------------------------------------------------------+
| Ingress Wide ID     | 0xffffffff (default value)          |
+-----------------------------------------------------------+
| Egress ID           | 101                                 |
+-----------------------------------------------------------+
| Egress Wide ID      | 101101                              |
+-----------------------------------------------------------+
| Namespace Data      | 0xdeadbee0                          |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf00dc0de                  |
+-----------------------------------------------------------+
| Schema ID           | 777                                 |
+-----------------------------------------------------------+
| Schema Data         | something that will be 4n-aligned   |
+-----------------------------------------------------------+

Note: When Gamma is the destination, Alpha adds an IOAM Pre-allocated Trace
      option inside a Hop-by-hop, where 164 bytes are pre-allocated for the
      trace, with 123 as the IOAM-Namespace and with 0xfff00200 as the trace
      type (= all available options at this time). As a result, and based on
      IOAM configurations here, only both Alpha and Beta should be capable of
      inserting their IOAM data while Gamma won't have enough space and will
      set the overflow bit.

Beta
+-----------------------------------------------------------+
| Type                | Value                               |
+-----------------------------------------------------------+
| Node ID             | 2                                   |
+-----------------------------------------------------------+
| Node Wide ID        | 22222222                            |
+-----------------------------------------------------------+
| Ingress ID          | 201                                 |
+-----------------------------------------------------------+
| Ingress Wide ID     | 201201                              |
+-----------------------------------------------------------+
| Egress ID           | 202                                 |
+-----------------------------------------------------------+
| Egress Wide ID      | 202202                              |
+-----------------------------------------------------------+
| Namespace Data      | 0xdeadbee1                          |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf11dc0de                  |
+-----------------------------------------------------------+
| Schema ID           | 0xffffff (= None)                   |
+-----------------------------------------------------------+
| Schema Data         |                                     |
+-----------------------------------------------------------+

Gamma
+-----------------------------------------------------------+
| Type                | Value                               |
+-----------------------------------------------------------+
| Node ID             | 3                                   |
+-----------------------------------------------------------+
| Node Wide ID        | 33333333                            |
+-----------------------------------------------------------+
| Ingress ID          | 301                                 |
+-----------------------------------------------------------+
| Ingress Wide ID     | 301301                              |
+-----------------------------------------------------------+
| Egress ID           | 0xffff (default value)              |
+-----------------------------------------------------------+
| Egress Wide ID      | 0xffffffff (default value)          |
+-----------------------------------------------------------+
| Namespace Data      | 0xdeadbee2                          |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf22dc0de                  |
+-----------------------------------------------------------+
| Schema ID           | 0xffffff (= None)                   |
+-----------------------------------------------------------+
| Schema Data         |                                     |
+-----------------------------------------------------------+

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:16:11 -07:00
Justin Iurman
de8e80a54c ipv6: ioam: Documentation for new IOAM sysctls
Add documentation for new IOAM sysctls:
 - ioam6_id and ioam6_id_wide: two per-namespace sysctls
 - ioam6_enabled, ioam6_id and ioam6_id_wide: three per-interface sysctls

Example of IOAM configuration based on the following simple topology:

 _____              _____              _____
|     | eth0  eth0 |     | eth1  eth0 |     |
|  A  |.----------.|  B  |.----------.|  C  |
|_____|            |_____|            |_____|

1) Node and interface IDs can be configured for IOAM:

  # IOAM ID of A = 1, IOAM ID of A.eth0 = 11
  (A) sysctl -w net.ipv6.ioam6_id=1
  (A) sysctl -w net.ipv6.conf.eth0.ioam6_id=11

  # IOAM ID of B = 2, IOAM ID of B.eth0 = 21, IOAM ID of B.eth1 = 22
  (B) sysctl -w net.ipv6.ioam6_id=2
  (B) sysctl -w net.ipv6.conf.eth0.ioam6_id=21
  (B) sysctl -w net.ipv6.conf.eth1.ioam6_id=22

  # IOAM ID of C = 3, IOAM ID of C.eth0 = 31
  (C) sysctl -w net.ipv6.ioam6_id=3
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_id=31

  Note that "_wide" IDs equivalents can be configured the same way.

2) Each node can be configured to form an IOAM domain. For instance,
   we allow IOAM from A to C only (not the reverse path), i.e. enable
   IOAM on ingress for B.eth0 and C.eth0:

  (B) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1

3) An IOAM domain (e.g. ID=123) is defined and made known to each node:

  (A) ip ioam namespace add 123
  (B) ip ioam namespace add 123
  (C) ip ioam namespace add 123

4) Finally, an IOAM Pre-allocated Trace can be inserted in traffic sent
   by A when C (e.g. db02::2) is the destination:

  (A) ip -6 route add db02::2/128 encap ioam6 trace type 0x800000 ns 123
      size 12 dev eth0

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Justin Iurman
3edede08ff ipv6: ioam: Support for IOAM injection with lwtunnels
Add support for the IOAM inline insertion (only for the host-to-host use case)
which is per-route configured with lightweight tunnels. The target is iproute2
and the patch is ready. It will be posted as soon as this patchset is merged.
Here is an overview:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace type 0x800000 ns 1 size 12 dev eth0

This example configures an IOAM Pre-allocated Trace option attached to the
fc00::1/128 prefix. The IOAM namespace (ns) is 1, the size of the pre-allocated
trace data block is 12 octets (size) and only the first IOAM data (bit 0:
hop_limit + node id) is included in the trace (type) represented as a bitfield.

The reason why the in-transit (IPv6-in-IPv6 encapsulation) use case is not
implemented is explained on the patchset cover.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Justin Iurman
8c6f6fa677 ipv6: ioam: IOAM Generic Netlink API
Add Generic Netlink commands to allow userspace to configure IOAM
namespaces and schemas. The target is iproute2 and the patch is ready.
It will be posted as soon as this patchset is merged. Here is an overview:

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Justin Iurman
9ee11f0fff ipv6: ioam: Data plane support for Pre-allocated Trace
Implement support for processing the IOAM Pre-allocated Trace with IPv6,
see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option, see IANA [3].

A new per-interface sysctl is introduced. The value is a boolean to accept (=1)
or ignore (=0, by default) IPv6 IOAM options on ingress for an interface:
 - net.ipv6.conf.XXX.ioam6_enabled

Two other sysctls are introduced to define IOAM IDs, represented by an integer.
They are respectively per-namespace and per-interface:
 - net.ipv6.ioam6_id
 - net.ipv6.conf.XXX.ioam6_id

The value of the first one represents the IOAM ID of the node itself (u32; max
and default value = U32_MAX>>8, due to hop limit concatenation) while the other
represents the IOAM ID of an interface (u16; max and default value = U16_MAX).

Each "ioam6_id" sysctl has a "_wide" equivalent:
 - net.ipv6.ioam6_id_wide
 - net.ipv6.conf.XXX.ioam6_id_wide

The value of the first one represents the wide IOAM ID of the node itself (u64;
max and default value = U64_MAX>>8, due to hop limit concatenation) while the
other represents the wide IOAM ID of an interface (u32; max and default value
= U32_MAX).

The use of short and wide equivalents is not exclusive, a deployment could
choose to leverage both. For example, net.ipv6.conf.XXX.ioam6_id (short format)
could be an identifier for a physical interface, whereas
net.ipv6.conf.XXX.ioam6_id_wide (wide format) could be an identifier for a
logical sub-interface. Documentation about new sysctls is provided at the end
of this patchset.

Two relativistic hash tables are used: one for IOAM namespaces, the other for
IOAM schemas. A namespace can only have a single active schema and a schema
can only be attached to a single namespace (1:1 relationship).

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Justin Iurman
db67f219fc uapi: IPv6 IOAM headers definition
This patch provides the IPv6 IOAM option header [1] as well as the IOAM
Trace header [2]. An IOAM option must be 4n-aligned. Here is an overview of
a Hop-by-Hop with an IOAM Trace option:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  |  Opt Data Len |    Reserved   |   IOAM Type   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Namespace-ID          | NodeLen | Flags | RemainingLen|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                IOAM-Trace-Type                |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
|                                                               |  |
|                         node data [n]                         |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  D
|                                                               |  a
|                         node data [n-1]                       |  t
|                                                               |  a
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                             ...                               ~  S
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  p
|                                                               |  a
|                         node data [1]                         |  c
|                                                               |  e
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                                                               |  |
|                         node data [0]                         |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+

The IOAM option header starts at "Option Type" and ends after "IOAM
Type". The IOAM Trace header starts at "Namespace-ID" and ends after
"IOAM-Trace-Type/Reserved".

IOAM Type: either Pre-allocated Trace (=0), Incremental Trace (=1),
Proof-of-Transit (=2) or Edge-to-Edge (=3). Note that both the
Pre-allocated Trace and the Incremental Trace look the same. The two
others are not implemented.

Namespace-ID: IOAM namespace identifier, not to be confused with network
namespaces. It adds further context to IOAM options and associated data,
and allows devices which are IOAM capable to determine whether IOAM
options must be processed or ignored. It can also be used by an operator
to distinguish different operational domains or to identify different
sets of devices.

NodeLen: Length of data added by each node. It depends on the Trace
Type.

Flags: Only the Overflow (O) flag for now. The O flag is set by a
transit node when there are not enough octets left to record its data.

RemainingLen: Remaining free space to record data.

IOAM-Trace-Type: Bit field where each bit corresponds to a specific kind
of IOAM data. See [2] for a detailed list.

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Vladimir Oltean
71f4f89a03 net: switchdev: recurse into __switchdev_handle_fdb_del_to_device
The difference between __switchdev_handle_fdb_del_to_device and
switchdev_handle_del_to_device is that the former takes an extra
orig_dev argument, while the latter starts with dev == orig_dev.

We should recurse into the variant that does not lose the orig_dev along
the way. This is relevant when deleting FDB entries pointing towards a
bridge (dev changes to the lower interfaces, but orig_dev shouldn't).

The addition helper already recurses properly, just the deletion one
doesn't.

Fixes: 8ca07176ab ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:00:16 -07:00
Vladimir Oltean
94111dfc18 net: switchdev: remove stray semicolon in switchdev_handle_fdb_del_to_device shim
With the semicolon at the end, the compiler sees the shim function as a
declaration and not as a definition, and warns:

'switchdev_handle_fdb_del_to_device' declared 'static' but never defined

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 8ca07176ab ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:00:16 -07:00
Vladimir Oltean
f5621a01c8 net: phy: at803x: finish the phy id checking simplification
The blamed commit was probably not tested on net-next, since it did not
refactor the extra phy id check introduced in commit b856150c80 ("net:
phy: at803x: mask 1000 Base-X link mode").

Fixes: 8887ca5474 ("net: phy: at803x: simplify custom phy id matching")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 07:58:20 -07:00
Sayanta Pattanayak
e9a72f874d r8169: Avoid duplicate sysfs entry creation error
When registering the MDIO bus for a r8169 device, we use the PCI
bus/device specifier as a (seemingly) unique device identifier.
However the very same BDF number can be used on another PCI segment,
which makes the driver fail probing:

[ 27.544136] r8169 0002:07:00.0: enabling device (0000 -> 0003)
[ 27.559734] sysfs: cannot create duplicate filename '/class/mdio_bus/r8169-700'
....
[ 27.684858] libphy: mii_bus r8169-700 failed to register
[ 27.695602] r8169: probe of 0002:07:00.0 failed with error -22

Add the segment number to the device name to make it more unique.

This fixes operation on ARM N1SDP boards, with two boards connected
together to form an SMP system, and all on-board devices showing up
twice, just on different PCI segments. A similar issue would occur on
large systems with many PCI slots and multiple RTL8169 NICs.

Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Signed-off-by: Sayanta Pattanayak <sayanta.pattanayak@arm.com>
[Andre: expand commit message, use pci_domain_nr()]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 07:54:30 -07:00
Russell King (Oracle)
7cefb0b0e9 net: phylink: cleanup ksettings_set
We only need to fiddle about with the supported mask after we have
validated the user's requested parameters. Simplify and streamline the
code by moving the linkmode copy and update of the autoneg bit after
validating the user's request.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 07:52:15 -07:00
Jiapeng Chong
b428081282 afs: Remove redundant assignment to ret
Variable ret is set to -ENOENT and -ENOMEM but this value is never
read as it is overwritten or not used later on, hence it is a
redundant assignment and can be removed.

Cleans up the following clang-analyzer warning:

fs/afs/dir.c:2014:4: warning: Value stored to 'ret' is never read
[clang-analyzer-deadcode.DeadStores].

fs/afs/dir.c:659:2: warning: Value stored to 'ret' is never read
[clang-analyzer-deadcode.DeadStores].

[DH made the following modifications:

 - In afs_rename(), -ENOMEM should be placed in op->error instead of ret,
   rather than the assignment being removed entirely.  afs_put_operation()
   will pick it up from there and return it.

 - If afs_sillyrename() fails, its error code should be placed in op->error
   rather than in ret also.
]

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/1619691492-83866-1-git-send-email-jiapeng.chong@linux.alibaba.com
Link: https://lore.kernel.org/r/162609465444.3133237.7562832521724298900.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162610729052.3408253.17364333638838151299.stgit@warthog.procyon.org.uk/ # v2
2021-07-21 15:11:22 +01:00
David Howells
5a972474cf afs: Fix setting of writeback_index
Fix afs_writepages() to always set mapping->writeback_index to a page index
and not a byte position[1].

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/CAB9dFdvHsLsw7CMnB+4cgciWDSqVjuij4mH3TaXnHQB8sz5rHw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/162610728339.3408253.4604750166391496546.stgit@warthog.procyon.org.uk/ # v2 (no v1)
2021-07-21 15:10:23 +01:00
Tom Rix
afe6949862 afs: check function return
Static analysis reports this problem

write.c:773:29: warning: Assigned value is garbage or undefined
  mapping->writeback_index = next;
                           ^ ~~~~
The call to afs_writepages_region() can return without setting
next.  So check the function return before using next.

Changes:
 ver #2:
   - Need to fix the range_cyclic case also[1].

Fixes: e87b03f583 ("afs: Prepare for use of THPs")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210430155031.3287870-1-trix@redhat.com
Link: https://lore.kernel.org/r/CAB9dFdvHsLsw7CMnB+4cgciWDSqVjuij4mH3TaXnHQB8sz5rHw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/162609464716.3133237.10354897554363093252.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162610727640.3408253.8687445613469681311.stgit@warthog.procyon.org.uk/ # v2
2021-07-21 15:10:23 +01:00
David Howells
6c881ca0b3 afs: Fix tracepoint string placement with built-in AFS
To quote Alexey[1]:

    I was adding custom tracepoint to the kernel, grabbed full F34 kernel
    .config, disabled modules and booted whole shebang as VM kernel.

    Then did

	perf record -a -e ...

    It crashed:

	general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [#1] SMP PTI
	CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ #26
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
	RIP: 0010:t_show+0x22/0xd0

    Then reproducer was narrowed to

	# cat /sys/kernel/tracing/printk_formats

    Original F34 kernel with modules didn't crash.

    So I started to disable options and after disabling AFS everything
    started working again.

    The root cause is that AFS was placing char arrays content into a
    section full of _pointers_ to strings with predictable consequences.

    Non canonical address 435f5346592e4243 is "CB.YFS_" which came from
    CM_NAME macro.

    Steps to reproduce:

	CONFIG_AFS=y
	CONFIG_TRACING=y

	# cat /sys/kernel/tracing/printk_formats

Fix this by the following means:

 (1) Add enum->string translation tables in the event header with the AFS
     and YFS cache/callback manager operations listed by RPC operation ID.

 (2) Modify the afs_cb_call tracepoint to print the string from the
     translation table rather than using the string at the afs_call name
     pointer.

 (3) Switch translation table depending on the service we're being accessed
     as (AFS or YFS) in the tracepoint print clause.  Will this cause
     problems to userspace utilities?

     Note that the symbolic representation of the YFS service ID isn't
     available to this header, so I've put it in as a number.  I'm not sure
     if this is the best way to do this.

 (4) Remove the name wrangling (CM_NAME) macro and put the names directly
     into the afs_call_type structs in cmservice.c.

Fixes: 8e8d7f13b6 ("afs: Add some tracepoints")
Reported-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1]
Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@warthog.procyon.org.uk/ # v1 (repost)
Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@warthog.procyon.org.uk/ # v2
2021-07-21 15:08:35 +01:00
Jonathan Marek
d8a719059b Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge"
This reverts commit c742199a01.

c742199a01 ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge")
breaks arm64 in at least two ways for configurations where PUD or PMD
folding occur:

  1. We no longer install huge-vmap mappings and silently fall back to
     page-granular entries, despite being able to install block entries
     at what is effectively the PGD level.

  2. If the linear map is backed with block mappings, these will now
     silently fail to be created in alloc_init_pud(), causing a panic
     early during boot.

The pgtable selftests caught this, although a fix has not been
forthcoming and Christophe is AWOL at the moment, so just revert the
change for now to get a working -rc3 on which we can queue patches for
5.15.

A simple revert breaks the build for 32-bit PowerPC 8xx machines, which
rely on the default function definitions when the corresponding
page-table levels are folded, since commit a6a8f7c4aa ("powerpc/8xx:
add support for huge pages on VMAP and VMALLOC"), eg:

  powerpc64-linux-ld: mm/vmalloc.o: in function `vunmap_pud_range':
  linux/mm/vmalloc.c:362: undefined reference to `pud_clear_huge'

To avoid that, add stubs for pud_clear_huge() and pmd_clear_huge() in
arch/powerpc/mm/nohash/8xx.c as suggested by Christophe.

Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: c742199a01 ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
[mpe: Fold in 8xx.c changes from Christophe and mention in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linux-arm-kernel/CAMuHMdXShORDox-xxaeUfDW3wx2PeggFSqhVSHVZNKCGK-y_vQ@mail.gmail.com/
Link: https://lore.kernel.org/r/20210717160118.9855-1-jonathan@marek.ca
Link: https://lore.kernel.org/r/87r1fs1762.fsf@mpe.ellerman.id.au
Signed-off-by: Will Deacon <will@kernel.org>
2021-07-21 11:28:09 +01:00
Jean-Philippe Brucker
a7c3acca53 arm64: smccc: Save lr before calling __arm_smccc_sve_check()
Commit cfa7ff959a ("arm64: smccc: Support SMCCC v1.3 SVE register
saving hint") added a call to __arm_smccc_sve_check() which clobbers the
lr (register x30), causing __arm_smccc_hvc() to return to itself and
crash. Save lr on the stack before calling __arm_smccc_sve_check(). Save
the frame pointer (x29) to complete the frame record, and adjust the
offsets used to access stack parameters.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Fixes: cfa7ff959a ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210721071834.69130-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-07-21 11:23:25 +01:00
Markus Boehme
09cfae9f13 ixgbe: Fix packet corruption due to missing DMA sync
When receiving a packet with multiple fragments, hardware may still
touch the first fragment until the entire packet has been received. The
driver therefore keeps the first fragment mapped for DMA until end of
packet has been asserted, and delays its dma_sync call until then.

The driver tries to fit multiple receive buffers on one page. When using
3K receive buffers (e.g. using Jumbo frames and legacy-rx is turned
off/build_skb is being used) on an architecture with 4K pages, the
driver allocates an order 1 compound page and uses one page per receive
buffer. To determine the correct offset for a delayed DMA sync of the
first fragment of a multi-fragment packet, the driver then cannot just
use PAGE_MASK on the DMA address but has to construct a mask based on
the actual size of the backing page.

Using PAGE_MASK in the 3K RX buffer/4K page architecture configuration
will always sync the first page of a compound page. With the SWIOTLB
enabled this can lead to corrupted packets (zeroed out first fragment,
re-used garbage from another packet) and various consequences, such as
slow/stalling data transfers and connection resets. For example, testing
on a link with MTU exceeding 3058 bytes on a host with SWIOTLB enabled
(e.g. "iommu=soft swiotlb=262144,force") TCP transfers quickly fizzle
out without this patch.

Cc: stable@vger.kernel.org
Fixes: 0c5661ecc5 ("ixgbe: fix crash in build_skb Rx code path")
Signed-off-by: Markus Boehme <markubo@amazon.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:58:41 -07:00
David S. Miller
3389d3027f Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-07-20

This series contains updates to e1000e and igc drivers.

Sasha adds initial S0ix support for devices with CSME and adds polling
for exiting of DPG. He sets the PHY to low power idle when in S0ix. He
also adds support for new device IDs for and adds a space to debug
messaging to help with readability for e1000e.

For igc, he ensures that q_vector array is not accessed beyond its
bounds and removes unneeded PHY related checks.

Tree Davies corrects a spelling mistake in e1000e.

Muhammad corrects the value written when there is no TSN offloading
and adjusts timeout value to avoid possible Tx hang for igc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:57:06 -07:00
Joakim Zhang
41667a933c arm64: dts: imx8mp: change interrupt order per dt-binding
This patch changs interrupt order which found by dtbs_check.

$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/nxp,dwmac-imx.yaml
arch/arm64/boot/dts/freescale/imx8mp-evk.dt.yaml: ethernet@30bf0000: interrupt-names:0: 'macirq' was expected
arch/arm64/boot/dts/freescale/imx8mp-evk.dt.yaml: ethernet@30bf0000: interrupt-names:1: 'eth_wake_irq' was expected

According to Documentation/devicetree/bindings/net/snps,dwmac.yaml, we
should list interrupt in it's order.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:56:37 -07:00
Joakim Zhang
03e85b1703 dt-bindings: net: imx-dwmac: convert imx-dwmac bindings to yaml
In order to automate the verification of DT nodes covert imx-dwmac to
nxp,dwmac-imx.yaml, and pass below checking.

$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/nxp,dwmac-imx.yaml
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/nxp,dwmac-imx.yaml

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:56:37 -07:00
Joakim Zhang
bc71d3ef59 dt-bindings: net: snps,dwmac: add missing DWMAC IP version
Add missing DWMAC IP version in snps,dwmac.yaml which found by below
command, as NXP i.MX8 families support SNPS DWMAC 5.10a IP.

$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/nxp,dwmac-imx.yaml
Documentation/devicetree/bindings/net/nxp,dwmac-imx.example.dt.yaml:
ethernet@30bf0000: compatible: None of ['nxp,imx8mp-dwmac-eqos', 'snps,dwmac-5.10a'] are valid under the given schema

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:56:37 -07:00
Muhammad Husaini Zulkifli
b27b8dc77b igc: Increase timeout value for Speed 100/1000/2500
As the cycle time is set to maximum of 1s, the TX Hang timeout need to
be increase to avoid possible TX Hang.

There is no dedicated number specific in data sheet for the timeout factor.
Timeout factor was determined during the debugging to solve the "Tx Hang"
issues that happen in some cases mainly during ETF(Earliest TxTime First).

This can be test by using TSN Schedule Tx Tools udp_tai sample application.

Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-07-20 16:11:36 -07:00