linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-18 00:24:39 +00:00

Author	SHA1	Message	Date
Nikolay Aleksandrov	5d1fcaf35d	net: bridge: fdb: eliminate extra port state tests from fast-path When commit `df1c0b8468` ("[BRIDGE]: Packets leaking out of disabled/blocked ports.") introduced the port state tests in br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was also used to avoid learning/refreshing from user-space with NTF_USE. Those two tests are done for every packet entering the bridge if it's learning, but for the fast-path we already have them checked in br_handle_frame() and is unnecessary to do it again. Thus push the checks to the unlikely cases and drop them from br_fdb_update(), the new nbp_state_should_learn() helper is used to determine if the port state allows br_fdb_update() to be called. The two places which need to do it manually are: - user-space add call with NTF_USE set - link-local packet learning done in __br_handle_local_finish() Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-04 11:15:27 -08:00
David S. Miller	1574cf83c7	mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl28qcYACgkQSD+KveBX +j4+tgf9HM1EpNeeX/kfZGECMtxHgKuNC4NwpOExKU/OpvUxCNBZb1KXDjaeraE9 8fLvgA/T2cEHfNojJ+S9rRb64KxGw/96ieYimc9aYkIH4L5YEXYcUHj44RRMNYpI miWUQgcWABRvO5JvDMXqG+NIr7ctyodA7Qb3vlFWJvSB8KLNViPfPfPHn+oVDnxR ZBp1CWZXnlwO1ZMYQ2TDY1l0csDIx+awxkYXL3SFqJKBzheDItmQ4Ybw/yh+Mfv3 eS5DJo2rRADHsTDTKSbyaBzcTY1UEWhJW+stYlN0SP9YkH1y2Q3lDeNKQLfK4xkc YwYaCp8ZvLmOtLxwoujAIl2R5sjDpQ== =8kXy -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 19:23:49 -08:00
YueHaibing	a37ac8ae66	mISDN: remove unused variable 'faxmodulation_s' drivers/isdn/hardware/mISDN/mISDNisar.c:30:17: warning: faxmodulation_s defined but not used [-Wunused-const-variable=] It is never used, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 19:10:30 -08:00
Vincent Cheng	3a6ba7dc77	ptp: Add a ptp clock driver for IDT ClockMatrix. The IDT ClockMatrix (TM) family includes integrated devices that provide eight PLL channels. Each PLL channel can be independently configured as a frequency synthesizer, jitter attenuator, digitally controlled oscillator (DCO), or a digital phase lock loop (DPLL). Typically these devices are used as timing references and clock sources for PTP applications. This patch adds support for the device. Co-developed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:35:40 -08:00
Vincent Cheng	5c5e7aac63	dt-bindings: ptp: Add device tree binding for IDT ClockMatrix based PTP clock Add device tree binding doc for the IDT ClockMatrix PTP clock. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:35:40 -08:00
Francesco Ruggeri	fac6fce9bd	net: icmp6: provide input address for traceroute6 traceroute6 output can be confusing, in that it shows the address that a router would use to reach the sender, rather than the address the packet used to reach the router. Consider this case: ------------------------ N2 \| \| ------ ------ N3 ---- \| R1 \| \| R2 \|------\|H2\| ------ ------ ---- \| \| ------------------------ N1 \| ---- \|H1\| ---- where H1's default route is through R1, and R1's default route is through R2 over N2. traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2. The script below can be used to reproduce this scenario. traceroute6 output without this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms 2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms traceroute6 output with this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms 2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms #!/bin/bash # # ------------------------ N2 # \| \| # ------ ------ N3 ---- # \| R1 \| \| R2 \|------\|H2\| # ------ ------ ---- # \| \| # ------------------------ N1 # \| # ---- # \|H1\| # ---- # # N1: 2000:101::/64 # N2: 2000:102::/64 # N3: 2000:103::/64 # # R1's host part of address: 1 # R2's host part of address: 2 # H1's host part of address: 3 # H2's host part of address: 4 # # For example: # the IPv6 address of R1's interface on N2 is 2000:102::1/64 # # Nets are implemented by macvlan interfaces (bridge mode) over # dummy interfaces. # # Create net namespaces ip netns add host1 ip netns add host2 ip netns add rtr1 ip netns add rtr2 # Create nets ip link add net1 type dummy; ip link set net1 up ip link add net2 type dummy; ip link set net2 up ip link add net3 type dummy; ip link set net3 up # Add interfaces to net1, move them to their nemaspaces ip link add link net1 dev host1net1 type macvlan mode bridge ip link set host1net1 netns host1 ip link add link net1 dev rtr1net1 type macvlan mode bridge ip link set rtr1net1 netns rtr1 ip link add link net1 dev rtr2net1 type macvlan mode bridge ip link set rtr2net1 netns rtr2 # Add interfaces to net2, move them to their nemaspaces ip link add link net2 dev rtr1net2 type macvlan mode bridge ip link set rtr1net2 netns rtr1 ip link add link net2 dev rtr2net2 type macvlan mode bridge ip link set rtr2net2 netns rtr2 # Add interfaces to net3, move them to their nemaspaces ip link add link net3 dev rtr2net3 type macvlan mode bridge ip link set rtr2net3 netns rtr2 ip link add link net3 dev host2net3 type macvlan mode bridge ip link set host2net3 netns host2 # Configure interfaces and routes in host1 ip netns exec host1 ip link set lo up ip netns exec host1 ip link set host1net1 up ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1 ip netns exec host1 ip -6 route add default via 2000:101::1 # Configure interfaces and routes in rtr1 ip netns exec rtr1 ip link set lo up ip netns exec rtr1 ip link set rtr1net1 up ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1 ip netns exec rtr1 ip link set rtr1net2 up ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2 ip netns exec rtr1 ip -6 route add default via 2000:102::2 ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in rtr2 ip netns exec rtr2 ip link set lo up ip netns exec rtr2 ip link set rtr2net1 up ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1 ip netns exec rtr2 ip link set rtr2net2 up ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2 ip netns exec rtr2 ip link set rtr2net3 up ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3 ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in host2 ip netns exec host2 ip link set lo up ip netns exec host2 ip link set host2net3 up ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3 ip netns exec host2 ip -6 route add default via 2000:103::2 # Ping host2 from host1 ip netns exec host1 ping6 -c5 2000:103::4 # Traceroute host2 from host1 ip netns exec host1 traceroute6 2000:103::4 # Delete nets ip link del net3 ip link del net2 ip link del net1 # Delete namespaces ip netns del rtr2 ip netns del rtr1 ip netns del host2 ip netns del host1 Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Original-patch-by: Honggang Xu <hxu@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:26:53 -08:00
Tuong Lien	06e7c70c6e	tipc: improve message bundling algorithm As mentioned in commit `e95584a889` ("tipc: fix unlimited bundling of small messages"), the current message bundling algorithm is inefficient that can generate bundles of only one payload message, that causes unnecessary overheads for both the sender and receiver. This commit re-designs the 'tipc_msg_make_bundle()' function (now named as 'tipc_msg_try_bundle()'), so that when a message comes at the first place, we will just check & keep a reference to it if the message is suitable for bundling. The message buffer will be put into the link backlog queue and processed as normal. Later on, when another one comes we will make a bundle with the first message if possible and so on... This way, a bundle if really needed will always consist of at least two payload messages. Otherwise, we let the first buffer go its way without any need of bundling, so reduce the overheads to zero. Moreover, since now we have both the messages in hand, we can even optimize the 'tipc_msg_bundle()' function, make bundle of a very large (size ~ MSS) and small messages which is not with the current algorithm e.g. [1400-byte message] + [10-byte message] (MTU = 1500). Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:26:15 -08:00
Francesco Ruggeri	2adf81c0f7	net: icmp: use input address in traceroute Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the primary address of the interface the packet was received on, even if the path goes through a secondary address. In the example: 1.0.3.1/24 ---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ---- \|H1\|--------------------------\|R1\|--------------------------\|H2\| ---- N1 ---- N2 ---- where 1.0.3.1/24 is R1's primary address on N1, traceroute from H1 to H2 returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms After applying this patch, it returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms Original-patch-by: Bill Fenner <fenner@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:25:18 -08:00
David S. Miller	c219a16622	Merge branch 'optimize-openvswitch-flow-looking-up' Tonghao Zhang says: ==================== optimize openvswitch flow looking up This series patch optimize openvswitch for performance or simplify codes. Patch 1, 2, 4: Port Pravin B Shelar patches to linux upstream with little changes. Patch 5, 6, 7: Optimize the flow looking up and simplify the flow hash. Patch 8, 9: are bugfix. The performance test is on Intel Xeon E5-2630 v4. The test topology is show as below: +-----------------------------------+ \| +---------------------------+ \| \| \| eth0 ovs-switch eth1 \| \| Host0 \| +---------------------------+ \| +-----------------------------------+ ^ \| \| \| \| \| \| \| \| v +-----+----+ +----+-----+ \| netperf \| Host1 \| netserver\| Host2 +----------+ +----------+ We use netperf send the 64B packets, and insert 255+ flow-mask: $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2 ... $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2 $ $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18 * Without series patch, throughput 8.28Mbps * With series patch, throughput 46.05Mbps v6: some coding style fixes v5: rewrite patch 8, release flow-mask when freeing flow v4: access ma->count with READ_ONCE/WRITE_ONCE API. More information, see patch 5 comments. v3: update ma point when realloc mask_array in patch 5 v2: simplify codes. e.g. use kfree_rcu instead of call_rcu ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	eec62eadd1	net: openvswitch: simplify the ovs_dp_cmd_new use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	4c76bf696a	net: openvswitch: don't unlock mutex when changing the user_features fails Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: `95a7233c4` ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	50b0e61b32	net: openvswitch: fix possible memleak on destroy flow-table When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	0a3e01371d	net: openvswitch: add likely in flow_lookup The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	515b65a4b9	net: openvswitch: simplify the flow_hash Simplify the code and remove the unnecessary BUILD_BUG_ON. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	57f7d7b916	net: openvswitch: optimize flow-mask looking up The full looking up on flow table traverses all mask array. If mask-array is too large, the number of invalid flow-mask increase, performance will be drop. One bad case, for example: M means flow-mask is valid and NULL of flow-mask means deleted. +-------------------------------------------+ \| M \| NULL \| ... \| NULL \| M\| +-------------------------------------------+ In that case, without this patch, openvswitch will traverses all mask array, because there will be one flow-mask in the tail. This patch changes the way of flow-mask inserting and deleting, and the mask array will be keep as below: there is not a NULL hole. In the fast path, we can "break" "for" (not "continue") in flow_lookup when we get a NULL flow-mask. "break" v +-------------------------------------------+ \| M \| M \| NULL \|... \| NULL \| NULL\| +-------------------------------------------+ This patch don't optimize slow or control path, still using ma->max to traverse. Slow path: * tbl_mask_array_realloc * ovs_flow_tbl_lookup_exact * flow_mask_find Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	a7f35e78e7	net: openvswitch: optimize flow mask cache hash collision Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| In case hash collision on mask cache, OVS does extra flow \| lookup. Following patch avoid it. Link: `0e6efbe271` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	1689754de6	net: openvswitch: shrink the mask array if necessary When creating and inserting flow-mask, if there is no available flow-mask, we realloc the mask array. When removing flow-mask, if necessary, we shrink mask array. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	4bc63b1b53	net: openvswitch: convert mask list in mask array Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| mask caches index of mask in mask_list. On packet recv OVS \| need to traverse mask-list to get cached mask. Therefore array \| is better for retrieving cached mask. This also allows better \| cache replacement algorithm by directly checking mask's existence. Link: `d49fc3ff53` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	04b7d136d0	net: openvswitch: add flow-mask cache for performance The idea of this optimization comes from a patch which is committed in 2014, openvswitch community. The author is Pravin B Shelar. In order to get high performance, I implement it again. Later patches will use it. Pravin B Shelar, says: \| On every packet OVS needs to lookup flow-table with every \| mask until it finds a match. The packet flow-key is first \| masked with mask in the list and then the masked key is \| looked up in flow-table. Therefore number of masks can \| affect packet processing performance. Link: `5604935e4e` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
David S. Miller	ae8a76fb8b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-11-02 The following pull-request contains BPF updates for your net-next tree. We've added 30 non-merge commits during the last 7 day(s) which contain a total of 41 files changed, 1864 insertions(+), 474 deletions(-). The main changes are: 1) Fix long standing user vs kernel access issue by introducing bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel. 2) Accelerated xskmap lookup, from Björn and Maciej. 3) Support for automatic map pinning in libbpf, from Toke. 4) Cleanup of BTF-enabled raw tracepoints, from Alexei. 5) Various fixes to libbpf and selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 15:29:58 -07:00
David S. Miller	d31e95585c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 13:54:56 -07:00
Alexei Starovoitov	358fdb4562	Merge branch 'bpf_probe_read_user' Daniel Borkmann says: ==================== This set adds probe_read_{user,kernel}(), probe_read_str_{user,kernel}() helpers, fixes probe_write_user() helper and selftests. For details please see individual patches. Thanks! v2 -> v3: - noticed two more things that are fixed in here: - bpf uapi helper description used 'int size' for *_str helpers, now u32 - we need TASK_SIZE_MAX + guard page on x86-64 in patch 2 otherwise we'll trigger the `00c42373d3` warn as well, so full range covered now v1 -> v2: - standardize unsafe_ptr terminology in uapi header comment (Andrii) - probe_read_{user,kernel}[_str] naming scheme (Andrii) - use global data in last test case, remove relaxed_maps (Andrii) - add strict non-pagefault kernel read funcs to avoid warning in kernel probe read helpers (Alexei) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-02 12:45:15 -07:00
Daniel Borkmann	fa553d9b57	bpf, testing: Add selftest to read/write sockaddr from user space Tested on x86-64 and Ilya was also kind enough to give it a spin on s390x, both passing with probe_user:OK there. The test is using the newly added bpf_probe_read_user() to dump sockaddr from connect call into .bss BPF map and overrides the user buffer via bpf_probe_write_user(): # ./test_progs [...] #17 pkt_md_access:OK #18 probe_user:OK #19 prog_run_xattr:OK [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/90f449d8af25354e05080e82fc6e2d3179da30ea.1572649915.git.daniel@iogearbox.net	2019-11-02 12:45:08 -07:00
Daniel Borkmann	50f9aa44ca	bpf, testing: Convert prog tests to probe_read_{user, kernel}{, _str} helper Use probe read *_{kernel,user}{,_str}() helpers instead of bpf_probe_read() or bpf_probe_read_user_str() for program tests where appropriate. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/4a61d4b71ce3765587d8ef5cb93afa18515e5b3e.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:13 -07:00
Daniel Borkmann	251e2d337a	bpf, samples: Use bpf_probe_read_user where appropriate Use bpf_probe_read_user() helper instead of bpf_probe_read() for samples that attach to kprobes probing on user addresses. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/5b0144b3f8e031ec5e2438bd7de8d7877e63bf2f.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:13 -07:00
Daniel Borkmann	6e07a63412	bpf: Switch BPF probe insns to bpf_probe_read_kernel Commit `2a02759ef5` ("bpf: Add support for BTF pointers to interpreter") explicitly states that the pointer to BTF object is a pointer to a kernel object or NULL. Therefore we should also switch to using the strict kernel probe helper which is restricted to kernel addresses only when architectures have non-overlapping address spaces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/d2b90827837685424a4b8008dfe0460558abfada.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Daniel Borkmann	6ae08ae3de	bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers The current bpf_probe_read() and bpf_probe_read_str() helpers are broken in that they assume they can be used for probing memory access for kernel space addresses /as well as/ user space addresses. However, plain use of probe_kernel_read() for both cases will attempt to always access kernel space address space given access is performed under KERNEL_DS and some archs in-fact have overlapping address spaces where a kernel pointer and user pointer would have the /same/ address value and therefore accessing application memory via bpf_probe_read{,_str}() would read garbage values. Lets fix BPF side by making use of recently added `3d7081822f` ("uaccess: Add non-pagefault user-space read functions"). Unfortunately, the only way to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}() helpers are kept as-is to retain their current behavior. The two _user() variants attempt the access always under USER_DS set, the two _kernel() variants will -EFAULT when accessing user memory if the underlying architecture has non-overlapping address ranges, also avoiding throwing the kernel warning via `00c42373d3` ("x86-64: add warning for non-canonical user access address dereferences"). Fixes: `a5e8c07059` ("bpf: add bpf_probe_read_str helper") Fixes: `2541517c32` ("tracing, perf: Implement BPF programs attached to kprobes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Daniel Borkmann	eb1b668874	bpf: Make use of probe_user_write in probe write helper Convert the bpf_probe_write_user() helper to probe_user_write() such that writes are not attempted under KERNEL_DS anymore which is buggy as kernel and user space pointers can have overlapping addresses. Also, given we have the access_ok() check inside probe_user_write(), the helper doesn't need to do it twice. Fixes: `96ae522795` ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Daniel Borkmann	75a1a607bb	uaccess: Add strict non-pagefault kernel-space read function Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict() helpers which by default alias to the __probe_kernel_read() and the __strncpy_from_unsafe(), respectively, but can be overridden by archs which have non-overlapping address ranges for kernel space and user space in order to bail out with -EFAULT when attempting to probe user memory including non-canonical user access addresses [0]: 4-level page tables: user-space mem: 0x0000000000000000 - 0x00007fffffffffff non-canonical: 0x0000800000000000 - 0xffff7fffffffffff 5-level page tables: user-space mem: 0x0000000000000000 - 0x00ffffffffffffff non-canonical: 0x0100000000000000 - 0xfeffffffffffffff The idea is that these helpers are complementary to the probe_user_read() and strncpy_from_unsafe_user() which probe user-only memory. Both added helpers here do the same, but for kernel-only addresses. Both set of helpers are going to be used for BPF tracing. They also explicitly avoid throwing the splat for non-canonical user addresses from `00c42373d3` ("x86-64: add warning for non-canonical user access address dereferences"). For compat, the current probe_kernel_read() and strncpy_from_unsafe() are left as-is. [0] Documentation/x86/x86_64/mm.txt Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: x86@kernel.org Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Daniel Borkmann	1d1585ca0f	uaccess: Add non-pagefault user-space write function Commit `3d7081822f` ("uaccess: Add non-pagefault user-space read functions") missed to add probe write function, therefore factor out a probe_write_common() helper with most logic of probe_kernel_write() except setting KERNEL_DS, and add a new probe_user_write() helper so it can be used from BPF side. Again, on some archs, the user address space and kernel address space can co-exist and be overlapping, so in such case, setting KERNEL_DS would mean that the given address is treated as being in kernel address space. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Alexei Starovoitov	e1cb7d2d60	Merge branch 'map-pinning' Toke Høiland-Jørgensen says: ==================== This series adds support to libbpf for reading 'pinning' settings from BTF-based map definitions. It introduces a new open option which can set the pinning path; if no path is set, /sys/fs/bpf is used as the default. Callers can customise the pinning between open and load by setting the pin path per map, and still get the automatic reuse feature. The semantics of the pinning is similar to the iproute2 "PIN_GLOBAL" setting, and the eventual goal is to move the iproute2 implementation to be based on libbpf and the functions introduced in this series. Changelog: v6: - Fix leak of struct bpf_object in selftest - Make struct bpf_map arg const in bpf_map__is_pinned() and bpf_map__get_pin_path() v5: - Don't pin maps with pinning set, but with a value of LIBBPF_PIN_NONE - Add a few more selftests: - Should not pin map with pinning set, but value LIBBPF_PIN_NONE - Should fail to load a map with an invalid pinning value - Should fail to re-use maps with parameter mismatch - Alphabetise libbpf.map - Whitespace and typo fixes v4: - Don't check key_type_id and value_type_id when checking for map reuse compatibility. - Move building of map->pin_path into init_user_btf_map() - Get rid of 'pinning' attribute in struct bpf_map - Make sure we also create parent directory on auto-pin (new patch 3). - Abort the selftest on error instead of attempting to continue. - Support unpinning all pinned maps with bpf_object__unpin_maps(obj, NULL) - Support pinning at map->pin_path with bpf_object__pin_maps(obj, NULL) - Make re-pinning a map at the same path a noop - Rename the open option to pin_root_path - Add a bunch more self-tests for pin_maps(NULL) and unpin_maps(NULL) - Fix a couple of smaller nits v3: - Drop bpf_object__pin_maps_opts() and just use an open option to customise the pin path; also don't touch bpf_object__{un,}pin_maps() - Integrate pinning and reuse into bpf_object__create_maps() instead of having multiple loops though the map structure - Make errors in map reuse and pinning fatal to the load procedure - Add selftest to exercise pinning feature - Rebase series to latest bpf-next v2: - Drop patch that adds mounting of bpffs - Only support a single value of the pinning attribute - Add patch to fixup error handling in reuse_fd() - Implement the full automatic pinning and map reuse logic on load ==================== Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-02 12:35:12 -07:00
Toke Høiland-Jørgensen	2f4a32cc83	selftests: Add tests for automatic map pinning This adds a new BPF selftest to exercise the new automatic map pinning code. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269298209.394725.15420085139296213182.stgit@toke.dk	2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen	57a00f4164	libbpf: Add auto-pinning of maps when loading BPF objects This adds support to libbpf for setting map pinning information as part of the BTF map declaration, to get automatic map pinning (and reuse) on load. The pinning type currently only supports a single PIN_BY_NAME mode, where each map will be pinned by its name in a path that can be overridden, but defaults to /sys/fs/bpf. Since auto-pinning only does something if any maps actually have a 'pinning' BTF attribute set, we default the new option to enabled, on the assumption that seamless pinning is what most callers want. When a map has a pin_path set at load time, libbpf will compare the map pinned at that location (if any), and if the attributes match, will re-use that map instead of creating a new one. If no existing map is found, the newly created map will instead be pinned at the location. Programs wanting to customise the pinning can override the pinning paths using bpf_map__set_pin_path() before calling bpf_object__load() (including setting it to NULL to disable pinning of a particular map). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269298092.394725.3966306029218559681.stgit@toke.dk	2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen	196f8487f5	libbpf: Move directory creation into _pin() functions The existing pin_*() functions all try to create the parent directory before pinning. Move this check into the per-object _pin() functions instead. This ensures consistent behaviour when auto-pinning is added (which doesn't go through the top-level pin_maps() function), at the cost of a few more calls to mkdir(). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297985.394725.5882630952992598610.stgit@toke.dk	2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen	4580b25fce	libbpf: Store map pin path and status in struct bpf_map Support storing and setting a pin path in struct bpf_map, which can be used for automatic pinning. Also store the pin status so we can avoid attempts to re-pin a map that has already been pinned (or reused from a previous pinning). The behaviour of bpf_object__{un,}pin_maps() is changed so that if it is called with a NULL path argument (which was previously illegal), it will (un)pin only those maps that have a pin_path set. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297876.394725.14782206533681896279.stgit@toke.dk	2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen	d1b4574a4b	libbpf: Fix error handling in bpf_map__reuse_fd() bpf_map__reuse_fd() was calling close() in the error path before returning an error value based on errno. However, close can change errno, so that can lead to potentially misleading error messages. Instead, explicitly store errno in the err variable before each goto. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297769.394725.12634985106772698611.stgit@toke.dk	2019-11-02 12:35:06 -07:00
Linus Torvalds	1204c70d9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix free/alloc races in batmanadv, from Sven Eckelmann. 2) Several leaks and other fixes in kTLS support of mlx5 driver, from Tariq Toukan. 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke Høiland-Jørgensen. 4) Add an r8152 device ID, from Kazutoshi Noguchi. 5) Missing include in ipv6's addrconf.c, from Ben Dooks. 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can easily infer the 32-bit secret otherwise etc. 7) Several netdevice nesting depth fixes from Taehee Yoo. 8) Fix several KCSAN reported errors, from Eric Dumazet. For example, when doing lockless skb_queue_empty() checks, and accessing sk_napi_id/sk_incoming_cpu lockless as well. 9) Fix jumbo packet handling in RXRPC, from David Howells. 10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet. 11) Fix DMA synchronization in gve driver, from Yangchun Fu. 12) Several bpf offload fixes, from Jakub Kicinski. 13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo. 14) Fix ping latency during high traffic rates in hisilicon driver, from Jiangfent Xiao. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) net: fix installing orphaned programs net: cls_bpf: fix NULL deref on offload filter removal selftests: bpf: Skip write only files in debugfs selftests: net: reuseport_dualstack: fix uninitalized parameter r8169: fix wrong PHY ID issue with RTL8168dp net: dsa: bcm_sf2: Fix IMP setup for port different than 8 net: phylink: Fix phylink_dbg() macro gve: Fixes DMA synchronization. inet: stop leaking jiffies on the wire ixgbe: Remove duplicate clear_bit() call Documentation: networking: device drivers: Remove stray asterisks e1000: fix memory leaks i40e: Fix receive buffer starvation for AF_XDP igb: Fix constant media auto sense switching when no cable is connected net: ethernet: arc: add the missed clk_disable_unprepare igb: Enable media autosense for the i350. igb/igc: Don't warn on fatal read failures when the device is removed tcp: increase tcp_max_syn_backlog max value net: increase SOMAXCONN to 4096 netdevsim: Fix use-after-free during device dismantle ...	2019-11-01 17:48:11 -07:00
Linus Torvalds	372bf6c1c8	NFS Client Bugfixes for Linux 5.4-rc6 Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl28mZ4ACgkQ18tUv7Cl QOvAAxAA5h15NJ8OTOpr/kyWQAHEYWIEWQyTNYT8Gi/TI/YCfauET6kxAKg7ETk+ Bx6+A7Q5OlKSqFZXf2UMMIHPnvH7Ar06uIof6DLjWxA9N/q57SNW0YszQCUhvWjp 4Ry7d9A2yNrCizEk/vTmY5zfFIo2S88AvwkQ6gLCEyfIn5REWQyQnS4SS3vL7MuS E/iWBsXFV2C9/yON+zzZnC0ewMIPbtWA3/CsVI2zDKLsHKvBYOx6n7TbfTtMtc/p mCdgLzdEEZlN0KOCzzMp7ziME8y1qUGX6eo17Jrr0ZBZIM7o5d7sYMA88Apu6dMg MQM5iEEAf//kmHLWzQ6yg8XZEZrXFMOiGKYu0kWt4pVlqi78gemEfE6T8dClsB5m P7jqn/4ntDlJPWPciImsVEzDOHlzlZBIXwZs1JjufeO/hIB570cs80PjeDQP0id8 Cx3yR7Hr1ldKzwSwpWrtIaqEPljJJw0jhxQ7YYlvfbA1Ogd4ek4QU4RuvyW11CRr d/5Oaa7xvjGs63klu8HpZG4NHFby3PPmZT6t4xUcH75MR14S15YCGtaUvPJ7ORck C6tZJAkVtJqSoAjsGuBtTc5qiOdo+hMRghutW4Pd6UpnPYJKwQUtTqp3MbfM4+bA S19tGG6qn+7cl7bW4NJpNy8HfbC7BdS+m7maHpQfGIdMsxKVFoI= =ttcU -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client bugfixes from Anna Schumaker: "This contains two delegation fixes (with the RCU lock leak fix marked for stable), and three patches to fix destroying the the sunrpc back channel. Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation" * tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() NFSv4: Don't allow a cached open with a revoked delegation SUNRPC: Destroy the back channel when we destroy the host transport SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding SUNRPC: The TCP back channel mustn't disappear while requests are outstanding	2019-11-01 17:37:44 -07:00
Linus Torvalds	0821de2896	for-linus-20191101 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl28lRQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjtbEADbrMXdsdPV2CApxSiZIaWO1mR78yy/btxp cHsJ+avaPGxhNukSsose2KWm656SriH/OfQspqtvzDpslbu40V41+vSqSknqGRPr 8jW5efZIAy6dq0FjbtBnmIV6PhC5d4F/nAEQbsnVRn8RSr3OwQcm8/smpSFA8urI oHVU8jiyLsQiSbDvjf2KPhPYhWBHO0W5SyGo29HY8pSzQpsMzGkQ6TcHL4EzwPZP WPtGglr14v8rMyhNMxUdHZ9eHCMq7uufFPuyJXzesE/qyM+H8p2pwwxyfflHGZil w2vxLJRu8d4UIkHEkNbC0bydXJ+eCtRMBZON1ZGdrZwQ58L9AbBPBZmxKb0LkmHb 4tc/yQm/0kSUUXwFtDoUoIBFjjy36Pl5BsLt4n5fofsl04myhm5CLqZ8oWxyU0vO sCinJwk1+eQO/tbQVDfven+MroNlYVPCnXhIe/12/wEba3EJ7Ab4X5p0lJoJ1oY7 9dQyY6+BaHd4wV9p0domOP5y7dJnXM9k46EF0/5YoNjoqaH5MWPMq355VH2xNjdw 5HzRcZfvOAlXASrnXuQAAQAdR2b+s/iFZaNKA7bTZxjNPvYE0zySCMeQeNXmfVKe CrDuwViWukwIzETDZHYqMWJxOV4nyOeL3jTo7rQp5A5TEWwBiJKQ4aGBif2eqc+L Mk41ziQGuQ== =+rar -----END PGP SIGNATURE----- Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Two small nvme fixes, one is a fabrics connection fix, the other one a cleanup made possible by that fix (Anton, via Keith) - Fix requeue handling in umb ubd (Anton) - Fix spin_lock_irq() nesting in blk-iocost (Dan) - Three small io_uring fixes: - Install io_uring fd after done with ctx (me) - Clear ->result before every poll issue (me) - Fix leak of shadow request on error (Pavel) * tag 'for-linus-20191101' of git://git.kernel.dk/linux-block: iocost: don't nest spin_lock_irq in ioc_weight_write() io_uring: ensure we clear io_kiocb->result before each issue um-ubd: Entrust re-queue to the upper layers nvme-multipath: remove unused groups_only mode in ana log nvme-multipath: fix possible io hang after ctrl reconnect io_uring: don't touch ctx in setup after ring fd install io_uring: Fix leaked shadow_req	2019-11-01 17:33:12 -07:00
Linus Torvalds	e5897c7d2e	RISC-V updates for v5.4-rc6 One fix for PCIe users: - Fix legacy PCI I/O port access emulation One set of cleanups: - Resolve most of the warnings generated by sparse across arch/riscv. No functional changes And one MAINTAINERS update: - Update Palmer's E-mail address -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl28yFoACgkQx4+xDQu9 KkvNSQ/8DpNzyFysNQZQEwbh3s9VOl2mWpie2Ym/ND2KJHL4zavFURwX5wO7QY7I a6H1fYZ3yjw9JOmUsCj/0ia2gzvAlrAPpy8Vd3hO/MPTXvxqPYJ0CC325Spn4aG+ qYJHJWH/kBKhaaWDVa4wtOok35fx2F9UD352BZjyJfmvjqLKK0H+iflVId8ZrKPl hM8q4uO75b9NJUFakXMYAfYSTAzDe2qleNQzfJMlBRNjfxyx6E39O6U0VYNmIYXp c+U8OxfFKvJ0aU3JMLRxvt1gZRhsVk3lpScqaRvSRhtbUBcP6ya/hvySIDg3T+wK b7k3x0uINMJqxlAu/akw2X3QPFfqrQUxpXP80qZW+duGSEPsOeICGFvzi0gOn80F vq+SjVv3pXLxt1psd4Y6Os1kofFki+/VpUmtaQQwnBIHM3U7RomFnfSgCF2H/mPY SdYrH5F4RHVzptvG3MBSFqrs+QycPE2NJUdMKz794VQ4qiDqlpV1HKAO1Fbbw7KA b0eUd4MatHp2KSRG7YyT3RpCAKSyz7Ar9yu5rGr/I/J40PiPzHtR0d4FK0mQfiZw GBxvwUlErswgFrrmD3JMnWaFTEa+kzGKiYDgunjazGzJ8W06i3la0wWVcHXVdbj+ K6SWla67dTw+N45U3Cs1WvtdgVK9QpQkl14X5HLynqUviw546ng= =hkRB -----END PGP SIGNATURE----- Merge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "One fix for PCIe users: - Fix legacy PCI I/O port access emulation One set of cleanups: - Resolve most of the warnings generated by sparse across arch/riscv. No functional changes And one MAINTAINERS update: - Update Palmer's E-mail address" * tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: MAINTAINERS: Change to my personal email address RISC-V: Add PCIe I/O BAR memory mapping riscv: for C functions called only from assembly, mark with __visible riscv: fp: add missing __user pointer annotations riscv: add missing header file includes riscv: mark some code and data as file-static riscv: init: merge split string literals in preprocessor directive riscv: add prototypes for assembly language functions from head.S	2019-11-01 17:20:53 -07:00
Daniel Borkmann	78db77fab1	Merge branch 'bpf-xskmap-perf-improvements' Björn Töpel says: ==================== This set consists of three patches from Maciej and myself which are optimizing the XSKMAP lookups. In the first patch, the sockets are moved to be stored at the tail of the struct xsk_map. The second patch, Maciej implements map_gen_lookup() for XSKMAP. The third patch, introduced in this revision, moves various XSKMAP functions, to permit the compiler to do more aggressive inlining. Based on the XDP program from tools/lib/bpf/xsk.c where bpf_map_lookup_elem() is explicitly called, this work yields a 5% improvement for xdpsock's rxdrop scenario. The last patch yields 2% improvement. Jonathan's Acked-by: for patch 1 and 2 was carried on. Note that the overflow checks are done in the bpf_map_area_alloc() and bpf_map_charge_init() functions, which was fixed in commit `ff1c08e1f7` ("bpf: Change size to u64 for bpf_map_{area_alloc, charge_init}()"). [1] https://patchwork.ozlabs.org/patch/1186170/ v1->v2: * Change size/cost to size_t and use {struct, array}_size where appropriate. (Jakub) v2->v3: * Proper commit message for patch 2. v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub) * Introduced patch 3. v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct pointer-sized loads. (Daniel) ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-11-02 00:38:59 +01:00
Björn Töpel	d817991cc7	xsk: Restructure/inline XSKMAP lookup/redirect/flush In this commit the XSKMAP entry lookup function used by the XDP redirect code is moved from the xskmap.c file to the xdp_sock.h header, so the lookup can be inlined from, e.g., the bpf_xdp_redirect_map() function. Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush() functions. Finally, all the XDP socket functions were moved from linux/bpf.h to net/xdp_sock.h, where most of the XDP sockets functions are anyway. This yields a ~2% performance boost for the xdpsock "rx_drop" scenario. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com	2019-11-02 00:38:49 +01:00
Maciej Fijalkowski	e65650f291	bpf: Implement map_gen_lookup() callback for XSKMAP Inline the xsk_map_lookup_elem() via implementing the map_gen_lookup() callback. This results in emitting the bpf instructions in place of bpf_map_lookup_elem() helper call and better performance of bpf programs. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com	2019-11-02 00:38:49 +01:00
Björn Töpel	64fe8c061d	xsk: Store struct xdp_sock as a flexible array member of the XSKMAP Prior this commit, the array storing XDP socket instances were stored in a separate allocated array of the XSKMAP. Now, we store the sockets as a flexible array member in a similar fashion as the arraymap. Doing so, we do less pointer chasing in the lookup. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com	2019-11-02 00:38:49 +01:00
Linus Torvalds	31408fbe33	Merge branch 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Fix a parisc kernel crash with ftrace functions when compiled without frame pointers" * 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: fix frame pointer in ftrace_regs_caller()	2019-11-01 15:16:25 -07:00
David S. Miller	aeb1b85c34	Merge branch 'fix-BPF-offload-related-bugs' Jakub Kicinski says: ==================== fix BPF offload related bugs test_offload.py catches some recently added bugs. First of a bug in test_offload.py itself after recent changes to netdevsim is fixed. Second patch fixes a bug in cls_bpf, and last one addresses a problem with the recently added XDP installation optimization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 15:16:01 -07:00
Jakub Kicinski	aefc3e723a	net: fix installing orphaned programs When netdevice with offloaded BPF programs is destroyed the programs are orphaned and removed from the program IDA - their IDs get released (the programs may remain accessible via existing open file descriptors and pinned files). After IDs are released they are set to 0. This confuses dev_change_xdp_fd() because it compares the __dev_xdp_query() result where 0 means no program with prog->aux->id where 0 means orphaned. dev_change_xdp_fd() would have incorrectly returned success even though it had not installed the program. Since drivers already catch this case via bpf_offload_dev_match() let them handle this case. The error message drivers produce in this case ("program loaded for a different device") is in fact correct as the orphaned program must had to be loaded for a different device. Fixes: `c14a9f633d` ("net: Don't call XDP_SETUP_PROG when nothing is changed") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 15:16:01 -07:00
Jakub Kicinski	41aa29a58b	net: cls_bpf: fix NULL deref on offload filter removal Commit `4011921137` ("net: sched: refactor block offloads counter usage") missed the fact that either new prog or old prog may be NULL. Fixes: `4011921137` ("net: sched: refactor block offloads counter usage") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 15:16:01 -07:00
Jakub Kicinski	8101e06941	selftests: bpf: Skip write only files in debugfs DebugFS for netdevsim now contains some "action trigger" files which are write only. Don't try to capture the contents of those. Note that we can't use os.access() because the script requires root. Fixes: `4418f862d6` ("netdevsim: implement support for devlink region and snapshots") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 15:16:01 -07:00
Wei Wang	d64479a3e3	selftests: net: reuseport_dualstack: fix uninitalized parameter This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN) occasionally due to the uninitialized length parameter. Initialize it to fix this, and also use int for "test_family" to comply with the API standard. Fixes: `d6a61f80b8` ("soreuseport: test mixed v4/v6 sockets") Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Craig Gallek <cgallek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 15:11:02 -07:00

1 2 3 4 5 ...

873482 commits