linux-stable/net/ipv4
David Ahern f415c26417 ipv4: Update exception handling for multipath routes via same device
[ Upstream commit 2fbc6e89b2 ]

Kfir reported that pmtu exceptions are not created properly for
deployments where multipath routes use the same device.

After some digging I see 2 compounding problems:
1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
   the route lookup. This is the second use case where this has
   been a problem (the first is related to use of vti devices with
   VRF). I can not find any reason for the oif to be changed after the
   lookup; the code goes back to the start of git. It does not seem
   logical so remove it.

2. fib_lookups for exceptions do not call fib_select_path to handle
   multipath route selection based on the hash.

The end result is that the fib_lookup used to add the exception
always creates it based using the first leg of the route.

An example topology showing the problem:

                 |  host1
             +------+
             | eth0 |  .209
             +------+
                 |
             +------+
     switch  | br0  |
             +------+
                 |
       +---------+---------+
       | host2             |  host3
   +------+             +------+
   | eth0 | .250        | eth0 | 192.168.252.252
   +------+             +------+

   +-----+             +-----+
   | vti | .2          | vti | 192.168.247.3
   +-----+             +-----+
       \                  /
 =================================
 tunnels
         192.168.247.1/24

for h in host1 host2 host3; do
        ip netns add ${h}
        ip -netns ${h} link set lo up
        ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
done

ip netns add switch
ip -netns switch li set lo up
ip -netns switch link add br0 type bridge stp 0
ip -netns switch link set br0 up

for n in 1 2 3; do
        ip -netns switch link add eth-sw type veth peer name eth-h${n}
        ip -netns switch li set eth-h${n} master br0 up
        ip -netns switch li set eth-sw netns host${n} name eth0
done

ip -netns host1 addr add 192.168.252.209/24 dev eth0
ip -netns host1 link set dev eth0 up
ip -netns host1 route add 192.168.247.0/24 \
        nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0

ip -netns host2 addr add 192.168.252.250/24 dev eth0
ip -netns host2 link set dev eth0 up

ip -netns host2 addr add 192.168.252.252/24 dev eth0
ip -netns host3 link set dev eth0 up

ip netns add tunnel
ip -netns tunnel li set lo up
ip -netns tunnel li add br0 type bridge
ip -netns tunnel li set br0 up
for n in $(seq 11 20); do
        ip -netns tunnel addr add dev br0 192.168.247.${n}/24
done

for n in 2 3
do
        ip -netns tunnel link add vti${n} type veth peer name eth${n}
        ip -netns tunnel link set eth${n} mtu 1360 master br0 up
        ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
        ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
done
ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3

ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
ip -netns host1 ro ls cache

Before this patch the cache always shows exceptions against the first
leg in the multipath route; 192.168.252.250 per this example. Since the
hash has an initial random seed, you may need to vary the final octet
more than what is listed. In my tests, using addresses between 11 and 19
usually found 1 that used both legs.

With this patch, the cache will have exceptions for both legs.

Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions")
Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01 13:12:25 +02:00
..
netfilter netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code 2020-06-03 08:18:09 +02:00
af_inet.c gso_segment: Reset skb->mac_len after modifying network header 2018-09-29 03:06:00 -07:00
ah4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-06-23 14:17:31 -04:00
arp.c arp: fix arp_filter on l3slave devices 2018-04-12 12:32:22 +02:00
cipso_ipv4.c netlabel: cope with NULL catmap 2020-05-20 08:17:12 +02:00
datagram.c inet: stop leaking jiffies on the wire 2019-11-10 11:25:37 +01:00
devinet.c devinet: fix memleak in inetdev_init() 2020-06-11 09:22:58 +02:00
esp4.c esp4: add length check for UDP encapsulation 2019-05-25 18:25:34 +02:00
esp4_offload.c esp: Fix GRO when the headers not fully in the linear part of the skb. 2018-02-25 11:07:46 +01:00
fib_frontend.c ipv4: Return error for RTA_VIA attribute 2019-03-13 14:03:09 -07:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c net: fib_rules: Implement notification logic in core 2017-08-03 15:35:59 -07:00
fib_semantics.c net: Fix the arp error in some cases 2020-06-30 15:38:00 -04:00
fib_trie.c ipv4: Silence suspicious RCU usage warning 2020-08-21 09:48:00 +02:00
fou.c net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv 2019-04-27 09:35:34 +02:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-20 10:54:07 +01:00
gre_offload.c net: gre: recompute gre csum for sctp over gre tunnels 2020-08-21 09:48:01 +02:00
icmp.c net: icmp: fix data-race in cmp_global_allow() 2020-01-04 14:00:08 +01:00
igmp.c igmp: fix memory leak in igmpv3_del_delrec() 2019-07-31 07:28:44 +02:00
inet_connection_sock.c net: refactor bind_bucket fastreuse into helper 2020-08-21 09:48:14 +02:00
inet_diag.c inet_diag: return classid for all socket types 2020-03-20 10:54:13 +01:00
inet_fragment.c net: IP defrag: encapsulate rbtree defrag code into callable functions 2019-04-27 09:35:40 +02:00
inet_hashtables.c net: initialize fastreuse on inet_inherit_port 2020-08-21 09:48:16 +02:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-05-16 10:10:24 +02:00
inetpeer.c inetpeer: fix data-race in inet_putpeer / inet_putpeer 2020-01-04 14:00:07 +01:00
ip_forward.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_fragment.c net: IP defrag: encapsulate rbtree defrag code into callable functions 2019-04-27 09:35:40 +02:00
ip_gre.c ip_gre: fix parsing gre header in ipgre_err 2019-11-20 18:00:02 +01:00
ip_input.c vrf: check accept_source_route on the original netdevice 2019-04-17 08:37:48 +02:00
ip_options.c vrf: check accept_source_route on the original netdevice 2019-04-17 08:37:48 +02:00
ip_output.c ip: fix tos reflection in ack and reset packets 2020-10-01 13:12:24 +02:00
ip_sockglue.c ip: on queued skb use skb_header_pointer instead of pskb_may_pull 2019-01-23 08:09:47 +01:00
ip_tunnel.c ip_tunnel: fix use-after-free in ip_tunnel_lookup() 2020-06-30 15:37:59 -04:00
ip_tunnel_core.c ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL 2019-08-04 09:32:03 +02:00
ip_vti.c ip_vti: receive ipip packet by calling ip_tunnel_rcv 2020-06-03 08:18:08 +02:00
ipcomp.c
ipconfig.c ipconfig: Correctly initialise ic_nameservers 2018-08-03 07:50:39 +02:00
ipip.c net: ipip: fix wrong address family in init error path 2020-06-03 08:17:31 +02:00
ipmr.c ipv4: Fix potential Spectre v1 vulnerability 2019-01-09 17:14:42 +01:00
Kconfig vti[6]: fix packet tx through bpf_redirect() in XinY cases 2020-04-02 16:34:32 +02:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
netfilter.c
ping.c ipv4: fill fl4_icmp_{type,code} in ping_v4_sendmsg 2020-07-22 09:22:19 +02:00
proc.c tcp: tcp_fragment() should apply sane memory limits 2019-06-17 19:52:44 +02:00
protocol.c
raw.c net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg() 2020-05-02 17:24:11 +02:00
raw_diag.c inet_diag: return classid for all socket types 2020-03-20 10:54:13 +01:00
route.c ipv4: Update exception handling for multipath routes via same device 2020-10-01 13:12:25 +02:00
syncookies.c tcp: handle inet_csk_reqsk_queue_add() failures 2019-03-19 13:13:23 +01:00
sysctl_net_ipv4.c tcp: add tcp_min_snd_mss sysctl 2019-06-17 19:52:44 +02:00
tcp.c tcp: make sure listeners don't initialize congestion-control state 2020-07-22 09:22:20 +02:00
tcp_bbr.c tcp_bbr: improve arithmetic division in bbr_update_bw() 2020-01-29 15:02:36 +01:00
tcp_bic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cdg.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cong.c tcp: make sure listeners don't initialize congestion-control state 2020-07-22 09:22:20 +02:00
tcp_cubic.c tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT 2020-06-30 15:37:59 -04:00
tcp_dctcp.c tcp: Ensure DCTCP reacts to losses 2019-04-17 08:37:47 +02:00
tcp_diag.c tcp_diag: report TCP MD5 signing keys and addresses 2017-09-01 18:38:09 -07:00
tcp_fastopen.c net: add rb_to_skb() and other rb tree helpers 2018-09-19 22:43:47 +02:00
tcp_highspeed.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_htcp.c tcp: fix cwnd undo in Reno and HTCP congestion controls 2017-08-06 21:25:10 -07:00
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-05-30 07:52:06 +02:00
tcp_input.c tcp: allow at most one TLP probe per flight 2020-07-31 16:44:45 +02:00
tcp_ipv4.c tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers 2020-07-22 09:22:20 +02:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_minisocks.c tcp: do not restart timewait timer on rst reception 2018-09-15 09:45:25 +02:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-04-26 11:02:13 +02:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-31 14:03:47 +01:00
tcp_output.c tcp: allow at most one TLP probe per flight 2020-07-31 16:44:45 +02:00
tcp_probe.c tcp: remove redundant argument from tcp_rcv_established() 2017-07-24 17:28:12 -07:00
tcp_rate.c tcp: invalidate rate samples during SACK reneging 2018-01-02 20:31:09 +01:00
tcp_recovery.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_scalable.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_timer.c tcp: exit if nothing to retransmit on RTO timeout 2019-12-17 20:38:43 +01:00
tcp_ulp.c tcp, ulp: add alias for all ulp modules 2018-09-15 09:45:29 +02:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-12-25 14:26:30 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_westwood.c tcp: Revert "tcp: remove CA_ACK_SLOWPATH" 2017-08-30 11:20:08 -07:00
tcp_yeah.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tunnel4.c
udp.c net: udp: Fix wrong clean up for IS_UDPLITE macro 2020-07-31 16:44:44 +02:00
udp_diag.c inet_diag: return classid for all socket types 2020-03-20 10:54:13 +01:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c net: fix use-after-free in GRO with ESP 2018-07-22 14:28:44 +02:00
udp_tunnel.c net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udplite.c
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-11-04 14:52:37 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-11-04 14:52:37 +01:00
xfrm4_mode_tunnel.c
xfrm4_output.c xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish 2020-05-02 17:24:18 +02:00
xfrm4_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2020-01-04 14:00:14 +01:00
xfrm4_protocol.c
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c