Commit graph

872638 commits

Author SHA1 Message Date
Roman Mashak
a8fad5459d tc-testing: updated pedit TDC tests
Added test cases for IP header operations:
- set tos/precedence
- add value to tos/precedence
- clear tos/precedence
- invert tos/precedence

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:38:51 -07:00
David S. Miller
7170debecd Merge branch 'mvneta-xdp'
Lorenzo Bianconi says:

====================
add XDP support to mvneta driver

Add XDP support to mvneta driver for devices that rely on software
buffer management. Supported verdicts are:
- XDP_DROP
- XDP_PASS
- XDP_REDIRECT
- XDP_TX
Moreover set ndo_xdp_xmit net_device_ops function pointer in order
to support redirecting from other device (e.g. virtio-net).
Convert mvneta driver to page_pool API.
This series is based on previous work done by Jesper and Ilias.
We will send follow-up patches to reduce DMA-sync operations.

Changes since v4:
- reset page_pool pointer to NULL in mvneta_rxq_drop_pkts and in
  mvneta_create_page_pool error path
- move dma sync in mvneta_rx_refill() in patch 2/7
- verify bpf prog pointer in mvneta_xdp_setup to double-check if
  stop/start is really necessary
- coding style fixes

Changes since v3:
- rename MVNETA_XDP_CONSUMED in MVNETA_XDP_DROPPED
- squash patch 4/8 and patch 3/8
- fix dma sync for XDP_TX verdict
- fix queue_index in xdp_rxq_info_reg
- cosmetics

Changes since v2:
- rely on page_pool_recycle_direct instead of xdp_return_buff for XDP_DROP
- define xdp buffer in mvneta_rx_swbm and avoid default initializations
- use dma_sync_single_for_cpu instead of dma_sync_single_range_for_cpu
- run page_pool_release_page in mvneta_swbm_add_rx_fragment even if
  the buffer contains just ETH_FCS

Changes since v1:
- sync dma buffers before refilling hw queues
- fix stats accounting

Changes since RFC:
- implement XDP_TX
- make tx pending buffer list agnostic
- code refactoring
- check if device is running in mvneta_xdp_setup
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:08 -07:00
Lorenzo Bianconi
b0a43db908 net: mvneta: add XDP_TX support
Implement XDP_TX verdict and ndo_xdp_xmit net_device_ops function
pointer

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
9e58c8b410 net: mvneta: make tx buffer array agnostic
Allow tx buffer array to contain both skb and xdp buffers in order to
enable xdp frame recycling adding XDP_TX verdict support

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
fa383f6b77 net: mvneta: move header prefetch in mvneta_swbm_rx_frame
Move data buffer prefetch in mvneta_swbm_rx_frame after
dma_sync_single_range_for_cpu

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
0db51da7a8 net: mvneta: add basic XDP support
Add basic XDP support to mvneta driver for devices that rely on software
buffer management. Currently supported verdicts are:
- XDP_DROP
- XDP_PASS
- XDP_REDIRECT
- XDP_ABORTED

- iptables drop:
$iptables -t raw -I PREROUTING -p udp --dport 9 -j DROP
$nstat -n && sleep 1 && nstat
IpInReceives		151169		0.0
IpExtInOctets		6953544		0.0
IpExtInNoECTPkts	151165		0.0

- XDP_DROP via xdp1
$./samples/bpf/xdp1 3
proto 0:	421419 pkt/s
proto 0:	421444 pkt/s
proto 0:	421393 pkt/s
proto 0:	421440 pkt/s
proto 0:	421184 pkt/s

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
8dc9a0888f net: mvneta: rely on build_skb in mvneta_rx_swbm poll routine
Refactor mvneta_rx_swbm code introducing mvneta_swbm_rx_frame and
mvneta_swbm_add_rx_fragment routines. Rely on build_skb in oreder to
allocate skb since the previous patch introduced buffer recycling using
the page_pool API.
This patch fixes even an issue in the original driver where dma buffers
are accessed before dma sync.
mvneta driver can run on not cache coherent devices so it is
necessary to sync DMA buffers before sending them to the device
in order to avoid memory corruptions. Running perf analysis we can
see a performance cost associated with this DMA-sync (anyway it is
already there in the original driver code). In follow up patches we
will add more logic to reduce DMA-sync as much as possible.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
568a3fa24a net: mvneta: introduce page pool API for sw buffer manager
Use the page_pool api for allocations and DMA handling instead of
__dev_alloc_page()/dma_map_page() and free_page()/dma_unmap_page().
Pages are unmapped using page_pool_release_page before packets
go into the network stack.

The page_pool API offers buffer recycling capabilities for XDP but
allocates one page per packet, unless the driver splits and manages
the allocated page.
This is a preliminary patch to add XDP support to mvneta driver

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi
ff519e2acd net: mvneta: introduce mvneta_update_stats routine
Introduce mvneta_update_stats routine to collect {rx/tx} statistics
(packets and bytes). This is a preliminary patch to add XDP support to
mvneta driver

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Sasha Neftin
70332577e4 igc: Clean up unused shadow_vfta pointer
VLAN filter table array not implemented yet and shadow_vfta pointer
not used. Clean up the code and remove the unused shadow_vfta pointer.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:27:01 -07:00
Sasha Neftin
3bdd7086f7 igc: Add Rx checksum support
Extend the socket buffer field process and add Rx checksum functionality
Minor: fix indentation with tab instead of spaces.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:26:39 -07:00
Sasha Neftin
7f839684c5 igc: Add set_rx_mode support
Add multicast addresses list to the MTA table.
Implement basic Rx mode support.
Add option for IPv6 address settings.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
Sasha Neftin
f15bb6dde7 e1000e: Add support for S0ix
Implement flow for S0ix support. Modern SoCs support S0ix low power
states during idle periods, which are sub-states of ACPI S0 that increase
power saving while supporting an instant-on experience for providing
lower latency that ACPI S0. The S0ix states shut off parts of the SoC
when they are not in use, while still maintaning optimal performance.
This patch add support for S0ix started from an Ice Lake platform.

Suggested-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
Sasha Neftin
0ac960a8e1 igc: Add SCTP CRC checksumming functionality
Add stream control transmission protocol CRC checksum.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
David S. Miller
13faf77185 Merge branch 'hns3-next'
Huazhong Tan says:

====================
net: hns3: add some cleanups & optimizations

This patchset includes some cleanups and optimizations for the HNS3
ethernet driver.

[patch 1/8] removes unused and unnecessary structures.

[patch 2/8] uses a ETH_ALEN u8 array to replace two mac_addr_*
field in struct hclge_mac_mgr_tbl_entry_cmd.

[patch 3/8] optimizes the barrier used in the IO path.

[patch 4/8] introduces macro ring_to_netdev() to get netdevive
from struct hns3_enet_ring variable.

[patch 5/8] makes struct hns3_enet_ring to be cacheline aligned

[patch 6/8] adds a minor cleanup for hns3_handle_rx_bd().

[patch 7/8] removes linear data allocating for fraglist SKB.

[patch 8/8] clears hardware error when resetting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Jian Shen
4fdd0bca61 net: hns3: log and clear hardware error after reset complete
When device is resetting, the CMDQ service may be stopped until
reset completed. If a new RAS error occurs at this moment, it
will no be able to clear the RAS source. This patch fixes it
by clear the RAS source after reset complete.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin
7fda3a930d net: hns3: do not allocate linear data for fraglist skb
Currently, napi_alloc_skb() is used to allocate skb for fraglist
when the head skb is not enough to hold the remaining data, and
the remaining data is added to the frags part of the fraglist skb,
leaving the linear part unused.

So this patch passes length of 0 to allocate fraglist skb with
zero size of linear data.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin
d35bced88f net: hns3: minor cleanup for hns3_handle_rx_bd()
Since commit e559709505 ("net: hns3: Add handling of GRO Pkts
not fully RX'ed in NAPI poll"), ring->skb is used to record the
current SKB when processing the RX BD in hns3_handle_rx_bd(),
so the parameter out_skb is unnecessary.

This patch also adjusts the err checking to reduce duplication
in hns3_handle_rx_bd(), and "err == -ENXIO" is rare case, so put
it in the unlikely annotation.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin
76643555a1 net: hns3: make struct hns3_enet_ring cacheline aligned
Since struct hns3_enet_ring is a frequently used in critical data
path, so make it cacheline aligned as struct hns3_enet_tqp_vector.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin
c871195601 net: hns3: introduce ring_to_netdev() in enet module
There are a few places that need to access the netdev of a ring
through ring->tqp->handle->kinfo.netdev, and ring->tqp is a struct
which both in enet and hclge modules, it is better to use the
struct that is only used in enet module.

This patch adds the ring_to_netdev() to access the netdev of ring
through ring->tqp_vector->napi.dev.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin
88b7c58c19 net: hns3: minor optimization for barrier in IO path
Currently, the TX and RX ring in a queue is bounded to the
same IRQ, there may be unnecessary barrier op when only one of
the ring need to be processed.

This patch adjusts the location of rmb() in hns3_clean_tx_ring()
and adds a checking in hns3_clean_rx_ring() to avoid unnecessary
barrier op when there is nothing to do for the ring.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Guojia Liao
0e02a53d64 net: hns3: optimized MAC address in management table.
mac_addr_hi32 and mac_addr_lo16 are used to store the MAC address
for management table. But using array of mac_addr[ETH_ALEN] would
be more general and not need to care about the big-endian mode of
the CPU.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin
5f06b903cb net: hns3: remove struct hns3_nic_ring_data in hns3_enet module
Only the queue_index field in struct hns3_nic_ring_data is
used, other field is unused and unnecessary for hns3 driver,
so this patch removes it and move the queue_index field to
hns3_enet_ring.

This patch also removes an unused struct hns_queue declaration.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Stefan Wahren
3347a80965 Bluetooth: hci_bcm: Fix RTS handling during startup
The RPi 4 uses the hardware handshake lines for CYW43455, but the chip
doesn't react to HCI requests during DT probe. The reason is the inproper
handling of the RTS line during startup. According to the startup
signaling sequence in the CYW43455 datasheet, the hosts RTS line must
be driven after BT_REG_ON and BT_HOST_WAKE.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-21 17:05:14 +02:00
Jeffrey Hugo
bba79fee7a Revert "Bluetooth: hci_qca: Add delay for wcn3990 stability"
This reverts commit cde9dde6e1.

The frame reassembly errors were root caused to a transient gpio issue.
The missing response was root caused to an issue with properly managing
RFR in the uart driver.  Addressing those root causes occurs outside of
hci_qca and eliminates the need for the 50ms delay, so remove it.

Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-21 17:03:59 +02:00
Daniel Borkmann
46a4a97063 Merge branch 'bpf-libbpf-cleanups'
Andrii Nakryiko says:

====================
This patch set's main goal is to teach bpf_object__open() (and its variants)
to automatically derive BPF program type/expected attach type from section
names, similarly to how bpf_prog_load() was doing it. This significantly
improves user experience by eliminating yet another
obvious-only-in-the-hindsight surprise, when using libbpf APIs.

There are a bunch of auxiliary clean-ups and improvements. E.g.,
bpf_program__get_type() and bpf_program__get_expected_attach_type() are added
for completeness and symmetry with corresponding setter APIs. Some clean up
and fixes in selftests/bpf are done as well.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-10-21 14:49:17 +02:00
Andrii Nakryiko
1678e33c21 selftest/bpf: Get rid of a bunch of explicit BPF program type setting
Now that libbpf can correctly guess BPF program types from section
names, remove a bunch of explicit bpf_program__set_type() calls
throughout tests.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-8-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
8af1c8b8d6 selftests/bpf: Make reference_tracking test use subtests
reference_tracking is actually a set of 9 sub-tests. Make it explicitly so.

Also, add explicit "classifier/" prefix to BPF program section names to
let libbpf correctly guess program type. Thus, also remove explicit
bpf_prog__set_type() call.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-7-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
f90415e960 selftests/bpf: Make a copy of subtest name
test_progs never created a copy of subtest name, rather just stored
pointer to whatever string test provided. This is bad as that string
might be freed or modified by the end of subtest. Fix this by creating
a copy of given subtest name when subtest starts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-6-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
dd4436bb83 libbpf: Teach bpf_object__open to guess program types
Teach bpf_object__open how to guess program type and expected attach
type from section names, similar to what bpf_prog_load() does. This
seems like a really useful features and an oversight to not have this
done during bpf_object_open(). To preserver backwards compatible
behavior of bpf_prog_load(), its attr->prog_type is treated as an
override of bpf_object__open() decisions, if attr->prog_type is not
UNSPECIFIED.

There is a slight difference in behavior for bpf_prog_load().
Previously, if bpf_prog_load() was loading BPF object with more than one
program, first program's guessed program type and expected attach type
would determine corresponding attributes of all the subsequent program
types, even if their sections names suggest otherwise. That seems like
a rather dubious behavior and with this change it will behave more
sanely: each program's type is determined individually, unless they are
forced to uniformity through attr->prog_type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-5-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
32dff6db29 libbpf: Add uprobe/uretprobe and tp/raw_tp section suffixes
Map uprobe/uretprobe into KPROBE program type. tp/raw_tp are just an
alias for more verbose tracepoint/raw_tracepoint, respectively.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-4-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
f1eead9e3c libbpf: Add bpf_program__get_{type, expected_attach_type) APIs
There are bpf_program__set_type() and
bpf_program__set_expected_attach_type(), but no corresponding getters,
which seems rather incomplete. Fix this.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-3-andriin@fb.com
2019-10-21 14:49:12 +02:00
Andrii Nakryiko
bc3f2956f2 tools: Sync if_link.h
Sync if_link.h into tools/ and get rid of annoying libbpf Makefile warning.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191021033902.3856966-2-andriin@fb.com
2019-10-21 14:49:12 +02:00
Kefeng Wang
be18010ea2 tools, bpf: Rename pr_warning to pr_warn to align with kernel logging
For kernel logging macros, pr_warning() is completely removed and
replaced by pr_warn(). By using pr_warn() in tools/lib/bpf/ for
symmetry to kernel logging macros, we could eventually drop the
use of pr_warning() in the whole kernel tree.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191021055532.185245-1-wangkefeng.wang@huawei.com
2019-10-21 14:38:41 +02:00
Jakub Sitnicki
ab81e203bc scripts/bpf: Print an error when known types list needs updating
Don't generate a broken bpf_helper_defs.h header if the helper script needs
updating because it doesn't recognize a newly added type. Instead print an
error that explains why the build is failing, clean up the partially
generated header and stop.

v1->v2:
- Switched from temporary file to .DELETE_ON_ERROR.

Fixes: 456a513bb5 ("scripts/bpf: Emit an #error directive known types list needs updating")
Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191020112344.19395-1-jakub@cloudflare.com
2019-10-20 18:21:21 -07:00
David S. Miller
2f184393e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Several cases of overlapping changes which were for the most
part trivially resolvable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20 10:43:00 -07:00
Linus Torvalds
531e93d114 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "I was battling a cold after some recent trips, so quite a bit piled up
  meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from
      Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who
      should release it in the llc code. From Eric Biggers.

   5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets
      the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric
      Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip
      version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei
      Wang.

  12) Fix performance regression due to false sharing in 'net'
      structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...
2019-10-19 17:09:11 -04:00
Eric Dumazet
2a06b8982f net: reorder 'struct net' fields to avoid false sharing
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)

So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.

We move in the first cache line fields that can be
dirtied often.

We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.

Fixes: 355b985537 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:21:53 -07:00
Vivien Didelot
50c7d2ba9d net: dsa: fix switch tree list
If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.

Fixes: 83c0afaec7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:19:41 -07:00
Mans Rullgard
05908d72cc net: ethernet: dwmac-sun8i: show message only when switching to promisc
Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam.  Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:18:10 -07:00
Chenwandun
3d00cf2fbb net: aquantia: add an error handling in aq_nic_set_multicast_list
add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.

Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:16:38 -07:00
David S. Miller
708738376c Merge branch 'netem-fix-further-issues-with-packet-corruption'
Jakub Kicinski says:

====================
net: netem: fix further issues with packet corruption

This set is fixing two more issues with the netem packet corruption.

First patch (which was previously posted) avoids NULL pointer dereference
if the first frame gets freed due to allocation or checksum failure.
v2 improves the clarity of the code a little as requested by Cong.

Second patch ensures we don't return SUCCESS if the frame was in fact
dropped. Thanks to this commit message for patch 1 no longer needs the
"this will still break with a single-frame failure" disclaimer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:12:36 -07:00
Jakub Kicinski
e0ad032e14 net: netem: correct the parent's backlog when corrupted packet was dropped
If packet corruption failed we jump to finish_segs and return
NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
increment its backlog, that's incorrect - we need to return
NET_XMIT_DROP.

Fixes: 6071bd1aa1 ("netem: Segment GSO packets on enqueue")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:12:36 -07:00
Jakub Kicinski
a7fa12d158 net: netem: fix error path for corrupted GSO frames
To corrupt a GSO frame we first perform segmentation.  We then
proceed using the first segment instead of the full GSO skb and
requeue the rest of the segments as separate packets.

If there are any issues with processing the first segment we
still want to process the rest, therefore we jump to the
finish_segs label.

Commit 177b800746 ("net: netem: fix backlog accounting for
corrupted GSO frames") started using the pointer to the first
segment in the "rest of segments processing", but as mentioned
above the first segment may had already been freed at this point.

Backlog corrections for parent qdiscs have to be adjusted.

Fixes: 177b800746 ("net: netem: fix backlog accounting for corrupted GSO frames")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:12:35 -07:00
Michael Tretter
bd310aca44 macb: propagate errors when getting optional clocks
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.

In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.

For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.

Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.

Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 11:58:39 -07:00
Juergen Gross
3d5c1a037d xen/netback: fix error path of xenvif_connect_data()
xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().

Remove the superfluous module_put().

Fixes: 279f438e36 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 11:43:29 -07:00
Yonglong Liu
580a05f9d4 net: hns3: fix mis-counting IRQ vector numbers issue
Currently, the num_msi_left means the vector numbers of NIC,
but if the PF supported RoCE, it contains the vector numbers
of NIC and RoCE(Not expected).

This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.

This patch adds a new variable num_nic_msi to store the vector
numbers of NIC, and adjust the default TQP numbers and rss_size
according to the value of num_nic_msi.

Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 11:40:55 -07:00
Linus Torvalds
998d75510e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "Rather a lot of fixes, almost all affecting mm/"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
  scripts/gdb: fix debugging modules on s390
  kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
  mm/thp: allow dropping THP from page cache
  mm/vmscan.c: support removing arbitrary sized pages from mapping
  mm/thp: fix node page state in split_huge_page_to_list()
  proc/meminfo: fix output alignment
  mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
  mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
  mm: include <linux/huge_mm.h> for is_vma_temporary_stack
  zram: fix race between backing_dev_show and backing_dev_store
  mm/memcontrol: update lruvec counters in mem_cgroup_move_account
  ocfs2: fix panic due to ocfs2_wq is null
  hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
  mm: memblock: do not enforce current limit for memblock_phys* family
  mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
  mm/gup: fix a misnamed "write" argument, and a related bug
  mm/gup_benchmark: add a missing "w" to getopt string
  ocfs2: fix error handling in ocfs2_setattr()
  mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
  mm/memunmap: don't access uninitialized memmap in memunmap_pages()
  ...
2019-10-19 06:53:59 -04:00
Ilya Leoshkevich
585d730d41 scripts/gdb: fix debugging modules on s390
Currently lx-symbols assumes that module text is always located at
module->core_layout->base, but s390 uses the following layout:

  +------+  <- module->core_layout->base
  | GOT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset
  | PLT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset +
  | TEXT |     module->arch->plt_size
  +------+

Therefore, when trying to debug modules on s390, all the symbol
addresses are skewed by plt_offset + plt_size.

Fix by adding plt_offset + plt_size to module_addr in
load_module_symbols().

Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:33 -04:00
Song Liu
aa5de305c9 kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries.  On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.

However, the regroup is broken For perf_event based trace_uprobe.  This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER.  The second call will split the PMD mapped
page table entry, which is not the desired behavior.

Fix this by only use FOLL_SPLIT_PMD for uprobe register case.

Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.

Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
Fixes: 5a52c9df62 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:33 -04:00