Commit graph

937235 commits

Author SHA1 Message Date
Jens Axboe
d6364a867c Merge branch 'nvme-5.8' of git://git.infradead.org/nvme into block-5.8
Pull NVMe fixes from Christoph.

* 'nvme-5.8' of git://git.infradead.org/nvme:
  nvme: add a Identify Namespace Identification Descriptor list quirk
  nvme-pci: prevent SK hynix PC400 from using Write Zeroes command
  nvme-tcp: fix possible hang waiting for icresp response
2020-07-29 11:21:14 -06:00
Leon Romanovsky
81530ab08e RDMA/mlx5: Allow providing extra scatter CQE QP flag
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.

Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.

Existing user applications are failing on this new validity check.

Fixes: 90ecb37a75 ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/20200728120255.805733-1-leon@kernel.org
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:19:01 -03:00
Qiushi Wu
fe3c606843 firmware: Fix a reference count leak.
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Callback function fw_cfg_sysfs_release_entry() in kobject_put()
can handle the pointer "entry" properly.

Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Link: https://lore.kernel.org/r/20200613190533.15712-1-wu000273@umn.edu
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-29 13:13:50 -04:00
Tony Nguyen
6221595fc5 ice: fix unused parameter warning
Depending on PAGE_SIZE, the following unused parameter warning can be
reported:

drivers/net/ethernet/intel/ice/ice_txrx.c: In function ‘ice_rx_frame_truesize’:
drivers/net/ethernet/intel/ice/ice_txrx.c:513:21: warning: unused parameter ‘size’ [-Wunused-parameter]
        unsigned int size)

The 'size' variable is used only when PAGE_SIZE >= 8192. Add __maybe_unused
to remove the warning.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2020-07-29 08:38:56 -07:00
Ben Shelton
7dfff9ffe8 ice: disable no longer needed workaround for FW logging
For the FW logging info AQ command, we currently set the ICE_AQ_FLAG_RD
in order to work around a FW issue. This issue has been fixed so remove the
workaround.

Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:56 -07:00
Bruce Allan
e923f04d66 ice: reduce scope of variable
The scope of the macro local variable 'i' can be reduced.  Do so to avoid
static analysis tools from complaining.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Marcin Szycik
78116e979d ice: cleanup VSI on probe fail
As part of ice_setup_pf_sw() a PF VSI is setup; release the VSI in case of
failure.

Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Brett Creeley
cd1f56f429 ice: Allow all VLANs in safe mode
Currently the PF VSI's context parameters are left in a bad state when
going into safe mode. This is causing VLAN traffic to not pass. Fix this
by configuring the PF VSI to allow all VLAN tagged traffic.

Also, remove redundant comment explaining the safe mode flow in
ice_probe().

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Krzysztof Kazimierczak
682dfedcee ice: need_wakeup flag might not be set for Tx
This is a port of i40e commit 705639572e ("i40e: need_wakeup flag might
not be set for Tx").

Quoting the original commit message:

"The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an interrupt
since interrupts are disabled in the napi poll loop) between the time we
stopped processing the Tx completions and interrupts are enabled again.
In this case, the need_wakeup flag will have been cleared at the end of
the Tx completion processing as we believe we will get an interrupt from
the outstanding completion at a later point in time. But if this
completion interrupt occurs before interrupts are enable, we lose it and
should at that point really have set the need_wakeup flag since there
are no more outstanding completions that can generate an interrupt to
continue the processing. When this happens, user space will see a Tx
queue need_wakeup of 0 and skip issuing a syscall, which means will
never get into the Tx processing again and we have a deadlock."

As a result, packet processing stops. This patch introduces a fix for
this issue, by always setting the need_wakeup flag at the end of an
interrupt processing. This ensures that the deadlock will not happen.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Victor Raj
4043818c13 ice: distribute Tx queues evenly
Distribute the Tx queues evenly across all queue groups. This will
help the queues to get more equal sharing among the queues when all
are in use.

In the previous algorithm, the next queue group node will be picked up
only after the previous one filled with max children.
For example: if VSI is configured with 9 queues, the first 8 queues
will be assigned to queue group 1 and the 9th queue will be assigned to
queue group 2.

The 2 queue groups split the bandwidth between them equally (50:50).
The first queue group node will share the 50% bandwidth with all of
its children (8 queues). And the second queue group node will share
the entire 50% bandwidth with its only children.

The new algorithm will fix this issue.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Tarun Singh
984824a210 ice: Adjust scheduler default BW weight
By default the queues are configured in legacy mode. The default
BW settings for legacy/advanced modes are different. The existing
code was using the advanced mode default value of 1 which was
incorrect. This caused the unbalanced BW sharing among siblings.
The recommended default value is applied.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Tarun Singh
b3b93d6ce1 ice: Add RL profile bit mask check
Mask bits before accessing the profile type field.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Paul M Stillwell Jr
a02016de00 ice: fix overwriting TX/RX descriptor values when rebuilding VSI
If a user sets the value of the TX or RX descriptors to some non-default
value using 'ethtool -G' then we need to not overwrite the values when
we rebuild the VSI. The VSI rebuild could happen as a result of a user
setting the number of queues via the 'ethtool -L' command. Fix this by
checking to see if the value we have stored is non-zero and if it is
then don't change the value.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Kiran Patil
ca1fdb885e ice: return correct error code from ice_aq_sw_rules
Return ICE_ERR_DOES_NOT_EXIST return code if admin command error code is
ICE_AQ_RC_ENOENT (not exist). ice_aq_sw_rules is used when switch
rule is getting added/deleted/updated. In case of delete/update
switch rule, admin command can return ICE_AQ_RC_ENOENT error code
if such rule does not exist, hence return ICE_ERR_DOES_NOT_EXIST error
code from ice_aq_sw_rule, so that caller of this function can decide
how to handle ICE_ERR_DOES_NOT_EXIST.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Nick Nunley
a54a0b24f4 ice: restore VF MSI-X state during PCI reset
During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability
register will be cleared. This can lead to issues when a VF is
assigned to a VM because in these cases the VF driver receives no
indication of the PF PCI error/reset and additionally it is incapable
of restoring the cleared flag in the hypervisor configuration space
without fully reinitializing the driver interrupt functionality.

Since the VF driver is unable to easily resolve this condition on its own,
restore the VF MSI-X flag during the PF PCI reset handling.

Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:55 -07:00
Dave Ertman
0ce6c34a8f ice: fix link event handling timing
When the driver experiences a link event (especially link up)
there can be multiple events generated. Some of these are
link fault and still have a state of DOWN set.  The problem
happens when the link comes UP during the PF driver handling
one of the LINK DOWN events.  The status of the link is updated
and is now seen as UP, so when the actual LINK UP event comes,
the port information has already been updated to be seen as UP,
even though none of the UP activities have been completed.

After the link information has been updated in the link
handler and evaluated for MEDIA PRESENT, if the state
of the link has been changed to UP, treat the DOWN event
as an UP event since the link is now UP.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:54 -07:00
Dave Ertman
b767ca650f ice: Fix link broken after GLOBR reset
After a GLOBR, the link was broken so that a link
up situation was being seen as a link down.

The problem was that the rebuild process was updating
the port_info link status without doing any of the
other things that need to be done when link changes.

This was causing the port_info struct to have current
"UP" information so that any further UP interrupts
were skipped as redundant.

The rebuild flow should *not* be updating the port_info
struct link information, so eliminate this and leave
it to the link event handling code.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:54 -07:00
Dave Ertman
7d9c9b791f ice: Implement LFC workaround
There is a bug where the LFC settings are not being preserved
through a link event.  The registers in question are the ones
that are touched (and restored) when a set_local_mib AQ command
is performed.

On a link-up event, make sure that a set_local_mib is being
performed.

Move the function ice_aq_set_lldp_mib() from the DCB specific
ice_dcb.c to ice_common.c so that the driver always has access
to this AQ command.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:54 -07:00
Paul Moore
8ac68dc455 revert: 1320a4052e ("audit: trigger accompanying records when no rules present")
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records".  Depending on the
situation, the resulting audit_context could have invalid values in
some of it's fields which could cause a kernel panic/oops when the
task/syscall exists and the audit records are generated.

We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.

Cc: stable@vger.kernel.org
Fixes: 1320a4052e ("audit: trigger accompanying records when no rules present")
Reported-by: j2468h@googlemail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-29 10:00:36 -04:00
Ranjani Sridharan
7fcd9bb5ac ALSA: hda: fix NULL pointer dereference during suspend
When the ASoC card registration fails and the codec component driver
never probes, the codec device is not initialized and therefore
memory for codec->wcaps is not allocated. This results in a NULL pointer
dereference when the codec driver suspend callback is invoked during
system suspend. Fix this by returning without performing any actions
during codec suspend/resume if the card was not registered successfully.

Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/20200728231011.1454066-1-ranjani.sridharan@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-07-29 09:54:49 +02:00
Christoph Hellwig
5bedd3afee nvme: add a Identify Namespace Identification Descriptor list quirk
Add a quirk for a device that does not support the Identify Namespace
Identification Descriptor list despite claiming 1.3 compliance.

Fixes: ea43d9709f ("nvme: fix identify error status silent ignore")
Reported-by: Ingo Brunberg <ingo_brunberg@web.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ingo Brunberg <ingo_brunberg@web.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-29 08:05:44 +02:00
Dave Airlie
a4a2739beb * drm: fix possible use-after-free
* dbi: fix SPI Type 1 transfer
  * drm_fb_helper: use memcpy_io on bochs' sparc64
  * mcde: fix stability
  * panel: fix display noise on auo,kd101n80-45na
  * panel: delay HPD checks for boe_nv133fhm_n61
  * bridge: drop connector check in nwl-dsi bridge
  * bridge: set proper bridge type for adv7511
  * of: fix a double free
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl8gBZkACgkQaA3BHVML
 eiMf4Af+ITzLTKmaaWfQyiaE9KsMNa0dzv2bBpG/H15RevJ40O2qEgY2R4hYmONZ
 zMSXLfT8fgj0ZVEac9jE2VoLi6QtAcB+cB9k0jfIL4kT5aG33sek9go/LmAtL9FB
 tyqS3k4lt8wxnVjVJs+Cqiz4BpnKHC9RxxGB8l83kPRbSE+Ifq3sciB0HJx3x6eI
 K2FQqphsYuXyIdewJNCoZ5RKHaS9UjQutargnwWi2Tb3OAmUblZxvojbjAtNlHhx
 PkOD8/iCygsL87GCawoopLnWaPJTDmOEKmxttzLs37Dqw2rhTsRU47/E6MlBZuwe
 LBuXCAAdNs4iRDj9HUoIXnup4YGXOw==
 =gfQ2
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-07-28' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

 * drm: fix possible use-after-free
 * dbi: fix SPI Type 1 transfer
 * drm_fb_helper: use memcpy_io on bochs' sparc64
 * mcde: fix stability
 * panel: fix display noise on auo,kd101n80-45na
 * panel: delay HPD checks for boe_nv133fhm_n61
 * bridge: drop connector check in nwl-dsi bridge
 * bridge: set proper bridge type for adv7511
 * of: fix a double free

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728110446.GA8076@linux-uq9g
2020-07-29 12:46:58 +10:00
Martin Varghese
1ed06dbc21 Documentation: bareudp: Corrected description of bareudp module.
Removed redundant words.

Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:53:03 -07:00
David S. Miller
490ed0b908 Merge branch 'net-stmmac-improve-WOL'
Jisheng Zhang says:

====================
net: stmmac: improve WOL

Currently, stmmac driver relies on the HW PMT to support WOL. We want
to support phy based WOL.

patch1 is a small improvement to disable WAKE_MAGIC for PMT case if
no pmt_magic_frame.
patch2 and patch3 are two prepation patches.
patch4 implement the phy based WOL
patch5 tries to save a bit energy if WOL is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
77b2898394 net: stmmac: Speed down the PHY if WoL to save energy
When WoL is enabled and the machine is powered off, the PHY remains
waiting for wakeup events at max speed, which is a waste of energy.

Slow down the PHY speed before stopping the ethernet if WoL is enabled,

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
1d8e5b0f3f net: stmmac: Support WOL with phy
Currently, the stmmac driver WOL implementation relies on MAC's PMT
feature. We have a case: the MAC HW doesn't enable PMT, instead, we
rely on the phy to support WOL. Implement the support for this case.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
e8377e7a29 net: stmmac: only call pmt() during suspend/resume if HW enables PMT
This is to prepare WOL support with phy. Compared with WOL
implementation which relies on the MAC's PMT features, in phy
supported WOL case, device_may_wakeup() may also be true, but we
should not call mac's pmt() function if HW doesn't enable PMT.

And during resume, we should call phylink_start() if PMT is disabled.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
2f45f7a13e net: stmmac: Move device_can_wakeup() check earlier in set_wol
If !device_can_wakeup(), there's no need to futher check. And return
-EOPNOTSUPP rather than -EINVAL if !device_can_wakeup().

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
1057d685c6 net: stmmac: Remove WAKE_MAGIC if HW shows no pmt_magic_frame
Remove WAKE_MAGIC from supported modes if the HW capability register
shows no support for pmt_magic_frame.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:19 -07:00
David S. Miller
f11df0454f Merge branch 'RTL8366-VLAN-callback-fixes'
Linus Walleij says:

====================
RTL8366 VLAN callback fixes

While we are pondering how to make the core set up the VLANs
the right way, let's merge the uncontroversial fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Linus Walleij
788abc6d9d net: dsa: rtl8366: Fix VLAN set-up
Alter the rtl8366_vlan_add() to call rtl8366_set_vlan()
inside the loop that goes over all VIDs since we now
properly support calling that function more than once.
Augment the loop to postincrement as this is more
intuitive.

The loop moved past the last VID but called
rtl8366_set_vlan() with the port number instead of
the VID, assuming a 1-to-1 correspondence between
ports and VIDs. This was also a bug.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Linus Walleij
15ab7906cc net: dsa: rtl8366: Fix VLAN semantics
The RTL8366 would not handle adding new members (ports) to
a VLAN: the code assumed that ->port_vlan_add() was only
called once for a single port. When intializing the
switch with .configure_vlan_while_not_filtering set to
true, the function is called numerous times for adding
all ports to VLAN1, which was something the code could
not handle.

Alter rtl8366_set_vlan() to just |= new members and
untagged flags to 4k and MC VLAN table entries alike.
This makes it possible to just add new ports to a
VLAN.

Put in some helpful debug code that can be used to find
any further bugs here.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Brian Vazquez
b9aaec8f0b fib: use indirect call wrappers in the most common fib_rules_ops
This avoids another inderect call per RX packet which save us around
20-40 ns.

Changelog:

v1 -> v2:
- Move declaraions to fib_rules.h to remove warnings

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:42:31 -07:00
Cong Wang
608b4adab1 net_sched: initialize timer earlier in red_init()
When red_init() fails, red_destroy() is called to clean up.
If the timer is not initialized yet, del_timer_sync() will
complain. So we have to move timer_setup() before any failure.

Reported-and-tested-by: syzbot+6e95a4fabf88dc217145@syzkaller.appspotmail.com
Fixes: aee9caa03f ("net: sched: sch_red: Add qevents "early_drop" and "mark"")
Cc: Petr Machata <petrm@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:41:23 -07:00
Guillaume Nault
302d201b5c bareudp: forbid mixing IP and MPLS in multiproto mode
In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and
IPv6 packets regardless of the bareudp ethertype. In practice, this
let an IP tunnel send multicast MPLS packets, or an MPLS tunnel send
IPv6 packets.

We need to restrict the test further, so that the multiproto mode only
enables
  * IPv6 for IPv4 tunnels,
  * or multicast MPLS for unicast MPLS tunnels.

To improve clarity, the protocol validation is moved to its own
function, where each logical test has its own condition.

v2: s/ntohs/htons/

Fixes: 4b5f67232d ("net: Special handling for IP & MPLS.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:30:25 -07:00
Xiyu Yang
706ec91916 ipv6: Fix nexthop refcnt leak when creating ipv6 route info
ip6_route_info_create() invokes nexthop_get(), which increases the
refcount of the "nh".

When ip6_route_info_create() returns, local variable "nh" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
ip6_route_info_create(). When nexthops can not be used with source
routing, the function forgets to decrease the refcnt increased by
nexthop_get(), causing a refcnt leak.

Fix this issue by pulling up the error source routing handling when
nexthops can not be used with source routing.

Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:24:08 -07:00
David S. Miller
1255457429 Merge branch 'hinic-add-some-error-messages-for-debug'
Luo bin says:

====================
hinic: add some error messages for debug

patch #1: support to handle hw abnormal event
patch #2: improve the error messages when functions return failure and
	  dump relevant registers in some exception handling processes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:03 -07:00
Luo bin
90f86b8a36 hinic: add log in exception handling processes
improve the error message when functions return failure and dump
relevant registers in some exception handling processes

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:03 -07:00
Luo bin
c15850c709 hinic: add support to handle hw abnormal event
add support to handle hw abnormal event such as hardware failure,
cable unplugged,link error

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:02 -07:00
David S. Miller
fa662d7816 Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver'
Subbaraya Sundeep says:

====================
Fix bugs in Octeontx2 netdev driver

There are problems in the existing Octeontx2
netdev drivers like missing cancel_work for the
reset task, missing lock in reset task and
missing unergister_netdev in driver remove.
This patch set fixes the above problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
ed543f5c6a octeontx2-pf: Unregister netdev at driver remove
Added unregister_netdev in the driver remove
function. Generally unregister_netdev is called
after disabling all the device interrupts but here
it is called before disabling device mailbox
interrupts. The reason behind this is VF needs
mailbox interrupt to communicate with its PF to
clean up its resources during otx2_stop.
otx2_stop disables packet I/O and queue interrupts
first and by using mailbox interrupt communicates
to PF to free VF resources. Hence this patch
calls unregister_device just before
disabling mailbox interrupts.

Fixes: 3184fb5ba9 ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
c0376f473c octeontx2-pf: cancel reset_task work
During driver exit cancel the queued
reset_task work in VF driver.

Fixes: 3184fb5ba9 ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
948a66338f octeontx2-pf: Fix reset_task bugs
Two bugs exist in the code related to reset_task
in PF driver one is the missing protection
against network stack ndo_open and ndo_close.
Other one is the missing cancel_work.
This patch fixes those problems.

Fixes: 4ff7d1488a ("octeontx2-pf: Error handling support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Jakub Kicinski
3cab8c6552 mlx4: disable device on shutdown
It appears that not disabling a PCI device on .shutdown may lead to
a Hardware Error with particular (perhaps buggy) BIOS versions:

    mlx4_en: eth0: Close port called
    mlx4_en 0000:04:00.0: removed PHC
    reboot: Restarting system
    {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
    {1}[Hardware Error]: event severity: fatal
    {1}[Hardware Error]:  Error 0, type: fatal
    {1}[Hardware Error]:   section_type: PCIe error
    {1}[Hardware Error]:   port_type: 4, root port
    {1}[Hardware Error]:   version: 1.16
    {1}[Hardware Error]:   command: 0x4010, status: 0x0143
    {1}[Hardware Error]:   device_id: 0000:00:02.2
    {1}[Hardware Error]:   slot: 0
    {1}[Hardware Error]:   secondary_bus: 0x04
    {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f06
    {1}[Hardware Error]:   class_code: 000604
    {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
    {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
    {1}[Hardware Error]:   aer_uncor_severity: 0x00062030
    {1}[Hardware Error]:   TLP Header: 40000018 040000ff 791f4080 00000000
[hw error repeats]
    Kernel panic - not syncing: Fatal hardware error!
    CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017

Fix the mlx4 driver.

This is a very similar problem to what had been fixed in:
commit 0d98ba8d70 ("scsi: hpsa: disable device during shutdown")
to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.

Fixes: 2ba5fbd62b ("net/mlx4_core: Handle AER flow properly")
Reported-by: Jake Lawrence <lawja@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:10:56 -07:00
David S. Miller
a7ef23e568 Merge branch 'rhashtable-Fix-unprotected-RCU-dereference-in-__rht_ptr'
Herbert Xu says:

====================
rhashtable: Fix unprotected RCU dereference in __rht_ptr

This patch series fixes an unprotected dereference in __rht_ptr.
The first patch is a minimal fix that does not use the correct
RCU markings but is suitable for backport, and the second patch
cleans up the RCU markings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
Herbert Xu
ce9b362bf6 rhashtable: Restore RCU marking on rhash_lock_head
This patch restores the RCU marking on bucket_table->buckets as
it really does need RCU protection.  Its removal had led to a fatal
bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
Herbert Xu
1748f6a2cb rhashtable: Fix unprotected RCU dereference in __rht_ptr
The rcu_dereference call in rht_ptr_rcu is completely bogus because
we've already dereferenced the value in __rht_ptr and operated on it.
This causes potential double readings which could be fatal.  The RCU
dereference must occur prior to the comparison in __rht_ptr.

This patch changes the order of RCU dereference so that it is done
first and the result is then fed to __rht_ptr.  The RCU marking
changes have been minimised using casts which will be removed in
a follow-up patch.

Fixes: ba6306e3f6 ("rhashtable: Remove RCU marking from...")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
David S. Miller
aff7543126 Merge branch 'introduce-PLDM-firmware-update-library'
Jacob Keller says:

====================
introduce PLDM firmware update library

This series goal is to enable support for updating the ice hardware flash
using the devlink flash command.

The ice firmware update files are distributed using the file format
described by the "PLDM for Firmware Update" standard:

https://www.dmtf.org/documents/pmci/pldm-firmware-update-specification-100

Because this file format is standard, this series introduces a new library
that handles the generic logic for parsing the PLDM file header. The library
uses a design that is very similar to the Mellanox mlxfw module. That is, a
simple ops table is setup and device drivers instantiate an instance of the
pldmfw library with the device specific operations.

Doing so allows for each device to implement the low level behavior for how
to interact with its firmware.

This series includes the library and an implementation for the ice hardware.
I've removed all of the parameters, and the proposed changes to support
overwrite mode. I'll be working on the overwrite mask suggestion from Jakub
as a follow-up series.

Because the PLDM file format is a standard and not something that is
specific to the Intel hardware, I opted to place this update library in
lib/pldmfw. I should note that while I tried to make the library generic, it
does not attempt to mimic the complete "Update Agent" as defined in the
standard. This is mostly due to the fact that the actual interfaces exposed
to software for the ice hardware would not allow this.

This series depends on some work just recently and is based on top of the
patch series sent by Tony published at:

https://lore.kernel.org/netdev/20200723234720.1547308-1-anthony.l.nguyen@intel.com/T/#t

Changes since v2 RFC
* Removed overwrite mode patches, as this can become a follow up series with
  a separate discussion
* Fixed a minor bug in the pldm_timestamp structure not being packed.
* Dropped Cc for other driver maintainers, as this series no longer includes
  changes to the core flash update command.
* Re-ordered patches slightly.
Changes since v1 RFC
* Removed the "allow_downgrade_on_flash_update" parameter. Instead, the
  driver will always attempt to flash the device, even when firmware
  indicates that it would be a downgrade. A dev_warn is used to indicate
  when this occurs.
* Removed the "ignore_pending_flash_update". Instead, the driver will always
  check for and cancel any previous pending update. A devlink flash status
  message will be sent when this cancellation occurs.
* Removed the "reset_after_flash_update" parameter. This will instead be
  implemented as part of a devlink reset interface, work left for a future
  change.
* Replaced the "flash_update_preservation_level" parameter with a new
  "overwrite" mode attribute on the flash update command. For ice, this mode
  will select the preservation level. For all other drivers, I modified them
  to check that the mode is "OVERWRITE_NOTHING", and have Cc'd the
  maintainers to get their input.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacob Keller
d69ea414c9 ice: implement device flash update via devlink
Use the newly added pldmfw library to implement device flash update for
the Intel ice networking device driver. This support uses the devlink
flash update interface.

The main parts of the flash include the Option ROM, the netlist module,
and the main NVM data. The PLDM firmware file contains modules for each
of these components.

Using the pldmfw library, the provided firmware file will be scanned for
the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
the main NVM module containing the primary device firmware, and
"fw.netlist" containing the netlist module.

The flash is separated into two banks, the active bank containing the
running firmware, and the inactive bank which we use for update. Each
module is updated in a staged process. First, the inactive bank is
erased, preparing the device for update. Second, the contents of the
component are copied to the inactive portion of the flash. After all
components are updated, the driver signals the device to switch the
active bank during the next EMP reset (which would usually occur during
the next reboot).

Although the firmware AdminQ interface does report an immediate status
for each command, the NVM erase and NVM write commands receive status
asynchronously. The driver must not continue writing until previous
erase and write commands have finished. The real status of the NVM
commands is returned over the receive AdminQ. Implement a simple
interface that uses a wait queue so that the main update thread can
sleep until the completion status is reported by firmware. For erasing
the inactive banks, this can take quite a while in practice.

To help visualize the process to the devlink application and other
applications based on the devlink netlink interface, status is reported
via the devlink_flash_update_status_notify. While we do report status
after each 4k block when writing, there is no real status we can report
during erasing. We simply must wait for the complete module erasure to
finish.

With this implementation, basic flash update for the ice hardware is
supported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacob Keller
2ab560a78e ice: add flags indicating pending update of firmware module
After a flash update, the pending status of the update can be determined
from the device capabilities.

Read the appropriate device capability and store whether there is
a pending update awaiting a reboot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00