Commit Graph

1355 Commits

Author SHA1 Message Date
Peter Oberparleiter cf245e831a s390/cio: fix invalid -EBUSY on ccw_device_start
commit 5ef1dc40ff upstream.

The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code
when drivers try to start I/O while a path-verification (PV) process is
pending. This can lead to failed device initialization attempts with
symptoms like broken network connectivity after boot.

Fix this by replacing the -EBUSY return code with a deferred condition
code 1 reply to make path-verification handling consistent from a
driver's point of view.

The problem can be reproduced semi-regularly using the following process,
while repeating steps 2-3 as necessary (example assumes an OSA device
with bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):

1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group
2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online
3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \
   echo on > /sys/devices/css0/chp0.02/status

Background information:

The common I/O layer starts path-verification I/Os when it receives
indications about changes in a device path's availability. This occurs
for example when hardware events indicate a change in channel-path
status, or when a manual operation such as a CHPID vary or configure
operation is performed.

If a driver attempts to start I/O while a PV is running, CIO reports a
successful I/O start (ccw_device_start() return code 0). Then, after
completion of PV, CIO synthesizes an interrupt response that indicates
an asynchronous status condition that prevented the start of the I/O
(deferred condition code 1).

If a PV indication arrives while a device is busy with driver-owned I/O,
PV is delayed until after I/O completion was reported to the driver's
interrupt handler. To ensure that PV can be started eventually, CIO
reports a device busy condition (ccw_device_start() return code -EBUSY)
if a driver tries to start another I/O while PV is pending.

In some cases this -EBUSY return code causes device drivers to consider
a device not operational, resulting in failed device initialization.

Note: The code that introduced the problem was added in 2003. Symptoms
started appearing with the following CIO commit that causes a PV
indication when a device is removed from the cio_ignore list after the
associated parent subchannel device was probed, but before online
processing of the CCW device has started:

2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")

During boot, the cio_ignore list is modified by the cio_ignore dracut
module [1] as well as Linux vendor-specific systemd service scripts[2].
When combined, this commit and boot scripts cause a frequent occurrence
of the problem during boot.

[1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore
[2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service

Cc: stable@vger.kernel.org # v5.15+
Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Tested-By: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01 13:34:58 +01:00
Dinghao Liu 63e8b94ad1 s390/cio: fix a memleak in css_alloc_subchannel
When dma_set_coherent_mask() fails, sch->lock has not been
freed, which is allocated in css_sch_create_locks(), leading
to a memleak.

Fixes: 4520a91a97 ("s390/cio: use dma helpers for setting masks")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Message-Id: <20230921071412.13806-1-dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/linux-s390/bd38baa8-7b9d-4d89-9422-7e943d626d6e@linux.ibm.com/
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-16 13:03:05 +02:00
Linus Torvalds 4a0fc73da9 more s390 updates for 6.6 merge window
- Couple of virtual vs physical address confusion fixes
 
 - Rework locking in dcssblk driver to address a lockdep warning
 
 - Remove support for "noexec" kernel command line option since there
   is no use case where it would make sense
 
 - Simplify kernel mapping setup and get rid of quite a bit of code
 
 - Add architecture specific __set_memory_yy() functions which allow to
   modify kernel mappings. Unlike the set_memory_xx() variants they
   take void pointer start and end parameters, which allows to use them
   without the usual casts, and also to use them on areas larger than
   8TB.
   Note that the set_memory_xx() family comes with an int num_pages
   parameter which overflows with 8TB. This could be addressed by
   changing the num_pages parameter to unsigned long, however requires
   to change all architectures, since the module code expects an int
   parameter (see module_set_memory()).
   This was indeed an issue since for debug_pagealloc() we call
   set_memory_4k() on the whole identity mapping. Therefore address
   this for now with the __set_memory_yy() variant, and address common
   code later
 
 - Use dev_set_name() and also fix memory leak in zcrypt driver error
   handling
 
 - Remove unused lsi_mask from airq_struct
 
 - Add warning for invalid kernel mapping requests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmT5sYkACgkQIg7DeRsp
 bsJQHg//SFJOcVr2uViorvmt+IyiW78H75y8+AC6qoxZ3fuJeuyW2zz/resRsFRO
 UDCjkV2QYT885ADFUMcunf7LQnJYpSFVjAB6WEhgFr+bFdMWtX1crjHsMHkpcobR
 Fo4IFurG5NOMIdunPwKiO/dLU8pNGTm1T430uyLZH3ApVrmJ8stgOczusqTO/h3V
 tjbC5HAm9BBwNyOd2lnZSi8pbUMcfGE2DkRYhFQup1N9NT0BC4S6B8vTsD7NVYVZ
 WbJzqSl3CnTUJxZFX287YDW6cSaPCLAUfCl7r5FJZUoGvUtOVLlqcOjIIm/72epv
 YygMI8RdM0KHuzOd8XTNDbBnyYWTmWnZypCQVgNezY1GiO5nfTCkwYLZZagcc0Lo
 6fx1B7xBbNao9/pURa6wJ9dXsx+NmD52DjM806hP1H+qoL/YP8dLqV6Qh2C1PFzI
 VncrnG2mnxjwry5WoOyl19SKqXDgmDylnFuHQHKA9mdCaqZWNkqovOo/6qE6OdWI
 Q3TinPwTug9vmgwBuQgua/2R1owy7G9mJwwDW9AtZXVzlMG6k06deKIYrZjLxf6Z
 hOtAu+Xy5IeoXOaaOFU92RsRKX481OTG77rWLREC+orAT1dPrfnguPLlQ7KeFbkR
 foduTTHcDPOLiIVL1RbWQC9AAw/i7IjN04Jp6KFlQTZqLATyPbM=
 =+iSZ
 -----END PGP SIGNATURE-----

Merge tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - A couple of virtual vs physical address confusion fixes

 - Rework locking in dcssblk driver to address a lockdep warning

 - Remove support for "noexec" kernel command line option since there is
   no use case where it would make sense

 - Simplify kernel mapping setup and get rid of quite a bit of code

 - Add architecture specific __set_memory_yy() functions which allow us
   to modify kernel mappings. Unlike the set_memory_xx() variants they
   take void pointer start and end parameters, which allows using them
   without the usual casts, and also to use them on areas larger than
   8TB.

   Note that the set_memory_xx() family comes with an int num_pages
   parameter which overflows with 8TB. This could be addressed by
   changing the num_pages parameter to unsigned long, however requires
   to change all architectures, since the module code expects an int
   parameter (see module_set_memory()).

   This was indeed an issue since for debug_pagealloc() we call
   set_memory_4k() on the whole identity mapping. Therefore address this
   for now with the __set_memory_yy() variant, and address common code
   later

 - Use dev_set_name() and also fix memory leak in zcrypt driver error
   handling

 - Remove unused lsi_mask from airq_struct

 - Add warning for invalid kernel mapping requests

* tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vmem: do not silently ignore mapping limit
  s390/zcrypt: utilize dev_set_name() ability to use a formatted string
  s390/zcrypt: don't leak memory if dev_set_name() fails
  s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusion
  s390/airq: remove lsi_mask from airq_struct
  s390/mm: use __set_memory() variants where useful
  s390/set_memory: add __set_memory() variant
  s390/set_memory: generate all set_memory() functions
  s390/mm: improve description of mapping permissions of prefix pages
  s390/amode31: change type of __samode31, __eamode31, etc
  s390/mm: simplify kernel mapping setup
  s390: remove "noexec" option
  s390/vmem: fix virtual vs physical address confusion
  s390/dcssblk: fix lockdep warning
  s390/monreader: fix virtual vs physical address confusion
2023-09-07 10:52:13 -07:00
Benjamin Block acf00b5ef9 s390/airq: remove lsi_mask from airq_struct
Remove the field `lsi_mask` from `struct airq_struct` as it is not
utilized for any adapter interrupt, other than setting it to the default
value of 0xff.

Because nobody is using this functionality, all it does is cost a little
bit of time with each delivered adapter interrupt.

Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:28 +02:00
Yi Liu 8cfa718602 vfio-iommufd: Add detach_ioas support for emulated VFIO devices
This prepares for adding DETACH ioctl for emulated VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:19:18 -06:00
Heiko Carstens cada938a01 s390: fix various typos
Fix various typos found with codespell.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03 11:19:42 +02:00
Linus Torvalds 6a46676994 s390 updates for 6.5 merge window
- Fix the style of protected key API driver source: use
   x-mas tree for all local variable declarations.
 
 - Rework protected key API driver to not use the struct
   pkey_protkey and pkey_clrkey anymore. Both structures
   have a fixed size buffer, but with the support of ECC
   protected key these buffers are not big enough. Use
   dynamic buffers internally and transparently for
   userspace.
 
 - Add support for a new 'non CCA clear key token' with
   ECC clear keys supported: ECC P256, ECC P384, ECC P521,
   ECC ED25519 and ECC ED448. This makes it possible to
   derive a protected key from the ECC clear key input via
   PKEY_KBLOB2PROTK3 ioctl, while currently the only way
   to derive is via PCKMO instruction.
 
 - The s390 PMU of PAI crypto and extension 1 NNPA counters
   use atomic_t for reference counting. Replace this with
   the proper data type refcount_t.
 
 - Select ARCH_SUPPORTS_INT128, but limit this to clang for
   now, since gcc generates inefficient code, which may lead
   to stack overflows.
 
 - Replace one-element array with flexible-array member in
   struct vfio_ccw_parent and refactor the rest of the code
   accordingly. Also, prefer struct_size() over sizeof() open-
   coded versions.
 
 - Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
   OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether
   the system memory should be cleared or not once dumped.
 
 - Fix a hang when a user attempts to remove a VFIO-AP mediated
   device attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
   VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver
   callback to request a release of the device.
 
 - Fix calculation for R_390_GOTENT relocations for modules.
 
 - Allow any user space process with CAP_PERFMON capability
   read and display the CPU Measurement facility counter sets.
 
 - Rework large statically-defined per-CPU cpu_cf_events data
   structure and replace it with dynamically allocated structures
   created when a perf_event_open() system call is invoked or
   /dev/hwctr device is accessed.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZJsnCRccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8D+RAQCN5CYdql/X8kOMcs4jJvDHEZf8
 5CHYfmT2SLNs+zWtLgD/f/C9XXv3El/0wMBvuWSZ+T1nw+imt2cz+FJ+Nor+UQg=
 =R7Dm
 -----END PGP SIGNATURE-----

Merge tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Fix the style of protected key API driver source: use x-mas tree for
   all local variable declarations

 - Rework protected key API driver to not use the struct pkey_protkey
   and pkey_clrkey anymore. Both structures have a fixed size buffer,
   but with the support of ECC protected key these buffers are not big
   enough. Use dynamic buffers internally and transparently for
   userspace

 - Add support for a new 'non CCA clear key token' with ECC clear keys
   supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448.
   This makes it possible to derive a protected key from the ECC clear
   key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way
   to derive is via PCKMO instruction

 - The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t
   for reference counting. Replace this with the proper data type
   refcount_t

 - Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since
   gcc generates inefficient code, which may lead to stack overflows

 - Replace one-element array with flexible-array member in struct
   vfio_ccw_parent and refactor the rest of the code accordingly. Also,
   prefer struct_size() over sizeof() open- coded versions

 - Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
   OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the
   system memory should be cleared or not once dumped

 - Fix a hang when a user attempts to remove a VFIO-AP mediated device
   attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
   VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback
   to request a release of the device

 - Fix calculation for R_390_GOTENT relocations for modules

 - Allow any user space process with CAP_PERFMON capability read and
   display the CPU Measurement facility counter sets

 - Rework large statically-defined per-CPU cpu_cf_events data structure
   and replace it with dynamically allocated structures created when a
   perf_event_open() system call is invoked or /dev/hwctr device is
   accessed

* tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events
  s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process
  s390/module: fix rela calculation for R_390_GOTENT
  s390/vfio-ap: wire in the vfio_device_ops request callback
  s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl
  s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl
  s390/pkey: add support for ecc clear key
  s390/pkey: do not use struct pkey_protkey
  s390/pkey: introduce reverse x-mas trees
  s390/zcore: conditionally clear memory on reipl
  s390/ipl: add REIPL_CLEAR flag to os_info
  vfio/ccw: use struct_size() helper
  vfio/ccw: replace one-element array with flexible-array member
  s390: select ARCH_SUPPORTS_INT128
  s390/pai_ext: replace atomic_t with refcount_t
  s390/pai_crypto: replace atomic_t with refcount_t
2023-06-27 15:49:10 -07:00
Gustavo A. R. Silva d933e5f41e vfio/ccw: use struct_size() helper
Prefer struct_size() over open-coded versions.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/f657276073630e806e69726a40ad1cc85101448a.1684805398.git.gustavoars@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-01 17:07:56 +02:00
Gustavo A. R. Silva 5dd4241964 vfio/ccw: replace one-element array with flexible-array member
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element array with flexible-array
member in struct vfio_ccw_parent and refactor the rest of the code
accordingly.

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/297
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/3c10549ebe1564eade68a2515bde233527376971.1684805398.git.gustavoars@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-01 17:07:55 +02:00
Vineeth Vijayan 89c0c62e94 s390/cio: unregister device when the only path is gone
Currently, if the device is offline and all the channel paths are
either configured or varied offline, the associated subchannel gets
unregistered. Don't unregister the subchannel, instead unregister
offline device.

Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-22 12:45:44 +02:00
Heiko Carstens 2862a2fdfa s390/qdio: fix do_sqbs() inline assembly constraint
Use "a" constraint instead of "d" constraint to pass the state parameter to
the do_sqbs() inline assembly. This prevents that general purpose register
zero is used for the state parameter.

If the compiler would select general purpose register zero this would be
problematic for the used instruction in rsy format: the register used for
the state parameter is a base register. If the base register is general
purpose register zero the contents of the register are unexpectedly ignored
when the instruction is executed.

This only applies to z/VM guests using QIOASSIST with dedicated (pass through)
QDIO-based devices such as FCP [zfcp driver] as well as real OSA or
HiperSockets [qeth driver].

A possible symptom for this case using zfcp is the following repeating kernel
message pattern:

zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: qdio: ZFCP on SC <sc> using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W
zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: A QDIO problem occurred

Each of the qdio problem message can be accompanied by the following entries
for the affected subchannel <sc> in
/sys/kernel/debug/s390dbf/qdio_error/hex_ascii for zfcp or qeth:

<sc> ccq: 69....
<sc> SQBS ERROR.

Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Cc: Steffen Maier <maier@linux.ibm.com>
Fixes: 8129ee1642 ("[PATCH] s390: qdio V=V pass-through")
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17 15:20:17 +02:00
Vineeth Vijayan b1b0d5aec1 s390/cio: include subchannels without devices also for evaluation
Currently when the new channel-path is enabled, we do evaluation only
on the subchannels with a device connected on it. This is because,
in the past, if the device in the subchannel is not working or not
available, we used to unregister the subchannels. But, from the 'commit
2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")'
we allow subchannels with or without an active device connected
on it. So, when we do the io_subchannel_verify, make sure that,
we are evaluating the subchannels without any device too.

Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-15 14:20:14 +02:00
Heiko Carstens e20985a796 s390/cio: replace zero-length array with flexible-array member
There are numerous patches which convert zero-length arrays with a
flexible-array member. Convert the remaining s390 occurrences.

Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://github.com/KSPP/linux/issues/78
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:29 +02:00
Linus Torvalds a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Eric Farman 1c06bb87af vfio/ccw: remove WARN_ON during shutdown
The logic in vfio_ccw_sch_shutdown() always assumed that the input
subchannel would point to a vfio_ccw_private struct, without checking
that one exists. The blamed commit put in a check for this scenario,
to prevent the possibility of a missing private.

The trouble is that check was put alongside a WARN_ON(), presuming
that such a scenario would be a cause for concern. But this can be
triggered by binding a subchannel to vfio-ccw, and rebooting the
system before starting the mdev (via "mdevctl start" or similar)
or after stopping it. In those cases, shutdown doesn't need to
worry because either the private was never allocated, or it was
cleaned up by vfio_ccw_mdev_remove().

Remove the WARN_ON() piece of this check, since there are plausible
scenarios where private would be NULL in this path.

Fixes: 9e6f07cd1e ("vfio/ccw: create a parent struct")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230210174227.2256424-1-farman@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-14 11:45:40 +01:00
Vineeth Vijayan 0c6924c262 s390/cio: introduce locking for register/unregister functions
Unbinding an I/O subchannel with a child-CCW device in disconnected
state sometimes causes a kernel-panic. The race condition was seen
mostly during testing, when setting all the CHPIDs of a device to
offline and at the same time, the unbinding the I/O subchannel driver.

The kernel-panic occurs because of double delete, the I/O subchannel
driver calls device_del on the CCW device while another device_del
invocation for the same device is in-flight.  For instance, disabling
all the CHPIDs will trigger the ccw_device_remove function, which will
call a ccw_device_unregister(), which ends up calling the device_del()
which is asynchronous via cdev's todo workqueue. And unbinding the I/O
subchannel driver calls io_subchannel_remove() function which calls the
ccw_device_unregister() and device_del().

This double delete can be prevented by serializing all CCW device
registration/unregistration calls into the driver core. This patch
introduces a mutex which will be used for this purpose.

Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-31 18:56:36 +01:00
Greg Kroah-Hartman 2a81ada32f driver core: make struct bus_type.uevent() take a const *
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-27 13:45:52 +01:00
Vineeth Vijayan ca34cda73f s390/cio: evaluate devices with non-operational paths
css_schedule_reprobe() function calls the evaluation for CSS_EVAL_UNREG
which is specific to the idset of unregistered subchannels. This
evaluation was introduced because, previously, if the underlying device
become not-accessible, the subchannel was unregistered. But, in the recent
changes in cio,with the commit '2297791c92d0 s390/cio: dont unregister
subchannel from child-drivers', we no  longer unregister the subchannels
just because of a non-operational device. This allows to have subchannels
without any operational device connected on it. So, a css_schedule_reprobe
function on unregistered subchannel does not have any effect.

Change this functionality to evaluate the subchannels which does not
have a working path to the device. This could be due the erroneous
device or due to the erraneous path. Evaluate based on the values of OPM
and PAM&POM.
Here we introduced a new idset function,to keep I/O subchannels in the
idset when the last seen status indicates that the device has no working
path. A device has no working path if all available paths have been tried
without success.A failed I/O attempt on a path is indicated as a 0 bit
value in the POM mask. By looking at the POM mask bit values of available
paths (1 in PAM) that Linux is supposed to use (1 in vary mask OPM), we
can identify a non-working device as a device where the bit-wise and of
the PAM, POM and OPM mask return 0.

css_schedule_reprobe() is being used by dasd-driver and chsc-cio
component. dasd driver, when it detects a change in the pathgroup, invokes
the re-evaluation of the subchannel. And chsc-cio component upon a CRW
event, (resource accessibility event). In both the cases, it makes much
better sense to re-evalute the subchannel with no-valid path.

Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reported-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22 18:42:34 +01:00
Eric Farman beb060ed20 vfio/ccw: remove old IDA format restrictions
By this point, all the pieces are in place to properly support
a 2K Format-2 IDAL, and to convert a guest Format-1 IDAL to
the 2K Format-2 variety. Let's remove the fence that prohibits
them, and allow a guest to submit them if desired.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:09 +01:00
Eric Farman b5a73e8eb2 vfio/ccw: don't group contiguous pages on 2K IDAWs
The vfio_pin_pages() interface allows contiguous pages to be
pinned as a single request, which is great for the 4K pages
that are normally processed. Old IDA formats operate on 2K
chunks, which makes this logic more difficult.

Since these formats are rare, let's just invoke the page
pinning one-at-a-time, instead of trying to group them.
We can rework this code at a later date if needed.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:09 +01:00
Eric Farman 1b676fe3d9 vfio/ccw: handle a guest Format-1 IDAL
There are two scenarios that need to be addressed here.

First, an ORB that does NOT have the Format-2 IDAL bit set could
have both a direct-addressed CCW and an indirect-data-address CCW
chained together. This means that the IDA CCW will contain a
Format-1 IDAL, and can be easily converted to a 2K Format-2 IDAL.
But it also means that the direct-addressed CCW needs to be
converted to the same 2K Format-2 IDAL for consistency with the
ORB settings.

Secondly, a Format-1 IDAL is comprised of 31-bit addresses.
Thus, we need to cast this IDAL to a pointer of ints while
populating the list of addresses that are sent to vfio.

Since the result of both of these is the use of the 2K IDAL
variants, and the output of vfio-ccw is always a Format-2 IDAL
(in order to use 64-bit addresses), make sure that the correct
control bit gets set in the ORB when these scenarios occur.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:09 +01:00
Eric Farman 61f3a16b9d vfio/ccw: allocate/populate the guest idal
Today, we allocate memory for a list of IDAWs, and if the CCW
being processed contains an IDAL we read that data from the guest
into that space. We then copy each IDAW into the pa_iova array,
or fabricate that pa_iova array with a list of addresses based
on a direct-addressed CCW.

Combine the reading of the guest IDAL with the creation of a
pseudo-IDAL for direct-addressed CCWs, so that both CCW types
have a "guest" IDAL that can be populated straight into the
pa_iova array.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:09 +01:00
Eric Farman 6a6dc14ac8 vfio/ccw: calculate number of IDAWs regardless of format
The idal_nr_words() routine works well for 4K IDAWs, but lost its
ability to handle the old 2K formats with the removal of 31-bit
builds in commit 5a79859ae0 ("s390: remove 31 bit support").

Since there's nothing preventing a guest from generating this IDAW
format, let's re-introduce the math for them and use both when
calculating the number of IDAWs based on the bits specified in
the ORB.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:08 +01:00
Eric Farman 667e5dbabf vfio/ccw: read only one Format-1 IDAW
The intention is to read the first IDAW to determine the starting
location of an I/O operation, knowing that the second and any/all
subsequent IDAWs will be aligned per architecture. But, this read
receives 64-bits of data, which is the size of a Format-2 IDAW.

In the event that Format-1 IDAWs are presented, adjust the size
of the read to 32-bits. The data will end up occupying the upper
word of the target iova variable, so shift it down to the lower
word for use as an address. (By definition, this IDAW format
uses a 31-bit address, so the "sign" bit will always be off and
there is no concern about sign extension.)

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:08 +01:00
Eric Farman b21f9cb112 vfio/ccw: refactor the idaw counter
The rules of an IDAW are fairly simple: Each one can move no
more than a defined amount of data, must not cross the
boundary defined by that length, and must be aligned to that
length as well. The first IDAW in a list is special, in that
it does not need to adhere to that alignment, but the other
rules still apply. Thus, by reading the first IDAW in a list,
the number of IDAWs that will comprise a data transfer of a
particular size can be calculated.

Let's factor out the reading of that first IDAW with the
logic that calculates the length of the list, to simplify
the rest of the routine that handles the individual IDAWs.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:08 +01:00
Eric Farman 61783394f4 vfio/ccw: populate page_array struct inline
There are two possible ways the list of addresses that get passed
to vfio are calculated. One is from a guest IDAL, which would be
an array of (probably) non-contiguous addresses. The other is
built from contiguous pages that follow the starting address
provided by ccw->cda.

page_array_alloc() attempts to simplify things by pre-populating
this array from the starting address, but that's not needed for
a CCW with an IDAL anyway so doesn't need to be in the allocator.
Move it to the caller in the non-IDAL case, since it will be
overwritten when reading the guest IDAL.

Remove the initialization of the pa_page output pointers,
since it won't be explicitly needed for either case.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:08 +01:00
Eric Farman 62a97a56a6 vfio/ccw: pass page count to page_array struct
The allocation of our page_array struct calculates the number
of 4K pages that would be needed to hold a certain number of
bytes. But, since the number of pages that will be pinned is
also calculated by the length of the IDAL, this logic is
unnecessary. Let's pass that information in directly, and
avoid the math within the allocator.

Also, let's make this two allocations instead of one,
to make it apparent what's happening within here.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:08 +01:00
Eric Farman 4b946d65b8 vfio/ccw: remove unnecessary malloc alignment
Everything about this allocation is harder than necessary,
since the memory allocation is already aligned to our needs.
Break them apart for readability, instead of doing the
funky arithmetic.

Of the structures that are involved, only ch_ccw needs the
GFP_DMA flag, so the others can be allocated without it.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman a4c6040472 vfio/ccw: simplify CCW chain fetch routines
The act of processing a fetched CCW has two components:

 1) Process a Transfer-in-channel (TIC) CCW
 2) Process any other CCW

The former needs to look at whether the TIC jumps backwards into
the current channel program or forwards into a new segment,
while the latter just processes the CCW data address itself.

Rather than passing the chain segment and index within it to the
handlers for the above, and requiring each to calculate the
elements it needs, simply pass the needed pointers directly.

For the TIC, that means the CCW being processed and the location
of the entire channel program which holds all segments. For the
other CCWs, the page_array pointer is also needed to perform the
page pinning, etc.

While at it, rename ccwchain_fetch_direct to _ccw, to indicate
what it is. The name "_direct" is historical, when it used to
process a direct-addressed CCW, but IDAs are processed here too.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman c5e8083f95 vfio/ccw: replace copy_from_iova with vfio_dma_rw
It was suggested [1] that we replace the old copy_from_iova() routine
(which pins a page, does a memcpy, and unpins the page) with the
newer vfio_dma_rw() interface.

This has a modest improvement in the overall time spent through the
fsm_io_request() path, and simplifies some of the code to boot.

[1] https://lore.kernel.org/r/20220706170553.GK693670@nvidia.com/

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman 254cb663c2 vfio/ccw: move where IDA flag is set in ORB
The output of vfio_ccw is always a Format-2 IDAL, but the code that
explicitly sets this is buried in cp_init().

In fact the input is often already a Format-2 IDAL, and would be
rejected (via the check in ccwchain_calc_length()) if it weren't,
so explicitly setting it doesn't do much. Setting it way down here
only makes it impossible to make decisions in support of other
IDAL formats.

Let's move that to where the rest of the ORB is set up, so that the
CCW processing in cp_prefetch() is performed according to the
contents of the unmodified guest ORB.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman 155a4321c1 vfio/ccw: allow non-zero storage keys
Currently, vfio-ccw copies the ORB from the io_region to the
channel_program struct being built. It then adjusts various
pieces of that ORB to the values needed to be used by the
SSCH issued by vfio-ccw in the host.

This includes setting the subchannel key to the default,
presumably because Linux doesn't do anything with non-zero
storage keys itself. But it seems wrong to convert every I/O
to the default key if the guest itself requested a non-zero
subchannel (access) key.

Any channel program that sets a non-zero key would expect the
same key returned in the SCSW of the IRB, not zero, so best to
allow that to occur unimpeded.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman 9fbed59fcd vfio/ccw: simplify the cp_get_orb interface
There's no need to send in both the address of the subchannel
struct, and an element within it, to populate the ORB.

Pass the whole pointer and let cp_get_orb() take the pieces
that are needed.

Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:07 +01:00
Eric Farman 8a54e238ef vfio/ccw: cleanup some of the mdev commentary
There is no longer an mdev struct accessible via a channel
program struct, but there are some artifacts remaining that
mention it. Clean them up.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-09 14:34:06 +01:00
Linus Torvalds 785d21ba2f VFIO updates for v6.2-rc1
- Replace deprecated git://github.com link in MAINTAINERS. (Palmer Dabbelt)
 
  - Simplify vfio/mlx5 with module_pci_driver() helper. (Shang XiaoJing)
 
  - Drop unnecessary buffer from ACPI call. (Rafael Mendonca)
 
  - Correct latent missing include issue in iova-bitmap and fix support
    for unaligned bitmaps.  Follow-up with better fix through refactor.
    (Joao Martins)
 
  - Rework ccw mdev driver to split private data from parent structure,
    better aligning with the mdev lifecycle and allowing us to remove
    a temporary workaround. (Eric Farman)
 
  - Add an interface to get an estimated migration data size for a device,
    allowing userspace to make informed decisions, ex. more accurately
    predicting VM downtime. (Yishai Hadas)
 
  - Fix minor typo in vfio/mlx5 array declaration. (Yishai Hadas)
 
  - Simplify module and Kconfig through consolidating SPAPR/EEH code and
    config options and folding virqfd module into main vfio module.
    (Jason Gunthorpe)
 
  - Fix error path from device_register() across all vfio mdev and sample
    drivers. (Alex Williamson)
 
  - Define migration pre-copy interface and implement for vfio/mlx5
    devices, allowing portions of the device state to be saved while the
    device continues operation, towards reducing the stop-copy state
    size. (Jason Gunthorpe, Yishai Hadas, Shay Drory)
 
  - Implement pre-copy for hisi_acc devices. (Shameer Kolothum)
 
  - Fixes to mdpy mdev driver remove path and error path on probe.
    (Shang XiaoJing)
 
  - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
    incorrect buffer freeing. (Dan Carpenter)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmObfPgbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiDogP/i9GuBKposvZpnfxXWwo
 oNpKBZSOVMW8wgavNEuryMb+9WoouIghce8XU49MmONoP26kIh5TA14Zpi3XWkLK
 K+NlpwicESvLeZVHU7f3R8meVqmPtlxIi59jE+CfEHB8BW2HIAsEdwdhkxMwus9C
 nuiiK/2YYyQWOXYc4LAIkspMzjtGPy6Im5P6AED+dI+TFCEqJAM5qgOLJZFlk4a/
 WwZY2xjVKOl6xf5VZXGw+v7fDgz2Ju+j4Bm3X5lx1HgiDrEH83MjXY5h67neAIVb
 bXrfNLN++MiuO5niGTFMbUjGVUIFxsfmJzBnL9QrLsuj0JrGEKsu/1JEO78g0Km0
 ZCChoJ6UyUOgxt6evEymUAZAAkbcKaaht2gdbAXW71tv9p1TripAbBKwVeah1bQp
 SiHPqy9InKJlhaf+GbXL9eux1WVMfQ6FZccU16bNt7VaV2I8js85z/2gqVD0a5Mw
 +gnwp5XMUFWNKlJrnc7uVCD0bDExwQhr75OP4rWjMNvvLi9hPXJ2cI2Sg+9OLzQw
 vm/I+Df+FfXCuGAgX4Lxq76pqWlYGJH0Qxc14Ds6YoXqygBPz9yvTtuBv8mTHJzE
 KdAl/6DmZZxZ/JFD9lPF80KRiAsJ6iNf6tPTWES7hfDBfIdgQ/DZbXridLWJPNoi
 xLfaW19yrLTXWKSmR7G2Lsz4
 =q9xs
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Replace deprecated git://github.com link in MAINTAINERS (Palmer
   Dabbelt)

 - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing)

 - Drop unnecessary buffer from ACPI call (Rafael Mendonca)

 - Correct latent missing include issue in iova-bitmap and fix support
   for unaligned bitmaps. Follow-up with better fix through refactor
   (Joao Martins)

 - Rework ccw mdev driver to split private data from parent structure,
   better aligning with the mdev lifecycle and allowing us to remove a
   temporary workaround (Eric Farman)

 - Add an interface to get an estimated migration data size for a
   device, allowing userspace to make informed decisions, ex. more
   accurately predicting VM downtime (Yishai Hadas)

 - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas)

 - Simplify module and Kconfig through consolidating SPAPR/EEH code and
   config options and folding virqfd module into main vfio module (Jason
   Gunthorpe)

 - Fix error path from device_register() across all vfio mdev and sample
   drivers (Alex Williamson)

 - Define migration pre-copy interface and implement for vfio/mlx5
   devices, allowing portions of the device state to be saved while the
   device continues operation, towards reducing the stop-copy state size
   (Jason Gunthorpe, Yishai Hadas, Shay Drory)

 - Implement pre-copy for hisi_acc devices (Shameer Kolothum)

 - Fixes to mdpy mdev driver remove path and error path on probe (Shang
   XiaoJing)

 - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
   incorrect buffer freeing (Dan Carpenter)

* tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits)
  vfio/mlx5: error pointer dereference in error handling
  vfio/mlx5: fix error code in mlx5vf_precopy_ioctl()
  samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe()
  hisi_acc_vfio_pci: Enable PRE_COPY flag
  hisi_acc_vfio_pci: Move the dev compatibility tests for early check
  hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions
  hisi_acc_vfio_pci: Add support for precopy IOCTL
  vfio/mlx5: Enable MIGRATION_PRE_COPY flag
  vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
  vfio/mlx5: Introduce multiple loads
  vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
  vfio/mlx5: Introduce vfio precopy ioctl implementation
  vfio/mlx5: Introduce SW headers for migration states
  vfio/mlx5: Introduce device transitions of PRE_COPY
  vfio/mlx5: Refactor to use queue based data chunks
  vfio/mlx5: Refactor migration file state
  vfio/mlx5: Refactor MKEY usage
  vfio/mlx5: Refactor PD usage
  vfio/mlx5: Enforce a single SAVE command at a time
  vfio: Extend the device migration protocol with PRE_COPY
  ...
2022-12-15 13:12:15 -08:00
Linus Torvalds 08cdc21579 iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to
 managing IO page tables that point at user space memory.
 
 It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
 container) which is the VFIO specific interface for a similar idea.
 
 We see a broad need for extended features, some being highly IOMMU device
 specific:
  - Binding iommu_domain's to PASID/SSID
  - Userspace IO page tables, for ARM, x86 and S390
  - Kernel bypassed invalidation of user page tables
  - Re-use of the KVM page table in the IOMMU
  - Dirty page tracking in the IOMMU
  - Runtime Increase/Decrease of IOPTE size
  - PRI support with faults resolved in userspace
 
 Many of these HW features exist to support VM use cases - for instance the
 combination of PASID, PRI and Userspace IO Page Tables allows an
 implementation of DMA Shared Virtual Addressing (vSVA) within a
 guest. Dirty tracking enables VM live migration with SRIOV devices and
 PASID support allow creating "scalable IOV" devices, among other things.
 
 As these features are fundamental to a VM platform they need to be
 uniformly exposed to all the driver families that do DMA into VMs, which
 is currently VFIO and VDPA.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
 YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
 s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
 =jJkd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
  support allow creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Linus Torvalds 47477c84b8 s390 updates for 6.2 merge window
- Factor out handle_write() function and simplify 3215 console
   write operation.
 
 - When 3170 terminal emulator is connected to the 3215 console
   driver the boot time could be very long due to limited buffer
   space or missing operator input. Add con3215_drop command line
   parameter and con3215_drop sysfs attribute file to instruct
   the kernel drop console data when such conditions are met.
 
 - Fix white space errors in 3215 console driver.
 
 - Move enum paiext_mode definition to a header file and rename
   it to paievt_mode to indicate this is now used for several
   events. Rename PAI_MODE_COUNTER to PAI_MODE_COUNTING to make
   consistent with PAI_MODE_SAMPLING.
 
 - Simplify the logic of PMU pai_crypto mapped buffer reference
   counter and make it consistent with PMU pai_ext.
 
 - Rename PMU pai_crypto mapped buffer structure member users
   to active_events to make it consistent with PMU pai_ext.
 
 - Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option.
   This results in saving of 12K per 1M hugetlb page (~1.2%)
   and 32764K per 2G hugetlb page (~1.6%).
 
 - Use generic serial.h, bugs.h, shmparam.h and vga.h header
   files and scrap s390-specific versions.
 
 - The generic percpu setup code does not expect the s390-like
   implementation and emits a warning. To get rid of that warning
   and provide sane CPU-to-node and CPU-to-CPU distance mappings
   implementat a minimal version of setup_per_cpu_areas().
 
 - Use kstrtobool() instead of strtobool() for re-IPL sysfs device
   attributes.
 
 - Avoid unnecessary lookup of a pointer to MSI descriptor when
   setting IRQ affinity for a PCI device.
 
 - Get rid of "an incompatible function type cast" warning by
   changing debug_sprintf_format_fn() function prototype so it
   matches the debug_format_proc_t function type.
 
 - Remove unused info_blk_hdr__pcpus() and get_page_state()
   functions.
 
 - Get rid of clang "unused unused insn cache ops function"
   warning by moving s390_insn definition to a private header.
 
 - Get rid of clang "unused function" warning by making function
   raw3270_state_final() only available if CONFIG_TN3270_CONSOLE
   is enabled.
 
 - Use kstrobool() to parse sclp_con_drop parameter to make it
   identical to the con3215_drop parameter and allow passing
   values like "yes" and "true".
 
 - Use sysfs_emit() for all SCLP sysfs show functions, which is
   the current standard way to generate output strings.
 
 - Make SCLP con_drop sysfs attribute also writable and allow to
   change its value during runtime. This makes SCLP console drop
   handling consistent with the 3215 device driver.
 
 - Virtual and physical addresses are indentical on s390. However,
   there is still a confusion when pointers are directly casted to
   physical addresses or vice versa. Use correct address converters
   virt_to_phys() and phys_to_virt() for s390 channel IO drivers.
 
 - Support for power managemant has been removed from s390 since
   quite some time. Remove unused power managemant code from the
   appldata device driver.
 
 - Allow memory tools like KASAN see memory accesses from the
   checksum code. Switch to GENERIC_CSUM if KASAN is enabled,
   just like x86 does.
 
 - Add support of ECKD DASDs disks so it could be used as boot
   and dump devices.
 
 - Follow checkpatch recommendations and use octal values instead
   of S_IRUGO and S_IWUSR for dump device attributes in sysfs.
 
 - Changes to vx-insn.h do not cause a recompile of C files that
   use asm(".include \"asm/vx-insn.h\"\n") magic to access vector
   instruction macros from inline assemblies. Add wrapper include
   header file to avoid this problem.
 
 - Use vector instruction macros instead of byte patterns to
   increase register validation routine readability.
 
 - The current machine check register validation handling does not
   take into account various scenarios and might lead to killing a
   wrong user process or potentially ignore corrupted FPU registers.
   Simplify logic of the machine check handler and stop the whole
   machine if the previous context was kerenel mode. If the previous
   context was user mode, kill the current task.
 
 - Introduce sclp_emergency_printk() function which can be used to
   emit a message in emergency cases. It is supposed to be used in
   cases where regular console device drivers may not work anymore,
   e.g. unrecoverable machine checks.
 
   Keep the early Service-Call Control Block so it can also be used
   after initdata has been freed to allow sclp_emergency_printk()
   implementation.
 
 - In case a system will be stopped because of an unrecoverable
   machine check error print the machine check interruption code
   to give a hint of what went wrong.
 
 - Move storage error checking from the assembly entry code to C
   in order to simplify machine check handling. Enter the handler
   with DAT turned on, which simplifies the entry code even more.
 
 - The machine check extended save areas are allocated using
   a private "nmi_save_areas" slab cache which guarantees a
   required power-of-two alignment. Get rid of that cache in
   favour of kmalloc().
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCY5ckrhccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8NlrAQD8NCLeEAkhGCRnzdTyngExCrzV
 Mw//cEnksUkIPqalJgEArbyFjGh05ecNaiDQduH8Gh94/qOhGE4obMdTgMWq7QY=
 =3aou
 -----END PGP SIGNATURE-----

Merge tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Factor out handle_write() function and simplify 3215 console write
   operation

 - When 3170 terminal emulator is connected to the 3215 console driver
   the boot time could be very long due to limited buffer space or
   missing operator input. Add con3215_drop command line parameter and
   con3215_drop sysfs attribute file to instruct the kernel drop console
   data when such conditions are met

 - Fix white space errors in 3215 console driver

 - Move enum paiext_mode definition to a header file and rename it to
   paievt_mode to indicate this is now used for several events. Rename
   PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with
   PAI_MODE_SAMPLING

 - Simplify the logic of PMU pai_crypto mapped buffer reference counter
   and make it consistent with PMU pai_ext

 - Rename PMU pai_crypto mapped buffer structure member users to
   active_events to make it consistent with PMU pai_ext

 - Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This
   results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per
   2G hugetlb page (~1.6%)

 - Use generic serial.h, bugs.h, shmparam.h and vga.h header files and
   scrap s390-specific versions

 - The generic percpu setup code does not expect the s390-like
   implementation and emits a warning. To get rid of that warning and
   provide sane CPU-to-node and CPU-to-CPU distance mappings implementat
   a minimal version of setup_per_cpu_areas()

 - Use kstrtobool() instead of strtobool() for re-IPL sysfs device
   attributes

 - Avoid unnecessary lookup of a pointer to MSI descriptor when setting
   IRQ affinity for a PCI device

 - Get rid of "an incompatible function type cast" warning by changing
   debug_sprintf_format_fn() function prototype so it matches the
   debug_format_proc_t function type

 - Remove unused info_blk_hdr__pcpus() and get_page_state() functions

 - Get rid of clang "unused unused insn cache ops function" warning by
   moving s390_insn definition to a private header

 - Get rid of clang "unused function" warning by making function
   raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is
   enabled

 - Use kstrobool() to parse sclp_con_drop parameter to make it identical
   to the con3215_drop parameter and allow passing values like "yes" and
   "true"

 - Use sysfs_emit() for all SCLP sysfs show functions, which is the
   current standard way to generate output strings

 - Make SCLP con_drop sysfs attribute also writable and allow to change
   its value during runtime. This makes SCLP console drop handling
   consistent with the 3215 device driver

 - Virtual and physical addresses are indentical on s390. However, there
   is still a confusion when pointers are directly casted to physical
   addresses or vice versa. Use correct address converters
   virt_to_phys() and phys_to_virt() for s390 channel IO drivers

 - Support for power managemant has been removed from s390 since quite
   some time. Remove unused power managemant code from the appldata
   device driver

 - Allow memory tools like KASAN see memory accesses from the checksum
   code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does

 - Add support of ECKD DASDs disks so it could be used as boot and dump
   devices

 - Follow checkpatch recommendations and use octal values instead of
   S_IRUGO and S_IWUSR for dump device attributes in sysfs

 - Changes to vx-insn.h do not cause a recompile of C files that use
   asm(".include \"asm/vx-insn.h\"\n") magic to access vector
   instruction macros from inline assemblies. Add wrapper include header
   file to avoid this problem

 - Use vector instruction macros instead of byte patterns to increase
   register validation routine readability

 - The current machine check register validation handling does not take
   into account various scenarios and might lead to killing a wrong user
   process or potentially ignore corrupted FPU registers. Simplify logic
   of the machine check handler and stop the whole machine if the
   previous context was kerenel mode. If the previous context was user
   mode, kill the current task

 - Introduce sclp_emergency_printk() function which can be used to emit
   a message in emergency cases. It is supposed to be used in cases
   where regular console device drivers may not work anymore, e.g.
   unrecoverable machine checks

   Keep the early Service-Call Control Block so it can also be used
   after initdata has been freed to allow sclp_emergency_printk()
   implementation

 - In case a system will be stopped because of an unrecoverable machine
   check error print the machine check interruption code to give a hint
   of what went wrong

 - Move storage error checking from the assembly entry code to C in
   order to simplify machine check handling. Enter the handler with DAT
   turned on, which simplifies the entry code even more

 - The machine check extended save areas are allocated using a private
   "nmi_save_areas" slab cache which guarantees a required power-of-two
   alignment. Get rid of that cache in favour of kmalloc()

* tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (38 commits)
  s390/nmi: get rid of private slab cache
  s390/nmi: move storage error checking back to C, enter with DAT on
  s390/nmi: print machine check interruption code before stopping system
  s390/sclp: introduce sclp_emergency_printk()
  s390/sclp: keep sclp_early_sccb
  s390/nmi: rework register validation handling
  s390/nmi: use vector instruction macros instead of byte patterns
  s390/vx: add vx-insn.h wrapper include file
  s390/ipl: use octal values instead of S_* macros
  s390/ipl: add eckd dump support
  s390/ipl: add eckd support
  vfio/ccw: identify CCW data addresses as physical
  vfio/ccw: sort out physical vs virtual pointers usage
  s390/checksum: support GENERIC_CSUM, enable it for KASAN
  s390/appldata: remove power management callbacks
  s390/cio: sort out physical vs virtual pointers usage
  s390/sclp: allow to change sclp_console_drop during runtime
  s390/sclp: convert to use sysfs_emit()
  s390/sclp: use kstrobool() to parse sclp_con_drop parameter
  s390/3270: make raw3270_state_final() depend on CONFIG_TN3270_CONSOLE
  ...
2022-12-12 11:04:08 -08:00
Alex Williamson ce3895735c vfio/ap/ccw/samples: Fix device_register() unwind path
We always need to call put_device() if device_register() fails.
All vfio drivers calling device_register() include a similar unwind
stack via gotos, therefore split device_unregister() into its
device_del() and put_device() components in the unwind path, and
add a goto target to handle only the put_device() requirement.

Reported-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/all/20221118032827.3725190-1-ruanjinjie@huawei.com
Fixes: d61fc96f47 ("sample: vfio mdev display - host device")
Fixes: 9d1a546c53 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
Fixes: a5e6e6505f ("sample: vfio bochs vbe display (host device for bochs-drm)")
Fixes: 9e6f07cd1e ("vfio/ccw: create a parent struct")
Fixes: 36360658eb ("s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem")
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Jason Herne <jjherne@linux.ibm.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Link: https://lore.kernel.org/r/166999942139.645727.12439756512449846442.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-05 12:37:54 -07:00
Eric Farman 5de2322d7b vfio/ccw: identify CCW data addresses as physical
The CCW data address created by vfio-ccw is that of an IDAL
built by this code. Since this address is used by real hardware,
it should be a physical address rather than a virtual one.
Let's clarify it as such in the ORB.

Similarly, once the I/O has completed the memory for that IDAL
needs to be released, so convert the CCW data address back to
a virtual address so that kfree() can process it.

Note: this currently doesn't fix a real bug, since virtual
addresses are identical to physical ones.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20221121165836.283781-3-farman@linux.ibm.com
2022-12-05 15:01:36 +01:00
Alexander Gordeev 21c7996917 vfio/ccw: sort out physical vs virtual pointers usage
The ORB's interrupt parameter field is stored unmodified into the
interruption code when an I/O interrupt occurs. As this reflects
a real device, let's store the physical address of the subchannel
struct so it can be used when processing an interrupt.

Note: this currently doesn't fix a real bug, since virtual
addresses are identical to physical ones.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
[EF: Updated commit message]
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Link: https://lore.kernel.org/r/20221121165836.283781-2-farman@linux.ibm.com
2022-12-05 15:01:31 +01:00
Jason Gunthorpe 4741f2e941 vfio-iommufd: Support iommufd for emulated VFIO devices
Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
consist of all the mdev drivers.

Like the physical drivers, support for iommufd is provided by the driver
supplying the correct standard ops. Provide ops from the core that
duplicate what vfio_register_emulated_iommu_dev() does.

Emulated drivers are where it is more likely to see variation in the
iommfd support ops. For instance IDXD will probably need to setup both a
iommfd_device context linked to a PASID and an iommufd_access context to
support all their mdev operations.

Link: https://lore.kernel.org/r/7-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 11:52:03 -04:00
Alexander Gordeev 5c2e5a0cf5 s390/cio: sort out physical vs virtual pointers usage
This does not fix a real bug, since virtual addresses
are currently indentical to physical ones.

Use virt_to_phys() for intparm interrupt parameter to
convert a 64-bit virtual address to the 32-bit physical
address, which is expected to be below 2GB.

Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-12-01 10:58:04 +01:00
Eric Farman 913447d06f vfio: Remove vfio_free_device
With the "mess" sorted out, we should be able to inline the
vfio_free_device call introduced by commit cb9ff3f3b8
("vfio: Add helpers for unifying vfio_device life cycle")
and remove them from driver release callbacks.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>	# vfio-ap part
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-8-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:30:23 -07:00
Eric Farman d1104f9327 vfio/ccw: replace vfio_init_device with _alloc_
Now that we have a reasonable separation of structs that follow
the subchannel and mdev lifecycles, there's no reason we can't
call the official vfio_alloc_device routine for our private data,
and behave like everyone else.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-7-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:30:23 -07:00
Eric Farman f4da83f7e3 vfio/ccw: remove release completion
There's enough separation between the parent and private structs now,
that it is fine to remove the release completion hack.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-6-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:11:26 -07:00
Eric Farman 3d62fe18b6 vfio/ccw: move private to mdev lifecycle
Now that the mdev parent data is split out into its own struct,
it is safe to move the remaining private data to follow the
mdev probe/remove lifecycle. The mdev parent data will remain
where it is, and follow the subchannel and the css driver
interfaces.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-5-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:11:25 -07:00
Eric Farman 06caaa27df vfio/ccw: move private initialization to callback
There's already a device initialization callback that is used to
initialize the release completion workaround that was introduced
by commit ebb72b765f ("vfio/ccw: Use the new device life cycle
helpers").

Move the other elements of the vfio_ccw_private struct that
require distinct initialization over to that routine.

With that done, the vfio_ccw_alloc_private routine only does a
kzalloc, so fold it inline.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-4-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:11:25 -07:00
Eric Farman 008a011d68 vfio/ccw: remove private->sch
These places all rely on the ability to jump from a private
struct back to the subchannel struct. Rather than keeping a
copy in our back pocket, let's use the relationship provided
by the vfio_device embedded within the private.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-3-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:11:25 -07:00
Eric Farman 9e6f07cd1e vfio/ccw: create a parent struct
Move the stuff associated with the mdev parent (and thus the
subchannel struct) into its own struct, and leave the rest in
the existing private structure.

The subchannel will point to the parent, and the parent will point
to the private, for the areas where one or both are needed. Further
separation of these structs will follow.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-2-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:11:25 -07:00
Peter Oberparleiter 1b60741127 s390/cio: fix out-of-bounds access on cio_ignore free
The channel-subsystem-driver scans for newly available devices whenever
device-IDs are removed from the cio_ignore list using a command such as:

  echo free >/proc/cio_ignore

Since an I/O device scan might interfer with running I/Os, commit
172da89ed0 ("s390/cio: avoid excessive path-verification requests")
introduced an optimization to exclude online devices from the scan.

The newly added check for online devices incorrectly assumes that
an I/O-subchannel's drvdata points to a struct io_subchannel_private.
For devices that are bound to a non-default I/O subchannel driver, such
as the vfio_ccw driver, this results in an out-of-bounds read access
during each scan.

Fix this by changing the scan logic to rely on a driver-independent
online indication. For this we can use struct subchannel->config.ena,
which is the driver's requested subchannel-enabled state. Since I/Os
can only be started on enabled subchannels, this matches the intent
of the original optimization of not scanning devices where I/O might
be running.

Fixes: 172da89ed0 ("s390/cio: avoid excessive path-verification requests")
Fixes: 0c3812c347 ("s390/cio: derive cdev information only for IO-subchannels")
Cc: <stable@vger.kernel.org> # v5.15
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-10-26 14:47:31 +02:00