Commit graph

5343 commits

Author SHA1 Message Date
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds
59331c215d A varied set of changes:
- a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
   (myself).  Also fixed a deadlock caused by a bogus allocation on the
   writeback path and authorize reply verification.
 
 - a fix for long stalls during fsync (Jeff Layton).  The client now
   has a way to force the MDS log flush, leading to ~100x speedups in
   some synthetic tests.
 
 - a new [no]require_active_mds mount option (Zheng Yan).  On mount, we
   will now check whether any of the MDSes are available and bail rather
   than block if none are.  This check can be avoided by specifying the
   "no" option.
 
 - a couple of MDS cap handling fixes and a few assorted patches
   throughout.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJYVByGAAoJEEp/3jgCEfOLBqkH/A7nVf7ObSDYmLuYgg1gJ8zq
 4zDDE42S4yZwayAVpn3UjbfPuez5J44lsdXitExdfiHOdIQZDa/WqAbSqQ48HCSg
 7sG6ecRWg3G5zG0psPZnB+S5wGMvsLXmj2hvzV1lt2t0lI5bDLSlNRSnElbhilD/
 8Z7+Ni2go8DMC9o49SJU32lBW7IByKl4p4flveItgwUvGkIFNd8OT3CyPBUqonQs
 lRCeImRYU8Jghb+ifnRxWSbuDf7pZAPc9kL0vibpUUT/1bH6iHsedKp37WQKqc/w
 KDSNnKiZcz0gY/hJeLqE3ymCIKO6SU+JkMQSaYNTouLO5fQsRr8/uWQXSe6S5oc=
 =ypWx
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A varied set of changes:

   - a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
     (myself). Also fixed a deadlock caused by a bogus allocation on the
     writeback path and authorize reply verification.

   - a fix for long stalls during fsync (Jeff Layton). The client now
     has a way to force the MDS log flush, leading to ~100x speedups in
     some synthetic tests.

   - a new [no]require_active_mds mount option (Zheng Yan).

     On mount, we will now check whether any of the MDSes are available
     and bail rather than block if none are. This check can be avoided
     by specifying the "no" option.

   - a couple of MDS cap handling fixes and a few assorted patches
     throughout"

* tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client: (32 commits)
  libceph: remove now unused finish_request() wrapper
  libceph: always signal completion when done
  ceph: avoid creating orphan object when checking pool permission
  ceph: properly set issue_seq for cap release
  ceph: add flags parameter to send_cap_msg
  ceph: update cap message struct version to 10
  ceph: define new argument structure for send_cap_msg
  ceph: move xattr initialzation before the encoding past the ceph_mds_caps
  ceph: fix minor typo in unsafe_request_wait
  ceph: record truncate size/seq for snap data writeback
  ceph: check availability of mds cluster on mount
  ceph: fix splice read for no Fc capability case
  ceph: try getting buffer capability for readahead/fadvise
  ceph: fix scheduler warning due to nested blocking
  ceph: fix printing wrong return variable in ceph_direct_read_write()
  crush: include mapper.h in mapper.c
  rbd: silence bogus -Wmaybe-uninitialized warning
  libceph: no need to drop con->mutex for ->get_authorizer()
  libceph: drop len argument of *verify_authorizer_reply()
  libceph: verify authorize reply on connect
  ...
2016-12-16 11:23:34 -08:00
Linus Torvalds
a829a8445f SCSI misc on 20161213
This update includes the usual round of major driver updates (ncr5380,
 lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).  There's also
 an assortment of minor fixes, mostly in error legs or other not very
 user visible stuff.  The major change is the pci_alloc_irq_vectors
 replacement for the old pci_msix_.. calls; this effectively makes IRQ
 mapping generic for the drivers and allows blk_mq to use the
 information.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYUJOvAAoJEAVr7HOZEZN42B0P/1lj1W2N7y0LOAbR2MasyQvT
 fMD/SSip/v+R/zJiTv+5M/IDQT6ez62JnQGWyX3HZTob9VEfoqagbUuHH6y+fmib
 tHyqiYc9FC/NZSRX/0ib+rpnSVhC/YRSVV7RrAqilbpOKAzeU25FlN/vbz+Nv/XL
 ReVBl+2nGjJtHyWqUN45Zuf74c0zXOWPPUD0hRaNclK5CfZv5wDYupzHzTNSQTkj
 nWvwPYT0OgSMNe7mR+IDFyOe3UQ/OYyeJB0yBNqO63IiaUabT8/hgrWR9qJAvWh8
 LiH+iSQ69+sDUnvWvFjuls/GzcFuuTljwJbm+FyTsmNHONPVY8JRCLNq7CNDJ6Vx
 HwpNuJdTSJpne4lAVBGPwgjs+GhlMvUP/xYVLWAXdaBxU9XGePrwqQDcFu1Rbx3P
 yfMiVaY1+e45OEjLRCbDAwFnMPevC3kyymIvSsTySJxhTbYrOsyrrWt5kwWsvE3r
 SKANsub+xUnpCkyg57nXRQStJSCiSfGIDsydKmMX+pf1SR4k6gCUQZlcchUX0uOa
 dcY6re0c7EJIQQiT7qeGP5TRBblxARocCA/Igx6b5U5HmuQ48tDFlMCps7/TE84V
 JBsBnmkXcEi/ALShL/Tui+3YKA1DfOtEnwHtXx/9Ecx/nxP2Sjr9LJwCKiONv8NY
 RgLpGfccrix34lQumOk5
 =sPXh
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).

  There's also an assortment of minor fixes, mostly in error legs or
  other not very user visible stuff. The major change is the
  pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this
  effectively makes IRQ mapping generic for the drivers and allows
  blk_mq to use the information"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits)
  scsi: qla4xxx: switch to pci_alloc_irq_vectors
  scsi: hisi_sas: support deferred probe for v2 hw
  scsi: megaraid_sas: switch to pci_alloc_irq_vectors
  scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices
  scsi: be2iscsi: set errno on error path
  scsi: be2iscsi: set errno on error path
  scsi: hpsa: fallback to use legacy REPORT PHYS command
  scsi: scsi_dh_alua: Fix RCU annotations
  scsi: hpsa: use %phN for short hex dumps
  scsi: hisi_sas: fix free'ing in probe and remove
  scsi: isci: switch to pci_alloc_irq_vectors
  scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI
  scsi: dpt_i2o: double free on error path
  scsi: cxlflash: Migrate scsi command pointer to AFU command
  scsi: cxlflash: Migrate IOARRIN specific routines to function pointers
  scsi: cxlflash: Cleanup queuecommand()
  scsi: cxlflash: Cleanup send_tmf()
  scsi: cxlflash: Remove AFU command lock
  scsi: cxlflash: Wait for active AFU commands to timeout upon tear down
  scsi: cxlflash: Remove private command pool
  ...
2016-12-14 10:49:33 -08:00
Linus Torvalds
aa3ecf388a xen: features and fixes for 4.10 rc0
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJYT5HMAAoJELDendYovxMvhNQH/1g/3ahM4JKN8Z0SbjKBEdQm
 yj2xOj6cE3l6wMSUblKjZD2DLLhpmcHT/E97Xro/lZQEfQJoMXXWWDFowMU/P1LA
 mJxb7Fzq5Wr+6eGSAlIQB270MrpNi/luf+CWHMwVA3V7R3KRXwonOdGQSkISIzCd
 tgIydEA3a9r2+HgeIBpZFZ4GcSrJQU75krMyl2tjD1C+jeYVd+zdoj2OnDsZQDZQ
 hDWApMpNbpSBAn7JtSSdXWSTBsGH0lUECebeYPhPQ2sX2P6Y8+UCGwA7i6FFdbTa
 agXfVSdRz8dCe3k19VcKDAw6nK9BTTMnEeEHmkmygIh6wuHPP44CzigTXIbJoXI=
 =zjfm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Xen features and fixes for 4.10

  These are some fixes, a move of some arm related headers to share them
  between arm and arm64 and a series introducing a helper to make code
  more readable.

  The most notable change is David stepping down as maintainer of the
  Xen hypervisor interface. This results in me sending you the pull
  requests for Xen related code from now on"

* tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits)
  xen/balloon: Only mark a page as managed when it is released
  xenbus: fix deadlock on writes to /proc/xen/xenbus
  xen/scsifront: don't request a slot on the ring until request is ready
  xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
  x86: Make E820_X_MAX unconditionally larger than E820MAX
  xen/pci: Bubble up error and fix description.
  xen: xenbus: set error code on failure
  xen: set error code on failures
  arm/xen: Use alloc_percpu rather than __alloc_percpu
  arm/arm64: xen: Move shared architecture headers to include/xen/arm
  xen/events: use xen_vcpu_id mapping for EVTCHNOP_status
  xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
  xen-scsifront: Add a missing call to kfree
  MAINTAINERS: update XEN HYPERVISOR INTERFACE
  xenfs: Use proc_create_mount_point() to create /proc/xen
  xen-platform: use builtin_pci_driver
  xen-netback: fix error handling output
  xen: make use of xenbus_read_unsigned() in xenbus
  xen: make use of xenbus_read_unsigned() in xen-pciback
  xen: make use of xenbus_read_unsigned() in xen-fbfront
  ...
2016-12-13 16:07:55 -08:00
Linus Torvalds
36869cb93d Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main block pull request this series. Contrary to previous
  release, I've kept the core and driver changes in the same branch. We
  always ended up having dependencies between the two for obvious
  reasons, so makes more sense to keep them together. That said, I'll
  probably try and keep more topical branches going forward, especially
  for cycles that end up being as busy as this one.

  The major parts of this pull request is:

   - Improved support for O_DIRECT on block devices, with a small
     private implementation instead of using the pig that is
     fs/direct-io.c. From Christoph.

   - Request completion tracking in a scalable fashion. This is utilized
     by two components in this pull, the new hybrid polling and the
     writeback queue throttling code.

   - Improved support for polling with O_DIRECT, adding a hybrid mode
     that combines pure polling with an initial sleep. From me.

   - Support for automatic throttling of writeback queues on the block
     side. This uses feedback from the device completion latencies to
     scale the queue on the block side up or down. From me.

   - Support from SMR drives in the block layer and for SD. From Hannes
     and Shaun.

   - Multi-connection support for nbd. From Josef.

   - Cleanup of request and bio flags, so we have a clear split between
     which are bio (or rq) private, and which ones are shared. From
     Christoph.

   - A set of patches from Bart, that improve how we handle queue
     stopping and starting in blk-mq.

   - Support for WRITE_ZEROES from Chaitanya.

   - Lightnvm updates from Javier/Matias.

   - Supoort for FC for the nvme-over-fabrics code. From James Smart.

   - A bunch of fixes from a whole slew of people, too many to name
     here"

* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-13 10:19:16 -08:00
Linus Torvalds
e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Ilya Dryomov
d4c2269b3d rbd: silence bogus -Wmaybe-uninitialized warning
drivers/block/rbd.c: In function ‘rbd_watch_cb’:
drivers/block/rbd.c:3690:5: error: ‘struct_v’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/block/rbd.c:3759:5: note: ‘struct_v’ was declared here

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12 23:54:26 +01:00
Josef Bacik
a897b6664e nbd: use dev_err_ratelimited in io path
While doing stress tests we noticed that we'd get a lot of dmesg spam if
we suddenly disconnected the nbd device out of band.  Rate limit the
messages in the io path in order to deal with this.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-08 15:28:09 -07:00
Josef Bacik
20032ec38d nbd: reset the setup task for NBD_CLEAR_SOCK
If an app exits before running NBD_DO_IT but after adding sockets we can
end up not being allowed to do a new nbd device.  Fix this by making
NBD_CLEAR_SOCK reset the setup_task.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-08 15:26:57 -07:00
Sergey Senozhatsky
5c7e9ccd91 zram: restrict add/remove attributes to root only
zram hot_add sysfs attribute is a very 'special' attribute - reading
from it creates a new uninitialized zram device.  This file, by a
mistake, can be read by a 'normal' user at the moment, while only root
must be able to create a new zram device, therefore hot_add attribute
must have S_IRUSR mode, not S_IRUGO.

[akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
Fixes: 6566d1a32b ("zram: add dynamic device add/remove functionality")
Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Allen <steven@stebalien.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>    [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07 17:10:00 -08:00
Jens Axboe
e88f72cb9f nbd: fix 64-bit division
We have this:

ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
nbd.c:(.text+0x247c72): undefined reference to `__divdi3'

due to a recent commit, that did 64-bit division. Use the proper
divider function so that 32-bit compiles don't break.

Fixes: ef77b51524 ("nbd: use loff_t for blocksize and nbd_set_size args")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-03 12:08:03 -07:00
Josef Bacik
ef77b51524 nbd: use loff_t for blocksize and nbd_set_size args
If we have large devices (say like the 40t drive I was trying to test with) we
will end up overflowing the int arguments to nbd_set_size and not get the right
size for our device.  Fix this by using loff_t everywhere so I don't have to
think about this again.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-02 21:06:29 -07:00
Anna-Maria Gleixner
1dd6c834fa zram: Convert to hotplug state machine
Install the callbacks via the state machine with multi instance support and let
the core invoke the callbacks on the already online CPUs.

[bigeasy: wire up the multi instance stuff]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: rt@linutronix.de
Cc: Nitin Gupta <ngupta@vflare.org>
Link: http://lkml.kernel.org/r/20161126231350.10321-19-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:39 +01:00
Pan Bian
5b0e34e194 block: mtip32xx: set error code on failure
Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function
mtip_block_initialize(), variable rv takes the return value, and its
value should be negative on errors. rv is initialized as 0 and is not
reset when the call to ida_pre_get() fails. So 0 may be returned.
The return value 0 indicates that there is no error, which may be
inconsistent with the execution status. This patch fixes the bug by
explicitly assigning -ENOMEM to rv on the branch that ida_pre_get()
fails.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-01 08:01:14 -07:00
Takashi Iwai
529e71e164 zram: fix unbalanced idr management at hot removal
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd985 ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reported-by: David Disseldorp <ddiss@suse.de>
Tested-by: David Disseldorp <ddiss@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>    [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30 16:32:52 -08:00
Jens Axboe
feffa5cc7b nbd: fix setting of 'error' in NBD_DO_IT ioctl
Multiple paths don't set it properly, ensure that we do.

Fixes: 9561a7ade0 ("nbd: add multi-connection support")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 19:09:45 -07:00
Josef Bacik
9561a7ade0 nbd: add multi-connection support
NBD can become contended on its single connection.  We have to serialize all
writes and we can only process one read response at a time.  Fix this by
allowing userspace to provide multiple connections to a single nbd device.  This
coupled with block-mq drastically increases performance in multi-process cases.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 12:16:32 -07:00
Ming Lei
2c73a603cd block: floppy: use bio_add_page()
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 08:57:55 -07:00
Ming Lei
06efffda51 block: drbd: remove impossible failure handling
For a non-cloned bio, bio_add_page() only returns failure when
the io vec table is full, but in that case, bio->bi_vcnt can't
be zero at all.

So remove the impossible failure handling.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 08:57:55 -07:00
Ming Lei
3a83f46775 block: bio: pass bvec table to bio_init()
Some drivers often use external bvec table, so introduce
this helper for this case. It is always safe to access the
bio->bi_io_vec in this way for this case.

After converting to this usage, it will becomes a bit easier
to evaluate the remaining direct access to bio->bi_io_vec,
so it can help to prepare for the following multipage bvec
support.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Fixed up the new O_DIRECT cases.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 08:57:21 -07:00
Jens Axboe
5a8b187c61 pktcdvd: mark as unmaintained and deprecated
This driver is both orphaned, and not really useful anymore. Mark
it as such, and remove it in a future kernel after a release or
two.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-21 09:33:17 -07:00
Geliang Tang
55f958cc6c skd_main: drop duplicate header scatterlist.h
Drop duplicate header scatterlist.h from skd_main.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-18 07:44:21 -07:00
Jens Axboe
429a787be6 nbd: fix use-after-free of rq/bio in the xmit path
For writes, we can get a completion in while we're still iterating
the request and bio chain. If that happens, we're reading freed
memory and we can crash.

Break out after the last segment and avoid having the iterator
read freed memory.

Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17 12:30:37 -07:00
Yasuaki Ishimatsu
92153d30c7 null_blk: add usage hints for NVM
If CONFIG_NVM is disabled, loading null_block module with use_lightnvm=1
fails. But there are no messages and documents related to the failure.

Add the appropriate error message.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Massaged the text a bit.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-16 08:26:11 -07:00
Omar Sandoval
b4a567e811 loop: return proper error from loop_queue_rq()
->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not
an errno.

f4aa4c7bba ("block: loop: convert to per-device workqueue")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-14 15:58:44 -07:00
Jens Axboe
0cbc72a178 aoe: fix crash in page count manipulation
aoeblk contains some mysterious code, that wants to elevate the bio
vec page counts while it's under IO. That is not needed, it's
fragile, and it's causing kernel oopses for some.

Reported-by: Tested-by: Don Koch <kochd@us.ibm.com>
Tested-by: Tested-by: Don Koch <kochd@us.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-12 08:27:07 -07:00
Sachin Shukla
b425b0201e Block: mtip32xx: Improvement in code readability when memdup_user() fails.
There is no need to call kfree() if memdup_user() fails, as no memory
was allocated and the error in the error-valued pointer should be returned.

Signed-off-by: Sachin Shukla <sachin.s5@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-11 13:37:04 -07:00
Arnd Bergmann
41c9499b22 skd: fix function prototype
Building with W=1 shows a harmless warning for the skd driver:

drivers/block/skd_main.c:2959:1: error: ‘static’ is not at beginning of declaration [-Werror=old-style-declaration]

This changes the prototype to the expected formatting.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-09 22:53:47 -07:00
Arnd Bergmann
3bc8492f00 skd: fix msix error handling
As reported by gcc -Wmaybe-uninitialized, the cleanup path for
skd_acquire_msix tries to free the already allocated msi-x vectors
in reverse order, but the index variable may not have been
used yet:

drivers/block/skd_main.c: In function ‘skd_acquire_irq’:
drivers/block/skd_main.c:3890:8: error: ‘i’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the failure path to skip releasing the interrupts
if we have not started requesting them yet.

Fixes: 180b0ae77d ("skd: use pci_alloc_irq_vectors")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-09 22:53:45 -07:00
Richard Weinberger
d8e9e5e80e drbd: Fix kernel_sendmsg() usage - potential NULL deref
Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.

DRBD as external module has been around in the kernel 2.4 days already.
We used to be compatible to 2.4 and very early 2.6 kernels,
we used to use
 rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
 rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
 rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);

tcp_sendmsg() used to totally ignore the size parameter.
 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.

Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time".  Apparently that was unlikely enough for most, so this went
unnoticed for years.

Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/

This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.

It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).

It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().

Fixes: b411b3637f ("The DRBD driver")
Cc: <stable@vger.kernel.org> # 2.6.33.x-
Cc: viro@zeniv.linux.org.uk
Cc: christoph.lechleitner@iteg.at
Cc: wolfgang.glas@iteg.at
Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-09 17:08:32 -07:00
Andy Shevchenko
8d9c1f86a3 scsi: cciss: replace custom function to hexdump
For small buffers we may use %*ph[N] specifier, for the bigger blocks
print_hex_dump() call.

Cc: Don Brace <don.brace@microsemi.com>
Cc: esc.storagedev@microsemi.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-08 17:29:57 -05:00
Andy Shevchenko
8e1de26cd5 skd_main: use %*ph to dump small buffers
Replace custom approach by %*ph specifier to dump small buffers in hex format.

Unfortunately we can't use print_hex_dump_bytes() here since tha gap is
present, though one familiar with the code may change this.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-07 21:40:42 -07:00
Christoph Hellwig
180b0ae77d skd: use pci_alloc_irq_vectors
Switch the skd driver to use pci_alloc_irq_vectors.  We need to two calls to
pci_alloc_irq_vectors as skd only supports multiple MSI-X vectors, but not
multiple MSI vectors.

Otherwise this cleans up a lot of cruft and allows to a lot more common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-07 21:37:20 -07:00
Christoph Hellwig
feebd56872 pktcdvd: don't scribble over the bvec array
Hi Peter, hi Jens,

I've been looking over the multi page bio vec work again recently, and
one of the stumbling blocks is raw biovec access in the pktcdvd.

The first issue is that it directly sets up the page and offset pointers
in the biovec just before calling bio_add_page.  As bio_add_page already
does the setup it's trivial to just switch it to stack variables for the
arguments.

The second issue is the copy code in pkt_make_local_copy, which
effectively is an opencoded version of bio_copy_data except that it
skips pages that already are the same in the ѕource and destination.
But we look at the only calleer we just set up the bio using bio_add_page
to point exactly to the page array that pkt_make_local_copy compares,
so the pages will always be the same and we can just remove this function.

Note that all this is done based on code inspection, I don't have any
packet writing hardware myself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-07 08:50:54 -07:00
Juergen Gross
f27dc1ac56 xen: make use of xenbus_read_unsigned() in xen-blkfront
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
2016-11-07 13:55:09 +01:00
Juergen Gross
8235777b20 xen: make use of xenbus_read_unsigned() in xen-blkback
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
2016-11-07 13:55:07 +01:00
Christophe JAILLET
772918524d nbd: Fix error handling
'blk_mq_alloc_request()' returns an error pointer in case of error, not
NULL. So test it with IS_ERR.

Fixes: 	fd8383fd88 ("nbd: convert to blkmq")

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-06 14:14:59 -07:00
Bart Van Assche
2b053aca76 blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02 12:50:19 -06:00
Bart Van Assche
52d7f1b5c2 blk-mq: Avoid that requeueing starts stopped queues
Since blk_mq_requeue_work() starts stopped queues and since
execution of this function can be scheduled after a queue has
been stopped it is not possible to stop queues without using
an additional state variable to track whether or not the queue
has been stopped. Hence modify blk_mq_requeue_work() such that it
does not start stopped queues. My conclusion after a review of
the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
callers is as follows:
* In the dm driver starting and stopping queues should only happen
  if __dm_suspend() or __dm_resume() is called and not if the
  requeue list is processed.
* In the SCSI core queue stopping and starting should only be
  performed by the scsi_internal_device_block() and
  scsi_internal_device_unblock() functions but not by any other
  function. Although the blk_mq_stop_hw_queue() call in
  scsi_queue_rq() may help to reduce CPU load if a LLD queue is
  full, figuring out whether or not a queue should be restarted
  when requeueing a command would require to introduce additional
  locking in scsi_mq_requeue_cmd() to avoid a race with
  scsi_internal_device_block(). Avoid this complexity by removing
  the blk_mq_stop_hw_queue() call from scsi_queue_rq().
* In the NVMe core only the functions that call
  blk_mq_start_stopped_hw_queues() explicitly should start stopped
  queues.
* A blk_mq_start_stopped_hwqueues() call must be added in the
  xen-blkfront driver in its blkif_recover() function.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02 12:50:19 -06:00
Christoph Hellwig
70fd76140a block,fs: use REQ_* flags directly
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Christoph Hellwig
a2b809672e block: replace REQ_NOIDLE with REQ_IDLE
Noidle should be the default for writes as seen by all the compounds
definitions in fs.h using it.  In fact only direct I/O really should
be using NODILE, so turn the whole flag around to get the defaults
right, which will make our life much easier especially onces the
WRITE_* defines go away.

This assumes all the existing "raw" users of REQ_SYNC for writes
want noidle behavior, which seems to be spot on from a quick audit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Christoph Hellwig
03ea4afa67 umem: use op_is_sync to check for synchronous requests
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Markus Elfring
2ff98449ee virtio_blk: Delete an unnecessary initialisation in init_vq()
The local variable "err" will be set to an appropriate value
by a following statement.
Thus omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:47 +02:00
Markus Elfring
668866b6e8 virtio_blk: Use kmalloc_array() in init_vq()
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kmalloc_array".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:46 +02:00
Christoph Hellwig
e806402130 block: split out request-only flags into a new namespace
A lot of the REQ_* flags are only used on struct requests, and only of
use to the block layer and a few drivers that dig into struct request
internals.

This patch adds a new req_flags_t rq_flags field to struct request for
them, and thus dramatically shrinks the number of common requests.  It
also removes the unfortunate situation where we have to fit the fields
from the same enum into 32 bits for struct bio and 64 bits for
struct request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:45:17 -06:00
Uwe Kleine-König
ee52c44dee block: DAC960: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Mike Snitzer
ff26956875 brd: remove support for BLKFLSBUF
Discontinue having the brd driver destructively free all pages in the
ramdisk in response to the BLKFLSBUF ioctl.  Doing so allows a BLKFLSBUF
ioctl issued to a logical partition to destroy pages of the parent brd
device (and all other partitions of that brd device).

This change breaks compatibility - but in this case the compatibility
breaks more than it helps.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-25 11:55:25 -06:00
Jan Kara
366f4aea64 brd: Switch rd_size to unsigned long
Currently rd_size was int which lead to overflow and bogus device size
once the requested ramdisk size was 1 TB or more. Although these days
ramdisks with 1 TB size are mostly a mistake, the days when they are
useful are not far.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-25 11:55:23 -06:00
John W. Linville
423221d174 nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
Commit 0eadf37afc ("nbd: allow block mq to deal with timeouts")
changed normal usage of nbd->sock_lock to use spin_lock/spin_unlock
rather than the *_irq variants, but it missed this unlock in an
error path.

Found by Coverity, CID 1373871.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Fixes: 0eadf37afc ("nbd: allow block mq to deal with timeouts")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-24 13:18:14 -06:00