Commit graph

2038 commits

Author SHA1 Message Date
Namhyung Kim
8892cbaf68 brd: export module parameters
Export 'rd_nr', 'rd_size' and 'max_part' parameters to sysfs so the user can
know how many devices are allowed, how big each device is and how
many partitions are supported. If 'max_part' is 0, it simply means the
device doesn't support partitioning.

Also note that 'max_part' can be adjusted to power of 2 minus 1 form if
needed. The user should check this value after loading the module if he/she
wants to use that number correctly (e.g. with fdisk, mknod, etc.).

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26 21:06:50 +02:00
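The "power of 2 minus 1" adjustment described in the commit above can be sketched as follows. This is an illustration only, not the driver's code: the function name round_max_part is hypothetical, and the use of the highest set bit (similar to the kernel's fls()) is an assumption about how the rounding is done.

```python
def round_max_part(max_part):
    """Return (part_shift, adjusted max_part) for a requested max_part.

    Hypothetical helper: assumes the adjustment uses the position of the
    highest set bit of the requested value.
    """
    if max_part <= 0:
        # 0 means the device does not support partitioning.
        return 0, 0
    part_shift = max_part.bit_length()      # index of the highest set bit, 1-based
    return part_shift, (1 << part_shift) - 1


for requested in (0, 1, 10, 15, 16):
    print(requested, round_max_part(requested))
```

Under these assumptions a request of 10 ends up as 15 and a request of 16 ends up as 31, which is why the value should be read back from sysfs before calling mknod.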
Namhyung Kim
13868b76ab brd: fix comment on initial device creation
If the 'rd_nr' param was not specified, 16 devices (can be adjusted via
CONFIG_BLK_DEV_RAM_COUNT) would be created by default,
but the comment said 1. Fix it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26 21:06:50 +02:00
Namhyung Kim
af46566885 brd: handle on-demand devices correctly
When finding or allocating a ram disk device, brd_probe() did not take
partition numbers into account, so it could end up with a different
device. Consider the following example (I set CONFIG_BLK_DEV_RAM_COUNT=4
for simplicity):

$ sudo modprobe brd max_part=15
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1,  0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
$ sudo mknod /dev/ram4 b 1 64
$ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256
256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1,    0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1,   16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1,   32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1,   48 2011-05-25 15:41 /dev/ram3
brw-r--r-- 1 root root 1,   64 2011-05-25 15:45 /dev/ram4
brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64

After this patch, /dev/ram4 - instead of /dev/ram64 - was
accessed correctly.

In addition, 'range' passed to blk_register_region() should
cover the whole range of dev_t that RAMDISK_MAJOR can address.
It does not need to be limited by partition numbers unless
the 'rd_nr' param was specified.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26 21:06:50 +02:00
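The numbers in the listing above can be reproduced with a small sketch. It assumes 16 minors per device (max_part=15, so a shift of 4) and assumes that the old lookup used the raw minor as the device index; the function names are hypothetical and this is not the driver's code.

```python
PART_SHIFT = 4  # assumption: max_part=15 gives 16 minors per device


def index_without_fix(minor):
    # Assumed old behaviour: the raw minor number is used as the device index.
    return minor


def index_with_fix(minor):
    # Drop the partition bits so every minor of a device maps to that device.
    return minor >> PART_SHIFT


def first_minor(index):
    # First minor number owned by the device with this index.
    return index << PART_SHIFT


minor = 64  # from "mknod /dev/ram4 b 1 64"
print(index_without_fix(minor), first_minor(index_without_fix(minor)))  # 64 1024
print(index_with_fix(minor), first_minor(index_with_fix(minor)))        # 4 64
```

The first line of output corresponds to the unexpected /dev/ram64 node with minor 1024, the second to the intended /dev/ram4 with minor 64.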
Namhyung Kim
315980c868 brd: limit 'max_part' module param to DISK_MAX_PARTS
The 'max_part' parameter controls the maximum number of partitions
a brd device can have. However, if a user specifies a very large
value, it would exceed the limit of the device minor number and
can cause a kernel panic (or, at least, produce invalid device
nodes in some cases).

On my desktop system, the following command kills the kernel. On qemu,
it triggers a similar oops but the kernel stays alive:

$ sudo modprobe brd max_part=100000
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
 IP: [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
 PGD 7af1067 PUD 7b19067 PMD 0
 Oops: 0000 [#1] SMP
 last sysfs file:
 CPU 0
 Modules linked in: brd(+)

 Pid: 44, comm: insmod Tainted: G        W   2.6.39-qemu+ #158 Bochs Bochs
 RIP: 0010:[<ffffffff81110a9a>]  [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
 RSP: 0018:ffff880007b15d78  EFLAGS: 00000286
 RAX: ffff880007b05478 RBX: ffff880007a52760 RCX: ffff880007b15dc8
 RDX: ffff880007a4f900 RSI: ffff880007b15e48 RDI: ffff880007a52760
 RBP: ffff880007b15da8 R08: 0000000000000002 R09: 0000000000000000
 R10: ffff880007b15e48 R11: ffff880007b05478 R12: 0000000000000000
 R13: ffff880007b05478 R14: 0000000000400920 R15: 0000000000000063
 FS:  0000000002160880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 0000000007b1c000 CR4: 00000000000006b0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
 Process insmod (pid: 44, threadinfo ffff880007b14000, task ffff880007acb980)
 Stack:
  ffff880007b15dc8 ffff880007b05478 ffff880007b15da8 00000000fffffffe
  ffff880007a52760 ffff880007b05478 ffff880007b15de8 ffffffff81143c0a
  0000000000400920 ffff880007a52760 ffff880007b05478 0000000000000000
 Call Trace:
  [<ffffffff81143c0a>] kobject_add_internal+0xdf/0x1a0
  [<ffffffff81143da1>] kobject_add_varg+0x41/0x50
  [<ffffffff81143e6b>] kobject_add+0x64/0x66
  [<ffffffff8113bbe7>] blk_register_queue+0x5f/0xb8
  [<ffffffff81140f72>] add_disk+0xdf/0x289
  [<ffffffffa00040df>] brd_init+0xdf/0x1aa [brd]
  [<ffffffffa0004000>] ? 0xffffffffa0003fff
  [<ffffffffa0004000>] ? 0xffffffffa0003fff
  [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
  [<ffffffff8108516c>] sys_init_module+0x9c/0x1dc
  [<ffffffff812ff4bb>] system_call_fastpath+0x16/0x1b
 Code: 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 18 48 85 ff 75 04 0f 0b eb fe 48 8b 47 18 49 c7 c4 70 1e 4d 81 48 85 c0 74 04 4c 8b 60 30
  8b 44 24 58 45 31 ed 0f b6 c4 85 c0 74 0d 48 8b 43 28 48 89
 RIP  [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
  RSP <ffff880007b15d78>
 CR2: 0000000000000058
 ---[ end trace aebb1175ce1f6739 ]---

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26 21:06:50 +02:00
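A rough sketch of the kind of check this commit describes is shown below. The constants DISK_MAX_PARTS = 256 and MINORBITS = 20 are assumptions about the kernel headers of that era, check_params is a hypothetical name, and the second check (device count against the remaining minor bits) is an assumption that goes beyond what the message states.

```python
DISK_MAX_PARTS = 256  # assumed value of the kernel constant
MINORBITS = 20        # assumed number of bits in a minor number


def check_params(max_part, nr_devices):
    """Return True if the parameters fit into the minor number space."""
    part_shift = max_part.bit_length() if max_part > 0 else 0
    if (1 << part_shift) > DISK_MAX_PARTS:
        return False  # too many partitions per device
    if nr_devices > (1 << (MINORBITS - part_shift)):
        return False  # not enough minors left for that many devices
    return True


print(check_params(15, 16))      # True
print(check_params(100000, 16))  # False
```

With these assumed limits, the max_part=100000 request from the oops above would be rejected at module load time instead of reaching add_disk().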
Namhyung Kim
a2cba2913c brd: get rid of unused members from struct brd_device
brd_refcnt, brd_offset, brd_sizelimit and brd_blocksize in struct
brd_device seem to be copied from struct loop_device but they're
not used anywhere. Let's get rid of them.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26 21:06:50 +02:00
Linus Torvalds
57bb559574 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
  ceph: fix cap flush race reentrancy
  libceph: subscribe to osdmap when cluster is full
  libceph: handle new osdmap down/state change encoding
  rbd: handle online resize of underlying rbd image
  ceph: avoid inode lookup on nfs fh reconnect
  ceph: use LOOKUPINO to make unconnected nfs fh more reliable
  rbd: use snprintf for disk->disk_name
  rbd: cleanup: make kfree match kmalloc
  rbd: warn on update_snaps failure on notify
  ceph: check return value for start_request in writepages
  ceph: remove useless check
  libceph: add missing breaks in addr_set_port
  libceph: fix TAG_WAIT case
  ceph: fix broken comparison in readdir loop
  libceph: fix osdmap timestamp assignment
  ceph: fix rare potential cap leak
  libceph: use snprintf for unknown addrs
  libceph: use snprintf for formatting object name
  ceph: use snprintf for dirstat content
  libceph: fix uninitialized value when no get_authorizer method is set
  ...
2011-05-25 11:46:31 -07:00
Linus Torvalds
929cfdd5d3 Merge branch 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block: (110 commits)
  loop: handle on-demand devices correctly
  loop: limit 'max_part' module param to DISK_MAX_PARTS
  drbd: fix warning
  drbd: fix warning
  drbd: Fix spelling
  drbd: fix schedule in atomic
  drbd: Take a more conservative approach when deciding max_bio_size
  drbd: Fixed state transitions after async outdate-peer-handler returned
  drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
  drbd: Fix for the connection problems on high latency links
  drbd: fix potential activity log refcount imbalance in error path
  drbd: Only downgrade the disk state in case of disk failures
  drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
  drbd: fix potential distributed deadlock
  lru_cache.h: fix comments referring to ts_ instead of lc_
  drbd: Fix for application IO with the on-io-error=pass-on policy
  xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
  xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
  xen/blkback: don't fail empty barrier requests
  xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()
  ...
2011-05-25 09:15:35 -07:00
Linus Torvalds
798ce8f1cc Merge branch 'for-2.6.40/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.40/core' of git://git.kernel.dk/linux-2.6-block: (40 commits)
  cfq-iosched: free cic_index if cfqd allocation fails
  cfq-iosched: remove unused 'group_changed' in cfq_service_tree_add()
  cfq-iosched: reduce bit operations in cfq_choose_req()
  cfq-iosched: algebraic simplification in cfq_prio_to_maxrq()
  blk-cgroup: Initialize ioc->cgroup_changed at ioc creation time
  block: move bd_set_size() above rescan_partitions() in __blkdev_get()
  block: call elv_bio_merged() when merged
  cfq-iosched: Make IO merge related stats per cpu
  cfq-iosched: Fix a memory leak of per cpu stats for root group
  backing-dev: Kill set but not used var in  bdi_debug_stats_show()
  block: get rid of on-stack plugging debug checks
  blk-throttle: Make no throttling rule group processing lockless
  blk-cgroup: Make cgroup stat reset path blkg->lock free for dispatch stats
  blk-cgroup: Make 64bit per cpu stats safe on 32bit arch
  blk-throttle: Make dispatch stats per cpu
  blk-throttle: Free up a group only after one rcu grace period
  blk-throttle: Use helper function to add root throtl group to lists
  blk-throttle: Introduce a helper function to fill in device details
  blk-throttle: Dynamically allocate root group
  blk-cgroup: Allow sleeping while dynamically allocating a group
  ...
2011-05-25 09:14:07 -07:00
Sage Weil
9db4b3e327 rbd: handle online resize of underlying rbd image
If we get a notification that the image header has changed, check for
a change in the image size.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:08 -07:00
Sage Weil
aedfec59ee rbd: use snprintf for disk->disk_name
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:03 -07:00
Sage Weil
916d4d6727 rbd: cleanup: make kfree match kmalloc
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:01 -07:00
Namhyung Kim
a1c15c59fe loop: handle on-demand devices correctly
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account, so it could end up with a different
device. Consider the following example:

$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,   0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,  16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7,  32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,  48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,  64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,  80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,  96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,    0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,   16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7,   32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,   48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,   64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,   80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,   96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7,  112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7,  128 2011-05-24 22:17 /dev/loop8

After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.

In addition, 'range' passed to blk_register_region() should
cover the whole range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless the 'max_loop'
param was specified.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-24 16:48:55 +02:00
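The 'range' rule in the last paragraph of the commit above can be sketched like this. MINORBITS = 20 is an assumed kernel constant and region_range is a hypothetical name; the sketch only illustrates the rule as stated in the message.

```python
MINORBITS = 20  # assumed number of bits in a minor number


def region_range(max_loop, part_shift):
    """Number of minors to register for on-demand device lookup."""
    if max_loop:
        # Explicit limit: only the minors of the requested devices.
        return max_loop << part_shift
    # No limit given: everything the major number can address.
    return 1 << MINORBITS


print(region_range(0, 4))  # 1048576
print(region_range(8, 4))  # 128
```

With no 'max_loop' given, minor 128 (/dev/loop8 in the example) falls inside the registered range and can be created on demand; with max_loop=8 only minors 0 to 127 are covered.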
Namhyung Kim
78f4bb367f loop: limit 'max_part' module param to DISK_MAX_PARTS
The 'max_part' parameter controls the maximum number of partitions
a loop block device can have. However, if a user specifies a very
large value, it would exceed the limit of the device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).

On my desktop system, the following command kills the kernel. On qemu,
it triggers a similar oops but the kernel stays alive:

$ sudo modprobe loop max_part=100000
 ------------[ cut here ]------------
 kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
 invalid opcode: 0000 [#1] SMP
 last sysfs file:
 CPU 0
 Modules linked in: loop(+)

 Pid: 43, comm: insmod Tainted: G        W   2.6.39-qemu+ #155 Bochs Bochs
 RIP: 0010:[<ffffffff8113ce61>]  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
 RSP: 0018:ffff880007b3fde8  EFLAGS: 00000246
 RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
 RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
 RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
 R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
 R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
 FS:  0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
 Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
 Stack:
  ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
  0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
  0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
 Call Trace:
  [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
  [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
  [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
  [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
  [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
  [<ffffffff8116f527>] add_disk+0xdf/0x290
  [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
  [<ffffffffa0006000>] ? 0xffffffffa0005fff
  [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
  [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
  [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
 Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
 RIP  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
  RSP <ffff880007b3fde8>
 ---[ end trace a123eb592043acad ]---

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-24 16:48:54 +02:00
Andrew Morton
0ddf72be4e drbd: fix warning
In file included from drivers/block/drbd/drbd_main.c:54:
drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type

Forward declarations of enums do not work.

Fix it unpleasantly by moving the prototype.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2011-05-24 10:38:33 +02:00
Philipp Reisner
9b2f61aec7 drbd: fix warning
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2011-05-24 10:38:32 +02:00
Bart Van Assche
24c4830c8e drbd: Fix spelling
Found these with the help of ispell -l.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2011-05-24 10:21:29 +02:00
Lars Ellenberg
9a0d9d0389 drbd: fix schedule in atomic
An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occurring from an endio handler,
potentially in irq context.

This is not enough on the receiving side (typically secondary): we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().

This patch makes the detach always go through the intermediate D_FAILED state.
We may consider renaming it to D_DETACHING.

An alternative approach would be to create yet another work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.

Manually picked commit 564040f from the drbd 8.4 branch.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:14:32 +02:00
Philipp Reisner
99432fcc52 drbd: Take a more conservative approach when deciding max_bio_size
The old (optimistic) implementation could shrink the bio size
on a primary device.

Shrinking the bio size on a primary device is bad, since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.

The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.

 We cache the last seen max_bio_size of the peer in the meta
 data, and rely on that, to make the operation of single
 nodes more efficient.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:08:58 +02:00
Philipp Reisner
21423fa791 drbd: Fixed state transitions after async outdate-peer-handler returned
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:08:11 +02:00
Philipp Reisner
fa7d939663 drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:07:50 +02:00
Philipp Reisner
a8e407925d drbd: Fix for the connection problems on high latency links
It seems that the real cause of all the issues was that
we did not notice in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. That 100ms was hard coded!

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:07:22 +02:00
Lars Ellenberg
76727f684a drbd: fix potential activity log refcount imbalance in error path
It is no longer sufficient to trigger on local WRITE,
we need to check on (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io also in the error path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:06:44 +02:00
Philipp Reisner
d2e17807e3 drbd: Only downgrade the disk state in case of disk failures
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:05:48 +02:00
Lars Ellenberg
f36af18c7b drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.

If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.

To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection, if that was the case.

By default, ping-int is 10 seconds, ping-timeout is 500 ms.

Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout had been mistaken for a missing ping ack,
and caused an immediate reconnection attempt.

Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:03:30 +02:00
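The difference between the old and the new decision in the commit above can be sketched as follows. The function names are hypothetical and both timeouts are given in the same unit for simplicity; this is an illustration of the idea, not DRBD's receiver code.

```python
def on_timeout_by_value(current_timeout, ping_timeout):
    # Old approach: guess which timer expired by comparing timeout values.
    return "drop" if current_timeout == ping_timeout else "send_ping"


def on_timeout_by_flag(ping_timeout_active):
    # New approach: a local flag records which timer is currently armed.
    return "drop" if ping_timeout_active else "send_ping"


ping_int = 10      # idle timeout
ping_timeout = 10  # configured to the same value as ping_int

# The idle timer expires; no ping has been sent yet.
print(on_timeout_by_value(ping_int, ping_timeout))  # drop
print(on_timeout_by_flag(False))                    # send_ping
```

With equal values the comparison cannot tell the two timers apart and drops the connection on a plain idle timeout, while the flag-based check still sends the ping first.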
Lars Ellenberg
53ea433145 drbd: fix potential distributed deadlock
We limit ourselves to a configurable maximum number of pages used as
temporary bio pages.

If the configured "max_buffers" is not big enough to match the bandwidth
of the respective deployment, a distributed deadlock could be triggered
by e.g. fast online verify and heavy application IO.

TCP connections would block on congestion, because both receivers
would wait on pages to become available.

Fortunately the respective senders in this case would be able to give
back some pages already. So do that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:02:41 +02:00
Philipp Reisner
738a84b25c drbd: Fix for application IO with the on-io-error=pass-on policy
In case a write fails on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.

Read retry remote was already in place.

Actually the documentation needs to get fixed now, since the
application is still shielded from the error (as long as we have
only a single disk failing). The difference to detach is that
we keep the disk, and therefore might keep all the other, still
working sectors up to date.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 09:59:49 +02:00
Paul Gortmaker
70c7160619 Add appropriate <linux/prefetch.h> include for prefetch users
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout.  Give
them an explicit include via the following $0.02 script.

 =========================================
 #!/bin/bash
 MANUAL=""
 for i in `git grep -l 'prefetch(.*)' .` ; do
 	grep -q '<linux/prefetch.h>' $i
 	if [ $? = 0 ] ; then
 		continue
 	fi

 	(	echo '?^#include <linux/?a'
 		echo '#include <linux/prefetch.h>'
 		echo .
 		echo w
 		echo q
 	) | ed -s $i > /dev/null 2>&1
 	if [ $? != 0 ]; then
 		echo $i needs manual fixup
 		MANUAL="$i $MANUAL"
 	fi
 done
 echo ------------------- 8\<----------------------
 echo vi $MANUAL
 =========================================

Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
  non-network drivers and the fib_trie.c case    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 21:41:57 -07:00
Jens Axboe
698567f3fa Merge commit 'v2.6.39' into for-2.6.40/core
Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
had churn in the core area that makes it difficult to handle
patches for e.g. cfq or blk-throttle. Instead of requiring that they
be based on older versions with bugs that have been fixed later
in the rc cycle, merge in 2.6.39 final.

Also fixes up conflicts in the below files.

Conflicts:
	drivers/block/paride/pcd.c
	drivers/cdrom/viocd.c
	drivers/ide/ide-cd.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-20 20:33:15 +02:00
Sage Weil
13143d2d1c rbd: warn on update_snaps failure on notify
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 11:25:05 -07:00
Jens Axboe
779d530632 Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers
2011-05-19 09:46:00 +02:00
Jan Beulich
8ab521506c xen/blkback: don't fail empty barrier requests
The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero...".

While at it also add overflow checking to the math in vbd_translate().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-18 11:28:16 -04:00
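The arithmetic behind the commit above can be sketched as follows. Python integers do not wrap, so 64-bit unsigned behaviour is emulated with a mask. The function name range_ok is hypothetical, and skipping the range check for zero-length requests is an assumption about how "don't fail empty barrier requests" is achieved.

```python
U64_MASK = (1 << 64) - 1


def range_ok(sector, nr_sects, disk_size):
    """Check a request against the disk size using emulated u64 math."""
    sector &= U64_MASK              # -1 becomes 2**64 - 1 when seen as unsigned
    if nr_sects == 0:
        return True                 # assumed: empty requests skip the range check
    end = (sector + nr_sects) & U64_MASK
    if end < sector:
        return False                # the addition wrapped around
    return end <= disk_size


print((-1 & U64_MASK) > 1000)    # True
print(range_ok(-1, 0, 1000))     # True
print(range_ok(-1, 8, 1000))     # False
print(range_ok(996, 8, 1000))    # False
```

The first line shows why a sector number of -1 "almost always" exceeds the disk size once it is treated as unsigned; the third shows the wrap-around case that an overflow check is meant to catch.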
Linus Torvalds
a2b9c1f620 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: don't delay blk_run_queue_async
  scsi: remove performance regression due to async queue run
  blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
  block: rescan partitions on invalidated devices on -ENOMEDIA too
  cdrom: always check_disk_change() on open
  block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
2011-05-18 06:49:02 -07:00
Yehuda Sadeh
1fec70932d rbd: fix split bio handling
The rbd driver currently splits bios when they span an object boundary.
However, the blk_end_request expects the completions to roll up the results
in block device order, and the split rbd/ceph ops can complete in any
order.  This patch adds a struct rbd_req_coll to track completion of split
requests and ensures that the results are passed back up to the block layer
in order.

This fixes errors where the file system gets completion of a read operation
that spans an object boundary before the data has actually arrived.  The
bug is easily reproduced with iozone with a working set larger than
available RAM.

Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-13 13:52:57 -07:00
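The in-order roll-up described in the commit above can be illustrated with a small sketch. The class below is hypothetical and only loosely modelled on the idea of struct rbd_req_coll; the 'reported' list stands in for completions handed to the block layer.

```python
class RequestCollection:
    """Tracks split parts of one request and reports them in order."""

    def __init__(self, total):
        self.status = [None] * total  # None means "not completed yet"
        self.num_done = 0             # parts already reported upward
        self.reported = []            # stand-in for the block layer

    def complete(self, index, result):
        self.status[index] = result
        # Report the longest finished prefix, strictly in index order.
        while (self.num_done < len(self.status)
               and self.status[self.num_done] is not None):
            self.reported.append((self.num_done, self.status[self.num_done]))
            self.num_done += 1


coll = RequestCollection(3)
coll.complete(2, 0)   # last part finishes first: nothing is reported yet
print(coll.reported)  # []
coll.complete(0, 0)
print(coll.reported)  # [(0, 0)]
coll.complete(1, 0)
print(coll.reported)  # [(0, 0), (1, 0), (2, 0)]
```

A part that finishes early is held back until every part before it has finished, so the upper layer never sees a completion for data that has not arrived.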
Laszlo Ersek
496b318eb6 xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()
vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (e.g. at the next resize attempt) hangs.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317

Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-13 09:45:40 -04:00
Sage Weil
11f770027b rbd: fix leak of ops struct
The ops vector must be freed by the rbd_do_request caller.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-12 20:59:14 -07:00
Konrad Rzeszutek Wilk
5185432277 xen/blkback: Align the tabs on the structure.
The recent changes caused this field of the structure to be offset a bit.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 18:02:28 -04:00
Konrad Rzeszutek Wilk
cca537af7d xen/blkback: if log_stats is enabled print out the data.
And do not depend on the driver being built with the -DDEBUG flag.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:54 -04:00
Konrad Rzeszutek Wilk
5a577e3872 xen/blkback: Add the prefix XEN in the common.h.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:53 -04:00
Konrad Rzeszutek Wilk
3d814731ba xen/blkback: Prefix 'vbd' with 'xen' in structs and functions.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:52 -04:00
Konrad Rzeszutek Wilk
30fd150202 xen/blkback: Change structure name blkif_st to xen_blkif.
No need for that '_st' and xen_blkif is more apt.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:51 -04:00
Konrad Rzeszutek Wilk
325a648604 xen/blkback: Remove the unused typedefs.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:50 -04:00
Konrad Rzeszutek Wilk
452a6b2bb6 xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h
There is no point in the blkif.h file. It is not used by the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:49 -04:00
Konrad Rzeszutek Wilk
b0f801273f xen/blkback: Fixing some more of the cleanpatch.pl warnings.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:48 -04:00
Konrad Rzeszutek Wilk
03e0edf946 xen/blkback: Checkpatch.pl recommends against multiple assignments.
CHECK: multiple assignments should be avoided

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:47 -04:00
Konrad Rzeszutek Wilk
a4c348580e xen/blkback: Flesh out the description in the Kconfig.
with more details.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:55:40 -04:00
Konrad Rzeszutek Wilk
b9fc02968c xen/blkback: Fix spelling mistakes.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:43:21 -04:00
Konrad Rzeszutek Wilk
68c88dd7d3 xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir.
From the blkif.h header, which was exposed to the frontend.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:43:20 -04:00
Konrad Rzeszutek Wilk
72468bfcb8 xen/blkback: Removing the debug_lvl option.
It is not really used for anything.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:43:20 -04:00
Konrad Rzeszutek Wilk
22b20f2dff xen/blkback: Use the DRV_PFX in the pr_.. macros.
To make it easier to read.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:43:12 -04:00
Konrad Rzeszutek Wilk
1afbd730a3 xen/blkback: Make the DPRINTK uniform.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:42:51 -04:00
Konrad Rzeszutek Wilk
ebe8190659 xen/blkback: Change printk/DPRINTK to pr_.. type variant.
And also make them uniform and prefix the message with 'xen-blkback'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 16:42:31 -04:00
Konrad Rzeszutek Wilk
edf6ef59ec xen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support.
If the backend supports the 'feature-flush-cache' mode, use that
instead of the 'feature-barrier' support.

Currently there are three backends that support the 'feature-flush-cache'
mode: NetBSD, Solaris and Linux kernel. The 'flush' option is a much more
light-weight version than the 'barrier' support, so let's try to use it, as
there are no filesystems in the kernel that use full barriers anymore.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 08:56:03 -04:00
Marek Marczykowski
4352b47ab7 xen-blkfront: fix data size for xenbus_gather in blkfront_connect
The barrier variable is int, not long. This overflow caused another variable
to be overwritten: "err" (in PV code) and "binfo" (in xenlinux code -
drivers/xen/blkfront/blkfront.c). The latter caused incorrect device
flags (RO/removable etc).

Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
[v1: Changed title]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 08:55:51 -04:00
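The effect described in the commit above can be illustrated outside the kernel. The sketch below models two adjacent 4-byte variables in a byte buffer and assumes a little-endian layout with "err" placed directly after "barrier"; the real stack layout depends on the compiler, and store_barrier is a hypothetical name.

```python
import struct


def store_barrier(fmt, value, err=-5):
    """Write 'value' at barrier's address and return (barrier, err)."""
    frame = bytearray(8)                    # barrier at offset 0, err at offset 4
    struct.pack_into("<i", frame, 4, err)   # err holds some earlier value
    struct.pack_into(fmt, frame, 0, value)  # the store into barrier
    return struct.unpack("<ii", frame)


print(store_barrier("<q", 1))  # (1, 0)
print(store_barrier("<i", 1))  # (1, -5)
```

Writing the value as an 8-byte quantity spills into the neighbouring variable and clears it, while a 4-byte store leaves the neighbour untouched.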
Konrad Rzeszutek Wilk
01f37f2d53 xen/blkback: Fixed up comments and converted spaces to tabs.
Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-11 15:57:09 -04:00
Jens Axboe
edc83d47a9 cciss: fix compile issue
drivers/block/cciss.c: In function ‘cciss_send_reset’:
drivers/block/cciss.c:2515:2: error: implicit declaration of function ‘fill_cmd’
drivers/block/cciss.c: At top level:
drivers/block/cciss.c:2531:12: error: conflicting types for ‘fill_cmd’
drivers/block/cciss.c:2534:1: note: an argument type that has a default promotion can’t match an empty parameter name list declaration
drivers/block/cciss.c:2515:18: note: previous implicit declaration of ‘fill_cmd’ was here
make[1]: *** [drivers/block/cciss.o] Error 1
make: *** [drivers/block/cciss.o] Error 2

Move fill_cmd() to above where it is first used.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:27:00 -06:00
Stephen M. Cameron
8a4ec67bd5 cciss: add cciss_tape_cmds module parameter
This is to allow number of commands reserved for use by SCSI tape drives
and medium changers to be adjusted at driver load time via the kernel
parameter cciss_tape_cmds, with a default value of 6, and a range
of 2 - 16 inclusive.  Previously, the driver limited the number of
commands which could be queued to the SCSI half of the driver
to only 2.  This is to fix the problem that if you had more than
two tape drives, you couldn't, for example, erase or rewind them all
at the same time.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:59 -06:00
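The default and range given in 8a4ec67bd5 (default 6, range 2 to 16 inclusive) can be sketched as a small range check. The macro and function names below are hypothetical, and clamping an out-of-range request to the nearest bound is an assumption: the commit message does not say whether the driver clamps or rejects such values.

```c
#include <assert.h>

/* Values taken from the commit message; the names are illustrative. */
#define TAPE_CMDS_DEFAULT 6
#define TAPE_CMDS_MIN     2
#define TAPE_CMDS_MAX     16

static_assert(TAPE_CMDS_MIN <= TAPE_CMDS_DEFAULT &&
	      TAPE_CMDS_DEFAULT <= TAPE_CMDS_MAX,
	      "default must lie inside the allowed range");

/*
 * Assumed policy: clamp an out-of-range request to the nearest bound
 * instead of failing the module load.
 */
static int sanitize_tape_cmds(int requested)
{
	if (requested < TAPE_CMDS_MIN)
		return TAPE_CMDS_MIN;
	if (requested > TAPE_CMDS_MAX)
		return TAPE_CMDS_MAX;
	return requested;
}
```

Under this assumed policy a request of 1 becomes 2, a request of 20 becomes 16, and in-range values pass through unchanged.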
Stephen M. Cameron
063d2cf72a cciss: do not use bit 2 doorbell reset
It causes NMIs which are undesirable at best, unsurvivable at worst.
Prefer the soft reset instead.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:58 -06:00
Stephen M. Cameron
ec52d5f1cb cciss: do not attempt PCI power management reset method if we know it won't work.
Just go straight to the soft-reset method instead.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:57 -06:00
Stephen M. Cameron
93c46c2fa7 cciss: remove superfluous sleeps around reset code
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:56 -06:00
Stephen M. Cameron
5afe278114 cciss: do soft reset if hard reset is broken
On driver load, if reset_devices is set, and the hard reset
attempts fail, try to bring up the controller to the point that
a command can be sent, and send it a soft reset command, then
after the reset undo whatever driver initialization was done to get
it to the point to take a command, and re-do it after the reset.

This is to get kdump to work on all the "non-resettable" controllers
(except 64xx controllers which can't be reset due to the potentially
shared cache module.)

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:56 -06:00
Stephen M. Cameron
bf2e2e6b87 cciss: use new doorbell-bit-5 reset method
The bit-2-doorbell reset method seemed to cause (survivable) NMIs
on some systems and (unsurvivable) IOCK NMIs on some G7 servers.
Firmware guys implemented a new doorbell method to alleviate these
problems triggered by bit 5 of the doorbell register.  We want to
use it if it's available.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:55 -06:00
Stephen M. Cameron
3e28601fdf cciss: increase timeouts for post-reset no-ops
Just to reduce the messages about timeouts that appear.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:54 -06:00
Stephen M. Cameron
59ec86bb98 cciss: clarify messages around reset behavior
When waiting for the board to become "not ready",
don't print a message saying "waiting for board to
become ready" (possibly followed by a message saying
"failed waiting for board to become not ready").  Instead,
it should be "waiting for board to reset" and "failed
waiting for board to reset."

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:53 -06:00
Stephen M. Cameron
19adbb9254 cciss: increase time to wait for board reset to start
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:51 -06:00
Stephen M. Cameron
8f71bb829a cciss: get rid of message related magic numbers
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:50 -06:00
Stephen M. Cameron
e363e01436 cciss: fix reply pool and block fetch table memory leaks
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:50 -06:00
Stephen M. Cameron
2b48085f97 cciss: factor out irq request code
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:49 -06:00
Stephen M. Cameron
abf7966e61 cciss: factor out scatterlist allocation functions
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:48 -06:00
Stephen M. Cameron
54dae34320 cciss: factor out command pool allocation functions
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:47 -06:00
Stephen M. Cameron
62710ae1ce cciss: do a better job of detecting controller reset failure
Detect failure of controller reset by noticing if the 32 bytes of
"driver version" we store on the hardware in the config table
fail to get zeroed out.  Previously we noticed if the controller
did not transition to "simple mode", but this did not detect reset
failure if the controller was already in simple mode prior to
the reset attempt (e.g. due to module parameter hpsa_simple_mode=1).

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:46 -06:00
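The detection idea in 62710ae1ce (leave a marker in the 32-byte "driver version" area, then see whether the reset zeroed it) can be modelled in plain C. Everything here is a hypothetical userspace model: the struct and all function names are made up, and real hardware access would go through MMIO accessors rather than memcpy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DRIVER_VER_LEN 32

/* Hypothetical model of the config table area holding the driver version. */
struct fake_cfg_table {
	unsigned char driver_version[DRIVER_VER_LEN];
};

/* Store a non-empty marker before attempting the reset. */
static void write_driver_version(struct fake_cfg_table *t, const char *ver)
{
	size_t n;

	assert(t != NULL && ver != NULL && ver[0] != '\0');
	n = strlen(ver);
	if (n > DRIVER_VER_LEN)
		n = DRIVER_VER_LEN;
	memset(t->driver_version, 0, DRIVER_VER_LEN);
	memcpy(t->driver_version, ver, n);
}

/* The reset is considered to have happened only if all 32 bytes read as zero. */
static bool reset_took_effect(const struct fake_cfg_table *t)
{
	static const unsigned char zeros[DRIVER_VER_LEN];

	return memcmp(t->driver_version, zeros, DRIVER_VER_LEN) == 0;
}

/* Stand-in for a controller that really resets and clears its table. */
static void fake_successful_reset(struct fake_cfg_table *t)
{
	memset(t, 0, sizeof(*t));
}
```

Unlike checking for a transition to simple mode, this check does not depend on which mode the controller was in before the reset attempt.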
Stephen M. Cameron
9bd3c20487 cciss: add readl after writel in interrupt mask setting code
This is to ensure the board interrupts are really off when
these functions return.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 08:23:45 -06:00
Konrad Rzeszutek Wilk
3d68b39926 xen/blkback: Fix up some of the comments.
They had the wrong data or were in the wrong spot.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05 13:43:26 -04:00
Konrad Rzeszutek Wilk
fc53bf757e xen/blkback: Squash the checking for operation into dispatch_rw_block_io
We do a check for the operations right before calling dispatch_rw_block_io.
And then we do the same check in dispatch_rw_block_io. This patch
squashes those checks into the 'dispatch_rw_block_io' function.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05 13:43:25 -04:00
Konrad Rzeszutek Wilk
24f567f952 xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER.
We drop the support for 'feature-barrier' and add in the support
for the 'feature-flush-cache' if the real backend storage supports
flushing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05 13:43:24 -04:00
Sage Weil
4ad12621e4 libceph: fix ceph_osdc_alloc_request error checks
ceph_osdc_alloc_request returns NULL on failure.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03 09:28:13 -07:00
Konrad Rzeszutek Wilk
a19be5f0f0 Revert "xen/blkback: Move the plugging/unplugging to a higher level."
This reverts commit 97961ef46b because
we lose about 15% performance if we do the unplugging at the
end of reading the ring buffer.
2011-04-27 12:40:11 -04:00
Konrad Rzeszutek Wilk
013c3ca184 xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.
If one runs a simple fio request with random read/write with a
20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.

IOmeter       |       |      |          |
64K, randrw   |  NOOP | CFQ  | deadline |
randrwmix=80  |       |      |          |
--------------+-------+------+----------+
blkback       |103/27 |32/10 | 102/27   |
--------------+-------+------+----------+
QEMU qdisk    |103/27 |102/27| 102/27   |

The problem as explained by Vivek Goyal was:

".. that difference is that sync vs async requests. In the case of
a kernel thread submitting IO, [..] all the WRITES might be being
considered as async and will go in a different queue. If you mix those
with some READS, they are always sync and will go in different queue.
In presence of sync queue, CFQ will idle and choke up WRITES in
an attempt to improve latencies of READs.

In case of AIO [note: this is what QEMU qdisk is doing] , [..]
it is direct IO and both READS and WRITES will be considered SYNC
and will go in a single queue and no choking of WRITES will take place."

The solution is quite simple, tack on REQ_SYNC (which is
what the WRITE_ODIRECT macro points to) and the numbers go
back up.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-26 16:24:18 -04:00
Konrad Rzeszutek Wilk
97961ef46b xen/blkback: Move the plugging/unplugging to a higher level.
We used to plug/unplug on each submit_bio. But that means
if within a stream of WRITE, WRITE, WRITE,...,WRITE we have
one READ, it could stall the pipeline (as the 'submit_bio'
could trigger the unplug_fnc to be called and stall/sync
when doing the READ). Instead we want to move the unplugging
to when the whole (or as much as possible) ring buffer has been
processed. This also eliminates us doing plug/unplug for
each request.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-26 13:01:32 -04:00
Tejun Heo
9fd097b149 block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
In-kernel disk event polling doesn't matter for legacy/fringe drivers
and may lead to infinite event loop if ->check_events() implementation
generates events on level condition instead of edge.

Now that block layer supports suppressing exporting unlisted events,
simply leaving disk->events cleared allows these drivers to keep the
internal revalidation behavior intact while avoiding weird
interactions with userland event handler.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21 21:33:05 +02:00
Tejun Heo
d4dc210f69 block: don't block events on excl write for non-optical devices
Disk event code automatically blocks events on excl write.  This is
primarily to avoid issuing polling commands while burning is in
progress.  This behavior doesn't fit other types of devices with
removable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.

This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21 20:54:46 +02:00
Konrad Rzeszutek Wilk
8b6bf747d7 xen/blkback: Prefix exposed functions with xen_
Also shorten any name containing blkback to blkbk.

This makes the entries in the symbol table (if compiled into the kernel)
much shorter, prettier, and also easier to search for.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 11:58:03 -04:00
Konrad Rzeszutek Wilk
42c7841d17 xen-blkback: Inline some of the functions that were moved from vbd/interface.c
Shuffling code around.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 11:58:02 -04:00
Konrad Rzeszutek Wilk
6cd0388cd6 xen-blkback: Remove from the copyright notice the address.
There is no need for it, as the address is updated constantly
in the root of the Linux kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 11:58:01 -04:00
Konrad Rzeszutek Wilk
ee9ff8537e xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectively.
Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the
critical bits where they belong, respectively.

Leaving only blkback.c for the data- and xenbus.c for the control path.

Suggested-by:  Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 11:57:59 -04:00
Konrad Rzeszutek Wilk
dfc07b13dc xen/blkback: Move it from drivers/xen to drivers/block
.. and modify the Makefile and Kconfig files appropriately.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-18 14:30:26 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
7e599e6e62 drbd: fix up merge error
In commit 95a0f10cdd ("drbd: store in-core bitmap little endian,
regardless of architecture") drbd had made the sane choice to use
little-endian bitmap functions everywhere.  However, it used the
horrible old functions names from <asm-generic/bitops/le.h>, that were
never really meant to be exported.

In the meantime, things got cleaned up, and in commit c4945b9ed4
("asm-generic: rename generic little-endian bitops functions") we
renamed the LE bitops to something sane, exactly so that they could be
used in random code without people gouging their eyes out when seeing
the crazy jumble of letters that were the old internal names.

As a result the drbd thing merged cleanly (commit 8d49a77568: "Merge
branch 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block"),
since there was no data conflict - but the end result obviously doesn't
actually compile.

Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-28 07:42:58 -07:00
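As background for 7e599e6e62, a little-endian bitmap places bit n in byte n / 8 at bit position n % 8, regardless of the host's byte order. The sketch below shows that layout with byte-wise helpers. The sketch_ names are hypothetical userspace stand-ins, not the kernel's renamed LE bitops, and they make no attempt at atomicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Test bit 'nr' of a little-endian bitmap addressed byte by byte. */
static bool sketch_test_bit_le(size_t nr, const unsigned char *bitmap)
{
	assert(bitmap != NULL);
	return (bitmap[nr / 8] >> (nr % 8)) & 1u;
}

/* Set bit 'nr' of a little-endian bitmap (non-atomic). */
static void sketch_set_bit_le(size_t nr, unsigned char *bitmap)
{
	assert(bitmap != NULL);
	bitmap[nr / 8] |= (unsigned char)(1u << (nr % 8));
}
```

Setting bit 9 in a zeroed bitmap turns byte 1 into 0x02 on any host, which is why an in-core bitmap stored this way is independent of architecture.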
Linus Torvalds
8d49a77568 Merge branch 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block: (122 commits)
  cciss: fix lost command issue
  drbd: need include for bitops functions declarations
  Revert "cciss: Add missing allocation in scsi_cmd_stack_setup and  corresponding deallocation"
  cciss: fix missed command status value CMD_UNABORTABLE
  cciss: remove unnecessary casts
  cciss: Mask off error bits of c->busaddr in cmd_special_free when calling pci_free_consistent
  cciss: Inform controller we are using 32-bit tags.
  cciss: hoist tag masking out of loop
  cciss: Add missing allocation in scsi_cmd_stack_setup and  corresponding deallocation
  cciss: export resettable host attribute
  drbd: drop code present under #ifdef which is relevant to 2.6.28 and below
  drbd: Fixed handling of read errors on a 'VerifyS' node
  drbd: Fixed handling of read errors on a 'VerifyT' node
  drbd: Implemented real timeout checking for request processing time
  drbd: Remove unused function atodb_endio()
  drbd: improve log message if received sector offset exceeds local capacity
  drbd: kill dead code
  drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio fails
  drbd: Removed left over, now wrong comments
  drbd: serialize admin requests for new verify run with pending bitmap io
  ...
2011-03-27 20:02:07 -07:00
Linus Torvalds
6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Bud Brown
1ddd504954 cciss: fix lost command issue
Under certain workloads a command may seem to get lost. IOW, the Smart Array
thinks all commands have been completed but we still have commands in our
completion queue. This may lead to system instability, filesystems going
read-only, or even panics depending on the affected filesystem. We add an
extra read to force the write to complete.

Testing shows this extra read avoids the problem.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-23 20:47:11 +01:00
Linus Torvalds
0adfc56ce8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: use watch/notify for changes in rbd header
  libceph: add lingering request and watch/notify event framework
  rbd: update email address in Documentation
  ceph: rename dentry_release -> d_release, fix comment
  ceph: add request to the tail of unsafe write list
  ceph: remove request from unsafe list if it is canceled/timed out
  ceph: move readahead default to fs/ceph from libceph
  ceph: add ino32 mount option
  ceph: update common header files
  ceph: remove debugfs debug cruft
  libceph: fix osd request queuing on osdmap updates
  ceph: preserve I_COMPLETE across rename
  libceph: Fix base64-decoding when input ends in newline.
2011-03-22 16:25:25 -07:00
Yehuda Sadeh
59c2be1e4d rbd: use watch/notify for changes in rbd header
Send notifications when we change the rbd header (e.g. create a snapshot)
and wait for such notifications.  This allows synchronizing the snapshot
creation between different rbd clients/tools.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22 11:33:56 -07:00
Linus Torvalds
e16b396ce3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
  doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
  Update cpuset info & webiste for cgroups
  dcdbas: force SMI to happen when expected
  arch/arm/Kconfig: remove one to many l's in the word.
  asm-generic/user.h: Fix spelling in comment
  drm: fix printk typo 'sracth'
  Remove one to many n's in a word
  Documentation/filesystems/romfs.txt: fixing link to genromfs
  drivers:scsi Change printk typo initate -> initiate
  serial, pch uart: Remove duplicate inclusion of linux/pci.h header
  fs/eventpoll.c: fix spelling
  mm: Fix out-of-date comments which refers non-existent functions
  drm: Fix printk typo 'failled'
  coh901318.c: Change initate to initiate.
  mbox-db5500.c Change initate to initiate.
  edac: correct i82975x error-info reported
  edac: correct i82975x mci initialisation
  edac: correct commented info
  fs: update comments to point correct document
  target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
  ...

Trivial conflict in fs/eventpoll.c (spelling vs addition)
2011-03-18 10:37:40 -07:00
Stephen Rothwell
f0ff1357ce drbd: need include for bitops functions declarations
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-17 15:02:51 +01:00
Linus Torvalds
dc113c1f1d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/block: amiflop - Remove superfluous amiga_chip_alloc() cast
  m68k/atari: ARAnyM - Add support for network access
  m68k/atari: ARAnyM - Add support for console access
  m68k/atari: ARAnyM - Add support for block access
  m68k/atari: Initial ARAnyM support
  m68k: Kconfig - Remove unneeded "default n"
  m68k: Makefiles - Change to new flags variables
  m68k/amiga: Reclaim Chip RAM for PPC exception handlers
  m68k: Allow all kernel traps to be handled via exception fixups
  m68k: Use base_trap_init() to initialize vectors
  m68k: Add helper function handle_kernel_fault()
2011-03-16 19:08:03 -07:00
Linus Torvalds
4c5811bf46 Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6: (21 commits)
  tty: serial: altera_jtaguart: Add device tree support
  tty: serial: altera_uart: Add devicetree support
  dt: eliminate of_platform_driver shim code
  dt: Eliminate of_platform_{,un}register_driver
  dt/serial: Eliminate users of of_platform_{,un}register_driver
  dt/usb: Eliminate users of of_platform_{,un}register_driver
  dt/video: Eliminate users of of_platform_{,un}register_driver
  dt/net: Eliminate users of of_platform_{,un}register_driver
  dt/sound: Eliminate users of of_platform_{,un}register_driver
  dt/spi: Eliminate users of of_platform_{,un}register_driver
  dt: uartlite: merge platform and of_platform driver bindings
  dt: xilinx_hwicap: merge platform and of_platform driver bindings
  ipmi: convert OF driver to platform driver
  leds/leds-gpio: merge platform_driver with of_platform_driver
  dt/sparc: Eliminate users of of_platform_{,un}register_driver
  dt/powerpc: Eliminate users of of_platform_{,un}register_driver
  dt/powerpc: move of_bus_type infrastructure to ibmebus
  drivercore/dt: add a match table pointer to struct device
  dt: Typo fix.
  altera_ps2: Add devicetree support
  ...
2011-03-16 17:28:10 -07:00
Linus Torvalds
7a6362800c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
  bonding: enable netpoll without checking link status
  xfrm: Refcount destination entry on xfrm_lookup
  net: introduce rx_handler results and logic around that
  bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
  bonding: wrap slave state work
  net: get rid of multiple bond-related netdevice->priv_flags
  bonding: register slave pointer for rx_handler
  be2net: Bump up the version number
  be2net: Copyright notice change. Update to Emulex instead of ServerEngines
  e1000e: fix kconfig for crc32 dependency
  netfilter ebtables: fix xt_AUDIT to work with ebtables
  xen network backend driver
  bonding: Improve syslog message at device creation time
  bonding: Call netif_carrier_off after register_netdevice
  bonding: Incorrect TX queue offset
  net_sched: fix ip_tos2prio
  xfrm: fix __xfrm_route_forward()
  be2net: Fix UDP packet detected status in RX compl
  Phonet: fix aligned-mode pipe socket buffer header reserve
  netxen: support for GbE port settings
  ...

Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
2011-03-16 16:29:25 -07:00
Geert Uytterhoeven
059718d572 m68k/block: amiflop - Remove superfluous amiga_chip_alloc() cast
amiga_chip_alloc() returns a void *, so we don't need a cast.
Also clean up coding style while we're at it.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2011-03-16 19:11:25 +01:00
Linus Torvalds
76ca078328 Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: suspend: remove xen_hvm_suspend
  xen: suspend: pull pre/post suspend hooks out into suspend_info
  xen: suspend: move arch specific pre/post suspend hooks into generic hooks
  xen: suspend: refactor non-arch specific pre/post suspend hooks
  xen: suspend: add "arch" to pre/post suspend hooks
  xen: suspend: pass extra hypercall argument via suspend_info struct
  xen: suspend: refactor cancellation flag into a structure
  xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
  xen: switch to new schedop hypercall by default.
  xen: use new schedop interface for suspend
  xen: do not respond to unknown xenstore control requests
  xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled
  xen: PV on HVM: support PV spinlocks and IPIs
  xen: make the ballon driver work for hvm domains
  xen-blkfront: handle Xen major numbers other than XENVBD
  xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
  xen: no need to delay xen_setup_shutdown_event for hvm guests anymore
2011-03-15 10:59:09 -07:00
Linus Torvalds
27d2a8b97e Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: ia64 build broken due to "xen: switch to new schedop hypercall by default."

* 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Union the blkif_request request specific fields

* 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: annotate functions which only call into __init at start of day
  xen p2m: annotate variable which appears unused
  xen: events: mark cpu_evtchn_mask_p as __refdata
2011-03-15 10:49:16 -07:00