Commit graph

375131 commits

Author SHA1 Message Date
Eric Sandeen
f63e0cca91 btrfs: ignore device open failures in __btrfs_open_devices
This:

   # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test

would lead to a blkdev open/close mismatch when the mount fails, and
a permanently busy (opened O_EXCL) sdb2:

   # wipefs -a /dev/sdb2
   wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy

It's because btrfs_open_devices() may open some devices, fail on
the last one, and return that failure stored in "ret."   The mount
then fails, but the caller then does not clean up the open devices.

Chris assures me that:

"btrfs_open_devices just means: go off and open every bdev you can from
this uuid.  It should return success if we opened any of them at all."

So change the logic to ignore any open failures; just skip processing
of that device.  Later on it's decided whether we have enough devices
to continue.

Reported-by: Jan Safranek <jsafrane@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:36 -04:00
Miao Xie
e4100d987b Btrfs: improve the performance of the csums lookup
It is very likely that there are several blocks in bio, it is very
inefficient if we get their csums one by one. This patch improves
this problem by getting the csums in batch.

According to the result of the following test, the execute time of
__btrfs_lookup_bio_sums() is down by ~28%(300us -> 217us).

 # dd if=<mnt>/file of=/dev/null bs=1M count=1024

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:35 -04:00
Josef Bacik
09a2a8f96e Btrfs: fix bad extent logging
A user sent me a btrfs-image of a file system that was panicing on mount during
the log recovery.  I had originally thought these problems were from a bug in
the free space cache code, but that was just a symptom of the problem.  The
problem is if your application does something like this

[prealloc][prealloc][prealloc]

the internal extent maps will merge those all together into one extent map, even
though on disk they are 3 separate extents.  So if you go to write into one of
these ranges the extent map will be right since we use the physical extent when
doing the write, but when we log the extents they will use the wrong sizes for
the remainder prealloc space.  If this doesn't happen to trip up the free space
cache (which it won't in a lot of cases) then you will get bogus entries in your
extent tree which will screw stuff up later.  The data and such will still work,
but everything else is broken.  This patch fixes this by not allowing extents
that are on the modified list to be merged.  This has the side effect that we
are no longer adding everything to the modified list all the time, which means
we now have to call btrfs_drop_extents every time we log an extent into the
tree.  So this allows me to drop all this speciality code I was using to get
around calling btrfs_drop_extents.  With this patch the testcase I've created no
longer creates a bogus file system after replaying the log.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:34 -04:00
Josef Bacik
cc95bef635 Btrfs: log ram bytes properly
When logging changed extents I was logging ram_bytes as the current length,
which isn't correct, it's supposed to be the ram bytes of the original extent.
This is for compression where even if we split the extent we need to know the
ram bytes so when we uncompress the extent we know how big it will be.  This was
still working out right with compression for some reason but I think we were
getting lucky.  It was definitely off for prealloc which is why I noticed it,
btrfsck was complaining about it.  With this patch btrfsck no longer complains
after a log replay.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:33 -04:00
Josef Bacik
98ad69cfd2 Btrfs: don't wait on ordered extents if we have a trans open
Dave was hitting a lockdep warning because we're now properly taking the ordered
operations mutex in the ordered wait stuff.  This is because some cases we will
have a trans handle when we are flushing delalloc space, but we can't wait on
ordered extents because we could potentially deadlock, so fix this by not doing
the wait if we have a trans handle.  Thanks

Reported-and-tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:32 -04:00
Josef Bacik
8c579fe745 Btrfs: fix error handling in make/read block group
I noticed that we will add a block group to the space info before we add it to
the block group cache rb tree, so we could potentially allocate from the block
group before it's able to be searched for.  I don't think this is too much of
a problem, the race window is microscopic, but just in case move the tree
insertion to above the space info linking.  This makes it easier to adjust the
error handling as well, so we can remove a couple of BUG_ON(ret)'s and have real
error handling setup for these scenarios.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:32 -04:00
Wang Shilong
5c2d867fdc Btrfs: fix double free in the iterate_extent_inodes()
If btrfs_find_all_roots() fails, 'roots' has been freed or 'roots'
fails to allocate. We don't need to free it outside btrfs_find_all_roots()
again.Fix it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:31 -04:00
Wang Shilong
f172393952 Btrfs: kill some BUG_ONs() in the find_parent_nodes()
The reason that BUG_ON() happens in these places is just
because of ENOMEM.

We try ro return ENOMEM rather than trigger BUG_ON(), the
caller will abort the transaction thus avoiding the kernel panic.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:30 -04:00
Josef Bacik
41b0fc4280 Btrfs: compare relevant parts of delayed tree refs
A user reported a panic while running a balance.  What was happening was he was
relocating a block, which added the reference to the relocation tree.  Then
relocation would walk through the relocation tree and drop that reference and
free that block, and then it would walk down a snapshot which referenced the
same block and add another ref to the block.  The problem is this was all
happening in the same transaction, so the parent block was free'ed up when we
drop our reference which was immediately available for allocation, and then it
was used _again_ to add a reference for the same block from a different
snapshot.  This resulted in something like this in the delayed ref tree

add ref to 90234880, parent=2067398656, ref_root 1766, level 1
del ref to 90234880, parent=2067398656, ref_root 18446744073709551608, level 1
add ref to 90234880, parent=2067398656, ref_root 1767, level 1

as you can see the ref_root's don't match, because when we inc the ref we use
the header owner, which is the original tree the block belonged to, instead of
the data reloc tree.  Then when we remove the extent we use the reloc tree
objectid.  But none of this matters, since it is a shared reference which means
only the parent matters.  When the delayed ref stuff runs it adds all the
increments first, and then does all the drops, to make sure that we don't delete
the ref if we net a positive ref count.  But tree blocks aren't allowed to have
multiple refs from the same block, so this panics when it tries to add the
second ref.  We need the add and the drop to cancel each other out in memory so
we only do the final add.

So to fix this we need to adjust how the delayed refs are added to the tree.
Only the ref_root matters when it is a normal backref, and only the parent
matters when it is a shared backref.  So make our decision based on what ref
type we have.  This allows us to keep the ref_root in memory in case anybody
wants to use it for something else, and it allows the delayed refs to be merged
properly so we don't end up with this panic.

With this patch the users image no longer panics on mount, and it has a clean
fsck after a normal mount/umount cycle.  Thanks,

Cc: stable@vger.kernel.org
Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:29 -04:00
Josef Bacik
cf79ffb5b7 Btrfs: fix infinite loop when we abort on mount
Testing my enospc log code I managed to abort a transaction during mount, which
put me into an infinite loop.  This is because of two things, first we don't
reset trans_no_join if we abort during transaction commit, which will force
anybody trying to start a transaction to just loop endlessly waiting for it to
be set to 0.  But this is still just a symptom, the second issue is we don't set
the fs state to error during errors on mount.  This is because we don't want to
do the flip read only thing during mount, but we still really want to set the fs
state to an error to keep us from even getting to the trans_no_join check.  So
fix both of these things, make sure to reset trans_no_join if we abort during a
commit, and make sure we set the fs state to error no matter if we're mounting
or not.  This should keep us from getting into this infinite loop again.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:29 -04:00
Wang Shilong
c9a9dbf2cb Btrfs: fix a warning when disabling quota
Steps to reproduce:
	mkfs.btrfs <disk>
	mount <disk> <mnt>
	btrfs quota enable <mnt>
	btrfs sub create <mnt>/subv

	i=1
	while [ $i -le 10000 ]
	do
		dd if=/dev/zero of=<mnt>/subv/data_$i bs=1K count=1
		i=$(($i+1))
		if [ $i -eq 500 ]
		then
			btrfs quota disable $mnt
		fi
	done
	dmesg
Obviously, this warn_on() is unnecessary, and it will be easily triggered.
Just remove it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:28 -04:00
Liu Bo
6b67a32000 Btrfs: pass NULL instead of 0
set_extent_bit()'s (u64 *failed_start) expects NULL not 0.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:27 -04:00
Eric Sandeen
c854a9909a btrfs: document mount options in Documentation/fs/btrfs.txt
Document all current btrfs mount options.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:26 -04:00
David Sterba
5c50c9b89f btrfs: make subvol creation/deletion killable in the early stages
The subvolume ioctls block on the parent directory mutex that can be
held by other concurrent snapshot activity for a long time. Give the
user at least some chance to get out of this situation by allowing
to send a kill signal.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:26 -04:00
David Sterba
94ef7280e8 btrfs: cover more error codes in btrfs_decode_error
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:25 -04:00
David Sterba
4884b476d7 btrfs: make orphan cleanup less verbose
The messages

  btrfs: unlinked 123 orphans
  btrfs: truncated 456 orphans

are not useful to regular users and raise questions whether there are
problems with the filesystem.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:24 -04:00
David Sterba
5e2a4b25da btrfs: deprecate subvolrootid mount option
This mount option was a workaround when subvol= assumed path relative
to the default subvolume, not the toplevel one. This was fixed long time
ago and subvolrootid has no effect.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:23 -04:00
Simon Kirby
c2cf52eb71 Btrfs: Include the device in most error printk()s
With more than one btrfs volume mounted, it can be very difficult to find
out which volume is hitting an error. btrfs_error() will print this, but
it is currently rigged as more of a fatal error handler, while many of
the printk()s are currently for debugging and yet-unhandled cases.

This patch just changes the functions where the device information is
already available. Some cases remain where the root or fs_info is not
passed to the function emitting the error.

This may introduce some confusion with volumes backed by multiple devices
emitting errors referring to the primary device in the set instead of the
one on which the error occurred.

Use btrfs_printk(fs_info, format, ...) rather than writing the device
string every time, and introduce macro wrappers ala XFS for brevity.
Since the function already cannot be used for continuations, print a
newline as part of the btrfs_printk() message rather than at each caller.

Signed-off-by: Simon Kirby <sim@hostway.ca>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:23 -04:00
David Sterba
aa8259145e btrfs: update kconfig title
The Kconfig title does not make much sense after the cleanup of
CONFIG_EXPERIMENTAL option, align the wording with other filesystems.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:22 -04:00
David Sterba
9d1a2a3ad5 btrfs: clean snapshots one by one
Each time pick one dead root from the list and let the caller know if
it's needed to continue. This should improve responsiveness during
umount and balance which at some point waits for cleaning all currently
queued dead roots.

A new dead root is added to the end of the list, so the snapshots
disappear in the order of deletion.

The snapshot cleaning work is now done only from the cleaner thread and the
others wake it if needed.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:21 -04:00
Zhi Yong Wu
6841ebee6b btrfs: Cleanup some redundant codes in btrfs_log_inode()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:20 -04:00
Zhi Yong Wu
628c8282be btrfs: Cleanup some redundant codes in btrfs_lookup_csums_range()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:20 -04:00
Liu Bo
7abadb6431 Btrfs: share stop worker code
Share the exactly same code of stopping workers.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:19 -04:00
Josef Bacik
3173a18f70 Btrfs: add a incompatible format change for smaller metadata extent refs
We currently store the first key of the tree block inside the reference for the
tree block in the extent tree.  This takes up quite a bit of space.  Make a new
key type for metadata which holds the level as the offset and completely removes
storing the btrfs_tree_block_info inside the extent ref.  This reduces the size
from 51 bytes to 33 bytes per extent reference for each tree block.  In practice
this results in a 30-35% decrease in the size of our extent tree, which means we
COW less and can keep more of the extent tree in memory which makes our heavy
metadata operations go much faster.  This is not an automatic format change, you
must enable it at mkfs time or with btrfstune.  This patch deals with having
metadata stored as either the old format or the new format so it is easy to
convert.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:18 -04:00
Liu Bo
be283b2e67 Btrfs: use helper to cleanup tree roots
free_root_pointers() has been introduced to cleanup all of tree roots,
so just use it instead.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:17 -04:00
Liu Bo
b0496686ba Btrfs: cleanup unused arguments of btrfs_csum_data
Argument 'root' is no more used in btrfs_csum_data().

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:14 -04:00
David Sterba
087488109a btrfs: clean up transaction abort messages
The transaction abort stacktrace is printed only once per module
lifetime, but we'd like to see it each time it happens per mounted
filesystem.  Introduce a fs_state flag that records it.

Tweak the messages around abort:
* add error number to the first abort
* print the exact negative errno from btrfs_decode_error
* clean up btrfs_decode_error and callers
* no dots at the end of the messages

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:52:56 -04:00
David Sterba
bbece8a3f0 btrfs: merge save_error_info helpers into one
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:52:55 -04:00
Josef Bacik
74255aa07d Btrfs: add some free space cache tests
We keep hitting bugs in the tree log replay because btrfs_remove_free_space
doesn't account for some corner case.  So add a bunch of tests to try and fully
test btrfs_remove_free_space since the only time it is called is during tree log
replay.  These tests all finish successfully, so as we find more of these bugs
we need to add to these tests to make sure we don't regress in fixing things.
I've hidden the tests behind a Kconfig option, but they take no time to run so
all btrfs developers should have this turned on all the time.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:52:54 -04:00
Linus Torvalds
2bf1bef0d6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
 "This is the second batch of s390 patches for the 3.10 merge window.

  Heiko improved the memory detection, this fixes kdump for large memory
  sizes.  Some kvm related memory management work, new ipldev/condev
  keywords in cio and bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mem_detect: remove artificial kdump memory types
  s390/mm: add pte invalidation notifier for kvm
  s390/zcrypt: ap bus rescan problem when toggle crypto adapters on/off
  s390/memory hotplug,sclp: get rid of per memory increment usecount
  s390/memory hotplug: provide memory_block_size_bytes() function
  s390/mem_detect: limit memory detection loop to "mem=" parameter
  s390/kdump,bootmem: fix bootmem allocator bitmap size
  s390: get rid of odd global real_memory_size
  s390/kvm: Change the virtual memory mapping location for Virtio devices
  s390/zcore: calculate real memory size using own get_mem_size function
  s390/mem_detect: add DAT sanity check
  s390/mem_detect: fix lockdep irq tracing
  s390/mem_detect: move memory detection code to mm folder
  s390/zfcpdump: exploit new cio_ignore keywords
  s390/cio: add condev keyword to cio_ignore
  s390/cio: add ipldev keyword to cio_ignore
  s390/uaccess: add "fallthrough" comments
2013-05-06 12:34:53 -07:00
Sergei Shtylyov
c81400be71 3c59x: fix freeing nonexistent resource on driver unload
When unloading the driver that drives an EISA board, a message similar to the
following one is displayed:

Trying to free nonexistent resource <0000000000013000-000000000001301f>

Then an user is unable to reload the driver because the resource it requested in
the previous load hasn't been freed. This happens most probably due to a typo in
vortex_eisa_remove() which calls release_region() with 'dev->base_addr'  instead
of 'edev->base_addr'...

Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 12:22:11 -04:00
Dan Carpenter
a3dbbc2bab netpoll: inverted down_trylock() test
The return value is reversed from mutex_trylock().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:52 -04:00
Al Viro
243198d09f rps_dev_flow_table_release(): no need to delay vfree()
The same story as with fib_trie patch - vfree() from RCU callbacks
is legitimate now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
Al Viro
0020356355 fib_trie: no need to delay vfree()
Now that vfree() can be called from interrupt contexts, there's no
need to play games with schedule_work() to escape calling vfree()
from RCU callbacks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
Konstantin Khlebnikov
b56141ab34 net: frag, fix race conditions in LRU list maintenance
This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4
("net: frag, move LRU list maintenance outside of rwlock")

One cpu already added new fragment queue into hash but not into LRU.
Other cpu found it in hash and tries to move it to the end of LRU.
This leads to NULL pointer dereference inside of list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): move can happens after deletion.

This patch initializes LRU list head before adding fragment into hash and
inet_frag_lru_move() doesn't touches it if it's empty.

I saw this kernel oops two times in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675]  RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000

Oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
Michael S. Tsirkin
7542a6b0d2 vhost: drop virtio_net.h dependency
There's no net specific code in vhost.c anymore,
don't include the virtio_net.h header.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 14:04:06 +03:00
Asias He
fe729a57c8 vhost-net: Cleanup vhost_ubuf and vhost_zcopy
- Rename vhost_ubuf to vhost_net_ubuf
- Rename vhost_zcopy_mask to vhost_net_zcopy_mask
- Make funcs static

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:25:47 +03:00
Asias He
e40ab7484f vhost: Remove vhost_enable_zcopy in vhost.h
It is net.c specific.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:21:15 +03:00
Asias He
ab00c42a56 vhost: Remove comments for hdr in vhost.h
It is supposed to be removed when hdr is moved into vhost_net_virtqueue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:21:07 +03:00
Asias He
8570a6e72c vhost: Move VHOST_NET_FEATURES to net.c
vhost.h should not depend on device specific marcos like
VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 13:21:00 +03:00
Asias He
b1ad8496c9 vhost-net: Free ubuf when vhost_dev_set_owner fails
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 12:57:54 +03:00
Asias He
54db63c2ca vhost: Export vhost_dev_set_owner
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-05-06 12:57:54 +03:00
Christoph Lameter
6286ae97d1 slab: Return NULL for oversized allocations
The inline path seems to have changed the SLAB behavior for very large
kmalloc allocations with  commit e3366016 ("slab: Use common
kmalloc_index/kmalloc_size functions"). This patch restores the old
behavior but also adds diagnostics so that we can figure where in the
code these large allocations occur.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/201305040348.CIF81716.OStQOHFJMFLOVF@I-love.SAKURA.ne.jp
[ penberg@kernel.org: use WARN_ON_ONCE ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-05-06 09:24:16 +03:00
Benjamin Herrenschmidt
d5dae72130 powerpc/topology: Fix spurr attribute permission
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 15:02:40 +10:00
Benjamin Herrenschmidt
3fd47f063b powerpc/pci: Support per-aperture memory offset
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference betwen CPU memory addresses and PCI MMIO addresses.

This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.

This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 13:40:40 +10:00
Benjamin Herrenschmidt
342d6666f7 powerpc/cell/iommu: Improve error message for missing node
Some devices don't have a correct node ID and thus can't be
attached to an iommu.

The message displayed by the iommu code isn't very useful if
you don't have a device-tree node as it tries to print the
device-tree path but not the struct device name.

Improve this by printing the device name as well.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 12:03:49 +10:00
Benjamin Herrenschmidt
96cf3f66de powerpc/cell/spufs: Fix status attribute permission
We are registering the attribute with permission 0644 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 12:02:05 +10:00
Benjamin Herrenschmidt
5fe0c1f2f0 irqdomain: Allow quiet failure mode
Some interrupt controllers refuse to map interrupts marked as
"protected" by firwmare. Since we try to map everyting in the
device-tree on some platforms, we end up with a lot of nasty
WARN's in the boot log for what is a normal situation on those
machines.

This defines a specific return code (-EPERM) from the host map()
callback which cause irqdomain to fail silently.

MPIC is updated to return this when hitting a protected source
printing only a single line message for diagnostic purposes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 11:37:43 +10:00
Al Viro
a8ca889ed9 mtd_blktrans_ops->release() should return void
Both existing instances always return 0 and even if they didn't,
the value would be lost on the way out.  Just don't bother...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-05 21:31:22 -04:00
Linus Torvalds
d7ab7302f9 For 3.10 we have a few new MFD drivers for:
- The ChromeOS embedded controller which provides keyboard, battery and power
   management services. This controller is accessible through i2c or SPI.
 
 - Silicon Laboratories 476x controller, providing access to their FM chipset
   and their audio codec.
 
 - Realtek's RTS5249, a memory stick, MMC and SD/SDIO PCI based reader.
 
 - Nokia's Tahvo power button and watchdog device. This device is very similar
   to Retu and is thus supported by the same code base.
 
 - STMicroelectronics STMPE1801, a keyboard and GPIO controller supported by
   the stmpe driver.
 
 - ST-Ericsson AB8540 and AB8505 power management and voltage converter
   controllers through the existing ab8500 code.
 
 Some other drivers got cleaned up or improved. In particular:
 
 - The Linaro/STE guys got the ab8500 driver in sync with their internal code
   through a series of optimizations, fixes and improvements.
 
 - The AS3711 and OMAP USB drivers now have DT support.
 
 - The arizona clock and interrupt handling code got improved.
 
 - The wm5102 register patch and boot mechanism also got improved.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRhttxAAoJEIqAPN1PVmxKl6QP/ilyz2OnuZSJKAT+N3tt0EpR
 6hFk0H6uSiHJ5aNyA22WGJq97R3jW9eGK9uD4AKCQ05l9UF/c5+YeXtmGHtxDLCb
 jBrErfB6GmEn1H2TzVK+Rp1WPAB/yoYHJosgGNCwohvuffhMiogSVHlI09EY4mQh
 2Eo0RTN1UXKXSOZN+E7hb+GbIFzU8eOlEFdc2jh4qtfsvMDANbEByrZM6s0QFB31
 LPn03uBL0+iwE8KW2144LKsfzeOos4JWbumyG9Lh6BugUSy1e/Zvv7aWNVeMvY8C
 0+ZUk0bzRm9g7e3X4iYLPSboZt7J6DLaBlWXnUaOsJb+YRkUGh094ySdKojP3EiK
 8SWSfH4EDwIANKC4zyXMcyny8OewySyrTTd0BTlbgHFyDmvmHk213crsCcilHzRb
 3wrX0ETrk96Dkla4/e7IAyME+AbrglStHVGGf2hexlPm2nZdLsE8lfyo9yqjPqzy
 w49y7mpTA5PVE63szB1tI/58W2snZtXAEdQGjZmDQp29vDZaeR1t3W/IhKNG30JN
 SZGiX3H/6YS4MDZ48N709H83hM4V93XrHKsN59NjQe8NZ7AnSIfns9IgMciGBv7r
 aBE+Uwm9htK270Hvl5q8qDDnKaVGYOFlCq9qaeZ2k8NPyyRlQCRpJYjtSplYAnGr
 iLI0JdM32u3qdf5IT+Cw
 =Wq20
 -----END PGP SIGNATURE-----

Merge tag 'mfd-3.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next

Pull MFD update from Samuel Ortiz:
 "For 3.10 we have a few new MFD drivers for:

   - The ChromeOS embedded controller which provides keyboard, battery
     and power management services.  This controller is accessible
     through i2c or SPI.

   - Silicon Laboratories 476x controller, providing access to their FM
     chipset and their audio codec.

   - Realtek's RTS5249, a memory stick, MMC and SD/SDIO PCI based
     reader.

   - Nokia's Tahvo power button and watchdog device.  This device is
     very similar to Retu and is thus supported by the same code base.

   - STMicroelectronics STMPE1801, a keyboard and GPIO controller
     supported by the stmpe driver.

   - ST-Ericsson AB8540 and AB8505 power management and voltage
     converter controllers through the existing ab8500 code.

  Some other drivers got cleaned up or improved.  In particular:

   - The Linaro/STE guys got the ab8500 driver in sync with their
     internal code through a series of optimizations, fixes and
     improvements.

   - The AS3711 and OMAP USB drivers now have DT support.

   - The arizona clock and interrupt handling code got improved.

   - The wm5102 register patch and boot mechanism also got improved."

* tag 'mfd-3.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next: (104 commits)
  mfd: si476x: Don't use 0bNNN
  mfd: vexpress: Handle pending config transactions
  mfd: ab8500: Export ab8500_gpadc_sw_hw_convert properly
  mfd: si476x: Fix i2c warning
  mfd: si476x: Add header files and Kbuild plumbing
  mfd: si476x: Add chip properties handling code
  mfd: si476x: Add the bulk of the core driver
  mfd: si476x: Add commands abstraction layer
  mfd: rtsx: Support RTS5249
  mfd: retu: Add Tahvo support
  mfd: ucb1400: Pass ucb1400-gpio data through ac97 bus
  mfd: wm8994: Add some OF properties
  mfd: wm8994: Add device ID data to WM8994 OF device IDs
  input: Export matrix_keypad_parse_of_params()
  mfd: tps65090: Add compatible string for charger subnode
  mfd: db8500-prcmu: Support platform dependant device selection
  mfd: syscon: Fix warnings when printing resource_size_t
  of: Add stub of_get_parent for non-OF builds
  mfd: omap-usb-tll: Convert to devm_ioremap_resource()
  mfd: omap-usb-host: Convert to devm_ioremap_resource()
  ...
2013-05-05 17:36:20 -07:00