* git://git.infradead.org/iommu-2.6:
intel-iommu: fix superpage support in pfn_to_dma_pte()
intel-iommu: set iommu_superpage on VM domains to lowest common denominator
intel-iommu: fix return value of iommu_unmap() API
MAINTAINERS: Update VT-d entry for drivers/pci -> drivers/iommu move
intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.
intel-iommu: Workaround IOTLB hang on Ironlake GPU
intel-iommu: Fix AB-BA lockdep report
Commit 4b239f458 ("x86-64, mm: Put early page table high") causes a S4
regression since 2.6.39, namely the machine reboots occasionally at S4
resume. It doesn't happen always, overall rate is about 1/20. But,
like other bugs, once when this happens, it continues to happen.
This patch fixes the problem by essentially reverting the memory
assignment in the older way.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai.lu@oracle.com>
[ We'll hopefully find the real fix, but that's too late for 3.1 now ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix memory leak introduced by commit a6e50b409d
(dm snapshot: skip reading origin when overwriting complete chunk).
When allocating a set of jobs from kc->job_pool, job->master_job must be
set (to point to itself) so that the mempool item gets freed when the
master_job completes.
master_job was introduced by commit c6ea41fbbe
(dm kcopyd: preallocate sub jobs to avoid deadlock)
Reported-by: Michael Leun <ml@newton.leun.net>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
v2:
- register_syscore_ops(&s3c24xx_irq_syscore_ops) does not need to be
conditionally compiled out, it is already optimized out on !CONFIG_PM
- fix also s3c2412 and s3c2416 affected by the same build issue
v1:
s3c2440.c fails to build if !CONFIG_PM because in such case
s3c2410_pm_syscore_ops is not defined. Same error should happen also
in s3c2410.c and s3c2442.c
Signed-off-by: Domenico Andreoli <cavokz@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Offsets of the irq controller registers were calculated
correctly only for first GPIO bank. This patch fixes
calculation of the register offsets for all GPIO banks.
Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fib_rules: fix unresolved_rules counting
r8169: fix wrong eee setting for rlt8111evl
r8169: fix driver shutdown WoL regression.
ehea: Change maintainer to me
pptp: pptp_rcv_core() misses pskb_may_pull() call
tproxy: copy transparent flag when creating a time wait
pptp: fix skb leak in pptp_xmit()
bonding: use local function pointer of bond->recv_probe in bond_handle_frame
smsc911x: Add support for SMSC LAN89218
tg3: negate USE_PHYLIB flag check
netconsole: enable netconsole can make net_device refcnt incorrent
bluetooth: Properly clone LSM attributes to newly created child connections
l2tp: fix a potential skb leak in l2tp_xmit_skb()
bridge: fix hang on removal of bridge via netlink
x25: Prevent skb overreads when checking call user data
x25: Handle undersized/fragmented skbs
x25: Validate incoming call user data lengths
udplite: fast-path computation of checksum coverage
IPVS netns shutdown/startup dead-lock
netfilter: nf_conntrack: fix event flooding in GRE protocol tracker
Fix a bug introduced by 20b45077. We have to return EINVAL on mount
failure, but doing that too early in the sequence leaves all of the
devices opened exclusively. This also fixes an issue where under some
scenarios only a second mount -o degraded <devices> command would
succeed.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Initialize fs_info->bdev_holder a bit earlier to be able to pass a
correct holder id to blkdev_get() when opening seed devices with O_EXCL.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If lookup_extent_backref fails, path->nodes[0] reasonably could be
null along with other callers of btrfs_print_leaf, so ensure we have a
valid extent buffer before dereferencing.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
The task may fail to get free space though it is enough when multi-task
space allocation and caching space happen at the same time.
Task1 Caching Thread Task2
------------------------------------------------------------------------
find_free_extent
The space has not
be cached, and start
caching thread. And
wait for it.
cache space, if
the space is > 2MB
wake up Task1
find_free_extent
get all the space that
is cached.
try to allocate space,
but there is no space
now.
trigger BUG_ON()
The message is following:
btrfs allocation failed flags 1, wanted 4096
space_info has 1040187392 free, is not full
space_info total=1082130432, used=4096, pinned=41938944, reserved=0, may_use=40828928, readonly=0
block group 12582912 has 8388608 bytes, 0 used 8388608 pinned 0 reserved
block group has cluster?: no
0 blocks of free space at or bigger than bytes is
block group 1103101952 has 1073741824 bytes, 4096 used 33550336 pinned 0 reserved
block group has cluster?: no
0 blocks of free space at or bigger than bytes is
------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:835!
[<ffffffffa031261b>] __extent_writepage+0x1bf/0x5ce [btrfs]
[<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
[<ffffffffa02f8ada>] ? wait_current_trans+0x23/0xec [btrfs]
[<ffffffff810c3fbf>] ? find_get_pages_tag+0x73/0xe2
[<ffffffffa0312d12>] extent_write_cache_pages.clone.0+0x176/0x29a [btrfs]
[<ffffffffa0312e74>] extent_writepages+0x3e/0x53 [btrfs]
[<ffffffff8110ad2c>] ? do_sync_write+0xc6/0x103
[<ffffffffa0302d6e>] ? btrfs_submit_direct+0x414/0x414 [btrfs]
[<ffffffff811380fa>] ? fsnotify+0x236/0x266
[<ffffffffa02fc930>] btrfs_writepages+0x22/0x24 [btrfs]
[<ffffffff810cc215>] do_writepages+0x1c/0x25
[<ffffffff810c4958>] __filemap_fdatawrite_range+0x4e/0x50
[<ffffffff810c4982>] filemap_write_and_wait_range+0x28/0x51
[<ffffffffa0306b2e>] btrfs_sync_file+0x7d/0x198 [btrfs]
[<ffffffff8110aa26>] ? fsnotify_modify+0x5d/0x65
[<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
[<ffffffff8112d170>] vfs_fsync+0x17/0x19
[<ffffffff8112d316>] do_fsync+0x29/0x3e
[<ffffffff8112d348>] sys_fsync+0xb/0xf
[<ffffffff81468352>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP [<ffffffffa02fe08c>] cow_file_range+0x1c4/0x32b [btrfs]
We fix this bug by trying to allocate the space again if there are block groups
in caching.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
In btrfs_get_acl(), when the second __btrfs_getxattr() call fails,
acl is not correctly set.
Therefore, a wrong value might return to the caller.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Free space items are located in tree of tree roots, not in the extent
tree. It didn't pop up because lookup_free_space_inode() grabs the
inode all the time instead of actually searching the tree.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
To reproduce the bug:
# mount -o nodatacow /dev/sda7 /mnt/
# dd if=/dev/zero of=/mnt/tmp bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000136115 s, 30.1 MB/s
# dd if=/dev/zero of=/mnt/tmp bs=4K count=1 conv=notrunc oflag=direct
dd: writing `/mnt/tmp': Input/output error
1+0 records in
0+0 records out
btrfs_ordered_update_i_size() may return 1, but btrfs_endio_direct_write()
mistakenly takes it as an error.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
It's not a big deal if we fail to allocate the array, and instead of
panic we can just give up compressing.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
We should retirn EINVAL if the start is beyond the end of the file
system in the btrfs_ioctl_fitrim(). Fix that by adding the appropriate
check for it.
Also in the btrfs_trim_fs() it is possible that len+start might overflow
if big values are passed. Fix it by decrementing the len so that start+len
is equal to the file system size in the worst case.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
We won't defrag an extent, if it's bigger than the threshold we
specified and there's no small extent before it, but actually
the code doesn't work this way.
There are three bugs:
- When should_defrag_range() decides we should keep on defragmenting
an extent, last_len is not incremented. (old bug)
- The length that passes to should_defrag_range() is not the length
we're going to defrag. (new bug)
- We always defrag 256K bytes data, and a big extent can be part of
this range. (new bug)
For a file with 4 extents:
| 4K | 4K | 256K | 256K |
The result of defrag with (the default) 256K extent thresh should be:
| 264K | 256K |
but with those bugs, we'll get:
| 520K |
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
It's off-by-one, and thus we may skip the last page while defragmenting.
An example case:
# create /mnt/file with 2 4K file extents
# btrfs fi defrag /mnt/file
# sync
# filefrag /mnt/file
/mnt/file: 2 extents found
So it's not defragmented.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Don't use inode->i_size directly, since we're not holding i_mutex.
This also fixes another bug, that i_size can change after it's checked
against 0 and then (i_size - 1) can be negative.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Offset field in data extent backref can underflow if clone range ioctl
is used. We can reliably detect the underflow because max file size is
limited to 2^63 and max data extent size is limited by block group size.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Since 8-bit temperature values are now handled in 16-bit struct
members, values have to be cast to s8 for negative temperatures to be
properly handled. This is broken since kernel version 2.6.39
(commit bce26c58df86599c9570cee83eac58bdaae760e4.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<ffffffff81127b76>] [<ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<ffffffff81106097>] ? vma_adjust+0x537/0x570
[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<ffffffff81421d5f>] page_fault+0x1f/0x30
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock(), or
migration's remove_migration_pte() must stop peeking for is_swap_entry()
before it takes pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
than to mremapping, which gets used by JVMs and by exec stack setup.
Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently no type of alignment is specified for PCI expansion roms while
parsing the openfirmware tree. This causes calls to pci_map_rom() to fail.
IORESOURCE_SIZEALIGN is the default alignment used for rom resouces in
pci/probe.c, and has been verified to work with various cards on a ultra 10.
Signed-off-By: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
we should decrease ops->unresolved_rules when deleting a unresolved rule.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the wrong parameter for setting EEE for RTL8111E-VL.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to commit 92fc43b415 ("r8169: modify the
flow of the hw reset."), rtl8169_hw_reset stomps during driver shutdown on
RxConfig bits which are needed for WOL on some versions of the hardware.
As these bits were formerly set from the r81{0x, 68}_pll_power_down methods,
factor them out for use in the driver shutdown (rtl_shutdown) handler.
I favored __rtl8169_get_wol() -hardware state indication- over
RTL_FEATURE_WOL as the latter has become a good candidate for removal.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
Tested-by: Marc Ballarin <ballarin.marc@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Breno Leitao has passed the maintainership to me.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed we had a little bit of latency when writing out the space cache
inodes. It's because we flush it before we write anything in case we have dirty
pages already there. This doesn't matter though since we're just going to
overwrite the space, and there really shouldn't be any dirty pages anyway. This
makes some of my tests run a little bit faster. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Mitch kept hitting a panic because he was getting ENOSPC. One of my previous
patches makes it so we are much better at not allocating new metadata chunks.
Unfortunately coupled with the overcommit patch this works us into a bit of a
problem if we are removing a bunch of space and end up chewing up all of our
space with pinned extents. We can allocate chunks fine and overflow is ok, but
the only way to reclaim this space is to commit the transaction. So if we go to
overcommit, first check and see how much pinned space we have. If we have more
than 80% of the free space chewed up with pinned extents, just commit the
transaction, this will free up enough space for our reservation and we won't
have this problem anymore. With this patch Mitch's test doesn't blow up
anymore. Thanks,
Reported-and-tested-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Josef Bacik <josef@redhat.com>
Currently btrfs_block_rsv_check does 2 things, it will either refill a block
reserve like in the truncate or refill case, or it will check to see if there is
enough space in the global reserve and possibly refill it. However because of
overcommit we could be well overcommitting ourselves just to try and refill the
global reserve, when really we should just be committing the transaction. So
breack this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill
will try to reserve more metadata if it can and btrfs_block_rsv_check will not,
it will only tell you if the factor of the total space is still reserved.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
In __unlink_start_trans() if we don't have enough room for a reservation we will
check to see if the unlink will free up space. If it does that's great, but we
will still could add an orphan item, so we need to reserve enough space to add
the orphan item. Do this and migrate the space the global reserve so it all
works out right. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
We started setting trans->block_rsv = NULL to allow the delayed refs flushing
stuff to use the right block_rsv and then just made
btrfs_trans_release_metadata() unconditionally use the trans block rsv. The
problem with this is we need to reserve some space in the transaction and then
migrate it to the global block rsv, so we need to be able to free that out
properly. So instead just move btrfs_trans_release_metadata() before the
delayed ref flushing and use trans->block_rsv for the freeing. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Currently we only allow a maximum of 2 megabytes of pages to be flushed at a
time. This was ok before, but now we have overcommit which will screw us in a
heartbeat if we are quickly filling the disk. So instead pick either 2
megabytes or the number of pages we need to reclaim to be safe again, which ever
is larger. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
The only way we actually reclaim delalloc space is waiting for the IO to
completely finish. Usually we kick off a bunch of IO and wait for a little bit
and hope we can make our reservation, and usually this works out pretty well.
With overcommit however we can get seriously underwater if we're filling up the
disk quickly, so we need to be able to force the delalloc shrinker to wait for
the ordered IO to finish to give us a better chance of actually reclaiming
enough space to get our reservation. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Before the only reason to commit the transaction to recover space in
reserve_metadata_bytes() was if there were enough pinned_bytes to satisfy our
reservation. But now we have the delayed inode stuff which will hold it's
reservations until we commit the transaction. So say we max out our reservation
by creating a bunch of files but don't have any pinned bytes we will ENOSPC out
early even though we could commit the transaction and get that space back. So
now just unconditionally commit the transaction since currently there is no way
to know how much metadata space is being reserved by delayed inode stuff.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Recently I changed the xattr stuff to unconditionally set the xattr first in
case the xattr didn't exist yet. This has introduced a regression when setting
an xattr that already exists with a large value. If we find the key we are
looking for split_leaf will assume that we're extending that item. The problem
is the size we pass down to btrfs_search_slot includes the size of the item
already, so if we have the largest xattr we can possibly have plus the size of
the xattr item plus the xattr item that btrfs_search_slot we'd overflow the
leaf. Thankfully this is not what we're doing, but split_leaf doesn't know this
so it just returns EOVERFLOW. So in the xattr code we need to check and see if
we got back EOVERFLOW and treat it like EEXIST since that's really what
happened. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Our unlink reservations were a bit much, we were reserving 10 and I only count 8
possible items we're touching, so comment what we're reserving for and fix the
count value. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
I noticed recently that my overcommit patch was causing one of my enospc tests
to fail 25% of the time with early ENOSPC. This is because my overcommit patch
was letting us go way over board, but it wasn't waiting long enough to let the
delalloc shrinker do it's job. The problem is we just start writeback and wait
a little bit hoping we flush enough, but we only free up delalloc space by
having the writes complete all the way. We do this by waiting for ordered
extents, which we do but only if we already free'd enough for the reservation,
which isn't right, we should flush ordered extents if we didn't reclaim enough
in case that will push us over the edge. With this patch I've not seen a
failure in this enospc test after running it in a loop for an hour. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>