Commit graph

246 commits

Author SHA1 Message Date
Tao Ma
0edeb71dc9 ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We have found some bugs that the flag is set while leaving
i_aiodio_unwritten unchanged(commit 32c80b32c0). So this patch just tries
to create a helper function to wrap them to avoid any future bug.
The idea is inspired by Eric.

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31 17:30:44 -04:00
Yongqiang Yang
e7b319e397 ext4: trace punch_hole correctly in ext4_ext_map_blocks
When ext4_ext_map_blocks() is called by punch_hole, trace should
trace blocks punched out.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:39:51 -04:00
Yongqiang Yang
02dc62fba8 ext4: clean up AGGRESSIVE_TEST code
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:29:11 -04:00
Yongqiang Yang
81fdbb4a8d ext4: move variables to their scope
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:23:38 -04:00
Eric Gouriou
80e675f906 ext4: optimize memmmove lengths in extent/index insertions
ext4_ext_insert_extent() (respectively ext4_ext_insert_index())
was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine
how many entries needed to be moved beyond the insertion point.
In practice this means that (320 - I) * 24 bytes were memmove()'d
when I is the insertion point, rather than (#entries - I) * 24 bytes.

This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead
to only move existing entries. The code flow is also simplified
slightly to highlight similarities and reduce code duplication in
the insertion logic.

This patch reduces system CPU consumption by over 25% on a 4kB
synchronous append DIO write workload when used with the
pre-2.6.39 x86_64 memmove() implementation. With the much faster
2.6.39 memmove() implementation we still see a decrease in
system CPU usage between 2% and 7%.

Note that the ext_debug() output changes with this patch, splitting
some log information between entries. Users of the ext_debug() output
should note that the "move %d" units changed from reporting the number
of bytes moved to reporting the number of entries moved.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:52:18 -04:00
Eric Gouriou
6f91bc5fda ext4: optimize ext4_ext_convert_to_initialized()
This patch introduces a fast path in ext4_ext_convert_to_initialized()
for the case when the conversion can be performed by transferring
the newly initialized blocks from the uninitialized extent into
an adjacent initialized extent. Doing so removes the expensive
invocations of memmove() which occur during extent insertion and
the subsequent merge.

In practice this should be the common case for clients performing
append writes into files pre-allocated via
fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
direct IO and when using a suboptimal implementation of memmove()
(x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
consumption by 32%.

Two new trace points are added to ext4_ext_convert_to_initialized()
to offer visibility into its operations. No exit trace point has
been added due to the multiplicity of return points. This can be
revisited once the upstream cleanup is backported.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:43:23 -04:00
Tao Ma
b3ff056908 ext4: don't check io->flag when setting EXT4_STATE_DIO_UNWRITTEN inode state
When we want to convert the unitialized extent in direct write, we can
either do it in ext4_end_io_nolock(AIO case) or in
ext4_ext_direct_IO(non AIO case) and EXT4_I(inode)->cur_aio_dio is a
guard for ext4_ext_map_blocks to find the right case.  In e9e3bcecf,
we mistakenly change it by:

-			if (io)
+			if (io && !(io->flag & EXT4_IO_END_UNWRITTEN)) {
 				io->flag = EXT4_IO_END_UNWRITTEN;
-			else
+				atomic_inc(&EXT4_I(inode)->i_aiodio_unwritten);
+			} else
 				ext4_set_inode_state(inode,
 						     EXT4_STATE_DIO_UNWRITTEN);

So now if we map 2 blocks, and the first one set the
EXT_IO_END_UNWRITTEN, the 2nd mapping will set inode state because of
the check for the flag. This is wrong.

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 11:08:39 -04:00
Curt Wohlgemuth
6f8ff53726 ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch().  In
such a case, dereferencing it will crash the machine.

This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencinging it.

Tested using a hand-crafted an inode with eh_entries == 0 in
an extent block, verified that running FIEMAP on it crashes
without this patch, works fine with it.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 04:38:59 -04:00
Dan Carpenter
f85b287a01 ext4: error handling fix in ext4_ext_convert_to_initialized()
When allocated is unsigned it breaks the error handling at the end
of the function when we call:
	allocated = ext4_split_extent(...);
	if (allocated < 0)
		err = allocated;

I've made it a signed int instead of unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:42:36 -04:00
Dmitry Monakhov
a4e5d88b1b ext4: update EOFBLOCKS flag on fallocate properly
EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE
Currently it happens only if new extent was allocated.

TESTCASE:
fallocate test_file -n -l4096
fallocate test_file -l4096
Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And
fsck will complain about that.

Also remove ping pong in ext4_fallocate() in case of new extents,
where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later
ext4_falloc_update_inode() restore it again.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25 08:15:12 -04:00
Dmitry Monakhov
750c9c47a5 ext4: remove messy logic from ext4_ext_rm_leaf
- Both callers(truncate and punch_hole) already aligned left end point
  so we no longer need split logic here.
- Remove dead duplicated code.
- Call ext4_ext_dirty only after we have updated eh_entries, otherwise
  we'll loose entries update. Regression caused by d583fb87a3
  266'th testcase in xfstests (http://patchwork.ozlabs.org/patch/120872)

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25 05:35:05 -04:00
Dmitry Monakhov
1939dd84b3 ext4: cleanup ext4_ext_grow_indepth code
Currently code make an impression what grow procedure is very complicated
and some mythical paths, blocks are involved. But in fact grow in depth
it relatively simple procedure:
 1) Just create new meta block and copy root data to that block.
 2) Convert root from extent to index if old depth == 0
 3) Update root block pointer

This patch does:
 - Reorganize code to make it more self explanatory
 - Do not pass path parameter to new_meta_block() in order to
   provoke allocation from inode's group because top-level block
   should site closer to it's inode, but not to leaf data block.

   [ This happens anyway, due to logic in mballoc; we should drop
     the path parameter from new_meta_block() entirely.  -- tytso ]

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-22 01:26:05 -04:00
H Hartley Sweeten
ee90d57e20 ext4: quiet sparse noise about plain integer as NULL pointer
The third parameter to ext4_free_blocks is a struct buffer_head *.  This
parameter should be NULL not 0.

This quiets the sparse noise:

warning: Using plain integer as NULL pointer

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 11:01:51 -04:00
Tao Ma
f472e02669 ext4: avoid stamping on other memories in ext4_ext_insert_index()
Add a sanity check to make sure ix hasn't gone beyond the valid bounds
of the extent block.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-17 10:13:46 -04:00
Tao Ma
6ee3b21224 ext4: use le32_to_cpu for ext4_extent_idx.ei_block in ext4_ext_search_left()
ext4_extent_idx.e_block is __le32, so use le32_to_cpu() in
ext4_ext_search_left().

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 16:08:34 -04:00
Tao Ma
df3ab17072 ext4: fix the comment describing ext4_ext_search_right()
The comment describing what ext4_ext_search_right() does is incorrect.
We return 0 in *phys when *logical is the 'largest' allocated block,
not smallest.  

Fix a few other typos while we're at it.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-10-08 15:53:49 -04:00
Aditya Kali
5356f2615c ext4: attempt to fix race in bigalloc code path
Currently, there exists a race between delayed allocated writes and
the writeback when bigalloc feature is in use. The race was because we
wanted to determine what blocks in a cluster are under delayed
allocation and we were using buffer_delayed(bh) check for it. But, the
writeback codepath clears this bit without any synchronization which
resulted in a race and an ext4 warning similar to:

EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
		reserved data blocks

The race existed in two places.
(1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
    writeback code path.
(2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
    buffer_delayed(bh) is set.

To fix (1), this patch introduces a new buffer_head state bit -
BH_Da_Mapped.  This bit is set under the protection of
EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
allocated blocks during the writeout time. We can now reliably check
for this bit inside ext4_find_delalloc_range() to determine whether
the reservation for the blocks have already been claimed or not.

To fix (2), it was necessary to set buffer_delay(bh) under the
protection of i_data_sem.  So, I extracted the very beginning of
ext4_map_blocks into a new function - ext4_da_map_blocks() - and
performed the required setting of bh_delay bit and the quota
reservation under the protection of i_data_sem.  These two fixes makes
the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
thus removing the race.

Tested: I was able to reproduce the problem by running 'dd' and
'fsync' in parallel. Also, xfstests sometimes used to reproduce this
race. After the fix both my test and xfstests were successful and no
race (warning message) was observed.

Google-Bug-Id: 4997027

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09 19:20:51 -04:00
Aditya Kali
d8990240d8 ext4: add some tracepoints in ext4/extents.c
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
ext4/inode.c.

Tested: Built and ran the kernel and verified that these tracepoints work.
Also ran xfstests.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09 19:18:51 -04:00
Aditya Kali
7b415bf60f ext4: Fix bigalloc quota accounting and i_blocks value
With bigalloc changes, the i_blocks value was not correctly set (it was still
set to number of blocks being used, but in case of bigalloc, we want i_blocks
to represent the number of clusters being used). Since the quota subsystem sets
the i_blocks value, this patch fixes the quota accounting and makes sure that
the i_blocks value is set correctly.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09 19:04:51 -04:00
Theodore Ts'o
0aa060000e ext4: teach ext4_ext_truncate() about the bigalloc feature
When we are truncating (as opposed unlinking) a file, we need to worry
about partial truncates of a file, especially in the light of sparse
files.  The changes here make sure that arbitrary truncates of sparse
files works correctly.  Yeah, it's messy.

Note that these functions will need to be revisted when the punch
ioctl is integrated --- in fact this commit will probably have merge
conflicts with the punch changes which Allison Henders and the IBM LTC
have been working on.  I will need to fix this up when either patch
hits mainline.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09 18:54:51 -04:00
Theodore Ts'o
4d33b1ef10 ext4: teach ext4_ext_map_blocks() about the bigalloc feature
If we need to allocate a new block in ext4_ext_map_blocks(), the
function needs to see if the cluster has already been allocated.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09 18:52:51 -04:00
Allison Henderson
189e868fa8 ext4: fix fsx truncate failure
While running extended fsx tests to verify the first
two patches, a similar bug was also found in the
truncate operation.

This bug happens because the truncate routine only zeros
the unblock aligned portion of the last page.  This means
that the block aligned portions of the page appearing after
i_size are left unzeroed, and the buffer heads still mapped.

This bug is corrected by using ext4_discard_partial_page_buffers
in the truncate routine to zero the partial page and unmap
the buffer headers.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-06 21:49:44 -04:00
Theodore Ts'o
9ea7a0df63 jbd2: add debugging information to jbd2_journal_dirty_metadata()
Add debugging information in case jbd2_journal_dirty_metadata() is
called with a buffer_head which didn't have
jbd2_journal_get_write_access() called on it, or if the journal_head
has the wrong transaction in it.  In addition, return an error code.
This won't change anything for ocfs2, which will BUG_ON() the non-zero
exit code.

For ext4, the caller of this function is ext4_handle_dirty_metadata(),
and on seeing a non-zero return code, will call __ext4_journal_stop(),
which will print the function and line number of the (buggy) calling
function and abort the journal.  This will allow us to recover instead
of bug halting, which is better from a robustness and reliability
point of view.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-04 10:18:14 -04:00
Allison Henderson
2be4751b21 ext4: fix 2nd xfstests 127 punch hole failure
This patch fixes a second punch hole bug found by xfstests 127.

This bug happens because punch hole needs to flush the pages
of the hole to avoid race conditions.  But if the end of the
hole is in the same page as i_size, the buffer heads beyond
i_size need to be unmapped and the page needs to be zeroed
after it is flushed.

To correct this, the new ext4_discard_partial_page_buffers
routine is used to zero and unmap the partial page
beyond i_size if the end of the hole appears in the same
page as i_size.

The code has also been optimized to set the end of the hole
to the page after i_size if the specified hole exceeds i_size,
and the code that flushes the pages has been simplified.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-09-03 11:56:52 -04:00
Allison Henderson
ba06208a13 ext4: fix xfstests 75, 112, 127 punch hole failure
This patch addresses a bug found by xfstests 75, 112, 127
when blocksize = 1k

This bug happens because the punch hole code only zeros
out non block aligned regions of the page.  This means that if the
blocks are smaller than a page, then the block aligned regions of
the page inside the hole are left un-zeroed, and their buffer heads
are still mapped.  This bug is corrected by using
ext4_discard_partial_page_buffers to properly zero the partial page
at the head and tail of the hole, and unmap the corresponding buffer
heads

This patch also addresses a bug reported by Lukas while working on a
new patch to add discard support for loop devices using punch hole.
The bug happened because of the first and last block number
needed to be cast to a larger data type before calculating the
byte offset, but since now we only need the byte offsets of the
pages, we no longer even need to be calculating the byte offsets
of the blocks.  The code to do the block offset calculations is
removed in this patch.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-09-03 11:55:59 -04:00
Utako Kusaka
29ae07b702 ext4: Fix overflow caused by missing cast in ext4_fallocate()
The logical block number in map.l_blk is a __u32, and so before we
shift it left, by the block size, we neeed cast it to a 64-bit size.

Otherwise i_size can be corrupted on an ENOSPC.

# df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda6     ext4     9843276    153056   9190200   2% /mnt/mp1
# fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
# stat /mnt/mp1/testfile
  File: `/mnt/mp1/testfile'
  Size: 4293656576	Blocks: 19380440   IO Block: 4096   regular file
Device: 806h/2054d	Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-07-25 13:01:31.414490496 +0900
Modify: 2011-07-25 13:01:31.414490496 +0900
Change: 2011-07-25 13:01:31.454490495 +0900

Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
 fs/ext4/extents.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
2011-07-27 22:11:20 -04:00
Robin Dong
0e1147b001 ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
The old function ext4_ext_rm_idx is used only for truncate case
because it just remove last index in extent-index-block. When punching
hole, it usually needed to remove "middle" index, therefore we must
move indexes which after it forward.

(I create a file with 1 depth extent tree and punch hole in the middle
of it, the last index in index-block strangly gone, so I find out this
bug)

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-27 21:29:33 -04:00
Robin Dong
b7ca1e8ec5 ext4: correct comment for ext4_ext_check_cache
The comment for ext4_ext_check_cache has a litte mistake.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 21:53:25 -04:00
Robin Dong
0737964bc9 ext4: correct the debug message in ext4_ext_insert_extent
The debug message in ext4_ext_insert_extent before moving extent
is incorrect (the "from xx to xx").

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 21:51:07 -04:00
Robin Dong
5718789da5 ext4: remove unused argument in ext4_ext_next_leaf_block
The argument "inode" in function ext4_ext_next_allocated_block looks useless,
so clean it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-23 21:49:07 -04:00
Robin Dong
d46203159e ext4: avoid eh_entries overflow before insert extent_idx
If eh_entries is equal to (or greater than) eh_max, the operation of
inserting new extent_idx will make number of entries overflow.
So check eh_entries before inserting the new extent_idx.

Although there is no bug case according the code (function
ext4_ext_insert_index is called by ext4_ext_split and ext4_ext_split
is called only if the index block has free space), the right logic
should be "lookup the capacity before insertion".

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-17 23:43:42 -04:00
Robin Dong
015861badd ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT
This patch avoids an extraneous lookup of the extent cache
in ext4_ext_map_blocks() when the flag
EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent.

The existing logic was performing the lookup but not making
use of the result. The patch simply reverses the order of evaluation
in the condition.

Since ext4_ext_in_cache() does not initialize newex on misses, bypassing
its invocation does not introduce any new issue in this regard.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Gouriou <egouriou@google.com>
2011-07-17 23:27:43 -04:00
Allison Henderson
c6a0371cbe ext4: remove unneeded parameter to ext4_ext_remove_space()
This patch removes the extra parameter in ext4_ext_remove_space()
which is no longer needed.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-17 23:21:03 -04:00
Allison Henderson
f7d0d3797f ext4: punch hole optimizations: skip un-needed extent lookup
This patch optimizes the punch hole operation by skipping the
tree walking code that is used by truncate.  Since punch hole
is done through map blocks, the path to the extent is already
known in this function, so we do not need to look it up again.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-17 23:17:02 -04:00
Robin Dong
598dbdf243 ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents
Optimize ext4_ext_insert_extent() by avoiding
ext4_ext_next_leaf_block() when the result is not used/needed.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-11 18:24:01 -04:00
Robin Dong
ffb505ff0f ext4: remove redundant goto in ext4_ext_insert_extent()
If eh->eh_entries is smaller than eh->eh_max, the routine will
go to the "repeat" and then go to "has_space" directlly ,
since argument "depth" and "eh" are not even changed.

Therefore, goto "has_space" directly and remove redundant "repeat" tag.

Signed-off-by: Robin Dong <sanbai@taobao.com>
2011-07-11 11:43:59 -04:00
Jiaying Zhang
575a1d4bdf ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails
Upon corrupted inode or disk failures, we may fail after we already
allocate some blocks from the inode or take some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent to the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-07-10 20:07:25 -04:00
Maxim Patlasov
7132de744b ext4: fix i_blocks/quota accounting when extent insertion fails
The current implementation of ext4_free_blocks() always calls
dquot_free_block This looks quite sensible in the most cases: blocks
to be freed are associated with inode and were accounted in quota and
i_blocks some time ago.

However, there is a case when blocks to free were not accounted by the
time calling ext4_free_blocks() yet:

1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
   ext4_ext_insert_extent() and calls ext4_free_blocks().

In this scenario, ext4_free_blocks() calls dquot_free_block() who, in
turn, decrements i_blocks for blocks which were not accounted yet (due
to delalloc) After clean umount, e2fsck reports something like:

> Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
because i_blocks was erroneously decremented as explained above.

The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.

Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-07-10 19:37:48 -04:00
Yongqiang Yang
9331b62610 ext4: quiet 'unused variables' compile warnings
Unused variables was deleted.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-28 10:19:05 -04:00
Eric Sandeen
f86186b44b ext4: refactor duplicated block placement code
I found that ext4_ext_find_goal() and ext4_find_near()
share the same code for returning a coloured start block
based on i_block_group.

We can refactor this into a common function so that they
don't diverge in the future.

Thanks to adilger for suggesting the new function name.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-28 10:01:31 -04:00
Robin Dong
ed7a7e1672 ext4: fix incorrect error msg in ext4_ext_insert_index
In function ext4_ext_insert_index when eh_entries of curp is
bigger than eh_max, error messages will be printed out, but the content
is about logical and ei_block, that's incorret.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-27 15:35:53 -04:00
Lukas Czerner
c03f8aa9ab ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
Currently we are not marking the extent as the last one
(FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is
because we just do not check for it right now and continue searching for
next extent. But at the point we hit the hole at the end of the file, it
is too late.

This commit adds check for the allocated block in subsequent extent and
if there is no more extents (block = EXT_MAX_BLOCKS) just flag the
current one as the last one.

This behaviour has been spotted unintentionally by 252 xfstest, when the
test hangs out, because of wrong loop condition. However on other
filesystems (like xfs) it will exit anyway, because we notice the last
extent flag and exit.

With this patch xfstest 252 does not hang anymore, ext4 fiemap
implementation still reports bad extent type in some cases, however
this seems to be different issue.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-06 00:06:52 -04:00
Lukas Czerner
f17722f917 ext4: Fix max file size and logical block counting of extent format file
Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and fill the tail of file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

The root cause of the problem lies in the fact that we specifically set
s_maxbytes so that block at s_maxbytes fit into on-disk extent format,
which is 32 bit long. However, we are not storing start and end block
number, but rather start block number and length in blocks. It means
that in order to cover extent from 0 to EXT_MAX_BLOCK we need
EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) -
and it does not.

The only way to fix it without changing the meaning of the struct
ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
by one fs block so we can cover the whole extent we can get by the
on-disk extent format.

Also in many places EXT_MAX_BLOCK is used as length instead of maximum
logical block number as the name suggests, it is all a bit messy. So
this commit renames it to EXT_MAX_BLOCKS and change its usage in some
places to actually be maximum number of blocks in the extent.

The bug which this commit fixes can be reproduced as follows:

 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
 sync
 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-06 00:05:17 -04:00
Yongqiang Yang
1b16da77f9 ext4: teach ext4_ext_split to calculate extents efficiently
Make ext4_ext_split() get extents to be moved by calculating in a statement
instead of counting in a loop.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-25 17:41:48 -04:00
Vivek Haldar
556b27abf7 ext4: do not normalize block requests from fallocate()
Currently, an fallocate request of size slightly larger than a power of
2 is turned into two block requests, each a power of 2, with the extra
blocks pre-allocated for future use. When an application calls
fallocate, it already has an idea about how large the file may grow so
there is usually little benefit to reserve extra blocks on the
preallocation list. This reduces disk fragmentation.

Tested: fsstress. Also verified manually that fallocat'ed files are
contiguously laid out with this change (whereas without it they begin at
power-of-2 boundaries, leaving blocks in between). CPU usage of
fallocate is not appreciably higher.  In a tight fallocate loop, CPU
usage hovers between 5%-8% with this change, and 5%-7% without it.

Using a simulated file system aging program which the file system to
70%, the percentage of free extents larger than 8MB (as measured by
e2freefrag) increased from 38.8% without this change, to 69.4% with
this change.

Signed-off-by: Vivek Haldar <haldar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-25 07:41:54 -04:00
Allison Henderson
a4bb6b64e3 ext4: enable "punch hole" functionality
This patch adds new routines: "ext4_punch_hole" "ext4_ext_punch_hole"
and "ext4_ext_check_cache"

fallocate has been modified to call ext4_punch_hole when the punch hole
flag is passed.  At the moment, we only support punching holes in
extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
routine.

The ext4_ext_punch_hole routine first completes all outstanding writes
with the associated pages, and then releases them.  The unblock
aligned data is zeroed, and all blocks in between are punched out.

The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
except it accepts a ext4_ext_cache parameter instead of a ext4_extent
parameter.  This routine is used by ext4_ext_punch_hole to check and
see if a block in a hole that has been cached.  The ext4_ext_cache
parameter is necessary because the members ext4_extent structure are
not large enough to hold a 32 bit value.  The existing
ext4_ext_in_cache routine has become a wrapper to this new function.

[ext4 punch hole patch series 5/5 v7] 

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
2011-05-25 07:41:50 -04:00
Allison Henderson
e861304b8e ext4: add "punch hole" flag to ext4_map_blocks()
This patch adds a new flag to ext4_map_blocks() that specifies the
given range of blocks should be punched out.  Extents are first
converted to uninitialized extents before they are punched
out. Because punching a hole may require that the extent be split, it
is possible that the splitting may need more blocks than are
available.  To deal with this, use of reserved blocks are enabled to
allow the split to proceed.

The routine then returns the number of blocks successfully
punched out.

[ext4 punch hole patch series 4/5 v7]

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
2011-05-25 07:41:46 -04:00
Allison Henderson
d583fb87a3 ext4: punch out extents
This patch modifies the truncate routines to support hole punching
Below is a brief summary of the patches changes:

- Added end param to ext_ext4_rm_leaf
        This function has been modified to accept an end parameter
        which enables it to punch holes in leafs instead of just
        truncating them.

- Implemented the "remove head" case in the ext_remove_blocks routine
        This routine is used by ext_ext4_rm_leaf to remove the tail
        of an extent during a truncate.  The new ext_ext4_rm_leaf
        routine will now also use it to remove the head of an extent in the
        case that the hole covers a region of blocks at the beginning
        of an extent.

- Added "end" param to ext4_ext_remove_space routine
        This function has been modified to accept a stop parameter, which
        is passed through to ext4_ext_rm_leaf.

[ext4 punch hole patch series 3/5 v6] 

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-25 07:41:43 -04:00
Allison Henderson
55f020db66 ext4: add flag to ext4_has_free_blocks
This patch adds an allocation request flag to the ext4_has_free_blocks
function which enables the use of reserved blocks.  This will allow a
punch hole to proceed even if the disk is full.  Punching a hole may
require additional blocks to first split the extents.

Because ext4_has_free_blocks is a low level function, the flag needs
to be passed down through several functions listed below:

ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
ext4_ext_split
ext4_ext_new_meta_block
ext4_mb_new_blocks
ext4_claim_free_blocks
ext4_has_free_blocks

[ext4 punch hole patch series 1/5 v7]

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
2011-05-25 07:41:26 -04:00
Yongqiang Yang
b221349fa8 ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly
To get delayed-extent information, ext4_ext_fiemap_cb() looks up
pagecache, it thus collects information starting from a page's
head block.

If blocksize < pagesize, the beginning blocks of a page may lies
before the request range. So ext4_ext_fiemap_cb() should proceed
ignoring them, because they has been handled before. If no mapped
buffer in the range is found in the 1st page, we need to look up
the 2nd page, otherwise delayed-extents after a hole will be ignored.

Without this patch, xfstests 225 will hung on ext4 with 1K block.

Reported-by: Amir Goldstein <amir73il@users.sourceforge.net>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-24 11:36:58 -04:00