To improve recovery speed, f2fs try to readahead many contiguous blocks in warm
node segment, but for most time, abnormal power-off do not occur frequently, so
when mount a normal power-off f2fs image, by contrary ra so many blocks and then
invalid them will hurt the performance of mount.
It's better to just ra the first next-block for normal condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch does cleanup work, it introduces is_valid_blkaddr() to include
verification code for blkaddr with upper and down boundary value which were in
ra_meta_pages previous.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1.We use zero as upper boundary value for ra SSA/CP blocks, we will skip
readahead as verification failure with max number, it causes low performance.
2.Low boundary value is not accurate for SSA/CP/POR region verification, so
these values need to be redefined.
This patch fixes above issues.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As inline_{dir,inode} stat is increased/decreased concurrently by multi threads,
so the value is not so accurate, let's use atomic type for counting accurately.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added some commentaries for code readability and cleaned up if-statement
clearly.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If inode state is dirty, go straight to write.
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The inmemory pages should be handled by invalidate_page since it needs to be
released int the truncation path.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In do_read_inode, if we failed __recover_inline_status, the inode has inline
flag without increasing its count.
Later, f2fs_evict_inode will decrease the count, which causes -1.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch revists retrial paths in f2fs.
The basic idea is to use cond_resched instead of retrying from the very early
stage.
Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch tries to fix:
BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384
(radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200)
(radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c)
(gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400)
(f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c)
(gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac)
(kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c)
The reason is that f2fs calls radix_tree_insert under enabled preemption.
So, before calling it, we need to call radix_tree_preload.
Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or
semaphore to cover the radix tree operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previoulsy, we used rwlock for nat_entry lock.
But, now we have a lot of complex operations in set_node_addr.
(e.g., allocating kernel memories, handling radix_trees, and so on)
So, this patches tries to change spinlock to rw_semaphore to give CPUs to other
threads.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If there are many inodes that have data blocks in victim segment,
it takes long time to find a inode in gc_inode list.
Let's use radix_tree to reduce lookup time.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We've already made fi and sbi for inode. Let's avoid duplicated work.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fix the wrong error number in error path of f2fs_write_begin.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Little cleanup to distinguish each phase easily
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: modify indentation for code readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If an inode has converted inline_data which was written to the disk, we should
set its inode flag for further fsync so that this inline_data can be recovered
from sudden power off.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After flushing dirty nat entries, it has to be no more dirty nat
entries.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in
journal. And although there are rooms for dirty nat entires if dirty_nat_cnt
is zero, it's also meaningless to check __has_cursum_space.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A deadlock can be occurred:
Thread 1] Thread 2]
- f2fs_write_data_pages - f2fs_write_begin
- lock_page(page #0)
- grab_cache_page(page #X)
- get_node_page(inode_page)
- grab_cache_page(page #0)
: to convert inline_data
- f2fs_write_data_page
- f2fs_write_inline_data
- get_node_page(inode_page)
In this case, trying to lock inode_page and page #0 causes deadlock.
In order to avoid this, this patch adds a rule for this locking policy,
which is that page #0 should be locked followed by inode_page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Two jump labels were adjusted in the implementation of the
create_node_manager_caches() function because these identifiers
contained typos.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_evict_inode,
commit_inmemory_pages
f2fs_gc
f2fs_iget
iget_locked
-> wait for inode free
Here, if the inode is same as the one to be evicted, f2fs should wait forever.
Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid
this.
But, the commit_inmem_pages calls f2fs_balance_fs by default, even if
f2fs_evict_inode wants to free inmemory pages only.
Hence, this patch adds to trigger f2fs_balance_fs only when there is something
to write.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It used nat_entry_set when create slab for sit_entry_set.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If a node page is request to be written during the reclaiming path, we should
submit the bio to avoid pending to recliam it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO,
and we manage fields related to inode cache separately in struct f2fs_sb_info
for each inode cache type.
This makes codes a bit messy, so that this patch intorduce a new struct
inode_management to wrap inner fields as following which make codes more neat.
/* for inner inode cache management */
struct inode_management {
struct radix_tree_root ino_root; /* ino entry array */
spinlock_t ino_lock; /* for ino entry lock */
struct list_head ino_list; /* inode list head */
unsigned long ino_num; /* number of entries */
};
struct f2fs_sb_info {
...
struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */
...
}
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Because we have checked the contrary condition in case of "if" judgment, we do
not need to check the condition again in case of "else" judgment. Let's remove
it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new
option is set without BG_GC, then if any error occurred in the following
procedure, we can restore to start the gc thread.
But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is
not set in new option, so we'd better move this condition judgment out of
start_gc_thread to fix this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The locked page should be released before returning the function.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode.
Otherwise, we can make some dirty pages during the truncation, and those pages
will be written through f2fs_write_data_page.
At that moment, the inode has still inline_data, so that it tries to write non-
zero pages into inline_data area.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The scenario is like this.
One trhead triggers:
f2fs_write_data_pages
lock_page
f2fs_write_data_page
f2fs_lock_op <- wait
The other thread triggers:
f2fs_truncate
truncate_blocks
f2fs_lock_op
truncate_partial_data_page
lock_page <- wait for locking the page
This patch resolves this bug by relocating truncate_partial_data_page.
This function is just to truncate user data page and not related to FS
consistency as well.
And, we don't need to call truncate_inline_data. Rather than that,
f2fs_write_data_page will finally update inline_data later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The # of inline_data inode is decreased only when it has inline_data.
After clearing the flag, we can't decreased the number.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to write node pages if checkpoint is not doing in order to avoid
memory pressure.
Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds to control the memory footprint used by ino entries.
This will conduct best effort, not strictly.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If a system wants to reduce the booting time as a top priority, now we can
use a mount option, -o fastboot.
With this option, f2fs conducts a little bit slow write_checkpoint, but
it can avoid the node page reads during the next mount time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
__submit_merged_bio f2fs_write_end_io f2fs_write_end_io
wait_io = X wait_io = x
complete(X) complete(X)
wait_io = NULL
wait_for_completion()
free(X)
spin_lock(X)
kernel panic
In order to avoid this, this patch removes the wait_io facility.
Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If there is a chance to make a huge sized discard command, we don't need
to split it out, since each blkdev_issue_discard should wait one at a
time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written ranges out of inline_data,
f2fs converts that inode permanently.
3. There is no cases which converts non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There's no point in using test_and_clear_bit_le() when we don't use the
return value of the function. Just use clear_bit_le() instead.
Coverity-id: 1016434
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>