1760 commits

Author SHA1 Message Date
Jaegeuk Kim
d50aaeec90 f2fs: show actual device info in tracepoints
This patch shows actual device information in the tracepoints.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:24 -08:00
Jaegeuk Kim
5b6c6be2d8 f2fs: use SSR for warm node as well
We have had node chains, but haven't used them so far because of stale node blocks.
Now that we have crc|cp_ver in the node footer and assign a random cp_ver at format
time, we can start to use them again.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:22 -08:00
Chao Yu
39133a5015 f2fs: enable inline_xattr by default
On Android, since SELinux is enabled, a security policy is applied to
each file and stored in the inode as an xattr entry, so each file takes one
additional 4k node block.

Let's enable inline_xattr by default in order to save storage space.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:49 -08:00
Chao Yu
23cf7212a1 f2fs: introduce noinline_xattr mount option
This patch introduces new mount option 'noinline_xattr', so we can disable
inline xattr functionality which is already set as a default mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:48 -08:00
Jaegeuk Kim
25cc5d3b9d f2fs: avoid reading NAT page by get_node_info
We've not seen this buggy case for a long time, so it's time to avoid this
unnecessary get_node_info() call, which reads a NAT page to cache the nat entry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:47 -08:00
Jaegeuk Kim
9b064f7d0c f2fs: remove build_free_nids() during checkpoint
Let's avoid build_free_nids() in checkpoint path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:53 -08:00
Chao Yu
d260081ccf f2fs: change recovery policy of xattr node block
Currently, if we call fsync after updating the xattr data belonging to a
file, f2fs needs to trigger a checkpoint to keep the xattr data consistent. But
this policy causes low performance, as a checkpoint blocks most foreground
operations and causes unneeded and unrelated IOs around the checkpoint.

This patch reuses the regular file recovery policy for the xattr node block:
we write the xattr node block tagged with the fsync flag to the warm
area instead of the cold area, and during recovery we search the warm node chain
for fsynced xattr blocks and recover them.

So, for the application IO pattern below, performance can be improved
noticeably:
- touch file
- create/update/delete xattr entry in file
- fsync file

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:52 -08:00
Bhumika Goyal
2ad0ef846b f2fs: super: constify fscrypt_operations structure
Declare fscrypt_operations structure as const as it is only stored in
the s_cop field of a super_block structure. This field is of type const,
so fscrypt_operations structure having this property can be made const
too.

File size before: fs/f2fs/super.o
   text	   data	    bss	    dec	    hex	filename
  54131	  31355	    184	  85670	  14ea6	fs/f2fs/super.o

File size after: fs/f2fs/super.o
   text	   data	    bss	    dec	    hex	filename
  54227	  31259	    184	  85670	  14ea6	fs/f2fs/super.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:51 -08:00
Jaegeuk Kim
1200abb26f f2fs: show checkpoint version at mount time
If we mounted f2fs successfully, let's show current checkpoint version.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:50 -08:00
Jaegeuk Kim
7f54f51f46 f2fs: remove preflush for nobarrier case
This patch removes REQ_PREFLUSH in the nobarrier case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:48 -08:00
Jaegeuk Kim
942fd3192f f2fs: check last page index in cached bio to decide submission
If the cached bio has the last page's index, then we need to submit it.
Otherwise, we don't need to submit it and can wait for further IO merges.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:48 -08:00
Jaegeuk Kim
d68f735b3b f2fs: check io submission more precisely
This patch checks IO submission more precisely than the previous rough check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:47 -08:00
Jaegeuk Kim
f566bae846 f2fs: call internal __write_data_page directly
This patch introduces __write_data_page so that f2fs_write_cache_pages can
call it directly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:46 -08:00
Jaegeuk Kim
e7c75ab099 f2fs: avoid out-of-order execution of atomic writes
We need to flush data writes before flushing the last node block writes by using
FUA with PREFLUSH. We don't need to guarantee preceding node writes, since if
those are not written, we can't reach the last node block when scanning the
node block chain during roll-forward recovery.
Afterwards, f2fs_wait_on_page_writeback guarantees all the IO submission to
disk, which builds a valid node block chain.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:35 -08:00
Jaegeuk Kim
faa24895ac f2fs: move write_node_page above fsync_node_pages
This patch just moves write_node_page and introduces an inner function.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:09:43 -08:00
Jaegeuk Kim
c1b221078b f2fs: move flush tracepoint
This patch moves the tracepoint location for flush command.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:08:43 -08:00
Jaegeuk Kim
a00861dbca f2fs: show # of APPEND and UPDATE inodes
This patch shows cached # of APPEND and UPDATE inode entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:54:53 -08:00
DongOh Shin
cac5a3d8f5 f2fs: fix 446 coding style warnings in f2fs.h
1) Nine coding style warnings below have been resolved:
"Missing a blank line after declarations"

2) 435 coding style warnings below have been resolved:
"function definition argument 'x' should also have an identifier name"

3) Two coding style warnings below have been resolved:
"macros should not use a trailing semicolon"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:55 -08:00
DongOh Shin
c64ab12e36 f2fs: fix 3 coding style errors in f2fs.h
Two coding style errors below have been resolved:
"Macros with complex values should be enclosed in parentheses"

And a coding style error below has been resolved:
"space prohibited before that ',' (ctx:WxW)"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:55 -08:00
Jaegeuk Kim
8ed5974552 f2fs: declare missing static function
Two functions should have been declared static but were not.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:54 -08:00
Kaixu Xia
0cc0dec2b6 f2fs: show the fault injection mount option
This patch shows the fault injection mount option in
f2fs_show_options().

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:53 -08:00
Chao Yu
73545817c9 f2fs: fix null pointer dereference when issuing flush in ->fsync
We only allocate the flush merge control structure sbi::sm_info::fcc_info when
the flush_merge option is on, but f2fs_issue_flush still tries to access a
member of the control structure without that option. This causes the panic
shown below; fix it.

Call Trace:
 __remove_ino_entry+0xa9/0xc0 [f2fs]
 f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs]
 f2fs_sync_file+0x18/0x20 [f2fs]
 vfs_fsync_range+0x3d/0xb0
 __do_page_fault+0x261/0x4d0
 do_fsync+0x3d/0x70
 SyS_fsync+0x10/0x20
 do_syscall_64+0x6e/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f18ce260de0
RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0
RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003
RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f
R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100
R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100
Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4
RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78
CR2: 0000000000000020
---[ end trace a09314c24f037648 ]---

Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:52 -08:00
Chao Yu
dba79f38bc f2fs: fix to avoid overflow when left shifting page offset
We use the following method to calculate size from the current page index:
size = index << PAGE_SHIFT
If the type of index is only 32 bits wide, the left shift will overflow,
which makes the result incorrect.

So let's cast index to a 64-bit type to avoid this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:51 -08:00
Chao Yu
ba38c27eb9 f2fs: enhance lookup xattr
Previously, getxattr loaded all entries from both the inline xattr area and
the xattr node block, and then did the lookup across all of them. This lookup
flow is inefficient: if we look up the inline xattr in the inode page cache
first and get a hit, we don't need to load and search the xattr node
block, which saves cpu time and IO latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize NULL to avoid warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:51 -08:00
Wei Fang
b86e33075e f2fs: fix a dead loop in f2fs_fiemap()
A dead loop can be triggered in f2fs_fiemap() using the test case
as below:

	...
	fd = open();
	fallocate(fd, 0, 0, 4294967296);
	ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
	...

It's caused by an overflow in __get_data_block():
	...
	bh->b_size = map.m_len << inode->i_blkbits;
	...
map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits
on 64-bit architectures. The left shift is evaluated as an unsigned int before
the result is converted to size_t, so it overflows.

In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap()
will call get_data_block() at block 0 again and again.

Fix this by adding a forced conversion before the left shift.
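
The overflow can be reproduced in isolation. The following is a minimal
sketch, not the kernel code: struct sketch_map and both helper names are
hypothetical, unsigned int is assumed to be 32 bits wide, and a 4 KiB block
size (blkbits = 12) is assumed, so the 4294967296-byte fallocate above
corresponds to m_len = 1048576 blocks.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mapping result; m_len counts blocks. */
struct sketch_map {
	unsigned int m_len;
};

/* Shift is done in unsigned int first, so the high bits are lost. */
static uint64_t sketch_b_size_before_fix(const struct sketch_map *map,
					 unsigned int blkbits)
{
	return map->m_len << blkbits;
}

/* Widen to 64 bits first, then shift: the full byte count survives. */
static uint64_t sketch_b_size_after_fix(const struct sketch_map *map,
					unsigned int blkbits)
{
	return (uint64_t)map->m_len << blkbits;
}
```

With m_len = 1048576 and blkbits = 12, the first helper yields 0 while the
second yields 4294967296, which matches the zero b_size described above.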

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:49 -08:00
Jaegeuk Kim
dc91de78e5 f2fs: do not preallocate blocks which has wrong buffer
Sheng Yong reports needless preallocation if write(small_buffer, large_size)
is called.

In that case, f2fs preallocates large_size, but vfs returns early due to
small_buffer size. Let's detect it before preallocation phase in f2fs.

Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:48 -08:00
Jaegeuk Kim
dcc9165dbf f2fs: show # of on-going flush and discard bios
This patch adds stat information for flush and discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:47 -08:00
Jaegeuk Kim
1546996348 f2fs: add a kernel thread to issue discard commands asynchronously
This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.
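
As a rough illustration of the three states, the sketch below models them as
a simple forward-only progression. Only the names D_PREP, D_SUBMIT and D_DONE
come from the commit message; the enum type name, the helper, and the meaning
given to each state in the comments are assumptions made for illustration.

```c
/* Hypothetical sketch: only the three state names are from the commit. */
enum sketch_discard_state {
	D_PREP,		/* assumed: command prepared, bio not yet submitted */
	D_SUBMIT,	/* assumed: bio submitted, completion pending */
	D_DONE,		/* assumed: bio completed */
};

/* Advance one step; D_DONE is treated as terminal in this sketch. */
static enum sketch_discard_state
sketch_next_state(enum sketch_discard_state s)
{
	switch (s) {
	case D_PREP:
		return D_SUBMIT;
	case D_SUBMIT:
		return D_DONE;
	default:
		return D_DONE;
	}
}
```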

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:45 -08:00
Jaegeuk Kim
0b54fb8458 f2fs: factor out discard command info into discard_cmd_control
This patch adds discard_cmd_control with the existing discarding controls.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:53 -08:00
Jaegeuk Kim
d4adb30f25 f2fs: reorganize stat information
This patch reorganizes the stat information to make it clearer.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:52 -08:00
Jaegeuk Kim
b01a92019c f2fs: clean up flush/discard command namings
This patch simply cleans up the names for flush/discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:51 -08:00
Chao Yu
ae27d62e6b f2fs: check in-memory sit version bitmap
This patch adds a mirror for the sit version bitmap and uses it to detect
in-memory bitmap corruption, which may be caused by bit-transition of
cache or memory overflow.
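
The mirror idea can be sketched in a few lines of user-space C. This is an
illustration under stated assumptions, not the f2fs implementation: the
struct and function names are hypothetical, and the check here is a plain
memcmp() of the working bitmap against its mirror.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bitmap with a shadow copy that is updated in lockstep. */
struct sketch_bitmap {
	unsigned char *bits;	/* working copy */
	unsigned char *mirror;	/* mirror used only for verification */
	size_t nbytes;
};

/* Every legitimate update touches both copies. */
static void sketch_set_bit(struct sketch_bitmap *b, size_t nr)
{
	unsigned char mask = (unsigned char)(1u << (nr % 8));

	b->bits[nr / 8] |= mask;
	b->mirror[nr / 8] |= mask;
}

/* A stray write to either copy alone makes the two diverge. */
static bool sketch_bitmap_consistent(const struct sketch_bitmap *b)
{
	return memcmp(b->bits, b->mirror, b->nbytes) == 0;
}
```

A bit flipped in only one of the two buffers makes
sketch_bitmap_consistent() return false, which is the kind of corruption the
patch aims to detect.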

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:50 -08:00
Chao Yu
599a09b2c1 f2fs: check in-memory nat version bitmap
This patch adds a mirror for the nat version bitmap and uses it to detect
in-memory bitmap corruption, which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:49 -08:00
Chao Yu
355e78913c f2fs: check in-memory block bitmap
This patch adds a mirror for the valid block bitmap and uses it to detect
in-memory bitmap corruption, which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:49 -08:00
Chao Yu
5fe457430e f2fs: introduce FI_ATOMIC_COMMIT
This patch introduces a new flag to indicate that an inode is committing
atomic writes, so that we can keep the atomic write status of the inode
during the atomic commit and skip GCing pages of an atomic write inode.
That avoids randomly GCed data being mixed with the current transaction, so
isolation of the transaction can be kept.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:48 -08:00
Chao Yu
939afa943c f2fs: clean up with list_{first, last}_entry
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:47 -08:00
Jaegeuk Kim
25290fa559 f2fs: return fs_trim if there is no candidate
If there is no candidate to submit discard command during f2fs_trim_fs, let's
return without checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:40 -08:00
Jaegeuk Kim
0333ad4e4f f2fs: avoid needless checkpoint in f2fs_trim_fs
f2fs_trim_fs() doesn't need to do a checkpoint if there are only newly allocated
data blocks, which don't change critical checkpoint data such as nat
and sit entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 13:16:36 -08:00
Linus Torvalds
6c24337f22 Various cleanups for the file system encryption feature.

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Various cleanups for the file system encryption feature"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: constify struct fscrypt_operations
  fscrypt: properly declare on-stack completion
  fscrypt: split supp and notsupp declarations into their own headers
  fscrypt: remove redundant assignment of res
  fscrypt: make fscrypt_operations.key_prefix a string
  fscrypt: remove unused 'mode' member of fscrypt_ctx
  ext4: don't allow encrypted operations without keys
  fscrypt: make test_dummy_encryption require a keyring key
  fscrypt: factor out bio specific functions
  fscrypt: pass up error codes from ->get_context()
  fscrypt: remove user-triggerable warning messages
  fscrypt: use EEXIST when file already uses different policy
  fscrypt: use ENOTDIR when setting encryption policy on nondirectory
  fscrypt: use ENOKEY when file cannot be created w/o key
2017-02-20 18:22:31 -08:00
Eric Biggers
6f69f0ed61 fscrypt: constify struct fscrypt_operations
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-08 10:59:57 -05:00
Eric Biggers
46f47e4800 fscrypt: split supp and notsupp declarations into their own headers
Previously, each filesystem configured without encryption support would
define all the public fscrypt functions to their notsupp_* stubs.  This
list of #defines had to be updated in every filesystem whenever a change
was made to the public fscrypt functions.  To make things more
maintainable now that we have three filesystems using fscrypt, split the
old header fscrypto.h into several new headers.  fscrypt_supp.h contains
the real declarations and is included by filesystems when configured
with encryption support, whereas fscrypt_notsupp.h contains the inline
stubs and is included by filesystems when configured without encryption
support.  fscrypt_common.h contains common declarations needed by both.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:26:43 -05:00
Jaegeuk Kim
4e6a8d9b22 f2fs: relax async discard commands more
This patch relaxes async discard commands to avoid waiting for their end_io during
checkpoint.
Instead of waiting for them during checkpoint, the wait is done when they are
actually reused.

Test on initial partition of nvme drive.

 # time fstrim /mnt/test

Before : 6.158s
After : 4.822s

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
bb95d9ab2a f2fs: drop exist_data for inline_data when truncated to 0
A test program gets different SEEK_DATA values for a newly created file
and an existing file on an f2fs filesystem.

F2FS filesystem,  (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 8192)
SEEK_DATA size != 0 (offset = 4096)

PNFS filesystem, (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 4096)
SEEK_DATA size != 0 (offset = 4096)

#define _GNU_SOURCE	/* for SEEK_DATA */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char *filename = argv[1];
        int offset = 1, i = 0, fd = -1;

        if (argc < 2) {
                printf("Usage: %s f2fsfilename\n", argv[0]);
                return -1;
        }

        /*
        if (!access(filename, F_OK) || errno != ENOENT) {
                printf("Needs a new file for test, %m\n");
                return -1;
        }*/

        fd = open(filename, O_RDWR | O_CREAT, 0777);
        if (fd < 0) {
                printf("Create test file %s failed, %m\n", filename);
                return -1;
        }

        for (i = 0; i < 20; i++) {
                offset = 1 << i;
                ftruncate(fd, 0);
                lseek(fd, offset, SEEK_SET);
                write(fd, "test", 5);
                /* Get the alloc size by seek data equal zero*/
                if (lseek(fd, 0, SEEK_DATA)) {
                        printf("SEEK_DATA size != 0 (offset = %d)\n", offset);
                        break;
                }
        }

        close(fd);
        return 0;
}

Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
363fa4e078 f2fs: don't allow encrypted operations without keys
This patch fixes the renaming bug on encrypted filenames, which was pointed out by

 (ext4: don't allow encrypted operations without keys)

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
26a28a0c1e f2fs: show the max number of atomic operations
This patch shows the maximum number of atomic operations that are
running concurrently.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
ec91538dcc f2fs: get io size bit from mount option
This patch sets io_size_bits from a mount option.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
0a595ebaaa f2fs: support IO alignment for DATA and NODE writes
This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such IOs,
we can eliminate the underlying dummy page problem, in which the FTL writes
dummy pages in order to close partially written MLC or TLC pages.

Note that,
 - it requires "-o mode=lfs".
 - IO size should be a power of 2 and not exceed BIO_MAX_PAGES, 256.
 - read IO is still 4KB.
 - do checkpoint at fsync, if dummy NODE page was written.
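
The padding arithmetic behind this kind of alignment can be sketched as
follows. This is an illustration only, not the f2fs code: the function name is
hypothetical, and io_size_bits is assumed here to be the base-2 logarithm of
the IO size measured in 4KB blocks (so 3 would mean 32KB and 4 would mean
64KB).

```c
/*
 * Hypothetical helper: how many dummy blocks must be appended so that a
 * write of nr_blocks blocks ends on an IO-size boundary.
 * Assumes io_size_bits = log2(IO size in blocks), a power of 2 by design.
 */
static unsigned int sketch_dummy_blocks(unsigned int nr_blocks,
					unsigned int io_size_bits)
{
	unsigned int io_blocks = 1u << io_size_bits;
	unsigned int rem = nr_blocks & (io_blocks - 1);

	return rem ? io_blocks - rem : 0;
}
```

Under these assumptions, a 5-block write with a 32KB IO size needs 3 dummy
blocks, while an 8-block write needs none. The mask trick only works because
the IO size is a power of 2, which is one reason for that restriction.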

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
554b5125f5 f2fs: add submit_bio tracepoint
This patch adds final submit_bio() tracepoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
9d52a504db f2fs: reassign new segment for mode=lfs
Otherwise a wrong curseg->next_blkoff can remain, resulting in fsck failure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
650d3c4e56 f2fs: fix a missing discard prefree segments
If userspace issues an fstrim with a range that does not include prefree segments,
those segments will be reused without discard. This patch fixes it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Geliang Tang
ed0b56209f f2fs: use rb_entry_safe
Use rb_entry_safe() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
746e240392 f2fs: add a case of no need to read a page in write begin
If the range we write covers the whole valid data in the last page,
we do not need to read it.
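
The condition can be expressed as a small predicate. The sketch below is an
assumption-laden illustration rather than the actual write_begin logic: the
function name is hypothetical, a 4096-byte page is assumed, and the write is
assumed to be confined to a single page.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Hypothetical check: can the page be written without reading it first?
 * pos is the file offset of the write, len its length within one page,
 * and i_size the current file size.
 */
static bool sketch_can_skip_read(uint64_t pos, uint32_t len, uint64_t i_size)
{
	/* The whole page is overwritten. */
	if (len == SKETCH_PAGE_SIZE)
		return true;

	/* Starts at the page boundary and reaches or passes end of file. */
	return (pos % SKETCH_PAGE_SIZE) == 0 && pos + len >= i_size;
}
```

When the second branch applies, the bytes of the page beyond the written
range hold no valid file data, so they would need to be zeroed rather than
read from disk.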

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
7855eba4d6 f2fs: fix a problem of using memory after free
This patch fixes a use-after-free problem
in function __try_merge_extent_node.

Fixes: 0f825ee6e8 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Dan Carpenter
07fe8d4440 f2fs: remove unneeded condition
We checked that "inode" is not an error pointer earlier so there is
no need to check again here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Chao Yu
5c9e418436 f2fs: don't cache nat entry if out of memory
If we run out of memory, in cache_nat_entry, it's better to avoid loop
for allocating memory to cache nat entry, so in low memory scenario, for
read path of node block, I expect this can avoid unneeded latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Yunlei He
fed2466848 f2fs: remove unused values in recover_fsync_data
This patch removes unused values in function recover_fsync_data.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Damien Le Moal
f99e86485c block: Rename blk_queue_zone_size and bdev_zone_size
All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Collapsed the two patches, they were nonsensically split and broke
bisection.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-12 07:58:32 -07:00
Eric Biggers
a5d431eff2 fscrypt: make fscrypt_operations.key_prefix a string
There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix.  It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all.  So simplify the code by making key_prefix a const char *.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-01-08 01:03:41 -05:00
Eric Biggers
54475f531b fscrypt: use ENOKEY when file cannot be created w/o key
As part of an effort to clean up fscrypt-related error codes, make
attempting to create a file in an encrypted directory that hasn't been
"unlocked" fail with ENOKEY.  Previously, several error codes were used
for this case, including ENOENT, EACCES, and EPERM, and they were not
consistent between and within filesystems.  ENOKEY is a better choice
because it expresses that the failure is due to lacking the encryption
key.  It also matches the error code returned when trying to open an
encrypted regular file without the key.

I am not aware of any users who might be relying on the previous
inconsistent error codes, which were never documented anywhere.

This failure case will be exercised by an xfstest.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-31 16:26:20 -05:00
Linus Torvalds
231753ef78 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull partial readlink cleanups from Miklos Szeredi.

This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.

Miklos and Al are still discussing the rest of the series.

* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  vfs: make generic_readlink() static
  vfs: remove ".readlink = generic_readlink" assignments
  vfs: default to generic_readlink()
  vfs: replace calling i_op->readlink with vfs_readlink()
  proc/self: use generic_readlink
  ecryptfs: use vfs_get_link()
  bad_inode: add missing i_op initializers
2016-12-17 19:16:12 -08:00
Linus Torvalds
5084fdf081 This merge request includes the dax-4.0-iomap-pmd branch which is
needed for both ext4 and xfs dax changes to use iomap for DAX.  It
 also includes the fscrypt branch which is needed for ubifs encryption
 work as well as ext4 encryption and fscrypt cleanups.
 
 Lots of cleanups and bug fixes, especially making sure ext4 is robust
 against maliciously corrupted file systems --- especially maliciously
 corrupted xattr blocks and a maliciously corrupted superblock.  Also
 fix ext4 support for 64k block sizes so it works well on ppcle.  Fixed
 mbcache so we don't miss some common xattr blocks that can be merged.

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "This merge request includes the dax-4.0-iomap-pmd branch which is
  needed for both ext4 and xfs dax changes to use iomap for DAX. It also
  includes the fscrypt branch which is needed for ubifs encryption work
  as well as ext4 encryption and fscrypt cleanups.

  Lots of cleanups and bug fixes, especially making sure ext4 is robust
  against maliciously corrupted file systems --- especially maliciously
  corrupted xattr blocks and a maliciously corrupted superblock. Also
  fix ext4 support for 64k block sizes so it works well on ppcle. Fixed
  mbcache so we don't miss some common xattr blocks that can be merged"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  dax: Fix sleep in atomic contex in grab_mapping_entry()
  fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL
  fscrypt: Delay bounce page pool allocation until needed
  fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()
  fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page()
  fscrypt: Never allocate fscrypt_ctx on in-place encryption
  fscrypt: Use correct index in decrypt path.
  fscrypt: move the policy flags and encryption mode definitions to uapi header
  fscrypt: move non-public structures and constants to fscrypt_private.h
  fscrypt: unexport fscrypt_initialize()
  fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info()
  fscrypto: move ioctl processing more fully into common code
  fscrypto: remove unneeded Kconfig dependencies
  MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches
  ext4: do not perform data journaling when data is encrypted
  ext4: return -ENOMEM instead of success
  ext4: reject inodes with negative size
  ext4: remove another test in ext4_alloc_file_blocks()
  Documentation: fix description of ext4's block_validity mount option
  ext4: fix checks for data=ordered and journal_async_commit options
  ...
2016-12-14 09:17:42 -08:00
Linus Torvalds
09cb6464fe for-f2fs-4.10
This patch series contains several performance tuning patches regarding to the
 IO submission flow, in addition to supporting new features such as a ZBC-base
 drive and multiple devices.
 
 It also includes some major bug fixes such as:
  - checkpoint version control
  - fdatasync-related roll-forward recovery routine
  - memory boundary or null-pointer access in corner cases
  - missing error cases
 
 It has various minor clean-up patches as well.

Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch series contains several performance tuning patches
  regarding to the IO submission flow, in addition to supporting new
  features such as a ZBC-base drive and multiple devices.

  It also includes some major bug fixes such as:
   - checkpoint version control
   - fdatasync-related roll-forward recovery routine
   - memory boundary or null-pointer access in corner cases
   - missing error cases

  It has various minor clean-up patches as well"

* tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
  f2fs: fix a missing size change in f2fs_setattr
  f2fs: fix to access nullified flush_cmd_control pointer
  f2fs: free meta pages if sanity check for ckpt is failed
  f2fs: detect wrong layout
  f2fs: call sync_fs when f2fs is idle
  Revert "f2fs: use percpu_counter for # of dirty pages in inode"
  f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
  f2fs: do not activate auto_recovery for fallocated i_size
  f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
  f2fs: fix 32-bit build
  f2fs: set ->owner for debugfs status file's file_operations
  f2fs: fix incorrect free inode count in ->statfs
  f2fs: drop duplicate header timer.h
  f2fs: fix wrong AUTO_RECOVER condition
  f2fs: do not recover i_size if it's valid
  f2fs: fix fdatasync
  f2fs: fix to account total free nid correctly
  f2fs: fix an infinite loop when flush nodes in cp
  f2fs: don't wait writeback for datas during checkpoint
  f2fs: fix wrong written_valid_blocks counting
  ...
2016-12-14 09:07:36 -08:00
Linus Torvalds
36869cb93d Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main block pull request this series. Contrary to previous
  release, I've kept the core and driver changes in the same branch. We
  always ended up having dependencies between the two for obvious
  reasons, so makes more sense to keep them together. That said, I'll
  probably try and keep more topical branches going forward, especially
  for cycles that end up being as busy as this one.

  The major parts of this pull request are:

   - Improved support for O_DIRECT on block devices, with a small
     private implementation instead of using the pig that is
     fs/direct-io.c. From Christoph.

   - Request completion tracking in a scalable fashion. This is utilized
     by two components in this pull, the new hybrid polling and the
     writeback queue throttling code.

   - Improved support for polling with O_DIRECT, adding a hybrid mode
     that combines pure polling with an initial sleep. From me.

   - Support for automatic throttling of writeback queues on the block
     side. This uses feedback from the device completion latencies to
     scale the queue on the block side up or down. From me.

   - Support for SMR drives in the block layer and for SD. From Hannes
     and Shaun.

   - Multi-connection support for nbd. From Josef.

   - Cleanup of request and bio flags, so we have a clear split between
     which are bio (or rq) private, and which ones are shared. From
     Christoph.

   - A set of patches from Bart, that improve how we handle queue
     stopping and starting in blk-mq.

   - Support for WRITE_ZEROES from Chaitanya.

   - Lightnvm updates from Javier/Matias.

   - Support for FC for the nvme-over-fabrics code. From James Smart.

   - A bunch of fixes from a whole slew of people, too many to name
     here"

* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-13 10:19:16 -08:00
Yunlei He
c0ed4405a9 f2fs: fix a missing size change in f2fs_setattr
This patch fixes a missing size change in f2fs_setattr.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-12 11:09:05 -08:00
David Gstir
bd7b829038 fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()
Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which,
when set, indicates that the fs uses pages under its own control as
opposed to writeback pages which require locking and a bounce buffer for
encryption.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-11 16:26:12 -05:00
Eric Biggers
db717d8e26 fscrypto: move ioctl processing more fully into common code
Multiple bugs were recently fixed in the "set encryption policy" ioctl.
To make it clear that fscrypt_process_policy() and fscrypt_get_policy()
implement ioctls and therefore their implementations must take standard
security and correctness precautions, rename them to
fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy().  Make the
latter take in a struct file * to make it consistent with the former.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-11 16:26:07 -05:00
Miklos Szeredi
dfeef68862 vfs: remove ".readlink = generic_readlink" assignments
.readlink == NULL implies generic_readlink().

Generated by:

to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09 16:45:04 +01:00
Jaegeuk Kim
5eba8c5d1f f2fs: fix to access nullified flush_cmd_control pointer
f2fs_sync_file()             remount_ro
 - f2fs_readonly
                               - destroy_flush_cmd_control
 - f2fs_issue_flush
   - no fcc pointer!

So, this patch doesn't free fcc in this case, but just stops its kernel thread,
which sends flush commands.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 18:56:50 -08:00
Jaegeuk Kim
a2125ff7dd f2fs: free meta pages if sanity check for ckpt is failed
This fixes the missing freeing of meta pages in the error case.

Tested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 14:38:16 -08:00
Jaegeuk Kim
2040fce83f f2fs: detect wrong layout
Previous versions of mkfs.f2fs inappropriately allowed small partitions, so f2fs
should detect that as well.

Refer to this commit in f2fs-tools:

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-and-Tested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 14:37:33 -08:00
Jaegeuk Kim
f455c8a5f0 f2fs: call sync_fs when f2fs is idle
The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:44:07 -08:00
Jaegeuk Kim
204706c7ac Revert "f2fs: use percpu_counter for # of dirty pages in inode"
This reverts commit 1beba1b3a9.

The percpu_counter doesn't provide atomicity on a single core and consumes more
DRAM. That incurs fs_mark test failures due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:43:59 -08:00
Chao Yu
0002b61bda f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29 15:43:00 -08:00
Jaegeuk Kim
26787236b3 f2fs: do not activate auto_recovery for fallocated i_size
If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29 15:42:58 -08:00
Jaegeuk Kim
8508e44ae9 f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
We don't guarantee cp_addr is fixed by cp_version.
This is to sync with f2fs-tools.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-28 13:39:58 -08:00
Arnd Bergmann
19c526515f f2fs: fix 32-bit build
The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
on 32-bit machines because of a 64-bit division:

fs/f2fs/f2fs.o: In function `__issue_discard_async':
extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'

Fortunately, bdev_zone_size() is guaranteed to return a power-of-two
number, so we can replace the % operator with a cheaper bit mask.
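
The substitution described above can be sketched in plain userspace C. This is
only an illustration of the arithmetic, not the code from the patch; the names
offset_in_zone() and is_power_of_two() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True when v has exactly one bit set, i.e. v is a power of two. */
static int is_power_of_two(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Offset of a block address inside its zone, computed without a 64-bit
 * division. The mask is only equivalent to the % operator because
 * zone_size is a power of two (hypothetical helper, for illustration).
 */
static uint64_t offset_in_zone(uint64_t blkaddr, uint64_t zone_size)
{
	assert(is_power_of_two(zone_size));
	return blkaddr & (zone_size - 1);	/* same result as blkaddr % zone_size */
}
```

Because no division is emitted, a 32-bit build of this helper does not need a
library routine such as __aeabi_uldivmod.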

Fixes: 792b84b74b54 ("f2fs: support multiple devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:09 -08:00
Nicolai Stange
05e6ea2685 f2fs: set ->owner for debugfs status file's file_operations
The struct file_operations instance serving the f2fs/status debugfs file
lacks an initialization of its ->owner.

This means that although that file might have been opened, the f2fs module
can still get removed. Any further operation on that opened file, releasing
included, will cause accesses to unmapped memory.

Indeed, Mike Marshall reported the following:

  BUG: unable to handle kernel paging request at ffffffffa0307430
  IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
  <...>
  Call Trace:
   [] __fput+0xdf/0x1d0
   [] ____fput+0xe/0x10
   [] task_work_run+0x8e/0xc0
   [] do_exit+0x2ae/0xae0
   [] ? __audit_syscall_entry+0xae/0x100
   [] ? syscall_trace_enter+0x1ca/0x310
   [] do_group_exit+0x44/0xc0
   [] SyS_exit_group+0x14/0x20
   [] do_syscall_64+0x61/0x150
   [] entry_SYSCALL64_slow_path+0x25/0x25
  <...>
  ---[ end trace f22ae883fa3ea6b8 ]---
  Fixing recursive fault but reboot is needed!

Fix this by initializing the f2fs/status file_operations' ->owner with
THIS_MODULE.

This will allow debugfs to grab a reference to the f2fs module upon any
open on that file, thus preventing it from getting removed.

Fixes: 902829aa0b ("f2fs: move proc files to debugfs")
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:08 -08:00
Chao Yu
b08b12d2dd f2fs: fix incorrect free inode count in ->statfs
While calculating the maximum number of inodes that can still be created in the
remaining space, we should account for the space occupied by data/node blocks,
since data and node blocks are mixed in the main area. So fix the wrong
calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:07 -08:00
Geliang Tang
b4ceec2921 f2fs: drop duplicate header timer.h
Drop duplicate header timer.h from segment.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:06 -08:00
Jaegeuk Kim
97dd26ad83 f2fs: fix wrong AUTO_RECOVER condition
If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:05 -08:00
Jaegeuk Kim
3a3a5ead7b f2fs: do not recover i_size if it's valid
If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:04 -08:00
Chao Yu
281518c694 f2fs: fix fdatasync
For the two cases below, we can't guarantee data consistency:

a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered

b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered

The reason is that fdatasync normally isn't aware of metadata modifications
in the file, e.g. isize changes or dnode updates, so in ->fsync we skip
flushing node pages for the above cases, which results in the fdatasynced
data being lost during recovery.

We have since introduced the DIRTY_META global list in sbi for tracking
dirty inodes selectively, so in fdatasync we can choose to flush nodes
depending on the dirty state of the current inode in that list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:03 -08:00
Chao Yu
04d47e6738 f2fs: fix to account total free nid correctly
Thread A		Thread B		Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
			- f2fs_create
			 - f2fs_new_inode
			  - f2fs_lock_op
			   - alloc_nid
			    as node count still not
			    be increased, we will
			    loop in alloc_nid
						- f2fs_write_node_pages
						 - f2fs_balance_fs_bg
						  - f2fs_sync_fs
						   - write_checkpoint
						    - block_operations
						     - f2fs_lock_all
 - f2fs_lock_op

While creating a new inode, we do not allocate and account the nid atomically,
so when there are almost no free nids left, we may encounter an endless loop
like the stack above.

To avoid that, reuse nm_i::available_nids for accounting free nids and make
nid allocation and counting atomic during node creation.
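
As a rough userspace illustration of "check and account in one atomic step",
the sketch below reserves a nid with a C11 compare-and-swap. It is not the
kernel change itself: the commit message does not say which primitive is used,
and struct nid_pool and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the free-nid accounting in the node manager. */
struct nid_pool {
	atomic_uint available_nids;
};

static void nid_pool_init(struct nid_pool *pool, unsigned int count)
{
	assert(pool != NULL);
	atomic_init(&pool->available_nids, count);
}

/*
 * Reserve one nid. The availability check and the decrement happen in a
 * single atomic step, so two callers can never both claim the last nid.
 * Returns false right away when nothing is left instead of looping.
 */
static bool nid_pool_reserve(struct nid_pool *pool)
{
	unsigned int cur;

	assert(pool != NULL);
	cur = atomic_load(&pool->available_nids);
	while (cur > 0) {
		/* On failure, cur is refreshed with the current counter value. */
		if (atomic_compare_exchange_weak(&pool->available_nids,
						 &cur, cur - 1))
			return true;
	}
	return false;
}

/* Give a reserved nid back, e.g. when inode creation fails later on. */
static void nid_pool_release(struct nid_pool *pool)
{
	assert(pool != NULL);
	atomic_fetch_add(&pool->available_nids, 1);
}
```

With the count and the allocation decided together, a second creator sees zero
available nids and can fail cleanly rather than spinning while it holds the
operation lock.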

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:01 -08:00
Yunlei He
d40a43af0a f2fs: fix an infinite loop when flush nodes in cp
Thread A			Thread B

- write_checkpoint
 - block_operations
   -blk_start_plug
    -sync_node_pages		- f2fs_do_sync_file
				 - fsync_node_pages
				  - f2fs_wait_on_page_writeback

Thread A waits for the global F2FS_DIRTY_NODES count to drop to zero. It has
started a plug list, and some requests have been added to this list.
Thread B locks one dirty node page and waits for this page's writeback.
But this page is already in thread A's plug list with the PG_writeback flag set.
Thread A keeps running and its plug list has no chance to finish,
so it looks like a deadlock between the cp and fsync paths.

This patch adds a wait on page writeback before setting the node page dirty
to avoid this problem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:00 -08:00
Chao Yu
36951b38d1 f2fs: don't wait writeback for datas during checkpoint
Normally, while committing a checkpoint, we wait for all pages to be
written back, whether the page is data or metadata, so in a scenario where
lots of data IO is being submitted along with metadata, we may suffer
long latency waiting for writeback during checkpoint.

In fact, we only care about persistence of pages with metadata, not
pages with data, as file system consistency depends only on metadata.
So in order to avoid the long latency in the above scenario, let's
recognize and reference metadata in submitted IOs, and wait for writeback
only on metadata.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:59 -08:00
Jaegeuk Kim
c79b7ff1d3 f2fs: fix wrong written_valid_blocks counting
Previously, written_valid_blocks was taken from ckpt->valid_block_count. But if
the last checkpoint has some NEW_ADDR due to a power cut, we can get a wrong value.
Fix it by taking the number from the actual written block count in sit entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:58 -08:00
Jaegeuk Kim
7702bdbe50 f2fs: avoid BG_GC in f2fs_balance_fs
If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we don't need to do BG_GC at all.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:57 -08:00
Jaegeuk Kim
c040ff9d69 f2fs: fix redundant block allocation
In direct_IO path of f2fs_file_write_iter(),
1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
   -> allocate LBA X
2. f2fs_direct_IO()
   -> return 0;

Then,
f2fs_write_data_page() will allocate another LBA X+1.

This triggers EIO on HM-SMR drives.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:50 -08:00
Jaegeuk Kim
a7de608691 f2fs: use err for f2fs_preallocate_blocks
This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:14 -08:00
Jaegeuk Kim
3c62be17d4 f2fs: support multiple devices
This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them according to
each device's speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:13 -08:00
Jaegeuk Kim
e57e9ae5b1 f2fs: allow dio read for LFS mode
We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:12 -08:00
Jaegeuk Kim
6ae1be13e8 f2fs: revert segment allocation for direct IO
Now we don't need to be too careful about storage alignment for dio, since
its speed has become quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b2 (f2fs: align direct_io'ed data to section)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:02 -08:00
Yunlei He
2061471128 f2fs: return directly if block has been removed from the victim
If a block has already been written to a new place, just return
from the move data process. This patch checks it again while holding
the page lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:31 -08:00
Chao Yu
d47b871595 Revert "f2fs: do not recover from previous remained wrong dnodes"
i_times of inode will be set with the current system time, which can be
configured through 'date', so it's not safe to judge a dnode block as
garbage data or an unchanged inode based on i_times.

Now, we use the enhanced 'cp_ver + cp' crc method to verify valid
dnode blocks, so I expect recovering an invalid dnode is almost
impossible.

This reverts commit 807b1e1c8e.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:30 -08:00
Jaegeuk Kim
b4b9d34c85 f2fs: remove checkpoint in f2fs_freeze
The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
So, basically we don't need to do a checkpoint in f2fs_freeze(). But, in xfs/068,
it triggers the circular locking problem below due to gc_mutex for checkpoint.

======================================================
[ INFO: possible circular locking dependency detected ]
4.9.0-rc1+ #132 Tainted: G           OE
-------------------------------------------------------

1. wait for __sb_start_write() by

 [<ffffffff9845f353>] dump_stack+0x85/0xc2
 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff98289991>] iput+0x171/0x2c0
 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
 [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
 [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
 [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
 [<ffffffff982a50b5>] sys_sync+0x55/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

2. wait for sbi->gc_mutex by

 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
 [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:29 -08:00
Jaegeuk Kim
bdb7d964c4 f2fs: assign segments correctly for direct_io
Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or
four logs, we do not use that type at all.
Let's fix it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:28 -08:00
Chao Yu
9f0552e078 f2fs: fix wrong i_atime recovery
We shouldn't update the in-memory i_atime with the on-disk i_mtime of the inode
when recovering an inode.

Shuoran found this bug, which had been hidden for a long time; the credit belongs
to him.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:26 -08:00
Chao Yu
60dcedc997 f2fs: record inode updating status correctly
We should record the updating status only for live inodes; for an
unlinked inode we need to clear its ino cache, otherwise after the ino
has been reused, it will cause unneeded node page writes during ->fsync.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:25 -08:00
Damien Le Moal
126606c7a9 f2fs: Trace reset zone events
Similarly to the regular discard, trace zone reset events.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:24 -08:00
Damien Le Moal
f46e8809e8 f2fs: Reset sequential zones on zoned block devices
When a zoned block device is mounted, discarding sections
contained in sequential zones must reset the zone write pointer.
For sections contained in conventional zones, the regular discard
is used if the drive supports it.
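
The rule above can be summarized as a small decision function. This is a
userspace sketch of the policy only, assuming a two-way zone classification;
the enum and function names are hypothetical and do not come from the patch.

```c
#include <assert.h>

/* Hypothetical zone classification for a section being discarded. */
enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQUENTIAL };

/* What to issue to the device when a section is discarded. */
enum discard_action { ACTION_NONE, ACTION_DISCARD, ACTION_ZONE_RESET };

/*
 * Sequential zones always get a write-pointer reset; conventional zones
 * get a regular discard, but only if the drive supports discard.
 */
static enum discard_action pick_discard_action(enum zone_type type,
					       int supports_discard)
{
	assert(type == ZONE_CONVENTIONAL || type == ZONE_SEQUENTIAL);

	if (type == ZONE_SEQUENTIAL)
		return ACTION_ZONE_RESET;
	return supports_discard ? ACTION_DISCARD : ACTION_NONE;
}
```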

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:23 -08:00