Pull UBI and UBIFS updates from Richard Weinberger:
- cleanups and bug fixes all over UBI and UBIFS
- block-mq support for UBI Block
- UBI volumes can now be renamed while they are in use
- security.* XATTR support for UBIFS
- a maintainer update
* 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs:
UBI: block: Fix checking for NULL instead of IS_ERR()
UBI: block: Continue creating ubiblocks after an initialization error
UBIFS: return -EINVAL if log head is empty
UBI: Block: Explain usage of blk_rq_map_sg()
UBI: fix soft lockup in ubi_check_volume()
UBI: Fastmap: Care about the protection queue
UBIFS: add a couple of extra asserts
UBI: do propagate positive error codes up
UBI: clean-up printing helpers
UBI: extend UBI layer debug/messaging capabilities - cosmetics
UBIFS: add ubifs_err() to print error reason
UBIFS: Add security.* XATTR support for the UBIFS
UBIFS: Add xattr support for symlinks
UBI: Block: Add blk-mq support
UBI: Add initial support for scatter gather
UBI: rename_volumes: Use UBI_METAONLY
UBI: Implement UBI_METAONLY
Add myself as UBI co-maintainer
Artem: rename static functions so that they do not use the "ubifs_" prefix - we
only use this prefix for non-static functions.
Artem: remove few junk white-space changes in file.c
Signed-off-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Acked-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Terry Wilcox <terry.wilcox@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only
called on the mount path and commit path. The mount path is sequential and
there is no parallelism, and the commit path is also serialized - there is only
one commit going on at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Fix checkpatch warnings:
"WARNING: Prefer seq_puts to seq_printf"
Andrew Morton wrote:
"
- puts is presumably faster
- puts doesn't go rogue if you accidentally pass it a "%".
- this patch actually made fs/ubifs/super.o 12 bytes smaller.
Perhaps because seq_printf() is a varargs function, forcing the
caller to pass args on the stack instead of in registers.
"
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
When attempting to mount a non-ubifs formatted volume, lots of error
messages (including a stack dump) are thrown to the kernel log even if
the MS_SILENT mount flag is set.
Fix this by introducing adding an additional state-variable in
struct ubifs_info and suppress error messages in ubifs_read_node if
MS_SILENT is set.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Dan's "smatch" checker found out that there was a bug in the error path of the
'ubifs_remount_rw()' function. Instead of jumping to the "out" label which
cleans-things up, we just returned.
This patch fixes the problem.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
+165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
DrPKlXwIgva7zq5/S0Vr
=4s1Z
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
ext4: fix premature freeing of partial clusters split across leaf blocks
ext4: remove unneeded test of ret variable
ext4: fix comment typo
ext4: make ext4_block_zero_page_range static
ext4: atomically set inode->i_flags in ext4_set_inode_flags()
ext4: optimize Hurd tests when reading/writing inodes
ext4: kill i_version support for Hurd-castrated file systems
ext4: each filesystem creates and uses its own mb_cache
fs/mbcache.c: doucple the locking of local from global data
fs/mbcache.c: change block and index hash chain to hlist_bl_node
ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
ext4: refactor ext4_fallocate code
ext4: Update inode i_size after the preallocation
ext4: fix partial cluster handling for bigalloc file systems
ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
ext4: only call sync_filesystm() when remounting read-only
fs: push sync_filesystem() down to the file system's remount_fs()
jbd2: improve error messages for inconsistent journal heads
jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
jbd2: minimize region locked by j_list_lock in journal_get_create_access()
...
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
change which fixes a return error code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSgh3wAAoJECmIfjd9wqK0dxwQALF6rMiFldVa9uZnNgOawzA9
yB8OR2EmQilZzCcSCLKaPIwzc7izJO7H6hbaLZrNY9UHzjoCwLhTGkbdkuGwkKuF
Q+M99MKGnYghNWsOWu50J7wewR9zJPD1UGy3ZU/CJXfQzK4uwHIEOiXss+b0kZCo
sW5ogFOiFzEkz8/CtGwffPXNWuqwPXbDBVHXBjnmv9MCCzsSRdGEpNClCOI/qUHQ
pFZhKVHSiqYDPvvwQ/pTGv6kw2tu2pVh6+O232g35JB9G3NHq7MDxg0RLr61C7qK
3ZK6K34r6D09NKhjrOOAsf/EgzyNXtcIL9ySYamaZzEsMGBb2LNMneu3gkrk7O49
5CXi5Em4cg8eBzhEEmHDxCZanwPhggnuLh+YPlaBBNU/z3NiYRNYoHRjzQWCFwa/
yee6AzU+w1i0NRtjpPIAmpaqkryJnAZ+c2Hthlm+DPtN3DzWX13QxHdWTxC3Xycc
8aZNqC128meDqrKRxKkhh1H5TLRdsVTkC9yf0Iv8U8nQg4aT/dPJedNOI4n20Oqy
C8LnABH6wtc7MImzA5xRbk7CHI0Feb8h6darSKWSLydKNJmvyJ6nRdsi5tnfBEyn
kbLdYed833tCll86G8iYMwt3jO0DURn+yNlQrTsf3kvNTADvJdzAoP3tudgDxdIA
T7Uxs/2VFIFAlDK+U5Aw
=ap2l
-----END PGP SIGNATURE-----
Merge tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs
Pull ubifs changes from Artem Bityutskiy:
"Mostly fixes for the power cut emulation UBIFS mode, and only one
functional change which fixes a return error code"
* tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs:
UBIFS: correct data corruption range
UBIFS: fix return code
UBIFS: remove unnecessary code in ubifs_garbage_collect
Fix to return -ENOMEM in the kmalloc() and d_make_root() error handling
case instead of 0, as done elsewhere in those functions.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Convert the filesystem shrinkers to use the new API, and standardise some
of the behaviours of the shrinkers at the same time. For example,
nr_to_scan means the number of objects to scan, not the number of objects
to free.
I refactored the CIFS idmap shrinker a little - it really needs to be
broken up into a shrinker per tree and keep an item count with the tree
root so that we don't need to walk the tree every time the shrinker needs
to count the number of objects in the tree (i.e. all the time under
memory pressure).
[glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
[assorted fixes folded in]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When mounting an UBIFS R/W volume, we have the message:
UBIFS: mounted UBI device 0, volume 1, name "rootfs"(null)
With this patch, we'll have:
UBIFS: mounted UBI device 0, volume 1, name "rootfs"
Which is, I think, what was intended.
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Cc: stable@vger.kernel.org [v3.7+]
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.
The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
(see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
flag and re-writes the entire media atomically, which may take really a lot
of time.
3. UBIFS clears the fixup flag in the superblock.
This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reported-by: Mark Jackson <mpfj-list@mimc.co.uk>
Cc: stable@vger.kernel.org # 3.0+
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* Error reporting and debug printing improvements
* Power cut emulation fixes
* Minor cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQaumRAAoJECmIfjd9wqK016UQAImRrzhPoegy2Pr7j/CpzUST
wJru3QlgdHgjukWer/s4CGTTjmTLMgLAllUnJLO3yn+Po4o67RJxI4SjhMf5lwDG
TSmB/ttQpRmYxaaS2EOC6Ow+jdMM2MhAsPrD8nAfLvOm/XduuSyquPY/xFa6U4rJ
oE4w23b5t2BZI/UJmsCaYs6Rj1h8w1aZUukii+J9BN8DGygn/vJuI9EDNKdtQmxe
vmoT2uWKPyL859kY/+lONH048NWMkEB3BhNmuAsFqGY/tdQRdmeyhgWEKNjbmc/M
DXZe7ovy5umUyw6iDpfhEhmOBdV9xorDySsmKNP1/q60rUGD6R/tynC4y1q1IueB
c6fbry6xpwKIccMS/w7x9cbZxMnG++iQoj71pr2FV9Bg5Xd7XduM/XjzI5rMHuqv
EnmcwT/LIu5Lw1vMU8pfW2yIdTkotNR/2aca1bs0f4WtS3AhngdqmNbFmddcpY37
qvTYDrSMOW/IaegiDQzucVXNhIs8ufIULZzmj9CtmgeyvNVJboTJRik+HJDWFqeC
04TaM8k+Yjp/isbzTmWEyBG3G06cNpMwLtT5OKcpRVQ1ZYu62AbnR+Dbep6n0qt+
gBORW8TTY32GudeSiHLKCrOsrkx3zNSli6T4Sw9YD5e29Dee62KRTgjKVplezQfB
Djh70vUyX2tHdYlW88gf
=gM4M
-----END PGP SIGNATURE-----
Merge tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs
Pull ubifs changes from Artem Bityutskiy:
"No big changes for 3.7 in UBIFS:
- Error reporting and debug printing improvements
- Power cut emulation fixes
- Minor cleanups"
Fix trivial conflict in fs/ubifs/debug.c due to the user namespace
changes.
* tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs:
UBIFS: print less
UBIFS: use pr_ helper instead of printk
UBIFS: comply with coding style
UBIFS: use __aligned() attribute
UBIFS: remove __DATE__ and __TIME__
UBIFS: fix power cut emulation for mtdram
UBIFS: improve scanning debug output
UBIFS: always print full error reports
UBIFS: print PID in debug messages
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
UBIFS currently prints a lot of information when it mounts a volume, which
bothers some people. Make it less chatty - print only important information
by default.
Get rid of 'dbg_msg()' macro completely.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This tag is useless and it breaks automatic builds. It causes rebuilds
for packages that depend on kernel for no real reason.
Further, quoting Michal, who removed most of the users already:
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
The pdflush thread is long gone, so this patch removes references to pdflush
from UBIFS comments.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called. They could also be passed to the
compare function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
+AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
/9c4zxxekjHZnn2oooEa
=oLXf
-----END PGP SIGNATURE-----
Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."
* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally
Richard removed the "dtype" hint, but few commentaries were left and this patch
removes them. I've also added a better description about the "dtype" field in
the ubi-user.h for people who may ever wonder what was that dtype thing about.
This patch also adds an important note that it is better to use value "3" for
the "dtype" field.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
We do not need this feature and to our shame it even was not working
and there was a bug found very recently.
-- Artem Bityutskiy
Without the data type hint UBI2 (fastmap) will be easier to implement.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
UBIFS leaks memory on error path in 'mount_ubifs()'. In case of failure in
'ubifs_fixup_free_space()', it does not call 'ubifs_lpt_free()' whereas LPT
data structures can potentially be allocated. The amount of memory leaked can
be quite high -- see 'ubifs_lpt_init()'.
The bug was introduced when moving the LPT initialisation earlier in the
mount process (commit '781c5717a95a74b294beb38b8276943b0f8b5bb4').
Signed-off-by: Sidney Amani <seed95@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This patch removes the 'dbg_err()' macro and we now use 'ubifs_err()' instead.
The idea of 'dbg_err()' was to compile out some error message to make the
binary a bit smaller - but I think it was a bad idea.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This commit re-names all functions which dump something from "dbg_dump_*()" to
"ubifs_dump_*()". This is done for consistency with UBI and because this way it
will be more logical once we remove the debugging sompilation option.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of using long "(inode->i_mode & S_IFMT) != S_IFREG" expression, use
shorted "!S_ISREG(inode->i_mode)".
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
d251ed271d "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* allocate ubifs_info in ->mount(), fill it enough for sb_test() and
set ->s_fs_info to it in set() callback passed to sget().
* do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
done kill_anon_super().
* don't free it in ubifs_fill_super() either - deactivate_locked_super()
done by caller when ubifs_fill_super() returns an error will take care
of that sucker.
* get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
only ubifs_info. Which it will find in ->s_fs_info just fine, no need to
reassign anything...
As the result, sb_test() becomes safe to apply to all superblocks that
can be found by sget() (and a kludge with temporary use of ->s_fs_info
to store a pointer to very different structure goes away).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:
Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.
Artem: tweaked the patch a bit
Signed-off-by: Ben Gardiner <bengardiner@nanometrics>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current 'mount_ubifs()' implementation does not initialize the LPT until the
the master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.
Artem: massaged the patch and commit message.
Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Journal heads are different in a way how UBIFS writes nodes there. All normal
journal heads receive grouped nodes, while the GC journal heads receives
ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
describes this property.
This patch is a preparation to a further recovery fix.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.
This is just the prototype change with no user for it yet. I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.
Also remove incorrect comments that ->dirty_inode can't block. That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If a UBIFS filesystem is being mounted read-write, or is being remounted
from read-only to read-write, check for the "space_fixup" flag and fix
all LEBs containing empty space if necessary.
Artem: tweaked the patch a bit
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When re-mounting from R/O mode to R/W mode and the LEB count in the superblock
is not up-to date, because for the underlying UBI volume became larger, we
re-write the superblock. We allocate RAM for these purposes, but never free it.
So this is a memory leak, although very rare one.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
This patch improves the 'dbg_check_space_info()' function which checks
whether the amount of space before re-mounting and after re-mounting
is the same (remounting from R/O to R/W modes and vice-versa).
The problem is that 'dbg_check_space_info()' does not save the budgeting
information before re-mounting, so when an error is reported, we do not
know why the amount of free space changed.
This patches makes the following changes:
1. Teaches 'dbg_dump_budg()' function to accept a 'struct ubifs_budg_info'
argument and print out the this argument. This way we may ask it to
print any saved budgeting info, no only the current one.
2. Accordingly changes all the callers of 'dbg_dump_budg()' to comply with
the changed interface.
3. Introduce a 'saved_bi' (saved budgeting info) field to
'struct ubifs_debug_info' and save the budgeting info before re-mounting
there.
4. Change 'dbg_check_space_info()' and make it print both old and new
budgeting information.
5. Additionally, save 'c->igx_gc_cnt' and print it if and error happens. This
value contributes to the amount of free space, so we have to print it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch separates out all the budgeting-related information
from 'struct ubifs_info' to 'struct ubifs_budg_info'. This way the
code looks a bit cleaner. However, the main driver for this is
that we want to save budgeting information and print it later,
so a separate data structure for this is helpful.
This patch is a preparation for the further debugging output
improvements.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Fix several minor stylistic issues:
* lines longer than 80 characters
* space before closing parenthesis ')'
* spaces in the indentations
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently UBIFS has a small optimization - it frees write-buffers when it is
re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
does not allocate write-buffers as well.
This optimization is nice but it leads to subtle problems and complications
in recovery, which I can reproduce using the integck test. The symptoms are
that after a power cut the file-system cannot be mounted if we first mount
it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:
UBIFS error (pid 34456): could not find an empty LEB
Analysis of the problem.
When mounting R/W, the reply process sets journal heads to buds [1], but
when mounting R/O - it does not do this, because the write-buffers are not
allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
same file-system but for the following 2 cases:
1. mounting R/W after a power cut and recover
2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery
In the former case, we have journal heads seeked to the a bud, in the latter
case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
try to recover the GC LEB by garbage-collecting to the GC head, but we just
try to find an empty LEB, and there may be no empty LEBs, so we just fail.
On the other hand, in the former case (mount R/W), we are able to make a GC LEB
(@c->gc_lnum) by garbage-collecting.
Thus, let's remove this small nice optimization and always allocate
write-buffers. This should not make too big difference - we have only 3
of them, each of max. write unit size, which is usually 2KiB. So this is
about 6KiB of RAM for the typical case, and only when mounted R/O.
[1]: Note, currently the replay process is setting (seeking) the journal heads
to _some_ buds, not necessarily to the buds which had been the journal heads
before the power cut happened. This will be fixed separately.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
When UBIFS switches to R/O mode because it detects I/O failures, then
when we unmount, we still may have allocated budget, and the assertions
which verify that we have not budget will fire. But it is expected to
have the budget in case of I/O failures, so the assertion warnings will
be false. Suppress them for the I/O failure case.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes UBIFS mount failure when the debugging support is enabled,
we are recovering from a power cut, we were first mounter R/O and we are
re-mounting R/W. In this case we should not assume that the amount of free
space before we have re-mounted R/W and after are equivalent, because
when we have mounted R/O the file-system is in a non-committed state so
the amount of free space is slightly smaller, due to the fact that we cannot
predict the amount of free space precisely before we commit.
This patch fixes the issue by skipping the debugging check in case of
recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Caizhiyong <caizhiyong@huawei.com>
Cc: stable@kernel.org [2.6.30+]
This patch fixes UBIFS assertion warnings like:
UBIFS assert failed in ubifs_leb_unmap at 135 (pid 29365)
Pid: 29365, comm: integck Tainted: G I 2.6.37-ubi-2.6+ #34
Call Trace:
[<ffffffffa047c663>] ubifs_lpt_init+0x95e/0x9ee [ubifs]
[<ffffffffa04623a7>] ubifs_remount_fs+0x2c7/0x762 [ubifs]
[<ffffffff810f066e>] do_remount_sb+0xb6/0x101
[<ffffffff81106ff4>] ? do_mount+0x191/0x78e
[<ffffffff811070bb>] do_mount+0x258/0x78e
[<ffffffff810da1e8>] ? alloc_pages_current+0xa2/0xc5
[<ffffffff81107674>] sys_mount+0x83/0xbd
[<ffffffff81009a12>] system_call_fastpath+0x16/0x1b
They happen when we re-mount from R/O mode to R/W mode. While
re-mounting, we write to the media, but we still have the c->ro_mount
flag set. The fix is very simple - just clear the flag before
starting re-mounting R/W.
These warnings are caused by the following commit:
2ef13294d2
For -stable guys: this bug was introduced in 2.6.38, this is materieal
for 2.6.38-stable.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org [2.6.38]
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
UBIFS: clean-up commentaries
UBIFS: save 128KiB or more RAM
UBIFS: allocate orphans scan buffer on demand
UBIFS: allocate lpt dump buffer on demand
UBIFS: allocate ltab checking buffer on demand
UBIFS: allocate scanning buffer on demand
UBIFS: allocate dump buffer on demand
UBIFS: do not check data crc by default
UBIFS: simplify UBIFS Kconfig menu
UBIFS: print max. index node size
UBIFS: handle allocation failures in UBIFS write path
UBIFS: use max_write_size during recovery
UBIFS: use max_write_size for write-buffers
UBIFS: introduce write-buffer size field
UBI: incorporate LEB offset information
UBIFS: incorporate maximum write size
UBI: provide LEB offset information
UBI: incorporate maximum write size
UBIFS: fix LEB number in printk
UBIFS: restrict world-writable debugfs files
...
Change the default UBIFS behavior WRT data CRC checking. Currently,
UBIFS checks data CRC when reading, which slows it down quite a bit,
and this is the default option. However, it looks like in average
user does not need this feature and would prefer faster read speed
over extra reliability. And this seems to be de-facto standard that
file-systems do not check data CRC every time they read from the
media.
Thus, make UBIFS default behavior so that it does not check data
CRC. This corresponds to the no_chk_data_crc mount option. Those users
who need extra protection can always enable it using the chk_data_crc
option.
Please, read more information about this feature here:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Running kernel 2.6.37, my PPC-based device occasionally gets an
order-2 allocation failure in UBIFS, which causes the root FS to
become unwritable:
kswapd0: page allocation failure. order:2, mode:0x4050
Call Trace:
[c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c
[c787dd00] [c0061b98] __get_free_pages+0x20/0x50
[c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200
[c787dd50] [c00e82d4] do_writepage+0x94/0x198
[c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c
[c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370
[c787de90] [c0068224] shrink_zone+0x2b4/0x2b8
[c787df00] [c0068854] kswapd+0x408/0x5d4
[c787dfb0] [c0037bcc] kthread+0x80/0x84
[c787dff0] [c000ef44] kernel_thread+0x4c/0x68
Similar problems were encountered last April by Tomasz Stanislawski:
http://patchwork.ozlabs.org/patch/50965/
This patch implements Artem's suggested fix: fall back to a
mutex-protected static buffer, allocated at mount time. I tested it
by forcing execution down the failure path, and didn't see any ill
effects.
Artem: massaged the patch a little, improved it so that we'd not
allocate the write reserve buffer when we are in R/O mode.
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Incorporate the LEB offset information into UBIFS. We'll use this
information in one of the next patches to figure out what are the
max. write size offsets relative to the PEB. So this patch is just
a preparation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Incorporate maximum write size into the UBIFS description data
structure. This patch just introduces new 'c->max_write_size'
and 'c->max_write_shift' fields as a preparation for the following
patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a preparational patch which removes the 'c->always_chk_crc' which was
set during mounting and remounting to R/W mode and introduces 'c->mounting'
flag which is set when mounting. Now the 'c->always_chk_crc' flag is the
same as 'c->remounting_rw && c->mounting'.
This patch is a preparation for the next one which will need to know when we
are mounting and remounting to R/W mode, which is exactly what
'c->always_chk_crc' effectively is, but its name does not suite the
next patch. The other possibility would be to just re-name it, but then
we'd end up with less logical flags coverage.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
This is a bug-fix: when we unmount, and we are currently in R/O
mode because of an error - we do not sync write-buffers, which
means we also do not cancel write-buffer timers we may possibly
have armed. This patch fixes the issue.
The issue can easily be reproduced by enabling UBIFS failure debug
mode (echo 4 > /sys/module/ubifs/parameters/debug_tsts) and
unmounting as soon as a failure happen. At some point the system
oopses because we have an armed hrtimer but UBIFS is unmounted
already.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a clean-up patch which:
1. Removes explicite 'hrtimer_cancel()' after 'ubifs_wbuf_sync()' in
'ubifs_remount_ro()', because the timers will be canceled by
'ubifs_wbuf_sync()', no need to cancel them for the second time.
2. Remove "if (c->jheads)" check from 'ubifs_put_super()', because
at journal heads must always be allocated there, since we checked
earlier that we were mounted R/W, and the olny situation when
journal heads are not allocated is when mounter or re-mounted R/O.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
.get_sb is called on mounts with automatic fs detection too, so this
function should print an error if it cannot read the superblock in
debug mode only (new behaviour conforms the other fs types)
Signed-off-by: Steffen Sledz <sledz@dresearch.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 2fde99cb55 "UBIFS: mark VFS SB RO too"
introduced regression. This commit made UBIFS set the 'MS_RDONLY' flag in the
VFS superblock when it switches to R/O mode due to an error. This was done
to make VFS show the R/O UBIFS flag in /proc/mounts.
However, several places in UBIFS relied on the 'MS_RDONLY' flag and assume this
flag can only change when we re-mount. For example, 'ubifs_put_super()'.
This patch introduces new UBIFS flag - 'c->ro_mount' which changes only when
we re-mount, and preserves the way UBIFS was originally mounted (R/W or R/O).
This allows us to de-initialize UBIFS cleanly in 'ubifs_put_super()'.
This patch also changes all 'ubifs_assert(!c->ro_media)' assertions to
'ubifs_assert(!c->ro_media && !c->ro_mount)', because we never should write
anything if the FS was mounter R/O.
All the places where we test for 'MS_RDONLY' flag in the VFS SB were changed
and now we test the 'c->ro_mount' flag instead, because it preserves the
original UBIFS mount type, unlike the 'MS_RDONLY' flag.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The R/O state may have various reasons:
1. The UBI volume is R/O
2. The FS is mounted R/O
3. The FS switched to R/O mode because of an error
However, in UBIFS we have only one variable which represents cases
1 and 3 - 'c->ro_media'. Indeed, we set this to 1 if we switch to
R/O mode due to an error, and then we test it in many places to
make sure that we stop writing as soon as the error happens.
But this is very unclean. One consequence of this, for example, is
that in 'ubifs_remount_fs()' we use 'c->ro_media' to check whether
we are in R/O mode because on an error, and we print a message
in this case. However, if we are in R/O mode because the media
is R/O, our message is bogus.
This patch introduces new flag - 'c->ro_error' which is set when
we switch to R/O mode because of an error. It also changes all
"if (c->ro_media)" checks to "if (c->ro_error)" checks, because
this is what the checks actually mean. We do not need to check
for 'c->ro_media' because if the UBI volume is in R/O mode, we
do not allow R/W mounting, and now writes can happen. This is
guaranteed by VFS. But it is good to double-check this, so this
patch also adds many "ubifs_assert(!c->ro_media)" checks.
In the 'ubifs_remount_fs()' function this patch makes a bit more
changes - it fixes the error messages as well.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
In 'mount_ubifs()', in case of 'ubifs_leb_unmap()' falure,
free allocated resources.
Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The error code from 'ubifs_rcvry_gc_commit()' was ignored, so UBIFS
failed to recover and continued. Instead, we should refuse mounting
the file-system.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch makes it possible to mount UBI character device
nodes, and use something like:
$ mount -t ubifs /dev/ubi_volume_name /mnt/ubifs
instead of the old restrictive 'nodev' semantics:
$ mount -t ubifs ubi0_0 /mnt/ubifs
[Comments and the patch were amended a bit by Artem]
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().
Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds two new exported functions:
- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
and also waits for the IO to complete.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add one more check to UBIFS - a check that makes sure that there
are no data nodes beyond inode size. And few commantaries fixes
along the line.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
Nowadays VFS always synchronizes all dirty inodes and pages before
calling '->sync_fs()', so remove unneeded 'generic_sync_sb_inodes()'
from 'ubifs_sync_fs()'. It used to be needed, but not any longer.
Pointed-out-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The BKL was pushed down from VFS to the file-systems. It used
to serialize mount/unmount/remount and prevented more than one
instance of the same file-system from doing
mount/umount/remount at the same time. But it is OK for UBIFS
and it does not need any additional locking for these cases.
Thus, kick the BKL out of UBIFS.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS uses a bdi device per volume, but does not care to hand out unique
names to each of them. This causes an error when trying to mount more
than one volumes. Append the UBI volume and device ID to avoid that.
[Amended a bit by Artem Bityutskiy]
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When debugging is enabled and an unclean file-system is mounter,
the following assertion is triggered:
UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081)
Call Trace:
[cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable)
[cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18
[cfaffc90] [c0112670] do_commit+0x150/0xa44
[cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544
[cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8
[cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320
[cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0
[cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8
[cfaffeb0] [c0095c24] do_mount+0x550/0x758
[cfafff10] [c0095ebc] sys_mount+0x90/0xe0
[cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c
The reason is that we initialize 'c->min_leb_idx' early, and do
not re-calculate it after journal replay.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds the following minor optimization:
1. If write-buffer does not use the timer, indicate it with the
wbuf->no_timer variable, instead of using the wbuf->softlimit
variable. This is better because wbuf->softlimit is of ktime_t
type, and the ktime_to_ns function contains 64-bit multiplication.
2. Do not call the 'hrtimer_cancel()' function for write-buffers
which do not use timers.
3. Do not cancel the timer in 'ubifs_put_super()' because the
synchronization function does this.
This patch also removes a confusing comment.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: start using hrtimers
hrtimer: export ktime_add_safe
UBIFS: do not forget to register BDI device
UBIFS: allow sync option in rootflags
UBIFS: remove dead code
UBIFS: use anonymous device
UBIFS: return proper error code if the compr is not present
UBIFS: return error if link and unlink race
UBIFS: reset no_space flag after inode deletion
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.
[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make sure a superblock really is writeable by checking MS_RDONLY
under s_umount. sync_filesystems needed some re-arragement for
that, but all but one sync_filesystem caller had the correct locking
already so that we could add that check there. cachefiles grew
s_umount locking.
I've also added a WARN_ON to sync_filesystem to assert this for
future callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
UBIFS uses timers for write-buffer write-back. It is not
crucial for us to write-back exactly on time. We are fine
to write-back a little earlier or later. And this means
we may optimize UBIFS timer so that it could be groped
with a close timer event, so that the CPU would not be
waken up just to do the write back. This is optimization
to lessen power consumption, which is important in
embedded devices UBIFS is used for.
hrtimers have a nice feature: they are effectively range
timers, and we may defind the soft and hard limits for
it. Standard timers do not have these feature. They may
only be made deferrable, but this means there is effectively
no hard limit. So, we will better use hrtimers.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When passing UBIFS parameters via kernel command line, the
sync option will be passed to UBIFS as a string, not as an
MS_SYNCHRONOUS flag. Teach UBIFS interpreting this flag.
Reported-by: Aurélien GÉRÔME <ag@debian.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS has erroneuosly set 'sb->s_dev' to the UBI volume
character device major/minor. This may lead to clashes
if there is another FS mounted to a block device with
the same major/minor numbers. User-space programs which
use 'stat->st_dev' may get confused because of this.
This problem was found by Al Viro. He also pointed the
way to fix the problem - use 'set_anon_super()' and
'kill_anon_super()' VFS helpers.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If the compressor is not present, mount_ubifs need
to return an error code. This way ubifs_fill_super
will stop and handle the error.
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When UBIFS runs out of space it spends a lot of time trying to
find more space before returning ENOSPC. As there is no point
repeating that unless something has changed, UBIFS has an
optimization to record that the file system is 100% full and not
try to find space. That flag was not being reset when a pending
deletion was finally done.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Reviewed-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
should:
up_write(sb->s_unmount);
deactivate_super(sb);
if simple_set_mnt() fails.
Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now UBIFS is supported by u-boot. If we ever decide to change the
media format, then people will have to upgrade their u-boots to
mount new format images. However, very often it is possible to
preserve R/O forward-compatibility, even though the write
forward-compatibility is not preserved.
This patch introduces a new super-block field which stores the
R/O compatibility version.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Adrian Hunter <Adrian.Hunter@nokia.com>
fs/ubifs/super.c: In function ‘ubifs_show_options’:
fs/ubifs/super.c:425: warning: format not a string literal and no format arguments
fs/ubifs/super.c: In function ‘mount_ubifs’:
fs/ubifs/super.c:1204: warning: format not a string literal and no format arguments
fs/ubifs/super.c: In function ‘ubifs_remount_rw’:
fs/ubifs/super.c:1557: warning: format not a string literal and no format arguments
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch introduces a helpful @c->idx_leb_size variable.
The patch also fixes some spelling issues and makes comments
use "LEB" instead of "eraseblock", which is more correct.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Empty journal head LEBs are accounted as taken empty as well, so
the GC LEB does not have to be the only taken empty LEB when
nounting/remounting.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This UBIFS feature has never worked properly, and it was a mistake
to add it because we simply have no use-cases. So, lets still accept
the fast_unmount mount option, but ignore it. This does not change
much, because UBIFS commit in sync_fs anyway, and sync_fs is called
while unmounting.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When mounting/re-mounting, UBIFS returns EINVAL even if the ENOSPC
or EROFS codes are are much better, just because we have not found
references to ENOSPC/EROFS in mount (2) man pages. This patch
changes this behaviour and makes UBIFS return real error code,
because:
1. It is just less confusing and more logical
2. mount is not described in SuSv3, so it seems to be not really
well-standartized
3. we do not cover all cases, and any random undocumented in man
pages error code may be returned anyway
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
- preserve the idx_gc list - it will be needed in the same
state, should UBIFS be remounted rw again
- prevent remounting ro if we have switched to read only
mode (due to a fatal error)
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
All writes go through wbufs so they must be sync'd
after syncing inodes and pages.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
I introduce wrong assertions in one of the previous commits, this
patch fixes them.
Also, initialize debugfs after the debugging check. This is a little
nicer because we want the FS data to be accessible to external users
after everything has been initialized.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When mounting read-only the orphan area head is
not initialized. It must be initialized when
remounting read/write, but it was not. This patch
fixes that.
[Artem: sorry, added comment tweaking noise]
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When we mount UBIFS, GC LEB may contain out-of-date information,
and UBIFS should update lprops and set free space for thei LEB.
Currently UBIFS does this only if mounted R/W. But for R/O mount
we have to do the same, because otherwise we will have incorrect
FS free space reported to user-space.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We observe space corrupted accounting when re-mounting. So add some
debbugging checks to catch problems like this.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When freeing the c->idx_lebs list, we have to release the LEBs as well,
because we might be called from mount to read-only mode code. Otherwise
the LEBs stay taken forever, which may cause problems when we re-mount
back ro RW mode.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
VFS calls '->sync_fs()' twice - first time with @wait = 0, second
time with @wait = 1. As a result, we may commit and synchronize
write-buffers twice. Avoid doing this by returning immediatelly if
@wait = 0.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
WB_SYNC_HOLD is going to be zapped so we should not use it. Use
%WB_SYNC_NONE instead. Here is what akpm said:
"I think I'll just switch that to WB_SYNC_NONE. The `wait==0' mode is
just an advisory thing to help the fs shove lots of data into the
queues. If some gets missed then it'll be picked up on the second
->sync_fs call, with wait==1."
Thanks to Randy Dunlap for catching this.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is fine if there is not free space - we should still allow mounting
this FS. This patch relaxes the free space requirements and adds info
dumps.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS commits on unmount to make the next mount faster. Currently,
it commits only if there is more than LEB size bytes in the
journal. This is not very good, because journal size may be
large (512KiB). And there may be few deletions in the journal
which do not take much journal space, but which do introduce
a lot of TNC changes and make mount slow.
Thus, jurt remove this condition and always commit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Always run commit in sync_fs, because even if the journal seems
to be almost empty, there may be a deletion which removes a large
file, which affects the index greatly. And because we want
better free space predictions after 'sync_fs()', we have to
commit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Argh. The ->sync_fs call is called _before_ all inodes are flushed.
This means we first sync write buffers and commit, then all
inodes are synced, and we end up with unflushed write buffers!
Fix this by forcing synching all indoes from 'ubifs_sync_fs()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The c->min_idx_lebs constant depends on c->old_idx_sz, which
is read from the master node. This means that we have to
initialize c->min_idx_lebs only after we have read the master
node.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Take into account that 2 eraseblocks are never available because
they are reserved for the index. This gives more realistic count
of FS blocks.
To avoid future confusions like this, introduce a constant.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We need to have a possibility to see various UBIFS variables
and ask UBIFS to dump various information. Debugfs is what
we need.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Introduce a new data structure which contains all debugging
stuff inside. This is cleaner than having debugging stuff
directly in 'c'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
It is very handy to be able to change default UBIFS compressor
via mount options. Introduce -o compr=<name> mount option support.
Currently only "none", "lzo" and "zlib" compressors are supported.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Save a 4 bytes of RAM per 'struct inode' by stroring inode
compression type in bit-filed, instead of using 'int'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
To avoid memory allocation failure during bulk-read, pre-allocate
a bulk-read buffer, so that if there is only one bulk-reader at
a time, it would just use the pre-allocated buffer and would not
do any memory allocation. However, if there are more than 1 bulk-
reader, then only one reader would use the pre-allocated buffer,
while the other reader would allocate the buffer for itself.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Bulk-read allocates 128KiB or more using kmalloc. The allocation
starts failing often when the memory gets fragmented. UBIFS still
works fine in this case, because it falls-back to standard
(non-optimized) read method, though. This patch teaches bulk-read
to allocate exactly the amount of memory it needs, instead of
allocating 128KiB every time.
This patch is also a preparation to the further fix where we'll
have a pre-allocated bulk-read buffer as well. For example, now
the @bu object is prepared in 'ubifs_bulk_read()', so we could
path either pre-allocated or allocated information to
'ubifs_do_bulk_read()' later. Or teaching 'ubifs_do_bulk_read()'
not to allocate 'bu->buf' if it is already there.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Bulk-read allocates a lot of memory with 'kmalloc()', and when it
is/gets fragmented 'kmalloc()' fails with a scarry warning. But
because bulk-read is just an optimization, UBIFS keeps working fine.
Supress the warning by passing __GFP_NOWARN option to 'kmalloc()'.
This patch also introduces a macro for the magic 128KiB constant.
This is just neater.
Note, this is not really fixes the problem we had, but just hides
the warnings. The further patches fix the problem.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
UBIFS: fix ubifs_compress commentary
UBIFS: amend printk
UBIFS: do not read unnecessary bytes when unpacking bits
UBIFS: check buffer length when scanning for LPT nodes
UBIFS: correct condition to eliminate unecessary assignment
UBIFS: add more debugging messages for LPT
UBIFS: fix bulk-read handling uptodate pages
UBIFS: improve garbage collection
UBIFS: allow for sync_fs when read-only
UBIFS: commit on sync_fs
UBIFS: correct comment for commit_on_unmount
UBIFS: update dbg_dump_inode
UBIFS: fix commentary
UBIFS: fix races in bit-fields
UBIFS: ensure data read beyond i_size is zeroed out correctly
UBIFS: correct key comparison
UBIFS: use bit-fields when possible
UBIFS: check data CRC when in error state
UBIFS: improve znode splitting rules
UBIFS: add no_chk_data_crc mount option
...
It is better to print "Reserved for root" than
"Reserved pool size", because it is more obvious for users
what this means.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sync_fs can be called even if the file system is mounted
read-only. Ensure the commit is not run in that case.
Reported-by: Zoltan Sogor <weth@inf.u-szeged.hu>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Commit the journal when the FS is sync'ed. This will make
statfs provide better free space report. And we anyway
advice our users to sync the FS if they want better statfs
report.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS read performance can be improved by skipping the CRC
check when data nodes are read. This option can be used if
the underlying media is considered to be highly reliable.
Note that CRCs are always checked for metadata.
Read speed on Arm platform with OneNAND goes from 19 MiB/s
to 27 MiB/s with data CRC checking disabled.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>