Libfuse basically creates a new thread for each new request. This is fine for
synchronous requests, which are naturally limited. However background
requests (especially writepage) can cause a thread creation storm.
To avoid this, limit the number of background requests available to userspace.
This is done by introducing another queue for background requests, and a
counter for the number of "active" requests, which are currently available for
userspace.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the fields 'dentry' and 'vfsmount' into the request specific union, since
these are only used for the RELEASE request.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Invalidate attributes on create, since st_ctime is updated. Reported by
Szabolcs Szakacsits.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer pointed out that a mount; umount loop of ecryptfs, with the same
cipher & other mount options, created a new ecryptfs_key_tfm_cache item
each time, and the cache could grow quite large this way.
Looking at this with mhalcrow, we saw that ecryptfs_parse_options()
unconditionally called ecryptfs_add_new_key_tfm(), which is what was adding
these items.
Refactor ecryptfs_get_tfm_and_mutex_for_cipher_name() to create a new
helper function, ecryptfs_tfm_exists(), which checks for the cipher on the
cached key_tfm_list, and sets a pointer to it if it exists. This can then
be called from ecryptfs_parse_options(), and new key_tfm's can be added
only when a cached one is not found.
With list locking changes suggested by akpm.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only the lower byte of cipher_code is ever used, so it makes sense
for its type to be u8.
Signed-off-by: Trevor Highland <trevor.highland@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The printk statements that result when the user does not have the
proper key available could use some refining.
Signed-off-by: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecryptfs_debug really should not be a mount option; it is not per-mount,
but rather sets a global "ecryptfs_verbosity" variable which affects all
mounted filesysytems. It's already settable as a module load option,
I think we can leave it at that.
Also, if set, since secret values come out in debug messages, kick
things off with a stern warning.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mike Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change ecryptfs_show_options to reflect the actual mount options in use.
Note that this does away with the "dir=" output, which is not a valid mount
option and appears to be unused.
Mount options such as "ecryptfs_verbose" and "ecryptfs_xattr_metadata" are
somewhat indeterminate for a given fs, but in any case the reported mount
options can be used in a new mount command to get the same behavior.
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no need to keep re-setting the same key for any given eCryptfs inode.
This patch optimizes the use of the crypto API and helps performance a bit.
Signed-off-by: Trevor Highland <trevor.highland@gmail.com>
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove internal references to header extents; just keep track of header bytes
instead. Headers can easily span multiple pages with the recent persistent
file changes.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- make the following needlessly global code static:
- crypto.c:ecryptfs_lower_offset_for_extent()
- crypto.c:key_tfm_list
- crypto.c:key_tfm_list_mutex
- inode.c:ecryptfs_getxattr()
- main.c:ecryptfs_init_persistent_file()
- remove the no longer used mmap.c:ecryptfs_lower_page_cache
- #if 0 the unused read_write.c:ecryptfs_read()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
schedule_timeout(jiffies) waits for at least jiffies - 1. Add 1 jiffie to
the timeout_jiffies calculated in sys_poll() to wait at least
timeout_msecs, like poll() manpage says.
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at
compile time, not runtime.
[akpm@linux-foundation.org: clean it all up]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use DEFAULT_SGI_PARTITION for SGI_PARTION default
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext2 should not worry about checking sb->s_blocksize for XIP before the
sb's blocksize actually gets set.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We restarted scan of sb->s_inodes list whenever we had to drop inode_lock
in add_dquot_ref(). This leads to overall quadratic running time and thus
add_dquot_ref() can take several minutes when called on a life filesystem.
We fix the problem by using the fact that inode cannot be removed from
s_inodes list while we hold a reference to it and thus we can safely
restart the scan if we don't drop the reference. Here we use the fact that
inodes freshly added to s_inodes list are already guaranteed to have quotas
properly initialized and the ordering of inodes on s_inodes list does not
change so we cannot skip any inode.
Thanks goes to Nick <gentuu@gmail.com> for analyzing the problem and
testing the fix.
[akpm@linux-foundation.org: iput(NULL) is legal]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Nick <gentuu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inotify debugging code is supposed to verify that the
DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in
notifications getting lost nor extra needless locking generated.
Unfortunately there are also some races in the debugging code. And it isn't
very good at finding problems anyway. So remove it for now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race between setting an inode's children's "parent watched" flag
when placing the first watch on a parent, and instantiating new children of
that parent: a child could miss having its flags set by
set_dentry_child_flags, but then inotify_d_instantiate might still see
!inotify_inode_watched.
The solution is to set_dentry_child_flags after adding the watch. Locking is
taken care of, because both set_dentry_child_flags and inotify_d_instantiate
hold dcache_lock and child->d_locks.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.
Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.
Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.
This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.
[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, no notification event has been sent when inode's link count
changed. This is inconvenient for the application in some cases:
Suppose you have the following directory structure
foo/test
bar/
and you watch test. If someone does "mv foo/test bar/", you get event
IN_MOVE_SELF and you know something has happened with the file "test".
However if someone does "ln foo/test bar/test" and "rm foo/test" you get no
inotify event for the file "test" (only directories "foo" and "bar" receive
events).
Furthermore it could be argued that link count belongs to file's metadata and
thus IN_ATTRIB should be sent when it changes.
The following patch implements sending of IN_ATTRIB inotify events when link
count of the inode changes, i.e., when a hardlink to the inode is created or
when it is removed. This event is sent in addition to all the events sent so
far. In particular, when a last link to a file is removed, IN_ATTRIB event is
sent in addition to IN_DELETE_SELF event.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Morten Welinder <mwelinder@gmail.com>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Address Roman's review comments for the previously sent on-disk
corruption hfs robustness patch.
- use 0 as a failure value, rather than making a new macro HFS_BAD_KEYLEN,
and use a switch statement instead of if's.
- Add new fail: target to __hfs_brec_find to skip assignments using bad
values when exiting with a failure.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use list_for_each_entry_reverse for super_blocks list and remove
unused sb_entry macro.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use hlist_unhashed() instead of opencoded equivalent.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The avenrun[] values are supposed to be protected by xtime_lock.
loadavg_read_proc does not use it. Theoretically this may result in an
occasional glitch when the value read from /proc/loadavg would be as much
as 1<<11 times higher than it should be.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the prototypes for its global
functions (in this case sys_eventfd()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the prototypes for its global
functions (in this case sys_signalfd()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the prototypes for its global
functions.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ad a proper prototype for migration_init() in include/linux/fs.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
smb_receive calls kernel_recvmsg with a size that's the minimum of the
amount of buffer space in the kvec passed in or req->rq_rlen (which
represents the length of the response). This does not take into account
any data that was read in a request earlier pass through smb_receive.
If the first pass through smb_receive receives some but not all of the
response, then the next pass can call kernel_recvmsg with a size field
that's too big. kernel_recvmsg can overrun into the next response,
throwing off the alignment and making it unrecognizable.
This causes messages like this to pop up in the ring buffer:
smb_get_length: Invalid NBT packet, code=69
as well as other errors indicating that the response is unrecognizable.
Typically this is seen on a smbfs mount under heavy I/O.
This patch changes the code to use (req->rq_rlen - req->rq_bytes_recvd)
instead instead of just req->rq_rlen, since that should represent the
amount of unread data in the response.
I think this is correct, but an ACK or NACK from someone more familiar
with this code would be appreciated...
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes sure printk format strings contain no more than a single line.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
[the message was tweaked.]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for show_interrupts() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about
MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount
--make-private".
Today I happened to see mount(8) and Documentation/sharedsubtree.txt and
both document the version obtained by applying the little patch given in
the above (and again below).
So, the present kernel code is not according to specs and must be regarded
as buggy.
Specification in Documentation/sharedsubtree.txt:
See state diagram: unbindable should become private upon make-private.
Specification in mount(8):
... It's
also possible to set up uni-directional propagation (with --make-
slave), to make a mount point unavailable for --bind/--rbind (with
--make-unbindable), and to undo any of these (with --make-private).
Repeat of old fix-shared-subtrees-make-private.patch
(due to Dirk Gerrits, René Gabriëls, Peter Kooijmans):
Acked-by: Ram Pai <linuxram@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add SIGIO-driven I/O for descriptors returned by inotify_init(). The thing
may be enabled by convenient fcntl (fd, F_SETFL, O_ASYNC) call.
Signed-off-by: Dmitry Antipov <antipov@dev.rtsoft.ru>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext2 file system was by default ignoring errors and continuing. This is
not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system readonly. Debian
and ubuntu already does this as the default in their fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes some instances where we were continuing after calling
ext2_error. ext2_error call panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext2_error call.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following comment is at fs/inotify_user.c:287
/* coalescing: drop this event if it is a dupe of the previous */
I think the previous event in the comment should be the last event in the
link list. But inotify_dev_get_event return the first event in the list.
In addition, it doesn't check whether the list is empty
Signed-off-by: Yan Zheng<yanzheng@21cn.com>
Acked-by: Robert Love <rlove@rlove.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prohibit mode changes in non-quiet mode that cannot be stored reliably with
the on-disk format.
Suppose a vfat filesystem is mounted with umask=0 and [not-quiet]. Then
all files will have mode 0777. Trying to change the owner will fail,
because fat does not know about owners or groups. chmod 0770, on the other
hand, will succeed, even though fat does not know about the permission
triplet [user/group/other].
So this patch changes fat's not-quiet behavior so that only UNIX modes are
accepted that can be mapped lossless between the fat disk format and the
local system. There is only one attribute, and that is the readonly
attribute, which is mapped to the UNIX write permission bit(s). chmod 0555
is therefore valid (taking away the +w bits <=> setting the readonly
attribute). Since chmod 0775 and chmod 0755 is an ambiguous case as to
whether to set or clear the readonly bit, these modes are also denied.
In quiet mode, chmod and chown will continue to "succeed" as they did
before, meaning that a subsequent stat() will temporarily return the new
mode as long as the inode is not reread from disk, and chown will silently
do nothing, not even return the new uid/gid in stat().
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
smbfs is a bit buggy and has no maintainer. Change it to shout at the user on
the first five mount attempts - tell them to switch to CIFS.
Come December we'll mark it BROKEN and see what happens.
[olecom@flower.upol.cz: documentation update]
Cc: Urban Widmark <urban@teststation.com>
Acked-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Oleg Verych <olecom@flower.upol.cz>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply.
Signed-off-by: Dominique Quatravaux <dominique@quatravaux.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities. If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.
FWIW libcap-2.00 supports this change (and earlier capability formats)
http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally vfs_getxattr would pull the security xattr variable using
the inode getxattr handle and then proceed to clobber it with a subsequent call
to the LSM.
This patch reorders the two operations such that when the xattr requested is
in the security namespace it first attempts to grab the value from the LSM
directly.
If it fails to obtain the value because there is no module present or the
module does not support the operation it will fall back to using the inode
getxattr operation.
In the event that both are inaccessible it returns EOPNOTSUPP.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch modifies the interface to inode_getsecurity to have the function
return a buffer containing the security blob and its length via parameters
instead of relying on the calling function to give it an appropriately sized
buffer.
Security blobs obtained with this function should be freed using the
release_secctx LSM hook. This alleviates the problem of the caller having to
guess a length and preallocate a buffer for this function allowing it to be
used elsewhere for Labeled NFS.
The patch also removed the unused err parameter. The conversion is similar to
the one performed by Al Viro for the security_getprocattr hook.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays. But sometimes the following
happens instead:
- after 30s: ~4M
- after 5s: ~4M
- after 5s: all remaining 92M
Some analyze shows that the internal io dispatch queues goes like this:
s_io s_more_io
-------------------------
1) 100M,1K 0
2) 1K 96M
3) 0 96M
1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes(BUG)
nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out. The big dirty file is actually still sitting in
s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty. It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).
We have to return when a full scan of s_io completes. So nr_to_write > 0
does not necessarily mean that "all data are written". This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done. With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.
In sync_sb_inodes() we only set more_io on super_blocks we actually
visited. This avoids the interaction between two pdflush deamons.
Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress. Failing to do so may lead to 100% iowait.
Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since I_SYNC was split out from I_LOCK, the concern in commit
4b89eed93e ("Write back inode data pages
even when the inode itself is locked") is not longer valid.
We should revert to the original behavior: in __writeback_single_inode(),
when we find an I_SYNC-ed inode and we're not doing a data-integrity sync,
skip writing entirely. Otherwise, we are double calling do_writepages()
Signed-off-by: Qi Yong <qiyong@fc-cn.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>