Commit graph

282 commits

Author SHA1 Message Date
Steven Whitehouse
b2c8b3ea87 GFS2: Allocate block for xattr at inode alloc time, if required
This is another step towards improving the allocation of xattr
blocks at inode allocation time. Here we take advantage of
Christoph's recent work on ACLs to allocate a block for the
xattrs early if we know that we will be adding ACLs to the
inode later on. The advantage of that is that it is much
more likely that we'll get a contiguous run of two blocks
where the first is the inode and the second is the xattr block.

We still have to fall back to the original system in case we
don't get the requested two contiguous blocks, or in case the
ACLs are too large to fit into the block.

Future patches will move more of the ACL setting code further
up the gfs2_inode_create() function. Also, I'd like to be
able to do the same thing with the xattrs from LSMs in
due course, too. That way we should be able to slowly reduce
the number of independent transactions, at least in the
most common cases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-04 15:45:11 +00:00
Linus Torvalds
bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Christoph Hellwig
e01580bf9e gfs2: use generic posix ACL infrastructure
This contains some major refactoring for the create path so that
inodes are created with the right mode to start with instead of
fixing it up later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:22 -05:00
J. Bruce Fields
d57b9c9a99 GFS2: revert "GFS2: d_splice_alias() can't return error"
0d0d110720 asserts that "d_splice_alias()
can't return error unless it was given an IS_ERR(inode)".

That was true of the implementation of d_splice_alias, but this is
really a problem with d_splice_alias: at a minimum it should be able to
return -ELOOP in the case where inserting the given dentry would cause a
directory loop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-18 09:50:53 +00:00
Steven Whitehouse
ac3beb6a5d GFS2: Don't use ENOBUFS when ENOMEM is the correct error code
Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.

>        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
2014-01-16 10:31:13 +00:00
Bob Peterson
62e96cf819 GFS2: Increase i_writecount during gfs2_setattr_chown
This patch calls get_write_access in function gfs2_setattr_chown,
which merely increases inode->i_writecount for the duration of the
function. That will ensure that any file closes won't delete the
inode's multi-block reservation while the function is running.
It also ensures that a multi-block reservation exists when needed
for quota change operations during the chown.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-07 09:43:51 +00:00
Steven Whitehouse
2b47dad866 GFS2: Remember directory insert point
When we look to see if there is enough space to add a dir
entry without allocation, we have then been repeating the
same search later when we do the actual insertion. This
patch caches the details of the location in the gfs2_diradd
structure, so that we do not have to repeat the search.

This will provide a performance improvement which will be
greater as the size of the directory increases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-06 12:49:43 +00:00
Steven Whitehouse
534cf9ca55 GFS2: Consolidate transaction blocks calculation for dir add
There are three cases where we need to calculate the number of
blocks to reserve in a transaction involving linking an inode
into a directory. The one in rename is a bit more complicated,
but the basis of it is the same as for link and create. So it
makes sense to move this calculation into a single function
rather than repeating it three times.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-06 12:03:05 +00:00
Steven Whitehouse
3c1c0ae1db GFS2: Add directory addition info structure
The intent is that this structure will hold the information
required when adding entries to a directory (linking). To
start with, it will contain only the number of blocks which
are required to link the new entry into the directory. The
current calculation returns either 0 or the maximim number of
blocks that can ever be requested by such a transaction.

The intent is that in a later patch, we can update the dir
code to calculate this value more accurately. In addition
further patches will also add further fields to the new
structure to increase its utility.

In addition this patch fixes a bug where the link used during
inode creation was adding requesting too many blocks in
some cases. This is harmless unless the fs is close to being
full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-06 11:28:41 +00:00
Steven Whitehouse
ea0341e071 GFS2: Fix ref count bug relating to atomic_open
In the case that atomic_open calls finish_no_open() with
the dentry that was supplied to gfs2_atomic_open() an
extra reference count is required. This patch fixes that
issue preventing a bug trap triggering at umount time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-11-21 18:47:57 +00:00
Linus Torvalds
9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Linus Torvalds
8b5baa460b The main feature of interest this time is quota updates. There are
some clean ups and some patches to use the new generic lru list
 code. There is still plenty of scope for some further changes in
 due course - faster lookups of quota structures is very much
 on the todo list. Also, a start has been made towards the more tricky
 issue of using the generic lru code with glocks, but that will
 have to be completed in a subsequent merge window.
 
 The other, more minor feature, is that there have been a number of
 performance patches which relate to block allocation. In particular
 they will improve performance when the disk is nearly full.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSeQMSAAoJEMrg3m4a/8jS4/EP/AtkfsT+GATPmK2R3Yoy0Hrb
 4KucaloOtlUmSsVwTpzYGYZaqJo2D5BndJWw9jekPJOS4aB5CbE1ZYCMIKyuhhr4
 Y70kjfGlwK5hRSItPJ5gHnWkiTZzR65wBLj/+EBAFm2gF3UsJ4DJvLNvd8DP9SJC
 3IfYfqV6cPa7aPDhmEbdq5h0X5iSI+Ee/X2Z3a6fe7rErR1cD4iAEFEyPHa0aHgt
 TkrS32DodOn/J26PvUFq5MUb+El+Ul6EpeB3CC8UN0+pvucAKCMVy8+sPROTbViz
 mMRyWxHHHPDEEulFPWJlXW/tOAhHMHTPGbnWu4bH+iDudzOHif7E0tWklPR9bJAY
 4/1Fa4ILIxV02kdGBHxO74Vv/ir4gyLzzXCPbXOpxu5jMw3VdN9dp0/Uck+rmsya
 rC5Q/8vm4AUO7YYHUBEEaT7Nqp8HRRWGwv11Wdyqf3RQyC5jYHNEXYJkdMsODEae
 p+Ju/O6MfLw68IrG38RaGT4/tCBPonggsCVxwqNxqyDnjtNEpO/o3VjJMJ/3j2b5
 CCRx+9JYENT8EsdpIFWasfABy66xbKPTE9RiMUbk1e73julXLfzIMI3/Ol/Bj7rQ
 YLs5XYrKcz3QfYgMvNS3nMbI3w3nJrCnzdV7STps8nyaOa1oQndGxe9b7tDb+Fb8
 /acuYuQclbvsAkzvH4jc
 =qwXe
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw

Pull gfs2 updates from Steven Whitehouse:
 "The main feature of interest this time is quota updates.  There are
  some clean ups and some patches to use the new generic lru list code.

  There is still plenty of scope for some further changes in due course -
  faster lookups of quota structures is very much on the todo list.
  Also, a start has been made towards the more tricky issue of using the
  generic lru code with glocks, but that will have to be completed in a
  subsequent merge window.

  The other, more minor feature, is that there have been a number of
  performance patches which relate to block allocation.  In particular
  they will improve performance when the disk is nearly full"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Use generic list_lru for quota
  GFS2: Rename quota qd_lru_lock qd_lock
  GFS2: Use reflink for quota data cache
  GFS2: Use lockref for glocks
  GFS2: Protect quota sync generation
  GFS2: Inline qd_trylock into gfs2_quota_unlock
  GFS2: Make two similar quota code fragments into a function
  GFS2: Remove obsolete quota tunable
  GFS2: Move gfs2_icbit_munge into quota.c
  GFS2: Speed up starting point selection for block allocation
  GFS2: Add allocation parameters structure
  GFS2: Clean up reservation removal
  GFS2: fix dentry leaks
  GFS2: new function gfs2_rbm_incr
  GFS2: Introduce rbm field bii
  GFS2: Do not reset flags on active reservations
  GFS2: introduce bi_blocks for optimization
  GFS2: optimize rbm_from_block wrt bi_start
  GFS2: d_splice_alias() can't return error
2013-11-11 07:11:00 +09:00
Al Viro
87dc800be2 new helper: kfree_put_link()
duplicated to hell and back...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:49 -04:00
Steven Whitehouse
7b9cff4671 GFS2: Add allocation parameters structure
This patch adds a structure to contain allocation parameters with
the intention of future expansion of this structure. The idea is
that we should be able to add more information about the allocation
in the future in order to allow the allocator to make a better job
of placing the requests on-disk.

There is no functional difference from applying this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-02 11:13:25 +01:00
Steven Whitehouse
af5c269799 GFS2: Clean up reservation removal
The reservation for an inode should be cleared when it is truncated so
that we can start again at a different offset for future allocations.
We could try and do better than that, by resetting the search based on
where the truncation started from, but this is only a first step.

In addition, there are three callers of gfs2_rs_delete() but only one
of those should really be testing the value of i_writecount. While
we get away with that in the other cases currently, I think it would
be better if we made that test specific to the one case which
requires it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-27 12:49:33 +01:00
Miklos Szeredi
5ca1db41ec GFS2: fix dentry leaks
We need to dput() the result of d_splice_alias(), unless it is passed to
finish_no_open().

Edited by Steven Whitehouse in order to make it apply to the current
GFS2 git tree, and taking account of a prerequisite patch which hasn't
been applied.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: stable@vger.kernel.org
2013-09-23 13:30:57 +01:00
Miklos Szeredi
0d0d110720 GFS2: d_splice_alias() can't return error
unless it was given an IS_ERR(inode), which isn't the case here.  So clean
up the unnecessary error handling in gfs2_create_inode().

This paves the way for real fixes (hence the stable Cc).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: stable@vger.kernel.org
2013-09-17 10:04:07 +01:00
Miklos Szeredi
c5bf8fef52 gfs2: set FILE_CREATED
In gfs2_create_inode() set FILE_CREATED in *opened.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-16 19:17:24 -04:00
Steven Whitehouse
7bd9ee58a4 GFS2: Check for glock already held in gfs2_getxattr
Since the introduction of atomic_open, gfs2_getxattr can be
called with the glock already held, so we need to allow for
this.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
2013-08-19 09:33:57 +01:00
Steven Whitehouse
2523d47a79 GFS2: Fix typo in gfs2_create_inode()
PTR_RET should be PTR_ERR

Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-08-19 09:32:29 +01:00
Steven Whitehouse
6d4ade986f GFS2: Add atomic_open support
I've restricted atomic_open to only operate on regular files, although
I still don't understand why atomic_open should not be possible also for
directories on GFS2. That can always be added in later though, if it
makes sense.

The ->atomic_open function can be passed negative dentries, which
in most cases means either ENOENT (->lookup) or a call to d_instantiate
(->create). In the GFS2 case though, we need to actually perform the
look up, since we do not know whether there has been a new inode created
on another node. The look up calls d_splice_alias which then tries to
rehash the dentry - so the solution here is to simply check for that
in d_splice_alias. The same issue is likely to affect any other cluster
filesystem implementing ->atomic_open

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields fieldses org>
Cc: Jeff Layton <jlayton@redhat.com>
2013-06-14 11:17:15 +01:00
Steven Whitehouse
5a00f3cc97 GFS2: Only do one directory search on create
Creation of a new inode requires a directory search in order to ensure
that we are not trying to create an inode with the same name as an
existing one. This was hidden away inside the create_ok() function.

In the case that there was an existing inode, and a lookup can be
substituted for a create (which is the case with regular files
when the O_EXCL flag is not in use) then we were doing a second
lookup in order to return the inode.

This patch merges these two lookups into one. This can be done by
passing a flag to gfs2_dir_search() to tell it to just return -EEXIST
in the cases where we don't actually want to look up the inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-11 13:45:29 +01:00
Thomas Meyer
2b6f8860e1 GFS2: Cocci spatch "ptr_ret.spatch"
Use PTR_RET in place of open coding this function.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-05 09:51:08 +01:00
Bob Peterson
a6a4d98b01 GFS2: Don't cache iopen glocks
This patch makes GFS2 immediately reclaim/delete all iopen glocks
as soon as they're dequeued. This allows deleters to get an
EXclusive lock on iopen so files are deleted properly instead of
being set as unlinked.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:40:22 +01:00
Steven Whitehouse
79ba74808d GFS2: Use gfs2_dinode_out() in the inode create path
Over the previous two patches relating to inode creation, the
content of init_dinode() has been looking more and more like
gfs2_dinode_out(). This is not an accident! This patch replaces
the parts of init_dinode() which are duplicated in gfs2_dinode_out()
with a call to that function.

Mostly that is straightforward, but there is one issue which needed
to be resolved relating to the link count. The link count has to be
set to zero in a certain error handling code path, which lands up
calling iput(). This is now done specifically in that code path
allowing the link count to be set earlier and written into the
on disk inode by gfs2_dinode_put() in the normal way.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08 08:40:37 +01:00
Steven Whitehouse
28fb302755 GFS2: Remove gfs2_refresh_inode from inode creation path
The original method for creating inodes used in GFS2 was to fill
out a buffer, with all the information, and then to read that
buffer into the in-core inode, using gfs2_refresh_inode()

The problem with this approach is that all the inode's fields
need to be calculated ahead of time, and were stored in various
variables making the code rather complicated.

The new approach is simply to allocate the in-core inode earlier
and fill in as many fields as possible ahead of time. These can
then be used to initilise the on disk representation. The
code has been working towards the point where it is possible
to remove gfs2_refresh_inode() because all the fields are
correctly initialised ahead of time. We've now reached that
milestone, and have reversed the order of setting up the in
core and on disk inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08 08:40:17 +01:00
Steven Whitehouse
fd4b4e042c GFS2: Clean up inode creation path
This patch cleans up the inode creation code path in GFS2. After the
Orlov allocator was merged, a number of potential improvements are
now possible, and this is a first set of these.

The quota handling is now updated so that it matches the point in
the code where the allocation takes place. This means that the one
exception in gfs2_alloc_blocks relating to quota is now no longer
required, and we can use the generic code everywhere.

In addition the call to figure out whether we need to allocate any
extra blocks in order to add a directory entry is moved higher up
gfs2_create_inode. This means that if it returns an error, we
can deal with that at a stage where it is easier to handle that case.
The returned status cannot change during the function since we hold
an exclusive lock on the directory.

Two calls to gfs2_rindex_update have been changed to one, again at
the top of gfs2_create_inode to simplify error handling.

The time stamps are also now initialised earlier in the creation
process, this is gradually moving towards being able to remove the
call to gfs2_refresh_inode in gfs2_inode_create once we have all the
fields covered.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08 08:39:56 +01:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Eric W. Biederman
d054642642 gfs2: Convert uids and gids between dinodes and vfs inodes.
When reading dinodes from the disk convert uids and gids
into kuids and kgids to store in vfs data structures.

When writing to dinodes to the disk convert kuids and kgids
in the in memory structures into plain uids and gids.

For now all on disk data structures are assumed to be
stored in the initial user namespace.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:11 -08:00
Eric W. Biederman
6b24c0d279 gfs2: Use uid_eq and gid_eq where appropriate
Where kuid_t values are compared use uid_eq and where kgid_t values
are compared use gid_eq.  This is unfortunately necessary because
of the type safety that keeps someone from accidentally mixing
kuids and kgids with other types.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:10 -08:00
Eric W. Biederman
7c06b5d672 gfs2: Use kuid_t and kgid_t types where appropriate.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:09 -08:00
Eric W. Biederman
f4108a607f gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
so the constants may be well typed.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:01 -08:00
Steven Whitehouse
350a9b0a72 GFS2: Split gfs2_trans_add_bh() into two
There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:04 +00:00
Bob Peterson
1e2d9d44f3 GFS2: Set gl_object during inode create
This patch fixes a cluster coherency problem that occurs when one
node creates a file, does several writes, then a different node
tries to write to the same file. When the inode's glock is demoted,
the inode wasn't synced to the media properly because the gl_object
wasn't set. Later, the flush daemon noticed the uncommitted data
and tried to flush it, only to discover the glock was no longer locked
properly in exclusive mode. That caused an assert withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-21 14:49:21 +00:00
Bob Peterson
be4f245dbb GFS2: add error check while allocating new inodes
This patch adds a return code check after attempting to allocate
a new inode during dinode creation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-16 14:26:57 +00:00
Bob Peterson
4327a9bf71 GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode
Since we now have a dirty_inode that takes care of manipulating the
inode buffer and writing from the inode to the buffer, we can
eliminate some unnecessary buffer manipulations in gfs2_unlink_inode
that are now redundant.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 09:55:26 +00:00
Steven Whitehouse
9dbe9610b9 GFS2: Add Orlov allocator
Just like ext3, this works on the root directory and any directory
with the +T flag set. Also, just like ext3, any subdirectory created
in one of the just mentioned cases will be allocated to a random
resource group (GFS2 equivalent of a block group).

If you are creating a set of directories, each of which will contain a
job running on a different node, then by setting +T on the parent
directory before creating the subdirectories, each will land up in a
different resource group, and thus resource group contention between
nodes will be kept to a minimum.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:33:17 +00:00
Steven Whitehouse
c9aecf7371 GFS2: Use proper allocation context for new inodes
Rather than using the parent directory's allocation context, this
patch allocated the new inode earlier in the process and then uses
it to contain all the information required. As a result, we can now
use the new inode's own allocation context to allocate it rather
than having to use the parent directory's context. This give us a
lot more flexibility in where the inode is placed on disk.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:32:42 +00:00
Steven Whitehouse
ff7f4cb461 GFS2: Consolidate free block searching functions
With the recently added block reservation code, an additional function
was added to search for free blocks. This had a restriction of only being
able to search for aligned extents of free blocks. As a result the
allocation patterns when reserving blocks were suboptimal when the
existing allocation of blocks for an inode was not aligned to the same
boundary.

This patch resolves that problem by adding the ability for gfs2_rbm_find
to search for extents of a particular minimum size. We can then use
gfs2_rbm_find for both looking for reservations, and also looking for
free blocks on an individual basis when we actually come to do the
allocation later on. As a result we only need a single set of code
to deal with both situations.

The function gfs2_rbm_from_block() is moved up rgrp.c so that it
occurs before all of its callers.

Many thanks are due to Bob for helping track down the final issue in
this patch. That fix to the rb_tree traversal and to not share
block reservations from a dirctory to its children is included here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:26 +01:00
Steven Whitehouse
71f890f7f7 GFS2: Remove rs_requested field from reservations
The rs_requested field is left over from the original allocation
code, however this should have been a parameter passed to the
various functions from gfs2_inplace_reserve() and not a member of the
reservation structure as the value is not required after the
initial allocation.

This also helps simplify the code since we no longer need to set
the rs_requested to zero. Also the gfs2_inplace_release()
function can also be simplified since the reservation structure
will always be defined when it is called, and the only remaining
task is to unlock the rgrp if required. It can also now be
called unconditionally too, resulting in a further simplification.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:54 +01:00
Steven Whitehouse
645b2ccc75 GFS2: Fix missing allocation data for set/remove xattr
These entry points were missed in the original patch to allocate
this data structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-13 10:30:34 +01:00
Linus Torvalds
801b03653f Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Eliminate 64-bit divides
  GFS2: Reduce file fragmentation
  GFS2: kernel panic with small gfs2 filesystems - 1 RG
  GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
  GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
  GFS2: Add kobject release method
  GFS2: Size seq_file buffer more carefully
  GFS2: Use seq_vprintf for glocks debugfs file
  seq_file: Add seq_vprintf function and export it
  GFS2: Use lvbs for storing rgrp information with mount option
  GFS2: Cache last hash bucket for glock seq_files
  GFS2: Increase buffer size for glocks and glstats debugfs files
  GFS2: Fix error handling when reading an invalid block from the journal
  GFS2: Add "top dir" flag support
  GFS2: Fold quota data into the reservations struct
  GFS2: Extend the life of the reservations
2012-07-24 17:57:05 -07:00
Bob Peterson
8e2e004735 GFS2: Reduce file fragmentation
This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common dirctory.

The implementation uses an rbtree of reservations attached to each
resource group (and each inode).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-19 14:51:08 +01:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Bob Peterson
5407e24229 GFS2: Fold quota data into the reservations struct
This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:20:22 +01:00
Bob Peterson
0a305e4960 GFS2: Extend the life of the reservations
This patch lengthens the lifespan of the reservations structure for
inodes. Before, they were allocated and deallocated for every write
operation. With this patch, they are allocated when the first write
occurs, and deallocated when the last process closes the file.
It's more efficient to do it this way because it saves GFS2 a lot of
unnecessary allocates and frees. It also gives us more flexibility
for the future: (1) we can now fold the qadata structure back into
the structure and save those alloc/frees, (2) we can use this for
multi-block reservations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:17:59 +01:00
Bob Peterson
5e2f7d617b GFS2: Make sure rindex is uptodate before starting transactions
This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-05 10:20:10 +01:00
Steven Whitehouse
66fc061bda GFS2: FITRIM ioctl support
The FITRIM ioctl provides an alternative way to send discard requests to
the underlying device. Using the discard mount option results in every
freed block generating a discard request to the block device. This can
be slow, since many block devices can only process discard requests of
larger sizes, and also such operations can be time consuming.

Rather than using the discard mount option, FITRIM allows a sweep of the
filesystem on an occasional basis, and also to optionally avoid sending
down discard requests for smaller regions.

In GFS2 FITRIM will work at resource group granularity. There is a flag
for each resource group which keeps track of which resource groups have
been trimmed. This flag is reset whenever a deallocation occurs in the
resource group, and set whenever a successful FITRIM of that resource
group has taken place. This helps to reduce repeated discard requests
for the same block ranges, again improving performance.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 17:10:21 +00:00
Steven Whitehouse
a365fbf354 GFS2: Read resource groups on mount
This makes mount take slightly longer, but at the same time, the first
write to the filesystem will be faster too. It also means that if there
is a problem in the resource index, then we can refuse to mount rather
than having to try and report that when the first write occurs.

In addition, to avoid recursive locking, we hvae to take account of
instances when the rindex glock may already be held when we are
trying to update the rbtree of resource groups.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:52:39 +00:00
Bob Peterson
718b97bd6b GFS2: Read in rindex if necessary during unlink
This patch fixes a problem whereby you were unable to delete
files until other file system operations were done (such as
statfs, touch, writes, etc.) that caused the rindex to be
read in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:48:02 +00:00
Steven Whitehouse
66ad863b41 GFS2: Fix nlink setting on inode creation
Since the nlink count will be 0, we need to use set_nlink rather
than inc_nlink in order to avoid triggering the inc_nlink warning
which was added recently.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-11 12:35:05 +00:00
Linus Torvalds
1619ed8f60 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: local functions should be static
  GFS2: We only need one ACL getting function
  GFS2: Fix multi-block allocation
  GFS2: decouple quota allocations from block allocations
  GFS2: split function rgblk_search
  GFS2: Fix up "off by one" in the previous patch
  GFS2: move toward a generic multi-block allocator
  GFS2: O_(D)SYNC support for fallocate
  GFS2: remove vestigial al_alloced
  GFS2: combine gfs2_alloc_block and gfs2_alloc_di
  GFS2: Add non-try locks back to get_local_rgrp
  GFS2: f_ra is always valid in dir readahead function
  GFS2: Fix very unlikley memory leak in ACL xattr code
  GFS2: More automated code analysis fixes
  GFS2: Add readahead to sequential directory traversal
  GFS2: Fix up REQ flags
2012-01-08 13:07:54 -08:00
Al Viro
175a4eb7ea fs: propagate umode_t, misc bits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro
1a67aafb5f switch ->mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro
4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
H Hartley Sweeten
46cc1e5fce GFS2: local functions should be static
Quiets the sparse noise:

warning: symbol 'gfs2_initxattrs' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-12-06 09:46:41 +00:00
Steven Whitehouse
6a8099ed56 GFS2: Fix multi-block allocation
Clean up gfs2_alloc_blocks so that it takes the full extent length
rather than just the number of non-inode blocks as an argument. That
will only make a difference in the inode allocation case for now.

Also, this fixes the extent length handling around gfs2_alloc_extent() so
that multi block allocations will work again.

The rd_last_alloc block is set to the final block in the allocated
extent (as per the update to i_goal, but referenced to a different
start point).

This also removes the dinode argument to rgblk_search() which is no
longer used.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 12:18:51 +00:00
Bob Peterson
564e12b115 GFS2: decouple quota allocations from block allocations
This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 10:25:21 +00:00
Bob Peterson
6e87ed0fc9 GFS2: move toward a generic multi-block allocator
This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 10:04:09 +00:00
Bob Peterson
3c5d785acf GFS2: combine gfs2_alloc_block and gfs2_alloc_di
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
the same things, with a few exceptions. This patch combines
the two functions into a slightly more generic gfs2_alloc_block.
Having one centralized block allocation function will reduce
code redundancy and make it easier to implement multi-block
reservations to reduce file fragmentation in the future.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15 15:25:03 +00:00
Steven Whitehouse
87654896ca GFS2: More automated code analysis fixes
A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08 14:04:20 +00:00
Linus Torvalds
f793f29611 Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw
* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits)
  GFS2: Move readahead of metadata during deallocation into its own function
  GFS2: Remove two unused variables
  GFS2: Misc fixes
  GFS2: rewrite fallocate code to write blocks directly
  GFS2: speed up delete/unlink performance for large files
  GFS2: Fix off-by-one in gfs2_blk2rgrpd
  GFS2: Clean up ->page_mkwrite
  GFS2: Correctly set goal block after allocation
  GFS2: Fix AIL flush issue during fsync
  GFS2: Use cached rgrp in gfs2_rlist_add()
  GFS2: Call do_strip() directly from recursive_scan()
  GFS2: Remove obsolete assert
  GFS2: Cache the most recently used resource group in the inode
  GFS2: Make resource groups "append only" during life of fs
  GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme
  GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added
  GFS2: Clean up gfs2_create
  GFS2: Use ->dirty_inode()
  GFS2: Fix bug trap and journaled data fsync
  GFS2: Fix inode allocation error path
  ...
2011-10-28 10:44:50 -07:00
Steven Whitehouse
54335b1fca GFS2: Cache the most recently used resource group in the inode
This means that after the initial allocation for any inode, the
last used resource group is cached in the inode for future use.
This drastically reduces the number of lookups of resource
groups in the common case, and this the contention on that
data structure.

The allocation algorithm is the same as previously, except that we
always check to see if the goal block is within the cached rgrp
first before going to the rbtree to look one up.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:34 +01:00
Steven Whitehouse
8339ee543e GFS2: Make resource groups "append only" during life of fs
Since we have ruled out supporting online filesystem shrink,
it is possible to make the resource group list append only
during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added
rather than needing to reread the whole rindex file each time one
element is added.

Secondly, the rindex glock can be held for much shorter periods of
time, and is completely removed from the fast path for allocations.
The lock is taken in shared mode only when updating the resource
groups when the first allocation occurs, and after a grow has
taken place.

Thirdly, this results in a reduction in code size, and everything
gets a lot simpler to understand in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:33 +01:00
Steven Whitehouse
9a63edd12b GFS2: Clean up gfs2_create
If we pass through knowledge of whether the creation is intended to be
exclusive or not, then we can deal with that in gfs2_create_inode
and remove one set of locking. Also this removes the loop in
gfs2_create and simplifies the code a bit.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:28 +01:00
Steven Whitehouse
ab9bbda020 GFS2: Use ->dirty_inode()
The aim of this patch is to use the newly enhanced ->dirty_inode()
super block operation to deal with atime updates, rather than
piggy backing that code into ->write_inode() as is currently
done.

The net result is a simplification of the code in various places
and a reduction of the number of gfs2_dinode_out() calls since
this is now implied by ->dirty_inode().

Some of the mark_inode_dirty() calls have been moved under glocks
in order to take advantage of then being able to avoid locking in
->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals
with file size updates, so that we do not need a separate check
for that afterwards. This also, indirectly, means that fdatasync
should work correctly on GFS2 - the current code always syncs the
metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and
also fsx.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:26 +01:00
Steven Whitehouse
40ac218f52 GFS2: Fix inode allocation error path
If we have got far enough through the inode allocation code
path that an inode has already been allocated, then we must
call iput to dispose of it, if an error occurs during a
later part of the process. This will always be the final iput
since there will be no other references to the inode.

Unlike when the inode has been unlinked, its block state will
be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
to skip the test in ->evict_inode() for this one case in order
to ensure that it will be deallocated correctly. This patch adds
a new flag in order to ensure that this will happen correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:23 +01:00
James Morris
5a2f3a02ae Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next
Conflicts:
	fs/attr.c

Resolve conflict manually.

Signed-off-by: James Morris <jmorris@namei.org>
2011-08-09 10:31:03 +10:00
Christoph Hellwig
4e34e719e4 fs: take the ACL checks to common code
Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss.  This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:30:23 -04:00
Al Viro
6c673ab393 simplify gfs2_lookup()
d_splice_alias() will DTRT when given NULL or ERR_PTR

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:02 -04:00
Al Viro
10556cb21a ->permission() sanitizing: don't pass flags to ->permission()
not used by the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:24 -04:00
Al Viro
2830ba7f34 ->permission() sanitizing: don't pass flags to generic_permission()
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:22 -04:00
Al Viro
178ea73521 kill check_acl callback of generic_permission()
its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:16 -04:00
Mimi Zohar
9d8f13ba3f security: new security_inode_init_security API adds function callback
This patch changes the security_inode_init_security API by adding a
filesystem specific callback to write security extended attributes.
This change is in preparation for supporting the initialization of
multiple LSM xattrs and the EVM xattr.  Initially the callback function
walks an array of xattrs, writing each xattr separately, but could be
optimized to write multiple xattrs at once.

For existing security_inode_init_security() calls, which have not yet
been converted to use the new callback function, such as those in
reiserfs and ocfs2, this patch defines security_old_inode_init_security().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2011-07-18 12:29:38 -04:00
Steven Whitehouse
f2741d9898 GFS2: Move all locking inside the inode creation function
Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 12:11:17 +01:00
Steven Whitehouse
160b4026dc GFS2: Clean up symlink creation
This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 10:34:59 +01:00
Steven Whitehouse
e2d0a13bba GFS2: Clean up mkdir
This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 09:55:55 +01:00
Steven Whitehouse
2ab9cd1c63 GFS2: Rename ops_inode.c to inode.c
This is the final part of the ops_inode.c/inode.c reordering. We
are left with a single file called inode.c which now contains
all the inode operations, as expected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:12:49 +01:00
Steven Whitehouse
64ea540258 GFS2: Inode.c is empty now, remove it
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:09:53 +01:00
Steven Whitehouse
9eed04cd99 GFS2: Move final part of inode.c into super.c
Now inode.c is empty.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:38 +01:00
Steven Whitehouse
194c011fc4 GFS2: Move most of the remaining inode.c into ops_inode.c
This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and its easier when its all in
one place.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:14 +01:00
Steven Whitehouse
d4b2cf1b05 GFS2: Move gfs2_refresh_inode() and friends into glops.c
Eventually there will only be a single caller of this code, so lets
move it where it can be made static at some future date.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:49 +01:00
Steven Whitehouse
94fb763b1a GFS2: Remove gfs2_dinode_print() function
This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:29 +01:00
Steven Whitehouse
3d6ecb7d16 GFS2: When adding a new dir entry, inc link count if it is a subdir
This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically
update the parent directory's link count if required, that makes
the code shorter and simpler than before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:43:53 +01:00
Steven Whitehouse
d192a8e5c6 GFS2: Double check link count under glock
To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
Also to ensure we never get any nasty surprises, this patch also
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:35:40 +01:00
Steven Whitehouse
4667a0ec32 GFS2: Make writeback more responsive to system conditions
This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:37 +01:00
Steven Whitehouse
f42ab08529 GFS2: Optimise glock lru and end of life inodes
The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:17 +01:00
Bob Peterson
44ad37d69b GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:23:50 +01:00
James Morris
fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
Eric Paris
2a7dba391e fs/vfs/security: pass last path component to LSM on inode creation
SELinux would like to implement a new labeling behavior of newly created
inodes.  We currently label new inodes based on the parent and the creating
process.  This new behavior would also take into account the name of the
new object when deciding the new label.  This is not the (supposed) full path,
just the last component of the path.

This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations.  We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly.  This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook.  If no such name
exists it is fine to pass NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2011-02-01 11:12:29 -05:00
Steven Whitehouse
24d9765fc1 GFS2: Fix error path in gfs2_lookup_by_inum()
In the (impossible, except if there is fs corruption) error path
in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
fails, it was leaving the function by calling iput() rather
than iget_failed(). This would cause future lookups of the same
inode to block forever.

This patch fixes the problem by moving the call to gfs2_inode_refresh()
into gfs2_inode_lookup() where iget_failed() is part of the error path
already. Also this cleans up some unreachable code and makes
gfs2_set_iop() static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:49:08 +00:00
Linus Torvalds
b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin
b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Steven Whitehouse
9e55cd5372 GFS2: Remove unreachable calls to vmtruncate
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:22:48 +00:00
Steven Whitehouse
044b9414c7 GFS2: Fix inode deallocation race
This area of the code has always been a bit delicate due to the
subtleties of lock ordering. The problem is that for "normal"
alloc/dealloc, we always grab the inode locks first and the rgrp lock
later.

In order to ensure no races in looking up the unlinked, but still
allocated inodes, we need to hold the rgrp lock when we do the lookup,
which means that we can't take the inode glock.

The solution is to borrow the technique already used by NFS to solve
what is essentially the same problem (given an inode number, look up
the inode carefully, checking that it really is in the expected
state).

We cannot do that directly from the allocation code (lock ordering
again) so we give the job to the pre-existing delete workqueue and
carry on with the allocation as normal.

If we find there is no space, we do a journal flush (required anyway
if space from a deallocation is to be released) which should block
against the pending deallocations, so we should always get the space
back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-15 12:44:42 +00:00
Steven Whitehouse
a2e0f79939 GFS2: Remove i_disksize
With the update of the truncate code, ip->i_disksize and
inode->i_size are merely copies of each other. This means
we can remove ip->i_disksize and use inode->i_size exclusively
reducing the size of a GFS2 inode by 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-20 11:18:29 +01:00
Al Viro
a4ffdde6e5 simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:44 -04:00
Christoph Hellwig
1025774ce4 remove inode_setattr
Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:37 -04:00