linux-stable

Commit Graph

Author	SHA1	Message	Date
Steven Whitehouse	5a00f3cc97	GFS2: Only do one directory search on create Creation of a new inode requires a directory search in order to ensure that we are not trying to create an inode with the same name as an existing one. This was hidden away inside the create_ok() function. In the case that there was an existing inode, and a lookup can be substituted for a create (which is the case with regular files when the O_EXCL flag is not in use) then we were doing a second lookup in order to return the inode. This patch merges these two lookups into one. This can be done by passing a flag to gfs2_dir_search() to tell it to just return -EEXIST in the cases where we don't actually want to look up the inode. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-11 13:45:29 +01:00
Thomas Meyer	2b6f8860e1	GFS2: Cocci spatch "ptr_ret.spatch" Use PTR_RET in place of open coding this function. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-05 09:51:08 +01:00
Bob Peterson	a6a4d98b01	GFS2: Don't cache iopen glocks This patch makes GFS2 immediately reclaim/delete all iopen glocks as soon as they're dequeued. This allows deleters to get an EXclusive lock on iopen so files are deleted properly instead of being set as unlinked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-03 16:40:22 +01:00
Steven Whitehouse	79ba74808d	GFS2: Use gfs2_dinode_out() in the inode create path Over the previous two patches relating to inode creation, the content of init_dinode() has been looking more and more like gfs2_dinode_out(). This is not an accident! This patch replaces the parts of init_dinode() which are duplicated in gfs2_dinode_out() with a call to that function. Mostly that is straightforward, but there is one issue which needed to be resolved relating to the link count. The link count has to be set to zero in a certain error handling code path, which lands up calling iput(). This is now done specifically in that code path allowing the link count to be set earlier and written into the on disk inode by gfs2_dinode_put() in the normal way. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:40:37 +01:00
Steven Whitehouse	28fb302755	GFS2: Remove gfs2_refresh_inode from inode creation path The original method for creating inodes used in GFS2 was to fill out a buffer, with all the information, and then to read that buffer into the in-core inode, using gfs2_refresh_inode() The problem with this approach is that all the inode's fields need to be calculated ahead of time, and were stored in various variables making the code rather complicated. The new approach is simply to allocate the in-core inode earlier and fill in as many fields as possible ahead of time. These can then be used to initilise the on disk representation. The code has been working towards the point where it is possible to remove gfs2_refresh_inode() because all the fields are correctly initialised ahead of time. We've now reached that milestone, and have reversed the order of setting up the in core and on disk inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:40:17 +01:00
Steven Whitehouse	fd4b4e042c	GFS2: Clean up inode creation path This patch cleans up the inode creation code path in GFS2. After the Orlov allocator was merged, a number of potential improvements are now possible, and this is a first set of these. The quota handling is now updated so that it matches the point in the code where the allocation takes place. This means that the one exception in gfs2_alloc_blocks relating to quota is now no longer required, and we can use the generic code everywhere. In addition the call to figure out whether we need to allocate any extra blocks in order to add a directory entry is moved higher up gfs2_create_inode. This means that if it returns an error, we can deal with that at a stage where it is easier to handle that case. The returned status cannot change during the function since we hold an exclusive lock on the directory. Two calls to gfs2_rindex_update have been changed to one, again at the top of gfs2_create_inode to simplify error handling. The time stamps are also now initialised earlier in the creation process, this is gradually moving towards being able to remove the call to gfs2_refresh_inode in gfs2_inode_create once we have all the fields covered. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:39:56 +01:00
Linus Torvalds	94f2f14234	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...	2013-02-25 16:00:49 -08:00
Eric W. Biederman	d054642642	gfs2: Convert uids and gids between dinodes and vfs inodes. When reading dinodes from the disk convert uids and gids into kuids and kgids to store in vfs data structures. When writing to dinodes to the disk convert kuids and kgids in the in memory structures into plain uids and gids. For now all on disk data structures are assumed to be stored in the initial user namespace. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:11 -08:00
Eric W. Biederman	6b24c0d279	gfs2: Use uid_eq and gid_eq where appropriate Where kuid_t values are compared use uid_eq and where kgid_t values are compared use gid_eq. This is unfortunately necessary because of the type safety that keeps someone from accidentally mixing kuids and kgids with other types. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:10 -08:00
Eric W. Biederman	7c06b5d672	gfs2: Use kuid_t and kgid_t types where appropriate. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:09 -08:00
Eric W. Biederman	f4108a607f	gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE so the constants may be well typed. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:01 -08:00
Steven Whitehouse	350a9b0a72	GFS2: Split gfs2_trans_add_bh() into two There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-29 10:28:04 +00:00
Bob Peterson	1e2d9d44f3	GFS2: Set gl_object during inode create This patch fixes a cluster coherency problem that occurs when one node creates a file, does several writes, then a different node tries to write to the same file. When the inode's glock is demoted, the inode wasn't synced to the media properly because the gl_object wasn't set. Later, the flush daemon noticed the uncommitted data and tried to flush it, only to discover the glock was no longer locked properly in exclusive mode. That caused an assert withdraw. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-21 14:49:21 +00:00
Bob Peterson	be4f245dbb	GFS2: add error check while allocating new inodes This patch adds a return code check after attempting to allocate a new inode during dinode creation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-16 14:26:57 +00:00
Bob Peterson	4327a9bf71	GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode Since we now have a dirty_inode that takes care of manipulating the inode buffer and writing from the inode to the buffer, we can eliminate some unnecessary buffer manipulations in gfs2_unlink_inode that are now redundant. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 09:55:26 +00:00
Steven Whitehouse	9dbe9610b9	GFS2: Add Orlov allocator Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just mentioned cases will be allocated to a random resource group (GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will land up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:33:17 +00:00
Steven Whitehouse	c9aecf7371	GFS2: Use proper allocation context for new inodes Rather than using the parent directory's allocation context, this patch allocated the new inode earlier in the process and then uses it to contain all the information required. As a result, we can now use the new inode's own allocation context to allocate it rather than having to use the parent directory's context. This give us a lot more flexibility in where the inode is placed on disk. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:32:42 +00:00
Steven Whitehouse	ff7f4cb461	GFS2: Consolidate free block searching functions With the recently added block reservation code, an additional function was added to search for free blocks. This had a restriction of only being able to search for aligned extents of free blocks. As a result the allocation patterns when reserving blocks were suboptimal when the existing allocation of blocks for an inode was not aligned to the same boundary. This patch resolves that problem by adding the ability for gfs2_rbm_find to search for extents of a particular minimum size. We can then use gfs2_rbm_find for both looking for reservations, and also looking for free blocks on an individual basis when we actually come to do the allocation later on. As a result we only need a single set of code to deal with both situations. The function gfs2_rbm_from_block() is moved up rgrp.c so that it occurs before all of its callers. Many thanks are due to Bob for helping track down the final issue in this patch. That fix to the rb_tree traversal and to not share block reservations from a dirctory to its children is included here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2012-09-24 10:47:26 +01:00
Steven Whitehouse	71f890f7f7	GFS2: Remove rs_requested field from reservations The rs_requested field is left over from the original allocation code, however this should have been a parameter passed to the various functions from gfs2_inplace_reserve() and not a member of the reservation structure as the value is not required after the initial allocation. This also helps simplify the code since we no longer need to set the rs_requested to zero. Also the gfs2_inplace_release() function can also be simplified since the reservation structure will always be defined when it is called, and the only remaining task is to unlock the rgrp if required. It can also now be called unconditionally too, resulting in a further simplification. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:46:54 +01:00
Steven Whitehouse	645b2ccc75	GFS2: Fix missing allocation data for set/remove xattr These entry points were missed in the original patch to allocate this data structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-13 10:30:34 +01:00
Linus Torvalds	801b03653f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 updates from Steven Whitehouse. * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Eliminate 64-bit divides GFS2: Reduce file fragmentation GFS2: kernel panic with small gfs2 filesystems - 1 RG GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve GFS2: Add kobject release method GFS2: Size seq_file buffer more carefully GFS2: Use seq_vprintf for glocks debugfs file seq_file: Add seq_vprintf function and export it GFS2: Use lvbs for storing rgrp information with mount option GFS2: Cache last hash bucket for glock seq_files GFS2: Increase buffer size for glocks and glstats debugfs files GFS2: Fix error handling when reading an invalid block from the journal GFS2: Add "top dir" flag support GFS2: Fold quota data into the reservations struct GFS2: Extend the life of the reservations	2012-07-24 17:57:05 -07:00
Bob Peterson	8e2e004735	GFS2: Reduce file fragmentation This patch reduces GFS2 file fragmentation by pre-reserving blocks. The resulting improved on disk layout greatly speeds up operations in cases which would have resulted in interlaced allocation of blocks previously. A typical example of this is 10 parallel dd processes, each writing to a file in a common dirctory. The implementation uses an rbtree of reservations attached to each resource group (and each inode). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-19 14:51:08 +01:00
Al Viro	ebfc3b49a7	don't pass nameidata to ->create() boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:47 +04:00
Al Viro	00cd8dd3bf	stop passing nameidata to ->lookup() Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:32 +04:00
Bob Peterson	5407e24229	GFS2: Fold quota data into the reservations struct This patch moves the ancillary quota data structures into the block reservations structure. This saves GFS2 some time and effort in allocating and deallocating the qadata structure. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-06 11:20:22 +01:00
Bob Peterson	0a305e4960	GFS2: Extend the life of the reservations This patch lengthens the lifespan of the reservations structure for inodes. Before, they were allocated and deallocated for every write operation. With this patch, they are allocated when the first write occurs, and deallocated when the last process closes the file. It's more efficient to do it this way because it saves GFS2 a lot of unnecessary allocates and frees. It also gives us more flexibility for the future: (1) we can now fold the qadata structure back into the structure and save those alloc/frees, (2) we can use this for multi-block reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-06 11:17:59 +01:00
Bob Peterson	5e2f7d617b	GFS2: Make sure rindex is uptodate before starting transactions This patch removes the call from gfs2_blk2rgrd to function gfs2_rindex_update and replaces it with individual calls. The former way turned out to be too problematic. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-05 10:20:10 +01:00
Steven Whitehouse	66fc061bda	GFS2: FITRIM ioctl support The FITRIM ioctl provides an alternative way to send discard requests to the underlying device. Using the discard mount option results in every freed block generating a discard request to the block device. This can be slow, since many block devices can only process discard requests of larger sizes, and also such operations can be time consuming. Rather than using the discard mount option, FITRIM allows a sweep of the filesystem on an occasional basis, and also to optionally avoid sending down discard requests for smaller regions. In GFS2 FITRIM will work at resource group granularity. There is a flag for each resource group which keeps track of which resource groups have been trimmed. This flag is reset whenever a deallocation occurs in the resource group, and set whenever a successful FITRIM of that resource group has taken place. This helps to reduce repeated discard requests for the same block ranges, again improving performance. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:10:21 +00:00
Steven Whitehouse	a365fbf354	GFS2: Read resource groups on mount This makes mount take slightly longer, but at the same time, the first write to the filesystem will be faster too. It also means that if there is a problem in the resource index, then we can refuse to mount rather than having to try and report that when the first write occurs. In addition, to avoid recursive locking, we hvae to take account of instances when the rindex glock may already be held when we are trying to update the rbtree of resource groups. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 09:52:39 +00:00
Bob Peterson	718b97bd6b	GFS2: Read in rindex if necessary during unlink This patch fixes a problem whereby you were unable to delete files until other file system operations were done (such as statfs, touch, writes, etc.) that caused the rindex to be read in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 09:48:02 +00:00
Steven Whitehouse	66ad863b41	GFS2: Fix nlink setting on inode creation Since the nlink count will be 0, we need to use set_nlink rather than inc_nlink in order to avoid triggering the inc_nlink warning which was added recently. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-01-11 12:35:05 +00:00
Linus Torvalds	1619ed8f60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: local functions should be static GFS2: We only need one ACL getting function GFS2: Fix multi-block allocation GFS2: decouple quota allocations from block allocations GFS2: split function rgblk_search GFS2: Fix up "off by one" in the previous patch GFS2: move toward a generic multi-block allocator GFS2: O_(D)SYNC support for fallocate GFS2: remove vestigial al_alloced GFS2: combine gfs2_alloc_block and gfs2_alloc_di GFS2: Add non-try locks back to get_local_rgrp GFS2: f_ra is always valid in dir readahead function GFS2: Fix very unlikley memory leak in ACL xattr code GFS2: More automated code analysis fixes GFS2: Add readahead to sequential directory traversal GFS2: Fix up REQ flags	2012-01-08 13:07:54 -08:00
Al Viro	175a4eb7ea	fs: propagate umode_t, misc bits Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:55:10 -05:00
Al Viro	1a67aafb5f	switch ->mknod() to umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:54 -05:00
Al Viro	4acdaf27eb	switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:53 -05:00
Al Viro	18bb1db3e7	switch vfs_mkdir() and ->mkdir() to umode_t vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:53 -05:00
H Hartley Sweeten	46cc1e5fce	GFS2: local functions should be static Quiets the sparse noise: warning: symbol 'gfs2_initxattrs' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-12-06 09:46:41 +00:00
Steven Whitehouse	6a8099ed56	GFS2: Fix multi-block allocation Clean up gfs2_alloc_blocks so that it takes the full extent length rather than just the number of non-inode blocks as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search() which is no longer used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-22 12:18:51 +00:00
Bob Peterson	564e12b115	GFS2: decouple quota allocations from block allocations This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-22 10:25:21 +00:00
Bob Peterson	6e87ed0fc9	GFS2: move toward a generic multi-block allocator This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-21 10:04:09 +00:00
Bob Peterson	3c5d785acf	GFS2: combine gfs2_alloc_block and gfs2_alloc_di GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-15 15:25:03 +00:00
Steven Whitehouse	87654896ca	GFS2: More automated code analysis fixes A potentially uninitialised variable, some unreachable code, and the main part of this, fixing the error path in the unlink function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-08 14:04:20 +00:00
Linus Torvalds	f793f29611	Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw * http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits) GFS2: Move readahead of metadata during deallocation into its own function GFS2: Remove two unused variables GFS2: Misc fixes GFS2: rewrite fallocate code to write blocks directly GFS2: speed up delete/unlink performance for large files GFS2: Fix off-by-one in gfs2_blk2rgrpd GFS2: Clean up ->page_mkwrite GFS2: Correctly set goal block after allocation GFS2: Fix AIL flush issue during fsync GFS2: Use cached rgrp in gfs2_rlist_add() GFS2: Call do_strip() directly from recursive_scan() GFS2: Remove obsolete assert GFS2: Cache the most recently used resource group in the inode GFS2: Make resource groups "append only" during life of fs GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added GFS2: Clean up gfs2_create GFS2: Use ->dirty_inode() GFS2: Fix bug trap and journaled data fsync GFS2: Fix inode allocation error path ...	2011-10-28 10:44:50 -07:00
Steven Whitehouse	54335b1fca	GFS2: Cache the most recently used resource group in the inode This means that after the initial allocation for any inode, the last used resource group is cached in the inode for future use. This drastically reduces the number of lookups of resource groups in the common case, and this the contention on that data structure. The allocation algorithm is the same as previously, except that we always check to see if the goal block is within the cached rgrp first before going to the rbtree to look one up. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:34 +01:00
Steven Whitehouse	8339ee543e	GFS2: Make resource groups "append only" during life of fs Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:33 +01:00
Steven Whitehouse	9a63edd12b	GFS2: Clean up gfs2_create If we pass through knowledge of whether the creation is intended to be exclusive or not, then we can deal with that in gfs2_create_inode and remove one set of locking. Also this removes the loop in gfs2_create and simplifies the code a bit. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:28 +01:00
Steven Whitehouse	ab9bbda020	GFS2: Use ->dirty_inode() The aim of this patch is to use the newly enhanced ->dirty_inode() super block operation to deal with atime updates, rather than piggy backing that code into ->write_inode() as is currently done. The net result is a simplification of the code in various places and a reduction of the number of gfs2_dinode_out() calls since this is now implied by ->dirty_inode(). Some of the mark_inode_dirty() calls have been moved under glocks in order to take advantage of then being able to avoid locking in ->dirty_inode() when we already have suitable locks. One consequence is that generic_write_end() now correctly deals with file size updates, so that we do not need a separate check for that afterwards. This also, indirectly, means that fdatasync should work correctly on GFS2 - the current code always syncs the metadata whether it needs to or not. Has survived testing with postmark (with and without atime) and also fsx. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:26 +01:00
Steven Whitehouse	40ac218f52	GFS2: Fix inode allocation error path If we have got far enough through the inode allocation code path that an inode has already been allocated, then we must call iput to dispose of it, if an error occurs during a later part of the process. This will always be the final iput since there will be no other references to the inode. Unlike when the inode has been unlinked, its block state will be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need to skip the test in ->evict_inode() for this one case in order to ensure that it will be deallocated correctly. This patch adds a new flag in order to ensure that this will happen correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:23 +01:00
James Morris	5a2f3a02ae	Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next Conflicts: fs/attr.c Resolve conflict manually. Signed-off-by: James Morris <jmorris@namei.org>	2011-08-09 10:31:03 +10:00
Christoph Hellwig	4e34e719e4	fs: take the ACL checks to common code Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:30:23 -04:00
Al Viro	6c673ab393	simplify gfs2_lookup() d_splice_alias() will DTRT when given NULL or ERR_PTR Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Al Viro	10556cb21a	->permission() sanitizing: don't pass flags to ->permission() not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:24 -04:00
Al Viro	2830ba7f34	->permission() sanitizing: don't pass flags to generic_permission() redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:22 -04:00
Al Viro	178ea73521	kill check_acl callback of generic_permission() its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:16 -04:00
Mimi Zohar	9d8f13ba3f	security: new security_inode_init_security API adds function callback This patch changes the security_inode_init_security API by adding a filesystem specific callback to write security extended attributes. This change is in preparation for supporting the initialization of multiple LSM xattrs and the EVM xattr. Initially the callback function walks an array of xattrs, writing each xattr separately, but could be optimized to write multiple xattrs at once. For existing security_inode_init_security() calls, which have not yet been converted to use the new callback function, such as those in reiserfs and ocfs2, this patch defines security_old_inode_init_security(). Signed-off-by: Mimi Zohar <zohar@us.ibm.com>	2011-07-18 12:29:38 -04:00
Steven Whitehouse	f2741d9898	GFS2: Move all locking inside the inode creation function Now that there are no longer any exceptions to the normal inode creation code path, we can move the parts of the locking code which were duplicated in mkdir/mknod/create/symlink into the inode create function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 12:11:17 +01:00
Steven Whitehouse	160b4026dc	GFS2: Clean up symlink creation This moves the symlink specific parts of inode creation into the function where we initialise the rest of the dinode. As a result we have one less place where we need to look up the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 10:34:59 +01:00
Steven Whitehouse	e2d0a13bba	GFS2: Clean up mkdir This moves the initialisation of the directory into the inode creation functions to avoid having to duplicate the lookup of the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 09:55:55 +01:00
Steven Whitehouse	2ab9cd1c63	GFS2: Rename ops_inode.c to inode.c This is the final part of the ops_inode.c/inode.c reordering. We are left with a single file called inode.c which now contains all the inode operations, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-10 13:12:49 +01:00
Steven Whitehouse	64ea540258	GFS2: Inode.c is empty now, remove it Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-10 13:09:53 +01:00
Steven Whitehouse	9eed04cd99	GFS2: Move final part of inode.c into super.c Now inode.c is empty. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-09 16:45:38 +01:00
Steven Whitehouse	194c011fc4	GFS2: Move most of the remaining inode.c into ops_inode.c This is in preparation to remove inode.c and rename ops_inode.c to inode.c. Also most of the functions which were left in inode.c relate to the creation and lookup of inodes. I'm intending to work on consolidating some of that code, and its easier when its all in one place. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-09 16:45:14 +01:00
Steven Whitehouse	d4b2cf1b05	GFS2: Move gfs2_refresh_inode() and friends into glops.c Eventually there will only be a single caller of this code, so lets move it where it can be made static at some future date. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-09 16:44:49 +01:00
Steven Whitehouse	94fb763b1a	GFS2: Remove gfs2_dinode_print() function This function was intended for debugging purposes, but it is not very useful. If we want to know what is on disk then all we need is a block number and gfs2_edit can give us much better information about what is there. Otherwise, if we are interested in what is stored in the in-core inode, it doesn't help us out there either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-09 16:44:29 +01:00
Steven Whitehouse	3d6ecb7d16	GFS2: When adding a new dir entry, inc link count if it is a subdir This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation. Now that both adding and removing directory entries automatically update the parent directory's link count if required, that makes the code shorter and simpler than before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-09 16:43:53 +01:00
Steven Whitehouse	d192a8e5c6	GFS2: Double check link count under glock To avoid any possible races relating to the link count, we need to recheck it under the inode's glock in all cases where it matters. Also to ensure we never get any nasty surprises, this patch also ensures that once the link count has hit zero it can never be elevated by rereading in data from disk. The only place we cannot provide a proper solution is in rename in the case where we are removing a target inode and we discover that the target inode has been already unlinked on another node. The race window is very small, and we return EAGAIN in this case to indicate what has happened. The proper solution would be to move the lookup parts of rename from the vfs into library calls which the fs could call directly, but that is potentially a very big job and this fix should cover most cases for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-05 12:35:40 +01:00
Steven Whitehouse	4667a0ec32	GFS2: Make writeback more responsive to system conditions This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:37 +01:00
Steven Whitehouse	f42ab08529	GFS2: Optimise glock lru and end of life inodes The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:17 +01:00
Bob Peterson	44ad37d69b	GFS2: filesystem hang caused by incorrect lock order This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-18 15:23:50 +01:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
Eric Paris	2a7dba391e	fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:12:29 -05:00
Steven Whitehouse	24d9765fc1	GFS2: Fix error path in gfs2_lookup_by_inum() In the (impossible, except if there is fs corruption) error path in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh() fails, it was leaving the function by calling iput() rather than iget_failed(). This would cause future lookups of the same inode to block forever. This patch fixes the problem by moving the call to gfs2_inode_refresh() into gfs2_inode_lookup() where iget_failed() is part of the error path already. Also this cleans up some unreachable code and makes gfs2_set_iop() static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:49:08 +00:00
Linus Torvalds	b4a45f5fe8	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...	2011-01-07 08:56:33 -08:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Steven Whitehouse	9e55cd5372	GFS2: Remove unreachable calls to vmtruncate Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:22:48 +00:00
Steven Whitehouse	044b9414c7	GFS2: Fix inode deallocation race This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-15 12:44:42 +00:00
Steven Whitehouse	a2e0f79939	GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:29 +01:00
Al Viro	a4ffdde6e5	simplify checks for I_CLEAR/I_FREEING add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING \| I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:44 -04:00
Christoph Hellwig	1025774ce4	remove inode_setattr Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:37 -04:00
Bob Peterson	b1becbdee7	GFS2: Fix kernel NULL pointer dereference by dlm_astd This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:06:25 +01:00
Bob Peterson	ed4878e8a4	GFS2: Rework reclaiming unlinked dinodes The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-21 16:11:36 +01:00
Bob Peterson	1a0eae8848	GFS2: glock livelock This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-04-14 16:48:05 +01:00
Steven Whitehouse	009d851837	GFS2: Metadata address space clean up Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-03-01 14:07:37 +00:00
Christoph Hellwig	eaff8079d4	kill I_LOCK After I_SYNC was split from I_LOCK the leftover is always used together with I_NEW and thus superflous. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-17 11:03:25 -05:00
Christoph Hellwig	431547b3c4	sanitize xattr handler prototypes Add a flags argument to struct xattr_handler and pass it to all xattr handler methods. This allows using the same methods for multiple handlers, e.g. for the ACL methods which perform exactly the same action for the access and default ACLs, just using a different underlying attribute. With a little more groundwork it'll also allow sharing the methods for the regular user/trusted/secure handlers in extN, ocfs2 and jffs2 like it's already done for xfs in this patch. Also change the inode argument to the handlers to a dentry to allow using the handlers mechnism for filesystems that require it later, e.g. cifs. [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:49 -05:00
Steven Whitehouse	0ab7d13fcb	GFS2: Tag all metadata with jid There are two spare field in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 11:58:47 +00:00
Steven Whitehouse	479c427dd6	GFS2: Clean up ACLs To prepare for support for caching of ACLs, this cleans up the GFS2 ACL support by pushing the xattr code back into xattr.c and changing the acl_get function into one which only returns ACLs so that we can drop the caching function into it shortly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 11:47:35 +00:00
Steven Whitehouse	8d8291ae93	GFS2: Remove no_formal_ino generating code The inum structure used throughout GFS2 has two fields. One no_addr is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-27 15:51:07 +01:00
Steven Whitehouse	307cf6e63c	GFS2: Rename eattr.[ch] as xattr.[ch] Use the more conventional name for the extended attribute support code. Update all the places which care. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-26 18:51:04 +01:00
Steven Whitehouse	40b78a3223	GFS2: Clean up of extended attribute support This has been on my list for some time. We need to change the way in which we handle extended attributes to allow faster file creation times (by reducing the number of transactions required) and the extended attribute code is the main obstacle to this. In addition to that, the VFS provides a way to demultiplex the xattr calls which we ought to be using, rather than rolling our own. This patch changes the GFS2 code to use that VFS feature and as a result the code shrinks by a couple of hundred lines or so, and becomes easier to read. I'm planning on doing further clean up work in this area, but this patch is a good start. The cleaned up code also uses the more usual "xattr" shorthand, I plan to eliminate the use of "eattr" eventually and in the mean time it serves as a flag as to which bits of the code have been updated. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-26 18:41:32 +01:00
Steven Whitehouse	6050b9c74f	GFS2: Improve error handling in inode allocation A little while back, block allocation was given some improved error handling which meant that -EIO was returned in the case of there being a problem in the resource group data. In addition a message is printed explaning what went wrong and how to fix it. This extends that error handling so that it also covers inode allocation too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:05:31 +01:00
Steven Whitehouse	87ec217411	GFS2: Move gfs2_unlink_ok into ops_inode.c Another function which is only called from one ops_inode.c so we move it and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-22 10:54:50 +01:00
Steven Whitehouse	536baf02f6	GFS2: Move gfs2_readlinki into ops_inode.c Move gfs2_readlinki into ops_inode.c and make it static Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-22 10:48:59 +01:00
Steven Whitehouse	2286dbfad1	GFS2: Move gfs2_rmdiri into ops_inode.c Move gfs2_rmdiri() into ops_inode.c and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-22 10:45:09 +01:00
Steven Whitehouse	b1e71b0622	GFS2: Clean up some file names This patch renames the ops_*.c files which have no counterpart without the ops_ prefix in order to shorten the name and make it more readable. In addition, ops_address.h (which was very small) is moved into inode.h and inode.h is cleaned up by adding extern where required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-22 10:01:55 +01:00
Christoph Hellwig	10d2198805	GFS2: cleanup file_operations mess Remove the weird pointer to file_operations mess and replace it with straight-forward defining of the lockinginstance names to the _nolock variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-04-15 10:17:18 +01:00
Steven Whitehouse	f057f6cdf6	GFS2: Merge lock_dlm module into GFS2 This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:14 +00:00
Steven Whitehouse	97cc1025b1	GFS2: Kill two daemons with one patch This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:09 +00:00
Steven Whitehouse	383f01fbf4	GFS2: Banish struct gfs2_dinode_host The final field in gfs2_dinode_host was the i_flags field. Thats renamed to i_diskflags in order to avoid confusion with the existing inode flags, and moved into the inode proper at a suitable location to avoid creating a "hole". At that point struct gfs2_dinode_host is no longer needed and as promised (quite some time ago!) it can now be removed completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:38:59 +00:00
Steven Whitehouse	c9e9888677	GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize This patch moved the i_size field from the gfs2_dinode_host and following the ext3 convention renames it i_disksize. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:38:58 +00:00

1 2 3 4 5 ...

261 Commits