linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-05 08:26:59 +00:00

Author	SHA1	Message	Date
Bob Peterson	8381e60227	GFS2: Remove allocation parms from gfs2_rbm_find Struct gfs2_alloc_parms ap is never referenced in function gfs2_rbm_find, so this patch removes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2016-05-02 09:49:22 -05:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Bob Peterson	5ea31bc0a6	GFS2: Always use iopen glock for gl_deletes Before this patch, when function try_rgrp_unlink queued a glock for delete_work to reclaim the space, it used the inode glock to do so. That's different from the iopen callback which uses the iopen glock for the same purpose. We should be consistent and always use the iopen glock. This may also save us reference counting problems with the inode glock, since clear_glock does an extra glock_put() for the inode glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-12-18 11:02:52 -06:00
Bob Peterson	a097dc7e24	GFS2: Make rgrp reservations part of the gfs2_inode structure Before this patch, multi-block reservation structures were allocated from a special slab. This patch folds the structure into the gfs2_inode structure. The disadvantage is that the gfs2_inode needs more memory, even when a file is opened read-only. The advantages are: (a) we don't need the special slab and the extra time it takes to allocate and deallocate from it. (b) we no longer need to worry that the structure exists for things like quota management. (c) This also allows us to remove the calls to get_write_access and put_write_access since we know the structure will exist. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-12-14 12:16:38 -06:00
Bob Peterson	b54e9a0b92	GFS2: Extract quota data from reservations structure (revert `5407e24`) This patch basically reverts the majority of patch `5407e24`. That patch eliminated the gfs2_qadata structure in favor of just using the reservations structure. The problem with doing that is that it increases the size of the reservations structure. That is not an issue until it comes time to fold the reservations structure into the inode in memory so we know it's always there. By separating out the quota structure again, we aren't punishing the non-quota users by making all the inodes bigger, requiring more slab space. This patch creates a new slab area to allocate the quota stuff so it's managed a little more sanely. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-11-24 08:38:44 -06:00
Andreas Gruenbacher	c8d5770384	gfs2: Extended attribute readahead When gfs2 allocates an inode and its extended attribute block next to each other at inode create time, the inode's directory entry indicates that in de_rahead. In that case, we can readahead the extended attribute block when we read in the inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-11-16 12:00:29 -06:00
Bob Peterson	31dddd9eb9	GFS2: Fix rgrp end rounding problem for bsize < page size This patch fixes a bug introduced by commit `7005c3e`. That patch tries to map a vm range for resource groups, but the calculation breaks down when the block size is less than the page size. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-11-09 09:38:02 -06:00
Andreas Gruenbacher	f3dd164912	gfs2: Remove gl_spin define Commit `e66cf161` replaced the gl_spin spinlock in struct gfs2_glock with a gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in gl_lockref). Remove that define to make the references to gl_lockref.lock more obvious. Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-10-29 12:57:48 -05:00
Ben Hutchings	4d207133e9	gfs2: Make statistics unsigned, suitable for use with do_div() None of these statistics can meaningfully be negative, and the numerator for do_div() must have the type u64. The generic implementation of do_div() used on some 32-bit architectures asserts that, resulting in a compiler error in gfs2_rgrp_congested(). Fixes: `0166b197c2` ("GFS2: Average in only non-zero round-trip times ...") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Andreas Gruenbacher <agruenba@redhat.com>	2015-09-03 13:33:32 -05:00
Bob Peterson	15562c439d	GFS2: Move glock superblock pointer to field gl_name What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2015-09-03 13:33:09 -05:00
Bob Peterson	39b0f1e929	GFS2: Don't brelse rgrp buffer_heads every allocation This patch allows the block allocation code to retain the buffers for the resource groups so they don't need to be re-read from buffer cache with every request. This is a performance improvement that's especially noticeable when resource groups are very large. For example, with 2GB resource groups and 4K blocks, there can be 33 blocks for every resource group. This patch allows those 33 buffers to be kept around and not read in and thrown away with every operation. The buffers are released when the resource group is either synced or invalidated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>	2015-06-19 07:40:22 -05:00
Fabian Frederick	a3e3213676	gfs2: fix shadow warning in gfs2_rbm_find() bi was already declared and initialized globally in gfs2_rbm_find() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-05-18 15:23:03 -05:00
Abhi Das	959b671717	gfs2: handle NULL rgd in set_rgrp_preferences The function set_rgrp_preferences() does not handle the (rarely returned) NULL value from gfs2_rgrpd_get_next() and this patch fixes that. The fs image in question is only 150MB in size which allows for only 1 rgrp to be created. The in-memory rb tree has only 1 node and when gfs2_rgrpd_get_next() is called on this sole rgrp, it returns NULL. (Default behavior is to wrap around the rb tree and return the first node to give the illusion of a circular linked list. In the case of only 1 rgrp, we can't have gfs2_rgrpd_get_next() return the same rgrp (first, last, next all point to the same rgrp)... that would cause unintended consequences and infinite loops.) Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-05-05 11:26:04 -05:00
Bob Peterson	0166b197c2	GFS2: Average in only non-zero round-trip times for congestion stats This patch changes function gfs2_rgrp_congested so that it only factors in non-zero values into its average round trip time. If the round-trip time is zero for a particular cpu, that cpu has obviously never dealt with bouncing the resource group in question, so factoring in a zero value will only skew the numbers. It also fixes a compile error on some arches related to division. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2015-04-24 07:57:37 -05:00
Bob Peterson	f4a3ae9308	GFS2: Use average srttb value in congestion calculations This patch changes function gfs2_rgrp_congested so that it uses an average srttb (smoothed round trip time for blocking rgrp glocks) rather than the CPU-specific value. If we use the CPU-specific value it can incorrectly report no contention when there really is contention due to the glock processing occurring on a different CPU. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2015-04-24 07:57:08 -05:00
Abhi Das	25435e5ed6	gfs2: allow quota_check and inplace_reserve to return available blocks struct gfs2_alloc_parms is passed to gfs2_quota_check() and gfs2_inplace_reserve() with ap->target containing the number of blocks being requested for allocation in the current operation. We add a new field to struct gfs2_alloc_parms called 'allowed'. gfs2_quota_check() and gfs2_inplace_reserve() return the max blocks allowed by quota and the max blocks allowed by the chosen rgrp respectively in 'allowed'. A new field 'min_target', when non-zero, tells gfs2_quota_check() and gfs2_inplace_reserve() to not return -EDQUOT/-ENOSPC when there are atleast 'min_target' blocks allowable/available. The assumption is that the caller is ok with just 'min_target' blocks and will likely proceed with allocating them. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2015-03-18 12:47:10 -05:00
Bob Peterson	1a8550332a	GFS2: If we use up our block reservation, request more next time If we run out of blocks for a given multi-block allocation, we obviously did not reserve enough. We should reserve more blocks for the next reservation to reduce fragmentation. This patch increases the size hint for reservations when they run out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-11-03 19:26:54 +00:00
Bob Peterson	0e27c18c30	GFS2: Set of distributed preferences for rgrps This patch tries to use the journal numbers to evenly distribute which node prefers which resource group for block allocations. This is to help performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-11-03 19:24:49 +00:00
Bob Peterson	d24e0569e0	GFS2: Use gfs2_rbm_incr in rgblk_free This patch speeds up GFS2 unlink operations by using function gfs2_rbm_incr rather than continuously calculating the rbm. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-10-03 14:40:10 +01:00
Abhi Das	00a158be83	GFS2: fix bad inode i_goal values during block allocation This patch checks if i_goal is either zero or if doesn't exist within any rgrp (i.e gfs2_blk2rgrpd() returns NULL). If so, it assigns the ip->i_no_addr block as the i_goal. There are two scenarios where a bad i_goal can result in a -EBADSLT error. 1. Attempting to allocate to an existing inode: Control reaches gfs2_inplace_reserve() and ip->i_goal is bad. We need to fix i_goal here. 2. A new inode is created in a directory whose i_goal is hosed: In this case, the parent dir's i_goal is copied onto the new inode. Since the new inode is not yet created, the ip->i_no_addr field is invalid and so, the fix in gfs2_inplace_reserve() as per 1) won't work in this scenario. We need to catch and fix it sooner in the parent dir itself (gfs2_create_inode()), before it is copied to the new inode. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-09-19 10:45:18 +01:00
Fabian Frederick	27ff6a0f7f	GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:15:14 +01:00
Benjamin Marzinski	24972557b1	GFS2: remove transaction glock GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-05-14 10:04:34 +01:00
Joe Perches	d77d1b58aa	GFS2: Use pr_<level> more consistently Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message. Other miscellanea around these changes: o Add missing newlines o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-07 09:30:51 +00:00
Fabian Frederick	fc554ed3d8	GFS2: global conversion to pr_foo() -All printk(KERN_foo converted to pr_foo(). -Messages updated to fit in 80 columns. -fs_macros converted as well. -fs_printk removed. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-06 17:34:06 +00:00
Rashika Kheria	c2b0b30edd	GFS2: Mark functions as static in gfs2/rgrp.c Mark functions as static in gfs2/rgrp.c because they are not used outside this file. This eliminates the following warning in gfs2/rgrp.c: fs/gfs2/rgrp.c:1092:5: warning: no previous prototype for ‘gfs2_rgrp_bh_get’ [-Wmissing-prototypes] fs/gfs2/rgrp.c:1157:5: warning: no previous prototype for ‘update_rgrp_lvb’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-10 12:29:16 +00:00
Steven Whitehouse	b2c8b3ea87	GFS2: Allocate block for xattr at inode alloc time, if required This is another step towards improving the allocation of xattr blocks at inode allocation time. Here we take advantage of Christoph's recent work on ACLs to allocate a block for the xattrs early if we know that we will be adding ACLs to the inode later on. The advantage of that is that it is much more likely that we'll get a contiguous run of two blocks where the first is the inode and the second is the xattr block. We still have to fall back to the original system in case we don't get the requested two contiguous blocks, or in case the ACLs are too large to fit into the block. Future patches will move more of the ACL setting code further up the gfs2_inode_create() function. Also, I'd like to be able to do the same thing with the xattrs from LSMs in due course, too. That way we should be able to slowly reduce the number of independent transactions, at least in the most common cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-04 15:45:11 +00:00
Bob Peterson	8b127d0494	GFS2: Small cleanup This is a small cleanup to function gfs2_rgrp_go_lock so that it uses rgd instead of its more complicated twin. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-16 14:22:12 +00:00
Steven Whitehouse	ac3beb6a5d	GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping. > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since > we don't support STREAMS it's probably fair game, but... what the hell? Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>	2014-01-16 10:31:13 +00:00
Steven Whitehouse	7005c3e4ae	GFS2: Use range based functions for rgrp sync/invalidation Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 10:00:31 +00:00
Steven Whitehouse	7de41d36ff	GFS2: Remove test which is always true Since gfs2_inplace_reserve() is always called with a valid alloc parms structure, there is no need to test for this within the function itself - and in any case, after we've all ready dereferenced it anyway. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:59:30 +00:00
Bob Peterson	5ea5050cec	GFS2: Implement a "rgrp has no extents longer than X" scheme With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:58:08 +00:00
Bob Peterson	1330edbeaf	GFS2: Drop inadequate rgrps from the reservation tree This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:57:31 +00:00
Bob Peterson	5ce13431dd	GFS2: If requested is too large, use the largest extent in the rgrp Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:57:02 +00:00
Al Viro	951b4bd553	gfs2: endianness misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:16 -05:00
Steven Whitehouse	9e07f2cb3d	GFS2: Speed up starting point selection for block allocation When setting the starting point for block allocation, there were calls to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case of there being an active reservation. The gfs2_rbm_from_block() function can be quite slow, and since the two conversions were effectively a no-op, it makes sense to avoid them entirely in this case. There is no functional change here, but the code should be a bit more efficient after this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 14:42:45 +01:00
Steven Whitehouse	7b9cff4671	GFS2: Add allocation parameters structure This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 11:13:25 +01:00
Steven Whitehouse	af5c269799	GFS2: Clean up reservation removal The reservation for an inode should be cleared when it is truncated so that we can start again at a different offset for future allocations. We could try and do better than that, by resetting the search based on where the truncation started from, but this is only a first step. In addition, there are three callers of gfs2_rs_delete() but only one of those should really be testing the value of i_writecount. While we get away with that in the other cases currently, I think it would be better if we made that test specific to the one case which requires it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-27 12:49:33 +01:00
Bob Peterson	149ed7f51e	GFS2: new function gfs2_rbm_incr Since the previous patch eliminated bi in favor of bii, this follow-on patch needed to be adjusted accordingly. Here is the revised version. This patch adds a new function, gfs2_rbm_incr, which increments an rbm structure. This is more efficient than calling gfs2_rbm_to_block, incrementing, then calling gfs2_rbm_from_block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-18 10:40:38 +01:00
Bob Peterson	e579ed4f44	GFS2: Introduce rbm field bii This is a respin of the original patch. As Steve pointed out, the introduction of field bii makes it easy to eliminate bi itself. This revised patch does just that, replacing bi with bii. This patch adds a new field to the rbm structure, called bii, which is an index into the array of bitmaps for an rgrp. This replaces *bi which was a pointer to the bitmap. This is being done for further optimizations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-18 10:39:53 +01:00
Bob Peterson	b870890519	GFS2: Do not reset flags on active reservations When we used try locks for rgrps on block allocations, it was important to clear the flags field so that we used a blocking hold on the glock. Now that we're not doing try locks, clearing flags is unnecessary, and a waste of time. In fact, it's probably doing the wrong thing because it clears the GL_SKIP bit that was set for the lvb tracking purposes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:19:29 +01:00
Bob Peterson	7e230f5774	GFS2: introduce bi_blocks for optimization This patch introduces a new field in the bitmap structure called bi_blocks. Its purpose is to save us from constantly multiplying bi_len by the constant GFS2_NBBY. It also paves the way for more optimization in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:15:13 +01:00
Bob Peterson	6aa7640f30	GFS2: optimize rbm_from_block wrt bi_start In function gfs2_rbm_from_block, it starts by checking if the block falls within the first bitmap. It does so by checking if the rbm's offset is less than (rbm->bi->bi_start + rbm->bi->bi_len) * GFS2_NBBY. However, the first bitmap will always have bi_start==0. Therefore this is an unnecessary calculation in a function that gets called billions of times. This patch removes the reference to bi_start. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:14:39 +01:00
Abhijith Das	6a98c333ed	GFS2: Fix fstrim boundary conditions This patch correctly distinguishes two boundary conditions: 1. When the given range is entire within the unaccounted space between two rgrps, and 2. The range begins beyond the end of the filesystem Also fix the unit of the returned value r.len (total trimming) to be in bytes instead of the (incorrect) 512 byte blocks With this patch, GFS2 passes multiple iterations of all the relevant xfstests (251, 260, 288) with different fs block sizes. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-19 21:41:26 +01:00
Bob Peterson	2b3dcf3581	GFS2: Increase i_writecount during gfs2_setattr_size This patch calls get_write_access in a few functions. This merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-03 16:38:58 +01:00
Bob Peterson	af21ca8ed5	GFS2: Use single-block reservations for directories This patch changes the multi-block allocation code, such that directory inodes only get a single block reserved in the bitmap. That way, the bitmaps are more tightly packed together, and there are fewer spans of free blocks for in-use block reservations. This means it takes less time to find a free span of blocks in the bitmap, which speeds things up. This increases the performance of some workloads by almost 2X. In Nate's mockup.py script (which does (1) create dir, (2) create dir in dir, (3) create file in that dir) the test executes in 23 steps rather than 43 steps, a 47% performance improvement. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-05-24 13:47:32 +01:00
Bob Peterson	20095218fb	GFS2: Remove vestigial parameter ip from function rs_deltree The functions that delete block reservations from the rgrp block reservations rbtree no longer use the ip parameter. This patch eliminates the parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:41:04 +01:00
Steven Whitehouse	fd4b4e042c	GFS2: Clean up inode creation path This patch cleans up the inode creation code path in GFS2. After the Orlov allocator was merged, a number of potential improvements are now possible, and this is a first set of these. The quota handling is now updated so that it matches the point in the code where the allocation takes place. This means that the one exception in gfs2_alloc_blocks relating to quota is now no longer required, and we can use the generic code everywhere. In addition the call to figure out whether we need to allocate any extra blocks in order to add a directory entry is moved higher up gfs2_create_inode. This means that if it returns an error, we can deal with that at a stage where it is easier to handle that case. The returned status cannot change during the function since we hold an exclusive lock on the directory. Two calls to gfs2_rindex_update have been changed to one, again at the top of gfs2_create_inode to simplify error handling. The time stamps are also now initialised earlier in the creation process, this is gradually moving towards being able to remove the call to gfs2_refresh_inode in gfs2_inode_create once we have all the fields covered. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:39:56 +01:00
Bob Peterson	b2c87cae0e	GFS2: Issue discards in 512b sectors This patch changes GFS2's discard issuing code so that it calls function sb_issue_discard rather than blkdev_issue_discard. The code was calling blkdev_issue_discard and specifying the correct sector offset and sector size, but blkdev_issue_discard expects these values to be in terms of 512 byte sectors, even if the native sector size for the device is different. Calling sb_issue_discard with the BLOCK size instead ensures the correct block-to-512b-sector translation. I verified that "minlen" is specified in blocks, so comparing it to a number of blocks is correct. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-05 17:55:13 +01:00
Wei Yongjun	441362d06b	GFS2: return error if malloc failed in gfs2_rs_alloc() The error code in gfs2_rs_alloc() is set to ENOMEM when error but never be used, instead, gfs2_rs_alloc() always return 0. Fix to return 'error'. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-04 09:53:10 +01:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Steven Whitehouse	350a9b0a72	GFS2: Split gfs2_trans_add_bh() into two There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-29 10:28:04 +00:00
Bob Peterson	13d2eb0129	GFS2: Reset rd_last_alloc when it reaches the end of the rgrp In function rg_mblk_search, it's searching for multiple blocks in a given state (e.g. "free"). If there's an active block reservation its goal is the next free block of that. If the resource group contains the dinode's goal block, that's used for the search. But if neither is the case, it uses the rgrp's last allocated block. That way, consecutive allocations appear after one another on media. The problem comes in when you hit the end of the rgrp; it would never start over and search from the beginning. This became a problem, since if you deleted all the files and data from the rgrp, it would never start over and find free blocks. So it had to keep searching further out on the media to allocate blocks. This patch resets the rd_last_alloc after it does an unsuccessful search at the end of the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:05:27 +00:00
Bob Peterson	15bd50ad82	GFS2: Stop looking for free blocks at end of rgrp This patch adds a return code check after calling function gfs2_rbm_from_block while determining the free extent size. That way, when the end of an rgrp is reached, it won't try to process unaligned blocks after the end. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:05:10 +00:00
Abhijith Das	f1213cacc7	GFS2: Fix race in gfs2_rs_alloc QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible to come out of the function with a valid ip->i_res allocation but it gets freed before use resulting in a NULL ptr dereference. This patch envelopes the initial short-circuit check for non-NULL ip->i_res into the mutex lock. With this patch, I was able to successfully run the reproducer test multiple times. Resolves: rhbz#878476 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:04:53 +00:00
David Teigland	4e2f8849de	GFS2: remove redundant lvb pointer The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-15 10:17:22 +00:00
Steven Whitehouse	aa8920c968	GFS2: Fix one RG corner case For filesystems with only a single resource group, we need to be careful that the allocation loop will not land up with a NULL resource group. This fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function was being used instead of gfs2_rgrpd_get_first() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 14:50:35 +00:00
Steven Whitehouse	9dbe9610b9	GFS2: Add Orlov allocator Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just mentioned cases will be allocated to a random resource group (GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will land up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:33:17 +00:00
Steven Whitehouse	bcd97c0630	GFS2: Add test for resource group congestion status This patch uses information gathered by the recent glock statistics patch in order to derrive a boolean verdict on the congestion status of a resource group. This is then used when making decisions on which resource group to choose during block allocation. The aim is to avoid resource groups which are heavily contended by other nodes, while still ensuring locality of access wherever possible. Once a reservation has been made in a particular resource group we continue to use that resource group until a new reservation is required. This should help to ensure that we do not change resource groups too often. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:32:21 +00:00
Bob Peterson	a68a0a352a	GFS2: Speed up gfs2_rbm_from_block This patch is a rewrite of function gfs2_rbm_from_block. Rather than looping to find the right bitmap, the code now does a few simple math calculations. I compared the performance of both algorithms side by side and the new algorithm is noticeably faster. Sample instrumentation output from a "fast" machine: 5 million calls: millisec spent: Orig: 166 New: 113 5 million calls: millisec spent: Orig: 189 New: 114 In addition, I ran postmark (on a somewhat slowr CPU) before the after the new algorithm was put in place and postmark showed a decent improvement: Before the new algorithm: ------------------------- Time: 645 seconds total 584 seconds of transactions (171 per second) Files: 150087 created (232 per second) Creation alone: 100000 files (2083 per second) Mixed with transactions: 50087 files (85 per second) 49995 read (85 per second) 49991 appended (85 per second) 150087 deleted (232 per second) Deletion alone: 100174 files (7705 per second) Mixed with transactions: 49913 files (85 per second) Data: 273.42 megabytes read (434.08 kilobytes per second) 852.13 megabytes written (1.32 megabytes per second) With the new algorithm: ----------------------- Time: 599 seconds total 530 seconds of transactions (188 per second) Files: 150087 created (250 per second) Creation alone: 100000 files (1886 per second) Mixed with transactions: 50087 files (94 per second) 49995 read (94 per second) 49991 appended (94 per second) 150087 deleted (250 per second) Deletion alone: 100174 files (6260 per second) Mixed with transactions: 49913 files (94 per second) Data: 273.42 megabytes read (467.42 kilobytes per second) 852.13 megabytes written (1.42 megabytes per second) Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:31:36 +00:00
Lukas Czerner	076f0faa76	GFS2: Fix FITRIM argument handling Currently implementation in gfs2 uses FITRIM arguments as it were in file system blocks units which is wrong. The FITRIM arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are actually in bytes. Moreover, check for start argument beyond the end of file system, len argument being smaller than file system block and minlen argument being bigger than biggest resource group were missing. This commit converts the code to convert FITRIM argument to file system blocks and also adds appropriate checks mentioned above. All the problems were recognised by xfstests 251 and 260. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:41:58 +00:00
Lukas Czerner	3a238adefb	GFS2: Require user to provide argument for FITRIM When the fstrim_range argument is not provided by user in FITRIM ioctl we should just return EFAULT and not promoting bad behaviour by filling the structure in kernel. Let the user deal with it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:41:37 +00:00
Andrew Price	cd0ed19fb6	GFS2: Fix possible null pointer deref in gfs2_rs_alloc Despite the return value from kmem_cache_zalloc() being checked, the error wasn't being returned until after a possible null pointer dereference. This patch returns the error immediately, allowing the removal of the error variable. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:40:39 +00:00
Bob Peterson	3701530aed	GFS2: Fix infinite loop in rbm_find This patch fixes an infinite loop in gfs2_rbm_find that was introduced by the previous patch. The problem occurred when the length was less than 3 but the rbm block was byte-aligned, causing it to improperly return a extent length of zero, which caused it to spin. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Barry Marson <bmarson@redhat.com>	2012-09-24 10:47:27 +01:00
Steven Whitehouse	ff7f4cb461	GFS2: Consolidate free block searching functions With the recently added block reservation code, an additional function was added to search for free blocks. This had a restriction of only being able to search for aligned extents of free blocks. As a result the allocation patterns when reserving blocks were suboptimal when the existing allocation of blocks for an inode was not aligned to the same boundary. This patch resolves that problem by adding the ability for gfs2_rbm_find to search for extents of a particular minimum size. We can then use gfs2_rbm_find for both looking for reservations, and also looking for free blocks on an individual basis when we actually come to do the allocation later on. As a result we only need a single set of code to deal with both situations. The function gfs2_rbm_from_block() is moved up rgrp.c so that it occurs before all of its callers. Many thanks are due to Bob for helping track down the final issue in this patch. That fix to the rb_tree traversal and to not share block reservations from a dirctory to its children is included here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2012-09-24 10:47:26 +01:00
Bob Peterson	0688a5ecea	GFS2: Stop block extents at the end of bitmaps This patch stops multiple block allocations if a nonzero return code is received from gfs2_rbm_from_block. Without this patch, if enough pressure is put on the file system, you get a kernel warning quickly followed by: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2] With this patch, things run normally. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:23 +01:00
Steven Whitehouse	c743ffd09f	GFS2: Fix unclaimed_blocks() wrapping bug and clean up When rgd->rd_free_clone is less than rgd->rd_reserved, the unclaimed_blocks() calculation would wrap and produce incorrect results. This patch checks for this condition when this function is called from gfs2_mblk_search() In addition, the use of this particular function in other places in the code has been dropped by means of a general clean up of gfs2_inplace_reserve(). This function is now much easier to follow. Also the setting of the rgd->rd_last_alloc field is corrected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:21 +01:00
Steven Whitehouse	9e733d3923	GFS2: Improve block reservation tracing This patch improves the tracing of block reservations by removing some corner cases and also providing more useful detail in the traces. A new field is added to the reservation structure to contain the inode number. This is used since in certain contexts it is not possible to access the inode itself to obtain this information. As a result we can then display the inode number for all tracepoints and also in case we dump the resource group. The "del" tracepoint operation has been removed. This could be called with the reservation rgrp set to NULL. That resulted in not printing the device number, and thus making the information largely useless anyway. Also, the conditional on the rgrp being NULL can then be removed from the tracepoint. After this change, all the block reservation tracepoint calls will be called with the rgrp information. The existing ins,clm and tdel calls to the block reservation tracepoint are sufficient to track the entire life of the block reservation. In gfs2_block_alloc() the error detection is updated to print out the inode number of the problematic inode. This can then be compared against the information in the glock dump,tracepoints, etc. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:20 +01:00
Steven Whitehouse	137834a696	GFS2: Fall back to ignoring reservations, if there are no other blocks left When we get to the stage of allocating blocks, we know that the resource group in question must contain enough free blocks, otherwise gfs2_inplace_reserve() would have failed. So if we are left with only free blocks which are reserved, then we must use those. This can happen if another node has sneeked in and use some blocks reserved on this node, for example. Generally this will happen very rarely and only when the resouce group is nearly full. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:19 +01:00
Steven Whitehouse	3e6339dd28	GFS2: Use rbm for gfs2_setbit() Use the rbm structure for gfs2_setbit() in order to simplify the arguments to the function. We have to add a bool to control whether the clone bitmap should be updated (if it exists) but otherwise it is a more or less direct substitution. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:16 +01:00
Steven Whitehouse	c04a2ef3a8	GFS2: Use rbm for gfs2_testbit() Change the arguments to gfs2_testbit() so that it now just takes an rbm specifying the position of the two bit entry to return. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:14 +01:00
Bob Peterson	29c05b205d	GFS2: Eliminate unnecessary check for state > 3 in bitfit Function gfs2_bitfit was checking for state > 3, but that's impossible since it is only called from rgblk_search, which receives only GFS2_BLKST_ constants. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:13 +01:00
Bob Peterson	8d8b752a0f	GFS2: rbm code cleanup This patch fixes a few small rbm related things. First, it fixes a corner case where the rbm needs to switch bitmaps and wasn't adjusting its buffer pointer. Second, there's a white space issue fixed. Third, the logic in function gfs2_rbm_from_block was optimized a bit. Lastly, a check for goal block overflows was added to function gfs2_alloc_blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:04 +01:00
Steven Whitehouse	5d50d53246	GFS2: Fix case where reservation finished at end of rgrp One corner case which the original patch failed to take into account was when there is a reservation which ended such that the following block was one beyond the end of the rgrp in question. This extra test fixes that case. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com>	2012-09-24 10:47:03 +01:00
Michel Lespinasse	24d634e8f3	GFS2: Use RB_CLEAR_NODE() rather than rb_init_node() gfs2 calls RB_EMPTY_NODE() to check if nodes are not on an rbtree. The corresponding initialization function is RB_CLEAR_NODE(). rb_init_node() was never clearly defined and is going away. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:02 +01:00
Steven Whitehouse	3b1d0b9d0b	GFS2: Update rgblk_free() to use rbm Replace open coded version with a call to gfs2_rbm_from_block() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:00 +01:00
Steven Whitehouse	3983903a71	GFS2: Update gfs2_get_block_type() to use rbm Use the new gfs2_rbm_from_block() function to replace an open coded version of the same code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:46:59 +01:00
Steven Whitehouse	5b924ae2dc	GFS2: Replace rgblk_search with gfs2_rbm_find This is part of a series of patches which are introducing the gfs2_rbm structure throughout the block allocation code. The main aim of this part is to create a search function which can deal directly with struct gfs2_rbm. In this case it specifies the initial position at which to start the search and also the point at which the search terminates. The net result of this is to clean up the search code and make it rather more readable, and the various possible exceptions which may occur during the search are partitioned into their own functions. There are some bug fixes too. We should not be checking the reservations while allocating extents - the time for that is when we are searching for where to put the extent, not when we've already made that decision. Also, rgblk_search had two uses, and in only one of those cases did it make sense to check for reservations. This is fixed in the new gfs2_rbm_find function, which has a cleaner interface. The reservation checking has been improved by always checking for contiguous reservations, and returning the first free block after all contiguous reservations. This is done under the spin lock to ensure consistancy of the tree. The allocation of extents is now in all cases done by the existing allocation code, and if there is an active reservation, that is updated after the fact. Again this is done under the spin lock, since it entails changing the lookup key for the reservation in question. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:46:57 +01:00
Steven Whitehouse	4a993fb150	GFS2: Add structure to contain rgrp, bitmap, offset tuple This patch introduces a new structure, gfs2_rbm, which is a tuple of a resource group, a bitmap within the resource group and an offset within that bitmap. This is designed to make manipulating these sets of variables easier. There is also a new helper function which converts this representation back to a disk block address. In addition, the rbtree nodes which are used for the reservations were not being correctly initialised, which is now fixed. Also, the tracing was not passing through the inode where it should have been. That is mostly fixed aside from one corner case. This needs to be revisited since there can also be a NULL rgrp in some cases which results in the device being incorrect in the trace. This is intended to be the first step towards cleaning up some of the allocation code, and some further bug fixes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:46:56 +01:00
Steven Whitehouse	71f890f7f7	GFS2: Remove rs_requested field from reservations The rs_requested field is left over from the original allocation code, however this should have been a parameter passed to the various functions from gfs2_inplace_reserve() and not a member of the reservation structure as the value is not required after the initial allocation. This also helps simplify the code since we no longer need to set the rs_requested to zero. Also the gfs2_inplace_release() function can also be simplified since the reservation structure will always be defined when it is called, and the only remaining task is to unlock the rgrp if required. It can also now be called unconditionally too, resulting in a further simplification. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:46:54 +01:00
Steven Whitehouse	62e252eeef	GFS2: Take account of blockages when using reserved blocks The claim_reserved_blks() function was not taking account of the possibility of "blockages" while performing allocation. This can be caused by another node allocating something in the same extent which has been reserved locally. This patch tests for this condition and then skips the remainder of the reservation in this case. This is a relatively rare event, so that it should not affect the general performance improvement which the block reservations provide. The claim_reserved_blks() function also appears not to be able to deal with reservations which cross bitmap boundaries, but that can be dealt with in a future patch since we don't generate boundary crossing reservations currently. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Teigland <teigland@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2012-09-13 10:30:58 +01:00
Bob Peterson	8e2e004735	GFS2: Reduce file fragmentation This patch reduces GFS2 file fragmentation by pre-reserving blocks. The resulting improved on disk layout greatly speeds up operations in cases which would have resulted in interlaced allocation of blocks previously. A typical example of this is 10 parallel dd processes, each writing to a file in a common dirctory. The implementation uses an rbtree of reservations attached to each resource group (and each inode). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-19 14:51:08 +01:00
Abhijith Das	294f2ad5a5	GFS2: kernel panic with small gfs2 filesystems - 1 RG In the unlikely setup where there's only one resource group in the gfs2 filesystem, gfs2_rgrpd_get_next() returns a NULL rgd that is not dealt with properly, causing a kernel NULL ptr dereference. This patch fixes this issue. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-18 16:45:13 +01:00
Bob Peterson	666d1d8ad2	GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve This function combines rgrp functions get_local_rgrp and gfs2_inplace_reserve so that the double retry loop is gone. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-14 09:58:40 +01:00
Benjamin Marzinski	90306c41dc	GFS2: Use lvbs for storing rgrp information with mount option Instead of reading in the resource groups when gfs2 is checking for free space to allocate from, gfs2 can store the necessary infromation in the resource group's lvb. Also, instead of searching for unlinked inodes in every resource group that's checked for free space, gfs2 can store the number of unlinked but inodes in the lvb, and only check for unlinked inodes if it will find some. The first time a resource group is locked, the lvb must initialized. Since this involves counting the unlinked inodes in the resource group, this takes a little extra time. But after that, if the resource group is locked with GL_SKIP, the buffer head won't be read in unless it's actually needed. Enabling the resource groups lvbs is done via the rgrplvb mount option. If this option isn't set, the lvbs will still be set and updated, but they won't be verfied or used by the filesystem. To safely turn on this option, all of the nodes mounting the filesystem must be running code with this patch, and the filesystem must have been completely unmounted since they were updated. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-08 11:50:01 +01:00
Bob Peterson	5407e24229	GFS2: Fold quota data into the reservations struct This patch moves the ancillary quota data structures into the block reservations structure. This saves GFS2 some time and effort in allocating and deallocating the qadata structure. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-06 11:20:22 +01:00
Bob Peterson	0a305e4960	GFS2: Extend the life of the reservations This patch lengthens the lifespan of the reservations structure for inodes. Before, they were allocated and deallocated for every write operation. With this patch, they are allocated when the first write occurs, and deallocated when the last process closes the file. It's more efficient to do it this way because it saves GFS2 a lot of unnecessary allocates and frees. It also gives us more flexibility for the future: (1) we can now fold the qadata structure back into the structure and save those alloc/frees, (2) we can use this for multi-block reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-06 11:17:59 +01:00
Bob Peterson	41db1ab9be	GFS2: Add rgrp information to block_alloc trace point This is a second attempt at a patch that adds rgrp information to the block allocation trace point for GFS2. As suggested, the patch was modified to list the rgrp information _after_ the fields that exist today. Again, the reason for this patch is to allow us to trace and debug problems with the block reservations patch, which is still in the works. We can debug problems with reservations if we can see what block allocations result from the block reservations. It may also be handy in figuring out if there are problems in rgrp free space accounting. In other words, we can use it to track the rgrp and its free space along side the allocations that are taking place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-05-11 10:31:34 +01:00
Bob Peterson	06344b9186	GFS2: Eliminate needless parameter from function gfs2_setbit This patch eliminates parameter "buf1" from function gfs2_setbit. This is possible because it was always passed in as bi->bi_bh->b_data. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-27 10:46:07 +01:00
Andrew Price	4306629e1c	GFS2: Remove unused argument from gfs2_internal_read gfs2_internal_read accepts an unused ra_state argument, left over from when we did readahead on the rindex. Since there are currently no plans to add back this readahead, this patch removes the ra_state parameter and updates the functions which call gfs2_internal_read accordingly. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:37 +01:00
Bob Peterson	9598d25ed9	GFS2: Change variable blk to biblk In the resource group code, we have no less than three different kinds of block references: block relative to the file system (u64), block relative to the rgrp (u32), and block relative to the bitmap. This is a small step to making the code more readable; it renames variable blk to biblk to solidify in my mind that it's relative to the bitmap and nothing else. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:32 +01:00
Bob Peterson	886b141675	GFS2: Fix function parameter comments in rgrp.c This patch just fixes a bunch of function parameter comments. Slowly, over the years, the comments have gotten out of date (mostly my fault, as I haven't been good at keeping them up to date). This patch rectifies some of that. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:31 +01:00
Bob Peterson	29c578f567	GFS2: Eliminate offset parameter to gfs2_setbit This patch eliminates a redundant parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:30 +01:00
Bob Peterson	36f5580be1	GFS2: Use slab for block reservation memory This patch changes block reservations so it uses slab storage. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:29 +01:00
Bob Peterson	5e2f7d617b	GFS2: Make sure rindex is uptodate before starting transactions This patch removes the call from gfs2_blk2rgrd to function gfs2_rindex_update and replaces it with individual calls. The former way turned out to be too problematic. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-05 10:20:10 +01:00
Bob Peterson	c1ac539ed4	GFS2: put glock reference in error patch of read_rindex_entry This patch fixes the error path of function read_rindex_entry so that it correctly gives up its glock reference in cases where there is a race to re-read the rindex after gfs2_grow. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-03-26 09:16:56 +01:00
Bob Peterson	58884c4df0	GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd This patch adds a call to gfs2_rindex_update from function gfs2_blk2rgrpd and removes calls to it that are made redundant by it. The problem is that a gfs2_grow can add rgrps to the rindex, then put those rgrps into use, thus rendering the rindex we read in at mount time incomplete. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-03-05 15:10:34 +00:00
Bob Peterson	6aad1c3d3e	GFS2: Eliminate sd_rindex_mutex Over time, we've slowly eliminated the use of sd_rindex_mutex. Up to this point, it was only used in two places: function gfs2_ri_total (which totals the file system size by reading and parsing the rindex file) and function gfs2_rindex_update which updates the rgrps in memory. Both of these functions have the rindex glock to protect them, so the rindex is unnecessary. Since gfs2_grow writes to the rindex via the meta_fs, the mutex is in the wrong order according to the normal rules. This patch eliminates the mutex entirely to avoid the problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-03-05 15:06:56 +00:00
Bob Peterson	a08fd280b5	GFS2: Unlock rindex mutex on glock error This patch fixes an error path in function gfs2_rindex_update that leaves the rindex mutex held. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-03-01 09:25:21 +00:00
Steven Whitehouse	66fc061bda	GFS2: FITRIM ioctl support The FITRIM ioctl provides an alternative way to send discard requests to the underlying device. Using the discard mount option results in every freed block generating a discard request to the block device. This can be slow, since many block devices can only process discard requests of larger sizes, and also such operations can be time consuming. Rather than using the discard mount option, FITRIM allows a sweep of the filesystem on an occasional basis, and also to optionally avoid sending down discard requests for smaller regions. In GFS2 FITRIM will work at resource group granularity. There is a flag for each resource group which keeps track of which resource groups have been trimmed. This flag is reset whenever a deallocation occurs in the resource group, and set whenever a successful FITRIM of that resource group has taken place. This helps to reduce repeated discard requests for the same block ranges, again improving performance. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:10:21 +00:00
Steven Whitehouse	a365fbf354	GFS2: Read resource groups on mount This makes mount take slightly longer, but at the same time, the first write to the filesystem will be faster too. It also means that if there is a problem in the resource index, then we can refuse to mount rather than having to try and report that when the first write occurs. In addition, to avoid recursive locking, we hvae to take account of instances when the rindex glock may already be held when we are trying to update the rbtree of resource groups. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 09:52:39 +00:00
Bob Peterson	49528b4e47	GFS2: Fix a use-after-free that coverity spotted In function gfs2_inplace_release it was trying to unlock a gfs2_holder structure associated with a reservation, after said reservation was freed. The problem is that the statements have the wrong order. This patch corrects the order so that the reservation is freed after the gfs2_holder is unlocked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-01-11 09:23:26 +00:00
Steven Whitehouse	6a8099ed56	GFS2: Fix multi-block allocation Clean up gfs2_alloc_blocks so that it takes the full extent length rather than just the number of non-inode blocks as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search() which is no longer used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-22 12:18:51 +00:00
Bob Peterson	564e12b115	GFS2: decouple quota allocations from block allocations This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-22 10:25:21 +00:00
Bob Peterson	b3e47ca0c2	GFS2: split function rgblk_search This patch splits function rgblk_search into a function that finds blocks to allocate (rgblk_search) and a function that assigns those blocks (gfs2_alloc_extent). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@rehat.com>	2011-11-21 16:48:02 +00:00
Steven Whitehouse	465f0a760d	GFS2: Fix up "off by one" in the previous patch The trace point should take extlen and not *ndata as the extent length. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-21 10:05:55 +00:00
Bob Peterson	6e87ed0fc9	GFS2: move toward a generic multi-block allocator This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-21 10:04:09 +00:00
Bob Peterson	b9f417f311	GFS2: remove vestigial al_alloced This patch removes the vestigial variable al_alloced from the gfs2_alloc structure. This is another baby step toward multi-block reservations. My next planned step is to decouple the quota variables from the gfs2_alloc structure so we can use a different method for allocations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-18 09:49:51 +00:00
Bob Peterson	3c5d785acf	GFS2: combine gfs2_alloc_block and gfs2_alloc_di GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-15 15:25:03 +00:00
Bob Peterson	c688b8b334	GFS2: Add non-try locks back to get_local_rgrp This upstream patch had what I believe is an unintended consequence: http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93 The patch changed function get_local_rgrp such that it ONLY used TRY locks for RGRP searches. Prior to that patch, the code used TRY locks during the first loop, and if that was unsuccessful, it used normal blocking locks on subsequent searches. This patch changes it back to the old way. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-11-15 15:24:22 +00:00
Steven Whitehouse	9ae32429fe	GFS2: Remove two unused variables The two variables being initialised in gfs2_inplace_reserve to track the file & line number of the caller are never used, so we might as well remove them. If something does go wrong, then a stack trace is probably more useful anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:52 +01:00
Steven Whitehouse	f75bbfb4dd	GFS2: Fix off-by-one in gfs2_blk2rgrpd Bob reported: I found an off-by-one problem with how I coded this section: It should be: + else if (blk >= cur->rd_data0 + cur->rd_data) In fact, cur->rd_data0 + cur->rd_data is the start of the next rgrp (the next ri_addr), so without the "=" check it can land on the wrong rgrp. In all normal cases, this won't be a problem: you're searching for a block _within_ the rgrp, which will pass the test properly. Where it gets into trouble is if you search the rgrps for the block exactly equal to ri_addr. I don't think anything in the kernel does this, but I found a place in gfs2-utils gfs2_edit where it does. So I definitely need to fix it in libgfs2. I'd like to suggest we fix it in the kernel as well for the sake of keeping the functions similar. So this patch fixes the above mentioned off by one error as well as removing the unused parent pointer. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:46 +01:00
Steven Whitehouse	ccad4e147a	GFS2: Correctly set goal block after allocation The new goal block should be set to the end of the newly allocated extent, not the start of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:42 +01:00
Steven Whitehouse	70b0c3656f	GFS2: Use cached rgrp in gfs2_rlist_add() Each block which is deallocated, requires a call to gfs2_rlist_add() and each of those calls was calling gfs2_blk2rgrpd() in order to figure out which rgrp the block belonged in. This can be speeded up by making use of the rgrp cached in the inode. We also reset this cached rgrp in case the block has changed rgrp. This should provide a big reduction in gfs2_blk2rgrpd() calls during deallocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:39 +01:00
Steven Whitehouse	534029e2fd	GFS2: Remove obsolete assert Given that a resource group has been locked, there is no reason why we should not be able to allocate as many blocks as are free. The al_requested parameter should really be considered as a minimum number of blocks to be available. Should this limit be overshot, there are other mechanisms which will prevent over allocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:36 +01:00
Steven Whitehouse	54335b1fca	GFS2: Cache the most recently used resource group in the inode This means that after the initial allocation for any inode, the last used resource group is cached in the inode for future use. This drastically reduces the number of lookups of resource groups in the common case, and this the contention on that data structure. The allocation algorithm is the same as previously, except that we always check to see if the goal block is within the cached rgrp first before going to the rbtree to look one up. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:34 +01:00
Steven Whitehouse	8339ee543e	GFS2: Make resource groups "append only" during life of fs Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-10-21 12:39:33 +01:00
Bob Peterson	7c9ca62113	GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>	2011-10-21 12:39:31 +01:00
Eric Sandeen	46fcb2ed29	GFS2: combine duplicated block freeing routines __gfs2_free_data and __gfs2_free_meta are almost identical, and can be trivially combined. [This is as per Eric's original patch minus gfs2_free_data() which had no callers left and plus the conversion of the bmap.c calls to these functions. All in all, a nice clean up] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:52 +01:00
Steven Whitehouse	6d3117b412	GFS2: Wipe directory hash table metadata when deallocating a directory The deallocation code for directories in GFS2 is largely divided into two parts. The first part deallocates any directory leaf blocks and marks the directory as being a regular file when that is complete. The second stage was identical to deallocating regular files. Regular files have their data blocks in a different address space to directories, and thus what would have been normal data blocks in a regular file (the hash table in a GFS2 directory) were deallocated correctly. However, a reference to these blocks was left in the journal (assuming of course that some previous activity had resulted in those blocks being in the journal or ail list). This patch uses the i_depth as a test of whether the inode is an exhash directory (we cannot test the inode type as that has already been changed to a regular file at this stage in deallocation) The original issue was reported by Chris Hertel as an issue he encountered running bonnie++ Reported-by: Christopher R. Hertel <crh@samba.org> Cc: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-21 14:05:58 +01:00
Steven Whitehouse	29687a2ac8	GFS2: Alter point of entry to glock lru list for glocks with an address_space Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 08:59:48 +01:00
Bob Peterson	95c8e17f2f	GFS2: Dump better debug info if a bitmap inconsistency is detected On rare occasions we encounter gfs2 problems where an invalid bitmap state transition is attempted. For example, trying to "unlink" a free block. In these cases, there is really no useful information logged to debug the problem. This patch adds more debug details that should allow us to more closely examine the problem and possibly solve it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 08:53:12 +01:00
Bob Peterson	44ad37d69b	GFS2: filesystem hang caused by incorrect lock order This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-18 15:23:50 +01:00
Bob Peterson	4c16c36ad6	GFS2: deallocation performance patch This patch is a performance improvement to GFS2's dealloc code. Rather than update the quota file and statfs file for every single block that's stripped off in unlink function do_strip, this patch keeps track and updates them once for every layer that's stripped. This is done entirely inside the existing transaction, so there should be no risk of corruption. The other functions that deallocate blocks will be unaffected because they are using wrapper functions that do the same thing that they do today. I tested this code on my roth cluster by creating 200 files in a directory, each of which is 100MB, then on four nodes, I simultaneously deleted the files, thus competing for GFS2 resources (but different files). The commands I used were: [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done The performance increase was significant: roth-01 roth-02 roth-03 roth-05 --------- --------- --------- --------- old: real 0m34.027 0m25.021s 0m23.906s 0m35.646s new: real 0m22.379s 0m24.362s 0m24.133s 0m18.562s Total time spent deleting: old: 118.6s new: 89.4 For this particular case, this showed a 25% performance increase for GFS2 unlinks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-24 12:13:48 +00:00
Bob Peterson	bcd7278d8a	GFS2: fsck.gfs2 reported statfs error after gfs2_grow When you do gfs2_grow it failed to take the very last rgrp into account when adding up the new free space due to an off-by-one error. It was not reading the last rgrp from the rindex because of a check for "<=" that should have been "<". Therefore, fsck.gfs2 was finding (and fixing) an error with the system statfs file. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-12-07 18:55:07 +00:00
Benjamin Marzinski	086d8334cf	GFS2: fix recursive locking during rindex truncates When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold, since you already hold it. However, if you haven't already read in the resource groups, you need to do that. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:41:54 +00:00
Benjamin Marzinski	0489b3f5eb	GFS2: reread rindex when necessary to grow rindex When GFS2 grew the filesystem, it was never rereading the rindex file during the grow. This is necessary for large grows when the filesystem is almost full, and GFS2 needs to use some of the space allocated earlier in the grow to complete it. Now, if GFS2 fails to reserve the necessary space and the rindex file is not uptodate, it rereads it. Also, the only difference between gfs2_ri_update() and gfs2_ri_update_special() was that gfs2_ri_update_special() didn't clear out the existing resource groups, since you knew that it was only called when there were no resource groups. Attempting to clear out the resource groups when there are none takes almost no time, and rarely happens, so I simply removed gfs2_ri_update_special(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:34:18 +00:00
Steven Whitehouse	044b9414c7	GFS2: Fix inode deallocation race This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-15 12:44:42 +00:00
Linus Torvalds	a2887097f2	Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...	2010-10-22 17:07:18 -07:00
Bob Peterson	46290341cd	GFS2 fatal: filesystem consistency error on rename This patch fixes a GFS2 problem whereby the first rename after a mount can result in a file system consistency error being flagged improperly and cause the file system to withdraw. The problem is that the rename code tries to run the rgrp list with function gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in from disk. The patch makes the rename function hold the rindex glock (as the gfs2_unlink code does today) which reads in the rgrp list if need be. There were a total of three places in the rename code that improperly referenced the rgrp list without the rindex glock and this patch fixes all three. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-30 17:23:03 +01:00
Benjamin Marzinski	3921120e75	GFS2: fallocate support This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:17 +01:00
Steven Whitehouse	9a3f236d40	GFS2: Add a bug trap in allocation code This adds a check to ensure that if we reach the block allocator that we don't try and proceed if there is no alloc structure hanging off the inode. This should only happen if there is a bug in GFS2. The error return code is distinctive in order that it will be easily spotted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:59 +01:00
Steven Whitehouse	a2e0f79939	GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:29 +01:00
Christoph Hellwig	dd3932eddf	block: remove BLKDEV_IFL_WAIT All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-16 20:52:58 +02:00
Christoph Hellwig	f1e4d518c3	gfs2: replace barriers with explicit flush / FUA usage Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:39 +02:00
Linus Torvalds	f16a5e3478	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix permissions checking for setflags ioctl() GFS2: Don't "get" xattrs for ACLs when ACLs are turned off GFS2: Rework reclaiming unlinked dinodes	2010-05-25 08:17:51 -07:00
Jens Axboe	ee9a3607fb	Merge branch 'master' into for-2.6.35 Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:27:26 +02:00
Bob Peterson	ed4878e8a4	GFS2: Rework reclaiming unlinked dinodes The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-21 16:11:36 +01:00
Bob Peterson	cc0581bd61	GFS2: stuck in inode wait, no glocks stuck This patch changes the lock ordering when gfs2 reclaims unlinked dinodes, thereby avoiding a livelock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-12 09:55:39 +01:00
Dmitry Monakhov	fbd9b09a17	blkdev: generalize flags for blkdev_issue_fn functions The patch just convert all blkdev_issue_xxx function to common set of flags. Wait/allocation semantics preserved. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-28 19:47:36 +02:00
Bob Peterson	1a0eae8848	GFS2: glock livelock This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-04-14 16:48:05 +01:00
Steven Whitehouse	ea8d62dadd	GFS2: Use GFP_NOFS for alloc structure This is called under a glock, so its a good plan to use GFP_NOFS Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-02-01 10:01:34 +00:00
Steven Whitehouse	7fe3ec6fe5	GFS2: Fix previous patch The do_div() call needs to remain. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-02-01 10:00:23 +00:00
Benjamin Marzinski	55f0b4c546	GFS2: Don't withdraw on partial rindex entries ince gfs2 writes the rindex file a block at a time, and releases the exclusive lock after each block, it is possible that another process will grab the lock in the middle of the write. Since rindex entries are not an even divisor of blocks, that other process may see partial entries. On grows, this is fine. The process can simply ignore the the partial entires. Previously, the code withdrew when it saw partial entries. Now it simply ignores them. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-02-01 09:59:54 +00:00
Steven Whitehouse	2c77634965	GFS2: Locking order fix in gfs2_check_blk_state In some cases we already have the rindex lock when we enter this function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 11:57:41 +00:00
Anand Gadiyar	fd589a8f0a	trivial: fix typo "to to" in multiple files Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-09-21 15:14:55 +02:00
Linus Torvalds	355bbd8cb8	Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...	2009-09-14 17:55:15 -07:00
Steven Whitehouse	86d0063656	GFS2: Whitespace fixes Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-09-14 09:50:57 +01:00
Christoph Hellwig	746cd1e7e4	block: use blkdev_issue_discard in blk_ioctl_discard blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:53 +02:00
Steven Whitehouse	acf7e2444a	GFS2: Be extra careful about deallocating inodes There is a potential race in the inode deallocation code if two nodes try to deallocate the same inode at the same time. Most of the issue is solved by the iopen locking. There is still a small window which is not covered by the iopen lock. This patches fixes that and also makes the deallocation code more robust in the face of any errors in the rgrp bitmaps, or erroneous iopen callbacks from other nodes. This does introduce one extra disk read, but that is generally not an issue since its the same block that must be written to later in the deallocation process. The total disk accesses therefore stay the same, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-09-08 18:00:30 +01:00

1 2 3 4 5 ...

338 commits