Commit Graph

43 Commits

Author SHA1 Message Date
David Chinner 641c56fbfe [XFS] Prevent deadlock when flushing inodes on unmount
When we are unmounting the filesystem, we flush all the inodes to disk.
Unfortunately, if we have an inode cluster that has just been freed and
marked stale sitting in an incore log buffer (i.e. hasn't been flushed to
disk), it will be holding all the flush locks on the inodes in that
cluster.

xfs_iflush_all(), which is called during unmount, walks all the inodes
trying to reclaim them, and in doing so calls xfs_finish_reclaim() on each
inode. If the inode is dirty, it grabs the flush lock and flushes it.
Unfortunately, we find dirty inodes that already have their flush lock held
and so we sleep.

At this point in the unmount process, we are running single-threaded.
There is nothing more that can push on the log to force the transaction
holding the inode flush locks to disk and hence we deadlock.

The fix is to issue a log force before flushing the inodes on unmount so
that all the flush locks will be released before we start flushing the
inodes.

SGI-PV: 964538
SGI-Modid: xfs-linux-melb:xfs-kern:28862a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-07-14 15:33:38 +10:00
David Chinner 92821e2ba4 [XFS] Lazy Superblock Counters
When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? The answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means that when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.
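
The summation walk described above can be sketched as follows. This is a minimal illustration only: the struct and field names are hypothetical and do not match the on-disk AGF/AGI layout, and counting free list blocks as free space is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG summary; names are illustrative, not the on-disk format. */
struct ag_summary {
    uint64_t inodes_allocated; /* from the AGI */
    uint64_t inodes_free;      /* from the AGI */
    uint64_t blocks_free;      /* from the AGF */
    uint64_t freelist_blocks;  /* from the AGF (assumed to count as free) */
    uint64_t btree_blocks;     /* from the AGF: blocks used by free space btrees */
};

/* Hypothetical view of the superblock counters being rebuilt. */
struct sb_counts {
    uint64_t icount;
    uint64_t ifree;
    uint64_t fdblocks;
};

/* Walk every AG and sum the individual counters into superblock totals. */
struct sb_counts rebuild_sb_counts(const struct ag_summary *ags, size_t agcount)
{
    struct sb_counts sb = {0, 0, 0};

    assert(ags != NULL || agcount == 0);
    for (size_t i = 0; i < agcount; i++) {
        sb.icount += ags[i].inodes_allocated;
        sb.ifree += ags[i].inodes_free;
        /* Without btree_blocks the free block total would come up short. */
        sb.fdblocks += ags[i].blocks_free + ags[i].freelist_blocks +
                       ags[i].btree_blocks;
    }
    return sb;
}
```

In this sketch the caller would run the walk only when no clean unmount record is found, and would then log the rebuilt totals in a transaction.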

As a result of this, lazy superblock counters are a mkfs option and at the
moment on Linux there is no supported way to convert an old filesystem.
Conversion is possible, though - xfs_db can be used to twiddle the right
bits and then xfs_repair will do the format conversion for you. Similarly,
you can convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-07-14 15:28:50 +10:00
Nathan Scott 4cc929ee30 [XFS] Don't grow filesystems past the size they can index.
When growing a filesystem we don't check to see if the new size overflows
the page cache index range, so we can do silly things like grow a
filesystem past 16TB on a 32bit machine. Check new filesystem sizes against the
limits the kernel can support.
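
A minimal sketch of such a check, under stated assumptions: the function name and parameters are hypothetical, the block size is assumed not to exceed the page size, and the page index width is passed in explicitly instead of being taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a filesystem of nblocks blocks can be covered by a page
 * cache whose page index is index_bits wide.
 * Assumes blocklog <= page_shift (block size no larger than page size).
 */
bool size_fits_page_index(uint64_t nblocks, unsigned blocklog,
                          unsigned page_shift, unsigned index_bits)
{
    assert(blocklog <= page_shift);
    assert(index_bits >= 1 && index_bits <= 64);

    /* Convert the block count to a page count (rounded down). */
    uint64_t pages = nblocks >> (page_shift - blocklog);
    uint64_t max_index = (index_bits == 64)
                             ? UINT64_MAX
                             : ((UINT64_C(1) << index_bits) - 1);

    /* Conservative: reject as soon as the page count exceeds the largest index. */
    return pages <= max_index;
}
```

With 4KB blocks, 4KB pages and a 32-bit index, this rejects a grow to 2^32 blocks (16TB) and accepts anything smaller.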

SGI-PV: 957886
SGI-Modid: xfs-linux-melb:xfs-kern:28563a

Signed-Off-By: Nathan Scott <nscott@aconex.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-07-14 15:21:29 +10:00
Rafael J. Wysocki 8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Eric Sandeen 1c72bf9003 [XFS] The last argument "lsn" of xfs_trans_commit() is always called with
NULL.

Patch provided by Eric Sandeen.

SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:48:42 +10:00
Lachlan McIlroy 5478eead85 [XFS] Re-initialize the per-cpu superblock counters after recovery.
After filesystem recovery the superblock is re-read to bring in any
changes. If the per-cpu superblock counters are not re-initialized from
the superblock then the next time the per-cpu counters are disabled they
might overwrite the global counter with a bogus value.

SGI-PV: 957348
SGI-Modid: xfs-linux-melb:xfs-kern:27999a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:36:29 +11:00
David Chinner dbcabad19a [XFS] Fix block reservation mechanism.
The block reservation mechanism has been broken since the per-cpu
superblock counters were introduced. Make the block reservation code work
with the per-cpu counters by syncing the counters, snapshotting the amount
of available space and then doing a modification of the counter state
according to the result. Continue in a loop until we either have no space
available or we reserve some space.

SGI-PV: 956323
SGI-Modid: xfs-linux-melb:xfs-kern:27895a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:36:17 +11:00
David Chinner 20f4ebf2bf [XFS] Make growfs work for amounts greater than 2TB
The free block modification code has a 32bit interface, limiting the size
the filesystem can be grown even on 64 bit machines. On 32 bit machines,
there are other 32bit variables in transaction structures and interfaces
that need to be expanded to allow this to work.

SGI-PV: 959978
SGI-Modid: xfs-linux-melb:xfs-kern:27894a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:36:10 +11:00
David Chinner 03135cf726 [XFS] Fix UP build breakage due to undefined m_icsb_mutex.
SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27692a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:15 +11:00
David Chinner 20b642858b [XFS] Reduce global superblock lock contention near ENOSPC.
The existing per-cpu superblock counter code uses the global superblock
spin lock when we approach ENOSPC for global synchronisation. On machines
larger than those this code was originally tested on, this can still cause
catastrophic spinlock contention due to increasing rebalance frequency near
ENOSPC.

By introducing a sleeping lock that is used to serialise balances and
modifications near ENOSPC we prevent contention from needlessly
wasting the CPU time of potentially hundreds of CPUs.

To reduce the number of balances occurring, we separate the need-rebalance
case from the slow-allocate case. Now, a counter running dry will trigger
a rebalance during which counters are disabled. Any thread that sees a
disabled counter enters a different path where it waits on the new mutex.
When it gets the new mutex, it checks if the counter is disabled. If the
counter is disabled, then we _know_ that we have to use the global counter
and lock and it is safe to do so immediately. Otherwise, we drop the mutex
and go back to trying the per-cpu counters which we know were re-enabled.

SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27612a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:09 +11:00
David Chinner 7989cb8ef5 [XFS] Keep stack usage down for 4k stacks by using noinline.
gcc-4.1 and more recent versions aggressively inline static functions, which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.

Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.

Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.
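
A rough sketch of what such macros might look like, assuming GCC/Clang attribute syntax. The DEBUG switch and the helper functions are hypothetical, and the real definitions in the XFS headers may differ.

```c
#include <assert.h>

/* GCC/Clang attribute that forbids inlining of the annotated function. */
#define noinline __attribute__((noinline))

/* Ordinary static functions are kept out of line to limit stack growth. */
#define STATIC static noinline

/* Small helpers may be inlined, except in debug builds. */
#ifdef DEBUG
#define STATIC_INLINE static noinline
#else
#define STATIC_INLINE static inline
#endif

/* Hypothetical helper: never inlined into its callers. */
STATIC int scale(int x, int factor)
{
    assert(factor > 0);
    return x * factor;
}

/* Hypothetical helper: inlined unless DEBUG is defined. */
STATIC_INLINE int twice(int x)
{
    return scale(x, 2);
}

int quadruple(int x)
{
    return twice(twice(x));
}
```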

SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:56 +11:00
David Chinner 4be536debe [XFS] Prevent free space oversubscription and xfssyncd looping.
The fix for recent ENOSPC deadlocks introduced certain limitations on
allocations. The fix could cause xfssyncd to loop endlessly if we did not
leave some space free for the allocator to work correctly. Basically, we
needed to ensure that we had at least 4 blocks free for an AG free list
and a block for the inode bmap btree at all times.

However, this did not take into account the fact that each AG has a free
list that needs 4 blocks. Hence any filesystem with more than one AG could
cause oversubscription of free space and make xfssyncd spin forever trying
to allocate space needed for AG freelists that was not available in the
AG.

The following patch reserves space for the free lists in all AGs plus the
inode bmap btree which prevents oversubscription. It also prevents those
blocks from being reported as free space (as they can never be used) and
makes the SMP in-core superblock accounting code and the reserved block
ioctl respect this requirement.
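
The reservation arithmetic can be sketched as below. The function names are hypothetical, and the size of the inode bmap btree reserve is left as a parameter because the message does not fix it for the multi-AG case.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks each AG free list needs, per the description above. */
#define AG_FREELIST_RESERVE 4

/* Total blocks held back: one free list reserve per AG plus the bmap btree reserve. */
uint64_t set_aside_blocks(uint32_t agcount, uint64_t bmbt_reserve)
{
    assert(agcount > 0);
    return (uint64_t)agcount * AG_FREELIST_RESERVE + bmbt_reserve;
}

/* Free space as reported to users: set-aside blocks are never shown as free. */
uint64_t reportable_free_blocks(uint64_t fdblocks, uint32_t agcount,
                                uint64_t bmbt_reserve)
{
    uint64_t aside = set_aside_blocks(agcount, bmbt_reserve);

    return fdblocks > aside ? fdblocks - aside : 0;
}
```

For a 16 AG filesystem with a one block bmap btree reserve, 65 blocks are held back, so a raw count of 100 free blocks is reported as 35.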

SGI-PV: 955674
SGI-Modid: xfs-linux-melb:xfs-kern:26894a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
2006-09-07 14:26:50 +10:00
Linus Torvalds 73a0e405dc Merge git://oss.sgi.com:8090/nathans/xfs-2.6
* git://oss.sgi.com:8090/nathans/xfs-2.6:
  [XFS] Fixup whitespace damage in log_write, remove final warning.
  [XFS] Rework code snippets slightly to remove remaining recent-gcc
  [XFS] Fix realtime subvolume expansion, a porting bug b0rked it.  Coverity
  [XFS] Remove a race condition where a linked inode could BUG_ON in
  [XFS] Remove redundant directory checks from inode link operation.
  [XFS] Remove a couple of no-longer-used macros.
  [XFS] Reduce size of xfs_trans_t structure. * remove ->t_forw, ->t_back --
  [XFS] remove unused behaviour lock - shrink XFS vnode as a side effect.
  [XFS] * There is trivial "inode => vnode => inode" conversion, but only
  [XFS] link(2) on directory is banned in VFS.
2006-06-27 19:09:16 -07:00
Chandra Seetharaman 5a67e4c5b6 [PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places
Make use of the newly defined hotplug version of cpu_notifier functionality
wherever appropriate.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Nathan Scott 6fdf8ccc09 [XFS] Rework code snippets slightly to remove remaining recent-gcc
warnings.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26364a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-28 10:13:52 +10:00
Nathan Scott f6c2d1fa63 [XFS] Remove version 1 directory code. Never functioned on Linux, just
pure bloat.

SGI-PV: 952969
SGI-Modid: xfs-linux-melb:xfs-kern:26251a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-20 13:04:51 +10:00
Nathan Scott 67fcaa73ad [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters.
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26107a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09 17:00:52 +10:00
Nathan Scott b83bd13881 [XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters.
SGI-PV: 9533338
SGI-Modid: xfs-linux-melb:xfs-kern:26106a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09 16:48:30 +10:00
Nathan Scott b65745205f [XFS] Portability changes: remove prdev, stick to one diagnostic
interface.

SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26103a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09 15:29:40 +10:00
Yingping Lu d210a28cd8 [XFS] When allocating file system blocks and freeing extents, the
transaction for each operation may lock several AGF buffers. The extent
freeing function sorts the extents by AGF number before entering the
transaction. However, when file system space is very limited, allocation
tries every AGF to find space, which could cause out-of-order locking and
hence deadlock. This fix mitigates the scarcity of space for allocation
by setting aside a few blocks without reservation, and avoids deadlock by
maintaining ascending order of AGF locking.
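
The ordering rule can be illustrated in isolation: if every transaction takes its AGF locks in ascending AG number, two transactions can never each hold a lock the other is waiting for. The helper below is a hypothetical sketch that only computes the lock order. It is not the allocator change itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for AG numbers, ascending. */
static int cmp_agno(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the AG numbers a transaction needs into ascending order and drop
 * duplicates in place. Returns the number of distinct AGs; locking them
 * in the resulting order keeps AGF locking ascending.
 */
size_t ag_lock_order(uint32_t *agnos, size_t n)
{
    assert(agnos != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(agnos, n, sizeof(*agnos), cmp_agno);

    size_t out = 1;
    for (size_t i = 1; i < n; i++) {
        if (agnos[i] != agnos[out - 1])
            agnos[out++] = agnos[i];
    }
    return out;
}
```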

SGI-PV: 947395
SGI-Modid: xfs-linux-melb:xfs-kern:210801a

Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09 14:55:18 +10:00
Nathan Scott e50bd16fe4 [XFS] Fix superblock validation regression for the zero imaxpct case.
Thanks to kjamieson for noticing.

SGI-PV: 951661
SGI-Modid: xfs-linux-melb:xfs-kern:25675a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:10:45 +10:00
Nathan Scott 764d1f89a5 [XFS] Implement the silent parameter to fill_super, previously ignored.
SGI-PV: 951299
SGI-Modid: xfs-linux-melb:xfs-kern:25632a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:04:17 +10:00
Nathan Scott c41564b5af [XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all
these typos.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25539a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 08:55:14 +10:00
Nathan Scott 9f989c9455 [XFS] Additional mount time superblock validation checks.
SGI-PV: 950491
SGI-Modid: xfs-linux-melb:xfs-kern:25354a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-14 13:29:32 +11:00
David Chinner 01e1b69cfc [XFS] using a spinlock per cpu for superblock counter exclusion results in
a preempt counter overflow at 256p and above. Change the exclusion
mechanism to use atomic bit operations and busy wait loops to emulate the
spin lock exclusion mechanism but without the preempt count issues.

SGI-PV: 950027
SGI-Modid: xfs-linux-melb:xfs-kern:25338a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-14 13:29:16 +11:00
David Chinner e8234a6871 [XFS] Add support for hotplug CPUs to the per-CPU superblock counters by
registering a notifier callback that listens to CPU up/down events to
modify the counters appropriately.

SGI-PV: 949726
SGI-Modid: xfs-linux-melb:xfs-kern:25214a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-14 13:23:52 +11:00
David Chinner 8d280b98cf [XFS] On machines with more than 8 cpus, when running parallel I/O
threads, the incore superblock lock becomes the limiting factor for
buffered write throughput. Make the contended fields in the incore
superblock use per-cpu counters so that there is no global lock to limit
scalability.
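
The per-cpu counter idea, reduced to a single-threaded sketch: each CPU updates only its own slot, and a reader sums all slots when it needs the global value. The type, the fixed CPU count and the function names are hypothetical. A real implementation would also need per-slot exclusion and cache line padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed CPU count for the sketch. */
#define NR_CPUS_SKETCH 8

struct percpu_counter_sketch {
    int64_t slot[NR_CPUS_SKETCH]; /* one private slot per CPU */
};

/* Fast path: modify only the calling CPU's slot, no global lock involved. */
void pcs_add(struct percpu_counter_sketch *c, unsigned cpu, int64_t delta)
{
    assert(c != NULL);
    assert(cpu < NR_CPUS_SKETCH);
    c->slot[cpu] += delta;
}

/* Slow path: fold every slot into one global value. */
int64_t pcs_sum(const struct percpu_counter_sketch *c)
{
    int64_t total = 0;

    assert(c != NULL);
    for (size_t i = 0; i < NR_CPUS_SKETCH; i++)
        total += c->slot[i];
    return total;
}
```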

SGI-PV: 946630
SGI-Modid: xfs-linux-melb:xfs-kern:25106a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-14 13:13:09 +11:00
Jesper Juhl 014c2544e6 return statement cleanup - kill pointless parentheses
This patch removes pointless parentheses from return statements.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-15 02:37:08 +01:00
Nathan Scott ee2a4f7caa [XFS] Fix an intermittent pquota panic caused by dodgy quota flags to an
umount dquot flush call.

SGI-PV: 946444
SGI-Modid: xfs-linux-melb:xfs-kern:24680a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-01-11 15:33:36 +11:00
Christoph Hellwig 1df84c930a [XFS] Mark some lookup tables const. Thanks to Arjan van de Ven for
spotting these.

SGI-PV: 946028
SGI-Modid: xfs-linux-melb:xfs-kern:202617a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-01-11 15:29:52 +11:00
Jes Sorensen 794ee1baee [PATCH] mutex subsystem, semaphore to mutex: XFS
This patch switches XFS over to use the new mutex code directly as
opposed to the previous workaround patch I posted earlier that avoided
the namespace clash by forcing it back to semaphores. This falls in the
'works for me<tm>' category.

Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-09 15:59:21 -08:00
Eric Sandeen a749ee8615 [XFS] Fix calculation of reserved AGs for inodes in 32-bit inode mode
Spotted by Roger Willcocks <willcor @at@ gmail.com>

SGI-PV: 944858
SGI-Modid: xfs-linux:xfs-kern:201213a

Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 15:13:42 +11:00
Nathan Scott c11e2c369d [XFS] Rework fid encode/decode wrt 64 bit inums interacting with NFS.
SGI-PV: 937127
SGI-Modid: xfs-linux:xfs-kern:24201a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 15:11:45 +11:00
Nathan Scott 7b71876980 [XFS] Update license/copyright notices to match the preferred SGI
boilerplate.

SGI-PV: 913862
SGI-Modid: xfs-linux:xfs-kern:23903a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 14:58:39 +11:00
Nathan Scott a844f4510d [XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot.
SGI-PV: 943122
SGI-Modid: xfs-linux:xfs-kern:23901a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 14:38:42 +11:00
Nathan Scott fc1f8c1ca3 [XFS] Track external log/realtime device names for correct reporting in
/proc/mounts.

SGI-PV: 942984
SGI-Modid: xfs-linux:xfs-kern:23862a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 11:44:33 +11:00
Nathan Scott d8cc890d40 [XFS] Ondisk format extension for extended attributes (attr2). Basically,
the data/attr forks now grow up/down from either end of the literal area,
rather than dividing the literal area into two chunks and growing both
upward.  Means we can now make much more efficient use of the attribute
space, incl. fitting DMF attributes inline in 256 byte inodes, and large
jumps in dbench3 performance numbers.  It is self enabling, but can be
forced on/off via the attr2/noattr2 mount options.
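
Why the shared layout makes better use of the space can be shown with a small fit check. This is an illustrative sketch only: the function names, the split offset and the sizes in the usage note are hypothetical and not taken from the inode format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old layout: the literal area is cut into two chunks at a fixed offset. */
bool fits_fixed_split(size_t literal, size_t split, size_t data, size_t attr)
{
    assert(split <= literal);
    return data <= split && attr <= literal - split;
}

/* attr2-style layout: the forks grow toward each other, so only the total matters. */
bool fits_shared(size_t literal, size_t data, size_t attr)
{
    return data <= literal && attr <= literal - data;
}
```

With a 160 byte literal area split in half, a 40 byte data fork and a 100 byte attribute fork do not fit the fixed layout but do fit the shared one.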

SGI-PV: 941645
SGI-Modid: xfs-linux:xfs-kern:23835a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 10:34:53 +11:00
Christoph Hellwig da1650a5d6 [XFS] Add format checking to cmn_err and icmn_err
SGI-PV: 942243
SGI-Modid: xfs-linux:xfs-kern:198658a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02 10:21:35 +11:00
Christoph Hellwig 8401e9631c [XFS] remove xfs_incore_relse
SGI-PV: 936977
SGI-Modid: xfs-linux:xfs-kern:193409a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-06-21 15:38:03 +10:00
Christoph Hellwig efa8027804 [XFS] rewrite xfs_iflush_all
SGI-PV: 936890
SGI-Modid: xfs-linux:xfs-kern:193349a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-06-21 15:37:17 +10:00
Christoph Hellwig ba0f32d460 [XFS] mark various symbols static Patch from Adrian Bunk
SGI-PV: 936255
SGI-Modid: xfs-linux:xfs-kern:192760a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-06-21 15:36:52 +10:00
Nathan Scott de20614b35 [XFS] Block mount attempts for filesystems with version 1 directories.
SGI Modid: xfs-linux:xfs-kern:21937a

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Christoph Hellwig <hch@sgi.com>
2005-05-05 13:24:13 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00