Commit Graph

168 Commits

Author SHA1 Message Date
David Howells 2105c2820d afs: Fix race between post-modification dir edit and readdir/d_revalidate
AFS directories are retained locally as a structured file, with lookup
being effected by a local search of the file contents.  When a modification
(such as mkdir) happens, the dir file content is modified locally rather
than redownloading the directory.

The directory contents are accessed in a number of ways, with a number of
different locks schemes:

 (1) Download of contents - dvnode->validate_lock/write in afs_read_dir().

 (2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(),
     downgrading from (1) if necessary.

 (3) d_revalidate of child dentry - dvnode->validate_lock/read in
     afs_do_lookup_one() downgrading from (1) if necessary.

 (4) Edit of dir after modification - page locks on individual dir pages.

Unfortunately, because (4) uses different locking scheme to (1) - (3),
nothing protects against the page being scanned whilst the edit is
underway.  Even download is not safe as it doesn't lock the pages - relying
instead on the validate_lock to serialise as a whole (the theory being that
directory contents are treated as a block and always downloaded as a
block).

Fix this by write-locking dvnode->validate_lock around the edits.  Care
must be taken in the rename case as there may be two different dirs - but
they need not be locked at the same time.  In any case, once the lock is
taken, the directory version must be rechecked, and the edit skipped if a
later version has been downloaded by revalidation (there can't have been
any local changes because the VFS holds the inode lock, but there can have
been remote changes).

Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13 15:09:01 +01:00
David Howells b98f0ec91c afs: Fix rename operation status delivery
The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only
decode the second file status returned unless the parent directories are
different - unfortunately, this means that the xdr pointer isn't advanced
and the volsync record will be read incorrectly in such an instance.

Fix this by always decoding the second status into the second
status/callback block which wasn't being used if the dirs were the same.

The afs_update_dentry_version() calls that update the directory data
version numbers on the dentries can then unconditionally use the second
status record as this will always reflect the state of the destination dir
(the two records will be identical if the destination dir is the same as
the source dir)

Fixes: 260a980317 ("[AFS]: Add "directory write" support.")
Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13 15:09:01 +01:00
David Howells f52b83b0b1 afs: Fix afs_lookup() to not clobber the version on a new dentry
Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82ef5 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells 40a708bd62 afs: Fix use-after-loss-of-ref
afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells a28f239e29 afs: Fix race in commit bulk status fetch
When a lookup is done, the afs filesystem will perform a bulk status-fetch
operation on the requested vnode (file) plus the next 49 other vnodes from
the directory list (in AFS, directory contents are downloaded as blobs and
parsed locally).  When the results are received, it will speculatively
populate the inode cache from the extra data.

However, if the lookup races with another lookup on the same directory, but
for a different file - one that's in the 49 extra fetches, then if the bulk
status-fetch operation finishes first, it will try and update the inode
from the other lookup.

If this other inode is still in the throes of being created, however, this
will cause an assertion failure in afs_apply_status():

	BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));

on or about fs/afs/inode.c:175 because it expects data to be there already
that it can compare to.

Fix this by skipping the update if the inode is being created as the
creator will presumably set up the inode with the same information.

Fixes: 39db9815da ("afs: Fix application of the results of a inline bulk status fetch")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15 10:28:02 -08:00
David Howells a0753c2900 afs: Support RCU pathwalk
Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode
pathwalk to reduce latency in pathwalk elements that get done multiple
times.  We don't need to query the server unless we've received a
notification from it that something has changed or the callback has
expired.

This requires that we can request a key and check permits under RCU
conditions if we need to.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02 11:43:54 +01:00
Marc Dionne c4c613ff08 afs: Fix possible oops in afs_lookup trace event
The afs_lookup trace event can cause the following:

[  216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
[  216.576803] #PF: supervisor read access in kernel mode
[  216.576813] #PF: error_code(0x0000) - not-present page
...
[  216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]

If the inode from afs_do_lookup() is an error other than ENOENT, or if it
is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
try to dereference the error pointer as a valid pointer.

Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.

Ideally the trace would include the error value, but for now just avoid
the oops.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-22 13:33:26 +01:00
David Howells 9dd0b82ef5 afs: Fix missing dentry data version updating
In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold
the data version of the parent directory when it was created or when
d_revalidate() last caused it to be updated.  This is compared to the
->invalid_before field in the directory inode, rather than the actual data
version number, thereby allowing changes due to local edits to be ignored.
Only if the server data version gets bumped unexpectedly (eg. by a
competing client), do we need to revalidate stuff.

However, the d_fsdata field should also be updated if an rpc op is
performed that modifies that particular dentry.  Such ops return the
revised data version of the directory(ies) involved, so we should use that.

This is particularly problematic for rename, since a dentry from one
directory may be moved directly into another directory (ie. mv a/x b/x).
It would then be sporting the wrong data version - and if this is in the
future, for the destination directory, revalidations would be missed,
leading to foreign renames and hard-link deletion being missed.

Fix this by the following means:

 (1) Return the data version number from operations that read the directory
     contents - if they issue the read.  This starts in afs_dir_iterate()
     and is used, ignored or passed back by its callers.

 (2) In afs_lookup*(), set the dentry version to the version returned by
     (1) before d_splice_alias() is called and the dentry published.

 (3) In afs_d_revalidate(), set the dentry version to that returned from
     (1) if an rpc call was issued.  This means that if a parallel
     procedure, such as mkdir(), modifies the directory, we won't
     accidentally use the data version from that.

 (4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to
     the directory data version before d_instantiate() is called.

 (5) In afs_{rmdir,unlink}, update the target dentry's version to the
     directory data version as soon as we've updated the directory inode.

 (6) In afs_rename(), we need to unhash the old dentry before we start so
     that we don't get afs_d_revalidate() reverting the version change in
     cross-directory renames.

     We then need to set both the old and the new dentry versions the data
     version of the new directory before we call d_move() as d_move() will
     rehash them.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:52 +01:00
David Howells 5dc84855b0 afs: Only update d_fsdata if different in afs_d_revalidate()
In the in-kernel afs filesystem, d_fsdata is set with the data version of
the parent directory.  afs_d_revalidate() will update this to the current
directory version, but it shouldn't do this if it the value it read from
d_fsdata is the same as no lock is held and cmpxchg() is not used.

Fix the code to only change the value if it is different from the current
directory version.

Fixes: 260a980317 ("[AFS]: Add "directory write" support.")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:51 +01:00
David Howells 37c0bbb332 afs: Fix off-by-one in afs_rename() expected data version calculation
When afs_rename() calculates the expected data version of the target
directory in a cross-directory rename, it doesn't increment it as it
should, so it always thinks that the target inode is unexpectedly modified
on the server.

Fixes: a58823ac45 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:51 +01:00
Linus Torvalds 8dda9957e3 AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRyW8vu3V2unywtrAQIhsw//cVtxLx4ZCox5Z/93cdqych8RoCrwcUEG
 Cli0NAjlp/0HETvCsIqdkPKf+4OYCW1tHB2KTdbFdQLZptLgoEhykx89k70z9ggb
 ViieEa1GvAKhdamVqkPUC+3Q33uzyRaK7Gi5N3phJoaO+o328SlrPG0LerQgY0Np
 Rf3je56A1gIjEgWTmpStxiY262jlgaR3IuvpOqbu2G0TQVWV8CsBKw61fTdmEEQp
 dIkNO/xFXS+PvPdmQe5zCAjD/W2D+ggeBMbBwHF411qA60plGinubBYKZ98ikliZ
 OnQQPExI7mroIMzpYT+rzEQyxui2nz5t+Hj+d6t7iIvitNcX/Q53sVTq3RfQ0FjG
 QCd+j/l2p7fkXK4Sxgb/UBkj/pRr6W+FYSbQ/tmpD8UypEf5B3ln6GuA6yTMuNRF
 wVb744slKWq0c7KUuXmz806B2qJoyFG206jyFnoByvs6cPmB1+JqhBBYOKHcwjbo
 HIK+oUKkEfE6ofjQ3B9xOQ1anfbRnjjfJCmXvns9v57y/nRP2P78HUJNnEsOolk2
 nc3Ep41OgeZdwkts9KnSjmwy6VF3UZ2NQEiWXsUIOxGMtcodw9ci1bpquJ71oyut
 4sFMJvMU4eJD+XuCOlAgpbTaQ0Wuf11kFpl1Cof4fj0Z09C25Ahj6iKEKnumtO+4
 edfNLlwO6oo=
 =wgib
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "A set of minor changes for AFS:

   - Remove an unnecessary check in afs_unlink()

   - Add a tracepoint for tracking callback management

   - Add a tracepoint for afs_server object usage

   - Use struct_size()

   - Add mappings for AFS UAE abort codes to Linux error codes, using
     symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Add support for the UAE error table
  fs/afs: use struct_size() in kzalloc()
  afs: Trace afs_server usage
  afs: Add some callback management tracepoints
  afs: afs_unlink() doesn't need to check dentry->d_inode
2019-07-10 20:55:33 -07:00
Zhengyuan Liu ee102584ef fs/afs: use struct_size() in kzalloc()
As Gustavo said in other patches doing the same replace, we can now
use the new struct_size() helper to avoid leaving these open-coded and
prone to type mistake.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
David Howells 051d25250b afs: Add some callback management tracepoints
Add a couple of tracepoints to track callback management:

 (1) afs_cb_miss - Logs when we were unable to apply a callback, either due
     to the inode being discarded or due to a competing thread applying a
     callback first.

 (2) afs_cb_break - Logs when we attempted to clear the noted callback
     promise, either due to the server explicitly breaking the callback,
     the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
David Howells fa59f52f5b afs: afs_unlink() doesn't need to check dentry->d_inode
Don't check that dentry->d_inode is valid in afs_unlink().  We should be
able to take that as given.

This caused Smatch to issue the following warning:

	fs/afs/dir.c:1392 afs_unlink() error: we previously assumed 'vnode' could be null (see line 1375)

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
Thomas Gleixner 2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
David Howells 39db9815da afs: Fix application of the results of a inline bulk status fetch
Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
it will update inodes that are already extant (something that afs_iget()
doesn't do) and to cache permits for each inode created (thereby avoiding a
follow up FS.FetchStatus call to determine this).

Extant inodes need looking up in advance so that their cb_break counters
before and after the operation can be compared.  To this end, the inode
pointers are cached so that they don't need looking up again after the op.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells b835915325 afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.

This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells a38a75581e afs: Fix unlink to handle YFS.RemoveFile2 better
Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.

Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells f642404a04 afs: Make vnode->cb_interest RCU safe
Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest.  Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().

This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells a58823ac45 afs: Fix application of status and callback to be under same lock
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

 (1) Allocating a joint {status,callback} record (afs_status_cb) before
     calling the RPC function for each vnode for which the RPC reply
     contains a status or a status plus a callback.  A flag is set in the
     record to indicate if a callback was actually received.

 (2) These records are passed into the RPC functions to be filled in.  The
     afs_decode_status() and yfs_decode_status() functions are removed and
     the cb_lock is no longer taken.

 (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
     update the vnode.

 (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
     the vnode.

 (5) vnodes, expected data-version numbers and callback break counters
     (cb_break) no longer need to be passed to the reply delivery
     functions.

     Note that, for the moment, the file locking functions still need
     access to both the call and the vnode at the same time.

 (6) afs_vnode_commit_status() is now given the cb_break value and the
     expected data_version and the task of applying the status and the
     callback to the vnode are now done here.

     This is done under a single taking of vnode->cb_lock.

 (7) afs_pages_written_back() is now called by afs_store_data() rather than
     by the reply delivery function.

     afs_pages_written_back() has been moved to before the call point and
     is now given the first and last page numbers rather than a pointer to
     the call.

 (8) The indicator from YFS.RemoveFile2 as to whether the target file
     actually got removed (status.abort_code == VNOVNODE) rather than
     merely dropping a link is now checked in afs_unlink rather than in
     xdr_decode_YFSFetchStatus().

Supplementary fixes:

 (*) afs_cache_permit() now gets the caller_access mask from the
     afs_status_cb object rather than picking it out of the vnode's status
     record.  afs_fetch_status() returns caller_access through its argument
     list for this purpose also.

 (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
     than a read lock and now sets the callback inside the same critical
     section.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells 87182759cd afs: Fix order-1 allocation in afs_do_lookup()
afs_do_lookup() will do an order-1 allocation to allocate status records if
there are more than 39 vnodes to stat.

Fix this by allocating an array of {status,callback} records for each vnode
we want to examine using vmalloc() if larger than a page.

This not only gets rid of the order-1 allocation, but makes it easier to
grow beyond 50 records for YFS servers.  It also allows us to move to
{status,callback} tuples for other calls too and makes it easier to lock
across the application of the status and the callback to the vnode.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells 20b8391fff afs: Make some RPC operations non-interruptible
Make certain RPC operations non-interruptible, including:

 (*) Set attributes
 (*) Store data

     We don't want to get interrupted during a flush on close, flush on
     unlock, writeback or an inode update, leaving us in a state where we
     still need to do the writeback or update.

 (*) Extend lock
 (*) Release lock

     We don't want to get lock extension interrupted as the file locks on
     the server are time-limited.  Interruption during lock release is less
     of an issue since the lock is time-limited, but it's better to
     complete the release to avoid a several-minute wait to recover it.

     *Setting* the lock isn't a problem if it's interrupted since we can
      just return to the user and tell them they were interrupted - at
      which point they can elect to retry.

 (*) Silly unlink

     We want to remove silly unlink files if we can, rather than leaving
     them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

	kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list.  The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells b134d687dd afs: Log more information for "kAFS: AFS vnode with undefined type\n"
Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0).  This prints more information
to try and help pin down the problem.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells 80548b0399 afs: Add more tracepoints
Add four more tracepoints:

 (1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
     to log also.

 (2) afs_make_fs_call2 - Like the above but takes two filenames to log.

 (3) afs_lookup - Log the result of doing a successful lookup, including a
     negative result (fid 0:0).

 (4) afs_get_tree - Log the set up of a volume for mounting.

It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
and puts quotes around the filename in the text representation.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells 79ddbfa500 afs: Implement sillyrename for unlink and rename
Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.

Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.

A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup.  The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells 99987c5600 afs: Add directory reload tracepoint
Add a tracepoint (afs_reload_dir) to indicate when a directory is being
reloaded.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells 445b10289f afs: Improve dir check failure reports
Improve the content of directory check failure reports from:

	kAFS: afs_dir_check_page(6d57): bad magic 1/2 is 0000

to dump more information about the individual blocks in a directory page.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells 73116df7bb afs: Use d_instantiate() rather than d_add() and don't d_drop()
Use d_instantiate() rather than d_add() and don't d_drop() in
afs_vnode_new_inode().  The dentry shouldn't be removed as it's not
changing its name.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells 30062bd13e afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells f58db83fd3 afs: Get the target vnode in afs_rmdir() and get a callback on it
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells 0067191201 afs: Commit the status on a new file/dir/symlink
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.

Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells 3b6492df41 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells f51375cd9e afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Al Viro 1401a0fc2d afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT)
simpler logics in callers that way

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05 15:50:59 -04:00
Al Viro 34b2a88fb4 afs_lookup(): switch to d_splice_alias()
->lookup() methods can (and should) use d_splice_alias() instead of
d_add().  Even if they are not going to be hit by open_by_handle(),
code does get copied around; besides, d_splice_alias() has better
calling conventions for use in ->lookup(), so the code gets simpler.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05 15:44:14 -04:00
David Howells 68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells b61f7dcf4e afs: Fix directory page locking
The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data.  As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data.  Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
		...
	}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

  page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
  flags: 0xfffe000001004(referenced|private)
  raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
  raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  page->mem_cgroup:ffff98156b27c000
  ------------[ cut here ]------------
  kernel BUG at mm/filemap.c:1205!
  ...
  RIP: 0010:unlock_page+0x43/0x50
  ...
  Call Trace:
   afs_dir_iterate+0x789/0x8f0 [kafs]
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x166/0x1d0
   ? afs_do_lookup+0x69/0x490 [kafs]
   ? afs_do_lookup+0x101/0x490 [kafs]
   ? key_default_cmp+0x20/0x20
   ? request_key+0x3c/0x80
   ? afs_lookup+0xf1/0x340 [kafs]
   ? __lookup_slow+0x97/0x150
   ? lookup_slow+0x35/0x50
   ? walk_component+0x1bf/0x490
   ? path_lookupat.isra.52+0x75/0x200
   ? filename_lookup.part.66+0xa0/0x170
   ? afs_end_vnode_operation+0x41/0x60 [kafs]
   ? __check_object_size+0x9c/0x171
   ? strncpy_from_user+0x4a/0x170
   ? vfs_statx+0x73/0xe0
   ? __do_sys_newlstat+0x39/0x70
   ? __x64_sys_getdents+0xc9/0x140
   ? __x64_sys_getdents+0x140/0x140
   ? do_syscall_64+0x5b/0x160
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells 5a81327616 afs: Do better accretion of small writes on newly created content
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page.

 (2) Don't flush the file on ->flush() but rather on ->release() if the
     file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells 63a4681ff3 afs: Locally edit directory data for mkdir/create/unlink/...
Locally edit the contents of an AFS directory upon a successful inode
operation that modifies that directory (such as mkdir, create and unlink)
so that we can avoid the current practice of re-downloading the directory
after each change.

This is viable provided that the directory version number we get back from
the modifying RPC op is exactly incremented by 1 from what we had
previously.  The data in the directory contents is in a defined format that
we have to parse locally to perform lookups and readdir, so modifying isn't
a problem.

If the edit fails, we just clear the VALID flag on the directory and it
will be reloaded next time it is needed.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells 0031763698 afs: Adjust the directory XDR structures
Adjust the AFS directory XDR structures in a number of superficial ways:

 (1) Rename them to all begin afs_xdr_.

 (2) Use u8 instead of uint8_t.

 (3) Mark the structures as __packed so they don't get rearranged by the
     compiler.

 (4) Rename the hdr member of afs_xdr_dir_block to meta.

 (5) Rename the pagehdr member of afs_xdr_dir_block to hdr.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells 4ea219a839 afs: Split the directory content defs into a header
Split the directory content definitions into a header file so that they can
be used by multiple .c files.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells f3ddee8dc4 afs: Fix directory handling
AFS directories are structured blobs that are downloaded just like files
and then parsed by the lookup and readdir code and, as such, are currently
handled in the pagecache like any other file, with the entire directory
content being thrown away each time the directory changes.

However, since the blob is a known structure and since the data version
counter on a directory increases by exactly one for each change committed
to that directory, we can actually edit the directory locally rather than
fetching it from the server after each locally-induced change.

What we can't do, though, is mix data from the server and data from the
client since the server is technically at liberty to rearrange or compress
a directory if it sees fit, provided it updates the data version number
when it does so and breaks the callback (ie. sends a notification).

Further, lookup with lookup-ahead, readdir and, when it arrives, local
editing are likely want to scan the whole of a directory.

So directory handling needs to be improved to maintain the coherency of the
directory blob prior to permitting local directory editing.

To this end:

 (1) If any directory page gets discarded, invalidate and reread the entire
     directory.

 (2) If readpage notes that if when it fetches a single page that the
     version number has changed, the entire directory is flagged for
     invalidation.

 (3) Read as much of the directory in one go as we can.

Note that this removes local caching of directories in fscache for the
moment as we can't pass the pages to fscache_read_or_alloc_pages() since
page->lru is in use by the LRU.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells 66c7e1d319 afs: Split the dynroot stuff out and give it its own ops tables
Split the AFS dynamic root stuff out of the main directory handling file
and into its own file as they share little in common.

The dynamic root code also gets its own dentry and inode ops tables.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:00 +01:00
David Howells a4ff7401fb afs: Keep track of invalid-before version for dentry coherency
Each afs dentry is tagged with the version that the parent directory was at
last time it was validated and, currently, if this differs, the directory
is scanned and the dentry is refreshed.

However, this leads to an excessive amount of revalidation on directories
that get modified on the client without conflict with another client.  We
know there's no conflict because the parent directory's data version number
got incremented by exactly 1 on any create, mkdir, unlink, etc., therefore
we can trust the current state of the unaffected dentries when we perform a
local directory modification.

Optimise by keeping track of the last version of the parent directory that
was changed outside of the client in the parent directory's vnode and using
that to validate the dentries rather than the current version.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells d55b4da433 afs: Introduce a statistics proc file
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:54 +01:00
David Howells 37ab636880 afs: Implement @cell substitution handling
Implement @cell substitution handling such that if @cell is seen as a name
in a dynamic root mount, then the name of the root cell for that network
namespace will be substituted for @cell during lookup.

The substitution of @cell for the current net namespace is set by writing
the cell name to /proc/fs/afs/rootcell.  The value can be obtained by
reading the file.

For example:

	# mount -t afs none /kafs -o dyn
	# echo grand.central.org >/proc/fs/afs/rootcell
	# ls /kafs/@cell
	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/
	# cat /proc/fs/afs/rootcell
	grand.central.org

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:18:58 +01:00
David Howells 6f8880d8e6 afs: Implement @sys substitution handling
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system.  Up to 16 alternatives may be specified and these are
tried in turn until one works.  Each network namespace has[*] a separate
independent list.

Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".

[*] Or will, once network namespace support is finalised in kAFS.

The list may be set by:

	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

for which separate writes to the same fd are amalgamated and applied on
close.  The LF character may be used as a separator to specify multiple
items in the same write() call.

The list may be cleared by:

	# echo >/proc/fs/afs/sysname

and read by:

	# cat /proc/fs/afs/sysname
	foo
	bar
	linux-x86_64

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells 5cf9dd55a0 afs: Prospectively look up extra files when doing a single lookup
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells 4d673da145 afs: Support the AFS dynamic root
Support the AFS dynamic root which is a pseudo-volume that doesn't connect
to any server resource, but rather is just a root directory that
dynamically creates mountpoint directories where the name of such a
directory is the name of the cell.

Such a mount can be created thus:

	mount -t afs none /afs -o dyn

Dynamic root superblocks aren't shared except by bind mounts and
propagation.  Cell root volumes can then be mounted by referring to them by
name, e.g.:

	ls /afs/grand.central.org/
	ls /afs/.grand.central.org/

The kernel will upcall to consult the DNS if the address wasn't supplied
directly.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06 14:43:37 +00:00
David Howells 440fbc3a8a afs: Fix unlink
Repeating creation and deletion of a file on an afs mount will run the box
out of memory, e.g.:

	dd if=/dev/zero of=/afs/scratch/m0 bs=$((1024*1024)) count=512
	rm /afs/scratch/m0

The problem seems to be that it's not properly decrementing the nlink count
so that the inode can be scrapped.

Note that this doesn't fix local creation followed by remote deletion.
That's harder to handle and will require a separate patch as we're not told
that the file has been deleted - only that the directory has changed.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-01-02 10:02:19 +00:00
Colin Ian King 43dd388b21 afs: remove redundant assignment of dvnode to itself
The assignment of dvnode to itself is redundant and can be removed.
Cleans up warning detected by cppcheck:

fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 13:55:46 +00:00
David Howells 4433b69141 afs: Fix signal handling in some file ops
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop
the target dentry if a signal causes the operation to be killed immediately
before we try to contact the server.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 13:55:35 +00:00
David Howells bc1527dcb4 afs: Fix some dentry handling in dir ops and missing key_puts
Fix some of dentry handling in AFS directory ops:

 (1) Do d_drop() on the new_dentry before assigning a new inode to it in
     afs_vnode_new_inode().  It's fine to do this before calling afs_iget()
     because the operation has taken place on the server.

 (2) Replace d_instantiate()/d_rehash() with d_add().

 (3) Don't d_drop() the new_dentry in afs_rename() on error.

Also fix afs_link() and afs_rename() to call key_put() on all error paths
where the key is taken.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 10:56:51 +00:00
David Howells 215804a992 afs: Introduce a file-private data record
Introduce a file-private data record for kAFS and put the key into it
rather than storing the key in file->private_data.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells dab17c1add afs: Fix directory read/modify race
Because parsing of the directory wasn't being done under any sort of lock,
the pages holding the directory content can get invalidated whilst the
parsing is ongoing.

Further, the directory page check function gets called outside of the page
lock, so if the page gets cleared or updated, this may return reports of
bad magic numbers in the directory page.

Also, the directory may change size whilst checking and parsing are
ongoing, so more care needs to be taken here.

Fix this by:

 (1) Perform the page check from the page filling function before we set
     PageUptodate and drop the page lock.

 (2) Check for the file having shrunk and the page having been abandoned
     before checking the page contents.

 (3) Lock the page whilst parsing it for the directory iterator.

Whilst we're at it, add a tracepoint to report check failure.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells d2ddc776a4 afs: Overhaul volume and server record caching and fileserver rotation
The current code assumes that volumes and servers are per-cell and are
never shared, but this is not enforced, and, indeed, public cells do exist
that are aliases of each other.  Further, an organisation can, say, set up
a public cell and a private cell with overlapping, but not identical, sets
of servers.  The difference is purely in the database attached to the VL
servers.

The current code will malfunction if it sees a server in two cells as it
assumes global address -> server record mappings and that each server is in
just one cell.

Further, each server may have multiple addresses - and may have addresses
of different families (IPv4 and IPv6, say).

To this end, the following structural changes are made:

 (1) Server record management is overhauled:

     (a) Server records are made independent of cell.  The namespace keeps
     	 track of them, volume records have lists of them and each vnode
     	 has a server on which its callback interest currently resides.

     (b) The cell record no longer keeps a list of servers known to be in
     	 that cell.

     (c) The server records are now kept in a flat list because there's no
     	 single address to sort on.

     (d) Server records are now keyed by their UUID within the namespace.

     (e) The addresses for a server are obtained with the VL.GetAddrsU
     	 rather than with VL.GetEntryByName, using the server's UUID as a
     	 parameter.

     (f) Cached server records are garbage collected after a period of
     	 non-use and are counted out of existence before purging is allowed
     	 to complete.  This protects the work functions against rmmod.

     (g) The servers list is now in /proc/fs/afs/servers.

 (2) Volume record management is overhauled:

     (a) An RCU-replaceable server list is introduced.  This tracks both
     	 servers and their coresponding callback interests.

     (b) The superblock is now keyed on cell record and numeric volume ID.

     (c) The volume record is now tied to the superblock which mounts it,
     	 and is activated when mounted and deactivated when unmounted.
     	 This makes it easier to handle the cache cookie without causing a
     	 double-use in fscache.

     (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
     	 to get the server UUID list.

     (e) The volume name is updated if it is seen to have changed when the
     	 volume is updated (the update is keyed on the volume ID).

 (3) The vlocation record is got rid of and VLDB records are no longer
     cached.  Sufficient information is stored in the volume record, though
     an update to a volume record is now no longer shared between related
     volumes (volumes come in bundles of three: R/W, R/O and backup).

and the following procedural changes are made:

 (1) The fileserver cursor introduced previously is now fleshed out and
     used to iterate over fileservers and their addresses.

 (2) Volume status is checked during iteration, and the server list is
     replaced if a change is detected.

 (3) Server status is checked during iteration, and the address list is
     replaced if a change is detected.

 (4) The abort code is saved into the address list cursor and -ECONNABORTED
     returned in afs_make_call() if a remote abort happened rather than
     translating the abort into an error message.  This allows actions to
     be taken depending on the abort code more easily.

     (a) If a VMOVED abort is seen then this is handled by rechecking the
     	 volume and restarting the iteration.

     (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
         handled by sleeping for a short period and retrying and/or trying
         other servers that might serve that volume.  A message is also
         displayed once until the condition has cleared.

     (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
     	 moment.

     (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
     	 see if it has been deleted; if not, the fileserver is probably
     	 indicating that the volume couldn't be attached and needs
     	 salvaging.

     (e) If statfs() sees one of these aborts, it does not sleep, but
     	 rather returns an error, so as not to block the umount program.

 (5) The fileserver iteration functions in vnode.c are now merged into
     their callers and more heavily macroised around the cursor.  vnode.c
     is removed.

 (6) Operations on a particular vnode are serialised on that vnode because
     the server will lock that vnode whilst it operates on it, so a second
     op sent will just have to wait.

 (7) Fileservers are probed with FS.GetCapabilities before being used.
     This is where service upgrade will be done.

 (8) A callback interest on a fileserver is set up before an FS operation
     is performed and passed through to afs_make_call() so that it can be
     set on the vnode if the operation returns a callback.  The callback
     interest is passed through to afs_iget() also so that it can be set
     there too.

In general, record updating is done on an as-needed basis when we try to
access servers, volumes or vnodes rather than offloading it to work items
and special threads.

Notes:

 (1) Pre AFS-3.4 servers are no longer supported, though this can be added
     back if necessary (AFS-3.4 was released in 1998).

 (2) VBUSY is retried forever for the moment at intervals of 1s.

 (3) /proc/fs/afs/<cell>/servers no longer exists.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells c435ee3455 afs: Overhaul the callback handling
Overhaul the AFS callback handling by the following means:

 (1) Don't give up callback promises on vnodes that we are no longer using,
     rather let them just expire on the server or let the server break
     them.  This is actually more efficient for the server as the callback
     lookup is expensive if there are lots of extant callbacks.

 (2) Only give up the callback promises we have from a server when the
     server record is destroyed.  Then we can just give up *all* the
     callback promises on it in one go.

 (3) Servers can end up being shared between cells if cells are aliased, so
     don't add all the vnodes being backed by a particular server into a
     big FID-indexed tree on that server as there may be duplicates.

     Instead have each volume instance (~= superblock) register an interest
     in a server as it starts to make use of it and use this to allow the
     processor for callbacks from the server to find the superblock and
     thence the inode corresponding to the FID being broken by means of
     ilookup_nowait().

 (4) Rather than iterating over the entire callback list when a mass-break
     comes in from the server, maintain a counter of mass-breaks in
     afs_server (cb_seq) and make afs_validate() check it against the copy
     in afs_vnode.

     It would be nice not to have to take a read_lock whilst doing this,
     but that's tricky without using RCU.

 (5) Save a ref on the fileserver we're using for a call in the afs_call
     struct so that we can access its cb_s_break during call decoding.

 (6) Write-lock around callback and status storage in a vnode and read-lock
     around getattr so that we don't see the status mid-update.

This has the following consequences:

 (1) Data invalidation isn't seen until someone calls afs_validate() on a
     vnode.  Unfortunately, we need to use a key to query the server, but
     getting one from a background thread is tricky without caching loads
     of keys all over the place.

 (2) Mass invalidation isn't seen until someone calls afs_validate().

 (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
     Could this be replaced with rcu_read_lock() since inodes are destroyed
     under RCU conditions.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells 9ed900b116 afs: Push the net ns pointer to more places
Push the network namespace pointer to more places in AFS, including the
afs_server structure (which doesn't hold a ref on the netns).

In particular, afs_put_cell() now takes requires a net ns parameter so that
it can safely alter the netns after decrementing the cell usage count - the
cell will be deallocated by a background thread after being cached for a
period, which means that it's not safe to access it after reducing its
usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells d3e3b7eac8 afs: Add metadata xattrs
Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
available.  The following xattrs are now available:

 - "afs.cell"

   The name of the cell in which the vnode's volume resides.

 - "afs.fid"

   The volume ID, vnode ID and vnode uniquifier of the file as three hex
   numbers separated by colons.

 - "afs.volume"

   The name of the volume in which the vnode resides.

For example:

	# getfattr -d -m ".*" /mnt/scratch
	getfattr: Removing leading '/' from absolute path names
	# file: mnt/scratch
	afs.cell="mycell.myorg.org"
	afs.fid="10000b:1:1"
	afs.volume="scratch"

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-09 14:40:12 -07:00
Alexey Dobriyan 5b5e0928f7 lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Miklos Szeredi 2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Miklos Szeredi 1cd66c93ba fs: make remaining filesystems use .rename2
This is trivial to do:

 - add flags argument to foo_rename()
 - check if flags is zero
 - assign foo_rename() to .rename2 instead of .rename

This doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).

Filesystems converted:

9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

After this, we can get rid of the duplicate interfaces for rename.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Howells <dhowells@redhat.com> [AFS]
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Mark Fasheh <mfasheh@suse.com>
2016-09-27 11:03:58 +02:00
Al Viro 29884eff1f afs: switch to ->iterate_shared()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-10 14:27:44 -04:00
Al Viro be5b82dbfe make ext2_get_page() and friends work without external serialization
Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:25 -04:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
David Howells 2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Al Viro a455589f18 assorted conversions to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:20 -05:00
Miklos Szeredi ac7576f4b1 vfs: make first argument of dir_context.actor typed
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-31 17:48:54 -04:00
Eric W. Biederman 9b053f3207 vfs: Remove unnecessary calls of check_submounts_and_drop
Now that check_submounts_and_drop can not fail and is called from
d_invalidate there is no longer a need to call check_submounts_and_drom
from filesystem d_revalidate methods so remove it.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:56 -04:00
Al Viro 13f3583892 afs: dget_parent() can't return a negative dentry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-29 22:02:24 -04:00
Al Viro 5d8943b04b afs: get rid of redundant ->d_name.len checks
No dentry can get to directory modification methods without
having passed either ->lookup() or ->atomic_open(); if name is
rejected by those two (or by ->d_hash()) with an error, it won't
be seen by anything else.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Miklos Szeredi ba81238076 afs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:51 -04:00
Al Viro 1bbae9f818 [readdir] convert afs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:58 +04:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Al Viro ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro 00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro 0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro 4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro 18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
David Howells d6e43f751f AFS: Use i_generation not i_version for the vnode uniquifier
Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:48 -04:00
Sage Weil 705da2de95 afs: remove unnecessary dentry_unhash on rmdir, dir rename
afs has no problems with references to unlinked directories.

CC: David Howells <dhowells@redhat.com>
CC: linux-afs@lists.infradead.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-28 01:02:53 -04:00
Sage Weil e4eaac06bc vfs: push dentry_unhash on rename_dir into file systems
Only a few file systems need this.  Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26 07:26:48 -04:00
Sage Weil 79bf7c732b vfs: push dentry_unhash on rmdir into file systems
Only a few file systems need this.  Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.

This does not change behavior for any in-tree file systems.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26 07:26:47 -04:00
David Howells d18610b0ce AFS: Use d_automount() rather than abusing follow_link()
Make AFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:33 -05:00
Al Viro d61dcce297 switch afs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:04:20 -05:00
Nick Piggin 34286d6662 fs: rcu-walk aware d_revalidate method
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Al Viro 7de9c6ee3e new helper: ihold()
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
wanglei bec5eb6141 AFS: Implement an autocell mount capability [ver #2]
Implement the ability for the root directory of a mounted AFS filesystem to
accept lookups of arbitrary directory names, to interpet the names as the names
of cells, to look the cell names up in the DNS for AFSDB records and to mount
the root.cell volume of the nominated cell on the pseudo-directory created by
lookup.

This facility is requested by passing:

	-o autocell

to the mountpoint for which this is desired, usually the /afs mount.

To use this facility, a DNS upcall program is required for AFSDB records.  This
can be obtained from:

	http://people.redhat.com/~dhowells/afs/dns.afsdb.c

It should be compiled with -lresolv and -lkeyutils and installed as, say:

	/usr/sbin/dns.afsdb

Then the following line needs to be added to /sbin/request-key.conf:

	create	dns_resolver afsdb:*	*	/usr/sbin/dns.afsdb %k

This can be tested by mounting AFS, say:

	insmod dns_resolver.ko
	insmod af-rxrpc.ko
	insmod kafs.ko rootcell=grand.central.org
	mount -t afs "#grand.central.org:root.cell." /afs -o autocell

and doing:

	ls /afs/grand.central.org/

which should show:

	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/

if it works.

Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:29 +00:00
Al Viro f6d335c08d AFS: Don't put struct file on the stack
Don't put struct file on the stack as it takes up quite a lot of space
and violates lifetime rules for struct file.

Rather than calling afs_readpage() indirectly from the directory routines by
way of read_mapping_page(), split afs_readpage() to have afs_page_filler()
that's given a key instead of a file and call read_cache_page(), specifying the
new function directly.  Use it in afs_readpages() as well.

Also make use of this in afs_mntpt_check_symlink() too for the same reason.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
2010-05-21 18:31:28 -04:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Artem Bityutskiy dd0d9a46f5 AFS: Fix compilation warning
Fix the following warning:

  fs/afs/dir.c: In function 'afs_d_revalidate':
  fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function
  fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this function

by marking the 'fid' variable as an uninitialized_var.  The problem is
that gcc doesn't always manage to work out that fid is always set on the
path through the function that uses it.

Cc: linux-afs@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:24:07 -07:00
Al Viro 79be57cc7f constify dentry_operations: AFS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27 14:44:00 -04:00
Christoph Hellwig 3222a3e55f [PATCH] fix ->llseek for more directories
With this patch all directory fops instances that have a readdir
that doesn't take the BKL are switched to generic_file_llseek.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-10-23 05:13:21 -04:00
Harvey Harrison 530b641278 afs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
David Howells e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Jean Noel Cordenner 7a224228ed vfs: Add 64 bit i_version support
The i_version field of the inode is changed to be a 64-bit counter that
is set on every inode creation and that is incremented every time the
inode data is modified (similarly to the "ctime" time-stamp).
The aim is to fulfill a NFSv4 requirement for rfc3530.
This first part concerns the vfs, it converts the 32-bit i_version in
the generic inode to a 64-bit, a flag is added in the super block in
order to check if the feature is enabled and the i_version is
incremented in the vfs.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
2008-01-28 23:58:27 -05:00
David Howells e8d6c55412 AFS: implement file locking
Implement file locking for AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:43 -07:00
Alexey Dobriyan e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00