Commit graph

568 commits

Author SHA1 Message Date
Tejun Heo
a7560a0132 sysfs: fix use-after-free in sysfs_kill_sb()
While restructuring the [u]mount path, 4b93dc9b1c ("sysfs, kernfs:
prepare mount path for kernfs") incorrectly updated sysfs_kill_sb() so
that it first kills super_block and then tries to dereference its
namespace tag to drop it.  Fix it by caching namespace tag before
killing the superblock and then drop the cached namespace tag.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/g/20131205031051.GC5135@yliu-dev.sh.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-10 22:40:12 -08:00
Tejun Heo
9b2db6e189 sysfs: bail early from kernfs_file_mmap() to avoid spurious lockdep warning
This is v3.14 fix for the same issue that a8b1474442 ("sysfs: give
different locking key to regular and bin files") addresses for v3.13.
Due to the extensive kernfs reorganization in v3.14 branch, the same
fix couldn't be ported as-is.  The v3.13 fix was ignored while merging
it into v3.14 branch.

027a485d12 ("sysfs: use a separate locking class for open files
depending on mmap") assigned different lockdep key to
sysfs_open_file->mutex depending on whether the file implements mmap
or not in an attempt to avoid spurious lockdep warning caused by
merging of regular and bin file paths.

While this restored some of the original behavior of using different
locks (at least lockdep is concerned) for the different clases of
files.  The restoration wasn't full because now the lockdep key
assignment depends on whether the file has mmap or not instead of
whether it's a regular file or not.

This means that bin files which don't implement mmap will get assigned
the same lockdep class as regular files.  This is problematic because
file_operations for bin files still implements the mmap file operation
and checking whether the sysfs file actually implements mmap happens
in the file operation after grabbing @sysfs_open_file->mutex.  We
still end up adding locking dependency from mmap locking to
sysfs_open_file->mutex to the regular file mutex which triggers
spurious circular locking warning.

For v3.13, a8b1474442 ("sysfs: give different locking key to regular
and bin files") fixed it by giving sysfs_open_file->mutex different
lockdep keys depending on whether the file is regular or bin instead
of whether mmap exists or not; however, due to the way sysfs is now
layered behind kernfs, this approach is no longer viable.  kernfs can
tell whether a sysfs node has mmap implemented or not but can't tell
whether a bin file from a regular one.

This patch updates kernfs such that kernfs_file_mmap() checks
SYSFS_FLAG_HAS_MMAP and bail before grabbing sysfs_open_file->mutex so
that it doesn't add spurious locking dependency from mmap to
sysfs_open_file->mutex and changes sysfs so that it specifies
kernfs_ops->mmap iff the sysfs file implements mmap.  Combined, this
ensures that sysfs_open_file->mutex is grabbed under mmap path iff the
sysfs file actually implements mmap.  As sysfs_open_file->mutex is
already given a different lockdep key if mmap is implemented, this
removes the spurious locking dependency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-10 21:33:31 -08:00
Tejun Heo
bfc5c17337 sysfs, kernfs: remove cross inclusions of internal headers
fs/kernfs/kernfs-internal.h needed to include fs/sysfs/sysfs.h because
part of kernfs core implementation was living in sysfs.

fs/sysfs/sysfs.h needed to include fs/kernfs/kernfs-internal.h because
include/linux/kernfs.h didn't expose enough interface.

The separation is complete and neither is true anymore.  Remove the
cross inclusion and make sysfs a proper user of kernfs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:54:50 -08:00
Tejun Heo
ac9bba0310 sysfs, kernfs: implement kernfs_ns_enabled()
fs/sysfs/symlink.c::sysfs_delete_link() tests @sd->s_flags for
SYSFS_FLAG_NS.  Let's add kernfs_ns_enabled() so that sysfs doesn't
have to test sysfs_dirent flag directly.  This makes things tidier for
kernfs proper too.

This is purely cosmetic.

v2: To avoid possible NULL deref, use noop dummy implementation which
    always returns false when !CONFIG_SYSFS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:41:28 -08:00
Tejun Heo
fa736a951e sysfs, kernfs: move mount core code to fs/kernfs/mount.c
Move core mount code to fs/kernfs/mount.c.  The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:16:08 -08:00
Tejun Heo
4b93dc9b1c sysfs, kernfs: prepare mount path for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
rearranges mount path so that the kernfs and sysfs parts are separate.

* As sysfs_super_info won't be visible outside kernfs proper,
  kernfs_super_ns() is added to allow kernfs users to access a
  super_block's namespace tag.

* Generic mount operation is separated out into kernfs_mount_ns().
  sysfs_mount() now just performs sysfs-specific permission check,
  acquires namespace tag, and invokes kernfs_mount_ns().

* Generic superblock release is separated out into kernfs_kill_sb()
  which can be used directly as file_system_type->kill_sb().  As sysfs
  needs to put the namespace tag, sysfs_kill_sb() wraps
  kernfs_kill_sb() with ns tag put.

* sysfs_dir_cachep init and sysfs_inode_init() are separated out into
  kernfs_init().  kernfs_init() uses only small amount of memory and
  trying to handle and propagate kernfs_init() failure doesn't make
  much sense.  Use SLAB_PANIC for sysfs_dir_cachep and make
  sysfs_inode_init() panic on failure.

  After this change, kernfs_init() should be called before
  sysfs_init(), fs/namespace.c::mnt_init() modified accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:16:08 -08:00
Tejun Heo
df394fb56c sysfs, kernfs: make super_blocks bind to different kernfs_roots
kernfs is being updated to allow multiple sysfs_dirent hierarchies so
that it can also be used by other users.  Currently, sysfs
super_blocks are always attached to one kernfs_root - sysfs_root - and
distinguished only by their namespace tags.

This patch adds sysfs_super_info->root and update
sysfs_fill/test_super() so that super_blocks are identified by the
combination of both the associated kernfs_root and namespace tag.
This allows mounting different kernfs hierarchies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:10:48 -08:00
Tejun Heo
ba7443bc65 sysfs, kernfs: implement kernfs_create/destroy_root()
There currently is single kernfs hierarchy in the whole system which
is used for sysfs.  kernfs needs to support multiple hierarchies to
allow other users.  This patch introduces struct kernfs_root which
serves as the root of each kernfs hierarchy and implements
kernfs_create/destroy_root().

* Each kernfs_root is associated with a root sd (sysfs_dentry).  The
  root is freed when the root sd is released and kernfs_destory_root()
  simply invokes kernfs_remove() on the root sd.  sysfs_remove_one()
  is updated to handle release of the root sd.  Note that ps_iattr
  update in sysfs_remove_one() is trivially updated for readability.

* Root sd's are now dynamically allocated using sysfs_new_dirent().
  Update sysfs_alloc_ino() so that it gives out ino from 1 so that the
  root sd still gets ino 1.

* While kernfs currently only points to the root sd, it'll soon grow
  fields which are specific to each hierarchy.  As determining a given
  sd's root will be necessary, sd->s_dir.root is added.  This backlink
  fits better as a separate field in sd; however, sd->s_dir is inside
  union with space to spare, so use it to save space and provide
  kernfs_root() accessor to determine the root sd.

* As hierarchies may be destroyed now, each mount needs to hold onto
  the hierarchy it's attached to.  Update sysfs_fill_super() and
  sysfs_kill_sb() so that they get and put the kernfs_root
  respectively.

* sysfs_root is replaced with kernfs_root which is dynamically created
  by invoking kernfs_create_root() from sysfs_init().

This patch doesn't introduce any visible behavior changes.

v2: kernfs_create_root() forgot to set @sd->priv.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:10:48 -08:00
Tejun Heo
061447a496 sysfs, kernfs: introduce sysfs_root_sd
Currently, it's assumed that there's a single kernfs hierarchy in the
system anchored at sysfs_root which is defined as a global struct.  To
allow other users of kernfs, this will be made dynamic.  Introduce a
new global variable sysfs_root_sd which points to &sysfs_root and
convert all &sysfs_root users.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
9e30cc9595 sysfs, kernfs: no need to kern_mount() sysfs from sysfs_init()
It has been very long since sysfs depended on vfs to keep track of
internal states and whether sysfs is mounted or not doesn't make any
difference to sysfs's internal operation.

In addition to init and filesystem type registration, sysfs_init()
invokes kern_mount() to create in-kernel mount of sysfs.  This
internal mounting doesn't server any purpose anymore.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
51a35e9fd0 sysfs, kernfs: make sysfs_super_info->ns const
Add const qualifier to sysfs_super_info->ns so that it's consistent
with other namespace tag usages in sysfs.  Because kobject doesn't use
const qualifier for namespace tags, this ends up requiring an explicit
cast to drop const qualifier in free_sysfs_super_info().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
ccc532dc12 sysfs, kernfs: drop unused params from sysfs_fill_super()
sysfs_fill_super() takes three params - @sb, @data and @silent - but
uses only @sb.  Drop the latter two.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
2072f1afdd sysfs, kernfs: move symlink core code to fs/kernfs/symlink.c
Move core symlink code to fs/kernfs/symlink.c.  fs/sysfs/symlink.c now
only contains sysfs wrappers around kernfs interfaces.  The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
414985ae23 sysfs, kernfs: move file core code to fs/kernfs/file.c
Move core file code to fs/kernfs/file.c.  fs/sysfs/file.c now contains
sysfs kernfs_ops callbacks, sysfs wrappers around kernfs interfaces,
and sysfs_schedule_callback().  The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.

This is pure relocation.

v2: Refreshed on top of the v2 of "sysfs, kernfs: prepare read path
    for kernfs".

v3: Refreshed on top of the v3 of "sysfs, kernfs: prepare read path
    for kernfs".

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
fd7b9f7b97 sysfs, kernfs: move dir core code to fs/kernfs/dir.c
Move core dir code to fs/kernfs/dir.c.  fs/sysfs/dir.c now only
contains sysfs_warn_dup() and sysfs wrappers around kernfs interfaces.
The respective declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

v2: sysfs_symlink_target_lock was mistakenly relocated to kernfs.  It
    should remain with sysfs.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
ffed24e228 sysfs, kernfs: move inode code to fs/kernfs/inode.c
There's nothing sysfs-specific in fs/sysfs/inode.c.  Move everything
in it to fs/kernfs/inode.c.  The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
ae6621b071 sysfs, kernfs: move internal decls to fs/kernfs/kernfs-internal.h
Move data structure, constant and basic accessor declarations from
fs/sysfs/sysfs.h to fs/kernfs/kernfs-internal.h.  The two files
currently include each other.  Once kernfs / sysfs separation is
complete, the cross inclusions will be removed.  Inclusion protectors
are added to fs/sysfs/sysfs.h to allow cross-inclusion.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
ccf73cf336 sysfs, kernfs: introduce kernfs[_find_and]_get() and kernfs_put()
Introduce kernfs interface for finding, getting and putting
sysfs_dirents.

* sysfs_find_dirent() is renamed to kernfs_find_ns() and lockdep
  assertion for sysfs_mutex is added.

* sysfs_get_dirent_ns() is renamed to kernfs_find_and_get().

* Macro inline dancing around __sysfs_get/put() are removed and
  kernfs_get/put() are made proper functions implemented in
  fs/sysfs/dir.c.

While the conversions are mostly equivalent, there's one difference -
kernfs_get() doesn't return the input param as its return value.  This
change is intentional.  While passing through the input increases
writability in some areas, it is unnecessary and has been shown to
cause confusion regarding how the last ref is handled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
517e64f578 sysfs, kernfs: revamp sysfs_dirent active_ref lockdep annotation
Currently, sysfs_dirent active_ref lockdep annotation uses
attribute->[s]key as the lockdep key, which forces
kernfs_create_file_ns() to assume that sysfs_dirent->priv is pointing
to a struct attribute which may not be true for non-sysfs users.  This
patch restructures the lockdep annotation such that

* kernfs_ops contains lockdep_key which is used by default for files
  created kernfs_create_file_ns().

* kernfs_create_file_ns_key() is introduced which takes an extra @key
  argument.  The created file will use the specified key for
  active_ref lockdep annotation.  If NULL is specified, lockdep for
  the file is disabled.

* sysfs_add_file_mode_ns() is updated to use
  kernfs_create_file_ns_key() with the appropriate key from the
  attribute or NULL if ignore_lockdep is set.

This makes the lockdep annotation properly contained in kernfs while
allowing sysfs to cleanly keep its current behavior.  This patch
doesn't introduce any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:15 -08:00
Tejun Heo
2b25a62901 sysfs, kernfs: reorganize SYSFS_* constants
We want to add one more SYSFS_FLAG_* but we can't use the next higher
bit, 0x10000, as the flag field is 16bits wide.  The flags are
currently arranged weirdly - 8 bits are set aside for the type flags
when there are only three three used, the first flag starts at 0x1000
instead of 0x0100 and flag literals have 5 digits (20 bits) when only
4 digits can be used.

Rearrange them so that type bits are only the lowest four, flags start
at 0x0010 and similar flags are grouped.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00
Tejun Heo
024f647117 sysfs, kernfs: introduce kernfs_notify()
Introduce kernfs interface to wake up poll(2) which takes and returns
sysfs_dirents.

sysfs_notify_dirent() is renamed to kernfs_notify() and sysfs_notify()
is updated so that it doesn't directly grab sysfs_mutex but acquires
the target sysfs_dirents using sysfs_get_dirent().
sysfs_notify_dirent() is reimplemented as a dumb inline wrapper around
kernfs_notify().

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00
Tejun Heo
d19b9846df sysfs, kernfs: add kernfs_ops->seq_{start|next|stop}()
kernfs_ops currently only supports single_open() behavior which is
pretty restrictive.  Add optional callbacks ->seq_{start|next|stop}()
which, when implemented, are invoked for seq_file traversal.  This
allows full seq_file functionality for kernfs users.  This currently
doesn't have any user and doesn't change any behavior.

v2: Refreshed on top of the updated "sysfs, kernfs: prepare read path
    for kernfs".

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00
Tejun Heo
2d0cfbec2a sysfs, kernfs: remove sysfs_add_one()
sysfs_add_one() is a wrapper around __sysfs_add_one() which prints out
duplicate name warning if __sysfs_add_one() fails with -EEXIST.  The
previous kernfs conversions moved all dup warnings to sysfs interface
functions and sysfs_add_one() doesn't have any user left.

Remove sysfs_add_one() and update __sysfs_add_one() to take its name.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00
Tejun Heo
496f73944a sysfs, kernfs: introduce kernfs_create_file[_ns]()
Introduce kernfs interface to create a file which takes and returns
sysfs_dirents.

The actual file creation part is separated out from
sysfs_add_file_mode_ns() into kernfs_create_file_ns().  The former now
only decides the kernfs_ops to use and the file's size and invokes the
latter.

This patch doesn't introduce behavior changes.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00
Tejun Heo
a7dc66dfb4 sysfs, kernfs: remove SYSFS_KOBJ_BIN_ATTR
After kernfs_ops and sysfs_dirent->s_attr.size addition, the
distinction between SYSFS_KOBJ_BIN_ATTR and SYSFS_KOBJ_ATTR is only
necessary while creating files to decide which kernfs_ops to use.
Afterwards, they behave exactly the same.

This patch removes SYSFS_KOBJ_BIN_ATTR along with sysfs_is_bin().
sysfs_add_file[_mode_ns]() are updated to take bool @is_bin instead of
@type.

This patch doesn't introduce any behavior changes.  This completely
isolates the distinction between the two sysfs file types in the sysfs
layer proper.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:41:35 -08:00
Tejun Heo
471bd7b78b sysfs, kernfs: add sysfs_dirent->s_attr.size
sysfs sets the size of regular files unconditionally at PAGE_SIZE and
takes the size of bin files from bin_attribute.  The latter is a
pretty bad interface which forces bin_attribute users to create a
separate copy of bin_attribute for each instance of the file -
e.g. pci resource files.

Add sysfs_dirent->s_attr.size so that the size can be specified
separately.  This unifies inode init paths of ATTR and BIN_ATTR
identical and allows for generic size handling for kernfs.

Unfortunately, this grows the size of sysfs_dirent by sizeof(loff_t).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:35:05 -08:00
Tejun Heo
f6acf8bb6a sysfs, kernfs: introduce kernfs_ops
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
introduces kernfs_ops which hosts methods kernfs users implement and
updates fs/sysfs/file.c such that sysfs_kf_*() functions populate
kernfs_ops and kernfs_file_*() functions call the matching entries
from kernfs_ops.

kernfs_ops contains the following groups of methods.

* seq_show() - for kernfs files which use seq_file for reads.

* read() - for direct read implementations.  Used iff seq_show() is
  not implemented.

* write() - for writes.

* mmap() - for mmaps.

Notes:

* sysfs_elem_attr->ops is added so that kernfs_ops can be accessed
  from sysfs_dirent.  kernfs_ops() helper is added to verify locking
  and access the field.

* SYSFS_FLAG_HAS_(SEQ_SHOW|MMAP) added.  sd->s_attr->ops is accessible
  only while holding active_ref and there are cases where we want to
  take different actions depending on which ops are implemented.
  These two flags cache whether the two ops are implemented for those.

* kernfs_file_*() no longer test sysfs type but chooses different
  behaviors depending on which methods in kernfs_ops are implemented.
  The conversions are trivial except for the open path.  As
  kernfs_file_open() now decides whether to allow read/write accesses
  depending on the kernfs_ops implemented, the presence of methods in
  kobjs and attribute_bin should be propagated to kernfs_ops.
  sysfs_add_file_mode_ns() is updated so that it propagates presence /
  absence of the callbacks through _empty, _ro, _wo, _rw kernfs_ops.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:35:05 -08:00
Tejun Heo
dd8a5b036b sysfs, kernfs: move sysfs_open_file to include/linux/kernfs.h
sysfs_open_file will be used as the primary handle for kernfs methods.
Move its definition from fs/sysfs/file.c to include/linux/kernfs.h and
mark the public and private fields.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:35:05 -08:00
Tejun Heo
c6fb449515 sysfs, kernfs: prepare open, release, poll paths for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
prepares the rest - open, release and poll.  There isn't much to do.
Just renaming is enough.  As sysfs_file_operations and
sysfs_bin_operations are identical now, use the same file_operations
for both - kernfs_file_operations.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:35:05 -08:00
Tejun Heo
fdbffaa478 sysfs, kernfs: prepare mmap path for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
rearranges mmap path so that the kernfs and sysfs parts are separate.

sysfs_kf_bin_mmap() which handles the interaction with bin_attribute
mmap method is factored out of sysfs_bin_mmap(), which is renamed to
kernfs_file_mmap().  All vma ops are renamed accordingly.

sysfs_bin_mmap() is updated such that it can be used for both file
types.  This will eventually allow using the same file_operations for
both file types, which is necessary to separate out kernfs.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:33:46 -08:00
Tejun Heo
50b38ca086 sysfs, kernfs: prepare write path for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
rearranges write path so that the kernfs and sysfs parts are separate.

kernfs_file_write() handles all boilerplate work including buffer
management and locking and invokes sysfs_kf_write() or
sysfs_kf_bin_write() depending on the file type which deals with the
interaction with kobj store or bin_attribute write method.

While this patch changes the order of some operations, it shouldn't
change any visible behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:33:46 -08:00
Tejun Heo
c2b19daf67 sysfs, kernfs: prepare read path for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
rearranges read path so that the kernfs and sysfs parts are separate.

* Regular file read path is refactored such that
  kernfs_seq_start/next/stop/show() handle all the boilerplate work
  including locking and updating event count for poll, while
  sysfs_kf_seq_show() deals with interaction with kobj show method.

* Bin file read path is refactored such that kernfs_file_direct_read()
  handles all the boilerplate work including buffer management and
  locking, while sysfs_kf_bin_read() deals with interaction with
  bin_attribute read method.

kernfs_file_read() is added.  It invokes either the seq_file or direct
read path depending on the file type.  This will eventually allow
using the same file_operations for both file types, which is necessary
to separate out kernfs.

While this patch changes the order of some operations, it shouldn't
change any visible behavior.

v2: Dropped unnecessary zeroing of @count from sysfs_kf_seq_show().
    Add comments explaining single_open() behavior.  Both suggested by
    Pavel.

v3: seq_stop() is called even after seq_start() failed.
    kernfs_seq_start() updated so that it doesn't unlock
    sysfs_open_file->mutex on failure so that kernfs_seq_stop()
    doesn't try to unlock an already unlocked mutex.  Reported by
    Fengguang.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:33:46 -08:00
Tejun Heo
93b2b8e4aa sysfs, kernfs: introduce kernfs_create_dir[_ns]()
Introduce kernfs interface to manipulate a directory which takes and
returns sysfs_dirents.

create_dir() is renamed to kernfs_create_dir_ns() and its argumantes
and return value are updated.  create_dir() usages are replaced with
kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced
with kernfs_create_dir().  Dup warnings are handled explicitly by
sysfs users of the kernfs interface.

sysfs_enable_ns() is renamed to kernfs_enable_ns().

This patch doesn't introduce any behavior changes.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

v3: kernfs_enable_ns() added.

v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2"
    so that this patch removes sysfs_enable_ns().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:20:13 -08:00
Tejun Heo
7c6e2d362c sysfs, kernfs: replace sysfs_dirent->s_dir.kobj and ->s_attr.[bin_]attr with ->priv
A directory sysfs_dirent points to the associated kobj.  A regular or
bin file points to the associated [bin_]attribute.  This patch
replaces sysfs_dirent->s_dir.kobj and ->s_attr.[bin_]attr with void *
->priv.

This is to prepare for kernfs interface so that sysfs can specify the
private data in the same way for directories and files.  This lower
debuggability but not by much - the whole thing was overlaid in a
union anyway.  If debuggability becomes an issue, we can later add
->priv accessors which explicitly check for the sysfs_dirent type and
performs casting.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:19:16 -08:00
Greg Kroah-Hartman
44c3eea650 Merge branch 'driver-core-linus' into driver-core-next
We need those sysfs fixes in this branch to make testing, and future
patches apply properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 21:58:09 -08:00
Tejun Heo
5d60418e54 sysfs, kernfs: introduce kernfs_setattr()
Introduce kernfs setattr interface - kernfs_setattr().

sysfs_sd_setattr() is renamed to __kernfs_setattr() and
kernfs_setattr() is a simple wrapper around it with sysfs_mutex
locking.  sysfs_chmod_file() is updated to get an explicit ref on
kobj->sd and then invoke kernfs_setattr() so that it doesn't have to
use internal interface.

This patch doesn't introduce any behavior differences.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:57:57 -08:00
Tejun Heo
890ece160c sysfs, kernfs: introduce kernfs_rename[_ns]()
Introduce kernfs rename interface, krenfs_rename[_ns]().

This is just rename of sysfs_rename().  No functional changes.
Function comment is added to kernfs_rename_ns() and @new_parent_sd is
renamed to @new_parent for consistency with other kernfs interfaces.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:57:57 -08:00
Tejun Heo
5d0e26bb59 sysfs, kernfs: introduce kernfs_create_link()
Separate out kernfs symlink interface - kernfs_create_link() - which
takes and returns sysfs_dirents, from sysfs_do_create_link_sd().
sysfs_do_create_link_sd() now just determines the parent and target
sysfs_dirents and invokes the new interface and handles dup warning.

This patch doesn't introduce behavior changes.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:57:56 -08:00
Tejun Heo
879f40d193 sysfs, kernfs: introduce kernfs_remove[_by_name[_ns]]()
Introduce kernfs removal interfaces - kernfs_remove() and
kernfs_remove_by_name[_ns]().

These are just renames of sysfs_remove() and sysfs_hash_and_remove().
No functional changes.

v2: Dummy kernfs_remove_by_name_ns() for !CONFIG_SYSFS updated to
    return -ENOSYS instead of 0.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:57:56 -08:00
Tejun Heo
ae2108ad32 sysfs: make __sysfs_add_one() fail if the parent isn't a directory
Currently the kobject based interface guarantees that a parent
sysfs_dirent is always a directory; however, the planned kernfs
interface will be directly based on sysfs_dirents and the caller may
specify non-directory node as the parent.  Add an explicit check in
__sysfs_add_one() so that such attempts fail with -EINVAL.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:14:43 -08:00
Tejun Heo
c84a3b2779 sysfs: drop kobj_ns_type handling, take #2
The way namespace tags are implemented in sysfs is more complicated
than necessary.  As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is.  If multiple namespace types are
needed, which currently aren't, we can simply compare the tag to a set
of allowed tags in the superblock assuming that the tags, being
pointers, won't have the same value across multiple types.

This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
an enable switch to turn on namespace under a node.  If enabled, all
children are required to have non-NULL namespace tags and filtered
against the super_block's tag.

kobject namespace determination is now performed in
lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
The sanity checks are also moved.  create_dir() is restructured to
ease such addition.  This removes most kobject namespace knowledge
from sysfs proper which will enable proper separation and layering of
sysfs.

This is the second try.  The first one was cb26a31157 ("sysfs: drop
kobj_ns_type handling") which tried to automatically enable namespace
if there are children with non-NULL namespace tags; however, it was
broken for symlinks as they should inherit the target's tag iff
namespace is enabled in the parent.  This led to namespace filtering
enabled incorrectly for wireless net class devices through phy80211
symlinks and thus network configuration failure.  a1212d278c
("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.

This shouldn't introduce any behavior changes, for real.

v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
    missing and caused build failure.  Reported by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:01:03 -08:00
Greg Kroah-Hartman
81440e7374 Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"
This reverts commit 54d71145a4.

The root cause of these "inverted" sysfs removals have now been found,
so there is no need for this patch.  Keep this functionality around so
that this type of error doesn't show up in driver code again.

Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 09:44:55 -08:00
Tejun Heo
027a485d12 sysfs: use a separate locking class for open files depending on mmap
The following two commits implemented mmap support in the regular file
path and merged bin file support into the regular path.

 73d9714627 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
 3124eb1679 ("sysfs: merge regular and bin file handling")

After the merge, the following commands trigger a spurious lockdep
warning.  "test-mmap-read" simply mmaps the file and dumps the
content.

  $ cat /sys/block/sda/trace/act_mask
  $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.12.0-work+ #378 Not tainted
  -------------------------------------------------------
  test-mmap-read/567 is trying to acquire lock:
   (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120

  but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (&mm->mmap_sem){++++++}:
  ...
  -> #2 (sr_mutex){+.+.+.}:
  ...
  -> #1 (&bdev->bd_mutex){+.+.+.}:
  ...
  -> #0 (&of->mutex){+.+.+.}:
  ...

  other info that might help us debug this:

  Chain exists of:
   &of->mutex --> sr_mutex --> &mm->mmap_sem

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&mm->mmap_sem);
				 lock(sr_mutex);
				 lock(&mm->mmap_sem);
    lock(&of->mutex);

   *** DEADLOCK ***

  1 lock held by test-mmap-read/567:
   #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0

  stack backtrace:
  CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
   ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
   0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
  Call Trace:
   [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
   [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
   [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
   [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
   [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
   [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
   [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
   [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
   [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
   [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
   [<ffffffff81008282>] SyS_mmap+0x22/0x30
   [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b

This happens because one file nests sr_mutex, which nests mm->mmap_sem
under it, under of->mutex while mmap implementation naturally nests
of->mutex under mm->mmap_sem.  The warning is false positive as
of->mutex is per open-file and the two paths belong to two different
files.  This warning didn't trigger before regular and bin file
supports were merged because only bin file supported mmap and the
other side of locking happened only on regular files which used
equivalent but separate locking.

It'd be best if we give separate locking classes per file but we can't
easily do that.  Let's differentiate on ->mmap() for now.  Later we'll
add explicit file operations struct and can add per-ops lockdep key
there.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-23 10:52:13 -08:00
Mika Westerberg
54d71145a4 sysfs: handle duplicate removal attempts in sysfs_remove_group()
Commit bcdde7e221 (sysfs: make __sysfs_remove_dir() recursive) changed
the behavior so that directory removals will be done recursively. This
means that the sysfs group might already be removed if its parent directory
has been removed.

The current code outputs warnings similar to following log snippet when it
detects that there is no group for the given kobject:

 WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
 sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
 Modules linked in:
 CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
 Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
 Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
  ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
  ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
 Call Trace:
  [<ffffffff817daab1>] dump_stack+0x45/0x56
  [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
  [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
  [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
  [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
  [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
  [<ffffffff8142a0d0>] device_del+0x40/0x1b0
  [<ffffffff8142a24d>] device_unregister+0xd/0x20
  [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
  [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
  [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
  [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
  [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
  [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
  [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
  [<ffffffff8142a1b1>] device_del+0x121/0x1b0
  [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
  [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
  [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
  [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
  [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
  [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
  [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
  [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
  [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
  [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
  [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
  [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
  [<ffffffff8105db29>] worker_thread+0x119/0x390
  [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
  [<ffffffff81063a5d>] kthread+0xcd/0xf0
  [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
  [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180

On this particular machine I see ~16 of these message during Thunderbolt
hot-unplug.

Fix this in similar way that was done for sysfs_remove_one() by checking
if the parent directory has already been removed and bailing out early.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-23 10:52:13 -08:00
Linus Torvalds
a1212d278c Revert "sysfs: drop kobj_ns_type handling"
This reverts commit cb26a31157.

It mysteriously causes NetworkManager to not find the wireless device
for me.  As far as I can tell, Tejun *meant* for this commit to not make
any semantic changes, but there clearly are some.  So revert it, taking
into account some of the calling convention changes that happened in
this area in subsequent commits.

Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-07 20:47:28 +09:00
Tejun Heo
0cae60f914 sysfs: rename sysfs_assoc_lock and explain what it's about
sysfs_assoc_lock is an odd piece of locking.  In general, whoever owns
a kobject is responsible for synchronizing sysfs operations and sysfs
proper assumes that, for example, removal won't race with any other
operation; however, this doesn't work for symlinking because an entity
performing symlink doesn't usually own the target kobject and thus has
no control over its removal.

sysfs_assoc_lock synchronizes symlink operations against kobj->sd
disassociation so that symlink code doesn't end up dereferencing
already freed sysfs_dirent by racing with removal of the target
kobject.

This is quite obscure and the generic name of the lock and lack of
comments make it difficult to understand its role.  Let's rename it to
sysfs_symlink_target_lock and add comments explaining what's going on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 12:13:37 -07:00
Tejun Heo
044e3bc333 sysfs: use generic_file_llseek() for sysfs_file_operations
13c589d5b0 ("sysfs: use seq_file when reading regular files")
converted regular sysfs files to use seq_file.  The commit substituted
generic_file_llseek() with seq_lseek() for llseek implementation.

Before the change, all regular sysfs files were allowed to seek to any
position in [0, PAGE_SIZE] as the file size is always PAGE_SIZE and
generic_file_llseek() allows any seeking inside the range under file
size; however, seq_lseek()'s behavior is different.  It traverses the
output by repeatedly invoking ->show() until it reaches the target
offset or traversal indicates EOF.  As seq_files are fully dynamic and
may not end at all, it doesn't support seeking from the end
(SEEK_END).

Apparently, there are userland tools which uses SEEK_END to discover
the buffer size to use and the switch to seq_lseek() disturbs them as
SEEK_END fails with -EINVAL.

The only benefits of using seq_lseek() instead of
generic_file_llseek() are

* Early failure.  If traversing to certain file position should fail,
  seq_lseek() will report such failures on lseek(2) instead of the
  following read/write operations.

* EOF detection.  While SEEK_END is not supported, SEEK_SET/CUR +
  large offset can be used to detect eof - eof at the time of the seek
  anyway as the file size may change dynamically.

Both aren't necessary for sysfs or prospect kernfs users.  Revert to
genefic_file_llseek() and preserve the original behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: https://lkml.kernel.org/r/20131031114358.GA5551@osiris
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 12:13:37 -07:00
Vladimir Zapolskiy
1c1365e374 sysfs: return correct error code on unimplemented mmap()
Both POSIX.1-2008 and Linux Programmer's Manual have a dedicated return
error code for a case, when a file doesn't support mmap(), it's ENODEV.

This change replaces overloaded EINVAL with ENODEV in a situation
described above for sysfs binary files.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-30 10:21:39 -07:00
Tejun Heo
d1c1459e45 sysfs: separate out dup filename warning into a separate function
Separate out sysfs_warn_dup() out of sysfs_add_one().  This will help
separating out the core sysfs functionalities into kernfs so that it
can be used by non-sysfs users too.

This doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo
7eed6ecb07 sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
Most removal related logic is implemented in fs/sysfs/dir.c.  Move
sysfs_hash_and_remove() to fs/sysfs/dir.c so that __sysfs_remove()
doesn't have to be public.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo
baa97cb507 sysfs: remove unused sysfs_get_dentry() prototype
sysfs_get_dentry() has been gone for years now.  Remove the left-over
prototype.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo
672f76a81a sysfs: honor bin_attr.attr.ignore_lockdep
ignore_lockdep is currently honored only for regular files.  There's
no reason to ignore it for bin files.  Update sysfs_ignore_lockdep()
so that bin_attr.attr.ignore_lockdep works too.

While this doesn't have any in-kernel user, this unifies the behaviors
between regular and bin files and will help later changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo
56b3f3b884 sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
3124eb1679 ("sysfs: merge regular and bin file handling") folded bin
file handling into regular file handling.  Among other things, bin
file now shares the same open path including sysfs_open_dirent
association using sysfs_dirent->s_attr.open.  This is buggy because
->s_bin_attr lives in the same union and doesn't have the field.  This
bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active
field at the conflicting position.  It does have a field "buffers" but
it isn't used anymore.

This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that
the bin_attr is accessed through ->s_attr.bin_attr which lives with
->s_attr.attr in an anonymous union.  The code paths already assume
bin_attr contains attr as the first element, so this doesn't add any
more assumptions while making it explicit that the two types are
handled together.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:06 -07:00
Ming Lei
b9c0622516 sysfs: fix sysfs_write_file for bin file
Before patch(sysfs: prepare path write for unified regular / bin
file handling), when size of bin file is zero, writting still can
continue, but this patch changes the behaviour.

The worse thing is that firmware loader is broken by this patch,
and user space application can't write to firmware bin file any more
because both firmware loader and drivers can't know at advance how
large the firmware file is and have to set its initialized size as
zero.

This patch fixes the problem and keeps behaviour of writting to bin
as before.

Reported-by: Lothar Waßmann <LW@karo-electronics.de>
Tested-by: Lothar Waßmann <LW@karo-electronics.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-25 05:46:27 +01:00
Benjamin Herrenschmidt
d723a92dd4 sysfs/bin: Fix size handling overflow for bin_attribute
While looking at the code, I noticed that bin_attribute read() and write()
ops copy the inode size into an int for futher comparisons.

Some bin_attributes can be fairly large. For example, pci creates some for
BARs set to the BAR size and giant BARs are around the corner, so this is
going to break something somewhere eventually.

Let's use the right type.

[adjust for seqfile conversions, only needed for bin_read() - gkh]

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14 10:07:19 -07:00
Tejun Heo
785a162d14 sysfs: make sysfs_file_ops() follow ignore_lockdep flag
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced
sysfs_file_ops() which determines the associated file operation of a
given sysfs_dirent.  As file ops access should be protected by an
active reference, the new function includes a lockdep assertion on the
sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep
flag into account and the lockdep assertion trips spuriously for files
which opt out from active reference lockdep checking.

# cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60()
 Modules linked in:
 CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000
  ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60
  ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50
 Call Trace:
  [<ffffffff81ca0131>] dump_stack+0x4e/0x82
  [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0
  [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60
  [<ffffffff8125a274>] sysfs_open_file+0x54/0x300
  [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280
  [<ffffffff811df820>] finish_open+0x30/0x40
  [<ffffffff811f0623>] do_last+0x503/0xd90
  [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0
  [<ffffffff811f23ba>] do_filp_open+0x3a/0x90
  [<ffffffff811e09a9>] do_sys_open+0x129/0x220
  [<ffffffff811e0abe>] SyS_open+0x1e/0x20
  [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b
 ---[ end trace aa48096b111dafdb ]---

Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and
move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep
assertion if sysfs_ignore_lockdep() is true.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14 08:40:39 -07:00
Tejun Heo
3124eb1679 sysfs: merge regular and bin file handling
With the previous changes, sysfs regular file code is ready to handle
bin files too.  This patch makes bin files share the regular file
path.

* sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.

* sysfs_init_inode() is updated to use the new sysfs_bin_operations
  instead of bin_fops for bin files.

* fs/sysfs/bin.c and the related pieces are removed.

This patch shouldn't introduce any behavior difference to bin file
accesses.

Overall, this unification reduces the amount of duplicate logic, makes
behaviors more consistent and paves the road for building simpler and
more versatile interface which will allow other subsystems to make use
of sysfs for their pseudo filesystems.

v2: Stale fs/sysfs/bin.c reference dropped from
    Documentation/DocBook/filesystems.tmpl.  Reported by kbuild test
    robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
49fe604781 sysfs: prepare open path for unified regular / bin file handling
sysfs bin file handling will be merged into the regular file support.
This patch prepares the open path.

This patch updates sysfs_open_file() such that it can handle both
regular and bin files.

This is a preparation and the new bin file path isn't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
73d9714627 sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c
sysfs bin file handling will be merged into the regular file support.
This patch copies mmap support from bin so that fs/sysfs/file.c can
handle mmapping bin files.

The code is copied mostly verbatim with the following updates.

* ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
  references are replaced with sysfs_open_file ones.

* Symbols are prefixed with sysfs_.

* sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
  ->files.  Invocation of this function is added to
  sysfs_addrm_finish().

* sysfs_bin_mmap() is added to sysfs_bin_operations.

This is a preparation and the new mmap path isn't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
2f0c6b7593 sysfs: add sysfs_bin_read()
sysfs bin file handling will be merged into the regular file support.
This patch prepares the read path.

Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use
sysfs_open_file instead of bin_buffer.  The function is identical copy
except for the use of sysfs_open_file.

The new function is added to sysfs_bin_operations.  This isn't used
yet but will eventually replace fs/sysfs/bin.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
f9b9a6217c sysfs: prepare path write for unified regular / bin file handling
sysfs bin file handling will be merged into the regular file support.
This patch prepares the write path.

bin file write is almost identical to regular file write except that
the write length is capped by the inode size and @off is passed to the
write method.  This patch adds bin file handling to sysfs_write_file()
so that it can handle both regular and bin files.

A new file_operations struct sysfs_bin_operations is added, which
currently only hosts sysfs_write_file() and generic_file_llseek().
This isn't used yet but will eventually replace fs/sysfs/bin.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
3ff65d3cb0 sysfs: collapse fs/sysfs/bin.c::fill_read() into read()
read() is simple enough and fill_read() being in a separate function
doesn't add anything.  Let's collapse it into read().  This will make
merging bin file handling with regular file.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
91270162bf sysfs: skip bin_buffer->buffer while reading
After b31ca3f5df ("sysfs: fix deadlock"), bin read() first writes
data to bb->buffer and bounces it to a transient kernel buffer which
is then copied out to userland.  The double bouncing doesn't add
anything.  Let's just use the transient buffer directly.

While at it, rename @temp to @buf for clarity.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo
13c589d5b0 sysfs: use seq_file when reading regular files
sysfs read path implements its own buffering scheme between userland
and kernel callbacks, which essentially is a degenerate duplicate of
seq_file.  This patch replaces the custom read buffering
implementation in sysfs with seq_file.

While the amount of code reduction is small, this reduces low level
hairiness and enables future development of a new versatile API based
on seq_file so that sysfs features can be shared with other
subsystems.

As write path was already converted to not use sysfs_open_file->page,
this patch makes ->page and ->count unused and removes them.

Userland behavior remains the same except for some extreme corner
cases - e.g. sysfs will now regenerate the content each time a file is
read after a non-contiguous seek whereas the original code would keep
using the same content.  While this is a userland visible behavior
change, it is extremely unlikely to be noticeable and brings sysfs
behavior closer to that of procfs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo
8ef445f080 sysfs: use transient write buffer
There isn't much to be gained by keeping around kernel buffer while a
file is open especially as the read path planned to be converted to
use seq_file and won't use the buffer.  This patch makes
sysfs_write_file() use per-write transient buffer instead of
sysfs_open_file->page.

This simplifies the write path, enables removing sysfs_open_file->page
once read path is updated and will help merging bin file write path
which already requires the use of a transient buffer due to a locking
order issue.

As the function comments of flush_write_buffer() and
sysfs_write_buffer() are being updated anyway, reformat them so that
they're more conventional.

v2: Use min_t() instead of min() in sysfs_write_file() to avoid build
    warning on arm.  Reported by build test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo
bcafe4eea3 sysfs: add sysfs_open_file->sd and ->file
sysfs will be converted to use seq_file for read path, which will make
it difficult to pass around multiple pointers directly.  This patch
adds sysfs_open_file->sd and ->file so that we can reach all the
necessary data structures from sysfs_open_file.

flush_write_buffer() is updated to drop @dentry which was used to
discover the sysfs_dirent as it's now available through
sysfs_open_file->sd.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo
58282d8dc2 sysfs: rename sysfs_buffer to sysfs_open_file
sysfs read path will be converted to use seq_file which will handle
buffering making sysfs_buffer a misnomer.  Rename sysfs_buffer to
sysfs_open_file, and sysfs_open_dirent->buffers to ->files.

This path is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:16:28 -07:00
Tejun Heo
c75ec764cf sysfs: add sysfs_open_file_mutex
Add a separate mutex to protect sysfs_open_dirent->buffers list.  This
will allow performing sleepable operations while traversing
sysfs_buffers, which will be renamed to sysfs_open_file.

Note that currently sysfs_open_dirent->buffers list isn't being used
for anything and this patch doesn't make any functional difference.
It will be used to merge regular and bin file supports.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:15:48 -07:00
Tejun Heo
375b611e60 sysfs: remove sysfs_buffer->ops
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
sysfs_buffer->ops to be used while the file is open.  This patch
removes the caching and makes each operation directly fetch sysfs_ops.

This patch doesn't introduce any behavior difference and is to prepare
for merging regular and bin file supports.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:04:34 -07:00
Tejun Heo
aea585ef8f sysfs: remove sysfs_buffer->needs_read_fill
->needs_read_fill is used to implement the following behaviors.

1. Ensure buffer filling on the first read.
2. Force buffer filling after a write.
3. Force buffer filling after a successful poll.

However, #2 and #3 don't really work as sysfs doesn't reset file
position.  While the read buffer would be refilled, the next read
would continue from the position after the last read or write,
requiring an explicit seek to the start for it to be useful, which
makes ->needs_read_fill superflous as read buffer is always refilled
if f_pos == 0.

Update sysfs_read_file() to test buffer->page for #1 instead and
remove ->needs_read_fill.  While this changes behavior in extreme
corner cases - e.g. re-reading a sysfs file after seeking to non-zero
position after a write or poll, it's highly unlikely to lead to actual
breakage.  This change is to prepare for using seq_file in the read
path.

While at it, reformat a comment in fill_write_buffer().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 11:02:04 -07:00
Tejun Heo
89e51dab7c sysfs: remove unused sysfs_buffer->pos
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 10:54:47 -07:00
Tejun Heo
250f7c3fee sysfs: introduce [__]sysfs_remove()
Given a sysfs_dirent, there is no reason to have multiple versions of
removal functions.  A function which removes the specified
sysfs_dirent and its descendants is enough.

This patch intorduces [__}sysfs_remove() which replaces all internal
variations of removal functions.  This will be the only removal
function in the planned new sysfs_dirent based interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:38:52 -07:00
Tejun Heo
bcdde7e221 sysfs: make __sysfs_remove_dir() recursive
Currently, sysfs directory removal is inconsistent in that it would
remove any files directly under it but wouldn't recurse into
directories.  Thanks to group subdirectories, this doesn't even match
with kobject boundaries.  sysfs is in the process of being separated
out so that it can be used by multiple subsystems and we want to have
a consistent behavior - either removal of a sysfs_dirent should remove
every descendant entries or none instead of something inbetween.

This patch implements proper recursive removal in
__sysfs_remove_dir().  The function now walks its subtree in a
post-order walk to remove all descendants.

This is a behavior change but kobject / driver layer, which currently
is the only consumer, has already been updated to handle duplicate
removal attempts, so nothing should be broken after this change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:38:52 -07:00
Tejun Heo
26ea12dec0 kobject: grab an extra reference on kobject->sd to allow duplicate deletes
sysfs currently has a rather weird behavior regarding removals.  A
directory removal would delete all files directly under it but
wouldn't recurse into subdirectories, which, while a bit inconsistent,
seems to make sense at the first glance as each directory is
supposedly associated with a kobject and each kobject can take care of
the directory deletion; however, this doesn't really hold as we have
groups which can be directories without a kobject associated with it
and require explicit deletions.

We're in the process of separating out sysfs from kboject / driver
core and want a consistent behavior.  A removal should delete either
only the specified node or everything under it.  I think it is helpful
to support recursive atomic removal and later patches will implement
it.

Such change means that a sysfs_dirent associated with kobject may be
deleted before the kobject itself is removed if one of its ancestor
gets removed before it.  As sysfs_remove_dir() puts the base ref, we
may end up with dangling pointer on descendants.  This can be solved
by holding an extra reference on the sd from kobject.

Acquire an extra reference on the associated sysfs_dirent on directory
creation and put it after removal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:38:52 -07:00
Tejun Heo
d69ac5a0bb sysfs: remove sysfs_addrm_cxt->parent_sd
sysfs_addrm_start/finish() enclose sysfs_dirent additions and
deletions and sysfs_addrm_cxt is used to record information necessary
to finish the operations.  Currently, sysfs_addrm_start() takes
@parent_sd, records it in sysfs_addrm_cxt, and assumes that all
operations in the block are performed under that @parent_sd.

This assumption has been fine until now but we want to make some
operations behave recursively and, while having @parent_sd recorded in
sysfs_addrm_cxt doesn't necessarily prevents that, it becomes
confusing.

This patch removes sysfs_addrm_cxt->parent_sd and makes
sysfs_add_one() take an explicit @parent_sd parameter.  Note that
sysfs_remove_one() doesn't need the extra argument as its parent is
always known from the target @sd.

While at it, add __acquires/releases() notations to
sysfs_addrm_start/finish() respectively.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:16:43 -07:00
Tejun Heo
cfec0bc835 sysfs: @name comes before @ns
Some internal sysfs functions which take explicit namespace argument
are weird in that they place the optional @ns in front of @name which
is contrary to the established convention.  This is confusing and
error-prone especially as @ns and @name may be interchanged without
causing compilation warning.

Swap the positions of @name and @ns in the following internal
functions.

 sysfs_find_dirent()
 sysfs_rename()
 sysfs_hash_and_remove()
 sysfs_name_hash()
 sysfs_name_compare()
 create_dir()

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:34:38 -07:00
Tejun Heo
388975ccca sysfs: clean up sysfs_get_dirent()
The pre-existing sysfs interfaces which take explicit namespace
argument are weird in that they place the optional @ns in front of
@name which is contrary to the established convention.  For example,
we end up forcing vast majority of sysfs_get_dirent() users to do
sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
especially as @ns and @name may be interchanged without causing
compilation warning.

This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swap the
positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
around sysfs_get_dirent_ns().  This makes confusions a lot less
likely.

There are other interfaces which take @ns before @name.  They'll be
updated by following patches.

This patch doesn't introduce any functional changes.

v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
    error on module builds.  Reported by build test robot.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:33:18 -07:00
Tejun Heo
cb26a31157 sysfs: drop kobj_ns_type handling
The way namespace tags are implemented in sysfs is more complicated
than necessary.  As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is or where namespace is enabled.

If multiple namespace types are needed, which currently aren't, we can
simply compare the tag to a set of allowed tags in the superblock
assuming that the tags, being pointers, won't have the same value
across multiple types.  Also, whether to filter by namespace tag or
not can be trivially determined by whether the node has any tagged
children or not.

This patch rips out kobj_ns_type handling from sysfs.  sysfs no longer
cares whether specific type of namespace is enabled or not.  If a
sysfs_dirent has a non-NULL tag, the parent is marked as needing
namespace filtering and the value is tested against the allowed set of
tags for the superblock (currently only one but increasing this number
isn't difficult) and the sysfs_dirent is ignored if it doesn't match.

This removes most kobject namespace knowledge from sysfs proper which
will enable proper separation and layering of sysfs.  The namespace
sanity checks in fs/sysfs/dir.c are replaced by the new sanity check
in kobject_namespace().  As this is the only place ktype->namespace()
is called for sysfs, this doesn't weaken the sanity check
significantly.  I omitted converting the sanity check in
sysfs_do_create_link_sd().  While the check can be shifted to upper
layer, mistakes there are well contained and should be easily visible
anyway.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:30:22 -07:00
Tejun Heo
4b30ee58ee sysfs: remove ktype->namespace() invocations in symlink code
There's no reason for sysfs to be calling ktype->namespace().  It is
backwards, obfuscates what's going on and unnecessarily tangles two
separate layers.

There are two places where symlink code calls ktype->namespace().

* sysfs_do_create_link_sd() calls it to find out the namespace tag of
  the target directory.  Unless symlinking races with cross-namespace
  renaming, this equals @target_sd->s_ns.

* sysfs_rename_link() uses it to find out the new namespace to rename
  to and the new namespace can be different from the existing one.
  The function is renamed to sysfs_rename_link_ns() with an explicit
  @ns argument and the ktype->namespace() invocation is shifted to the
  device layer.

While this patch replaces ktype->namespace() invocation with the
recorded result in @target_sd, this shouldn't result in any behvior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:30:22 -07:00
Tejun Heo
e34ff49061 sysfs: remove ktype->namespace() invocations in directory code
For some unrecognizable reason, namespace information is communicated
to sysfs through ktype->namespace() callback when there's *nothing*
which needs the use of a callback.  The whole sequence of operations
is completely synchronous and sysfs operations simply end up calling
back into the layer which just invoked it in order to find out the
namespace information, which is completely backwards, obfuscates
what's going on and unnecessarily tangles two separate layers.

This patch doesn't remove ktype->namespace() but shifts its handling
to kobject layer.  We probably want to get rid of the callback in the
long term.

This patch adds an explicit param to sysfs_{create|rename|move}_dir()
and renames them to sysfs_{create|rename|move}_dir_ns(), respectively.
ktype->namespace() invocations are moved to the calling sites of the
above functions.  A new helper kboject_namespace() is introduced which
directly tests kobj_ns_type_operations->type which should give the
same result as testing sysfs_fs_type(parent_sd) and returns @kobj's
namespace tag as necessary.  kobject_namespace() is extern as it will
be used from another file in the following patches.

This patch should be an equivalent conversion without any functional
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:30:22 -07:00
Tejun Heo
58292cbe66 sysfs: make attr namespace interface less convoluted
sysfs ns (namespace) implementation became more convoluted than
necessary while trying to hide ns information from visible interface.
The relatively recent attr ns support is a good example.

* attr ns tag is determined by sysfs_ops->namespace() callback while
  dir tag is determined by kobj_type->namespace().  The placement is
  arbitrary.

* Instead of performing operations with explicit ns tag, the namespace
  callback is routed through sysfs_attr_ns(), sysfs_ops->namespace(),
  class_attr_namespace(), class_attr->namespace().  It's not simpler
  in any sense.  The only thing this convolution does is traversing
  the whole stack backwards.

The namespace callbacks are unncessary because the operations involved
are inherently synchronous.  The information can be provided in in
straight-forward top-down direction and reversing that direction is
unnecessary and against basic design principles.

This backward interface is unnecessarily convoluted and hinders
properly separating out sysfs from driver model / kobject for proper
layering.  This patch updates attr ns support such that

* sysfs_ops->namespace() and class_attr->namespace() are dropped.

* sysfs_{create|remove}_file_ns(), which take explicit @ns param, are
  added and sysfs_{create|remove}_file() are now simple wrappers
  around the ns aware functions.

* ns handling is dropped from sysfs_chmod_file().  Nobody uses it at
  this point.  sysfs_chmod_file_ns() can be added later if necessary.

* Explicit @ns is propagated through class_{create|remove}_file_ns()
  and netdev_class_{create|remove}_file_ns().

* driver/net/bonding which is currently the only user of attr
  namespace is updated to use netdev_class_{create|remove}_file_ns()
  with @bh->net as the ns tag instead of using the namespace callback.

This patch should be an equivalent conversion without any functional
difference.  It makes the code easier to follow, reduces lines of code
a bit and helps proper separation and layering.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 14:50:01 -07:00
Tejun Heo
bcac3769ca sysfs: drop semicolon from to_sysfs_dirent() definition
The expansion of to_sysfs_dirent() contains an unncessary trailing
semicolon making it impossible to use in the middle of statements.
Drop it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 14:48:28 -07:00
Linus Torvalds
dc0755cdb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 2 (of many) from Al Viro:
 "Mostly Miklos' series this time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify dcache.c inlined helpers where possible
  fuse: drop dentry on failed revalidate
  fuse: clean up return in fuse_dentry_revalidate()
  fuse: use d_materialise_unique()
  sysfs: use check_submounts_and_drop()
  nfs: use check_submounts_and_drop()
  gfs2: use check_submounts_and_drop()
  afs: use check_submounts_and_drop()
  vfs: check unlinked ancestors before mount
  vfs: check submounts and drop atomically
  vfs: add d_walk()
  vfs: restructure d_genocide()
2013-09-07 14:36:57 -07:00
Linus Torvalds
c7c4591db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
 "This is an assorted mishmash of small cleanups, enhancements and bug
  fixes.

  The major theme is user namespace mount restrictions.  nsown_capable
  is killed as it encourages not thinking about details that need to be
  considered.  A very hard to hit pid namespace exiting bug was finally
  tracked and fixed.  A couple of cleanups to the basic namespace
  infrastructure.

  Finally there is an enhancement that makes per user namespace
  capabilities usable as capabilities, and an enhancement that allows
  the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns:  Kill nsown_capable it makes the wrong thing easy
  capabilities: allow nice if we are privileged
  pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  userns: Allow PR_CAPBSET_DROP in a user namespace.
  namespaces: Simplify copy_namespaces so it is clear what is going on.
  pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
  sysfs: Restrict mounting sysfs
  userns: Better restrictions on when proc and sysfs can be mounted
  vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
  kernel/nsproxy.c: Improving a snippet of code.
  proc: Restrict mounting the proc filesystem
  vfs: Lock in place mounts from more privileged users
2013-09-07 14:35:32 -07:00
Miklos Szeredi
6497d160f6 sysfs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:53 -04:00
Eric W. Biederman
7dc5dbc879 sysfs: Restrict mounting sysfs
Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
over the net namespace.  The principle here is if you create or have
capabilities over it you can mount it, otherwise you get to live with
what other people have mounted.

Instead of testing this with a straight forward ns_capable call,
perform this check the long and torturous way with kobject helpers,
this keeps direct knowledge of namespaces out of sysfs, and preserves
the existing sysfs abstractions.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-08-28 21:35:14 -07:00
Eric W. Biederman
e51db73532 userns: Better restrictions on when proc and sysfs can be mounted
Rely on the fact that another flavor of the filesystem is already
mounted and do not rely on state in the user namespace.

Verify that the mounted filesystem is not covered in any significant
way.  I would love to verify that the previously mounted filesystem
has no mounts on top but there are at least the directories
/proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
for other filesystems to mount on top of.

Refactor the test into a function named fs_fully_visible and call that
function from the mount routines of proc and sysfs.  This makes this
test local to the filesystems involved and the results current of when
the mounts take place, removing a weird threading of the user
namespace, the mount namespace and the filesystems themselves.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-08-26 19:17:03 -07:00
Greg Kroah-Hartman
09239ed4aa sysfs: group.c: fix up kerneldoc
Fix up the wording of sysfs_create/remove_groups() a bit.

Reported-by: Anthony Foiani <tkil@scrye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-22 09:23:28 -07:00
Greg Kroah-Hartman
2c3a908b4b sysfs: sysfs.h: fix coding style issues
This fixes up the remaining coding style issues in sysfs.h

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:40:05 -07:00
Greg Kroah-Hartman
07ac62a604 sysfs: file.c: fix up broken string warnings
This fixes the coding style warnings in fs/sysfs/file.c for broken
strings across lines.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:37:42 -07:00
Greg Kroah-Hartman
37814ee0ba sysfs: dir.c: fix up odd do/while indentation
This fixes up the odd do/while after an if statement warning in dir.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:36:02 -07:00
Greg Kroah-Hartman
060cc749e9 sysfs: fix up uaccess.h coding style warnings
This fixes the uaccess.h warnings in the sysfs.c files.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:34:59 -07:00
Greg Kroah-Hartman
ddfd6d074e sysfs: fix up 80 column coding style issues
This fixes up the 80 column coding style issues in the sysfs .c files.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:33:34 -07:00
Greg Kroah-Hartman
1b18dc2beb sysfs: fix up space coding style issues
This fixes up all of the space-related coding style issues for the sysfs
code.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:28:26 -07:00
Greg Kroah-Hartman
ab9bf4be4d sysfs: remove trailing whitespace
This removes all trailing whitespace errors in the sysfs code.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:21:17 -07:00
Greg Kroah-Hartman
1b866757fc sysfs: fix placement of EXPORT_SYMBOL()
The export should happen after the function, not at the bottom of the
file, so fix that up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:17:47 -07:00
Greg Kroah-Hartman
9e2a47ed64 sysfs: group: update copyright to add myself and the LF
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:14:11 -07:00
Greg Kroah-Hartman
f9ae443b5a sysfs: group.c: add kerneldoc for sysfs_remove_group
sysfs_remove_group() never had kerneldoc, so add it, and fix up the
kerneldoc for sysfs_remove_groups() which didn't specify the parameters
properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:12:34 -07:00
Greg Kroah-Hartman
16aebf1c5d sysfs: group.c: fix up broken string coding style
checkpatch complains about the broken string in the file, and it's
correct, so fix it up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:10:02 -07:00
Greg Kroah-Hartman
995d8ed943 sysfs: group.c: fix up some * coding style issues
This fixes up the * coding style warnings for the group.c sysfs file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:07:29 -07:00
Greg Kroah-Hartman
e6c56920fd sysfs: group.c: fix trailing whitespace
There was some trailing spaces in the file, fix that up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:06:14 -07:00
Greg Kroah-Hartman
d363bc53ef sysfs: group.c: move EXPORT_SYMBOL_GPL() to the proper location
This fixes up the coding style issue of incorrectly placing the
EXPORT_SYMBOL_GPL() macro, it should be right after the function itself,
not at the end of the file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:04:12 -07:00
Greg Kroah-Hartman
3e9b2bae83 sysfs: add sysfs_create/remove_groups()
These functions are being open-coded in 3 different places in the driver
core, and other driver subsystems will want to start doing this as well,
so move it to the sysfs core to keep it all in one place, where we know
it is written properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 16:02:19 -07:00
Oliver Schinagl
388a8c353d sysfs: prevent warning when only using binary attributes
When only using bin_attrs instead of attrs the kernel prints a warning
and refuses to create the sysfs entry. This fixes that.

Signed-off-by: Oliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-16 10:57:36 -07:00
Greg Kroah-Hartman
6ab9cea160 sysfs: add support for binary attributes in groups
groups should be able to support binary attributes, just like it
supports "normal" attributes.  This lets us only handle one type of
structure, groups, throughout the driver core and subsystems, making
binary attributes a "full fledged" part of the driver model, and not
something just "tacked on".

Reported-by: Oliver Schinagl <oliver@schinagl.nl>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-16 10:57:36 -07:00
Linus Torvalds
fc76a258d4 Driver core patches for 3.11-rc1
Here's the big driver core merge for 3.11-rc1
 
 Lots of little things, and larger firmware subsystem updates, all
 described in the shortlog.  Nice thing here is that we finally get rid
 of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
 been always on for a number of kernel releases, now it's just removed.)
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlHRsGMACgkQMUfUDdst+ylIIACfW8lLxOPVK+iYG699TWEBAkp0
 LFEAnjlpAMJ1JnoZCuWDZObNCev93zGB
 =020+
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the big driver core merge for 3.11-rc1

  Lots of little things, and larger firmware subsystem updates, all
  described in the shortlog.  Nice thing here is that we finally get rid
  of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
  been always on for a number of kernel releases, now it's just
  removed)"

* tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
  driver core: device.h: fix doc compilation warnings
  firmware loader: fix another compile warning with PM_SLEEP unset
  build some drivers only when compile-testing
  firmware loader: fix compile warning with PM_SLEEP set
  kobject: sanitize argument for format string
  sysfs_notify is only possible on file attributes
  firmware loader: simplify holding module for request_firmware
  firmware loader: don't export cache_firmware and uncache_firmware
  drivers/base: Use attribute groups to create sysfs memory files
  firmware loader: fix compile warning
  firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER
  Documentation: Updated broken link in HOWTO
  Finally eradicate CONFIG_HOTPLUG
  driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend
  driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware
  Documentation: Tidy up some drivers/base/core.c kerneldoc content.
  platform_device: use a macro instead of platform_driver_register
  firmware: move EXPORT_SYMBOL annotations
  firmware: Avoid deadlock of usermodehelper lock at shutdown
  dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly
  ...
2013-07-02 11:44:19 -07:00
Al Viro
d55fea8ddb [readdir] convert sysfs
get rid of the kludges in sysfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:36 +04:00
Nick Dyer
fc60bb8339 sysfs_notify is only possible on file attributes
If sysfs_notify is called on a binary attribute, bad things can
happen, so prevent it.

Note, no in-kernel usage of this is currently present, but in the
future, it's good to be safe.

Changes in V2:
- Also ignore sysfs_notify on dirs, links
- Use WARN_ON rather than silently failing
- Compiled and tested (huge apologies about first submission)

Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07 16:05:50 -07:00
Rami Rosen
5240d58c04 sysfs: kill sysfs_sb declaration in fs/sysfs/inode.c.
This patch removes sysfs_sb declaration from fs/sysfs/inode.c
(due to 0f4288ec6f,
 "Kill unused sysfs_sb variable").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-17 10:26:08 -07:00
Warner Wang
434749108c sysfs: sysfs_link_sibling(): fix typo in comment
Fix a typo subling->sibling in the comment of sysfs_link_sibling().

Signed-off-by: Warner Wang <warner.wang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-17 10:26:08 -07:00
Ming Lei
bb2b0051d7 sysfs: check if one entry has been removed before freeing
It might be a kernel disaster if one sysfs entry is freed but
still referenced by sysfs tree.

Recently Dave and Sasha reported one use-after-free problem on
sysfs entry, and the problem has been troubleshooted with help
of debug message added in this patch.

Given sysfs_get_dirent/sysfs_put are exported APIs, even inside
sysfs they are called in many contexts(kobject/attribe add/delete,
inode init/drop, dentry lookup/release, readdir, ...), it is healthful
to check the removed flag before freeing one entry and dump message
if it is freeing without being removed first.

Cc: Dave Jones <davej@redhat.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05 15:35:52 -07:00
Ming Lei
f7db5e7660 sysfs: fix use after free in case of concurrent read/write and readdir
The inode->i_mutex isn't hold when updating filp->f_pos
in read()/write(), so the filp->f_pos might be read as
0 or 1 in readdir() when there is concurrent read()/write()
on this same file, then may cause use after free in readdir().

The bug can be reproduced with Li Zefan's test code on the
link:

	https://patchwork.kernel.org/patch/2160771/

This patch fixes the use after free under this situation.

Cc: stable <stable@vger.kernel.org>
Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-03 11:09:02 -07:00
Greg Kroah-Hartman
0f8b1a0204 Merge v3.9-rc5 into driver-core-next
We want the fixes in here.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-01 11:05:59 -07:00
Linus Torvalds
97f084b8e6 sysfs fixes for 3.9-rc4
Here are two fixes for sysfs that resolve issues that have been found by the
 Trinity fuzz tool, causing oopses in sysfs.  They both have been in linux-next
 for a while to ensure that they do not cause any other problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUdHUACgkQMUfUDdst+ykk+ACfWz6U/DW97ibFusDj+Sys1pEt
 essAn15ZFy/pT5myhCvxqVH0MHrIftup
 =BM+Q
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull sysfs fixes from Greg Kroah-Hartman:
 "Here are two fixes for sysfs that resolve issues that have been found
  by the Trinity fuzz tool, causing oopses in sysfs.  They both have
  been in linux-next for a while to ensure that they do not cause any
  other problems."

* tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: handle failure path correctly for readdir()
  sysfs: fix race between readdir and lseek
2013-03-28 15:52:14 -07:00
Eric W. Biederman
87a8ebd637 userns: Restrict when proc and sysfs can be mounted
Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27 07:50:08 -07:00
Maarten Lankhorst
3db3c62584 sysfs: use atomic_inc_unless_negative in sysfs_get_active
It seems that sysfs has an interesting way of doing the same thing.
This removes the cpu_relax unfortunately, but if it's really needed,
it would be better to add this to include/linux/atomic.h to benefit
all atomic ops users.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-25 10:42:36 -07:00
Ming Lei
e5110f411d sysfs: handle failure path correctly for readdir()
In case of 'if (filp->f_pos ==  0 or 1)' of sysfs_readdir(),
the failure from filldir() isn't handled, and the reference counter
of the sysfs_dirent object pointed by filp->private_data will be
released without clearing filp->private_data, so use after free
bug will be triggered later.

This patch returns immeadiately under the situation for fixing the bug,
and it is reasonable to return from readdir() when filldir() fails.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20 16:53:42 -07:00
Ming Lei
991f76f837 sysfs: fix race between readdir and lseek
While readdir() is running, lseek() may set filp->f_pos as zero,
then may leave filp->private_data pointing to one sysfs_dirent
object without holding its reference counter, so the sysfs_dirent
object may be used after free in next readdir().

This patch holds inode->i_mutex to avoid the problem since
the lock is always held in readdir path.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20 16:53:42 -07:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Linus Torvalds
06991c28f3 Driver core patches for 3.9-rc1
Here is the big driver core merge for 3.9-rc1
 
 There are two major series here, both of which touch lots of drivers all
 over the kernel, and will cause you some merge conflicts:
   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.
   - remove CONFIG_EXPERIMENTAL
 
 If you need me to provide a merged tree to handle these resolutions,
 please let me know.
 
 Other than those patches, there's not much here, some minor fixes and
 updates.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
 =yWAQ
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
 "Here is the big driver core merge for 3.9-rc1

  There are two major series here, both of which touch lots of drivers
  all over the kernel, and will cause you some merge conflicts:

   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.

   - remove CONFIG_EXPERIMENTAL

  Other than those patches, there's not much here, some minor fixes and
  updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
  base: memory: fix soft/hard_offline_page permissions
  drivercore: Fix ordering between deferred_probe and exiting initcalls
  backlight: fix class_find_device() arguments
  TTY: mark tty_get_device call with the proper const values
  driver-core: constify data for class_find_device()
  firmware: Ignore abort check when no user-helper is used
  firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
  firmware: Make user-mode helper optional
  firmware: Refactoring for splitting user-mode helper code
  Driver core: treat unregistered bus_types as having no devices
  watchdog: Convert to devm_ioremap_resource()
  thermal: Convert to devm_ioremap_resource()
  spi: Convert to devm_ioremap_resource()
  power: Convert to devm_ioremap_resource()
  mtd: Convert to devm_ioremap_resource()
  mmc: Convert to devm_ioremap_resource()
  mfd: Convert to devm_ioremap_resource()
  media: Convert to devm_ioremap_resource()
  iommu: Convert to devm_ioremap_resource()
  drm: Convert to devm_ioremap_resource()
  ...
2013-02-21 12:05:51 -08:00
Rafael J. Wysocki
0bb8f3d6ae sysfs: Functions for adding/removing symlinks to/from attribute groups
The most convenient way to expose ACPI power resources lists of a
device is to put symbolic links to sysfs directories representing
those resources into special attribute groups in the device's sysfs
directory.  For this purpose, it is necessary to be able to add
symbolic links to attribute groups.

For this reason, add sysfs helper functions for adding/removing
symbolic links to/from attribute groups, sysfs_add_link_to_group()
and sysfs_remove_link_from_group(), respectively.

This change set includes a build fix from David Rientjes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 21:51:13 +01:00
Greg Kroah-Hartman
595e0eb067 Revert "sysfs: Convert print_symbol to %pSR"
This reverts commit 6ad58fa82d as %pSR
isn't in the tree yet.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 13:09:57 -08:00
Joe Perches
6ad58fa82d sysfs: Convert print_symbol to %pSR
Use the new vsprintf extension to avoid any possible
message interleaving.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 12:57:07 -08:00
Bin Wang
6b8fbde418 sysfs: Fixed a trailing white space error
This patch removes the trailing white space in fs/sysfs/mount.c.

Signed-off-by: Bin Wang <wbin00@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 12:54:47 -08:00
Linus Torvalds
6a2b60b17b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "While small this set of changes is very significant with respect to
  containers in general and user namespaces in particular.  The user
  space interface is now complete.

  This set of changes adds support for unprivileged users to create user
  namespaces and as a user namespace root to create other namespaces.
  The tyranny of supporting suid root preventing unprivileged users from
  using cool new kernel features is broken.

  This set of changes completes the work on setns, adding support for
  the pid, user, mount namespaces.

  This set of changes includes a bunch of basic pid namespace
  cleanups/simplifications.  Of particular significance is the rework of
  the pid namespace cleanup so it no longer requires sending out
  tendrils into all kinds of unexpected cleanup paths for operation.  At
  least one case of broken error handling is fixed by this cleanup.

  The files under /proc/<pid>/ns/ have been converted from regular files
  to magic symlinks which prevents incorrect caching by the VFS,
  ensuring the files always refer to the namespace the process is
  currently using and ensuring that the ptrace_mayaccess permission
  checks are always applied.

  The files under /proc/<pid>/ns/ have been given stable inode numbers
  so it is now possible to see if different processes share the same
  namespaces.

  Through the David Miller's net tree are changes to relax many of the
  permission checks in the networking stack to allowing the user
  namespace root to usefully use the networking stack.  Similar changes
  for the mount namespace and the pid namespace are coming through my
  tree.

  Two small changes to add user namespace support were commited here adn
  in David Miller's -net tree so that I could complete the work on the
  /proc/<pid>/ns/ files in this tree.

  Work remains to make it safe to build user namespaces and 9p, afs,
  ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
  Kconfig guard remains in place preventing that user namespaces from
  being built when any of those filesystems are enabled.

  Future design work remains to allow root users outside of the initial
  user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
  proc: Usable inode numbers for the namespace file descriptors.
  proc: Fix the namespace inode permission checks.
  proc: Generalize proc inode allocation
  userns: Allow unprivilged mounts of proc and sysfs
  userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
  procfs: Print task uids and gids in the userns that opened the proc file
  userns: Implement unshare of the user namespace
  userns: Implent proc namespace operations
  userns: Kill task_user_ns
  userns: Make create_new_namespaces take a user_ns parameter
  userns: Allow unprivileged use of setns.
  userns: Allow unprivileged users to create new namespaces
  userns: Allow setting a userns mapping to your current uid.
  userns: Allow chown and setgid preservation
  userns: Allow unprivileged users to create user namespaces.
  userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
  userns: fix return value on mntns_install() failure
  vfs: Allow unprivileged manipulation of the mount namespace.
  vfs: Only support slave subtrees across different user namespaces
  vfs: Add a user namespace reference from struct mnt_namespace
  ...
2012-12-17 15:44:47 -08:00
Josh Triplett
1f20dfdaed sysfs: Mark sysfs_attr_ns static
Nothing outside of fs/sysfs/file.c references this function, so mark it static.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 16:25:36 -08:00
Eric W. Biederman
4f326c0064 userns: Allow unprivilged mounts of proc and sysfs
- The context in which proc and sysfs are mounted have no
  effect on the the uid/gid of their files so no conversion is
  needed except allowing the mount.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:19:18 -08:00
Geert Uytterhoeven
66081a7251 sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()
The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-24 15:57:14 -07:00
Robert P. J. Day
6f1cbd4a25 sysfs: Fix comment typo "sysf_create_link".
More pedantry.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-04 16:11:31 -07:00
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Jan Kara
14ae417c6f sysfs: Push file_update_time() into bin_page_mkwrite()
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:47 +04:00
Linus Torvalds
fa93669a19 Driver core merge for 3.6-rc1
Here's the big driver core pull request for 3.6-rc1.
 
 Unlike 3.5, this kernel should be a lot tamer, with the printk changes now
 settled down.  All we have here is some extcon driver updates, w1 driver
 updates, a few printk cleanups that weren't needed for 3.5, but are good to
 have now, and some other minor fixes/changes in the driver core.
 
 All of these have been in the linux-next releases for a while now.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAlARgIUACgkQMUfUDdst+ynDHgCfRNwIB9L+zZvjcKE5e1BhDbUl
 wVUAn398DFgbJ1+PjGkd1EMR2uVTh7Ou
 =MIFu
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core changes from Greg Kroah-Hartman:
 "Here's the big driver core pull request for 3.6-rc1.

  Unlike 3.5, this kernel should be a lot tamer, with the printk changes
  now settled down.  All we have here is some extcon driver updates, w1
  driver updates, a few printk cleanups that weren't needed for 3.5, but
  are good to have now, and some other minor fixes/changes in the driver
  core.

  All of these have been in the linux-next releases for a while now.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  printk: Export struct log size and member offsets through vmcoreinfo
  Drivers: hv: Change the hex constant to a decimal constant
  driver core: don't trigger uevent after failure
  extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device
  sysfs: fail dentry revalidation after namespace change fix
  sysfs: fail dentry revalidation after namespace change
  extcon: spelling of detach in function doc
  extcon: arizona: Stop microphone detection if we give up on it
  extcon: arizona: Update cable reporting calls and split headset
  PM / Runtime: Do not increment device usage counts before probing
  kmsg - do not flush partial lines when the console is busy
  kmsg - export "continuation record" flag to /dev/kmsg
  kmsg - avoid warning for CONFIG_PRINTK=n compilations
  kmsg - properly print over-long continuation lines
  driver-core: Use kobj_to_dev instead of re-implementing it
  driver-core: Move kobj_to_dev from genhd.h to device.h
  driver core: Move deferred devices to the end of dpm_list before probing
  driver core: move uevent call to driver_register
  driver core: fix shutdown races with probe/remove(v3)
  Extcon: Arizona: Add driver for Wolfson Arizona class devices
  ...
2012-07-26 11:25:33 -07:00
Andrew Morton
17f79be93d sysfs: fail dentry revalidation after namespace change fix
don't assume that KOBJ_NS_TYPE_NONE==0.  Also save a test-n-branch.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:43:55 -07:00
Glauber Costa
e5bcac6147 sysfs: fail dentry revalidation after namespace change
When we change the namespace tag of a sysfs entry, the associated dentry
is still kept around. readdir() will work correctly and not display the
old entries, but open() will still succeed, so will reads and writes.

This will no longer happen if sysfs is remounted, hinting that this is a
cache-related problem.

I am using the following sequence to demonstrate that:

shell1:
ip link add type veth
unshare -nm

shell2:
ip link set veth1 <pid_of_shell_1>
cat /sys/devices/virtual/net/veth1/ifindex

Before that patch, this will succeed (fail to fail). After it, it will
correctly return an error. Differently from a normal rename, which we
handle fine, changing the object namespace will keep it's path intact.
So this check seems necessary as well.

[ v2: get type from parent, as suggested by Eric Biederman ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Tejun Heo <tj@kernel.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:43:55 -07:00
David Howells
9249e17fe0 VFS: Pass mount flags to sget()
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:34 +04:00
Al Viro
e77fb7cef8 sysfs: just use d_materialise_unique()
same as for nfs et.al.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:12 +04:00
Al Viro
469796d105 sysfs: switch to ->s_d_op and ->d_release()
a) ->d_iput() is wrong here - what we do to inode is completely usual, it's
dentry->d_fsdata that we want to drop.  Just use ->d_release().

b) switch to ->s_d_op - no need to play with d_set_d_op()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:06 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro
0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Linus Torvalds
90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00
Eric W. Biederman
ab27b91b9f userns: Convert sysfs to use kgid/kuid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:29 -07:00
Alan Stern
356c05d58a sysfs: get rid of some lockdep false positives
This patch (as1554) fixes a lockdep false-positive report.  The
problem arises because lockdep is unable to deal with the
tree-structured locks created by the device core and sysfs.

This particular problem involves a sysfs attribute method that
unregisters itself, not from the device it was called for, but from a
descendant device.  Lockdep doesn't understand the distinction and
reports a possible deadlock, even though the operation is safe.

This is the sort of thing that would normally be handled by using a
nested lock annotation; unfortunately it's not feasible to do that
here.  There's no sensible way to tell sysfs when attribute removal
occurs in the context of a parent attribute method.

As a workaround, the patch adds a new flag to struct attribute
telling sysfs not to inform lockdep when it acquires a readlock on a
sysfs_dirent instance for the attribute.  The readlock is still
acquired, but lockdep doesn't know about it and hence does not
complain about impossible deadlock scenarios.

Also added are macros for static initialization of attribute
structures with the ignore_lockdep flag set.  The three offending
attributes in the USB subsystem are converted to use the new macros.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Tejun Heo <tj@kernel.org>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-14 12:19:56 -07:00
Jan Kara
dbd5768f87 vfs: Rename end_writeback() to clear_inode()
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:41 +08:00
Sasikantha babu
b4eafca113 sysfs: Removed dup_name entirely in sysfs_rename
Since no one using "dup_name", removed it completely in sysfs_rename.

Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-02 14:55:09 -07:00
Dan Williams
3a198886ab sysfs: handle 'parent deleted before child added'
In scsi at least two cases of the parent device being deleted before the
child is added have been observed.

1/ scsi is performing async scans and the device is removed prior to the
   async can thread running (can happen with an in-opportune / unlikely
   unplug during initial scan).

2/ libsas discovery event running after the parent port has been torn
   down (this is a bug in libsas).

Result in crash signatures like:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
 Stack:
  00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
  ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
  ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a

In this scenario the parent is still valid (because we have a
reference), but it has been device_del()'d which means its kobj->sd
pointer is NULL'd via:

 device_del()->kobject_del()->sysfs_remove_dir()

...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():

 return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;

This scenario is being fixed in scsi/libsas, but if other subsystems
present the same ordering the system need not immediately crash.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10 14:48:51 -07:00
Bruno Prémont
5631f2c18f sysfs: Prevent crash on unset sysfs group attributes
Do not let the kernel crash when a device is registered with
sysfs while group attributes are not set (aka NULL).

Warn about the offender with some information about the offending
device.

This would warn instead of trying NULL pointer deref like:
 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<ffffffff81152673>] internal_create_group+0x83/0x1a0
 PGD 0
 Oops: 0000 [#1] SMP
 CPU 0
 Modules linked in:

 Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc1-x86_64 #3 HP ProLiant DL360 G4
 RIP: 0010:[<ffffffff81152673>]  [<ffffffff81152673>] internal_create_group+0x83/0x1a0
 RSP: 0018:ffff88019485fd70  EFLAGS: 00010202
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
 RDX: ffff880192e99908 RSI: ffff880192e99630 RDI: ffffffff81a26c60
 RBP: ffff88019485fdc0 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff880192e99908 R11: 0000000000000000 R12: ffffffff81a16a00
 R13: ffff880192e99908 R14: ffffffff81a16900 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88019bc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper/0 (pid: 1, threadinfo ffff88019485e000, task ffff880194878000)
 Stack:
  ffff88019485fdd0 ffff880192da9d60 0000000000000000 ffff880192e99908
  ffff880192e995d8 0000000000000001 ffffffff81a16a00 ffff880192da9d60
  0000000000000000 0000000000000000 ffff88019485fdd0 ffffffff811527be
 Call Trace:
  [<ffffffff811527be>] sysfs_create_group+0xe/0x10
  [<ffffffff81376ca6>] device_add_groups+0x46/0x80
  [<ffffffff81377d3d>] device_add+0x46d/0x6a0
  ...

Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10 14:48:51 -07:00
Tom Goff
70fa4a62e9 sysfs: Update the name hash for an entry after changing the namespace
This is needed to allow renaming network devices that have been moved
to another network namespace.

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 15:33:00 -07:00