Commit Graph

150 Commits

Author SHA1 Message Date
Al Viro 7e0e953bb0 procfs: fix race between symlink removals and traversals
use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:43:12 -05:00
Linus Torvalds 50652963ea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS updates from Al Viro:
 "This cycle a lot of stuff sits on topical branches, so I'll be sending
  more or less one pull request per branch.

  This is the first pile; more to follow in a few.  In this one are
  several misc commits from early in the cycle (before I went for
  separate branches), plus the rework of mntput/dput ordering on umount,
  switching to use of fs_pin instead of convoluted games in
  namespace_unlock()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the IO-triggering parts of umount to fs_pin
  new fs_pin killing logics
  allow attaching fs_pin to a group not associated with some superblock
  get rid of the second argument of acct_kill()
  take count and rcu_head out of fs_pin
  dcache: let the dentry count go down to zero without taking d_lock
  pull bumping refcount into ->kill()
  kill pin_put()
  mode_t whack-a-mole: chelsio
  file->f_path.dentry is pinned down for as long as the file is open...
  get rid of lustre_dump_dentry()
  gut proc_register() a bit
  kill d_validate()
  ncpfs: get rid of d_validate() nonsense
  selinuxfs: don't open-code d_genocide()
2015-02-17 14:56:45 -08:00
Alexander Kuleshov 6bee55f94f fs: proc: use PDE() to get proc_dir_entry
Use the PDE() helper to get proc_dir_entry instead of coding it directly.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Al Viro d443b9fd56 gut proc_register() a bit
There are only 3 callers and quite a bit of that thing is executed
exactly in one of those.  Just lift it there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Nicolas Dichtel 2fc1e948e8 fs/proc.c: use rb_entry_safe() instead of rb_entry()
Better to use existing macro that rewriting them.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:09 -08:00
Debabrata Banerjee b208d54b75 procfs: fix error handling of proc_register()
proc_register() error paths are leaking inodes and directory refcounts.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:09 -08:00
Nicolas Dichtel 710585d492 fs/proc: use a rb tree for the directory entries
When a lot of netdevices are created, one of the bottleneck is the
creation of proc entries.  This serie aims to accelerate this part.

The current implementation for the directories in /proc is using a single
linked list.  This is slow when handling directories with large numbers of
entries (eg netdevice-related entries when lots of tunnels are opened).

This patch replaces this linked list by a red-black tree.

Here are some numbers:

dummy30000.batch contains 30 000 times 'link add type dummy'.

Before the patch:
  $ time ip -b dummy30000.batch
  real    2m31.950s
  user    0m0.440s
  sys     2m21.440s
  $ time rmmod dummy
  real    1m35.764s
  user    0m0.000s
  sys     1m24.088s

After the patch:
  $ time ip -b dummy30000.batch
  real    2m0.874s
  user    0m0.448s
  sys     1m49.720s
  $ time rmmod dummy
  real    1m13.988s
  user    0m0.000s
  sys     1m1.008s

The idea of improving this part was suggested by Thierry Herbelot.

[akpm@linux-foundation.org: initialise proc_root.subdir at compile time]
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Thierry Herbelot <thierry.herbelot@6wind.com>.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:09 -08:00
Alexey Dobriyan 4dcc03fc45 proc: make proc_subdir_lock static
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:22 -07:00
Alexey Dobriyan dbcdb50441 proc: add and remove /proc entry create checks
* remove proc_create(NULL, ...) check, let it oops

* warn about proc_create("", ...) and proc_create("very very long name", ...)
  proc code keeps length as u8, no 256+ name length possible

* warn about proc_create("123", ...)
  /proc/$PID and /proc/misc namespaces are separate things,
  but dumb module might create funky a-la $PID entry.

* remove post mortem strchr('/') check
  Triggering it implies either strchr() is buggy or memory corruption.
  It should be VFS check anyway.

In reality, none of these checks will ever trigger,
it is preparation for the next patch.

Based on patch from Al Viro.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:22 -07:00
Rui Xiang cdf7e8dded proc: set attributes of pde using accessor functions
Use existing accessors proc_set_user() and proc_set_size() to set
attributes.  Just a cleanup.

Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:01 -08:00
Al Viro b26d4cd385 consolidate simple ->d_delete() instances
Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-15 22:04:17 -05:00
Linus Torvalds fd3930f70c proc: more readdir conversion bug-fixes
In the previous commit, Richard Genoud fixed proc_root_readdir(), which
had lost the check for whether all of the non-process /proc entries had
been returned or not.

But that in turn exposed _another_ bug, namely that the original readdir
conversion patch had yet another problem: it had lost the return value
of proc_readdir_de(), so now checking whether it had completed
successfully or not didn't actually work right anyway.

This reinstates the non-zero return for the "end of base entries" that
had also gotten lost in commit f0c3b5093a ("[readdir] convert
procfs").  So now you get all the base entries *and* you get all the
process entries, regardless of getdents buffer size.

(Side note: the Linux "getdents" manual page actually has a nice example
application for testing getdents, which can be easily modified to use
different buffers.  Who knew? Man-pages can be useful)

Reported-by: Emmanuel Benisty <benisty.e@gmail.com>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Cc: Richard Genoud <richard.genoud@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-19 16:26:12 -07:00
Al Viro f0c3b5093a [readdir] convert procfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:32 +04:00
David Howells c30480b92c proc: Make the PROC_I() and PDE() macros internal to procfs
Make the PROC_I() and PDE() macros internal to procfs.  This means making
PDE_DATA() out of line.  This could be made more optimal by storing
PDE()->data into inode->i_private.

Also provide a __PDE_DATA() that is inline and internal to procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:47 -04:00
David Howells a8ca16ea7b proc: Supply a function to remove a proc entry by PDE
Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer.  This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:46 -04:00
David Howells 4a520d2769 proc: Supply an accessor for getting the data from a PDE's parent
Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.

This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required.  The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: Maxim Mikityanskiy <maxtram95@gmail.com>
cc: YAMANE Toshiaki <yamanetoshi@gmail.com>
cc: linux-wireless@vger.kernel.org
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:42 -04:00
David Howells 270b5ac215 proc: Add proc_mkdir_data()
Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation.  This means no access is then required to the proc_dir_entry
struct to set this.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Neela Syam Kolli <megaraidlinux@lsi.com>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
cc: linux-wireless@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:41 -04:00
David Howells 271a15eabe proc: Supply PDE attribute setting accessor functions
Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:18 -04:00
David Howells 3cb5bf1bf9 proc: Delete create_proc_read_entry()
Delete create_proc_read_entry() as it no longer has any users.

Also delete read_proc_t, write_proc_t, the read_proc member of the
proc_dir_entry struct and the support functions that use them.  This saves a
pointer for every PDE allocated.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29 15:42:00 -04:00
Al Viro 866ad9a747 procfs: preparations for remove_proc_entry() race fixes
* leave ->proc_fops alone; make ->pde_users negative instead
* trim pde_opener
* move relevant code in fs/proc/inode.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 15:16:51 -04:00
David Howells ad147d011f procfs: Clean up huge if-statement in __proc_file_read()
Switch huge if-statement in __proc_file_read() around.  This then puts the
single line loop break immediately after the if-statement and allows us to
de-indent the huge comment and make it take fewer lines.  The code following
the if-statement then follows naturally from the call to dp->read_proc().

Signed-off-by: David Howells <dhowells@redhat.com>
2013-04-09 15:16:50 -04:00
David Howells 80e928f7eb proc: Kill create_proc_entry()
Kill create_proc_entry() in favour of create_proc_read_entry(), proc_create()
and proc_create_data().

Signed-off-by: David Howells <dhowells@redhat.com>
2013-04-09 14:16:39 -04:00
Al Viro d9dda78bad procfs: new helper - PDE_DATA(inode)
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data.  Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:32 -04:00
Al Viro ee21ed0afc procfs: kill ->write_proc()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:32 -04:00
Al Viro b6cdc73103 procfs: don't allow to use proc_create, create_proc_entry, etc. for directories
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:14 -04:00
Al Viro 8ce584c741 procfs: add proc_remove_subtree()
just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:09:17 -04:00
Andrew Morton 87ebdc00ee fs/proc: clean up printks
- use pr_foo() throughout

- remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

- nuke a few warnings which I've never seen happen, ever.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:11 -08:00
Al Viro d3d009cb96 saner proc_get_inode() calling conventions
Make it drop the pde in *all* cases when no new reference to it is
put into an inode - both when an inode had already been set up
(as we were already doing) and when inode allocation has failed.
Makes for simpler logics in callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:14 -05:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Eric W. Biederman dfb2ea45be proc: Allow proc_free_inum to be called from any context
While testing the pid namespace code I hit this nasty warning.

[  176.262617] ------------[ cut here ]------------
[  176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
[  176.265145] Hardware name: Bochs
[  176.265677] Modules linked in:
[  176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
[  176.266564] Call Trace:
[  176.266564]  [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0
[  176.266564]  [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20
[  176.266564]  [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0
[  176.266564]  [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20
[  176.266564]  [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50
[  176.266564]  [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80
[  176.266564]  [<ffffffff8111d195>] put_pid_ns+0x35/0x50
[  176.266564]  [<ffffffff810c608a>] put_pid+0x4a/0x60
[  176.266564]  [<ffffffff8146b177>] tty_ioctl+0x717/0xc10
[  176.266564]  [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90
[  176.266564]  [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10
[  176.266564]  [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70
[  176.266564]  [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550
[  176.266564]  [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60
[  176.266564]  [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80
[  176.266564]  [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0
[  176.266564]  [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0
[  176.266564]  [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50
[  176.266564]  [<ffffffff81939199>] system_call_fastpath+0x16/0x1b
[  176.266564] ---[ end trace 387af88219ad6143 ]---

It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
put_pid is called with another spinlock held and irqs disabled.

For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25 16:23:12 -08:00
Linus Torvalds 4c9a44aebe Merge branch 'akpm' (Andrew's patch-bomb)
Merge the rest of Andrew's patches for -rc1:
 "A bunch of fixes and misc missed-out-on things.

  That'll do for -rc1.  I still have a batch of IPC patches which still
  have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
  keys: use keyring_alloc() to create module signing keyring
  keys: fix unreachable code
  sendfile: allows bypassing of notifier events
  SGI-XP: handle non-fatal traps
  fat: fix incorrect function comment
  Documentation: ABI: remove testing/sysfs-devices-node
  proc: fix inconsistent lock state
  linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
  memcg: don't register hotcpu notifier from ->css_alloc()
  checkpatch: warn on uapi #includes that #include <uapi/...
  revert "rtc: recycle id when unloading a rtc driver"
  mm: clean up transparent hugepage sysfs error messages
  hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
  hfsplus: rework processing of hfs_btree_write() returned error
  hfsplus: rework processing errors in hfsplus_free_extents()
  hfsplus: avoid crash on failed block map free
  kcmp: include linux/ptrace.h
  drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
  mm: cma: WARN if freed memory is still in use
  exec: do not leave bprm->interp on stack
  ...
2012-12-20 20:00:43 -08:00
Xiaotian Feng ee297209bf proc: fix inconsistent lock state
Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq.  Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.

Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.

  =================================
  [ INFO: inconsistent lock state ]
  3.7.0 #36 Not tainted
  ---------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
   (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
  {SOFTIRQ-ON-W} state was registered at:
     __lock_acquire+0x8ae/0xca0
     lock_acquire+0x199/0x200
     _raw_spin_lock+0x41/0x50
     proc_alloc_inum+0x4c/0xd0
     alloc_mnt_ns+0x49/0xc0
     create_mnt_ns+0x25/0x70
     mnt_init+0x161/0x1c7
     vfs_caches_init+0x107/0x11a
     start_kernel+0x348/0x38c
     x86_64_start_reservations+0x131/0x136
     x86_64_start_kernel+0x103/0x112
  irq event stamp: 2993422
  hardirqs last  enabled at (2993422):  _raw_spin_unlock_irqrestore+0x55/0x80
  hardirqs last disabled at (2993421):  _raw_spin_lock_irqsave+0x29/0x70
  softirqs last  enabled at (2993394):  _local_bh_enable+0x13/0x20
  softirqs last disabled at (2993395):  call_softirq+0x1c/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(proc_inum_lock);
    <Interrupt>
      lock(proc_inum_lock);

   *** DEADLOCK ***

  no locks held by swapper/1/0.

  stack backtrace:
  Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
  Call Trace:
   <IRQ>  [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
    print_usage_bug+0x2a5/0x2c0
    mark_lock+0x33b/0x5e0
    __lock_acquire+0x813/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_free_inum+0x1c/0x50
    free_pid_ns+0x1c/0x50
    put_pid_ns+0x2e/0x50
    put_pid+0x4a/0x60
    delayed_put_pid+0x12/0x20
    rcu_process_callbacks+0x462/0x790
    __do_softirq+0x1b4/0x3b0
    call_softirq+0x1c/0x30
    do_softirq+0x59/0xd0
    irq_exit+0x54/0xd0
    smp_apic_timer_interrupt+0x95/0xa3
    apic_timer_interrupt+0x72/0x80
    cpuidle_enter_tk+0x10/0x20
    cpuidle_enter_state+0x17/0x50
    cpuidle_idle_call+0x287/0x520
    cpu_idle+0xba/0x130
    start_secondary+0x2b3/0x2bc

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20 17:40:20 -08:00
Marco Stornelli 46f6955710 procfs: drop vmtruncate
Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20 14:00:01 -05:00
Eric W. Biederman 33d6dce607 proc: Generalize proc inode allocation
Generalize the proc inode allocation so that it can be
used without having to having to create a proc_dir_entry.

This will allow namespace file descriptors to remain light
weight entitities but still have the same inode number
when the backing namespace is the same.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:19 -08:00
yan 17baa2a2c4 proc: use kzalloc instead of kmalloc and memset
Part of the memory will be written twice after this change, but that
should be negligible.

[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan 620727506d proc: return -ENOMEM when inode allocation failed
If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion.  proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:17 +09:00
Al Viro 00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro d161a13f97 switch procfs to umode_t use
both proc_dir_entry ->mode and populating functions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Miklos Szeredi bfe8684869 filesystems: add set_nlink()
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
David Howells 09570f9149 proc: make struct proc_dir_entry::name a terminal array rather than a pointer
Since __proc_create() appends the name it is given to the end of the PDE
structure that it allocates, there isn't a need to store a name pointer.
Instead we can just replace the name pointer with a terminal char array of
_unspecified_ length.  The compiler will simply append the string to statically
defined variables of PDE type overlapping any hole at the end of the structure
and, unlike specifying an explicitly _zero_ length array, won't give a warning
if you try to statically initialise it with a string of more than zero length.

Also, whilst we're at it:

 (1) Move namelen to end just prior to name and reduce it to a single byte
     (name shouldn't be longer than NAME_MAX).

 (2) Move pde_unload_lock two places further on so that if it's four bytes in
     size on a 64-bit machine, it won't cause an unused hole in the PDE struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-27 12:50:45 -07:00
Alexey Dobriyan 011159a0a7 airo: correct proc entry creation interfaces
* use proc_mkdir_mode() instead of create_proc_entry(S_IFDIR|...),
  export proc_mkdir_mode() for that, oh well.
* don't supply S_IFREG to proc_create_data(), it's unnecessary

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-05-16 14:25:28 -04:00
Alexey Dobriyan 312ec7e50c proc: make struct proc_dir_entry::namelen unsigned int
1. namelen is declared "unsigned short" which hints for "maybe space savings".
   Indeed in 2.4 struct proc_dir_entry looked like:

        struct proc_dir_entry {
                unsigned short low_ino;
                unsigned short namelen;

   Now, low_ino is "unsigned int", all savings were gone for a long time.
   "struct proc_dir_entry" is not that countless to worry about it's size,
   anyway.

2. converting from unsigned short to int/unsigned int can only create
   problems, we better play it safe.

Space is not really conserved, because of natural alignment for the next
field.  sizeof(struct proc_dir_entry) remains the same.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:37 -07:00
Alexey Dobriyan 3740a20c4f proc: less LOCK/UNLOCK in remove_proc_entry()
For the common case where a proc entry is being removed and nobody is in
the process of using it, save a LOCK/UNLOCK pair.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Alexey Dobriyan 6d1b6e4eff proc: ->low_ino cleanup
- ->low_ino is write-once field -- reading it under locks is unnecessary.

- /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
   PROC_DYNAMIC_FIRST check never triggers.

- in proc_get_inode(), inode number always matches proc dir entry, so
  save one parameter.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Nick Piggin fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Christoph Hellwig 1025774ce4 remove inode_setattr
Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:37 -04:00
Amerigo Wang 57f87869f0 proc: remove obsolete comments
A quick test shows these comments are obsolete, so just remove them.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Alexey Dobriyan 12bac0d9f4 proc: warn on non-existing proc entries
* warn if creation goes on to non-existent directory
* warn if removal goes on from non-existing directory
* warn if non-existing proc entry is removed

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Alexey Dobriyan e17a5765f2 proc: do translation + unlink atomically at remove_proc_entry()
remove_proc_entry() does

	lock
	lookup parent
	unlock
	lock
	unlink proc entry from lists
	unlock

which can be made bit more correct by doing parent translation + unlink
without dropping lock.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
Helight.Xu 587d4a17d8 some clean up in fs/proc
EXPORT_SYMBOL(proc_symlink);
EXPORT_SYMBOL(proc_mkdir);
EXPORT_SYMBOL(create_proc_entry);
EXPORT_SYMBOL(proc_create_data);
EXPORT_SYMBOL(remove_proc_entry);

Those EXPORT_SYMBOL shouldn't be in fs/proc/root.c,
should be in fs/proc/generic.c.

Signed-off-by: Helight.Xu <helight.xu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 13:00:18 -05:00
Alexey Dobriyan 135d5655dc proc: rename de_get() to pde_get() and inline it
* de_get() is trivial -- make inline, save a few bits of code, drop
  "refcount is 0" check -- it should be done in some generic refcount
  code, don't recall it's was helpful

* rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

* remove obvious and incorrent comments

* in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
  be normal one, remove_proc_entry() was supposed to do "-1" and code now
  reflects that.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:57 -08:00
Alexey Dobriyan 3dec7f59c3 proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
struct proc_dir_entry::owner is going to be removed. Now it's only necessary
to protect PDEs which are using ->read_proc, ->write_proc hooks.

However, ->owner assignments are racy and make it very easy for someone to switch
->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

So, ->owner is on death row.

Proxy file operations exist already (proc_file_operations), just bump usecount
when necessary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-03-31 01:14:27 +04:00
Randy Dunlap 1681bc30f2 proc: move fs/proc/inode-alloc.txt comment into a source file
so that people will realize that it exists and can update it as needed.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-03-31 01:13:12 +04:00
Al Viro d72f71eb0e constify dentry_operations: procfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27 14:44:01 -04:00
Alexey Dobriyan b4df2b92d8 proc: stop using BKL
There are four BKL users in proc: de_put(), proc_lookup_de(),
proc_readdir_de(), proc_root_readdir(),

1) de_put()
-----------
de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
needed. BKL doesn't matter to possible refcount leak as well.

2) proc_lookup_de()
-------------------
Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
potentially blocking, all callers of proc_lookup_de() eventually end up
from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
doesn't protect anything.

3) proc_readdir_de()
--------------------
"." and ".." part doesn't need BKL, walking PDE list is under
proc_subdir_lock, calling filldir callback is potentially blocking
because it writes to luserspace. All proc_readdir_de() callers
eventually come from ->readdir hook which is under directory's
->i_mutex -- BKL doesn't protect anything.

4) proc_root_readdir_de()
-------------------------
proc_root_readdir_de is ->readdir hook, see (3).

Since readdir hooks doesn't use BKL anymore, switch to
generic_file_llseek, since it also takes directory's i_mutex.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-05 12:27:44 +03:00
Arjan van de Ven 6c2f91e077 proc: use WARN() rather than printk+backtrace
Use WARN() rather than a printk() + backtrace();
this gives a more standard format message as well as complete
information (including line numbers etc) that will be collected
by kerneloops.org

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:34:38 +04:00
Alexey Dobriyan 665020c35e proc: more debugging for "already registered" case
Print parent directory name as well.

The aim is to catch non-creation of parent directory when proc_mkdir will
return NULL and all subsequent registrations go directly in /proc instead
of intended directory.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Fixed insane printk string while at it.  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-13 14:41:50 -07:00
Alexey Dobriyan cc99609917 [PATCH] proc: inode number fixlet
Ouch, if number taken from IDA is too big, the intent was to signal an
error, not check for overflow and still do overflowing addition.

One still needs 2^28 proc entries to notice this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:03 -04:00
Alexey Dobriyan 9a18540915 [PATCH 2/2] proc: switch inode number allocation to IDA
proc doesn't use "associate pointer with id" feature of IDR, so switch
to IDA.

NOTE, NOTE, NOTE:
	Do not apply if release_inode_number() still mantions MAX_ID_MASK!

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01 11:25:28 -04:00
Alexey Dobriyan 67935df49d [PATCH 1/2] proc: fix inode number bogorithmetic
Id which proc gets from IDR for inode number and id which proc removes
from IDR do not match. E.g. 0x11a transforms into 0x8000011a.

Which stayed unnoticed for a long time because, surprise, idr_remove()
masks out that high bit before doing anything.

All of this due to "| ~MAX_ID_MASK" in release_inode_number().

I still don't understand how it's supposed to work, because "| ~MASK"
is not an inversion for "& MAX" operation.

So, use just one nice, working addition. Make start offset unsigned int,
while I'm at it. It's longness is not used anywhere.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01 11:25:27 -04:00
Arjan van de Ven 267e2a9c71 Use WARN() in fs/proc/
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
This way, the entire if() {} section can collapse into the WARN() as well.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Alexey Dobriyan 881adb8535 proc: always do ->release
Current two-stage scheme of removing PDE emphasizes one bug in proc:

		open
				rmmod
				remove_proc_entry
		close

->release won't be called because ->proc_fops were cleared.  In simple
cases it's small memory leak.

For every ->open, ->release has to be done.  List of openers is introduced
which is traversed at remove_proc_entry() if neeeded.

Discussions with Al long ago (sigh).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:44 -07:00
Denis V. Lunev 78e92b99ec netns: assign PDE->data before gluing entry into /proc tree
In this unfortunate case, proc_mkdir_mode wrapper can't be used anymore and
this is no way to reuse proc_create_data due to nlinks assignment. So,
copy the code from proc_mkdir and assign PDE->data at the appropriate
moment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 04:12:41 -07:00
Denis V. Lunev 59b7435149 proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code.  The original OOPS is described in the
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:

    Typical PDE creation code looks like:

    	pde = create_proc_entry("foo", 0, NULL);
    	if (pde)
    		pde->proc_fops = &foo_proc_fops;

    Notice that PDE is first created, only then ->proc_fops is set up to
    final value. This is a problem because right after creation
    a) PDE is fully visible in /proc , and
    b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
       possible to ->read without ->open (see one class of oopses below).

    The fix is new API called proc_create() which makes sure ->proc_fops are
    set up before gluing PDE to main tree. Typical new code looks like:

    	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
    	if (!pde)
    		return -ENOMEM;

    Fix most networking users for a start.

    In the long run, create_proc_entry() for regular files will go.

In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.

create_proc_entries is replaced in the entire kernel code as new method
is also simply better.

This patch:

The problem is the same as for de->proc_fops.  Right now PDE becomes visible
without data set.  So, the entry could be looked up without data.  This, in
most cases, will simply OOPS.

proc_create_data call is created to address this issue.  proc_create now
becomes a wrapper around it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan 8731f14d37 proc: remove ->get_info infrastructure
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.

P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
      is long-standing crap, BTW, thus
      a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
      b) new such users should be rejected,
      c) everyone is encouraged to convert his favourite ->read_proc user or
         I'll do it, lazy bastards.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:19 -07:00
Alexey Dobriyan 5e971dce0b proc: drop several "PDE valid/invalid" checks
proc-misc code is noticeably full of "if (de)" checks when PDE passed is
always valid.  Remove them.

Addition of such check in proc_lookup_de() is for failed lookup case.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan 7cee4e00e0 proc: less special case in xlate code
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail.  However, if NULL is passed as parent then create/remove
accept full path as a argument.  This is arbitrary restriction -- all
infrastructure is in place.

So, patch allows the following to succeed:

	create_proc_entry("foo/bar", 0, pde_baz);
	remove_proc_entry("baz/foo/bar", &proc_root);

Also makes the following to behave identically:

	create_proc_entry("foo/bar", 0, NULL);
	create_proc_entry("foo/bar", 0, &proc_root);

Discrepancy noticed by Den Lunev (IIRC).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan f649d6d326 proc: simplify locking in remove_proc_entry()
proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan e93b4ea20a proc: print more information when removing non-empty directories
This usually saves one recompile to insert similar printk like below. :)

Sample nastygram:

remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G   M     2.6.25-rc1 #5

Call Trace:
 [<ffffffff80231974>] warn_on_slowpath+0x64/0x90
 [<ffffffff80232a6e>] printk+0x4e/0x60
 [<ffffffff802d6c8a>] remove_proc_entry+0x18a/0x200
 [<ffffffff8045cd88>] mutex_lock_nested+0x1c8/0x2d0
 [<ffffffff8025f0f0>] __try_stop_module+0x0/0x40
 [<ffffffff8025effd>] sys_delete_module+0x14d/0x200
 [<ffffffff8045df3d>] lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff8031c307>] __up_read+0x27/0xa0
 [<ffffffff8045decc>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b6ab>] system_call_after_swapgs+0x7b/0x80

---[ end trace 10ef850597e89c54 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Pavel Emelyanov e9720acd72 [NET]: Make /proc/net a symlink on /proc/self/net (v3)
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
  screwup pointed out by Stephen.

  To get the correct nlink count the ->getattr callback for /proc/net
  is overridden to read one from the net->proc_net entry.

  To make selinux still work the net->proc_net entry is initialized
  properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:08:40 -08:00
Alexey Dobriyan 2d3a4e3666 proc: fix ->open'less usage due to ->proc_fops flip
Typical PDE creation code looks like:

	pde = create_proc_entry("foo", 0, NULL);
	if (pde)
		pde->proc_fops = &foo_proc_fops;

Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
   possible to ->read without ->open (see one class of oopses below).

The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:

	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
	if (!pde)
		return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
       00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
       00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
 [<c106f7ce>] seq_read+0x24/0x28a
 [<c106f7aa>] seq_read+0x0/0x28a
 [<c106f7ce>] seq_read+0x24/0x28a
 [<c106f7aa>] seq_read+0x0/0x28a
 [<c10818b8>] proc_reg_read+0x60/0x73
 [<c1081858>] proc_reg_read+0x0/0x73
 [<c105a34f>] vfs_read+0x6c/0x8b
 [<c105a6f3>] sys_read+0x3c/0x63
 [<c10025f2>] sysenter_past_esp+0x5f/0xa5
 [<c10697a7>] destroy_inode+0x24/0x33
 =======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:24 -08:00
Zhang Rui 94413d8807 proc: detect duplicate names on registration
Print a warning if PDE is registered with a name which already exists in
target directory.

Bug report and a simple fix can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=8798

[\n fixlet and no undescriptive variable usage --adobriyan]
[akpm@linux-foundation.org: make printk comprehensible]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Alexey Dobriyan fd2cbe4888 proc: remove useless check on symlink removal
proc symlinks always have valid ->data containing destination of symlink.  No
need to check it on removal -- proc_symlink() already done it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Alexey Dobriyan 76df0c25d0 proc: simplify function prototypes
Move code around so as to reduce the number of forward-declarations.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Alexey Dobriyan 4237e0d3de proc: less LOCK operations during lookup
Pseudo-code for lookup effectively is:

	LOCK kernel
	LOCK proc_subdir_lock
		find PDE
		UNLOCK proc_subdir_lock

		get inode

		LOCK proc_subdir_lock
		goto unlock
	UNLOCK proc_subdir_lock
	UNLOCK kernel

We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping
to unlock_kernel() directly.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Eric W. Biederman 3790ee4bd8 proc: remove/Fix proc generic d_revalidate
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-10 19:43:55 -08:00
Alexey Dobriyan 5a622f2d0f proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&de->count) == 0)
					if (atomic_dec_and_test(&de->count))
						if (de->deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de->deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de->deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [<c10ac4f0>] vsnprintf+0x2ad/0x49b
 [<c10ac779>] vscnprintf+0x14/0x1f
 [<c1018e6b>] vprintk+0xc5/0x2f9
 [<c10379f1>] handle_fasteoi_irq+0x0/0xab
 [<c1004f44>] do_IRQ+0x9f/0xb7
 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
 [<c100264e>] need_resched+0x1f/0x21
 [<c10190ba>] printk+0x1b/0x1f
 [<c107c8ad>] de_put+0x3d/0x50
 [<c107c8f8>] proc_delete_inode+0x38/0x41
 [<c107c8c0>] proc_delete_inode+0x0/0x41
 [<c1066298>] generic_delete_inode+0x5e/0xc6
 [<c1065aa9>] iput+0x60/0x62
 [<c1063c8e>] d_kill+0x2d/0x46
 [<c1063fa9>] dput+0xdc/0xe4
 [<c10571a1>] __fput+0xb0/0xcd
 [<c1054e49>] filp_close+0x48/0x4f
 [<c1055ee9>] sys_close+0x67/0xa5
 [<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:20 -08:00
Linus Torvalds 8002cedc1a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
2007-12-03 08:15:36 -08:00
Eric W. Biederman 2b1e300a9d [NETNS]: Fix /proc/net breakage
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify ->lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-12-02 00:33:17 +11:00
Alexey Dobriyan c2319540cd proc: fix NULL ->i_fop oops
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file->f_op->readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
   could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
   readdir will return only "." and "..", because pde->next was cleared by
   remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
   fixed, just race window made smaller, because vfs_readdir() is run
   without sb_lock held and without file_list_lock held.  Effectively,
   ->i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
       00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
       00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
 [<c1061040>] filldir64+0x0/0xc5
 [<c1061295>] sys_getdents64+0x63/0xa5
 [<c10026ba>] sysenter_past_esp+0x5f/0x85
 =======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
      Unfortunately we have another copy of this junk in
      security/selinux/selinuxfs.c:sel_remove_entries() which would need the
      same treatment."

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Eric W. Biederman e1a1c997af proc: fix proc_kill_inodes to kill dentries on all proc superblocks
It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks.  Handle
that now.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:38 -08:00
Mel Gorman e12ba74d8f Group short-lived and reclaimable kernel allocations
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Changli Gao 99fc06df72 procfs directory entry cleanup
Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:43 -07:00
Alexey Dobriyan 786d7e1612 Fix rmmod/read/write races in /proc entries
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
   meanwhile. Or, more generically, system call done on /proc file, method
   supplied by module is called, module dissapeares meanwhile.

   pde = create_proc_entry()
   if (!pde)
	return -ENOMEM;
   pde->write_proc = ...
				open
				write
				copy_from_user
   pde = create_proc_entry();
   if (!pde) {
	remove_proc_entry();
	return -ENOMEM;
	/* module unloaded */
   }
				*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

  remove_proc_entry		vfs_read
  proc_kill_inodes		[check ->f_op validness]
				[check ->f_op->read validness]
				[verify_area, security permissions checks]
	->f_op = NULL;
				if (file->f_op->read)
					/* ->f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:39 -07:00
Darrick J. Wong 59cd0cbc75 Fix race between proc_readdir and remove_proc_entry
Fix the following race:

proc_readdir				remove_proc_entry
============				=================

spin_lock(&proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&proc_subdir_lock);
					spin_lock(&proc_subdir_lock);
					[find PDE]
					[free PDE, refcount is 0]
					spin_unlock(&proc_subdir_lock);
		    /* boom */
if (filldir(dirent, de->name, ...

[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Alexey Dobriyan 7695650a92 Fix race between proc_get_inode() and remove_proc_entry()
proc_lookup				remove_proc_entry
===========				=================

lock_kernel();
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
spin_unlock(&proc_subdir_lock);
					spin_lock(&proc_subdir_lock);
					[find PDE with refcount 0]
					[check refcount and free PDE]
					spin_unlock(&proc_subdir_lock);
proc_get_inode:
	de_get(de); /* boom */

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:01 -07:00
Eric W. Biederman 77b14db502 [PATCH] sysctl: reimplement the sysctl proc support
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files.  Making it possible to have /proc/sys vary
depending on the namespace you are in.  The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible.  By
simply being a cache different directories being visible depending on who you
are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:10:00 -08:00
Arjan van de Ven c5ef1c42c5 [PATCH] mark struct inode_operations const 3
Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Arjan van de Ven 00977a59b9 [PATCH] mark struct file_operations const 6
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Josef "Jeff" Sipek 2fddfeefee [PATCH] proc: change uses of f_{dentry, vfsmnt} to use f_path
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the proc
filesystem code.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:41 -08:00
Arjan van de Ven 99ac48f54a [PATCH] mark f_ops const in the inode
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Steven Rostedt 64a07bd82e [PATCH] protect remove_proc_entry
It has been discovered that the remove_proc_entry has a race in the removing
of entries in the proc file system that are siblings.  There's no protection
around the traversing and removing of elements that belong in the same
subdirectory.

This subdirectory list is protected in other areas by the BKL.  So the BKL was
at first used to protect this area too, but unfortunately, remove_proc_entry
may be called with spinlocks held.  The BKL may schedule, so this was not a
solution.

The final solution was to add a new global spin lock to protect this list,
called proc_subdir_lock.  This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that this
list is modified and added this protection there too.  Care must be taken
since these locations call several functions that may also schedule.

Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions of
the spin lock was _not_ used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
Adrian Bunk fee781e6c2 [PATCH] fs/proc/: function prototypes belong in header files
Function prototypes belong into header files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:03 -08:00
Linus Torvalds 8b90db0df7 Insanity avoidance in /proc
The old /proc interfaces were never updated to use loff_t, and are just
generally broken.  Now, we should be using the seq_file interface for
all of the proc files, but converting the legacy functions is more work
than most people care for and has little upside..

But at least we can make the non-LFS rules explicit, rather than just
insanely wrapping the offset or something.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-30 08:39:10 -08:00
Eric Dumazet 2f51201662 [PATCH] reduce sizeof(struct file)
Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
file_free_rcu) time, to reduce the size of this critical kernel object.

The trick I used is to move f_rcuhead and f_list in an union called f_u

The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
becomes f_u.f_list

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:19 -08:00
Miklos Szeredi 2b579beec2 [PATCH] proc: link count fix
This patch fixes bug titled "sunrpc as module and bad proc/sys link count"
reported by Jiri Slaby.

The problem was, that only proc_dir_entry->nlink was updated and the
corresponding inode->i_nlink was not.  The fix is to implement the
inode->getattr() method, and update i_nlink (if necessary).

A quick audit of proc code shows that no other attribute changes after
creation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:28 -07:00
Al Viro 008b150a3c [PATCH] Fix up symlink function pointers
This fixes up the symlink functions for the calling convention change:

 * afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
   smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
 * befs, smbfs, xfs - same for ->put_link()

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-19 18:08:21 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00