Commit Graph

173 Commits

Author SHA1 Message Date
Oleg Nesterov 34c8f07b9a signals: handle_stop_signal: don't worry about SIGKILL
handle_stop_signal() clears SIGNAL_STOP_DEQUEUED when sig == SIGKILL.  Remove
this nasty special case.  It was needed to prevent the race with group stop
and exit caused by thread-specific SIGKILL.  Now that we use complete_signal()
for private signals too this is not needed, complete_signal() will notice
SIGKILL and abort the soon-to-begin group stop.

Except: the target thread is dead (has PF_EXITING).  But in that case we
should not just clear SIGNAL_STOP_DEQUEUED and nothing more.  We should either
kill the whole thread group, or silently ignore the signal.

I suspect we are not right wrt zombie leaders, but this is another issue which
and should be fixed separately.  Note that this check can't abort the group
stop if it was already started/finished, this check only adds a subtle side
effect if we race with the thread which has already dequeued sig_kernel_stop()
signal and temporary released ->siglock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov ac5c215383 signals: join send_sigqueue() with send_group_sigqueue()
We export send_sigqueue() and send_group_sigqueue() for the only user,
posix_timer_event().  This is a bit silly, because both are just trivial
helpers on top of do_send_sigqueue() and because the we pass the unused
.si_signo parameter.

Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov e62e6650e9 signals: unify send_sigqueue/send_group_sigqueue completely
Suggested by Pavel Emelyanov.

send_sigqueue/send_group_sigqueue are only differ in how they lock ->siglock.
Unify them.  send_group_sigqueue() uses spin_lock() because it knows the task
can't exit, but in that case lock_task_sighand() can't fail and doesn't hurt.

Note that the "sig" argument is ignored, it is always equal to ->si_signo.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Pavel Emelyanov 4cd4b6d4e0 signals: fold complete_signal() into send_signal/do_send_sigqueue
Factor out complete_signal() callsites.  This change completely unifies the
helpers sending the specific/group signals.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov 5fcd835bf8 signals: use __group_complete_signal() for the specific signals too
Based on Pavel Emelyanov's suggestion.

Rename __group_complete_signal() to complete_signal() and use it to process
the specific signals too.  To do this we simply add the "int group" argument.

This allows us to greatly simply the signal-sending code and adds a useful
behaviour change.  We can avoid the unneeded wakeups for the private signals
because wants_signal() is more clever than sigismember(blocked), but more
importantly we now take into account the fatal specific signals too.

The latter allows us to kill some subtle checks in handle_stop_signal() and
makes the specific/group signal's behaviour more consistent.  For example,
currently sigtimedwait(FATAL_SIGNAL) behaves differently depending on was the
signal sent by kill() or tkill() if the signal was not blocked.

And.  This allows us to tweak/fix the behaviour when the specific signal is
sent to the dying/dead ->group_leader.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov 2ca3515aa5 signals: change send_signal/do_send_sigqueue to take "boolean group" parameter
send_signal() is used either with ->pending or with ->signal->shared_pending.
Change it to take "int group" instead, this argument will be re-used later.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 71f11dc025 signals: move the definition of __group_complete_signal() up
Move the unchanged definition of __group_complete_signal() so that send_signal
can see it.  To simplify the reading of the next patches.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov db51aeccd7 signals: microoptimize the usage of ->curr_target
Suggested by Roland McGrath.

Initialize signal->curr_target in copy_signal().  This way ->curr_target is
never == NULL, we can kill the check in __group_complete_signal's hot path.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 08d2c30ce9 signals: send_sig_info: don't take tasklist_lock
The comment in send_sig_info() is wrong, tasklist_lock can't help.

The caller must ensure the task can't go away, otherwise ->sighand can be NULL
even before we take the lock.

p->sighand could be changed by exec(), but I can't imagine how it is possible
to prevent exit(), but not exec().

Since the things seem to work, I assume all callers are correct.  However,
drm_vbl_send_signals() looks broken.  block_all_signals() which is solely used
by drm is definitely broken.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 3547ff3aef signals: do_tkill: don't use tasklist_lock
Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid
taking tasklist lock.

Note that we don't return an error if lock_task_sighand() fails, we pretend
the task dies after receiving the signal.  Otherwise, we should fight with the
nasty races with mt-exec without having any advantage.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 6e65acba7c signals: move handle_stop_signal() into send_signal()
Move handle_stop_signal() into send_signal().  This factors out a couple of
callsites and allows us to do further unifications.

Also, with this change specific_send_sig_info() does handle_stop_signal().
Not that this is really important, we never send STOP/CONT via send_sig() and
friends, but still this looks more consistent.

The only (afaics) special case is get_signal_to_deliver().  If the traced task
dequeues SIGCONT, it can re-send it to itself after ptrace_stop() if the
signal was blocked by debugger.  In that case handle_stop_signal() is
unnecessary, but hopefully not a problem.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov c99fcf28b8 signals: send_group_sigqueue: don't take tasklist_lock
handle_stop_signal() was changed, now send_group_sigqueue() doesn't need
tasklist_lock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov f8c5b5c06f signals: __group_complete_signal: cache the value of p->signal
Cosmetic, cache p->signal to make the code a bit more readable.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 5fc894bb4f signals: send_sigqueue: don't forget about handle_stop_signal()
send_group_sigqueue() calls handle_stop_signal(), send_sigqueue() doesn't.
This is not consistent and in fact I'd say this is (minor) bug.

Move handle_stop_signal() from send_group_sigqueue() to do_send_sigqueue(),
the latter is called by send_sigqueue() too.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov 5c193e8871 signals: send_sigqueue: don't take rcu lock
lock_task_sighand() was changed, send_sigqueue() doesn't need rcu_read_lock()
any longer.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov f6b76d4fb0 get_signal_to_deliver: use the cached ->signal/sighand values
Cache the values of current->signal/sighand.  Shrinks .text a bit and makes
the code more readable.  Also, remove "sigset_t *mask", it is pointless
because in fact we save the constant offset.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:35 -07:00
Oleg Nesterov ad16a46069 handle_stop_signal: use the cached p->signal value
Cache the value of p->signal, and change the code to use while_each_thread()
helper.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov fc321d2e60 handle_stop_signal: unify partial/full stop handling
Now that handle_stop_signal() doesn't drop ->siglock, we can't see both
->group_stop_count && SIGNAL_STOP_STOPPED.  Merge two "if" branches.

As Roland pointed out, we never actually needed 2 do_notify_parent_cldstop()
calls.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov 6ca25b5513 kill_pid_info: don't take now unneeded tasklist_lock
Previously handle_stop_signal(SIGCONT) could drop ->siglock.  That is why
kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't
go away after unlock.  Not needed now.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov e442055193 signals: re-assign CLD_CONTINUED notification from the sender to reciever
Based on discussion with Jiri and Roland.

In short: currently handle_stop_signal(SIGCONT, p) sends the notification to
p->parent, with this patch p itself notifies its parent when it becomes
running.

handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify
the parent with do_notify_parent_cldstop().  This leads to multiple problems:

	- as Jiri Kosina pointed out, the stopped task can resume without
	  actually seeing SIGCONT which may have a handler.

	- we race with another sig_kernel_stop() signal which may come in
	  that window.

	- we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT
	  in that window.

	- we can't avoid taking tasklist_lock() while sending SIGCONT.

With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED
flag in p->signal->flags and returns.  The notification is sent by the first
task which returns from finish_stop() (there should be at least one) or any
other signalled thread from get_signal_to_deliver().

This is a user-visible change.  Say, currently kill(SIGCONT, stopped_child)
can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed
unpredictably.  Another difference is that if the child is ptraced by another
process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach()
while currently it always goes to the tracer which doesn't actually need this
notification.  Hopefully not a problem.

The patch asks for the futher obvious cleanups, I'll send them separately.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov 3b5e9e53c6 signals: cleanup security_task_kill() usage/implementation
Every implementation of ->task_kill() does nothing when the signal comes from
the kernel.  This is correct, but means that check_kill_permission() should
call security_task_kill() only for SI_FROMUSER() case, and we can remove the
same check from ->task_kill() implementations.

(sadly, check_kill_permission() is the last user of signal->session/__session
 but we can't s/task_session_nr/task_session/ here).

NOTE: Eric W.  Biederman pointed out cap_task_kill() should die, and I think
he is very right.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: David Quigley <dpquigl@tycho.nsa.gov>
Cc: Eric Paris <eparis@redhat.com>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Pavel Emelyanov 9e3bd6c3fb signals: consolidate send_sigqueue and send_group_sigqueue
Both functions do the same thing after proper locking, but with
different sigpending structs, so move the common code into a helper.

After this we have 4 places that look very similar: send_sigqueue: calls
do_send_sigqueue and signal_wakeup send_group_sigqueue: calls
do_send_sigqueue and __group_complete_signal __group_send_sig_info:
calls send_signal and __group_complete_signal specific_send_sig_info:
calls send_signal and signal_wakeup

Besides, send_signal performs actions similar to do_send_sigqueue's
and __group_complete_signal - to signal_wakeup.

It looks like they can be consolidated gracefully.

Oleg said:

  Personally, I think this change is very good.  But send_sigqueue() and
  send_group_sigqueue() have a very subtle difference which I was never able
  to understand.

  Let's suppose that sigqueue is already queued, and the signal is ignored
  (the latter means we should re-schedule cpu timer or handle overrruns).  In
  that case send_sigqueue() returns 0, but send_group_sigqueue() returns 1.

  I think this is not the problem (in fact, I think this patch makes the
  behaviour more correct), but I hope Thomas can take a look and confirm.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Pavel Emelyanov c5363d0363 signals: clean dequeue_signal from excess checks and assignments
The signr variable may be declared without initialization - it is set ro the
return value from __dequeue_signal() right at the function beginning.

Besides, after recalc_sigpending() two checks for signr to be not 0 may be
merged into one.  Both if-s become easier to read.

Thanks to Oleg for pointing out mistakes in the first version of this patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Pavel Emelyanov 93585eeaf3 signals: consolidate checks for whether or not to ignore a signal
Both sig_ignored() and do_sigaction() check for signr to be explicitly or
implicitly ignored.  Introduce a helper for them.

This patch is aimed to help handling signals by pid namespace's init, and was
derived from one of Oleg's patches
https://lists.linux-foundation.org/pipermail/containers/2007-December/009308.html
so, if he doesn't mind, he should be considered as an author.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov 1406f2d321 lock_task_sighand: add rcu lock/unlock
Most of the callers of lock_task_sighand() doesn't actually need rcu_lock().
lock_task_sighand() needs it only to safely play with tsk->sighand, it can
take the lock itself.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Oleg Nesterov 573cf9ad72 signals: do_signal_stop(): use signal_group_exit()
do_signal_stop() needs signal_group_exit() but checks sig->group_exit_task.
 This (optimization) is correct, SIGNAL_STOP_DEQUEUED and SIGNAL_GROUP_EXIT
are mutually exclusive, but looks confusing.  Use signal_group_exit(), this
is not fastpath, the code clarity is more important.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Pavel Emelyanov 2acb024d55 signals: consolidate checking for ignored/legacy signals
Two callers for send_signal() - the specific_send_sig_info and the
__group_send_sig_info - both check for sig to be ignored or already queued.

Move these checks into send_signal() and make it return 1 to indicate that the
signal is dropped, but there's no error in this.

Besides, merge comments and spell-check them.

[oleg@tv-sign.ru: simplifications]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Pavel Emelyanov af7fff9c13 signals: turn LEGACY_QUEUE macro into static inline function
This makes the code more readable, due to less brackets and small letters in
name.

I also move it above the send_signal() as a preparation for the 3rd patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Pavel Emelyanov e1401c6bbb signals: remove unused variable from send_signal()
This function doesn't change the ret's value and thus always returns 0, with a
single exception of returning -EAGAIN explicitly.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Pavel Machek f5264481c8 trivial: small cleanups
These are small cleanups all over the tree.

Trivial style and comment changes to
  fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
2008-04-21 22:15:06 +00:00
Roland McGrath 18c98b6527 ptrace_signal subroutine
This breaks out the ptrace handling from get_signal_to_deliver into a
new subroutine.  The actual code there doesn't change, and it gets
inlined into nearly identical compiled code.  This makes the function
substantially shorter and thus easier to read, and it nicely isolates
the ptrace magic.

Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-18 08:17:57 -07:00
Roland McGrath 13b1c3d4b4 freezer vs stopped or traced
This changes the "freezer" code used by suspend/hibernate in its treatment
of tasks in TASK_STOPPED (job control stop) and TASK_TRACED (ptrace) states.

As I understand it, the intent of the "freezer" is to hold all tasks
from doing anything significant.  For this purpose, TASK_STOPPED and
TASK_TRACED are "frozen enough".  It's possible the tasks might resume
from ptrace calls (if the tracer were unfrozen) or from signals
(including ones that could come via timer interrupts, etc).  But this
doesn't matter as long as they quickly block again while "freezing" is
in effect.  Some minor adjustments to the signal.c code make sure that
try_to_freeze() very shortly follows all wakeups from both kinds of
stop.  This lets the freezer code safely leave stopped tasks unmolested.

Changing this fixes the longstanding bug of seeing after resuming from
suspend/hibernate your shell report "[1] Stopped" and the like for all
your jobs stopped by ^Z et al, as if you had freshly fg'd and ^Z'd them.
It also removes from the freezer the arcane special case treatment for
ptrace'd tasks, which relied on intimate knowledge of ptrace internals.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 07:59:54 -08:00
Harvey Harrison b5606c2d44 remove final fastcall users
fastcall always expands to empty, remove it.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:18 -08:00
Pavel Emelyanov 146a505d49 Get rid of the kill_pgrp_info() function
There's only one caller left - the kill_pgrp one - so merge these two
functions and forget the kill_pgrp_info one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:29 -08:00
Pavel Emelyanov d5df763b81 Clean up the kill_something_info
This is the first step (of two) in removing the kill_pgrp_info.

All the users of this function are in kernel/signal.c, but all they need is to
call __kill_pgrp_info() with the tasklist_lock read-locked.

Fortunately, one of its users is the kill_something_info(), which already
needs this lock in one of its branches, so clean these branches up and call
the __kill_pgrp_info() directly.

Based on Oleg's view of how this function should look.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:29 -08:00
Oleg Nesterov fea9d17554 ITIMER_REAL: convert to use struct pid
signal_struct->tsk points to the ->group_leader and thus we have the nasty
code in de_thread() which has to change it and restart ->real_timer if the
leader is changed.

Use "struct pid *leader_pid" instead.  This also allows us to kill now
unneeded send_group_sig_info().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:29 -08:00
Oleg Nesterov d36174bc2b uglify kill_pid_info() to fix kill() vs exec() race
kill_pid_info()->pid_task() could be the old leader of the execing process.
In that case it is possible that the leader will be released before we take
siglock. This means that kill_pid_info() (and thus sys_kill()) can return a
false -ESRCH.

Change the code to retry when lock_task_sighand() fails. The endless loop is
not possible, __exit_signal() both clears ->sighand and does detach_pid().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:28 -08:00
Oleg Nesterov 5dee1707df move the related code from exit_notify() to exit_signals()
The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress.  In that case
nothing special is needed, just set PF_EXITING and return.

Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().

So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock.  But this in fact fixes another
___pure___ theoretical minor race.  __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov d12619b5ff fix group stop with exit race
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly.  Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.

However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already.  In that case the group stop never
finishes properly.

Note: this is a minimal fix, we can add some optimizations later.  Say we
can return quickly if thread_group_empty().  Also, we can move some signal
related code from exit_notify() to exit_signals().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov 20686a309a ptrace_stop: fix racy nonstop_code setting
If the tracer is gone and we are not going to stop, ptrace_stop() sets
->exit_code = nostop_code.  However, the tracer could actually clear the
exit code before detaching.  In that case get_signal_to_deliver() "resends"
the signal which was cancelled by the debugger.  For example, it is
possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in
STOPPED state.

Change the behaviour of ptrace_stop().  If the caller is ptrace notify(),
we should always clear ->exit_code.  If the caller is
get_signal_to_deliver(), we should not touch it at all.  To do so, change
the nonstop_code parameter to "bool clear_code" and change the callers
accordingly.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov 6405f7f467 ptrace_stop: fix the race with ptrace detach+attach
If the tracer went away (may_ptrace_stop() failed), ptrace_stop() drops
tasklist and then changes the ->state from TASK_TRACED to TASK_RUNNING.

This can fool another tracer which attaches to us in between.  Change the
->state under tasklist_lock to ensure that ptrace_check_attach() can't wrongly
succeed.  Also, remove the unnecessary mb().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov 6b39c7bfbd kill PT_ATTACHED
Since the patch

	"Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
	commit f5b40e363a

we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.

This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
which were needed to catch the "ptrace attach is in progress" case.  We can
also remove the flag itself since nobody else uses it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Roland McGrath 1a669c2f16 Add arch_ptrace_stop
This adds support to allow asm/ptrace.h to define two new macros,
arch_ptrace_stop_needed and arch_ptrace_stop.  These control special
machine-specific actions to be done before a ptrace stop.  The new code
compiles away to nothing when the new macros are not defined.  This is the
case on all machines to begin with.

On ia64, these macros will be defined to solve the long-standing issue of
ptrace vs register backing store.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:07 -08:00
Oleg Nesterov d9ae90ac4b use __set_task_state() for TRACED/STOPPED tasks
1. It is much easier to grep for ->state change if __set_task_state() is used
   instead of the direct assignment.

2. ptrace_stop() and handle_group_stop() use set_task_state() which adds the
   unneeded mb() (btw even if we use mb() it is still possible that do_wait()
   sees the new ->state but not ->exit_code, but this is ok).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:00 -08:00
Oleg Nesterov ed5d2cac11 exec: rework the group exit and fix the race with kill
As Roland pointed out, we have the very old problem with exec.  de_thread()
sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
clears signal->flags.  All signals (even fatal ones) sent in this window
(which is not too small) will be lost.

With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
the new helper, should be used to detect exit_group() or exec() in progress.
It can have more users, but this patch does only strictly necessary changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Oleg Nesterov f558b7e408 remove handle_group_stop() in favor of do_signal_stop()
Every time we set SIGNAL_GROUP_EXIT or clear SIGNAL_STOP_DEQUEUED we also
reset ->group_stop_count.

This means that the SIGNAL_GROUP_EXIT check in handle_group_stop() is not
needed, and do_signal_stop() should check SIGNAL_STOP_DEQUEUED only when
->group_stop_count == 0. With these changes handle_group_stop() becomes the
subset of do_signal_stop(), we can kill it and use do_signal_stop() instead.

Also, a preparation for the next patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Oleg Nesterov 198466b41d __group_complete_signal(): fix coredump with group stop race
When __group_complete_signal() sees sig_kernel_coredump() signal, it starts
the group stop, but sets ->group_exit_task = t in a hope that "t" will
actually dequeue this signal and invoke do_coredump().  However, by the
time "t" enters get_signal_to_deliver() it is possible that the signal was
blocked/ignored or we have another pending !SIG_KERNEL_COREDUMP_MASK signal
which will be dequeued first.  This means the task could be stopped but not
killed.

Remove this code from __group_complete_signal().  Note also this patch
removes the bogus signal_wake_up(t, 1).  This thread can't be
STOPPED/TRACED, note the corresponding check in wants_signal().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Trond Myklebust 13f09b95a8 Ensure that we export __fatal_signal_pending()
It may be used by the modules nfs.ko and sunrpc.ko

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Made it a regular export rather than GPL-only  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-01 12:58:14 +11:00
Linus Torvalds 75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually..
2008-02-01 11:45:47 +11:00
H. Peter Anvin 65ea5b0349 x86: rename the struct pt_regs members for 32/64-bit consistency
We have a lot of code which differs only by the naming of specific
members of structures that contain registers.  In order to enable
additional unifications, this patch drops the e- or r- size prefix
from the register names in struct pt_regs, and drops the x- prefixes
for segment registers on the 32-bit side.

This patch also performs the equivalent renames in some additional
places that might be candidates for unification in the future.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:56 +01:00