linux-stable

Commit Graph

Author	SHA1	Message	Date
Peter Zijlstra	06249738a4	workqueue: Manually break affinity on hotplug Don't rely on the scheduler to force break affinity for us -- it will stop doing that for per-cpu-kthreads. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.464718669@infradead.org	2020-11-10 18:38:58 +01:00
Mauro Carvalho Chehab	3eb6b31bfb	workqueue: fix a kernel-doc warning As warned by Sphinx: ./Documentation/core-api/workqueue:400: ./kernel/workqueue.c:1218: WARNING: Unexpected indentation. the return code table is currently not recognized, as it lacks markups. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2020-10-16 07:28:20 +02:00
Stephen Boyd	f9e62f318f	treewide: Make all debug_obj_descriptors const This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by moving the structure to read-only memory. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org	2020-09-24 21:56:25 +02:00
Christoph Hellwig	fe557319aa	maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-17 10:57:41 -07:00
Lai Jiangshan	10cdb15759	workqueue: use BUILD_BUG_ON() for compile time test instead of WARN_ON() Any runtime WARN_ON() has to be fixed, and BUILD_BUG_ON() can help you nitice it earlier. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-06-01 11:02:42 -04:00
Lai Jiangshan	b8f06b0444	workqueue: remove useless unlock() and lock() in series This is no point to unlock() and then lock() the same mutex back to back. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:25:23 -04:00
Lai Jiangshan	4f3f4cf388	workqueue: void unneeded requeuing the pwq in rescuer thread `008847f66c` ("workqueue: allow rescuer thread to do more work.") made the rescuer worker requeue the pwq immediately if there may be more work items which need rescuing instead of waiting for the next mayday timer expiration. Unfortunately, it checks only whether the pool needs help from rescuers, but it doesn't check whether the pwq has work items in the pool (the real reason that this rescuer can help for the pool). The patch adds the check and void unneeded requeuing. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:22:10 -04:00
Sebastian Andrzej Siewior	a9b8a98529	workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t The workqueue code has it's internal spinlocks (pool::lock), which are acquired on most workqueue operations. These spinlocks are converted to 'sleeping' spinlocks on a RT-kernel. Workqueue functions can be invoked from contexts which are truly atomic even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such contexts is forbidden. The pool::lock hold times are bound and the code sections are relatively short, which allows to convert pool::lock and as a consequence wq_mayday_lock to raw spinlocks which are truly spinning locks even on a PREEMPT_RT kernel. With the previous conversion of the manager waitqueue to a simple waitqueue workqueues are now fully RT compliant. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:03:47 -04:00
Sebastian Andrzej Siewior	d8bb65ab70	workqueue: Use rcuwait for wq_manager_wait The workqueue code has it's internal spinlock (pool::lock) and also implicit spinlock usage in the wq_manager waitqueue. These spinlocks are converted to 'sleeping' spinlocks on a RT-kernel. Workqueue functions can be invoked from contexts which are truly atomic even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such contexts is forbidden. pool::lock can be converted to a raw spinlock as the lock held times are short. But the workqueue manager waitqueue is handled inside of pool::lock held regions which again violates the lock nesting rules of raw and regular spinlocks. The manager waitqueue has no special requirements like custom wakeup callbacks or mass wakeups. While it does not use exclusive wait mode explicitly there is no strict requirement to queue the waiters in a particular order as there is only one waiter at a time. This allows to replace the waitqueue with rcuwait which solves the locking problem because rcuwait relies on existing locking. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:00:35 -04:00
Zhang Qiang	342ed2400b	workqueue: Remove unnecessary kfree() call in rcu_free_wq() The data structure member "wq->rescuer" was reset to a null pointer in one if branch. It was passed to a call of the function "kfree" in the callback function "rcu_free_wq" (which was eventually executed). The function "kfree" does not perform more meaningful data processing for a passed null pointer (besides immediately returning from such a call). Thus delete this function call which became unnecessary with the referenced software update. Fixes: `def98c84b6` ("workqueue: Fix spurious sanity check failures in destroy_workqueue()") Suggested-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Zhang Qiang <qiang.zhang@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-27 09:52:41 -04:00
Dan Carpenter	b92b36eadf	workqueue: Fix an use after free in init_rescuer() We need to preserve error code before freeing "rescuer". Fixes: `f187b6974f` ("workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-11 10:25:42 -04:00
Sean Fu	f187b6974f	workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO. Replace inline function PTR_ERR_OR_ZERO with IS_ERR and PTR_ERR to remove redundant parameter definitions and checks. Reduce code size. Before: text data bss dec hex filename 47510 5979 840 54329 d439 kernel/workqueue.o After: text data bss dec hex filename 47474 5979 840 54293 d415 kernel/workqueue.o Signed-off-by: Sean Fu <fxinrong@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-05 11:56:07 -04:00
Sebastian Andrzej Siewior	62849a9612	workqueue: Remove the warning in wq_worker_sleeping() The kernel test robot triggered a warning with the following race: task-ctx A interrupt-ctx B worker -> process_one_work() -> work_item() -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> ->sleeping = 1 atomic_dec_and_test(nr_running) __schedule(); interrupt async_page_fault() -> local_irq_enable(); -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> if (WARN_ON(->sleeping)) return -> __schedule() -> sched_update_worker() -> wq_worker_running() -> atomic_inc(nr_running); -> ->sleeping = 0; -> sched_update_worker() -> wq_worker_running() if (!->sleeping) return In this context the warning is pointless everything is fine. An interrupt before wq_worker_sleeping() will perform the ->sleeping assignment (0 -> 1 > 0) twice. An interrupt after wq_worker_sleeping() will trigger the warning and nr_running will be decremented (by A) and incremented once (only by B, A will skip it). This is the case until the ->sleeping is zeroed again in wq_worker_running(). Remove the WARN statement because this condition may happen. Document that preemption around wq_worker_sleeping() needs to be disabled to protect ->sleeping and not just as an optimisation. Fixes: `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian	2020-04-08 11:35:20 +02:00
Linus Torvalds	0adb8bc039	Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Nothing too interesting. Just two trivial patches" * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Mark up unlocked access to wq->first_flusher workqueue: Make workqueue_init*() return void	2020-04-03 12:27:36 -07:00
Chris Wilson	00d5d15b06	workqueue: Mark up unlocked access to wq->first_flusher [ 7329.671518] BUG: KCSAN: data-race in flush_workqueue / flush_workqueue [ 7329.671549] [ 7329.671572] write to 0xffff8881f65fb250 of 8 bytes by task 37173 on cpu 2: [ 7329.671607] flush_workqueue+0x3bc/0x9b0 (kernel/workqueue.c:2844) [ 7329.672527] [ 7329.672540] read to 0xffff8881f65fb250 of 8 bytes by task 37175 on cpu 0: [ 7329.672571] flush_workqueue+0x28d/0x9b0 (kernel/workqueue.c:2835) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-12 14:26:50 -04:00
Hillf Danton	aa202f1f56	workqueue: don't use wq_select_unbound_cpu() for bound works wq_select_unbound_cpu() is designed for unbound workqueues only, but it's wrongly called when using a bound workqueue too. Fixing this ensures work queued to a bound workqueue with cpu=WORK_CPU_UNBOUND always runs on the local CPU. Before, that would happen only if wq_unbound_cpumask happened to include it (likely almost always the case), or was empty, or we got lucky with forced round-robin placement. So restricting /sys/devices/virtual/workqueue/cpumask to a small subset of a machine's CPUs would cause some bound work items to run unexpectedly there. Fixes: `ef55718044` ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Hillf Danton <hdanton@sina.com> [dj: massage changelog] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-10 10:30:51 -04:00
Yu Chen	2333e82995	workqueue: Make workqueue_init*() return void The return values of workqueue_init() and workqueue_early_int() are always 0, and there is no usage of their return value. So just make them return void. Signed-off-by: Yu Chen <chen.yu@easystack.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-04 11:21:49 -05:00
Linus Torvalds	c677124e63	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io\|memory\|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...	2020-01-28 10:07:09 -08:00
Daniel Jordan	1c5da0ec7f	workqueue: add worker function to workqueue_execute_end tracepoint It's surprising that workqueue_execute_end includes only the work when its counterpart workqueue_execute_start has both the work and the worker function. You can't set a tracing filter or trigger based on the function, and postprocessing scripts interested in specific functions are harder to write since they have to remember the work from _start and match it up with the same field in _end. Add the function name, taking care to use the copy stashed in the worker since the work is no longer safe to touch. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2020-01-15 08:02:47 -08:00
Ingo Molnar	1e5f8a3085	Linux 5.5-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf 33WGSyI= =iEwx -----END PGP SIGNATURE----- Merge tag 'v5.5-rc3' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-12-25 10:41:37 +01:00
Sebastian Andrzej Siewior	025f50f386	sched/rt, workqueue: Use PREEMPTION CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Update the comment to use PREEMPTION because it is true for both preemption models. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20191015191821.11479-35-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-12-08 14:37:37 +01:00
Kefeng Wang	1d9a6159bd	workqueue: Use pr_warn instead of pr_warning Use pr_warn() instead of the remaining pr_warning() calls. Link: http://lkml.kernel.org/r/20191128004752.35268-2-wangkefeng.wang@huawei.com To: joe@perches.com To: linux-kernel@vger.kernel.org Cc: gregkh@linuxfoundation.org Cc: tj@kernel.org Cc: arnd@arndb.de Cc: sergey.senozhatsky@gmail.com Cc: rostedt@goodmis.org Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-12-06 09:59:30 +01:00
Linus Torvalds	1ae78780ed	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...	2019-11-26 15:42:43 -08:00
Sebastian Andrzej Siewior	49e9d1a9fa	workqueue: Add RCU annotation for pwq list walk An additional check has been recently added to ensure that a RCU related lock is held while the RCU list is iterated. The `pwqs' are sometimes iterated without a RCU lock but with the &wq->mutex acquired leading to a warning. Teach list_for_each_entry_rcu() that the RCU usage is okay if &wq->mutex is acquired during the list traversal. Fixes: `28875945ba` ("rcu: Add support for consolidated-RCU reader checking") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-11-15 11:53:35 -08:00
Joel Fernandes (Google)	5a6446626d	workqueue: Convert for_each_wq to use built-in list check Because list_for_each_entry_rcu() can now check for holding a lock as well as for being in an RCU read-side critical section, this commit replaces the workqueue_sysfs_unregister() function's use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with list_for_each_entry_rcu() augmented with a lockdep_is_held() optional argument. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-30 08:34:10 -07:00
Tejun Heo	e66b39af00	workqueue: Fix pwq ref leak in rescuer_thread() `008847f66c` ("workqueue: allow rescuer thread to do more work.") made the rescuer worker requeue the pwq immediately if there may be more work items which need rescuing instead of waiting for the next mayday timer expiration. Unfortunately, it doesn't check whether the pwq is already on the mayday list and unconditionally gets the ref and moves it onto the list. This doesn't corrupt the list but creates an additional reference to the pwq. It got queued twice but will only be removed once. This leak later can trigger pwq refcnt warning on workqueue destruction and prevent freeing of the workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # v3.19+	2019-10-04 10:23:11 -07:00
Tejun Heo	c29eb85386	workqueue: more destroy_workqueue() fixes destroy_workqueue() warnings still, at a lower frequency, trigger spuriously. The problem seems to be in-flight operations which haven't reached put_pwq() yet. * Make sanity check grab all the related locks so that it's synchronized against operations which puts pwq at the end. * Always print out the offending pwq. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>	2019-10-04 10:23:01 -07:00
Tejun Heo	30ae2fc0a7	workqueue: Minor follow-ups to the rescuer destruction change * Now that wq->rescuer may be cleared while rescuer is still there, switch show_pwq() debug printout to test worker->rescue_wq to identify rescuers intead of testing wq->rescuer. * Update comment on ->rescuer locking. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>	2019-09-20 14:09:14 -07:00
Tejun Heo	8efe1223d7	workqueue: Fix missing kfree(rescuer) in destroy_workqueue() Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Qian Cai <cai@lca.pw> Fixes: `def98c84b6` ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")	2019-09-20 13:39:57 -07:00
Tejun Heo	def98c84b6	workqueue: Fix spurious sanity check failures in destroy_workqueue() Before actually destrying a workqueue, destroy_workqueue() checks whether it's actually idle. If it isn't, it prints out a bunch of warning messages and leaves the workqueue dangling. It unfortunately has a couple issues. * Mayday list queueing increments pwq's refcnts which gets detected as busy and fails the sanity checks. However, because mayday list queueing is asynchronous, this condition can happen without any actual work items left in the workqueue. * Sanity check failure leaves the sysfs interface behind too which can lead to init failure of newer instances of the workqueue. This patch fixes the above two by * If a workqueue has a rescuer, disable and kill the rescuer before sanity checks. Disabling and killing is guaranteed to flush the existing mayday list. * Remove sysfs interface before sanity checks. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Marcin Pawlowski <mpawlowski@fb.com> Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: stable@vger.kernel.org	2019-09-18 18:45:23 -07:00
Daniel Jordan	509b320489	workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs Change the calling convention for apply_workqueue_attrs to require CPU hotplug read exclusion. Avoids lockdep complaints about nested calls to get_online_cpus in a future patch where padata calls apply_workqueue_attrs when changing other CPU-hotplug-sensitive data structures with the CPU read lock already held. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2019-09-13 21:15:40 +10:00
Daniel Jordan	513c98d086	workqueue: unconfine alloc/apply/free_workqueue_attrs() padata will use these these interfaces in a later patch, so unconfine them. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2019-09-13 21:15:39 +10:00
Thomas Gleixner	be69d00d97	workqueue: Remove GPF argument from alloc_workqueue_attrs() All callers use GFP_KERNEL. No point in having that argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-06-27 14:12:19 -07:00
Thomas Gleixner	2c9858ecbe	workqueue: Make alloc/apply/free_workqueue_attrs() static None of those functions have any users outside of workqueue.c. Confine them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-06-27 14:12:15 -07:00
Thomas Gleixner	457c899653	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:45 +02:00
Linus Torvalds	23c970608a	Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Only three commits, of which two are trivial. The non-trivial chagne is Thomas's patch to switch workqueue from sched RCU to regular one. The use of sched RCU is mostly historic and doesn't really buy us anything noticeable" * 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use normal rcu kernel/workqueue: Document wq_worker_last_func() argument kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue'	2019-05-09 13:48:52 -07:00
Linus Torvalds	0968621917	Printk changes for 5.2 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY= =WZfC -----END PGP SIGNATURE----- Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()	2019-05-07 09:18:12 -07:00
Thomas Gleixner	6d25be5782	sched/core, workqueues: Distangle worker accounting from rq lock The worker accounting for CPU bound workers is plugged into the core scheduler code and the wakeup code. This is not a hard requirement and can be avoided by keeping track of the state in the workqueue code itself. Keep track of the sleeping state in the worker itself and call the notifier before entering the core scheduler. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from scheduling and being woken immediately after switching away. When nr_running is updated when the task is retunrning from schedule() then it is later compared when it is done from ttwu(). [ bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de Oliveira ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-16 16:55:15 +02:00
Sakari Ailus	d75f773c86	treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' \| grep -v '^$tools\\|Documentation$/' \| \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-04-09 14:19:06 +02:00
Thomas Gleixner	24acfb7182	workqueue: Use normal rcu There is no need for sched_rcu. The undocumented reason why sched_rcu is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by the fact that sched_rcu reader side critical sections are also protected by preempt or irq disabled regions. Replace rcu_read_lock_sched with rcu_read_lock and acquire the RCU lock where it is not yet explicit acquired. Replace local_irq_disable() with rcu_read_lock(). Update asserts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: mangle changelog a little] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-04-08 12:37:43 -07:00
Bart Van Assche	82efcab3b9	workqueue: Only unregister a registered lockdep key The recent change to prevent use after free and a memory leak introduced an unconditional call to wq_unregister_lockdep() in the error handling path. If the lockdep key had not been registered yet, then the lockdep core emits a warning. Only call wq_unregister_lockdep() if wq_register_lockdep() has been called first. Fixes: `009bb421b6` ("workqueue, lockdep: Fix an alloc_workqueue() error path") Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Qian Cai <cai@lca.pw> Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org	2019-03-21 12:00:18 +01:00
Bart Van Assche	8194fe94ab	kernel/workqueue: Document wq_worker_last_func() argument This patch avoids that the following warning is reported when building with W=1: kernel/workqueue.c:938: warning: Function parameter or member 'task' not described in 'wq_worker_last_func' Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-03-19 10:48:20 -07:00
Mathieu Malaterre	a2775bbc1d	kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue' Silence warnings (triggered at W=1) by adding relevant __printf attributes. kernel/workqueue.c:4249:2: warning: function 'alloc_workqueue' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-03-15 08:47:22 -07:00
Linus Torvalds	9e55f87c0e	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A few fixes for lockdep: - initialize lockdep internal RCU head after initializing RCU - prevent use after free in a alloc_workqueue() error handling path - plug a memory leak in the workqueue core which fails to free a dynamically allocated lock name. - make Clang happy" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: workqueue, lockdep: Fix a memory leak in wq->lock_name workqueue, lockdep: Fix an alloc_workqueue() error path locking/lockdep: Only call init_rcu_head() after RCU has been initialized locking/lockdep: Avoid a Clang warning	2019-03-10 13:48:14 -07:00
Qian Cai	69a106c00e	workqueue, lockdep: Fix a memory leak in wq->lock_name The following commit: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name), but wq_init_lockdep() does not point wq->lock_name to the newly allocated slab object. This can be reproduced by running LTP fallocate04 followed by oom01 tests: unreferenced object 0xc0000005876384d8 (size 64): comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s) hex dump (first 32 bytes): 28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e 78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi backtrace: [<00000000cb452883>] kvasprintf+0x6c/0xe0 [<000000004654ddac>] kasprintf+0x34/0x60 [<000000001c68f311>] alloc_workqueue+0x1f8/0x6ac [<0000000003c2ad83>] ext4_fill_super+0x23d4/0x3c80 [ext4] [<0000000006610538>] mount_bdev+0x25c/0x290 [<00000000bcf955ec>] ext4_mount+0x28/0x50 [ext4] [<0000000016e08fd3>] legacy_get_tree+0x4c/0xb0 [<0000000042b6a5fc>] vfs_get_tree+0x6c/0x190 [<00000000268ab022>] do_mount+0xb9c/0x1100 [<00000000698e6898>] ksys_mount+0x158/0x180 [<0000000064e391fd>] sys_mount+0x20/0x30 [<00000000ba378f12>] system_call+0x5c/0x70 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: catalin.marinas@arm.com Cc: jiangshanlai@gmail.com Cc: tj@kernel.org Fixes: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-03-09 14:15:52 +01:00
Bart Van Assche	009bb421b6	workqueue, lockdep: Fix an alloc_workqueue() error path This patch fixes a use-after-free and a memory leak in an alloc_workqueue() error path. Repoted by syzkaller and KASAN: BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline] BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858 CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 __read_once_size include/linux/compiler.h:197 [inline] lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 wq_init_lockdep kernel/workqueue.c:3444 [inline] alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511 __do_kmalloc mm/slab.c:3726 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3735 kmalloc include/linux/slab.h:553 [inline] kzalloc include/linux/slab.h:743 [inline] alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3821 alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888090fc2580 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 280 bytes inside of 512-byte region [ffff888090fc2580, ffff888090fc2780) Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-03-09 14:15:52 +01:00
Linus Torvalds	b5dd0c658c	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - some of the rest of MM - various misc things - dynamic-debug updates - checkpatch - some epoll speedups - autofs - rapidio - lib/, lib/lzo/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) samples/mic/mpssd/mpssd.h: remove duplicate header kernel/fork.c: remove duplicated include include/linux/relay.h: fix percpu annotation in struct rchan arch/nios2/mm/fault.c: remove duplicate include unicore32: stop printing the virtual memory layout MAINTAINERS: fix GTA02 entry and mark as orphan mm: create the new vm_fault_t type arm, s390, unicore32: remove oneliner wrappers for memblock_alloc() arch: simplify several early memory allocations openrisc: simplify pte_alloc_one_kernel() sh: prefer memblock APIs returning virtual address microblaze: prefer memblock API returning virtual address powerpc: prefer memblock APIs returning virtual address lib/lzo: separate lzo-rle from lzo lib/lzo: implement run-length encoding lib/lzo: fast 8-byte copy on arm64 lib/lzo: 64-bit CTZ on arm64 lib/lzo: tidy-up ifdefs ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size ipc: annotate implicit fall through ...	2019-03-07 19:25:37 -08:00
Johannes Weiner	4b04700275	kernel: workqueue: clarify wq_worker_last_func() caller requirements This function can only be called safely from very specific scheduler contexts. Document those. Link: http://lkml.kernel.org/r/20190206150528.31198-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-07 18:32:01 -08:00
Linus Torvalds	abf7c3d8dd	Merge branch 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "All trivial. Two comment updates and one more initialization sanity check in flush_work()" * 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix spelling in source code comments workqueue: fix typo in comment workqueue: Try to catch flush_work() without INIT_WORK().	2019-03-07 10:09:52 -08:00
Linus Torvalds	e431f2d74e	Driver core patches for 5.1-rc1 Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+euQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynyTgCfbV8CLums843sBnT8NnWrTMTdTCcAn1K4re0m ep8g+6oRLxJy414hogxQ =bLs2 -----END PGP SIGNATURE----- Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...	2019-03-06 14:52:48 -08:00
Bart Van Assche	bf393fd4a3	workqueue: Fix spelling in source code comments Change "execuing" into "executing" and "guarnateed" into "guaranteed". Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-03-05 07:52:39 -08:00
Bart Van Assche	669de8bda8	kernel/workqueue: Use dynamic lockdep keys for workqueues The following commit: `87915adc3f` ("workqueue: re-add lockdep dependencies for flushing") improved deadlock checking in the workqueue implementation. Unfortunately that patch also introduced a few false positive lockdep complaints. This patch suppresses these false positives by allocating the workqueue mutex lockdep key dynamically. An example of a false positive lockdep complaint suppressed by this patch can be found below. The root cause of the lockdep complaint shown below is that the direct I/O code can call alloc_workqueue() from inside a work item created by another alloc_workqueue() call and that both workqueues share the same lockdep key. This patch avoids that that lockdep complaint is triggered by allocating the work queue lockdep keys dynamically. In other words, this patch guarantees that a unique lockdep key is associated with each work queue mutex. ====================================================== WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted fio/4129 is trying to acquire lock: 00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970 but task is already holding lock: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#14){+.+.}: down_write+0x3d/0x80 __generic_file_fsync+0x77/0xf0 ext4_sync_file+0x3c9/0x780 vfs_fsync_range+0x66/0x100 dio_complete+0x2f5/0x360 dio_aio_complete_work+0x1c/0x20 process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(&dio->complete_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}: lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock((work_completion)(&dio->complete_work)); lock(&sb->s_type->i_mutex_key#14); lock((wq_completion)"dio/%s"sb->s_id); * DEADLOCK * 1 lock held by fio/4129: #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 stack backtrace: CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org [ Reworked the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-02-28 07:55:47 +01:00
Liu Song	8bdc620178	workqueue: fix typo in comment qeueue/queue Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-02-21 08:03:38 -08:00
Greg Kroah-Hartman	9481caf39b	Merge 5.0-rc6 into driver-core-next We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-11 09:09:02 +01:00
Johannes Weiner	1b69ac6b40	psi: fix aggregation idle shut-off psi has provisions to shut off the periodic aggregation worker when there is a period of no task activity - and thus no data that needs aggregating. However, while developing psi monitoring, Suren noticed that the aggregation clock currently won't stay shut off for good. Debugging this revealed a flaw in the idle design: an aggregation run will see no task activity and decide to go to sleep; shortly thereafter, the kworker thread that executed the aggregation will go idle and cause a scheduling change, during which the psi callback will kick the !pending worker again. This will ping-pong forever, and is equivalent to having no shut-off logic at all (but with more code!) Fix this by exempting aggregation workers from psi's clock waking logic when the state change is them going to sleep. To do this, tag workers with the last work function they executed, and if in psi we see a worker going to sleep after aggregating psi data, we will not reschedule the aggregation work item. What if the worker is also executing other items before or after? Any psi state times that were incurred by work items preceding the aggregation work will have been collected from the per-cpu buckets during the aggregation itself. If there are work items following the aggregation work, the worker's last_func tag will be overwritten and the aggregator will be kept alive to process this genuine new activity. If the aggregation work is the last thing the worker does, and we decide to go idle, the brief period of non-idle time incurred between the aggregation run and the kworker's dequeue will be stranded in the per-cpu buckets until the clock is woken by later activity. But that should not be a problem. The buckets can hold 4s worth of time, and future activity will wake the clock with a 2s delay, giving us 2s worth of data we can leave behind when disabling aggregation. If it takes a worker more than two seconds to go idle after it finishes its last work item, we likely have bigger problems in the system, and won't notice one sample that was averaged with a bogus per-CPU weight. Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org Fixes: `eb414681d5` ("psi: pressure stall information for CPU, memory, and IO") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-02-01 15:46:23 -08:00
Alexander Duyck	8204e0c111	workqueue: Provide queue_work_node to queue work near a given NUMA node Provide a new function, queue_work_node, which is meant to schedule work on a "random" CPU of the requested NUMA node. The main motivation for this is to help assist asynchronous init to better improve boot times for devices that are local to a specific node. For now we just default to the first CPU that is in the intersection of the cpumask of the node and the online cpumask. The only exception is if the CPU is local to the node we will just use the current CPU. This should work for our purposes as we are currently only using this for unbound work so the CPU will be translated to a node anyway instead of being directly used. As we are only using the first CPU to represent the NUMA node for now I am limiting the scope of the function so that it can only be used with unbound workqueues. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 14:20:54 +01:00
Tetsuo Handa	4d43d395fe	workqueue: Try to catch flush_work() without INIT_WORK(). syzbot found a flush_work() caller who forgot to call INIT_WORK() because that work_struct was allocated by kzalloc() [1]. But the message INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. by lock_map_acquire() is failing to tell that INIT_WORK() is missing. Since flush_work() without INIT_WORK() is a bug, and INIT_WORK() should set ->func field to non-zero, let's warn if ->func field is zero. [1] https://syzkaller.appspot.com/bug?id=a5954455fcfa51c29ca2ab55b203076337e1c770 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-01-25 07:28:29 -08:00
Paul E. McKenney	25b0077511	workqueue: Replace call_rcu_sched() with call_rcu() Now that call_rcu()'s callback is not invoked until after all preempt-disable regions of code have completed (in addition to explicitly marked RCU read-side critical sections), call_rcu() can be used in place of call_rcu_sched(). This commit therefore makes that change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Tejun Heo <tj@kernel.org>	2018-11-27 09:21:44 -08:00
Vincent Whitchurch	cb9d7fd51d	watchdog: Mark watchdog touch functions as notrace Some architectures need to use stop_machine() to patch functions for ftrace, and the assumption is that the stopped CPUs do not make function calls to traceable functions when they are in the stopped state. Commit `ce4f06dcbb` ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") added calls to the watchdog touch functions from the stopped CPUs and those functions lack notrace annotations. This leads to crashes when enabling/disabling ftrace on ARM kernels built with the Thumb-2 instruction set. Fix it by adding the necessary notrace annotations. Fixes: `ce4f06dcbb` ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: oleg@redhat.com Cc: tj@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com	2018-08-30 12:56:40 +02:00
Linus Torvalds	9022ada8ab	Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Over the lockdep cross-release churn, workqueue lost some of the existing annotations. Johannes Berg restored it and also improved them" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: re-add lockdep dependencies for flushing workqueue: skip lockdep wq dependency in cancel_work_sync()	2018-08-24 13:16:36 -07:00
Johannes Berg	87915adc3f	workqueue: re-add lockdep dependencies for flushing In flush_work(), we need to create a lockdep dependency so that the following scenario is appropriately tagged as a problem: work_function() { mutex_lock(&mutex); ... } other_function() { mutex_lock(&mutex); flush_work(&work); // or cancel_work_sync(&work); } This is a problem since the work might be running and be blocked on trying to acquire the mutex. Similarly, in flush_workqueue(). These were removed after cross-release partially caught these problems, but now cross-release was reverted anyway. IMHO the removal was erroneous anyway though, since lockdep should be able to catch potential problems, not just actual ones, and cross-release would only have caught the problem when actually invoking wait_for_completion(). Fixes: `fd1a5b04df` ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-08-22 08:31:38 -07:00
Johannes Berg	d6e89786be	workqueue: skip lockdep wq dependency in cancel_work_sync() In cancel_work_sync(), we can only have one of two cases, even with an ordered workqueue: * the work isn't running, just cancelled before it started * the work is running, but then nothing else can be on the workqueue before it Thus, we need to skip the lockdep workqueue dependency handling, otherwise we get false positive reports from lockdep saying that we have a potential deadlock when the workqueue also has other work items with locking, e.g. work1_function() { mutex_lock(&mutex); ... } work2_function() { /* nothing / } other_function() { queue_work(ordered_wq, &work1); queue_work(ordered_wq, &work2); mutex_lock(&mutex); cancel_work_sync(&work2); } As described above, this isn't a problem, but lockdep will currently flag it as if cancel_work_sync() was flush_work(), which is* a problem. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-08-22 08:31:37 -07:00
Kees Cook	6396bb2215	treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) \| kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) \| kzalloc(sizeof(TYPE) * C2, ...) \| kzalloc(C1 * C2 * C3, ...) \| kzalloc(C1 * C2, ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Linus Torvalds	5f85942c2e	SCSI misc on 20180610 This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc, xfcp, hisi_sas, cxlflash, qla2xxx. In the absence of Nic, we're also taking target updates which are mostly minor except for the tcmu refactor. The only real core change to worry about is the removal of high page bouncing (in sas, storvsc and iscsi). This has been well tested and no problems have shown up so far. Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWx1pbCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUucAP42pccS ziKyiOizuxv9fZ4Q+nXd1A9zhI5tqqpkHjcQegEA40qiZSi3EKGKR8W0UpX7Ntmo tqrZJGojx9lnrAM2RbQ= =NMXg -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc, xfcp, hisi_sas, cxlflash, qla2xxx. In the absence of Nic, we're also taking target updates which are mostly minor except for the tcmu refactor. The only real core change to worry about is the removal of high page bouncing (in sas, storvsc and iscsi). This has been well tested and no problems have shown up so far" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits) scsi: lpfc: update driver version to 12.0.0.4 scsi: lpfc: Fix port initialization failure. scsi: lpfc: Fix 16gb hbas failing cq create. scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc scsi: lpfc: correct oversubscription of nvme io requests for an adapter scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx) scsi: hisi_sas: Mark PHY as in reset for nexus reset scsi: hisi_sas: Fix return value when get_free_slot() failed scsi: hisi_sas: Terminate STP reject quickly for v2 hw scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot scsi: hisi_sas: Try wait commands before before controller reset scsi: hisi_sas: Init disks after controller reset scsi: hisi_sas: Create a scsi_host_template per HW module scsi: hisi_sas: Reset disks when discovered scsi: hisi_sas: Add LED feature for v3 hw scsi: hisi_sas: Change common allocation mode of device id scsi: hisi_sas: change slot index allocation mode scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate() scsi: hisi_sas: fix a typo in hisi_sas_task_prep() ...	2018-06-10 13:01:12 -07:00
Linus Torvalds	2857676045	- Introduce arithmetic overflow test helper functions (Rasmus) - Use overflow helpers in 2-factor allocators (Kees, Rasmus) - Introduce overflow test module (Rasmus, Kees) - Introduce saturating size helper functions (Matthew, Kees) - Treewide use of struct_size() for allocators (Kees) -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g 8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J 2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4 bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61 jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa 3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04 KCDVLYPxwQQqK1Mqig== =/3L8 -----END PGP SIGNATURE----- Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "This adds the new overflow checking helpers and adds them to the 2-factor argument allocators. And this adds the saturating size helpers and does a treewide replacement for the struct_size() usage. Additionally this adds the overflow testing modules to make sure everything works. I'm still working on the treewide replacements for allocators with "simple" multiplied arguments: alloc(a b, ...) -> alloc_array(a, b, ...) and zalloc(a * b, ...) -> calloc(a, b, ...) as well as the more complex cases, but that's separable from this portion of the series. I expect to have the rest sent before -rc1 closes; there are a lot of messy cases to clean up. Summary: - Introduce arithmetic overflow test helper functions (Rasmus) - Use overflow helpers in 2-factor allocators (Kees, Rasmus) - Introduce overflow test module (Rasmus, Kees) - Introduce saturating size helper functions (Matthew, Kees) - Treewide use of struct_size() for allocators (Kees)" tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Use struct_size() for devm_kmalloc() and friends treewide: Use struct_size() for vmalloc()-family treewide: Use struct_size() for kmalloc()-family device: Use overflow helpers for devm_kmalloc() mm: Use overflow helpers in kvmalloc() mm: Use overflow helpers in kmalloc_array() test_overflow: Add memory allocation overflow tests overflow.h: Add allocation size calculation helpers test_overflow: Report test failures test_overflow: macrofy some more, do more tests for free lib: add runtime test of check__overflow functions compiler.h: enable builtin overflow checkers and add fallback code	2018-06-06 17:27:14 -07:00
Kees Cook	acafe7e302	treewide: Use struct_size() for kmalloc()-family One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void ) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof pkey_cache + tprops->pkey_tbl_len // sizeof pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc\|kzalloc\|kvmalloc\|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(VAR) + COUNT * sizeof(VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc\|kzalloc\|kvmalloc\|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(VAR) + COUNT sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc\|kzalloc\|kvmalloc\|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-06 11:15:43 -07:00
Mathieu Malaterre	66448bc274	workqueue: move function definitions within CONFIG_SMP block In commit `7ee681b252` ("workqueue: Convert to state machine callbacks"), three new function definitions were added: ‘workqueue_prepare_cpu’, ‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’. Move these function definitions within a CONFIG_SMP block since they are not used outside of it. This will match function declarations in header <include/linux/workqueue.h>, and silence the following gcc warning (W=1): kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes] kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes] kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-05-23 11:16:58 -07:00
Tejun Heo	197f6accac	workqueue: Make sure struct worker is accessible for wq_worker_comm() The worker struct could already be freed when wq_worker_comm() tries to access it for reporting. This patch protects PF_WQ_WORKER modifications with wq_pool_attach_mutex and makes wq_worker_comm() test the flag before dereferencing worker from kthread_data(), which ensures that it only dereferences when the worker struct is valid. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Lai Jiangshan <jiangshanlai@gmail.com> Fixes: `6b59808bfe` ("workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}")	2018-05-21 08:04:35 -07:00
Tejun Heo	6b59808bfe	workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status} There can be a lot of workqueue workers and they all show up with the cryptic kworker/* names making it difficult to understand which is doing what and how they came to be. # ps -ef \| grep kworker root 4 2 0 Feb25 ? 00:00:00 [kworker/0:0H] root 6 2 0 Feb25 ? 00:00:00 [kworker/u112:0] root 19 2 0 Feb25 ? 00:00:00 [kworker/1:0H] root 25 2 0 Feb25 ? 00:00:00 [kworker/2:0H] root 31 2 0 Feb25 ? 00:00:00 [kworker/3:0H] ... This patch makes workqueue workers report the latest workqueue it was executing for through /proc/PID/{comm,stat,status}. The extra information is appended to the kthread name with intervening '+' if currently executing, otherwise '-'. # cat /proc/25/comm kworker/2:0-events_power_efficient # cat /proc/25/stat 25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0... # grep Name /proc/25/status Name: kworker/2:0-events_power_efficient Unfortunately, ps(1) truncates comm to 15 characters, # ps 25 PID TTY STAT TIME COMMAND 25 ? I 0:00 [kworker/2:0-eve] making it a lot less useful; however, this should be an easy fix from ps(1) side. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Craig Small <csmall@enc.com.au>	2018-05-18 08:47:13 -07:00
Tejun Heo	8bf895931e	workqueue: Set worker->desc to workqueue name by default Work functions can use set_worker_desc() to improve the visibility of what the worker task is doing. Currently, the desc field is unset at the beginning of each execution and there is a separate field to track the field is set during the current execution. Instead of leaving empty till desc is set, worker->desc can be used to remember the last workqueue the worker worked on by default and users that use set_worker_desc() can override it to something more informative as necessary. This simplifies desc handling and helps tracking the last workqueue that the worker exected on to improve visibility. Signed-off-by: Tejun Heo <tj@kernel.org>	2018-05-18 08:47:13 -07:00
Tejun Heo	a2d812a27a	workqueue: Make worker_attach/detach_pool() update worker->pool For historical reasons, the worker attach/detach functions don't currently manage worker->pool and the callers are manually and inconsistently updating it. This patch moves worker->pool updates into the worker attach/detach functions. This makes worker->pool consistent and clearly defines how worker->pool updates are synchronized. This will help later workqueue visibility improvements by allowing safe access to workqueue information from worker->task. Signed-off-by: Tejun Heo <tj@kernel.org>	2018-05-18 08:47:13 -07:00
Tejun Heo	1258fae73c	workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex To improve workqueue visibility, we want to be able to access workqueue information from worker tasks. The per-pool attach mutex makes that difficult because there's no way of stabilizing task -> worker pool association without knowing the pool first. Worker attach/detach is a slow path and there's no need for different pools to be able to perform them concurrently. This patch replaces the per-pool attach_mutex with global wq_pool_attach_mutex to prepare for visibility improvement changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2018-05-18 08:47:13 -07:00
Steffen Maier	5c750d58e9	scsi: zfcp: workqueue: set description for port work items with their WWPN as context As a prerequisite, complement commit `3d1cb2059d` ("workqueue: include workqueue info when printing debug dump of a worker task") to be usable with kernel modules by exporting the symbol set_worker_desc(). Current built-in user was introduced with commit `ef3b101925` ("writeback: set worker desc to identify writeback workers in task dumps"). Can help distinguishing work items which do not have adapter scope. Description is printed out with task dump for debugging on WARN, BUG, panic, or magic-sysrq [show-task-states(t)]. Example: $ echo 0 >\| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed & $ echo 't' >\| /proc/sysrq-trigger $ dmesg sysrq: SysRq : Show State task PC stack pid father ... zfcp_q_0.0.1880 S14640 2165 2 0x02000000 Call Trace: ([<00000000009df464>] __schedule+0xbf4/0xc78) [<00000000009df57c>] schedule+0x94/0xc0 [<0000000000168654>] rescuer_thread+0x33c/0x3a0 [<000000000016f8be>] kthread+0x166/0x178 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc [<00000000009e71ec>] kernel_thread_starter+0x0/0xc no locks held by zfcp_q_0.0.1880/2165. ... kworker/u512:2 D11280 2193 2 0x02000000 Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327) ^^^^^^^^^^^^^^^^^^^^^ Call Trace: ([<00000000009df464>] __schedule+0xbf4/0xc78) [<00000000009df57c>] schedule+0x94/0xc0 [<00000000009e50c0>] schedule_timeout+0x488/0x4d0 [<00000000001e425c>] msleep+0x5c/0x78 >>test code only<< [<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp] [<0000000000167154>] process_one_work+0x3b4/0x718 [<000000000016771c>] worker_thread+0x264/0x408 [<000000000016f8be>] kthread+0x166/0x178 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc [<00000000009e71ec>] kernel_thread_starter+0x0/0xc 2 locks held by kworker/u512:2/2193: #0: (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718 #1: ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718 ... ============================================= Showing busy workqueues and worker pools: workqueue zfcp_q_0.0.1880: flags=0x2000a pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1 in-flight: 2193:zfcp_scsi_rport_work [zfcp] pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311 Work items with adapter scope are already identified by the workqueue name "zfcp_q_<devbusid>" and the work item function name. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-05-18 11:27:21 -04:00
Linus Torvalds	d92cd810e6	Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "rcu_work addition and a couple trivial changes" * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove the comment about the old manager_arb mutex workqueue: fix the comments of nr_idle fs/aio: Use rcu_work instead of explicit rcu and work item cgroup: Use rcu_work instead of explicit rcu and work item RCU, workqueue: Implement rcu_work	2018-04-03 18:00:13 -07:00
Lai Jiangshan	f75da8a8a9	workqueue: remove the comment about the old manager_arb mutex The manager_arb mutex doesn't exist any more. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-03-20 13:01:45 -07:00
Lai Jiangshan	5826cc8f5a	workqueue: fix the comments of nr_idle Since the worker rebinding behavior was refactored, there is no idle worker off the idle_list now. The comment is outdated and can be just removed. It also groups nr_workers and nr_idle together. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-03-20 13:01:36 -07:00
Ingo Molnar	10c18c44a6	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 08:08:02 +01:00
Tejun Heo	05f0fe6b74	RCU, workqueue: Implement rcu_work There are cases where RCU callback needs to be bounced to a sleepable context. This is currently done by the RCU callback queueing a work item, which can be cumbersome to write and confusing to read. This patch introduces rcu_work, a workqueue work variant which gets executed after a RCU grace period, and converts the open coded bouncing in fs/aio and kernel/cgroup. v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior after queue_rcu_work(). v2: Use rcu_barrier() instead of synchronize_rcu() to wait for completion of previously queued rcu callback as per Paul. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2018-03-19 10:12:03 -07:00
Stephen Hemminger	6417250d3f	workqueue: remove unused cancel_work() Found this by accident. There are no usages of bare cancel_work() in current kernel source. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-03-13 13:37:42 -07:00
Arvind Yadav	537f4146c5	workqueue: use put_device() instead of kfree() Never directly free @dev after calling device_register(), even if it returned an error! Always use put_device() to give up the reference initialized in this function instead. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-03-13 13:26:03 -07:00
Ingo Molnar	fc4c5a3828	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-09 07:32:20 +01:00
Dave Airlie	dfe8db2237	Fixes for 4.16. I contains fixes for deadlock on runtime suspend on few drivers, a memory leak on non-blocking commits, a crash on color-eviction. The is also meson and edid fixes, plus a fix for a doc warning. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJajY3SAAoJEEN0HIUfOBk0/5IP/jTa0VKe7UurEzj9Vzgt4USu tVre4MGN42peY2PbVSsBmvHAOeyII7la1/NkiFi8wZKQ2MXw43NenKOcRLDW0r9b 6U8Tlq3sU//NdUDAiLLx9hKb+i31ag+wodvULt0PKtEWDsxWDSRZUo792as2YUkC VxHuIQywNABohn2Ya8Og1dON25GD7zRzNzH7O+g+fds/Qvav0504u2v10jBKJC0D IB2oc3ZtJR8n0dFpzhnEB7YkxyvkrsWZQ1LtutGFgrr54F0KVHvAm/VMZ5qzyCRi kvJN81OFo0xpdE7ZMSQ5YAvcPsEC5ifSNaaxpawsM904H7fS6FNhHMg7cGGi1f7R B8YbLrdy+mBnQPNNbPcDPQA+YN/tRv4rRmmdLdkDbdY1GM/JJ4C7PTuLL6mX1iWU DuHiaFS0KZGoS0XCVbvhLkPt5fsmvp+QxBpeNAtxgOdn2pRquDmGZ1jTVEG2mw5U rqoPURa3urqdSwj8ba0jbJo6WBAmb1uWeyJ7xpyUVhR9SR30+URYVWwJEPDOgTnQ PaEzjobntgDLaq5NbhpEvmYmylv1SPkucGtkCtwPxIrrh5Z84pZTJ1th2ogfn3Ti VL25dTlzFpsjEMgC72wCi0eiP7qLVTX9vHYZBzkeIjIWDH0rCnCFxvjwmD/aVUbz Ex1/fGNEVkFupcYLu7m4 =555h -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2018-02-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Fixes for 4.16. I contains fixes for deadlock on runtime suspend on few drivers, a memory leak on non-blocking commits, a crash on color-eviction. The is also meson and edid fixes, plus a fix for a doc warning. * tag 'drm-misc-fixes-2018-02-21' of git://anongit.freedesktop.org/drm/drm-misc: drm/tve200: fix kernel-doc documentation comment include drm/meson: fix vsync buffer update drm: Handle unexpected holes in color-eviction drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA drm/amdgpu: Fix deadlock on runtime suspend drm/radeon: Fix deadlock on runtime suspend drm/nouveau: Fix deadlock on runtime suspend drm: Allow determining if current task is output poll worker workqueue: Allow retrieval of current task's work struct drm/atomic: Fix memleak on ERESTARTSYS during non-blocking commits	2018-02-22 08:39:26 +10:00
Frederic Weisbecker	1bda3f8087	sched/isolation: Isolate workqueues when "nohz_full=" is set As we prepare for offloading the residual 1hz scheduler ticks to workqueue, let's affine those to housekeepers so that they don't interrupt the CPUs that don't want to be disturbed. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1519186649-3242-5-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-02-21 09:49:08 +01:00
Lukas Wunner	27d4ee0307	workqueue: Allow retrieval of current task's work struct Introduce a helper to retrieve the current task's work struct if it is a workqueue worker. This allows us to fix a long-standing deadlock in several DRM drivers wherein the ->runtime_suspend callback waits for a specific worker to finish and that worker in turn calls a function which waits for runtime suspend to finish. That function is invoked from multiple call sites and waiting for runtime suspend to finish is the correct thing to do except if it's executing in the context of the worker. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de	2018-02-16 22:24:25 +01:00
Linus Torvalds	5d8515bc23	Staging/IIO patches for 4.16-rc1 Here is the big Staging and IIO driver patches for 4.16-rc1. There is the normal amount of new IIO drivers added, like all releases. The networking IPX and the ncpfs filesystem are moved into the staging tree, as they are on their way out of the kernel due to lack of use anymore. The visorbus subsystem finall has started moving out of the staging tree to the "real" part of the kernel, and the most and fsl-mc codebases are almost ready to move out, that will probably happen for 4.17-rc1 if all goes well. Other than that, there is a bunch of license header cleanups in the tree, along with the normal amount of coding style churn that we all know and love for this codebase. I also got frustrated at the Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting huge chunks of it that were never even being used. Full details of everything is in the shortlog. All of these patches have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLxoA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yk4vgCgjeMlwhtar65DIticIRj626EFxiQAnjGmH8Kd d9Xz2Piq8X47uSsC/6AE =xxMT -----END PGP SIGNATURE----- Merge tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big Staging and IIO driver patches for 4.16-rc1. There is the normal amount of new IIO drivers added, like all releases. The networking IPX and the ncpfs filesystem are moved into the staging tree, as they are on their way out of the kernel due to lack of use anymore. The visorbus subsystem finall has started moving out of the staging tree to the "real" part of the kernel, and the most and fsl-mc codebases are almost ready to move out, that will probably happen for 4.17-rc1 if all goes well. Other than that, there is a bunch of license header cleanups in the tree, along with the normal amount of coding style churn that we all know and love for this codebase. I also got frustrated at the Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting huge chunks of it that were never even being used. Full details of everything is in the shortlog. All of these patches have been in linux-next for a while with no reported issues" * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits) staging: rtlwifi: remove redundant initialization of 'cfg_cmd' staging: rtl8723bs: remove a couple of redundant initializations staging: comedi: reformat lines to 80 chars or less staging: lustre: separate a connection destroy from free struct kib_conn Staging: rtl8723bs: Use !x instead of NULL comparison Staging: rtl8723bs: Remove dead code Staging: rtl8723bs: Change names to conform to the kernel code staging: ccree: Fix missing blank line after declaration staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd' staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST staging: fbtft: remove unused FB_TFT_SSD1325 kconfig staging: comedi: dt2811: remove redundant initialization of 'ns' staging: wilc1000: fix alignments to match open parenthesis staging: wilc1000: removed unnecessary defined enums typedef staging: wilc1000: remove unnecessary use of parentheses staging: rtl8192u: remove redundant initialization of 'timeout' staging: sm750fb: fix CamelCase for dispSet var staging: lustre: lnet/selftest: fix compile error on UP build staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons staging: rts5208: Fix "seg_no" calculation in reset_ms_card() ...	2018-02-01 09:51:57 -08:00
Linus Torvalds	f8cc87b6c1	Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Workqueue has an early init trick where workqueues can be created and work items queued on them before the workqueue subsystem is online. This helps simplifying early init and operation of low level subsystems which use workqueues for managerial things which aren't depended upon early during boot. Out of laziness, the early init didn't cover workqueues with WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too. This was originally brought up for RCU but RCU didn't actually need this. I still think it's a good idea to cover it" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: allow WQ_MEM_RECLAIM on early init workqueues workqueue: separate out init_rescuer()	2018-01-30 14:45:39 -08:00
Linus Torvalds	d772794637	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle were: - Updates to use cond_resched() instead of cond_resched_rcu_qs() where feasible (currently everywhere except in kernel/rcu and in kernel/torture.c). Also a couple of fixes to avoid sending IPIs to offline CPUs. - Updates to simplify RCU's dyntick-idle handling. - Updates to remove almost all uses of smp_read_barrier_depends() and read_barrier_depends(). - Torture-test updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) torture: Save a line in stutter_wait(): while -> for torture: Eliminate torture_runnable and perf_runnable torture: Make stutter less vulnerable to compilers and races locking/locktorture: Fix num reader/writer corner cases locking/locktorture: Fix rwsem reader_delay torture: Place all torture-test modules in one MAINTAINERS group rcutorture/kvm-build.sh: Skip build directory check rcutorture: Simplify functions.sh include path rcutorture: Simplify logging rcutorture/kvm-recheck-*: Improve result directory readability check rcutorture/kvm.sh: Support execution from any directory rcutorture/kvm.sh: Use consistent help text for --qemu-args rcutorture/kvm.sh: Remove unused variable, `alldone` rcutorture: Remove unused script, config2frag.sh rcutorture/configinit: Fix build directory error message rcutorture: Preempt RCU-preempt readers more vigorously torture: Reduce #ifdefs for preempt_schedule() rcu: Remove have_rcu_nocb_mask from tree_plugin.h rcu: Add comment giving debug strategy for double call_rcu() tracing, rcu: Hide trace event rcu_nocb_wake when not used ...	2018-01-30 10:15:30 -08:00
NeilBrown	6106c0f824	staging: lustre: lnet: convert selftest to use workqueues Instead of the cfs workitem library, use workqueues. As lnet wants to provide a cpu mask of allowed cpus, it needs to be a WQ_UNBOUND work queue so that tasks can run on cpus other than where they were submitted. This patch also exported apply_workqueue_attrs() which is a documented part of the workqueue API, that isn't currently exported. lustre needs it to allow workqueue thread to be limited to a subset of CPUs. Acked-by: Tejun Heo <tj@kernel.org> (for export of apply_workqueue_attrs) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-15 15:44:08 +01:00
Sergey Senozhatsky	62635ea8c1	workqueue: avoid hard lockups in show_workqueue_state() show_workqueue_state() can print out a lot of messages while being in atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console device is slow it may end up triggering NMI hard lockup watchdog. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v4.5+	2018-01-12 11:39:49 -08:00
Tejun Heo	40c17f75df	workqueue: allow WQ_MEM_RECLAIM on early init workqueues Workqueues can be created early during boot before workqueue subsystem in fully online - work items are queued waiting for later full initialization. However, early init wasn't supported for WQ_MEM_RECLAIM workqueues causing unnecessary annoyances for a subset of users. Expand early init support to include WQ_MEM_RECLAIM workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>	2018-01-08 05:38:37 -08:00
Tejun Heo	983c751532	workqueue: separate out init_rescuer() Separate out init_rescuer() from __alloc_workqueue_key() to prepare for early init support for WQ_MEM_RECLAIM. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>	2018-01-08 05:38:32 -08:00
Ingo Molnar	475c5ee193	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Updates to use cond_resched() instead of cond_resched_rcu_qs() where feasible (currently everywhere except in kernel/rcu and in kernel/torture.c). Also a couple of fixes to avoid sending IPIs to offline CPUs. - Updates to simplify RCU's dyntick-idle handling. - Updates to remove almost all uses of smp_read_barrier_depends() and read_barrier_depends(). - Miscellaneous fixes. - Torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-01-03 14:14:18 +01:00
Sergey Senozhatsky	01dfee9582	workqueue: remove unneeded kallsyms include The filw was converted from print_symbol() to %pf some time ago (`044c782ce3` "workqueue: fix checkpatch issues"). kallsyms does not seem to be needed anymore. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-12-11 07:15:43 -08:00
Lai Jiangshan	62408c1ef0	workqueue/hotplug: remove the workaround in rebind_workers() Since the cpu/hotplug refactoring, DOWN_FAILED is never called without preceding DOWN_PREPARE making the workaround unnecessary. Remove it. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-12-04 14:46:09 -08:00
Lai Jiangshan	e8b3f8db7a	workqueue/hotplug: simplify workqueue_offline_cpu() Since the recent cpu/hotplug refactoring, workqueue_offline_cpu() is guaranteed to run on the local cpu which is going offline. This also fixes the following deadlock by removing work item scheduling and flushing from CPU hotplug path. http://lkml.kernel.org/r/1504764252-29091-1-git-send-email-prsood@codeaurora.org tj: Description update. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-12-04 14:44:11 -08:00
Paul E. McKenney	a7e6425ea5	workqueue: Eliminate cond_resched_rcu_qs() in favor of cond_resched() Now that cond_resched() also provides RCU quiescent states when needed, it can be used in place of cond_resched_rcu_qs(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>	2017-12-04 10:28:10 -08:00
Tal Shorer	c98a980509	workqueue: respect isolated cpus when queueing an unbound work Initialize wq_unbound_cpumask to exclude cpus that were isolated by the cmdline's isolcpus parameter. Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-11-27 08:57:00 -08:00
Kees Cook	841b86f328	treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's\|$TIMER_FUNC_TYPE$\|\|g' \ $(git grep TIMER_FUNC_TYPE \| cut -d: -f1 \| sort -u) perl -pi -e 's\|$TIMER_DATA_TYPE$\|\|g' \ $(git grep TIMER_DATA_TYPE \| cut -d: -f1 \| sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>	2017-11-21 16:35:54 -08:00
Linus Torvalds	0be500363c	Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "There was a commit to make unbound kworkers respect cpu isolation but it conflicted with the restructuring of cpu isolation and got reverted, so the only thing left is the trivial comment fix. Will retry the cpu isolation change after this merge window" * 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix comment for unbound workqueue's attrbutes Revert "workqueue: respect isolated cpus when queueing an unbound work" workqueue: respect isolated cpus when queueing an unbound work	2017-11-15 14:15:21 -08:00
Linus Torvalds	2bcc673101	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Yet another big pile of changes: - More year 2038 work from Arnd slowly reaching the point where we need to think about the syscalls themself. - A new timer function which allows to conditionally (re)arm a timer only when it's either not running or the new expiry time is sooner than the armed expiry time. This allows to use a single timer for multiple timeout requirements w/o caring about the first expiry time at the call site. - A new NMI safe accessor to clock real time for the printk timestamp work. Can be used by tracing, perf as well if required. - A large number of timer setup conversions from Kees which got collected here because either maintainers requested so or they simply got ignored. As Kees pointed out already there are a few trivial merge conflicts and some redundant commits which was unavoidable due to the size of this conversion effort. - Avoid a redundant iteration in the timer wheel softirq processing. - Provide a mechanism to treat RTC implementations depending on their hardware properties, i.e. don't inflict the write at the 0.5 seconds boundary which originates from the PC CMOS RTC to all RTCs. No functional change as drivers need to be updated separately. - The usual small updates to core code clocksource drivers. Nothing really exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) timers: Add a function to start/reduce a timer pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() timer: Prepare to change all DEFINE_TIMER() callbacks netfilter: ipvs: Convert timers to use timer_setup() scsi: qla2xxx: Convert timers to use timer_setup() block/aoe: discover_timer: Convert timers to use timer_setup() ide: Convert timers to use timer_setup() drbd: Convert timers to use timer_setup() mailbox: Convert timers to use timer_setup() crypto: Convert timers to use timer_setup() drivers/pcmcia: omap1: Fix error in automated timer conversion ARM: footbridge: Fix typo in timer conversion drivers/sgi-xp: Convert timers to use timer_setup() drivers/pcmcia: Convert timers to use timer_setup() drivers/memstick: Convert timers to use timer_setup() drivers/macintosh: Convert timers to use timer_setup() hwrng/xgene-rng: Convert timers to use timer_setup() auxdisplay: Convert timers to use timer_setup() sparc/led: Convert timers to use timer_setup() mips: ip22/32: Convert timers to use timer_setup() ...	2017-11-13 17:56:58 -08:00
Frederic Weisbecker	8e8eb73075	workqueue: Use lockdep to assert IRQs are disabled/enabled Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1509980490-4285-4-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-11-08 11:13:48 +01:00
Ingo Molnar	8c5db92a70	Merge branch 'linus' into locking/core, to resolve conflicts Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-11-07 10:32:44 +01:00
Wang Long	9a19b46386	workqueue: Fix comment for unbound workqueue's attrbutes Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-11-06 07:04:47 -08:00
Tejun Heo	edbfd9112f	Revert "workqueue: respect isolated cpus when queueing an unbound work" This reverts commit `b5149873a0`. It conflicts with the following isolcpus change from the sched branch. `edb9382175` ("sched/isolation: Move isolcpus= handling to the housekeeping code") Let's revert for now. Signed-off-by: Tejun Heo <tj@kernel.org>	2017-11-03 07:02:15 -07:00
Byungchul Park	fd1a5b04df	workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes The workqueue code added manual lock acquisition annotations to catch deadlocks. After lockdepcrossrelease was introduced, some of those became redundant, since wait_for_completion() already does the acquisition and tracking. Remove the duplicate annotations. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-25 12:19:03 +02:00
Mark Rutland	c95491ed6d	locking/atomics, workqueue: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't currently harmful. However, for some features it is necessary to instrument reads and writes separately, which is not possible with ACCESS_ONCE(). This distinction is critical to correct operation. It's possible to transform the bulk of kernel code using the Coccinelle script below. However, this doesn't handle comments, leaving references to ACCESS_ONCE() instances which have been removed. As a preparatory step, this patch converts the workqueue code and comments to use {READ,WRITE}_ONCE() consistently. ---- virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-12-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-25 11:01:03 +02:00
Tal Shorer	b5149873a0	workqueue: respect isolated cpus when queueing an unbound work Initialize wq_unbound_cpumask to exclude cpus that were isolated by the cmdline's isolcpus parameter. Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-10-21 09:32:15 -07:00
Kees Cook	32a6c7233c	workqueue: Convert timers to use timer_setup() (part 2) In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. (The prior workqueue patch missed a few timers.) Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Link: https://lkml.kernel.org/r/20171016225825.GA99101@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-10-18 17:04:25 +02:00
Tejun Heo	692b48258d	workqueue: replace pool->manager_arb mutex with a flag Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by lockdep: [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted [ 1270.473240] ----------------------------------------------------- [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280 [ 1270.474994] [ 1270.474994] and this task is already holding: [ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0 [ 1270.476046] which would create a new lock dependency: [ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.} [ 1270.476949] [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock: [ 1270.477553] (&pool->lock/1){-.-.} ... [ 1270.488900] to a HARDIRQ-irq-unsafe lock: [ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.} ... [ 1270.494735] Possible interrupt unsafe locking scenario: [ 1270.494735] [ 1270.495250] CPU0 CPU1 [ 1270.495600] ---- ---- [ 1270.495947] lock(&(&lock->wait_lock)->rlock); [ 1270.496295] local_irq_disable(); [ 1270.496753] lock(&pool->lock/1); [ 1270.497205] lock(&(&lock->wait_lock)->rlock); [ 1270.497744] <Interrupt> [ 1270.497948] lock(&pool->lock/1); , which will cause a irq inversion deadlock if the above lock scenario happens. The root cause of this safe -> unsafe lock order is the mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock held. Unlocking mutex while holding an irq spinlock was never safe and this problem has been around forever but it never got noticed because the only time the mutex is usually trylocked while holding irqlock making actual failures very unlikely and lockdep annotation missed the condition until the recent `b9c16a0e1f` ("locking/mutex: Fix lockdep_assert_held() fail"). Using mutex for pool->manager_arb has always been a bit of stretch. It primarily is an mechanism to arbitrate managership between workers which can easily be done with a pool flag. The only reason it became a mutex is that pool destruction path wants to exclude parallel managing operations. This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE and make the destruction path wait for the current manager on a wait queue. v2: Drop unnecessary flag clearing before pool destruction as suggested by Boqun. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org	2017-10-10 07:13:57 -07:00
Kees Cook	8c20feb606	workqueue: Convert callback to use from_timer() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch workqueue to use from_timer() and pass the timer pointer explicitly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Len Brown <len.brown@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Michael Reed <mdr@sgi.com> Cc: netdev@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-14-git-send-email-keescook@chromium.org	2017-10-05 15:01:22 +02:00
Kees Cook	5cd79d6abd	timer: Remove users of TIMER_DEFERRED_INITIALIZER This removes uses of TIMER_DEFERRED_INITIALIZER and chooses a location to call timer_setup() from before add_timer() or mod_timer() is called. Adjusts callbacks to use from_timer() as needed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Len Brown <len.brown@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Michael Reed <mdr@sgi.com> Cc: netdev@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-7-git-send-email-keescook@chromium.org	2017-10-05 15:01:18 +02:00
Linus Torvalds	9954d4892a	Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Nothing major. I introduced a flag collsion bug during v4.13 cycle which is fixed in this pull request. Fortunately, the flag is for debugging / verification and the bug isn't critical" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix flag collision workqueue: Use TASK_IDLE workqueue: fix path to documentation workqueue: doc change for ST behavior on NUMA systems	2017-09-06 21:59:31 -07:00
Tejun Heo	058fc47ee2	Merge branch 'for-4.13-fixes' into for-4.14	2017-09-05 06:33:41 -07:00
Peter Zijlstra	f52be57080	locking/lockdep: Untangle xhlock history save/restore from task independence Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to ensure the temporal IRQ events don't interact with task state, the XHLOCK_PROC is a fundament different beast that just happens to share the interface. The purpose of XHLOCK_PROC is to annotate independent execution inside one task. For example workqueues, each work should appear to run in its own 'pristine' 'task'. Remove XHLOCK_PROC in favour of its own interface to avoid confusion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: kernel-team@lge.com Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 15:14:38 +02:00
Peter Zijlstra	e6f3faa734	locking/lockdep: Fix workqueue crossrelease annotation The new completion/crossrelease annotations interact unfavourable with the extant flush_work()/flush_workqueue() annotations. The problem is that when a single work class does: wait_for_completion(&C) and complete(&C) in different executions, we'll build dependencies like: lock_map_acquire(W) complete_acquire(C) and lock_map_acquire(W) complete_release(C) which results in the dependency chain: W->C->W, which lockdep thinks spells deadlock, even though there is no deadlock potential since works are ran concurrently. One possibility would be to change the work 'lock' to recursive-read, but that would mean hitting a lockdep limitation on recursive locks. Also, unconditinoally switching to recursive-read here would fail to detect the actual deadlock on single-threaded workqueues, which do have a problem with this. For now, forcefully disregard these locks for crossrelease. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: byungchul.park@lge.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: oleg@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-25 11:06:33 +02:00
Peter Zijlstra	a1d14934ea	workqueue/lockdep: 'Fix' flush_work() annotation The flush_work() annotation as introduced by commit: `e159489baa` ("workqueue: relax lockdep annotation on flush_work()") hits on the lockdep problem with recursive read locks. The situation as described is: Work W1: Work W2: Task: ARR(Q) ARR(Q) flush_workqueue(Q) A(W1) A(W2) A(Q) flush_work(W2) R(Q) A(W2) R(W2) if (special) A(Q) else ARR(Q) R(Q) where: A - acquire, ARR - acquire-read-recursive, R - release. Where under 'special' conditions we want to trigger a lock recursion deadlock, but otherwise allow the flush_work(). The allowing is done by using recursive read locks (ARR), but lockdep is broken for recursive stuff. However, there appears to be no need to acquire the lock if we're not 'special', so if we remove the 'else' clause things become much simpler and no longer need the recursion thing at all. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: byungchul.park@lge.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: oleg@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-25 11:06:32 +02:00
Peter Zijlstra	c5a94a618e	workqueue: Use TASK_IDLE Workqueues don't use signals, it (ab)uses TASK_INTERRUPTIBLE to avoid increasing the loadavg numbers. We've 'recently' introduced TASK_IDLE for this case: `80ed87c8a9` ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE") use it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-08-23 06:30:35 -07:00
Boqun Feng	52fa5bc5cb	locking/lockdep: Explicitly initialize wq_barrier::done::map With the new lockdep crossrelease feature, which checks completions usage, a false positive is reported in the workqueue code: > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released > Task C : acquired of lock#3 -> wait for completion of barr->done > (Task C is in lru_add_drain_all_cpuslocked()) > Worker D : wait for wfc.work to be released -> will complete barr->done Such a dead lock can not happen because Task C's barr->done and Worker D's barr->done can not be the same instance. The reason of this false positive is we initialize all wq_barrier::done at insert_wq_barrier() via init_completion(), which makes them belong to the same lock class, therefore, impossible circles are reported. To fix this, explicitly initialize the lockdep map for wq_barrier::done in insert_wq_barrier(), so that the lock class key of wq_barrier::done is a subkey of the corresponding work_struct, as a result we won't build a dependency between a wq_barrier with a unrelated work, and we can differ wq barriers based on the related works, so the false positive above is avoided. Also define the empty lockdep_init_map_crosslock() for !CROSSRELEASE to make the code simple and away from unnecessary #ifdefs. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-17 12:12:33 +02:00
Byungchul Park	b09be676e0	locking/lockdep: Implement the 'crossrelease' feature Lockdep is a runtime locking correctness validator that detects and reports a deadlock or its possibility by checking dependencies between locks. It's useful since it does not report just an actual deadlock but also the possibility of a deadlock that has not actually happened yet. That enables problems to be fixed before they affect real systems. However, this facility is only applicable to typical locks, such as spinlocks and mutexes, which are normally released within the context in which they were acquired. However, synchronization primitives like page locks or completions, which are allowed to be released in any context, also create dependencies and can cause a deadlock. So lockdep should track these locks to do a better job. The 'crossrelease' implementation makes these primitives also be tracked. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 12:29:07 +02:00
Benjamin Peterson	9a2614916a	workqueue: fix path to documentation Signed-off-by: Benjamin Peterson <bp@benjamin.pe> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-08-07 08:03:24 -07:00
Michael Bringmann	1ad0f0a7aa	workqueue: Work around edge cases for calc of pool's cpumask There is an underlying assumption/trade-off in many layers of the Linux system that CPU <-> node mapping is static. This is despite the presence of features like NUMA and 'hotplug' that support the dynamic addition/ removal of fundamental system resources like CPUs and memory. PowerPC systems, however, do provide extensive features for the dynamic change of resources available to a system. Currently, there is little or no synchronization protection around the updating of the CPU <-> node mapping, and the export/update of this information for other layers / modules. In systems which can change this mapping during 'hotplug', like PowerPC, the information is changing underneath all layers that might reference it. This patch attempts to ensure that a valid, usable cpumask attribute is used by the workqueue infrastructure when setting up new resource pools. It prevents a crash that has been observed when an 'empty' cpumask is passed along to the worker/task scheduling code. It is intended as a temporary workaround until a more fundamental review and correction of the issue can be done. [With additions to the patch provided by Tejun Hao <tj@kernel.org>] Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-07-28 11:05:52 -04:00
Tejun Heo	0a94efb5ac	workqueue: implicit ordered attribute should be overridable `5c0338c687` ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes. This patch distinguishes explicit and implicit ordered setting and overrides from attribute changes if implict. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `5c0338c687` ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")	2017-07-25 13:28:56 -04:00
Tejun Heo	5c0338c687	workqueue: restore WQ_UNBOUND/max_active==1 to be ordered The combination of WQ_UNBOUND and max_active == 1 used to imply ordered execution. After NUMA affinity `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues"), this is no longer true due to per-node worker pools. While the right way to create an ordered workqueue is alloc_ordered_workqueue(), the documentation has been misleading for a long time and people do use WQ_UNBOUND and max_active == 1 for ordered workqueues which can lead to subtle bugs which are very difficult to trigger. It's unlikely that we'd see noticeable performance impact by enforcing ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's automatically set __WQ_ORDERED for those workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: Alexei Potashnik <alexei@purestorage.com> Fixes: `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues") Cc: stable@vger.kernel.org # v3.10+	2017-07-19 11:24:19 -04:00
Ingo Molnar	ac6424b981	sched/wait: Rename wait_queue_t => wait_queue_entry_t Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue entry. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:18:27 +02:00
Linus Torvalds	3527d3e951	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...	2017-05-01 19:12:53 -07:00
Linus Torvalds	ad1490bcd2	Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue update from Tejun Heo: "One trivial patch to use setup_deferrable_timer() instead of open-coding the initialization" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: use setup_deferrable_timer	2017-05-01 13:49:27 -07:00
Thomas Gleixner	0e8d6a9336	workqueue: Provide work_on_cpu_safe() work_on_cpu() is not protected against CPU hotplug. For code which requires to be either executed on an online CPU or to fail if the CPU is not available the callsite would have to protect against CPU hotplug. Provide a function which does get/put_online_cpus() around the call to work_on_cpu() and fails the call with -ENODEV if the target CPU is not online. Preparatory patch to convert several racy task affinity manipulations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-15 12:20:53 +02:00
Geliang Tang	c30fb26b11	workqueue: use setup_deferrable_timer Use setup_deferrable_timer() instead of init_timer_deferrable() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-03-06 15:42:20 -05:00
Tejun Heo	637fdbae60	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq If queue_delayed_work() gets called with NULL @wq, the kernel will oops asynchronuosly on timer expiration which isn't too helpful in tracking down the offender. This actually happened with smc. __queue_delayed_work() already does several input sanity checks synchronously. Add NULL @wq check. Reported-by: Dave Jones <davej@codemonkey.org.uk> Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk Signed-off-by: Tejun Heo <tj@kernel.org>	2017-03-06 15:33:42 -05:00
Kees Cook	dfb4357da6	time: Remove CONFIG_TIMER_STATS Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: linux-doc@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Xing Gao <xgao01@email.wm.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jessica Frazelle <me@jessfraz.com> Cc: kernel-hardening@lists.openwall.com Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Olof Johansson <olof@lixom.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-10 11:15:08 +01:00
Tejun Heo	8bc4a04455	Merge branch 'for-4.9' into for-4.10	2016-10-19 12:12:40 -04:00
Tejun Heo	2186d9f940	workqueue: move wq_numa_init() to workqueue_init() While splitting up workqueue initialization into two parts, ac8f73400782 ("workqueue: make workqueue available early during boot") put wq_numa_init() into workqueue_init_early(). Unfortunately, on some archs including power and arm64, cpu to node mapping isn't yet established by the time the early init is called leading to incorrect NUMA initialization and subsequently the following oops due to zero cpumask on node-specific unbound pools. Unable to handle kernel paging request for data at address 0x00000038 Faulting instruction address: 0xc0000000000fc0cc Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94 task: c0000007f5400000 task.stack: c000001ffc084000 NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0 REGS: c000001ffc087780 TRAP: 0300 Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161005) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 48000424 XER: 00000000 CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0 GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600 GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000 GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001 GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001 GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000 NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0 LR [c0000000000ed928] activate_task+0x78/0xe0 Call Trace: [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable) [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0 [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0 [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0 [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250 [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150 [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360 [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160 [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc Instruction dump: 62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000 60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58 ---[ end trace 0000000000000000 ]--- Fix it by moving wq_numa_init() to workqueue_init(). As this means that the early intialization may not have full NUMA info for per-cpu pools and ignores NUMA affinity for unbound pools, fix them up from workqueue_init() after wq_numa_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot") Signed-off-by: Tejun Heo <tj@kernel.org>	2016-10-19 12:12:26 -04:00
Petr Mladek	e700591ae0	kthread: rename probe_kthread_data() to kthread_probe_data() Patch series "kthread: Kthread worker API improvements" The intention of this patchset is to make it easier to manipulate and maintain kthreads. Especially, I want to replace all the custom main cycles with a generic one. Also I want to make the kthreads sleep in a consistent state in a common place when there is no work. This patch (of 11): A good practice is to prefix the names of functions by the name of the subsystem. This patch fixes the name of probe_kthread_data(). The other wrong functions names are part of the kthread worker API and will be fixed separately. Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:33 -07:00
Tejun Heo	863b710b66	workqueue: remove keventd_up() keventd_up() no longer has in-kernel users. Remove it and make wq_online static. Signed-off-by: Tejun Heo <tj@kernel.org>	2016-09-17 13:18:21 -04:00
Tejun Heo	3347fa0928	workqueue: make workqueue available early during boot Workqueue is currently initialized in an early init call; however, there are cases where early boot code has to be split and reordered to come after workqueue initialization or the same code path which makes use of workqueues is used both before workqueue initailization and after. The latter cases have to gate workqueue usages with keventd_up() tests, which is nasty and easy to get wrong. Workqueue usages have become widespread and it'd be a lot more convenient if it can be used very early from boot. This patch splits workqueue initialization into two steps. workqueue_init_early() which sets up the basic data structures so that workqueues can be created and work items queued, and workqueue_init() which actually brings up workqueues online and starts executing queued work items. The former step can be done very early during boot once memory allocation, cpumasks and idr are initialized. The latter right after kthreads become available. This allows work item queueing and canceling from very early boot which is what most of these use cases want. * As systemd_wq being initialized doesn't indicate that workqueue is fully online anymore, update keventd_up() to test wq_online instead. The follow-up patches will get rid of all its usages and the function itself. * Flushing doesn't make sense before workqueue is fully initialized. The flush functions trigger WARN and return immediately before fully online. * Work items are never in-flight before fully online. Canceling can always succeed by skipping the flush step. * Some code paths can no longer assume to be called with irq enabled as irq is disabled during early boot. Use irqsave/restore operations instead. v2: Watchdog init, which requires timer to be running, moved from workqueue_init_early() to workqueue_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com	2016-09-17 13:18:21 -04:00
Tejun Heo	fa07fb6a4e	workqueue: dump workqueue state on sanity check failures in destroy_workqueue() destroy_workqueue() performs a number of sanity checks to ensure that the workqueue is empty before proceeding with destruction. However, it's not always easy to tell what's going on just from the warning message. Let's dump workqueue state after sanity check failures to help debugging. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com Cc: Dmitry Vyukov <dvyukov@google.com>	2016-09-16 11:08:39 -04:00
Jens Axboe	f72b8792d1	workqueue: add cancel_work() Like cancel_delayed_work(), but for regular work. Signed-off-by: Jens Axboe <axboe@fb.com> Mehed-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org>	2016-08-29 08:13:21 -06:00
Linus Torvalds	a6408f6cb6	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the next part of the hotplug rework. - Convert all notifiers with a priority assigned - Convert all CPU_STARTING/DYING notifiers The final removal of the STARTING/DYING infrastructure will happen when the merge window closes. Another 700 hundred line of unpenetrable maze gone :)" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) timers/core: Correct callback order during CPU hot plug leds/trigger/cpu: Move from CPU_STARTING to ONLINE level powerpc/numa: Convert to hotplug state machine arm/perf: Fix hotplug state machine conversion irqchip/armada: Avoid unused function warnings ARC/time: Convert to hotplug state machine clocksource/atlas7: Convert to hotplug state machine clocksource/armada-370-xp: Convert to hotplug state machine clocksource/exynos_mct: Convert to hotplug state machine clocksource/arm_global_timer: Convert to hotplug state machine rcu: Convert rcutree to hotplug state machine KVM/arm/arm64/vgic-new: Convert to hotplug state machine smp/cfd: Convert core to hotplug state machine x86/x2apic: Convert to CPU hotplug state machine profile: Convert to hotplug state machine timers/core: Convert to hotplug state machine hrtimer: Convert to hotplug state machine x86/tboot: Convert to hotplug state machine arm64/armv8 deprecated: Convert to hotplug state machine hwtracing/coresight-etm4x: Convert to hotplug state machine ...	2016-07-29 13:55:30 -07:00
Rafael J. Wysocki	7f234a4d8a	Merge branches 'pm-sleep' and 'pm-tools' * pm-sleep: PM / hibernate: Introduce test_resume mode for hibernation x86 / hibernate: Use hlt_play_dead() when resuming from hibernation PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() PM / hibernate: Recycle safe pages after image restoration PM / hibernate: Simplify mark_unsafe_pages() PM / hibernate: Do not free preallocated safe pages during image restore PM / suspend: show workqueue state in suspend flow PM / sleep: make PM notifiers called symmetrically PM / sleep: Make pm_prepare_console() return void PM / Hibernate: Don't let kasan instrument snapshot.c * pm-tools: PM / tools: scripts: AnalyzeSuspend v4.2 tools/turbostat: allow user to alter DESTDIR and PREFIX	2016-07-25 13:44:32 +02:00
Thomas Gleixner	7ee681b252	workqueue: Convert to state machine callbacks Get rid of the prio ordering of the separate notifiers and use a proper state callback pair. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-14 09:34:43 +02:00
Roger Lu	7b776af66d	PM / suspend: show workqueue state in suspend flow If freezable workqueue aborts suspend flow, show workqueue state for debug purpose. Signed-off-by: Roger Lu <roger.lu@mediatek.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-02 01:42:48 +02:00
Peter Zijlstra	d945b5e9f0	workqueue: Fix setting affinity of unbound worker threads With commit `e9d867a67f` ("sched: Allow per-cpu kernel threads to run on online && !active"), __set_cpus_allowed_ptr() expects that only strict per-cpu kernel threads can have affinity to an online CPU which is not yet active. This assumption is currently broken in the CPU_ONLINE notification handler for the workqueues where restore_unbound_workers_cpumask() calls set_cpus_allowed_ptr() when the first cpu in the unbound worker's pool->attr->cpumask comes online. Since set_cpus_allowed_ptr() is called with pool->attr->cpumask in which only one CPU is online which is not yet active, we get the following WARN_ON during an CPU online operation. ------------[ cut here ]------------ WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166 __set_cpus_allowed_ptr+0x228/0x2e0 Modules linked in: CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4 <..snip..> Call Trace: [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable) [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470 [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100 [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0 [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50 [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250 [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120 [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0 [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0 [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130 [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c ---[ end trace 00f1456578b2a3b2 ]--- This patch fixes this by limiting the mask to the intersection of the pool affinity and online CPUs. Changelog-cribbed-from: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-06-16 15:37:05 -04:00
Du, Changbin	b9fdac7f66	debugobjects: insulate non-fixup logic related to static obj from fixup callbacks When activating a static object we need make sure that the object is tracked in the object tracker. If it is a non-static object then the activation is illegal. In previous implementation, each subsystem need take care of this in their fixup callbacks. Actually we can put it into debugobjects core. Thus we can save duplicated code, and have pure fixup callbacks. To achieve this, a new callback "is_static_object" is introduced to let the type specific code decide whether a object is static or not. If yes, we take it into object tracker, otherwise give warning and invoke fixup callback. This change has paassed debugobjects selftest, and I also do some test with all debugobjects supports enabled. At last, I have a concern about the fixups that can it change the object which is in incorrect state on fixup? Because the 'addr' may not point to any valid object if a non-static object is not tracked. Then Change such object can overwrite someone's memory and cause unexpected behaviour. For example, the timer_fixup_activate bind timer to function stub_timer. Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com [changbin.du@intel.com: improve code comments where invoke the new is_static_object callback] Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-19 19:12:14 -07:00
Du, Changbin	02a982a6ec	workqueue: update debugobjects fixup callbacks return type Update the return type to use bool instead of int, corresponding to change (debugobjects: make fixup functions return bool instead of int) Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-19 19:12:14 -07:00
Linus Torvalds	da92223908	Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding DOWN_PREPARE which can trigger a WARN_ON() in workqueue. The bug has been there for a very long time. It only triggers if CPU down fails at a specific point and I don't think it has adverse effects other than the warning messages. The fix is very low impact" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix rebind bound workers warning	2016-05-13 16:16:51 -07:00
Wanpeng Li	f7c17d26f4	workqueue: fix rebind bound workers warning ------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0 Modules linked in: CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31 Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046 Call Trace: dump_stack+0x89/0xd4 __warn+0xfd/0x120 warn_slowpath_null+0x1d/0x20 rebind_workers+0x1c0/0x1d0 workqueue_cpu_up_callback+0xf5/0x1d0 notifier_call_chain+0x64/0x90 ? trace_hardirqs_on_caller+0xf2/0x220 ? notify_prepare+0x80/0x80 __raw_notifier_call_chain+0xe/0x10 __cpu_notify+0x35/0x50 notify_down_prepare+0x5e/0x80 ? notify_prepare+0x80/0x80 cpuhp_invoke_callback+0x73/0x330 ? __schedule+0x33e/0x8a0 cpuhp_down_callbacks+0x51/0xc0 cpuhp_thread_fun+0xc1/0xf0 smpboot_thread_fn+0x159/0x2a0 ? smpboot_create_threads+0x80/0x80 kthread+0xef/0x110 ? wait_for_completion+0xf0/0x120 ? schedule_tail+0x35/0xf0 ret_from_fork+0x22/0x50 ? __init_kthread_worker+0x70/0x70 ---[ end trace eb12ae47d2382d8f ]--- notify_down_prepare: attempt to take down CPU 0 failed This bug can be reproduced by below config w/ nohz_full= all cpus: CONFIG_BOOTPARAM_HOTPLUG_CPU0=y CONFIG_DEBUG_HOTPLUG_CPU0=y CONFIG_NO_HZ_FULL=y As Thomas pointed out: \| If a down prepare callback fails, then DOWN_FAILED is invoked for all \| callbacks which have successfully executed DOWN_PREPARE. \| \| But, workqueue has actually two notifiers. One which handles \| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE. \| \| Now look at the priorities of those callbacks: \| \| CPU_PRI_WORKQUEUE_UP = 5 \| CPU_PRI_WORKQUEUE_DOWN = -5 \| \| So the call order on DOWN_PREPARE is: \| \| CB 1 \| CB ... \| CB workqueue_up() -> Ignores DOWN_PREPARE \| CB ... \| CB X ---> Fails \| \| So we call up to CB X with DOWN_FAILED \| \| CB 1 \| CB ... \| CB workqueue_up() -> Handles DOWN_FAILED \| CB ... \| CB X-1 \| \| So the problem is that the workqueue stuff handles DOWN_FAILED in the up \| callback, while it should do it in the down callback. Which is not a good idea \| either because it wants to be called early on rollback... \| \| Brilliant stuff, isn't it? The hotplug rework will solve this problem because \| the callbacks become symetric, but for the existing mess, we need some \| workaround in the workqueue code. The boot CPU handles housekeeping duty(unbound timers, workqueues, timekeeping, ...) on behalf of full dynticks CPUs. It must remain online when nohz full is enabled. There is a priority set to every notifier_blocks: workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down So tick_nohz_cpu_down callback failed when down prepare cpu 0, and notifier_blocks behind tick_nohz_cpu_down will not be called any more, which leads to workers are actually not unbound. Then hotplug state machine will fallback to undo and online cpu 0 again. Workers will be rebound unconditionally even if they are not unbound and trigger the warning in this progress. This patch fix it by catching !DISASSOCIATED to avoid rebind bound workers. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>	2016-05-12 12:00:23 -04:00
Linus Torvalds	b75a2bf899	Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "So, it turns out we had a silly bug in the most fundamental part of workqueue for a very long time. AFAICS, this dates back to pre-git era and has quite likely been there from the time workqueue was first introduced. A work item uses its PENDING bit to synchronize multiple queuers. Anyone who wins the PENDING bit owns the pending state of the work item. Whether a queuer wins or loses the race, one thing should be guaranteed - there will soon be at least one execution of the work item - where "after" means that the execution instance would be able to see all the changes that the queuer has made prior to the queueing attempt. Unfortunately, we were missing a smp_mb() after clearing PENDING for execution, so nothing guaranteed visibility of the changes that a queueing loser has made, which manifested as a reproducible blk-mq stall. Lots of kudos to Roman for debugging the problem. The patch for -stable is the minimal one. For v3.7, Peter is working on a patch to make the code path slightly more efficient and less fragile" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix ghost PENDING flag while doing MQ IO	2016-04-27 12:03:59 -07:00
Roman Pen	346c09f804	workqueue: fix ghost PENDING flag while doing MQ IO The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O 4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000 [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000 [ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0 [ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38 [ 601.350965] Call Trace: [ 601.351203] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.351444] [<ffffffff815b01d5>] schedule+0x35/0x80 [ 601.351709] [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230 [ 601.351958] [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [<ffffffff810bd737>] ? ktime_get+0x37/0xa0 [ 601.352446] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.352688] [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110 [ 601.352951] [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [<ffffffff815b093b>] bit_wait_io+0x1b/0x70 [ 601.353440] [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90 [ 601.353689] [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40 [ 601.354200] [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50 [ 601.355193] [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0 [ 601.355432] [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100 [ 601.355679] [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0 [ 601.355925] [<ffffffff81198379>] vfs_write+0xa9/0x1a0 [ 601.356164] [<ffffffff811c59d8>] kernel_write+0x38/0x50 The underlying device is a null_blk, with default parameters: queue_mode = MQ submit_queues = 1 Verification that nullb0 has something inflight: root@pserver8:~# cat /sys/block/nullb0/inflight 0 1 root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \; ... /sys/block/nullb0/mq/0/cpu2/rq_list CTX pending: ffff8838038e2400 ... During debug it became clear that stalled request is always inserted in the rq_list from the following path: save_stack_trace_tsk + 34 blk_mq_insert_requests + 231 blk_mq_flush_plug_list + 281 blk_flush_plug_list + 199 wait_on_page_bit + 192 __filemap_fdatawait_range + 228 filemap_fdatawait_range + 20 filemap_write_and_wait_range + 63 blkdev_fsync + 27 vfs_fsync_range + 73 blkdev_write_iter + 202 __vfs_write + 170 vfs_write + 169 kernel_write + 56 So blk_flush_plug_list() was called with from_schedule == true. If from_schedule is true, that means that finally blk_mq_insert_requests() offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue, i.e. it calls kblockd_schedule_delayed_work_on(). That means, that we race with another CPU, which is about to execute __blk_mq_run_hw_queue() work. Further debugging shows the following traces from different CPUs: CPU#0 CPU#1 ---------------------------------- ------------------------------- reqeust A inserted STORE hctx->ctx_map[0] bit marked kblockd_schedule...() returns 1 <schedule to kblockd workqueue> request B inserted STORE hctx->ctx_map[1] bit marked kblockd_schedule...() returns 0 * WORK PENDING bit is cleared * flush_busy_ctxs() is executed, but bit 1, set by CPU#1, is not observed As a result request B pended forever. This behaviour can be explained by speculative LOAD of hctx->ctx_map on CPU#0, which is reordered with clear of PENDING bit and executed _before_ actual STORE of bit 1 on CPU#1. The proper fix is an explicit full barrier <mfence>, which guarantees that clear of PENDING bit is to be executed before all possible speculative LOADS or STORES inside actual work function. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2016-04-26 11:23:22 -04:00
Linus Torvalds	ef504fa591	Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Three trivial workqueue changes" * 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix comment for work_on_cpu() sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() workqueue: Replace usage of init_name with dev_set_name()	2016-03-18 20:05:39 -07:00
Peter Zijlstra	25528213fe	tags: Fix DEFINE_PER_CPU expansions $ make tags GEN tags ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1" ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1" ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1" ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1" ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1" ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1" Which are all the result of the DEFINE_PER_CPU pattern: scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,], $[[:alnum:]_]$/\1/v/' scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,], $[[:alnum:]_]$/\1/v/' The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ do you have any preference on how to fix the wq one, or shall we just not care its too long? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Anna-Maria Gleixner	22aceb3176	workqueue: Fix comment for work_on_cpu() Function is processed in thread context, not in user context. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-03-11 12:39:01 -05:00
Alexander Gordeev	9b7f6597f0	sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() Given that wq_worker_sleeping() could only be called for a CPU it is running on, we do not need passing a CPU ID as an argument. Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-03-02 10:28:47 -05:00
Lars-Peter Clausen	23217b443b	workqueue: Replace usage of init_name with dev_set_name() The init_name property of the device struct is sort of a hack and should only be used for statically allocated devices. Since the device is dynamically allocated here it is safe to use the proper way to set a devices name by calling dev_set_name(). Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-02-17 16:14:18 -05:00
Tejun Heo	d6e022f1d2	workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup When looking up the pool_workqueue to use for an unbound workqueue, workqueue assumes that the target CPU is always bound to a valid NUMA node. However, currently, when a CPU goes offline, the mapping is destroyed and cpu_to_node() returns NUMA_NO_NODE. This has always been broken but hasn't triggered often enough before `874bbfe600` ("workqueue: make sure delayed work run in local cpu"). After the commit, workqueue forcifully assigns the local CPU for delayed work items without explicit target CPU to fix a different issue. This widens the window where CPU can go offline while a delayed work item is pending causing delayed work items dispatched with target CPU set to an already offlined CPU. The resulting NUMA_NO_NODE mapping makes workqueue try to queue the work item on a NULL pool_workqueue and thus crash. While `874bbfe600` has been reverted for a different reason making the bug less visible again, it can still happen. Fix it by mapping NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node(). This is a temporary workaround. The long term solution is keeping CPU -> NODE mapping stable across CPU off/online cycles which is being worked on. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com	2016-02-10 12:13:05 -05:00
Tejun Heo	f303fccb82	workqueue: implement "workqueue.debug_force_rr_cpu" debug feature Workqueue used to guarantee local execution for work items queued without explicit target CPU. The guarantee is gone now which can break some usages in subtle ways. To flush out those cases, this patch implements a debug feature which forces round-robin CPU selection for all such work items. The debug feature defaults to off and can be enabled with a kernel parameter. The default can be flipped with a debug config option. If you hit this commit during bisection, please refer to `041bd12e27` ("Revert "workqueue: make sure delayed work run in local cpu"") for more information and ping me. Signed-off-by: Tejun Heo <tj@kernel.org>	2016-02-09 17:59:38 -05:00
Mike Galbraith	ef55718044	workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs WORK_CPU_UNBOUND work items queued to a bound workqueue always run locally. This is a good thing normally, but not when the user has asked us to keep unbound work away from certain CPUs. Round robin these to wq_unbound_cpumask CPUs instead, as perturbation avoidance trumps performance. tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty (wq_unbound_cpumask AND cpu_online_mask). If we want that, it should be done when config changes. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-02-09 17:59:38 -05:00
Tejun Heo	041bd12e27	Revert "workqueue: make sure delayed work run in local cpu" This reverts commit `874bbfe600`. Workqueue used to implicity guarantee that work items queued without explicit CPU specified are put on the local CPU. Recent changes in timer broke the guarantee and led to vmstat breakage which was fixed by `176bed1de5` ("vmstat: explicitly schedule per-cpu work on the CPU we need it to run on"). vmstat is the most likely to expose the issue and it's quite possible that there are other similar problems which are a lot more difficult to trigger. As a preventive measure, `874bbfe600` ("workqueue: make sure delayed work run in local cpu") was applied to restore the local CPU guarnatee. Unfortunately, the change exposed a bug in timer code which got fixed by `22b886dd10` ("timers: Use proper base migration in add_timer_on()"). Due to code restructuring, the commit couldn't be backported beyond certain point and stable kernels which only had `874bbfe600` started crashing. The local CPU guarantee was accidental more than anything else and we want to get rid of it anyway. As, with the vmstat case fixed, `874bbfe600` is causing more problems than it's fixing, it has been decided to take the chance and officially break the guarantee by reverting the commit. A debug feature will be added to force foreign CPU assignment to expose cases relying on the guarantee and fixes for the individual cases will be backported to stable as necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `874bbfe600` ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: stable@vger.kernel.org Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shli@fb.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Michal Hocko <mhocko@kernel.org>	2016-02-09 16:11:26 -05:00
Tejun Heo	23d11a58a9	workqueue: skip flush dependency checks for legacy workqueues `fca839c00a` ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to flush a !WQ_MEM_RECLAIM workquee. This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory reclaim path and making it depend on something which may need more memory to make forward progress can lead to deadlocks. Unfortunately, workqueues created with the legacy create_workqueue() interface always have WQ_MEM_RECLAIM regardless of whether they are depended upon memory reclaim or not. These spurious WQ_MEM_RECLAIM markings cause spurious triggering of the flush dependency checks. WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144() workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu ... Workqueue: deferwq deferred_probe_work_func [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14) [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4) [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0) [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40) [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144) [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c) [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180) [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10) [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338) [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac) [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8) [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278) [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c) [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec) [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc) [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8) [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148) [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114) [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8) [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8) [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac) [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0) [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94) [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114) [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c) [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98) [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8) [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c) [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4) [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c) Fix it by marking workqueues created via create_workqueue() with __WQ_LEGACY and disabling flush dependency checks on them. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Thierry Reding <thierry.reding@gmail.com> Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com Fixes: `fca839c00a` ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")	2016-01-29 13:31:10 -05:00
wanghaibin	6201171e3b	workqueue: simplify the apply_workqueue_attrs_locked() If the apply_wqattrs_prepare() returns NULL, it has already cleaned up the related resources, so it can return directly and avoid calling the clean up function again. This doesn't introduce any functional changes. Signed-off-by: wanghaibin <wanghaibin.wang@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-01-07 11:04:34 -05:00
Tejun Heo	82607adcf9	workqueue: implement lockup detector Workqueue stalls can happen from a variety of usage bugs such as missing WQ_MEM_RECLAIM flag or concurrency managed work item indefinitely staying RUNNING. These stalls can be extremely difficult to hunt down because the usual warning mechanisms can't detect workqueue stalls and the internal state is pretty opaque. To alleviate the situation, this patch implements workqueue lockup detector. It periodically monitors all worker_pools periodically and, if any pool failed to make forward progress longer than the threshold duration, triggers warning and dumps workqueue state as follows. BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s! Showing busy workqueues and worker pools: workqueue events: flags=0x0 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256 pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent workqueue events_power_efficient: flags=0x80 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 pending: check_lifetime, neigh_periodic_work workqueue cgroup_pidlist_destroy: flags=0x0 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 pending: cgroup_pidlist_destroy_work_fn ... The detection mechanism is controller through kernel parameter workqueue.watchdog_thresh and can be updated at runtime through the sysfs module parameter file. v2: Decoupled from softlockup control knobs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2015-12-08 11:29:47 -05:00
Tejun Heo	fca839c00a	workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue Task or work item involved in memory reclaim trying to flush a non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to deadlock. Trigger WARN_ONCE() if such conditions are detected. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org>	2015-12-08 11:29:36 -05:00
Linus Torvalds	e25ac7ddaa	Merge branch 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue update from Tejun Heo: "This pull request contains one patch to make an unbound worker pool allocated from the NUMA node containing it if such node exists. As unbound worker pools are node-affine by default, this makes most pools allocated on the right node" * 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Allocate the unbound pool using local node memory	2015-11-05 14:16:27 -08:00
Xunlei Pang	e2273584d3	workqueue: Allocate the unbound pool using local node memory Currently, get_unbound_pool() uses kzalloc() to allocate the worker pool. Actually, we can use the right node to do the allocation, achieving local memory access. This patch selects target node first, and uses kzalloc_node() instead. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-10-12 12:17:31 -04:00
Shaohua Li	874bbfe600	workqueue: make sure delayed work run in local cpu My system keeps crashing with below message. vmstat_update() schedules a delayed work in current cpu and expects the work runs in the cpu. schedule_delayed_work() is expected to make delayed work run in local cpu. The problem is timer can be migrated with NO_HZ. __queue_work() queues work in timer handler, which could run in a different cpu other than where the delayed work is scheduled. The end result is the delayed work runs in different cpu. The patch makes __queue_delayed_work records local cpu earlier. Where the timer runs doesn't change where the work runs with the change. [ 28.010131] ------------[ cut here ]------------ [ 28.010609] kernel BUG at ../mm/vmstat.c:1392! [ 28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [ 28.011860] Modules linked in: [ 28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G W4.3.0-rc3+ #634 [ 28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014 [ 28.014160] Workqueue: events vmstat_update [ 28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000 [ 28.015445] RIP: 0010:[<ffffffff8115f921>] [<ffffffff8115f921>]vmstat_update+0x31/0x80 [ 28.016282] RSP: 0018:ffff8800ba42fd80 EFLAGS: 00010297 [ 28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000 [ 28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d [ 28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000 [ 28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640 [ 28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000 [ 28.020071] FS: 0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000 [ 28.020071] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0 [ 28.020071] Stack: [ 28.020071] ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88 [ 28.020071] ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8 [ 28.020071] ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340 [ 28.020071] Call Trace: [ 28.020071] [<ffffffff8106bd88>] process_one_work+0x1c8/0x540 [ 28.020071] [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540 [ 28.020071] [<ffffffff8106c214>] worker_thread+0x114/0x460 [ 28.020071] [<ffffffff8106c100>] ? process_one_work+0x540/0x540 [ 28.020071] [<ffffffff81071bf8>] kthread+0xf8/0x110 [ 28.020071] [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200 [ 28.020071] [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70 [ 28.020071] [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200 Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v2.6.31+	2015-09-30 13:06:46 -04:00
Linus Torvalds	7d3e2eb178	Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Only three trivial changes for workqueue this time - doc, MAINTAINERS and EXPORT_SYMBOL updates" * 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix some docbook warnings workqueue: Make flush_workqueue() available again to non GPL modules workqueue: add myself as a dedicated reviwer	2015-09-02 08:02:20 -07:00
Linus Torvalds	a1d8561172	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: `9d89c257df` ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...	2015-08-31 20:26:22 -07:00
Peter Zijlstra	25834c73f9	sched: Fix a race between __kthread_bind() and sched_setaffinity() Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY without locks, a caller might observe an old value and race with the set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo it: __kthread_bind() do_set_cpus_allowed() <SYSCALL> sched_setaffinity() if (p->flags & PF_NO_SETAFFINITIY) set_cpus_allowed_ptr() p->flags \|= PF_NO_SETAFFINITY Fix the bug by putting everything under the regular scheduler locks. This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 12:06:09 +02:00
Tim Gardner	1dadafa86a	workqueue: Make flush_workqueue() available again to non GPL modules Commit `37b1ef31a5` ("workqueue: move flush_scheduled_work() to workqueue.h") moved the exported non GPL flush_scheduled_work() from a function to an inline wrapper. Unfortunately, it directly calls flush_workqueue() which is a GPL function. This has the effect of changing the licensing requirement for this function and makes it unavailable to non GPL modules. See commit `ad7b1f841f` ("workqueue: Make schedule_work() available again to non GPL modules") for precedent. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-08-04 14:04:54 -04:00
Paul E. McKenney	f78f5b90c4	rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for consistency with the WARN() series of macros. This also requires inverting the sense of the conditional, which this commit also does. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>	2015-07-22 15:27:32 -07:00
Linus Torvalds	02201e3f1b	Minor merge needed, due to function move. Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek tLdh/a9GiCitqS0bT7GE =tWPQ -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits) modules: only use mod->param_lock if CONFIG_MODULES param: fix module param locks when !CONFIG_SYSFS. rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() module: add per-module param_lock module: make perm const params: suppress unused variable error, warn once just in case code changes. modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'. kernel/module.c: avoid ifdefs for sig_enforce declaration kernel/workqueue.c: remove ifdefs over wq_power_efficient kernel/params.c: export param_ops_bool_enable_only kernel/params.c: generalize bool_enable_only kernel/module.c: use generic module param operaters for sig_enforce kernel/params: constify struct kernel_param_ops uses sysfs: tightened sysfs permission checks module: Rework module_addr_{min,max} module: Use __module_address() for module_address_lookup() module: Make the mod_tree stuff conditional on PERF_EVENTS \|\| TRACING module: Optimize __module_address() using a latched RB-tree rbtree: Implement generic latch_tree seqlock: Introduce raw_read_seqcount_latch() ...	2015-07-01 10:49:25 -07:00
Shailendra Verma	402dd89d6c	workqueue: fix typos in comments tj: dropped iff -> if, iff is if and only if not a typo. Spotted by Randy Dunlap. Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org>	2015-05-29 09:20:01 -04:00
Luis R. Rodriguez	552f530cbc	kernel/workqueue.c: remove ifdefs over wq_power_efficient We can avoid an ifdef over wq_power_efficient's declaration by just using IS_ENABLED(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-05-28 11:32:12 +09:30
Lai Jiangshan	37b1ef31a5	workqueue: move flush_scheduled_work() to workqueue.h flush_scheduled_work() is just a simple call to flush_work(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-21 17:26:22 -04:00
Lai Jiangshan	899a94fe15	workqueue: remove the lock from wq_sysfs_prep_attrs() Reading to wq->unbound_attrs requires protection of either wq_pool_mutex or wq->mutex, and wq_sysfs_prep_attrs() is called with wq_pool_mutex held, so we don't need to grab wq->mutex here. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-21 17:26:22 -04:00
Lai Jiangshan	da7f91b2e2	workqueue: remove the declaration of copy_workqueue_attrs() This pre-declaration was unneeded since a previous refactor patch `6ba94429c8` ("workqueue: Reorder sysfs code"). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-21 17:26:22 -04:00
Lai Jiangshan	d4d3e25797	workqueue: ensure attrs changes are properly synchronized Current modification to attrs via sysfs is not fully synchronized. Process A (change cpumask) \| Process B (change numa affinity) wq_cpumask_store() \| wq_sysfs_prep_attrs() \| \| apply_workqueue_attrs() apply_workqueue_attrs() \| It results that the Process B's operation is totally reverted without any notification, it is a buggy behavior. So this patch moves wq_sysfs_prep_attrs() into the protection under wq_pool_mutex to ensure attrs changes are properly synchronized. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-19 17:37:00 -04:00
Lai Jiangshan	a0111cf671	workqueue: separate out and refactor the locking of applying attrs Applying attrs requires two locks: get_online_cpus() and wq_pool_mutex, and this code is duplicated at two places (apply_workqueue_attrs() and workqueue_set_unbound_cpumask()). So we separate out this locking code into apply_wqattrs_[un]lock() and do a minor refactor on apply_workqueue_attrs(). The apply_wqattrs_[un]lock() will be also used on later patch for ensuring attrs changes are properly synchronized. tj: minor updates to comments Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-19 17:37:00 -04:00
Lai Jiangshan	f7142ed483	workqueue: simplify wq_update_unbound_numa() wq_update_unbound_numa() is known be called with wq_pool_mutex held. But wq_update_unbound_numa() requests wq->mutex before reading wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq. But these fields were changed to be allowed being read with wq_pool_mutex held. So we simply remove the mutex_lock(&wq->mutex). Without the dependence on the the mutex_lock(&wq->mutex), the test of wq->unbound_attrs->no_numa can also be moved upward. The old code need a long comment to describe the stableness of @wq->unbound_attrs which is also guaranteed by wq_pool_mutex now, so we don't need this such comment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-18 16:22:57 -04:00
Lai Jiangshan	5b95e1af8d	workqueue: wq_pool_mutex protects the attrs-installation Current wq_pool_mutex doesn't proctect the attrs-installation, it results that ->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be accessed under wq->mutex and causes some inconveniences. Example, wq_update_unbound_numa() has to acquire wq->mutex before fetching the wq->unbound_attrs->no_numa and the old_pwq. attrs-installation is a short operation, so this change will no cause any latency for other operations which also acquire the wq_pool_mutex. The only unprotected attrs-installation code is in apply_workqueue_attrs(), so this patch touches code less than comments. It is also a preparation patch for next several patches which read wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with only wq_pool_mutex held. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-18 16:22:56 -04:00
Chen Hanxiao	b749b1b673	workqueue: fix a typo s/detemined/determined Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-13 10:23:56 -04:00
Gong Zhaogang	30186c6fdc	workqueue: function name in the comment differs from the real function name modify wq_calc_node_mask to wq_calc_node_cpumask Signed-off-by: Gong Zhaogang <gongzhaogang@inspur.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-11 11:03:34 -04:00
Lai Jiangshan	042f7df15a	workqueue: Allow modifying low level unbound workqueue cpumask Allow to modify the low-level unbound workqueues cpumask through sysfs. This is performed by traversing the entire workqueue list and calling apply_wqattrs_prepare() on the unbound workqueues with the new low level mask. Only after all the preparation are done, we commit them all together. Ordered workqueues are ignored from the low level unbound workqueue cpumask, it will be handled in near future. All the (default & per-node) pwqs are mandatorily controlled by the low level cpumask. If the user configured cpumask doesn't overlap with the low level cpumask, the low level cpumask will be used for the wq instead. The comment of wq_calc_node_cpumask() is updated and explicitly requires that its first argument should be the attrs of the default pwq. The default wq_unbound_cpumask is cpu_possible_mask. The workqueue subsystem doesn't know its best default value, let the system manager or the other subsystem set it when needed. Changed from V8: merge the calculating code for the attrs of the default pwq together. minor change the code&comments for saving the user configured attrs. remove unnecessary list_del(). minor update the comment of wq_calc_node_cpumask(). update the comment of workqueue_set_unbound_cpumask(); Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-04-30 09:24:29 -04:00
Frederic Weisbecker	b05a79280b	workqueue: Create low-level unbound workqueues cpumask Create a cpumask that limits the affinity of all unbound workqueues. This cpumask is controlled through a file at the root of the workqueue sysfs directory. It works on a lower-level than the per WQ_SYSFS workqueues cpumask files such that the effective cpumask applied for a given unbound workqueue is the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and the new /sys/devices/virtual/workqueue/cpumask file. This patch implements the basic infrastructure and the read interface. wq_unbound_cpumask is initially set to cpu_possible_mask. Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-04-27 11:13:40 -04:00
Lai Jiangshan	2d5f0764b5	workqueue: split apply_workqueue_attrs() into 3 stages Current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation, so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't ensure the transaction must succeed or fail as a complete unit. To solve this, we split apply_workqueue_attrs() into three stages. The first stage does the preparation: allocation memory, pwqs. The second stage does the attrs-installaion and pwqs-installation. The third stage frees the allocated memory and (old or unused) pwqs. As the result, batching multiple apply_workqueue_attrs()s can succeed or fail as a complete unit: 1) batch do all the first stage for all the workqueues 2) only commit all when all the above succeed. This patch is a preparation for the next patch ("Allow modifying low level unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs(). The patch doesn't have functionality changed except two minor adjustment: 1) free_unbound_pwq() for the error path is removed, we use the heavier version put_pwq_unlocked() instead since the error path is rare. this adjustment simplifies the code. 2) the memory-allocation is also moved into wq_pool_mutex. this is needed to avoid to do the further splitting. tj: minor updates to comments. Suggested-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-04-27 11:13:40 -04:00
Frederic Weisbecker	6ba94429c8	workqueue: Reorder sysfs code The sysfs code usually belongs to the botom of the file since it deals with high level objects. In the workqueue code it's misplaced and such that we'll need to work around functions references to allow the sysfs code to call APIs like apply_workqueue_attrs(). Lets move that block further in the file, almost the botom. And declare workqueue_sysfs_unregister() just before destroy_workqueue() which reference it. tj: Moved workqueue_sysfs_unregister() forward declaration where other forward declarations are. Suggested-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-04-06 11:16:04 -04:00
Tejun Heo	3494fc3084	workqueue: dump workqueues on sysrq-t Workqueues are used extensively throughout the kernel but sometimes it's difficult to debug stalls involving work items because visibility into its inner workings is fairly limited. Although sysrq-t task dump annotates each active worker task with the information on the work item being executed, it is challenging to find out which work items are pending or delayed on which queues and how pools are being managed. This patch implements show_workqueue_state() which dumps all busy workqueues and pools and is called from the sysrq-t handler. At the end of sysrq-t dump, something like the following is printed. Showing busy workqueues and worker pools: ... workqueue filler_wq: flags=0x0 pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 in-flight: 491:filler_workfn, 507:filler_workfn pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 in-flight: 501:filler_workfn pending: filler_workfn ... workqueue test_wq: flags=0x8 pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500) delayed: test_workfn1 BAR(492), test_workfn2 ... pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137 pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469 pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16 pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62 The above shows that test_wq is executing test_workfn() on pid 510 which is the rescuer and also that there are two tasks 69 and 500 waiting for the work item to finish in flush_work(). As test_wq has max_active of 1, there are two work items for test_workfn1() and test_workfn2() which are delayed till the current work item is finished. In addition, pid 492 is flushing test_workfn1(). The work item for test_workfn() is being executed on pwq of pool 2 which is the normal priority per-cpu pool for CPU 1. The pool has three workers, two of which are executing filler_workfn() for filler_wq and the last one is assuming the manager role trying to create more workers. This extra workqueue state dump will hopefully help chasing down hangs involving workqueues. v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting. v2: As suggested by Andrew, minor formatting change in pr_cont_work(), printk()'s replaced with pr_info()'s, and cpumask printing now uses cpulist_pr_cont(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> CC: Ingo Molnar <mingo@redhat.com>	2015-03-09 09:22:28 -04:00
Tejun Heo	2607d7a6db	workqueue: keep track of the flushing task and pool manager Add wq_barrier->task and worker_pool->manager to keep track of the flushing task and pool manager respectively. These are purely informational and will be used to implement sysrq dump of workqueues. Signed-off-by: Tejun Heo <tj@kernel.org>	2015-03-09 09:22:28 -04:00
Tejun Heo	e2dca7adff	workqueue: make the workqueues list RCU walkable The workqueues list is protected by wq_pool_mutex and a workqueue and its subordinate data structures are freed directly on destruction. We want to add the ability dump workqueues from a sysrq callback which requires walking all workqueues without grabbing wq_pool_mutex. This patch makes freeing of workqueues RCU protected and makes the workqueues list walkable while holding RCU read lock. Note that pool_workqueues and pools are already sched-RCU protected. For consistency, workqueues are also protected with sched-RCU. While at it, reverse the workqueues list so that a workqueue which is created earlier comes before. The order of the list isn't significant functionally but this makes the planned sysrq dump list system workqueues first. Signed-off-by: Tejun Heo <tj@kernel.org>	2015-03-09 09:22:28 -04:00
Tejun Heo	8603e1b300	workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Cc: stable@vger.kernel.org Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com>	2015-03-05 08:04:13 -05:00
Tejun Heo	dfbcbf42dd	workqueue: use %pb[l] to format bitmaps including cpumasks and nodemasks printk and friends can now format bitmaps using '%pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:37 -08:00
Tejun Heo	29187a9eea	workqueue: fix subtle pool management issue which can stall whole worker_pool A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role to create more workers and summon the rescuers if creating workers doesn't succeed in timely manner before proceeding to execute work items. This manager role is implemented in manage_workers(), which indicates whether the worker may proceed to work item execution with its return value. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, others should proceed to work item execution. Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions. pending work items && !nr_running && !nr_idle The first and third conditions are protected by pool->lock and thus won't change while holding pool->lock; however, nr_running can change asynchronously as other workers block and resume and while it's likely to be zero, as someone woke this worker up in the first place, some other workers could have become runnable inbetween making it non-zero. If this happens, manage_worker() could return false even with zero nr_idle making the worker, the last idle one, proceed to execute work items. If then all workers of the pool end up blocking on a resource which can only be released by a work item which is pending on that pool, the whole pool can deadlock as there's no one to create more workers or summon the rescuers. This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager, which ensures that the last worker doesn't start executing work items. We can leave the early exit condition alone and just ignore the return value but the only reason it was put there is because the manage_workers() used to perform both creations and destructions of workers and thus the function may be invoked while the pool is trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, the only case this early exit condition is triggered is rare race conditions rendering it pointless. Tested with simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Eric Sandeen <sandeen@sandeen.net> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net Cc: Dave Chinner <david@fromorbit.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: stable@vger.kernel.org	2015-01-16 14:21:16 -05:00
NeilBrown	008847f66c	workqueue: allow rescuer thread to do more work. When there is serious memory pressure, all workers in a pool could be blocked, and a new thread cannot be created because it requires memory allocation. In this situation a WQ_MEM_RECLAIM workqueue will wake up the rescuer thread to do some work. The rescuer will only handle requests that are already on ->worklist. If max_requests is 1, that means it will handle a single request. The rescuer will be woken again in 100ms to handle another max_requests requests. I've seen a machine (running a 3.0 based "enterprise" kernel) with thousands of requests queued for xfslogd, which has a max_requests of 1, and is needed for retiring all 'xfs' write requests. When one of the worker pools gets into this state, it progresses extremely slowly and possibly never recovers (only waited an hour or two). With this patch we leave a pool_workqueue on mayday list until it is clearly no longer in need of assistance. This allows all requests to be handled in a timely fashion. We keep each pool_workqueue on the mayday list until need_to_create_worker() is false, and no work for this workqueue is found in the pool. I have tested this in combination with a (hackish) patch which forces all work items to be handled by the rescuer thread. In that context it significantly improves performance. A similar patch for a 3.0 kernel significantly improved performance on a heavy work load. Thanks to Jan Kara for some design ideas, and to Dongsu Park for some comments and testing. tj: Inverted the lock order between wq_mayday_lock and pool->lock with a preceding patch and simplified this patch. Added comment and updated changelog accordingly. Dongsu spotted missing get_pwq() in the simplified code. Cc: Dongsu Park <dongsu.park@profitbricks.com> Cc: Jan Kara <jack@suse.cz> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-12-08 12:39:16 -05:00
Tejun Heo	b2d829096b	workqueue: invert the order between pool->lock and wq_mayday_lock Currently, pool->lock nests inside pool->lock. There's no inherent reason for this order. The only place where the two locks are held together is pool_mayday_timeout() and it just got decided that way. This nesting order turns out to complicate things with the planned rescuer_thread() update. Let's invert them. This doesn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: NeilBrown <neilb@suse.de> Cc: Dongsu Park <dongsu.park@profitbricks.com>	2014-12-08 12:39:16 -05:00
Tejun Heo	0479c8c549	workqueue: cosmetic update in rescuer_thread() rescuer_thread() caches &rescuer->scheduled in a local variable scheduled for convenience. There's one WARN_ON_ONCE() which was using &rescuer->scheduled directly. Replace it with the local variable. This patch causes no functional difference. Signed-off-by: Tejun Heo <tj@kernel.org>	2014-12-04 10:14:54 -05:00
Joe Lawrence	3e28e37720	workqueue: Use cond_resched_rcu_qs macro Tidy up and use cond_resched_rcu_qs when calling cond_resched and reporting potential quiescent state to RCU. Splitting this change in this way allows easy backporting to -stable for kernel versions not having cond_resched_rcu_qs(). Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-10-06 05:58:26 -07:00
Joe Lawrence	789cbbeca4	workqueue: Add quiescent state between work items Similar to the stop_machine deadlock scenario on !PREEMPT kernels addressed in `b22ce2785d` "workqueue: cond_resched() after processing each work item", kworker threads requeueing back-to-back with zero jiffy delay can stall RCU. The cond_resched call introduced in that fix will yield only iff there are other higher priority tasks to run, so force a quiescent RCU state between work items. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Link: https://lkml.kernel.org/r/20140926105227.01325697@jlaw-desktop.mno.stratus.com Link: https://lkml.kernel.org/r/20140929115445.40221d8e@jlaw-desktop.mno.stratus.com Fixes: `b22ce2785d` ("workqueue: cond_resched() after processing each work item") Cc: <stable@vger.kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-10-06 05:57:43 -07:00
Linus Torvalds	f2a84170ed	Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_() to define __this_cpu_() percpu: reorder macros in percpu header files percpu: move {raw\|this}_cpu_() definitions to include/linux/percpu-defs.h percpu: move generic {raw\|this}_cpu__N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw\|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...	2014-08-04 10:09:27 -07:00
Linus Torvalds	c4c3f5fba0	Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Lai has been doing a lot of cleanups of workqueue and kthread_work. No significant behavior change. Just a lot of cleanups all over the place. Some are a bit invasive but overall nothing too dangerous" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: kthread_work: remove the unused wait_queue_head kthread_work: wake up worker only when the worker is idle workqueue: use nr_node_ids instead of wq_numa_tbl_len workqueue: remove the misnamed out_unlock label in get_unbound_pool() workqueue: remove the stale comment in pwq_unbound_release_workfn() workqueue: move rescuer pool detachment to the end workqueue: unfold start_worker() into create_worker() workqueue: remove @wakeup from worker_set_flags() workqueue: remove an unneeded UNBOUND test before waking up the next worker workqueue: wake regular worker if need_more_worker() when rescuer leave the pool workqueue: alloc struct worker on its local node workqueue: reuse the already calculated pwq in try_to_grab_pending() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() workqueue: sanity check pool->cpu in wq_worker_sleeping() workqueue: clear leftover flags when detached workqueue: remove useless WARN_ON_ONCE() workqueue: use schedule_timeout_interruptible() instead of open code workqueue: remove the empty check in too_many_workers() workqueue: use "pool->cpu < 0" to stand for an unbound pool	2014-08-04 10:04:44 -07:00
Lai Jiangshan	ddcb57e2ed	workqueue: use nr_node_ids instead of wq_numa_tbl_len They are the same and nr_node_ids is provided by the memory subsystem. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:10:39 -04:00
Lai Jiangshan	3fb1823c09	workqueue: remove the misnamed out_unlock label in get_unbound_pool() After the locking was moved up to the caller of the get_unbound_pool(), out_unlock label doesn't need to do any unlock operation and the name became bad, so we just remove this label, and the only usage-site "goto out_unlock" is subsituted to "return pool". Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:10:39 -04:00
Lai Jiangshan	29b1cb416a	workqueue: remove the stale comment in pwq_unbound_release_workfn() In `75ccf5950f` ("workqueue: prepare flush_workqueue() for dynamic creation and destrucion of unbound pool_workqueues"), a comment about the synchronization for the pwq in pwq_unbound_release_workfn() was added. The comment claimed the flush_mutex wasn't strictly necessary, it was correct in that time, due to the pwq was protected by workqueue_lock. But it is incorrect now since the wq->flush_mutex was renamed to wq->mutex and workqueue_lock was removed, the wq->mutex is strictly needed. But the comment was miss-updated when the synchronization was changed. This patch removes the incorrect comments and doesn't add any new comment to explain why wq->mutex is needed here, which is definitely obvious and wq->pwqs_node has "WQ" notation in its definition which is better comment. The old commit mentioned above also introduced a comment in link_pwq() about the synchronization. This comment is also removed in this patch since the whole link_pwq() is proteced by wq->mutex. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:10:39 -04:00
Lai Jiangshan	13b1d625ef	workqueue: move rescuer pool detachment to the end In `51697d3939` ("workqueue: use generic attach/detach routine for rescuers"), The rescuer detaches itself from the pool before put_pwq() so that the put_unbound_pool() will not destroy the rescuer-attached pool. It is unnecessary. worker_detach_from_pool() can be used as the last statement to access to the pool just like the regular workers, put_unbound_pool() will wait for it to detach and then free the pool. So we move the worker_detach_from_pool() down, make it coincide with the regular workers. tj: Minor description update. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:10:39 -04:00
Lai Jiangshan	051e185010	workqueue: unfold start_worker() into create_worker() Simply unfold the code of start_worker() into create_worker() and remove the original start_worker() and create_and_start_worker(). The only trade-off is the introduced overhead that the pool->lock is released and regrabbed after the newly worker is started. The overhead is acceptible since the manager is slow path. And because this new locking behavior, the newly created worker may grab the lock earlier than the manager and go to process work items. In this case, the recheck need_to_create_worker() may be true as expected and the manager goes to restart which is the correct behavior. tj: Minor updates to description and comments. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:10:39 -04:00
Lai Jiangshan	228f1d0018	workqueue: remove @wakeup from worker_set_flags() worker_set_flags() has only two callers, each specifying %true and %false for @wakeup. Let's push the wake up to the caller and remove @wakeup from worker_set_flags(). The caller can use the following instead if wakeup is necessary: worker_set_flags(); if (need_more_worker(pool)) wake_up_worker(pool); This makes the code simpler. This patch doesn't introduce behavior changes. tj: Updated description and comments. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 12:08:36 -04:00
Lai Jiangshan	a489a03eca	workqueue: remove an unneeded UNBOUND test before waking up the next worker In process_one_work(): if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool)) wake_up_worker(pool); the first test is unneeded. Even if the first test is removed, it doesn't affect the wake-up logic for WORKER_UNBOUND, and it will not introduce any useless wake-ups for normal per-cpu workers since nr_running is always >= 1. It will introduce useless/redundant wake-ups for CPU_INTENSIVE, but this case is rare and the next patch will also remove this redundant wake-up. tj: Minor updates to the description and comment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-22 10:37:52 -04:00
Lai Jiangshan	d8ca83e68c	workqueue: wake regular worker if need_more_worker() when rescuer leave the pool We don't need to wake up regular worker when nr_running==1, so need_more_worker() is sufficient here. And need_more_worker() gives us better readability due to the name of "keep_working()" implies the rescuer should keep working now but the rescuer is actually leaving. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-18 18:46:11 -04:00
Lai Jiangshan	f7537df520	workqueue: alloc struct worker on its local node When the create_worker() is called from non-manager, the struct worker is allocated from the node of the caller which may be different from the node of pool->node. So we add a node ID argument for the alloc_worker() to ensure the struct worker is allocated from the preferable node. tj: @nid renamed to @node for consistency. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-15 11:11:20 -04:00
Lai Jiangshan	9c34a7042e	workqueue: reuse the already calculated pwq in try_to_grab_pending() try_to_grab_pending() was re-calculating the associated pwq using get_work_pwq() when it already has it cached in a local varible and the association can't change. Reuse the local variable instead. This doesn't introduce any functional changes. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-11 11:14:50 -04:00
Yasuaki Ishimatsu	5a6024f160	workqueue: zero cpumask of wq_numa_possible_cpumask on init When hot-adding and onlining CPU, kernel panic occurs, showing following call trace. BUG: unable to handle kernel paging request at 0000000000001d08 IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10 PGD 0 Oops: 0000 [#1] SMP ... Call Trace: [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50 [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0 [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0 [<ffffffff811926f1>] new_slab+0x91/0x300 [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482 [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0 [<ffffffff810a3c78>] ? load_balance+0x218/0x890 [<ffffffff8101a679>] ? sched_clock+0x9/0x10 [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10 [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200 [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0 [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60 [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8105d0ec>] do_fork+0xbc/0x360 [<ffffffff8105d3b6>] kernel_thread+0x26/0x30 [<ffffffff81086652>] kthreadd+0x2c2/0x300 [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60 [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60 In my investigation, I found the root cause is wq_numa_possible_cpumask. All entries of wq_numa_possible_cpumask is allocated by alloc_cpumask_var_node(). And these entries are used without initializing. So these entries have wrong value. When hot-adding and onlining CPU, wq_update_unbound_numa() is called. wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq() calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as follow: 3592 /* if cpumask is contained inside a NUMA node, we belong to that node */ 3593 if (wq_numa_enabled) { 3594 for_each_node(node) { 3595 if (cpumask_subset(pool->attrs->cpumask, 3596 wq_numa_possible_cpumask[node])) { 3597 pool->node = node; 3598 break; 3599 } 3600 } 3601 } But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong node is selected. As a result, kernel panic occurs. By this patch, all entries of wq_numa_possible_cpumask are allocated by zalloc_cpumask_var_node to initialize them. And the panic disappeared. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Fixes: `bce903809a` ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")	2014-07-07 09:56:48 -04:00
Lai Jiangshan	85327af61d	workqueue: stronger test in process_one_work() When POOL_DISASSOCIATED is cleared, the running worker's local CPU should be the same as pool->cpu without any exception even during cpu-hotplug. This patch changes "(proposition_A && proposition_B && proposition_C)" to "(proposition_B && proposition_C)", so if the old compound proposition is true, the new one must be true too. so this won't hide any possible bug which can be hit by old test. tj: Minor description update and dropped the obvious comment. CC: Jason J. Herne <jjherne@linux.vnet.ibm.com> CC: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-01 17:40:14 -04:00
Lai Jiangshan	3de5e88485	workqueue: clear POOL_DISASSOCIATED in rebind_workers() `a9ab775bca` ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") moved pool locking into rebind_workers() but left "pool->flags &= ~POOL_DISASSOCIATED" in workqueue_cpu_up_callback(). There is nothing necessarily wrong with it, but there is no benefit either. Let's move it into rebind_workers() and achieve the following benefits: 1) better readability, POOL_DISASSOCIATED is cleared in rebind_workers() as expected. 2) we can guarantee that, when POOL_DISASSOCIATED is clear, the running workers of the pool are on the local CPU (pool->cpu). tj: Minor description update. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-01 17:40:14 -04:00
Maxime Bizon	bddbceb688	workqueue: fix dev_set_uevent_suppress() imbalance Uevents are suppressed during attributes registration, but never restored, so kobject_uevent() does nothing. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Fixes: `226223ab3c`	2014-06-23 14:40:49 -04:00
Lai Jiangshan	807407c0a2	workqueue: stronger test in process_one_work() After the recent changes, when POOL_DISASSOCIATED is cleared, the running worker's local CPU should be the same as pool->cpu without any exception even during cpu-hotplug. Update the sanity check in process_one_work() accordingly. This patch changes "(proposition_A && proposition_B && proposition_C)" to "(proposition_B && proposition_C)", so if the old compound proposition is true, the new one must be true too. so this will not hide any possible bug which can be caught by the old test. tj: Minor updates to the description. CC: Jason J. Herne <jjherne@linux.vnet.ibm.com> CC: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 15:43:33 -04:00
Lai Jiangshan	f05b558d7e	workqueue: clear POOL_DISASSOCIATED in rebind_workers() The commit `a9ab775bca` ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") moved the pool->lock into rebind_workers() without also moving "pool->flags &= ~POOL_DISASSOCIATED". There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not being moved together, but there isn't any benefit either. We move it into rebind_workers() and achieve these benefits: 1) Better readability. POOL_DISASSOCIATED is cleared in rebind_workers() as expected. 2) When POOL_DISASSOCIATED is cleared, we can ensure that all the running workers of the pool are on the local CPU (pool->cpu). tj: Cosmetic updates to the code and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 15:43:33 -04:00
Lai Jiangshan	92b69f5091	workqueue: sanity check pool->cpu in wq_worker_sleeping() In theory, pool->cpu is equals to @cpu in wq_worker_sleeping() after worker->flags is checked. And "pool->cpu != cpu" sanity check will help us if something wrong. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:32:27 -04:00
Lai Jiangshan	b62c075194	workqueue: clear leftover flags when detached When a worker is detached, the worker->flags may still have WORKER_UNBOUND or WORKER_REBOUND, it is OK for all cases: 1) if it is a normal worker, the worker will be dead, it is OK. 2) if it is a rescuer, it may re-attach to a pool with this leftover flag[s], it is still correct except it may cause unneeded wakeup. It is correct but not good, so we just remove the leftover flags. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:29:12 -04:00
Lai Jiangshan	25ef09586d	workqueue: remove useless WARN_ON_ONCE() The @cpu is fetched via smp_processor_id() in this function, so the check is useless. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:28:20 -04:00
Lai Jiangshan	e212f361fb	workqueue: use schedule_timeout_interruptible() instead of open code schedule_timeout_interruptible(CREATE_COOLDOWN) is exactly the same as the original code. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:25:51 -04:00
Lai Jiangshan	e6a9a77123	workqueue: remove the empty check in too_many_workers() The commit `ea1abd6197` ("workqueue: reimplement idle worker rebinding") used a trick which simply removes all to-be-bound idle workers from the idle list and lets them add themselves back after completing rebinding. And this trick caused the @worker_pool->nr_idle may deviate than the actual number of idle workers on @worker_pool->idle_list. More specifically, nr_idle may be non-zero while ->idle_list is empty. All users of ->nr_idle and ->idle_list are audited. The only affected one is too_many_workers() which is updated to check %false if ->idle_list is empty regardless of ->nr_idle. The commit/trick was complicated due to it just tried to simplify an even more complicated problem (workers had to rebind itself). But the commit `a9ab775bca` ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") fixed all these problems and the mentioned trick was useless and is gone. So, now the @worker_pool->nr_idle is exactly the actual number of workers on @worker_pool->idle_list. too_many_workers() should recover as it was before the trick. So we remove the empty check. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:24:40 -04:00
Lai Jiangshan	61d0fbb4b6	workqueue: use "pool->cpu < 0" to stand for an unbound pool There is a piece of sanity checks code in the put_unbound_pool(). The meaning of this code is "if it is not an unbound pool, it will complain and return" IIUC. But the code uses "pool->flags & POOL_DISASSOCIATED" imprecisely due to a non-unbound pool may also have this flags. We should use "pool->cpu < 0" to stand for an unbound pool, so we covert the code to it. There is no strictly wrong if we still keep "pool->flags & POOL_DISASSOCIATED" here, but it is just a noise if we keep it: 1) we focus on "unbound" here, not "[dis]association". 2) "pool->cpu < 0" already implies "pool->flags & POOL_DISASSOCIATED". Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-06-19 12:05:00 -04:00
Linus Torvalds	da85d191f5	Merge branch 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Lai simplified worker destruction path and internal workqueue locking and there are some other minor changes. Except for the removal of some long-deprecated interfaces which haven't had any in-kernel user for quite a while, there shouldn't be any difference to workqueue users" * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info workqueue: remove the confusing POOL_FREEZING workqueue: rename first_worker() to first_idle_worker() workqueue: remove unused work_clear_pending() workqueue: remove unused WORK_CPU_END workqueue: declare system_highpri_wq workqueue: use generic attach/detach routine for rescuers workqueue: separate pool-attaching code out from create_worker() workqueue: rename manager_mutex to attach_mutex workqueue: narrow the protection range of manager_mutex workqueue: convert worker_idr to worker_ida workqueue: separate iteration role from worker_idr workqueue: destroy worker directly in the idle timeout handler workqueue: async worker destruction workqueue: destroy_worker() should destroy idle workers only workqueue: use manager lock only to protect worker_idr workqueue: Remove deprecated system_nrt[_freezable]_wq workqueue: Remove deprecated flush[_delayed]_work_sync() kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info workqueue: simplify wq_update_unbound_numa() by jumping to use_dfl_pwq if the target cpumask equals wq's	2014-06-09 14:56:49 -07:00
Valdis Kletnieks	015af06e10	kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info This commit did an incorrect printk->pr_info conversion. If we were converting to pr_info() we should lose the log_level parameter. The problem is that this is called (indirectly) by show_regs_print_info(), which is called with various log_levels (from _INFO clear to _EMERG). So we leave it as a printk() call so the desired log_level is applied. Not a full revert, as the other half of the patch is correct. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-28 10:22:34 -04:00
Lai Jiangshan	74b414ead1	workqueue: remove the confusing POOL_FREEZING Currently, the global freezing state is propagated to worker_pools via POOL_FREEZING and then to each workqueue; however, the middle step - propagation through worker_pools - can be skipped as long as one or more max_active adjustments happens for each workqueue after the update to the global state is visible. The global workqueue freezing state and the max_active adjustments during workqueue creation and [un]freezing are serialized with wq_pool_mutex, so it's trivial to guarantee that max_actives stay in sync with global freezing state. POOL_FREEZING is unnecessary and makes the code more confusing and complicates freeze_workqueues_begin() and thaw_workqueues() by requiring them to walk through all pools. Remove POOL_FREEZING and use workqueue_freezing directly instead. tj: Description and comment updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-22 11:35:51 -04:00
Lai Jiangshan	1037de36ed	workqueue: rename first_worker() to first_idle_worker() first_worker() actually returns the first idle workers, the name first_idle_worker() which is self-commnet will be better. All the callers of first_worker() expect it returns an idle worker, the name first_idle_worker() with "idle" notation makes reviewers happier. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-22 11:35:51 -04:00
Ingo Molnar	65c2ce7004	Linux 3.15-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0= =2ShG -----END PGP SIGNATURE----- Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-22 10:28:56 +02:00
Lai Jiangshan	51697d3939	workqueue: use generic attach/detach routine for rescuers There are several problems with the code that rescuers use to bind themselve to the target pool's cpumask. 1) It is very different from how the normal workers bind to cpumask, increasing code complexity and maintenance overhead. 2) The code of cpu-binding for rescuers is complicated. 3) If one or more cpu hotplugs happen while a rescuer is processing its scheduled work items, the rescuer may not stay bound to the cpumask of the pool. This is an allowed behavior, but is still hairy. It will be better if the cpumask of the rescuer is always kept synchronized with the pool across cpu hotplugs. Using generic attach/detach routine will solve the above problems and results in much simpler code. tj: Minor description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:32 -04:00
Lai Jiangshan	4736cbf7a4	workqueue: separate pool-attaching code out from create_worker() Currently, the code to attach a new worker to its pool is embedded in create_worker(). Separating this code out will make the codes clearer and will allow rescuers to share the code path later. tj: Description and comment updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:32 -04:00
Lai Jiangshan	92f9c5c40c	workqueue: rename manager_mutex to attach_mutex manager_mutex is only used to protect the attaching for the pool and the pool->workers list. It protects the pool->workers and operations based on this list, such as: cpu-binding for the workers in the pool->workers the operations to set/clear WORKER_UNBOUND So let's rename manager_mutex to attach_mutex to better reflect its role. This patch is a pure rename. tj: Minor command and description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:32 -04:00
Lai Jiangshan	4d757c5c81	workqueue: narrow the protection range of manager_mutex In create_worker(), as pool->worker_ida now uses ida_simple_get()/ida_simple_put() and doesn't require external synchronization, it doesn't need manager_mutex. struct worker allocation and kthread allocation are not visible by any one before attached, so they don't need manager_mutex either. The above operations are before the attaching operation which attaches the worker to the pool. Between attaching and starting the worker, the worker is already attached to the pool, so the cpu hotplug will handle cpu-binding for the worker correctly and we don't need the manager_mutex after attaching. The conclusion is that only the attaching operation needs manager_mutex, so we narrow the protection section of manager_mutex in create_worker(). Some comments about manager_mutex are removed, because we will rename it to attach_mutex and add worker_attach_to_pool() later which will be self-explanatory. tj: Minor description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:32 -04:00
Lai Jiangshan	7cda9aae05	workqueue: convert worker_idr to worker_ida We no longer iterate workers via worker_idr and worker_idr is used only for allocating/freeing ID, so we can convert it to worker_ida. By using ida_simple_get/remove(), worker_ida doesn't require external synchronization, so we don't need manager_mutex to protect it and the ID-removal code is allowed to be moved out from worker_detach_from_pool(). In a later patch, worker_detach_from_pool() will be used in rescuers which don't have IDs, so we move the ID-removal code out from worker_detach_from_pool() into worker_thread(). tj: Minor description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:31 -04:00
Lai Jiangshan	da028469ba	workqueue: separate iteration role from worker_idr worker_idr has the iteration (iterating for attached workers) and worker ID duties. These two duties don't have to be tied together. We can separate them and use a list for tracking attached workers and iteration. Before this separation, it wasn't possible to add rescuer workers to worker_idr due to rescuer workers couldn't allocate ID dynamically because ID-allocation depends on memory-allocation, which rescuer can't depend on. After separation, we can easily add the rescuer workers to the list for iteration without any memory-allocation. It is required when we attach the rescuer worker to the pool in later patch. tj: Minor description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:31 -04:00
Lai Jiangshan	3347fc9f36	workqueue: destroy worker directly in the idle timeout handler Since destroy_worker() doesn't need to sleep nor require manager_mutex, destroy_worker() can be directly called in the idle timeout handler, it helps us remove POOL_MANAGE_WORKERS and maybe_destroy_worker() and simplify the manage_workers() After POOL_MANAGE_WORKERS is removed, worker_thread() doesn't need to test whether it needs to manage after processed works. So we can remove the test branch. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-05-20 10:59:31 -04:00
Lai Jiangshan	60f5a4bcf8	workqueue: async worker destruction worker destruction includes these parts of code: adjust pool's stats remove the worker from idle list detach the worker from the pool kthread_stop() to wait for the worker's task exit free the worker struct We can find out that there is no essential work to do after kthread_stop(), which means destroy_worker() doesn't need to wait for the worker's task exit, so we can remove kthread_stop() and free the worker struct in the worker exiting path. However, put_unbound_pool() still needs to sync the all the workers' destruction before destroying the pool; otherwise, the workers may access to the invalid pool when they are exiting. So we also move the code of "detach the worker" to the exiting path and let put_unbound_pool() to sync with this code via detach_completion. The code of "detach the worker" is wrapped in a new function "worker_detach_from_pool()" although worker_detach_from_pool() is only called once (in worker_thread()) after this patch, but we need to wrap it for these reasons: 1) The code of "detach the worker" is not short enough to unfold them in worker_thread(). 2) the name of "worker_detach_from_pool()" is self-comment, and we add some comments above the function. 3) it will be shared by rescuer in later patch which allows rescuer and normal thread use the same attach/detach frameworks. The worker id is freed when detaching which happens before the worker is fully dead, but this id of the dying worker may be re-used for a new worker, so the dying worker's task name is changed to "worker/dying" to avoid two or several workers having the same name. Since "detach the worker" is moved out from destroy_worker(), destroy_worker() doesn't require manager_mutex, so the "lockdep_assert_held(&pool->manager_mutex)" in destroy_worker() is removed, and destroy_worker() is not protected by manager_mutex in put_unbound_pool(). tj: Minor description updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:30 -04:00
Lai Jiangshan	73eb7fe73a	workqueue: destroy_worker() should destroy idle workers only We used to have the CPU online failure path where a worker is created and then destroyed without being started. A worker was created for the CPU coming online and if the online operation failed the created worker was shut down without being started. But this behavior was changed. The first worker is created and started at the same time for the CPU coming online. It means that we had already ensured in the code that destroy_worker() destroys only idle workers and we don't want to allow it to destroy any non-idle worker in the future. Otherwise, it may be buggy and it may be extremely hard to check. We should force destroy_worker() to destroy only idle workers explicitly. Since destroy_worker() destroys only idle workers, this patch does not change any functionality. We just need to update the comments and the sanity check code. In the sanity check code, we will refuse to destroy the worker if !(worker->flags & WORKER_IDLE). If the worker entered idle which means it is already started, so we remove the check of "worker->flags & WORKER_STARTED", after this removal, WORKER_STARTED is totally unneeded, so we remove WORKER_STARTED too. In the comments for create_worker(), "Create a new worker which is bound..." is changed to "... which is attached..." due to we change the name of this behavior to attaching. tj: Minor description / comment updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:30 -04:00
Lai Jiangshan	9625ab1727	workqueue: use manager lock only to protect worker_idr worker_idr is highly bound to managers and is always/only accessed in manager lock context. So we don't need pool->lock for it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-20 10:59:30 -04:00
Fabian Frederick	2d916033a3	kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info tj: Refreshed on top of wq/for-3.16. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-05-12 13:59:35 -04:00
Daeseok Youn	534a3fbb3f	workqueue: simplify wq_update_unbound_numa() by jumping to use_dfl_pwq if the target cpumask equals wq's wq_update_unbound_numa(), when it's decided that the newly updated cpumask equals the default, looks at whether the current pwq is already the default one and skips setting pwq to the default one. This extra step is unnecessary and we can always jump to use_dfl_pwq instead. Simplify the code by removing the conditional. This doesn't make any functional difference. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-04-18 15:50:34 -04:00
Lai Jiangshan	77668c8b55	workqueue: fix a possible race condition between rescuer and pwq-release There is a race condition between rescuer_thread() and pwq_unbound_release_workfn(). Even after a pwq is scheduled for rescue, the associated work items may be consumed by any worker. If all of them are consumed before the rescuer gets to them and the pwq's base ref was put due to attribute change, the pwq may be released while still being linked on @wq->maydays list making the rescuer dereference already freed pwq later. Make send_mayday() pin the target pwq until the rescuer is done with it. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.10+	2014-04-18 12:33:29 -04:00
Lai Jiangshan	4d595b866d	workqueue: make rescuer_thread() empty wq->maydays list before exiting After a @pwq is scheduled for emergency execution, other workers may consume the affectd work items before the rescuer gets to them. This means that a workqueue many have pwqs queued on @wq->maydays list while not having any work item pending or in-flight. If destroy_workqueue() executes in such condition, the rescuer may exit without emptying @wq->maydays. This currently doesn't cause any actual harm. destroy_workqueue() can safely destroy all the involved data structures whether @wq->maydays is populated or not as nobody access the list once the rescuer exits. However, this is nasty and makes future development difficult. Let's update rescuer_thread() so that it empties @wq->maydays after seeing should_stop to guarantee that the list is empty on rescuer exit. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.10+	2014-04-18 11:04:16 -04:00
Dongsheng Yang	8698a745d8	sched, treewide: Replace hardcoded nice values with MIN_NICE/MAX_NICE Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com Cc: devel@driverdev.osuosl.org Cc: devicetree@vger.kernel.org Cc: fcoe-devel@open-fcoe.org Cc: linux390@de.ibm.com Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: nbd-general@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: openipmi-developer@lists.sourceforge.net Cc: qla2xxx-upstream@qlogic.com Cc: linux-arch@vger.kernel.org [ Consolidated the patches, twiddled the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 12:07:24 +02:00
Daeseok Youn	77f300b198	workqueue: fix bugs in wq_update_unbound_numa() failure path wq_update_unbound_numa() failure path has the following two bugs. - alloc_unbound_pwq() is called without holding wq->mutex; however, if the allocation fails, it jumps to out_unlock which tries to unlock wq->mutex. - The function should switch to dfl_pwq on failure but didn't do so after alloc_unbound_pwq() failure. Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on alloc_unbound_pwq() failure. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Fixes: `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues")	2014-04-16 13:27:57 -04:00
Linus Torvalds	1ead658124	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Thomas Gleixner: "This assorted collection provides: - A new timer based timer broadcast feature for systems which do not provide a global accessible timer device. That allows those systems to put CPUs into deep idle states where the per cpu timer device stops. - A few NOHZ_FULL related improvements to the timer wheel - The usual updates to timer devices found in ARM SoCs - Small improvements and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) tick: Remove code duplication in tick_handle_periodic() tick: Fix spelling mistake in tick_handle_periodic() x86: hpet: Use proper destructor for delayed work workqueue: Provide destroy_delayed_work_on_stack() clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS timer: Remove code redundancy while calling get_nohz_timer_target() hrtimer: Rearrange comments in the order struct members are declared timer: Use variable head instead of &work_list in __run_timers() clocksource: exynos_mct: silence a static checker warning arm: zynq: Add support for cpufreq arm: zynq: Don't use arm_global_timer with cpufreq clocksource/cadence_ttc: Overhaul clocksource frequency adjustment clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI sh: Remove Kconfig entries for TMU, CMT and MTU2 ARM: shmobile: Remove CMT, TMU and STI Kconfig entries clocksource: armada-370-xp: Use atomic access for shared registers clocksource: orion: Use atomic access for shared registers clocksource: timer-keystone: Delete unnecessary variable clocksource: timer-keystone: introduce clocksource driver for Keystone ...	2014-04-01 11:00:07 -07:00
Thomas Gleixner	ea2e64f280	workqueue: Provide destroy_delayed_work_on_stack() If a delayed or deferrable work is on stack we need to tell debug objects that we are destroying the timer and the work. Otherwise we leak the tracking object. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-25 17:33:42 +01:00
Dongsheng Yang	144818422b	workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/6d85138180c00ce86975addab6e34b24b84f00a5.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-22 18:16:41 +01:00
Lai Jiangshan	5bdfff96c6	workqueue: ensure @task is valid across kthread_stop() When a kworker should die, the kworkre is notified through WORKER_DIE flag instead of kthread_should_stop(). This, IIRC, is primarily to keep the test synchronized inside worker_pool lock. WORKER_DIE is first set while holding pool->lock, the lock is dropped and kthread_stop() is called. Unfortunately, this means that there's a slight chance that the target kworker may see WORKER_DIE before kthread_stop() finishes and exits and frees the target task before or during kthread_stop(). Fix it by pinning the target task before setting WORKER_DIE and putting it after kthread_stop() is done. tj: Improved patch description and comment. Moved pinning above WORKER_DIE for better signify what it's protecting. CC: stable@vger.kernel.org Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-02-18 16:35:20 -05:00
Linus Torvalds	4a2829b976	Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue update from Tejun Heo: "Just one patch to add destroy_work_on_stack() annotations to help debugobj debugging" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()	2014-01-21 17:46:31 -08:00
Chuansheng Liu	440a113603	workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to call destroy_work_on_stack() which frees the debug object to pair with INIT_WORK_ONSTACK(). Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-01-11 22:26:33 -05:00
Linus Torvalds	9199c4caa1	PCI updates for v3.13: PCI device hotplug - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael J. Wysocki) Host bridge drivers - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn Helgaas) - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin (Jason Gunthorpe) Miscellaneous - Avoid unnecessary CPU switch when calling .probe() (Alexander Duyck) - Revert "workqueue: allow work_on_cpu() to be called recursively" (Bjorn Helgaas) - Disable Bus Master only on kexec reboot (Khalid Aziz) - Omit PCI ID macro strings to shorten quirk names for LTO (Michal Marek) MAINTAINERS \| 33 +++++++++++++++++++++++++++++++++ drivers/pci/host/pci-mvebu.c \| 5 +++++ drivers/pci/pci-driver.c \| 38 ++++++++++++++++++++++++++++++-------- drivers/pci/remove.c \| 4 +++- include/linux/kexec.h \| 3 +++ include/linux/pci.h \| 30 +++++++++++++++--------------- kernel/kexec.c \| 4 ++++ kernel/workqueue.c \| 32 ++++++++++---------------------- 8 files changed, 103 insertions(+), 46 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSrLDLAAoJEFmIoMA60/r8JO4P/iakGDezwXcd9YTbdaq/NmEK 31JtBao9rJNSUXpFGaFKGm99A479QNb+Fo1o5NaysyGbcAA2W0HeCkaee13hpw2D l3wvzJRoEwqVySOzMwlGsxxhYuvGiZ2WVALNkwBzwyCiYtypNnUtGNCSp/3J6XKp 2ZUjoagyixdS4oYoR4irZucPBWwzW89Gx4oJ0rBttNVsXjiT1D2OzYDYTuxMyb2E ZjXJqTUYzfwiFxqKPrGOabUPD9GX3EWaFmu01FLlsSznPZkk9yJR3/eq6I6utp/E WSpP+7v3jnPHjZVky1FdHaP5wqurtNRsiWBQVyNYF+M5WofKA1hGA+niJDEB/pVL vE74fJfZzpET1jlZR+7Z5qHPxW2A8Q2ARn74A42DYLQGmJEh7VdARcayV1E4diRH DzEnhf33b9bwIC1Q0cESWZ5RH4r2Xbjg0a1qoVQYi5VEJEMWXID3Ofk64Bd9QOz4 oLZ27V7clurD+gNarch/zgpd9LVIHLFriR2YWPpA0Iwy9EjueH8GsiOS3NqDn2SQ sQg5utE4vdnix4VrCvvbufAr3kJngndtuj/s7I3lZqi7nCyS+jeFFvMUd9h3MUqZ rs1IN1/qeTBBNi2dtmcKDN2ItahCoBhTI37JRaijZwe+B5m6mTe049it+E0SZPI2 IqLOYWL4ggT0se/cMXld =Edvk -----END PGP SIGNATURE----- Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI device hotplug - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael Wysocki) Host bridge drivers - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn Helgaas) - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin (Jason Gunthorpe) Miscellaneous - Avoid unnecessary CPU switch when calling .probe() (Alexander Duyck) - Revert "workqueue: allow work_on_cpu() to be called recursively" (Bjorn Helgaas) - Disable Bus Master only on kexec reboot (Khalid Aziz) - Omit PCI ID macro strings to shorten quirk names for LTO (Michal Marek)" * tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers PCI: Disable Bus Master only on kexec reboot PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin PCI: Omit PCI ID macro strings to shorten quirk names PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev() Revert "workqueue: allow work_on_cpu() to be called recursively" PCI: Avoid unnecessary CPU switch when calling driver .probe() method	2013-12-15 11:45:27 -08:00
Bjorn Helgaas	12997d1a99	Revert "workqueue: allow work_on_cpu() to be called recursively" This reverts commit `c2fda50966`. `c2fda50966` removed lockdep annotation from work_on_cpu() to work around the PCI path that calls work_on_cpu() from within a work_on_cpu() work item (PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver .probe method). 961da7fb6b22 ("PCI: Avoid unnecessary CPU switch when calling driver .probe() method) avoids that recursive work_on_cpu() use in a different way, so this revert restores the work_on_cpu() lockdep annotation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Tejun Heo <tj@kernel.org>	2013-11-25 14:37:22 -07:00
Li Bin	4e8b22bd1a	workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues When one work starts execution, the high bits of work's data contain pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID is assigned WORK_OFFQ_POOL_NONE when the work being initialized indicating that no pool is associated and get_work_pool() uses it to check the associated pool. So if worker_pool_assign_id() assigns a ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break the non-reentrance guarantee. This patch fix this issue by modifying the worker_pool_assign_id() function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE. Furthermore, in the current implementation, the BUILD_BUG_ON() in init_workqueues makes no sense. The number of worker pools needed cannot be determined at compile time, because the number of backing pools for UNBOUND workqueues is dynamic based on the assigned custom attributes. So remove it. tj: Minor comment and indentation updates. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-11-22 18:14:47 -05:00

... 3 4 5 6 7 ...

890 Commits