linux-stable

Commit Graph

Author	SHA1	Message	Date
Tejun Heo	af73f5c9fe	workqueue: Rename workqueue_attrs->no_numa to ->ordered With the recent removal of NUMA related module param and sysfs knob, workqueue_attrs->no_numa is now only used to implement ordered workqueues. Let's rename the field so that it's less confusing especially with the planned CPU affinity awareness improvements. Just a rename. No functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	636b927eba	workqueue: Make unbound workqueues to use per-cpu pool_workqueues A pwq (pool_workqueue) represents an association between a workqueue and a worker_pool. When a work item is queued, the workqueue selects the pwq to use, which in turn determines the pool, and queues the work item to the pool through the pwq. pwq is also what implements the maximum concurrency limit - @max_active. As a per-cpu workqueue should be assocaited with a different worker_pool on each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq. However, unbound workqueues were sharing a pwq within each NUMA node by default. The sharing has several downsides: * Because @max_active is per-pwq, the meaning of @max_active changes depending on the machine configuration and whether workqueue NUMA locality support is enabled. * Makes per-cpu and unbound code deviate. * Gets in the way of making workqueue CPU locality awareness more flexible. This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu workqueues do by making the following changes: * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound workqueues. * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs the specified pwq to the target CPU's wq->cpu_pwq. * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq. This makes the return value of wq_calc_node_cpumask() unnecessary. It now returns void. * @max_active now means the same thing for both per-cpu and unbound workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer used in workqueue implementation and will be removed later. * All unbound pwq operations which used to be per-numa-node are now per-cpu. For most unbound workqueue users, this shouldn't cause noticeable changes. Work item issue and completion will be a small bit faster, flush_workqueue() would become a bit more expensive, and the total concurrency limit would likely become higher. All @max_active==1 use cases are currently being audited for conversion into alloc_ordered_workqueue() and they shouldn't be affected once the audit and conversion is complete. One area where the behavior change may be more noticeable is workqueue_congested() as the reported congestion state is now per CPU instead of NUMA node. There are only two users of this interface - drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are cc'd. Inputs on the behavior change would be very much appreciated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Karsten Graul <kgraul@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com>	2023-08-07 15:57:23 -10:00
Tejun Heo	4cbfd3de73	workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug When a CPU went online or offline, wq_update_unbound_numa() was called only on the CPU which was going up or down. This works fine because all CPUs on the same NUMA node share the same pool_workqueue slot - one CPU updating it updates it for everyone in the node. However, future changes will make each CPU use a separate pool_workqueue even when they're sharing the same worker_pool, which requires updating pool_workqueue's for all CPUs which may be sharing the same pool_workqueue on hotplug. To accommodate the planned changes, this patch updates workqueue_on/offline_cpu() so that they call wq_update_unbound_numa() for all CPUs sharing the same NUMA node as the CPU going up or down. In the current code, the second+ calls would be noops and there shouldn't be any behavior changes. * As wq_update_unbound_numa() is now called on multiple CPUs per each hotplug event, @cpu is renamed to @hotplug_cpu and another @cpu argument is added. The former indicates the CPU being hot[un]plugged and the latter the CPU whose pool_workqueue is being updated. * In wq_update_unbound_numa(), cpu_off is renamed to off_cpu for consistency with the new @hotplug_cpu. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	687a9aa56f	workqueue: Make per-cpu pool_workqueues allocated and released like unbound ones Currently, all per-cpu pwq's (pool_workqueue's) are allocated directly through a per-cpu allocation and thus, unlike unbound workqueues, not reference counted. This difference in lifetime management between the two types is a bit confusing. Unbound workqueues are currently accessed through wq->numa_pwq_tbl[] which isn't suitiable for the planned CPU locality related improvements. The plan is to unify pwq handling across per-cpu and unbound workqueues so that they're always accessed through wq->cpu_pwq. In preparation, this patch makes per-cpu pwq's to be allocated, reference counted and released the same way as unbound pwq's. wq->cpu_pwq now holds pointers to pwq's instead of containing them directly. pwq_unbound_release_workfn() is renamed to pwq_release_workfn() as it's now also used for per-cpu work items. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	967b494e2f	workqueue: Use a kthread_worker to release pool_workqueues pool_workqueue release path is currently bounced to system_wq; however, this is a bit tricky because this bouncing occurs while holding a pool lock and thus has risk of causing a A-A deadlock. This is currently addressed by the fact that only unbound workqueues use this bouncing path and system_wq is a per-cpu workqueue. While this works, it's brittle and requires a work-around like setting the lockdep subclass for the lock of unbound pools. Besides, future changes will use the bouncing path for per-cpu workqueues too making the current approach unusable. Let's just use a dedicated kthread_worker to untangle the dependency. This is just one more kthread for all workqueues and makes the pwq release logic simpler and more robust. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	fcecfa8f27	workqueue: Remove module param disable_numa and sysfs knobs pool_ids and numa Unbound workqueue CPU affinity is going to receive an overhaul and the NUMA specific knobs won't make sense anymore. Remove them. Also, the pool_ids knob was used for debugging and not really meaningful given that there is no visibility into the pools associated with those IDs. Remove it too. A future patch will improve overall visibility. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	797e8345cb	workqueue: Relocate worker and work management functions Collect first_idle_worker(), worker_enter/leave_idle(), find_worker_executing_work(), move_linked_works() and wake_up_worker() into one place. These functions will later be used to implement higher level worker management logic. No functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	ee1ceef727	workqueue: Rename wq->cpu_pwqs to wq->cpu_pwq wq->cpu_pwqs is a percpu variable carraying one pointer to a pool_workqueue. The field name being plural is unusual and confusing. Rename it to singular. This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:23 -10:00
Tejun Heo	fe089f87cc	workqueue: Not all work insertion needs to wake up a worker insert_work() always tried to wake up a worker; however, the only time it needs to try to wake up a worker is when a new active work item is queued. When a work item goes on the inactive list or queueing a flush work item, there's no reason to try to wake up a worker. This patch moves the worker wakeup logic out of insert_work() and places it in the active new work item queueing path in __queue_work(). While at it: * __queue_work() is dereferencing pwq->pool repeatedly. Add local variable pool. * Every caller of insert_work() calls debug_work_activate(). Consolidate the invocations into insert_work(). * In __queue_work() pool->watchdog_ts update is relocated slightly. This is to better accommodate future changes. This makes wakeups more precise and will help the planned change to assign work items to workers before waking them up. No behavior changes intended. v2: WARN_ON_ONCE(pool != last_pool) added in __queue_work() to clarify as suggested by Lai. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>	2023-08-07 15:57:22 -10:00
Tejun Heo	c0ab017d43	workqueue: Cleanups around process_scheduled_works() * Drop the trivial optimization in worker_thread() where it bypasses calling process_scheduled_works() if the first work item isn't linked. This is a mostly pointless micro optimization and gets in the way of improving the work processing path. * Consolidate pool->watchdog_ts updates in the two callers into process_scheduled_works(). Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:22 -10:00
Tejun Heo	bc8b50c2df	workqueue: Drop the special locking rule for worker->flags and worker_pool->flags worker->flags used to be accessed from scheduler hooks without grabbing pool->lock for concurrency management. This is no longer true since `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock"). Also, it's unclear why worker_pool->flags was using the "X" rule. All relevant users are accessing it under the pool lock. Let's drop the special "X" rule and use the "L" rule for these flag fields instead. While at it, replace the CONTEXT comment with lockdep_assert_held(). This allows worker_set/clr_flags() to be used from context which isn't the worker itself. This will be used later to implement assinging work items to workers before waking them up so that workqueue can have better control over which worker executes which work item on which CPU. The only actual changes are sanity checks. There shouldn't be any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:57:22 -10:00
Tejun Heo	87437656c2	workqueue: Merge branch 'for-6.5-fixes' into for-6.6 Unbound workqueue execution locality improvement patchset is about to applied which will cause merge conflicts with changes in for-6.5-fixes. Let's avoid future merge conflict by pulling in for-6.5-fixes. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 15:54:25 -10:00
Yang Yingliang	9680540c0c	workqueue: use LIST_HEAD to initialize cull_list Use LIST_HEAD() to initialize cull_list instead of open-coding it. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-08-07 08:36:51 -10:00
Tejun Heo	aa6fde93f3	workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items. Once detected, they're excluded from concurrency management to prevent them from blocking other per-cpu work items. If CONFIG_WQ_CPU_INTENSIVE_REPORT is enabled, repeat offenders are also reported so that the code can be updated. The default threshold is 10ms which is long enough to do fair bit of work on modern CPUs while short enough to be usually not noticeable. This unfortunately leads to a lot of, arguable spurious, detections on very slow CPUs. Using the same threshold across CPUs whose performance levels may be apart by multiple levels of magnitude doesn't make whole lot of sense. This patch scales up wq_cpu_intensive_thresh_us upto 1 second when BogoMIPS is below 4000. This is obviously very inaccurate but it doesn't have to be accurate to be useful. The mechanism is still useful when the threshold is fully scaled up and the benefits of reports are usually shared with everyone regardless of who's reporting, so as long as there are sufficient number of fast machines reporting, we don't lose much. Some (or is it all?) ARM CPUs systemtically report significantly lower BogoMIPS. While this doesn't break anything, given how widespread ARM CPUs are, it's at least a missed opportunity and it probably would be a good idea to teach workqueue about it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>	2023-07-25 11:49:57 -10:00
tiozhang	ace3c5499e	workqueue: add cmdline parameter `workqueue.unbound_cpus` to further constrain wq_unbound_cpumask at boot time Motivation of doing this is to better improve boot times for devices when we want to prevent our workqueue works from running on some specific CPUs, e,g, some CPUs are busy with interrupts. Signed-off-by: tiozhang <tiozhang@didiglobal.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-07-10 10:42:51 -10:00
Tetsuo Handa	20bdedafd2	workqueue: Warn attempt to flush system-wide workqueues. Based on commit `c4f135d643` ("workqueue: Wrap flush_workqueue() using a macro"), all in-tree users stopped flushing system-wide workqueues. Therefore, start emitting runtime message so that all out-of-tree users will understand that they need to update their code. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-07-10 10:39:17 -10:00
Linus Torvalds	7ab044a4f4	workqueue: Changes for v6.5 * Concurrency-managed per-cpu work items that hog CPUs and delay the execution of other work items are now automatically detected and excluded from concurrency management. Reporting on such work items can also be enabled through a config option. * Added tools/workqueue/wq_monitor.py which improves visibility into workqueue usages and behaviors. * Includes Arnd's minimal fix for gcc-13 enum warning on 32bit compiles. This conflicts with `afa4bb778e` ("workqueue: clean up WORK_* constant types, clarify masking") in master. Can be resolved by picking the master version. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoGvw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGZu0AP9IGK2opAzO9i3i1/Ys81b3sHi9PwrYWH3g252T Oe3O6QD/Wh0wYBVl0o7IdW6BGdd5iNwIEs420G53UmmPrATqsgQ= =TffY -----END PGP SIGNATURE----- Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: - Concurrency-managed per-cpu work items that hog CPUs and delay the execution of other work items are now automatically detected and excluded from concurrency management. Reporting on such work items can also be enabled through a config option. - Added tools/workqueue/wq_monitor.py which improves visibility into workqueue usages and behaviors. - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles, superseded by commit `afa4bb778e` in mainline. * tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0 workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() workqueue: fix enum type for gcc-13 workqueue: Track and monitor per-workqueue CPU time usage workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE workqueue: Improve locking rule description for worker fields workqueue: Move worker_set/clr_flags() upwards workqueue: Re-order struct worker fields workqueue: Add pwq->stats[] and a monitoring script Further upgrade queue_work_on() comment	2023-06-27 16:32:52 -07:00
Linus Torvalds	afa4bb778e	workqueue: clean up WORK_* constant types, clarify masking Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 \| return (void )(data & WORK_STRUCT_WQ_DATA_MASK); \| ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void ' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-06-23 12:08:14 -07:00
Zqiang	18c8ae8131	workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0 If workqueue.cpu_intensive_thresh_us is set to 0, the detection mechanism for CPU-hogging per-cpu work item will keep triggering spuriously: workqueue: process_srcu hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND workqueue: gc_worker hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND workqueue: gc_worker hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND workqueue: wait_rcu_exp_gp hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND workqueue: kfree_rcu_monitor hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND workqueue: kfree_rcu_monitor hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND workqueue: reg_todo hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND This commit therefore disables the CPU-hog detection mechanism when workqueue.cpu_intensive_thresh_us is set to 0. tj: Patch description updated and the condition check on cpu_intensive_thresh_us separated into a separate if statement for readability. Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-05-25 10:46:53 -10:00
Zqiang	c8f6219be2	workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() Currently, pool->nr_running can be modified from timer tick, that means the timer tick can run nested inside a not-irq-protected section that's in the process of modifying nr_running. Consider the following scenario: CPU0 kworker/0:2 (events) worker_clr_flags(worker, WORKER_PREP \| WORKER_REBOUND); ->pool->nr_running++; (1) process_one_work() ->worker->current_func(work); ->schedule() ->wq_worker_sleeping() ->worker->sleeping = 1; ->pool->nr_running--; (0) .... ->wq_worker_running() .... CPU0 by interrupt: wq_worker_tick() ->worker_set_flags(worker, WORKER_CPU_INTENSIVE); ->pool->nr_running--; (-1) ->worker->flags \|= WORKER_CPU_INTENSIVE; .... ->if (!(worker->flags & WORKER_NOT_RUNNING)) ->pool->nr_running++; (will not execute) ->worker->sleeping = 0; .... ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE); ->pool->nr_running++; (0) .... worker_set_flags(worker, WORKER_PREP); ->pool->nr_running--; (-1) .... worker_enter_idle() ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running); if the nr_workers is equal to nr_idle, due to the nr_running is not zero, will trigger WARN_ON_ONCE(). [ 2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0 [ 2.462163] Modules linked in: [ 2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1 [ 2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 2.465127] Workqueue: 0x0 (events) [ 2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0 ... [ 2.472614] Call Trace: [ 2.473152] <TASK> [ 2.474182] worker_thread+0x71/0x430 [ 2.474992] ? _raw_spin_unlock_irqrestore+0x28/0x50 [ 2.475263] kthread+0x103/0x120 [ 2.475493] ? __pfx_worker_thread+0x10/0x10 [ 2.476355] ? __pfx_kthread+0x10/0x10 [ 2.476635] ret_from_fork+0x2c/0x50 [ 2.477051] </TASK> This commit therefore add the check of worker->sleeping in wq_worker_tick(), if the worker->sleeping is not zero, directly return. tj: Updated comment and description. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-05-24 11:57:26 -10:00
Tejun Heo	8a1dd1e547	workqueue: Track and monitor per-workqueue CPU time usage Now that wq_worker_tick() is there, we can easily track the rough CPU time consumption of each workqueue by charging the whole tick whenever a tick hits an active workqueue. While not super accurate, it provides reasonable visibility into the workqueues that consume a lot of CPU cycles. wq_monitor.py is updated to report the per-workqueue CPU times. v2: wq_monitor.py was using "cputime" as the key when outputting in json format. Use "cpu_time" instead for consistency with other fields. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-05-17 17:02:09 -10:00
Tejun Heo	6363845005	workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism Workqueue now automatically marks per-cpu work items that hog CPU for too long as CPU_INTENSIVE, which excludes them from concurrency management and prevents stalling other concurrency-managed work items. If a work function keeps running over the thershold, it likely needs to be switched to use an unbound workqueue. This patch adds a debug mechanism which tracks the work functions which trigger the automatic CPU_INTENSIVE mechanism and report them using pr_warn() with exponential backoff. v3: Documentation update. v2: Drop bouncing to kthread_worker for printing messages. It was to avoid introducing circular locking dependency through printk but not effective as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's just print directly using printk_deferred(). Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org>	2023-05-17 17:02:08 -10:00
Tejun Heo	616db8779b	workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE If a per-cpu work item hogs the CPU, it can prevent other work items from starting through concurrency management. A per-cpu workqueue which intends to host such CPU-hogging work items can choose to not participate in concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be error-prone and difficult to debug when missed. This patch adds an automatic CPU usage based detection. If a concurrency-managed work item consumes more CPU time than the threshold (10ms by default) continuously without intervening sleeps, wq_worker_tick() which is called from scheduler_tick() will detect the condition and automatically mark it CPU_INTENSIVE. The mechanism isn't foolproof: * Detection depends on tick hitting the work item. Getting preempted at the right timings may allow a violating work item to evade detection at least temporarily. * nohz_full CPUs may not be running ticks and thus can fail detection. * Even when detection is working, the 10ms detection delays can add up if many CPU-hogging work items are queued at the same time. However, in vast majority of cases, this should be able to detect violations reliably and provide reasonable protection with a small increase in code complexity. If some work items trigger this condition repeatedly, the bigger problem likely is the CPU being saturated with such per-cpu work items and the solution would be making them UNBOUND. The next patch will add a debug mechanism to help spot such cases. v4: Documentation for workqueue.cpu_intensive_thresh_us added to kernel-parameters.txt. v3: Switch to use wq_worker_tick() instead of hooking into preemptions as suggested by Peter. v2: Lai pointed out that wq_worker_stopping() also needs to be called from preemption and rtlock paths and an earlier patch was updated accordingly. This patch adds a comment describing the risk of infinte recursions and how they're avoided. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>	2023-05-17 17:02:08 -10:00
Tejun Heo	bdf8b9bfc1	workqueue: Improve locking rule description for worker fields * Some worker fields are modified only by the worker itself while holding pool->lock thus making them safe to read from self, IRQ context if the CPU is running the worker or while holding pool->lock. Add 'K' locking rule for them. * worker->sleeping is currently marked "None" which isn't very descriptive. It's used only by the worker itself. Add 'S' locking rule for it. A future patch will depend on the 'K' rule to access worker->current_* from the scheduler ticks. Signed-off-by: Tejun Heo <tj@kernel.org>	2023-05-17 17:02:08 -10:00
Tejun Heo	c54d5046a0	workqueue: Move worker_set/clr_flags() upwards They are going to be used in wq_worker_stopping(). Move them upwards. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>	2023-05-17 17:02:08 -10:00
Tejun Heo	725e8ec59c	workqueue: Add pwq->stats[] and a monitoring script Currently, the only way to peer into workqueue operations is through tracing. While possible, it isn't easy or convenient to monitor per-workqueue behaviors over time this way. Let's add pwq->stats[] that track relevant events and a drgn monitoring script - tools/workqueue/wq_monitor.py. It's arguable whether this needs to be configurable. However, it currently only has several counters and the runtime overhead shouldn't be noticeable given that they're on pwq's which are per-cpu on per-cpu workqueues and per-numa-node on unbound ones. Let's keep it simple for the time being. v2: Patch reordered to earlier with fewer fields. Field will be added back gradually. Help message improved. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>	2023-05-17 17:02:08 -10:00
Paul E. McKenney	854f5cc5b7	Further upgrade queue_work_on() comment The current queue_work_on() docbook comment says that the caller must ensure that the specified CPU can't go away, and further says that the penalty for failing to nail down the specified CPU is that the workqueue handler might find itself executing on some other CPU. This is true as far as it goes, but fails to note what happens if the specified CPU never was online. Therefore, further expand this comment to say that specifying a CPU that was never online will result in a splat. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-05-09 05:37:42 -10:00
Linus Torvalds	cd546fa325	workqueue changes for v6.4-rc1 Mostly changes from Petr to improve warning and error reporting. Workqueue now reports more of the relevant failures with better context which should help debugging. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErbOg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGfipAP9BXrUBI99uZL7XjSEe6rLWa0STODyWh67ikR+0 nUO22QEA+ttt5ecLZyopYB18Le06VBBW7J5cs0Cg83bCBTGb4wE= =gHbc -----END PGP SIGNATURE----- Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Mostly changes from Petr to improve warning and error reporting. Workqueue now reports more of the relevant failures with better context which should help debugging" * tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Introduce show_freezable_workqueues workqueue: Print backtraces from CPUs with hung CPU bound workqueues workqueue: Warn when a rescuer could not be created workqueue: Interrupted create_worker() is not a repeated event workqueue: Warn when a new worker could not be created workqueue: Fix hung time report of worker pools workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu() MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry	2023-04-29 09:48:52 -07:00
Jungseung Lee	704bc669e1	workqueue: Introduce show_freezable_workqueues Currently show_all_workqueue is called if freeze fails at the time of freeze the workqueues, which shows the status of all workqueues and of all worker pools. In this cases we may only need to dump state of only workqueues that are freezable and busy. This patch defines show_freezable_workqueues, which uses show_one_workqueue, a granular function that shows the state of individual workqueues, so that dump only the state of freezable workqueues at that time. tj: Minor message adjustment. Signed-off-by: Jungseung Lee <js07.lee@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-23 15:55:38 -10:00
Petr Mladek	cd2440d66f	workqueue: Print backtraces from CPUs with hung CPU bound workqueues The workqueue watchdog reports a lockup when there was not any progress in the worker pool for a long time. The progress means that a pending work item starts being proceed. Worker pools for unbound workqueues always wake up an idle worker and try to process the work immediately. The last idle worker has to create new worker first. The stall might happen only when a new worker could not be created in which case an error should get printed. Another problem might be too high load. In this case, workers are victims of a global system problem. Worker pools for CPU bound workqueues are designed for lightweight work items that do not need much CPU time. They are proceed one by one on a single worker. New worker is used only when a work is sleeping. It creates one additional scenario. The stall might happen when the CPU-bound workqueue is used for CPU-intensive work. More precisely, the stall is detected when a CPU-bound worker is in the TASK_RUNNING state for too long. In this case, it might be useful to see the backtrace from the problematic worker. The information how long a worker is in the running state is not available. But the CPU-bound worker pools do not have many workers in the running state by definition. And only few pools are typically blocked. It should be acceptable to print backtraces from all workers in TASK_RUNNING state in the stalled worker pools. The number of false positives should be very low. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 12:03:47 -10:00
Petr Mladek	4c0736a76a	workqueue: Warn when a rescuer could not be created Rescuers are created when a workqueue with WQ_MEM_RECLAIM is allocated. It typically happens during the system boot. systemd switches the root filesystem from initrd to the booted system during boot. It kills processes that block the switch for too long. One of the process might be modprobe that tries to create a workqueue. These problems are hard to reproduce. Also alloc_workqueue() does not pass the error code. Make the debugging easier by printing an error, similar to create_worker(). Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 12:03:46 -10:00
Petr Mladek	60f540389a	workqueue: Interrupted create_worker() is not a repeated event kthread_create_on_node() might get interrupted(). It is rare but realistic. For example, when an unbound workqueue is allocated in module_init() callback. It is done in the context of the "modprobe" process. And, for example, systemd might kill pending processes when switching root from initrd to the booted system. The interrupt is a one-off event and the race might be hard to reproduce. It is always worth printing. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 12:03:46 -10:00
Petr Mladek	3f0ea0b864	workqueue: Warn when a new worker could not be created The workqueue watchdog reports a lockup when there was not any progress in the worker pool for a long time. The progress means that a pending work item starts being proceed. The progress is guaranteed by using idle workers or creating new workers for pending work items. There are several reasons why a new worker could not be created: + there is not enough memory + there is no free pool ID (IDR API) + the system reached PID limit + the process creating the new worker was interrupted + the last idle worker (manager) has not been scheduled for a long time. It was not able to even start creating the kthread. None of these failures is reported at the moment. The only clue is that show_one_worker_pool() prints that there is a manager. It is the last idle worker that is responsible for creating a new one. But it is not clear if create_worker() is failing and why. Make the debugging easier by printing errors in create_worker(). The error code is important, especially from kthread_create_on_node(). It helps to distinguish the various reasons. For example, reaching memory limit (-ENOMEM), other system limits (-EAGAIN), or process interrupted (-EINTR). Use pr_once() to avoid repeating the same error every CREATE_COOLDOWN for each stuck worker pool. Ratelimited printk() might be better. It would help to know if the problem remains. It would be more clear if the create_worker() errors and workqueue stalls are related. Also old messages might get lost when the internal log buffer is full. The problem is that printk() might touch the watchdog. For example, see touch_nmi_watchdog() in serial8250_console_write(). It would require synchronization of the begin and length of the ratelimit interval with the workqueue watchdog. Otherwise, the error messages might break the watchdog. This does not look worth the complexity. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 12:03:46 -10:00
Petr Mladek	335a42ebb0	workqueue: Fix hung time report of worker pools The workqueue watchdog prints a warning when there is no progress in a worker pool. Where the progress means that the pool started processing a pending work item. Note that it is perfectly fine to process work items much longer. The progress should be guaranteed by waking up or creating idle workers. show_one_worker_pool() prints state of non-idle worker pool. It shows a delay since the last pool->watchdog_ts. The timestamp is updated when a first pending work is queued in __queue_work(). Also it is updated when a work is dequeued for processing in worker_thread() and rescuer_thread(). The delay is misleading when there is no pending work item. In this case it shows how long the last work item is being proceed. Show zero instead. There is no stall if there is no pending work. Fixes: `82607adcf9` ("workqueue: implement lockup detector") Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 12:03:46 -10:00
Ammar Faizi	a8ec5880bd	workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu() Use pr_warn_once() to achieve the same thing. It's simpler. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-03-17 11:49:03 -10:00
Greg Kroah-Hartman	686f669780	workqueue: move to use bus_get_dev_root() Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230313182918.1312597-8-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-17 15:29:23 +01:00
Valentin Schneider	c63a2e52d5	workqueue: Fold rebind_worker() within rebind_workers() !CONFIG_SMP builds complain about rebind_worker() being unused. Its only user, rebind_workers() is indeed only defined for CONFIG_SMP, so just fold the two lines back up there. Link: http://lore.kernel.org/r/20230113143102.2e94d74f@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-13 07:50:40 -10:00
Valentin Schneider	e02b931248	workqueue: Unbind kworkers before sending them to exit() It has been reported that isolated CPUs can suffer from interference due to per-CPU kworkers waking up just to die. A surge of workqueue activity during initial setup of a latency-sensitive application (refresh_vm_stats() being one of the culprits) can cause extra per-CPU kworkers to be spawned. Then, said latency-sensitive task can be running merrily on an isolated CPU only to be interrupted sometime later by a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity). Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with WORKER_DIE. Changing the affinity does require a sleepable context, leverage the newly introduced pool->idle_cull_work to get that. Remove dying workers from pool->workers and keep track of them in a separate list. This intentionally prevents for_each_loop_worker() from iterating over workers that are marked for death. Rename destroy_worker() to set_working_dying() to better reflect its effects and relationship with wake_dying_workers(). Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-12 06:21:49 -10:00
Valentin Schneider	9ab03be42b	workqueue: Don't hold any lock while rcuwait'ing for !POOL_MANAGER_ACTIVE put_unbound_pool() currently passes wq_manager_inactive() as exit condition to rcuwait_wait_event(), which grabs pool->lock to check for pool->flags & POOL_MANAGER_ACTIVE A later patch will require destroy_worker() to be invoked with wq_pool_attach_mutex held, which needs to be acquired before pool->lock. A mutex cannot be acquired within rcuwait_wait_event(), as it could clobber the task state set by rcuwait_wait_event() Instead, restructure the waiting logic to acquire any necessary lock outside of rcuwait_wait_event(). Since further work cannot be inserted into unbound pwqs that have reached ->refcnt==0, this is bound to make forward progress as eventually the worklist will be drained and need_more_worker(pool) will remain false, preventing any worker from stealing the manager position from us. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-12 06:21:49 -10:00
Valentin Schneider	3f959aa3b3	workqueue: Convert the idle_timer to a timer + work_struct A later patch will require a sleepable context in the idle worker timeout function. Converting worker_pool.idle_timer to a delayed_work gives us just that, however this would imply turning all idle_timer expiries into scheduler events (waking up a worker to handle the dwork). Instead, implement a "custom dwork" where the timer callback does some extra checks before queuing the associated work. No change in functionality intended. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-12 06:21:49 -10:00
Valentin Schneider	793777bc19	workqueue: Factorize unbind/rebind_workers() logic Later patches will reuse this code, move it into reusable functions. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-12 06:21:49 -10:00
Lai Jiangshan	99c621ef24	workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex When unbind_workers() reads wq_unbound_cpumask to set the affinity of freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex. Make wq_unbound_cpumask protected with wq_pool_attach_mutex and also remove the need of temporary saved_cpumask. Fixes: `10a5a651e3` ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs") Reported-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-12 06:21:48 -10:00
Paul E. McKenney	c76feb0d5d	workqueue: Make show_pwq() use run-length encoding The show_pwq() function dumps out a pool_workqueue structure's activity, including the pending work-queue handlers: Showing busy workqueues and worker pools: workqueue events: flags=0x0 pwq 0: cpus=0 node=0 flags=0x1 nice=0 active=10/256 refcnt=11 in-flight: 7:test_work_func, 64:test_work_func, 249:test_work_func pending: test_work_func, test_work_func, test_work_func1, test_work_func1, test_work_func1, test_work_func1, test_work_func1 When large systems are facing certain types of hang conditions, it is not unusual for this "pending" list to contain runs of hundreds of identical function names. This "wall of text" is difficult to read, and worse yet, it can be interleaved with other output such as stack traces. Therefore, make show_pwq() use run-length encoding so that the above printout instead looks like this: Showing busy workqueues and worker pools: workqueue events: flags=0x0 pwq 0: cpus=0 node=0 flags=0x1 nice=0 active=10/256 refcnt=11 in-flight: 7:test_work_func, 64:test_work_func, 249:test_work_func pending: 2test_work_func, 5test_work_func1 When no comma would be printed, including the WORK_STRUCT_LINKED case, a new run is started unconditionally. This output is more readable, places less stress on the hardware, firmware, and software on the console-log path, and reduces interference with other output. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-06 14:23:24 -10:00
Richard Clark	33e3f0a335	workqueue: Add a new flag to spot the potential UAF error Currently if the user queues a new work item unintentionally into a wq after the destroy_workqueue(wq), the work still can be queued and scheduled without any noticeable kernel message before the end of a RCU grace period. As a debug-aid facility, this commit adds a new flag __WQ_DESTROYING to spot that issue by triggering a kernel WARN message. Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2023-01-04 12:25:29 -10:00
Uladzislau Rezki	a7e30c0e9a	workqueue: Make queue_rcu_work() use call_rcu_hurry() Earlier commits in this series allow battery-powered systems to build their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option. This Kconfig option causes call_rcu() to delay its callbacks in order to batch them. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%. This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y. Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner. However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be a good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it. Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_hurry() for the few places where laziness is inappropriate. And another call_rcu() instance that cannot be lazy is the one in queue_rcu_work(), given that callers to queue_rcu_work() are not necessarily OK with long delays. Therefore, make queue_rcu_work() use call_rcu_hurry() in order to revert to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Signed-off-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-11-30 13:17:05 -08:00
Linus Torvalds	865dad2022	kcfi updates for v6.1-rc1 This replaces the prior support for Clang's standard Control Flow Integrity (CFI) instrumentation, which has required a lot of special conditions (e.g. LTO) and work-arounds. The current implementation ("Kernel CFI") is specific to C, directly designed for the Linux kernel, and takes advantage of architectural features like x86's IBT. This series retains arm64 support and adds x86 support. Additional "generic" architectural support is expected soon: https://github.com/samitolvanen/llvm-project/commits/kcfi_generic - treewide: Remove old CFI support details - arm64: Replace Clang CFI support with Clang KCFI support - x86: Introduce Clang KCFI support -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4aAUWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkgWD/4mUgb7xewNIG/+fuipGd620Iao K0T8q4BNxLNRltOxNc3Q0WMDCggX0qJGCeds7EdFQJQOGxWcbifM8MAS4idAGM0G fc3Gxl1imC/oF6goCAbQgndA6jYFIWXGsv8LsRjAXRidWLFr3GFAqVqYJyokSySr 8zMQsEDuF4I1gQnOhEWdtPZbV3MQ4ZjfFzpv+33agbq6Gb72vKvDh3G6g2VXlxjt 1qnMtS+eEpbBU65cJkOi4MSLgymWbnIAeTMb0dbsV4kJ08YoTl8uz1B+weeH6GgT WP73ZJ4nqh1kkkT9EqS9oKozNB9fObhvCokEuAjuQ7i1eCEZsbShvRc0iL7OKTGG UfuTJa5qQ4h7Z0JS35FCSJETa+fcG0lTyEd133nLXLMZP9K2antf+A6O//fd0J1V Jg4VN7DQmZ+UNGOzRkL6dTtQUy4PkxhniIloaClfSYXxhNirA+v//sHTnTK3z2Bl 6qceYqmFmns2Laual7+lvnZgt6egMBcmAL/MOdbU74+KIR9Xw76wxQjifktHX+WF FEUQkUJDB5XcUyKlbvHoqobRMxvEZ8RIlC5DIkgFiPRE3TI0MqfzNSFnQ/6+lFNg Y0AS9HYJmcj8sVzAJ7ji24WPFCXzsbFn6baJa9usDNbWyQZokYeiv7ZPNPHPDVrv YEBP6aYko0lVSUS9qw== =Li4D -----END PGP SIGNATURE----- Merge tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kcfi updates from Kees Cook: "This replaces the prior support for Clang's standard Control Flow Integrity (CFI) instrumentation, which has required a lot of special conditions (e.g. LTO) and work-arounds. The new implementation ("Kernel CFI") is specific to C, directly designed for the Linux kernel, and takes advantage of architectural features like x86's IBT. This series retains arm64 support and adds x86 support. GCC support is expected in the future[1], and additional "generic" architectural support is expected soon[2]. Summary: - treewide: Remove old CFI support details - arm64: Replace Clang CFI support with Clang KCFI support - x86: Introduce Clang KCFI support" Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048 [1] Link: https://github.com/samitolvanen/llvm-project/commits/kcfi_generic [2] * tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) x86: Add support for CONFIG_CFI_CLANG x86/purgatory: Disable CFI x86: Add types to indirectly called assembly functions x86/tools/relocs: Ignore __kcfi_typeid_ relocations kallsyms: Drop CONFIG_CFI_CLANG workarounds objtool: Disable CFI warnings objtool: Preserve special st_shndx indexes in elf_update_symbol treewide: Drop __cficanonical treewide: Drop WARN_ON_FUNCTION_MISMATCH treewide: Drop function_nocfi init: Drop __nocfi from __init arm64: Drop unneeded __nocfi attributes arm64: Add CFI error handling arm64: Add types to indirect called assembly functions psci: Fix the function type for psci_initcall_t lkdtm: Emit an indirect call for CFI tests cfi: Add type helper macros cfi: Switch to -fsanitize=kcfi cfi: Drop __CFI_ADDRESSABLE cfi: Remove CONFIG_CFI_CLANG_SHADOW ...	2022-10-03 17:11:07 -07:00
Sami Tolvanen	4b24356312	treewide: Drop WARN_ON_FUNCTION_MISMATCH CONFIG_CFI_CLANG no longer breaks cross-module function address equality, which makes WARN_ON_FUNCTION_MISMATCH unnecessary. Remove the definition and switch back to WARN_ON_ONCE. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-15-samitolvanen@google.com	2022-09-26 10:13:14 -07:00
Tetsuo Handa	c0feea594e	workqueue: don't skip lockdep work dependency in cancel_work_sync() Like Hillf Danton mentioned syzbot should have been able to catch cancel_work_sync() in work context by checking lockdep_map in __flush_work() for both flush and cancel. in [1], being unable to report an obvious deadlock scenario shown below is broken. From locking dependency perspective, sync version of cancel request should behave as if flush request, for it waits for completion of work if that work has already started execution. ---------- #include <linux/module.h> #include <linux/sched.h> static DEFINE_MUTEX(mutex); static void work_fn(struct work_struct *work) { schedule_timeout_uninterruptible(HZ / 5); mutex_lock(&mutex); mutex_unlock(&mutex); } static DECLARE_WORK(work, work_fn); static int __init test_init(void) { schedule_work(&work); schedule_timeout_uninterruptible(HZ / 10); mutex_lock(&mutex); cancel_work_sync(&work); mutex_unlock(&mutex); return -EINVAL; } module_init(test_init); MODULE_LICENSE("GPL"); ---------- The check this patch restores was added by commit `0976dfc1d0` ("workqueue: Catch more locking problems with flush_work()"). Then, lockdep's crossrelease feature was added by commit `b09be676e0` ("locking/lockdep: Implement the 'crossrelease' feature"). As a result, this check was once removed by commit `fd1a5b04df` ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes"). But lockdep's crossrelease feature was removed by commit `e966eaeeb6` ("locking/lockdep: Remove the cross-release locking checks"). At this point, this check should have been restored. Then, commit `d6e89786be` ("workqueue: skip lockdep wq dependency in cancel_work_sync()") introduced a boolean flag in order to distinguish flush_work() and cancel_work_sync(), for checking "struct workqueue_struct" dependency when called from cancel_work_sync() was causing false positives. Then, commit `87915adc3f` ("workqueue: re-add lockdep dependencies for flushing") tried to restore "struct work_struct" dependency check, but by error checked this boolean flag. Like an example shown above indicates, "struct work_struct" dependency needs to be checked for both flush_work() and cancel_work_sync(). Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1] Reported-by: Hillf Danton <hdanton@sina.com> Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Fixes: `87915adc3f` ("workqueue: re-add lockdep dependencies for flushing") Cc: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-08-16 06:27:35 -10:00
Linus Torvalds	b44f2fd879	drm for 5.20/6.0 New driver: - logicvc vfio: - use aperture API core: - of: Add data-lane helpers and convert drivers - connector: Remove deprecated ida_simple_get() media: - Add various RGB666 and RGB888 format constants panel: - Add HannStar HSD101PWW - Add ETML0700Y5DHA dma-buf: - add sync-file API - set dma mask for udmabuf devices fbcon: - Improve scrolling performance - Sanitize input fbdev: - device unregistering fixes - vesa: Support COMPILE_TEST - Disable firmware-device registration when first native driver loads aperture: - fix segfault during hot-unplug - export for use with other subsystems client: - use driver validated modes dp: - aux: make probing more reliable - mst: Read extended DPCD capabilities during system resume - Support waiting for HDP signal - Port-validation fixes edid: - CEA data-block iterators - struct drm_edid introduction - implement HF-EEODB extension gem: - don't use fb format non-existing planes probe-helper: - use 640x480 as displayport fallback scheduler: - don't kill jobs in interrupt context bridge: - Add support for i.MX8qxp and i.MX8qm - lots of fixes/cleanups - Add TI-DLPC3433 - fy07024di26a30d: Optional GPIO reset - ldb: Add reg and reg-name properties to bindings, Kconfig fixes - lt9611: Fix display sensing; - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling - tc358775: Fix clock settings - ti-sn65dsi83: Allow GPIO to sleep - adv7511: I2C fixes - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback - fsl-ldb: Drop DE flip - ti-sn65dsi86: Convert to atomic modesetting amdgpu: - use atomic fence helpers in DM - fix VRAM address calculations - export CRTC bpc via debugfs - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Adjust GART size on newer APUs for S/G display - Soft reset for GFX 11 / SDMA 6 - Add gfxoff status query for vangogh - Fix timestamps for cursor only commits - Adjust GART size on newer APUs for S/G display - fix buddy memory corruption amdkfd: - MMU notifier fixes - P2P DMA support using dma-buf - Add available memory IOCTL - HMM profiler support - Simplify GPUVM validation - Unified memory for CWSR save/restore area i915: - General driver clean-up - DG2 enabling (still under force probe) - DG2 small BAR memory support - HuC loading support - DG2 workarounds - DG2/ATS-M device IDs added - Ponte Vecchio prep work and new blitter engines - add Meteorlake support - Fix sparse warnings - DMC MMIO range checks - Audio related fixes - Runtime PM fixes - PSR fixes - Media freq factor and per-gt enhancements - DSI fixes for ICL+ - Disable DMC flip queue handlers - ADL_P voltage swing updates - Use more the VBT for panel information - Fix on Type-C ports with TBT mode - Improve fastset and allow seamless M/N changes - Accept more fixed modes with VRR/DMRRS panels - Disable connector polling for a headless SKU - ADL-S display PLL w/a - Enable THP on Icelake and beyond - Fix i915_gem_object_ggtt_pin_ww regression on old platforms - Expose per tile media freq factor in sysfs - Fix dma_resv fence handling in multi-batch execbuf - Improve on suspend / resume time with VT-d enabled - export CRTC bpc settings via debugfs msm: - gpu: a619 support - gpu: Fix for unclocked GMU register access - gpu: Devcore dump enhancements - client utilization via fdinfo support - fix fence rollover issue - gem: Lockdep false-positive warning fix - gem: Switch to pfn mappings - WB support on sc7180 - dp: dropped custom bulk clock implementation - fix link retraining on resolution change - hdmi: dropped obsolete GPIO support tegra: - context isolation for host1x engines - tegra234 soc support mediatek: - add vdosys0/1 for mt8195 - add MT8195 dp_intf driver exynos: - Fix resume function issue of exynos decon driver by calling clk_disable_unprepare() properly if clk_prepare_enable() failed. nouveau: - set of misc fixes/cleanups - display cleanups gma500: - Cleanup connector I2C handling hyperv: - Unify VRAM allocation of Gen1 and Gen2 meson: - Support YUV422 output; Refcount fixes mgag200: - Support damage clipping - Support gamma handling - Protect concurrent HW access - Fixes to connector - Store model-specific limits in device-info structure - fix PCI register init panfrost: - Valhall support r128: - Fix bit-shift overflow rockchip: - Locking fixes in error path ssd130x: - Fix built-in linkage udl: - Always advertize VGA connector ast: - Support multiple outputs - fix black screen on resume sun4i: - HDMI PHY cleanups vc4: - Add support for BCM2711 vkms: - Allocate output buffer with vmalloc() mcde: - Fix ref-count leak mxsfb/lcdif: - Support i.MX8MP LCD controller stm/ltdc: - Support dynamic Z order - Support mirroring ingenic: - Fix display at maximum resolution -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmLp/7YACgkQDHTzWXnE hr7NjhAAnefa+72EG42OAqajbwTQMENOtFfqyL3k6ueK2ciYbsj/wklw/xc4Ok3o DM5kG54t+nA9L1M7UyE7eaO36/XcuvS8Ea0uKKkamWt+3Ux4g1Vo1J37nP5sK5jI GT/wceKA5sk3nuYly2lBby6mVTGuhAX+3edTAFeOwmd0WvQzzpy4vV+nCAgfshUs ql4gfQPdQdP+wiovUzCIEu6exCSCAI/Oc944fd3AJi5bZbOPFXRS4rMMOLSrdoXV 9P44EZExPbYrDuVUCx/UaZtN8D9myyyBfZe62CtdgNyTYUHXnHCBYue+7D/s5O+y GaLWcP128MsqZNmJNhmcWFIlgqowO24YkKUH68JH0UtBLSWich8rfdEsrxIidYED 0ma1jodRapjyZOjrHEJ3N5deKpoflMmqvCMpvIk1Ev6pT8KX9a6u34kLgsOVCV41 2bDEYD+DbRW2FexGR79yB2huXHGSnguco6069ca1oy9RF4q8cX6Pb1w2u42oS7zX lIgLIashilVR2AYg/qi6IPHavmOQ9ItSXPC+4YasYiMGp/mwePqpmL63b/wkhg0D nXn6/F8Bm6wle2FFbkLGwo1fF1Hn7RzTHSlqRWDKSEaMLhCus6M09VsobFCB19i0 lO4FNVTL8ZtryR94bgVmgi616w9hOhDhM9A+C0kJ9KBkDnDYUJU= =HQ9U -----END PGP SIGNATURE----- Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "Highlights: - New driver for logicvc - which is a display IP core. - EDID parser rework to add new extensions - fbcon scrolling improvements - i915 has some more DG2 work but not enabled by default, but should have enough features for userspace to work now. Otherwise it's lots of work all over the place. Detailed summary: New driver: - logicvc vfio: - use aperture API core: - of: Add data-lane helpers and convert drivers - connector: Remove deprecated ida_simple_get() media: - Add various RGB666 and RGB888 format constants panel: - Add HannStar HSD101PWW - Add ETML0700Y5DHA dma-buf: - add sync-file API - set dma mask for udmabuf devices fbcon: - Improve scrolling performance - Sanitize input fbdev: - device unregistering fixes - vesa: Support COMPILE_TEST - Disable firmware-device registration when first native driver loads aperture: - fix segfault during hot-unplug - export for use with other subsystems client: - use driver validated modes dp: - aux: make probing more reliable - mst: Read extended DPCD capabilities during system resume - Support waiting for HDP signal - Port-validation fixes edid: - CEA data-block iterators - struct drm_edid introduction - implement HF-EEODB extension gem: - don't use fb format non-existing planes probe-helper: - use 640x480 as displayport fallback scheduler: - don't kill jobs in interrupt context bridge: - Add support for i.MX8qxp and i.MX8qm - lots of fixes/cleanups - Add TI-DLPC3433 - fy07024di26a30d: Optional GPIO reset - ldb: Add reg and reg-name properties to bindings, Kconfig fixes - lt9611: Fix display sensing; - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling - tc358775: Fix clock settings - ti-sn65dsi83: Allow GPIO to sleep - adv7511: I2C fixes - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback - fsl-ldb: Drop DE flip - ti-sn65dsi86: Convert to atomic modesetting amdgpu: - use atomic fence helpers in DM - fix VRAM address calculations - export CRTC bpc via debugfs - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Adjust GART size on newer APUs for S/G display - Soft reset for GFX 11 / SDMA 6 - Add gfxoff status query for vangogh - Fix timestamps for cursor only commits - Adjust GART size on newer APUs for S/G display - fix buddy memory corruption amdkfd: - MMU notifier fixes - P2P DMA support using dma-buf - Add available memory IOCTL - HMM profiler support - Simplify GPUVM validation - Unified memory for CWSR save/restore area i915: - General driver clean-up - DG2 enabling (still under force probe) - DG2 small BAR memory support - HuC loading support - DG2 workarounds - DG2/ATS-M device IDs added - Ponte Vecchio prep work and new blitter engines - add Meteorlake support - Fix sparse warnings - DMC MMIO range checks - Audio related fixes - Runtime PM fixes - PSR fixes - Media freq factor and per-gt enhancements - DSI fixes for ICL+ - Disable DMC flip queue handlers - ADL_P voltage swing updates - Use more the VBT for panel information - Fix on Type-C ports with TBT mode - Improve fastset and allow seamless M/N changes - Accept more fixed modes with VRR/DMRRS panels - Disable connector polling for a headless SKU - ADL-S display PLL w/a - Enable THP on Icelake and beyond - Fix i915_gem_object_ggtt_pin_ww regression on old platforms - Expose per tile media freq factor in sysfs - Fix dma_resv fence handling in multi-batch execbuf - Improve on suspend / resume time with VT-d enabled - export CRTC bpc settings via debugfs msm: - gpu: a619 support - gpu: Fix for unclocked GMU register access - gpu: Devcore dump enhancements - client utilization via fdinfo support - fix fence rollover issue - gem: Lockdep false-positive warning fix - gem: Switch to pfn mappings - WB support on sc7180 - dp: dropped custom bulk clock implementation - fix link retraining on resolution change - hdmi: dropped obsolete GPIO support tegra: - context isolation for host1x engines - tegra234 soc support mediatek: - add vdosys0/1 for mt8195 - add MT8195 dp_intf driver exynos: - Fix resume function issue of exynos decon driver by calling clk_disable_unprepare() properly if clk_prepare_enable() failed. nouveau: - set of misc fixes/cleanups - display cleanups gma500: - Cleanup connector I2C handling hyperv: - Unify VRAM allocation of Gen1 and Gen2 meson: - Support YUV422 output; Refcount fixes mgag200: - Support damage clipping - Support gamma handling - Protect concurrent HW access - Fixes to connector - Store model-specific limits in device-info structure - fix PCI register init panfrost: - Valhall support r128: - Fix bit-shift overflow rockchip: - Locking fixes in error path ssd130x: - Fix built-in linkage udl: - Always advertize VGA connector ast: - Support multiple outputs - fix black screen on resume sun4i: - HDMI PHY cleanups vc4: - Add support for BCM2711 vkms: - Allocate output buffer with vmalloc() mcde: - Fix ref-count leak mxsfb/lcdif: - Support i.MX8MP LCD controller stm/ltdc: - Support dynamic Z order - Support mirroring ingenic: - Fix display at maximum resolution" * tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits) drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code drm/amdgpu: enable support for psp 13.0.4 block drm/amdgpu: add files for PSP 13.0.4 drm/amdgpu: add header files for MP 13.0.4 drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index drm/amdgpu: send msg to IMU for the front-door loading drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b" drm/amdgpu: fix hive reference leak when reflecting psp topology info drm/amd/pm: enable GFX ULV feature support for SMU13.0.0 drm/amd/pm: update driver if header for SMU 13.0.0 drm/amdgpu: move mes self test after drm sched re-started drm/amdgpu: drop non-necessary call trace dump drm/amdgpu: enable VCN cg and JPEG cg/pg drm/amdgpu: vcn_4_0_2 video codec query drm/amdgpu: add VCN_4_0_2 firmware support drm/amdgpu: add VCN function in NBIO v7.7 drm/amdgpu: fix a vcn4 boot poll bug in emulation mode drm/amd/amdgpu: add memory training support for PSP_V13 drm/amdkfd: remove an unnecessary amdgpu_bo_ref drm/amd/pm: Add get_gfx_off_status interface for yellow carp ...	2022-08-03 19:52:08 -07:00
Lai Jiangshan	46a4d679ef	workqueue: Avoid a false warning in unbind_workers() Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible fails and trigger the false warning. Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs. It is very easy to trigger the warning: Set wq_unbound_cpumask to a small set of CPUs. Offline all the CPUs of wq_unbound_cpumask. Offline an extra CPU and trigger the warning. Fixes: `10a5a651e3` ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs") Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-07-29 07:49:02 -10:00
Dave Airlie	344feb7ccf	Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.20-2022-07-05: amdgpu: - Various spelling and grammer fixes - Various eDP fixes - Various DMCUB fixes - VCN fixes - GMC 11 fixes - RAS fixes - TMZ support for GC 10.3.7 - GPUVM TLB flush fixes - SMU 13.0.x updates - DCN 3.2 Support - DCN 3.2.1 Support - MES updates - GFX11 modifiers support - USB-C fixes - MMHUB 3.0.1 support - SDMA 6.0 doorbell fixes - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Enable GPU reset for SMU 13.0.4 - OLED display fixes - MPO fixes - DC frame size fixes - ASPM support for PCIE 7.4/7.6 - GPU reset support for SMU 13.0.0 - GFX11 updates - VCN JPEG fix - BACO support for SMU 13.0.7 - VCN instance handling fix - GFX8 GPUVM TLB flush fix - GPU reset rework - VCN 4.0.2 support - GTT size fixes - DP link training fixes - LSDMA 6.0.1 support - Various backlight fixes - Color encoding fixes - Backlight config cleanup - VCN 4.x unified queue cleanup amdkfd: - MMU notifier fixes - Updates for GC 10.3.6 and 10.3.7 - P2P DMA support using dma-buf - Add available memory IOCTL - SDMA 6.0.1 fix - MES fixes - HMM profiler support radeon: - License fix - Backlight config cleanup UAPI: - Add available memory IOCTL to amdkfd Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html - HMM profiler support for amdkfd Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com	2022-07-12 11:07:32 +10:00
Andrey Grodzovsky	73b4b53276	Revert "workqueue: remove unused cancel_work()" This reverts commit `6417250d3f`. amdpgu need this function in order to prematurly stop pending reset works when another reset work already in progress. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Lai Jiangshan<jiangshanlai@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-10 15:24:38 -04:00
Tetsuo Handa	c4f135d643	workqueue: Wrap flush_workqueue() using a macro Since flush operation synchronously waits for completion, flushing system-wide WQs (e.g. system_wq) might introduce possibility of deadlock due to unexpected locking dependency. Tejun Heo commented at [1] that it makes no sense at all to call flush_workqueue() on the shared WQs as the caller has no idea what it's gonna end up waiting for. Although there is flush_scheduled_work() which flushes system_wq WQ with "Think twice before calling this function! It's very easy to get into trouble if you don't take great care." warning message, syzbot found a circular locking dependency caused by flushing system_wq WQ [2]. Therefore, let's change the direction to that developers had better use their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable. Steps for converting system-wide WQs into local WQs are explained at [3], and a conversion to stop flushing system-wide WQs is in progress. Now we want some mechanism for preventing developers who are not aware of this conversion from again start flushing system-wide WQs. Since I found that WARN_ON() is complete but awkward approach for teaching developers about this problem, let's use __compiletime_warning() for incomplete but handy approach. For completeness, we will also insert WARN_ON() into __flush_workqueue() after all in-tree users stopped calling flush_scheduled_work(). Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1] Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2] Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-06-07 07:07:14 -10:00
Zqiang	10a5a651e3	workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs When a CPU is going offline, all workers on the CPU's pool will have their cpus_allowed cleared to cpu_possible_mask and can run on any CPUs including the isolated ones. Instead, set cpus_allowed to wq_unbound_cpumask so that the can avoid isolated CPUs. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-21 12:31:04 -10:00
Linus Torvalds	7838316260	Merge branch 'for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Nothing major. Just follow-up cleanups from Lai after the earlier synchronization simplification" * 'for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Convert the type of pool->nr_running to int workqueue: Use wake_up_worker() in wq_worker_sleeping() instead of open code workqueue: Change the comments of the synchronization about the idle_list workqueue: Remove the mb() pair between wq_worker_sleeping() and insert_work()	2022-03-23 12:40:51 -07:00
Frederic Weisbecker	04d4e665a6	sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org	2022-02-16 15:57:55 +01:00
Frederic Weisbecker	7b45b51e77	workqueue: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch To prepare for supporting each feature of the housekeeping cpumask toward cpuset, prepare each of the HK_FLAG_* entries to move to their own cpumask with enforcing to fetch them individually. The new constraint is that multiple HK_FLAG_* entries can't be mixed together anymore in a single call to housekeeping cpumask(). This will later allow, for example, to runtime modify the cpulist passed through "isolcpus=", "nohz_full=" and "rcu_nocbs=" kernel boot parameters. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220207155910.527133-3-frederic@kernel.org	2022-02-16 15:57:54 +01:00
Lai Jiangshan	bc35f7ef96	workqueue: Convert the type of pool->nr_running to int It is only modified in associated CPU, so it doesn't need to be atomic. tj: Comment updated. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-01-12 07:46:36 -10:00
Lai Jiangshan	cc5bff3846	workqueue: Use wake_up_worker() in wq_worker_sleeping() instead of open code The wakeup code in wq_worker_sleeping() is the same as wake_up_worker(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-01-12 07:40:32 -10:00
Lai Jiangshan	2c1f1a9180	workqueue: Change the comments of the synchronization about the idle_list The access to idle_list in wq_worker_sleeping() is changed to be protected by pool->lock, so the comments above idle_list can be changed to "L:" which is the meaning of "access with pool->lock held". And the outdated comments in wq_worker_sleeping() is removed since the function is not called with rq lock held any more, idle_list is dereferenced with pool lock now. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-01-12 07:39:15 -10:00
Lai Jiangshan	21b195c05c	workqueue: Remove the mb() pair between wq_worker_sleeping() and insert_work() In wq_worker_sleeping(), the access to worklist is protected by the pool->lock, so the memory barrier is unneeded. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-01-12 07:37:28 -10:00
Tejun Heo	2a8ab0fbd1	Merge branch 'workqueue/for-5.16-fixes' into workqueue/for-5.17 for-5.16-fixes contains two subtle race conditions which were introduced by scheduler side code cleanups. The branch didn't get pushed out, so merge into for-5.17.	2022-01-10 07:54:04 -10:00
Lai Jiangshan	84f91c62d6	workqueue: Remove the cacheline_aligned for nr_running nr_running is never modified remotely after the schedule callback in wakeup path is removed. Rather nr_running is often accessed with other fields in the pool together, so the cacheline_aligned for nr_running isn't needed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:26:54 -10:00
Lai Jiangshan	989442d737	workqueue: Move the code of waking a worker up in unbind_workers() In unbind_workers(), there are two pool->lock held sections separated by the code of zapping nr_running. wake_up_worker() needs to be in pool->lock held section and after zapping nr_running. And zapping nr_running had to be after schedule() when the local wake up functionality was in use. Now, the call to schedule() has been removed along with the local wake up functionality, so the code can be merged into the same pool->lock held section. The diffstat shows that it is other code moved down because the diff tools can not know the meaning of merging lock sections by swapping two code blocks. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:23:15 -10:00
Lai Jiangshan	b4ac9384ac	workqueue: Remove schedule() in unbind_workers() The commit `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") changed the schedule callbacks for workqueue and moved the schedule callback from the wakeup code to at end of schedule() in the worker's process context. It means that the callback wq_worker_running() is guaranteed that it sees the %WORKER_UNBOUND flag after scheduled since unbind_workers() is running on the same CPU that all the pool's workers bound to. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:20:24 -10:00
Lai Jiangshan	11b45b0bf4	workqueue: Remove outdated comment about exceptional workers in unbind_workers() Long time before, workers are not ALL bound after CPU_ONLINE, they can still be running in other CPUs before self rebinding. But the commit `a9ab775bca` ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") makes rebind_workers() bind them all. So all workers are on the CPU before the CPU is down. And the comment in unbind_workers() refers to the workers "which are still executing works from before the last CPU down" is outdated. Just removed it. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:16:08 -10:00
Lai Jiangshan	3e5f39ea33	workqueue: Remove the advanced kicking of the idle workers in rebind_workers() The commit `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") changed the schedule callbacks for workqueue and removed the local-wake-up functionality. Now the wakingup of workers is done by normal fashion and workers not yet migrated to the specific CPU in concurrency managed pool can also be woken up by workers that already bound to the specific cpu now. So this advanced kicking of the idle workers to migrate them to the associated CPU is unneeded now. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:15:41 -10:00
Lai Jiangshan	ccf45156fd	workqueue: Remove the outdated comment before wq_worker_sleeping() It isn't called with preempt disabled now. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-09 12:15:15 -10:00
Frederic Weisbecker	45c753f5f2	workqueue: Fix unbind_workers() VS wq_worker_sleeping() race At CPU-hotplug time, unbind_workers() may preempt a worker while it is going to sleep. In that case the following scenario can happen: unbind_workers() wq_worker_sleeping() -------------- ------------------- if (worker->flags & WORKER_NOT_RUNNING) return; //PREEMPTED by unbind_workers worker->flags \|= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_dec_and_test(&pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of -1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == -1", all sorts of hazards can happen, starting with queued works being ignored because no workers are awaken at insert_work() time. Fix this with checking again the worker flags while pool->lock is locked. Fixes: `b945efcdd0` ("sched: Remove pointless preemption disable in sched_submit_work()") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-02 13:00:59 -10:00
Frederic Weisbecker	07edfece8b	workqueue: Fix unbind_workers() VS wq_worker_running() race At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags \|= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-02 12:59:58 -10:00
Paul E. McKenney	443378f066	workqueue: Upgrade queue_work_on() comment The current queue_work_on() docbook comment says that the caller must ensure that the specified CPU can't go away, but does not spell out the consequences, which turn out to be quite mild. Therefore expand this comment to explicitly say that the penalty for failing to nail down the specified CPU is that the workqueue handler might find itself executing on some other CPU. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-12-01 06:47:45 -10:00
Linus Torvalds	512b7931ad	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...	2021-11-06 14:08:17 -07:00
Marco Elver	f70da745be	workqueue, kasan: avoid alloc_pages() when recording stack Shuah Khan reported: \| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled, \| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when \| it tries to allocate memory attempting to acquire spinlock in page \| allocation code while holding workqueue pool raw_spinlock. \| \| There are several instances of this problem when block layer tries \| to __queue_work(). Call trace from one of these instances is below: \| \| kblockd_mod_delayed_work_on() \| mod_delayed_work_on() \| __queue_delayed_work() \| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held) \| insert_work() \| kasan_record_aux_stack() \| kasan_save_stack() \| stack_depot_save() \| alloc_pages() \| __alloc_pages() \| get_page_from_freelist() \| rm_queue() \| rm_queue_pcplist() \| local_lock_irqsave(&pagesets.lock, flags); \| [ BUG: Invalid wait context triggered ] The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however, it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain non-preemptive contexts, including raw_spin_locks (see gfp.h and commmit `ab00db216c`). Fix it by instructing stackdepot to not expand stack storage via alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc(). While there is an increased risk of failing to insert the stack trace, this is typically unlikely, especially if the same insertion had already succeeded previously (stack depot hit). For frequent calls from the same location, it therefore becomes extremely unlikely that kasan_record_aux_stack_noalloc() fails. Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reported-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Taras Madan <tarasmadan@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-06 13:30:33 -07:00
Imran Khan	55df0933be	workqueue: Introduce show_one_worker_pool and show_one_workqueue. Currently show_workqueue_state shows the state of all workqueues and of all worker pools. In certain cases we may need to dump state of only a specific workqueue or worker pool. For example in destroy_workqueue we only need to show state of the workqueue which is getting destroyed. So rename show_workqueue_state to show_all_workqueues(to signify it dumps state of all busy workqueues) and divide it into more granular functions (show_one_workqueue and show_one_worker_pool), that would show states of individual workqueues and worker pools and can be used in cases such as the one mentioned above. Also, as mentioned earlier, make destroy_workqueue dump data pertaining to only the workqueue that is being destroyed and make user(s) of earlier interface(show_workqueue_state), use new interface (show_all_workqueues). Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-10-20 06:19:03 -10:00
Menglong Dong	d25302e465	workqueue: make sysfs of unbound kworker cpumask more clever Some unfriendly component, such as dpdk, write the same mask to unbound kworker cpumask again and again. Every time it write to this interface some work is queue to cpu, even though the mask is same with the original mask. So, fix it by return success and do nothing if the cpumask is equal with the old one. Signed-off-by: Mengen Sun <mengensun@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-10-19 08:38:31 -10:00
Johan Hovold	57116ce17b	workqueue: fix state-dump console deadlock Console drivers often queue work while holding locks also taken in their console write paths, something which can lead to deadlocks on SMP when dumping workqueue state (e.g. sysrq-t or on suspend failures). For serial console drivers this could look like: CPU0 CPU1 ---- ---- show_workqueue_state(); lock(&pool->lock); <IRQ> lock(&port->lock); schedule_work(); lock(&pool->lock); printk(); lock(console_owner); lock(&port->lock); where workqueues are, for example, used to push data to the line discipline, process break signals and handle modem-status changes. Line disciplines and serdev drivers can also queue work on write-wakeup notifications, etc. Reworking every console driver to avoid queuing work while holding locks also taken in their write paths would complicate drivers and is neither desirable or feasible. Instead use the deferred-printk mechanism to avoid printing while holding pool locks when dumping workqueue state. Note that there are a few WARN_ON() assertions in the workqueue code which could potentially also trigger a deadlock. Hopefully the ongoing printk rework will provide a general solution for this eventually. This was originally reported after a lockdep splat when executing sysrq-t with the imx serial driver. Fixes: `3494fc3084` ("workqueue: dump workqueues on sysrq-t") Cc: stable@vger.kernel.org # 4.0 Reported-by: Fabio Estevam <festevam@denx.de> Tested-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-10-11 06:50:28 -10:00
Lai Jiangshan	d812796eb3	workqueue: Assign a color to barrier work items There was no strong reason to or not to flush barrier work items in flush_workqueue(). And we have to make barrier work items not participate in nr_active so we had been using WORK_NO_COLOR for them which also makes them can't be flushed by flush_workqueue(). And the users of flush_workqueue() often do not intend to wait barrier work items issued by flush_work(). That made the choice sound perfect. But barrier work items have reference to internal structure (pool_workqueue) and the worker thread[s] is/are still busy for the workqueue user when the barrrier work items are not done. So it is reasonable to make flush_workqueue() also watch for flush_work() to make it more robust. And a problem[1] reported by Li Zhe shows that we need such robustness. The warning logs are listed below: WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0 *** destroy_workqueue: test_workqueue9 has the following busy pwq pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2 in-flight: 5658:wq_barrier_func Showing busy workqueues and worker pools: *** It shows that even after drain_workqueue() returns, the barrier work item is still in flight and the pwq (and a worker) is still busy on it. The problem is caused by flush_workqueue() not watching flush_work(): Thread A Worker /* normal work item with linked / process_scheduled_works() destroy_workqueue() process_one_work() drain_workqueue() / run normal work item / /-- pwq_dec_nr_in_flight() flush_workqueue() <---/ / the last normal work item is done / sanity_check process_one_work() /-- raw_spin_unlock_irq(&pool->lock) raw_spin_lock_irq(&pool->lock) <-/ / maybe preempt / WARNING* wq_barrier_func() /* maybe preempt by cond_resched() / Thread A can get the pool lock after the Worker unlocks the pool lock before running wq_barrier_func(). And if there is any preemption happen around wq_barrier_func(), destroy_workqueue()'s sanity check is more likely to get the lock and catch it. (Note: preemption is not necessary to cause the bug, the unlocking is enough to possibly trigger the WARNING.) A simple solution might be just executing all linked barrier work items once without releasing pool lock after the head work item's pwq_dec_nr_in_flight(). But this solution has two problems: 1) the head work item might also be barrier work item when the user-queued work item is cancelled. For example: thread 1: thread 2: queue_work(wq, &my_work) flush_work(&my_work) cancel_work_sync(&my_work); / Neiter my_work nor the barrier work is scheduled. / destroy_workqueue(wq); / This is an easier way to catch the WARNING. */ 2) there might be too much linked barrier work items and running them all once without releasing pool lock just causes trouble. The only solution is to make flush_workqueue() aslo watch barrier work items. So we have to assign a color to these barrier work items which is the color of the head (user-queued) work item. Assigning a color doesn't cause any problem in ative management, because the prvious patch made barrier work items not participate in nr_active via WORK_STRUCT_INACTIVE rather than reliance on the (old) WORK_NO_COLOR. [1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/ Reported-by: Li Zhe <lizhe.67@bytedance.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-17 07:49:10 -10:00
Lai Jiangshan	018f3a13dd	workqueue: Mark barrier work with WORK_STRUCT_INACTIVE Currently, WORK_NO_COLOR has two meanings: Not participate in flushing Not participate in nr_active And only non-barrier work items are marked with WORK_STRUCT_INACTIVE when they are in inactive_works list. The barrier work items are not marked INACTIVE even linked in inactive_works list since these tail items are always moved together with the head work item. These definitions are simple, clean and practical. (Except a small blemish that only the first meaning of WORK_NO_COLOR is documented in include/linux/workqueue.h while both meanings are in workqueue.c) But dual-purpose WORK_NO_COLOR used for barrier work items has proven to be problematical[1]. Only the second purpose is obligatory. So we plan to make barrier work items participate in flushing but keep them still not participating in nr_active. So the plan is to mark barrier work items inactive without using WORK_NO_COLOR in this patch so that we can assign a flushing color to them in next patch. The reasonable way is to add or reuse a bit in work data of the work item. But adding a bit will double the size of pool_workqueue. Currently, WORK_STRUCT_INACTIVE is only used in try_to_grab_pending() for user-queued work items and try_to_grab_pending() can't work for barrier work items. So we extend WORK_STRUCT_INACTIVE to also mark barrier work items no matter which list they are in because we don't need to determind which list a barrier work item is in. So the meaning of WORK_STRUCT_INACTIVE becomes just "the work items don't participate in nr_active" (no matter whether it is a barrier work item or a user-queued work item). And WORK_STRUCT_INACTIVE for user-queued work items means they are in inactive_works list. This patch does it by setting WORK_STRUCT_INACTIVE for barrier work items in insert_wq_barrier() and checking WORK_STRUCT_INACTIVE first in pwq_dec_nr_in_flight(). And the meaning of WORK_NO_COLOR is reduced to only "not participating in flushing". There is no functionality change intended in this patch. Because WORK_NO_COLOR+WORK_STRUCT_INACTIVE represents the previous WORK_NO_COLOR in meaning and try_to_grab_pending() doesn't use for barrier work items and avoids being confused by this extended WORK_STRUCT_INACTIVE. A bunch of comment for nr_active & WORK_STRUCT_INACTIVE is also added for documenting how WORK_STRUCT_INACTIVE works in nr_active management. [1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/ Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-17 07:49:10 -10:00
Lai Jiangshan	d21cece0db	workqueue: Change the code of calculating work_flags in insert_wq_barrier() Add a local var @work_flags to calculate work_flags step by step, so that we don't need to squeeze several flags in only the last line of code. Parepare for next patch to add a bit to barrier work item's flag. Not squshing this to next patch makes it clear that what it will have changed. No functional change intended. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-17 07:49:10 -10:00
Lai Jiangshan	c4560c2c88	workqueue: Change arguement of pwq_dec_nr_in_flight() Make pwq_dec_nr_in_flight() use work_data rather just work_color. Prepare for later patch to get WORK_STRUCT_INACTIVE bit from work_data in pwq_dec_nr_in_flight(). No functional change intended. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-17 07:49:09 -10:00
Lai Jiangshan	f97a4a1a3f	workqueue: Rename "delayed" (delayed by active management) to "inactive" There are two kinds of "delayed" work items in workqueue subsystem. One is for timer-delayed work items which are visible to workqueue users. The other kind is for work items delayed by active management which can not be directly visible to workqueue users. We mixed the word "delayed" for both kinds and caused somewhat ambiguity. This patch renames the later one (delayed by active management) to "inactive", because it is used for workqueue active management and most of its related symbols are named with "active" or "activate". All "delayed" and "DELAYED" are carefully checked and renamed one by one to avoid accidentally changing the name of the other kind for timer-delayed. No functional change intended. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-17 07:49:09 -10:00
Sebastian Andrzej Siewior	ffd8bea81f	workqueue: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-09 12:33:30 -10:00
Zhen Lei	e441b56fe4	workqueue: Replace deprecated ida_simple_*() with ida_alloc()/ida_free() Replace ida_simple_get() with ida_alloc() and ida_simple_remove() with ida_free(), the latter is more concise and intuitive. In addition, if ida_alloc() fails, NULL is returned directly. This eliminates unnecessary initialization of two local variables and an 'if' judgment. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-09 12:32:38 -10:00
Cai Huoqing	67dc832537	workqueue: Fix typo in comments Fix typo: assing ==> assign alloced ==> allocated Retun ==> Return excute ==> execute v1->v2: reverse 'iff' update changelog Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-09 12:31:03 -10:00
Zhen Lei	f728c4a9e8	workqueue: Fix possible memory leaks in wq_numa_init() In error handling branch "if (WARN_ON(node == NUMA_NO_NODE))", the previously allocated memories are not released. Doing this before allocating memory eliminates memory leaks. tj: Note that the condition only occurs when the arch code is pretty broken and the WARN_ON might as well be BUG_ON(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-07-29 07:16:00 -10:00
Yang Yingliang	b42b0bddcb	workqueue: fix UAF in pwq_unbound_release_workfn() I got a UAF report when doing fuzz test: [ 152.880091][ T8030] ================================================================== [ 152.881240][ T8030] BUG: KASAN: use-after-free in pwq_unbound_release_workfn+0x50/0x190 [ 152.882442][ T8030] Read of size 4 at addr ffff88810d31bd00 by task kworker/3:2/8030 [ 152.883578][ T8030] [ 152.883932][ T8030] CPU: 3 PID: 8030 Comm: kworker/3:2 Not tainted 5.13.0+ #249 [ 152.885014][ T8030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 152.886442][ T8030] Workqueue: events pwq_unbound_release_workfn [ 152.887358][ T8030] Call Trace: [ 152.887837][ T8030] dump_stack_lvl+0x75/0x9b [ 152.888525][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.889371][ T8030] print_address_description.constprop.10+0x48/0x70 [ 152.890326][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.891163][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.891999][ T8030] kasan_report.cold.15+0x82/0xdb [ 152.892740][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.893594][ T8030] __asan_load4+0x69/0x90 [ 152.894243][ T8030] pwq_unbound_release_workfn+0x50/0x190 [ 152.895057][ T8030] process_one_work+0x47b/0x890 [ 152.895778][ T8030] worker_thread+0x5c/0x790 [ 152.896439][ T8030] ? process_one_work+0x890/0x890 [ 152.897163][ T8030] kthread+0x223/0x250 [ 152.897747][ T8030] ? set_kthread_struct+0xb0/0xb0 [ 152.898471][ T8030] ret_from_fork+0x1f/0x30 [ 152.899114][ T8030] [ 152.899446][ T8030] Allocated by task 8884: [ 152.900084][ T8030] kasan_save_stack+0x21/0x50 [ 152.900769][ T8030] __kasan_kmalloc+0x88/0xb0 [ 152.901416][ T8030] __kmalloc+0x29c/0x460 [ 152.902014][ T8030] alloc_workqueue+0x111/0x8e0 [ 152.902690][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0 [ 152.903459][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0 [ 152.904198][ T8030] scrub_workers_get+0x1e8/0x490 [ 152.904929][ T8030] btrfs_scrub_dev+0x1b9/0x9c0 [ 152.905599][ T8030] btrfs_ioctl+0x122c/0x4e50 [ 152.906247][ T8030] __x64_sys_ioctl+0x137/0x190 [ 152.906916][ T8030] do_syscall_64+0x34/0xb0 [ 152.907535][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 152.908365][ T8030] [ 152.908688][ T8030] Freed by task 8884: [ 152.909243][ T8030] kasan_save_stack+0x21/0x50 [ 152.909893][ T8030] kasan_set_track+0x20/0x30 [ 152.910541][ T8030] kasan_set_free_info+0x24/0x40 [ 152.911265][ T8030] __kasan_slab_free+0xf7/0x140 [ 152.911964][ T8030] kfree+0x9e/0x3d0 [ 152.912501][ T8030] alloc_workqueue+0x7d7/0x8e0 [ 152.913182][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0 [ 152.913949][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0 [ 152.914703][ T8030] scrub_workers_get+0x1e8/0x490 [ 152.915402][ T8030] btrfs_scrub_dev+0x1b9/0x9c0 [ 152.916077][ T8030] btrfs_ioctl+0x122c/0x4e50 [ 152.916729][ T8030] __x64_sys_ioctl+0x137/0x190 [ 152.917414][ T8030] do_syscall_64+0x34/0xb0 [ 152.918034][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 152.918872][ T8030] [ 152.919203][ T8030] The buggy address belongs to the object at ffff88810d31bc00 [ 152.919203][ T8030] which belongs to the cache kmalloc-512 of size 512 [ 152.921155][ T8030] The buggy address is located 256 bytes inside of [ 152.921155][ T8030] 512-byte region [ffff88810d31bc00, ffff88810d31be00) [ 152.922993][ T8030] The buggy address belongs to the page: [ 152.923800][ T8030] page:ffffea000434c600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10d318 [ 152.925249][ T8030] head:ffffea000434c600 order:2 compound_mapcount:0 compound_pincount:0 [ 152.926399][ T8030] flags: 0x57ff00000010200(slab\|head\|node=1\|zone=2\|lastcpupid=0x7ff) [ 152.927515][ T8030] raw: 057ff00000010200 dead000000000100 dead000000000122 ffff888009c42c80 [ 152.928716][ T8030] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 152.929890][ T8030] page dumped because: kasan: bad access detected [ 152.930759][ T8030] [ 152.931076][ T8030] Memory state around the buggy address: [ 152.931851][ T8030] ffff88810d31bc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.932967][ T8030] ffff88810d31bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.934068][ T8030] >ffff88810d31bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.935189][ T8030] ^ [ 152.935763][ T8030] ffff88810d31bd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.936847][ T8030] ffff88810d31be00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 152.937940][ T8030] ================================================================== If apply_wqattrs_prepare() fails in alloc_workqueue(), it will call put_pwq() which invoke a work queue to call pwq_unbound_release_workfn() and use the 'wq'. The 'wq' allocated in alloc_workqueue() will be freed in error path when apply_wqattrs_prepare() fails. So it will lead a UAF. CPU0 CPU1 alloc_workqueue() alloc_and_link_pwqs() apply_wqattrs_prepare() fails apply_wqattrs_cleanup() schedule_work(&pwq->unbound_release_work) kfree(wq) worker_thread() pwq_unbound_release_workfn() <- trigger uaf here If apply_wqattrs_prepare() fails, the new pwq are not linked, it doesn't hold any reference to the 'wq', 'wq' is invalid to access in the worker, so add check pwq if linked to fix this. Fixes: `2d5f0764b5` ("workqueue: split apply_workqueue_attrs() into 3 stages") Cc: stable@vger.kernel.org # v4.2+ Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-07-21 06:42:31 -10:00
Sergey Senozhatsky	940d71c646	wq: handle VM suspension in stall detection If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then once this VCPU resumes it will see the new jiffies value, while it may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this VCPU and updates all the watchdogs via pvclock_touch_watchdogs(). There is a small chance of misreported WQ stalls in the meantime, because new jiffies is time_after() old 'ts + thresh'. wq_watchdog_timer_fn() { for_each_pool(pool, pi) { if (time_after(jiffies, ts + thresh)) { pr_emerg("BUG: workqueue lockup - pool"); } } } Save jiffies at the beginning of this function and use that value for stall detection. If VM gets suspended then we continue using "old" jiffies value and old WQ touch timestamps. If IRQ at some point restarts the stall detection cycle (pvclock_touch_watchdogs()) then old jiffies will always be before new 'ts + thresh'. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-05-20 12:58:30 -04:00
Linus Torvalds	57fa2369ab	CFI on arm64 series for v5.13-rc1 - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmCHCR8ACgkQiXL039xt wCZyFQ//fnUZaXR2K354zDyW6CJljMf+d94RF6rH+J6eMTH2/HXa5v0iJokwABLf ussP6qF4k5wtmI22Gm9A5Zc3e4iiry5pC0jOdk0mk4gzWwFN9MdgNxJZIGA3xqhS bsBK4AGrVKjtZl48G1/ZxJuNDeJhVp6GNK2n6/Gl4rZF6R7D/Upz0XelyJRdDpcM HIGma7jZl6xfGU0mdWCzpOGK1zdMca1WVs7A4YuurSbLn5PZJrcNVWLouDqt/Si2 AduSri1gyPClicgvqWjMOzhUpuw/nJtBLRl1x1EsWk/KSZ1/uNVjlewfzdN4fZrr zbtFr2gLubYLK6JOX7/LqoHlOTgE3tYLL+WIVN75DsOGZBKgHhmebTmWLyqzV0SL oqcyM5d3ucC6msdtAK5Fv4MSp8rpjqlK1Ha4SGRT6kC2wut7AhZ3KD7eyRIz8mV9 Sa9mhignGFJnTEUp+LSbYdrAudgSKxB40WyXPmswAXX4VJFRD4ONrrcAON/SzkUT Hw/JdFRCKkJjgwNQjIQoZcUNMTbFz2PlNIEnjJWm38YImQKQlCb2mXaZKCwBkf45 aheCZk17eKoxTCXFMd+KxlyNEtS2yBfq/PpZgvw7GW/pfFbWUg1+2O41LnihIe5v zu0hN1wNCQqgfxiMZqX1OTb9C/2vybzGsXILt+9nppjZ8EBU7iU= =wU6U -----END PGP SIGNATURE----- Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI	2021-04-27 10:16:46 -07:00
Sami Tolvanen	981731129e	workqueue: use WARN_ON_FUNCTION_MISMATCH With CONFIG_CFI_CLANG, a callback function passed to __queue_delayed_work from a module points to a jump table entry defined in the module instead of the one used in the core kernel, which breaks function address equality in this check: WARN_ON_ONCE(timer->function != delayed_work_timer_fn); Use WARN_ON_FUNCTION_MISMATCH() instead to disable the warning when CFI and modules are both enabled. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-6-samitolvanen@google.com	2021-04-08 16:04:21 -07:00
Wang Qing	89e28ce60c	workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() 84;0;0c84;0;0c There are two workqueue-specific watchdog timestamps: + @wq_watchdog_touched_cpu (per-CPU) updated by touch_softlockup_watchdog() + @wq_watchdog_touched (global) updated by touch_all_softlockup_watchdogs() watchdog_timer_fn() checks only the global @wq_watchdog_touched for unbound workqueues. As a result, unbound workqueues are not aware of touch_softlockup_watchdog(). The watchdog might report a stall even when the unbound workqueues are blocked by a known slow code. Solution: touch_softlockup_watchdog() must touch also the global @wq_watchdog_touched timestamp. The global timestamp can no longer be used for bound workqueues because it is now updated from all CPUs. Instead, bound workqueues have to check only @wq_watchdog_touched_cpu and these timestamps have to be updated for all CPUs in touch_all_softlockup_watchdogs(). Beware: The change might cause the opposite problem. An unbound workqueue might get blocked on CPU A because of a real softlockup. The workqueue watchdog would miss it when the timestamp got touched on CPU B. It is acceptable because softlockups are detected by softlockup watchdog. The workqueue watchdog is there to detect stalls where a work never finishes, for example, because of dependencies of works queued into the same workqueue. V3: - Modify the commit message clearly according to Petr's suggestion. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-04-04 13:26:49 -04:00
Zqiang	0687c66b5f	workqueue: Move the position of debug_work_activate() in __queue_work() The debug_work_activate() is called on the premise that the work can be inserted, because if wq be in WQ_DRAINING status, insert work may be failed. Fixes: `e41e704bc4` ("workqueue: improve destroy_workqueue() debuggability") Signed-off-by: Zqiang <qiang.zhang@windriver.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-04-04 13:26:46 -04:00
Linus Torvalds	ac9e806c9c	Merge branch 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull qorkqueue updates from Tejun Heo: "Tracepoint and comment updates only" * 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use %s instead of function name workqueue: tracing the name of the workqueue instead of it's address workqueue: fix annotation for WQ_SYSFS	2021-02-22 17:06:54 -08:00
Stephen Zhang	e9ad2eb3d9	workqueue: Use %s instead of function name It is better to replace the function name with %s, in case the function name changes. Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-01-27 09:42:48 -05:00
Peter Zijlstra	640f17c824	workqueue: Restrict affinity change to rescuer create_worker() will already set the right affinity using kthread_bind_mask(), this means only the rescuer will need to change it's affinity. Howveer, while in cpu-hot-unplug a regular task is not allowed to run on online&&!active as it would be pushed away quite agressively. We need KTHREAD_IS_PER_CPU to survive in that environment. Therefore set the affinity after getting that magic flag. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org	2021-01-22 15:09:43 +01:00
Peter Zijlstra	5c25b5ff89	workqueue: Tag bound workers with KTHREAD_IS_PER_CPU Mark the per-cpu workqueue workers as KTHREAD_IS_PER_CPU. Workqueues have unfortunate semantics in that per-cpu workers are not default flushed and parked during hotplug, however a subset does manual flush on hotplug and hard relies on them for correctness. Therefore play silly games.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.693465814@infradead.org	2021-01-22 15:09:42 +01:00
Lai Jiangshan	547a77d02f	workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity The scheduler won't break affinity for us any more, and we should "emulate" the same behavior when the scheduler breaks affinity for us. The behavior is "changing the cpumask to cpu_possible_mask". And there might be some other CPUs online later while the worker is still running with the pending work items. The worker should be allowed to use the later online CPUs as before and process the work items ASAP. If we use cpu_active_mask here, we can't achieve this goal but using cpu_possible_mask can. Fixes: `06249738a4` ("workqueue: Manually break affinity on hotplug") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Tejun Heo <tj@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com	2021-01-22 15:09:41 +01:00
Linus Torvalds	c76e02c59e	Merge branch 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue update from Tejun Heo: "The same as the cgroup tree - one commit which was scheduled for the 5.11 merge window. All the commit does is avoding spurious worker wakeups from workqueue allocation / config change path to help cpuisol use cases" * 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Kick a worker based on the actual activation of delayed works	2020-12-28 11:23:02 -08:00
Linus Torvalds	ac73e3dc8a	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...	2020-12-15 12:53:37 -08:00
Walter Wu	e89a85d63f	workqueue: kasan: record workqueue stack Patch series "kasan: add workqueue stack for generic KASAN", v5. Syzbot reports many UAF issues for workqueue, see [1]. In some of these access/allocation happened in process_one_work(), we see the free stack is useless in KASAN report, it doesn't help programmers to solve UAF for workqueue issue. This patchset improves KASAN reports by making them to have workqueue queueing stack. It is useful for programmers to solve use-after-free or double-free memory issue. Generic KASAN also records the last two workqueue stacks and prints them in KASAN report. It is only suitable for generic KASAN. [1] https://groups.google.com/g/syzkaller-bugs/search?q=%22use-after-free%22+process_one_work [2] https://bugzilla.kernel.org/show_bug.cgi?id=198437 This patch (of 4): When analyzing use-after-free or double-free issue, recording the enqueuing work stacks is helpful to preserve usage history which potentially gives a hint about the affected code. For workqueue it has turned out to be useful to record the enqueuing work call stacks. Because user can see KASAN report to determine whether it is root cause. They don't need to enable debugobjects, but they have a chance to find out the root cause. Link: https://lkml.kernel.org/r/20201203022148.29754-1-walter-zh.wu@mediatek.com Link: https://lkml.kernel.org/r/20201203022442.30006-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Marco Elver <elver@google.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-15 12:13:42 -08:00
Yunfeng Ye	01341fbd0d	workqueue: Kick a worker based on the actual activation of delayed works In realtime scenario, We do not want to have interference on the isolated cpu cores. but when invoking alloc_workqueue() for percpu wq on the housekeeping cpu, it kick a kworker on the isolated cpu. alloc_workqueue pwq_adjust_max_active wake_up_worker The comment in pwq_adjust_max_active() said: "Need to kick a worker after thawed or an unbound wq's max_active is bumped" So it is unnecessary to kick a kworker for percpu's wq when invoking alloc_workqueue(). this patch only kick a worker based on the actual activation of delayed works. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-11-25 17:10:28 -05:00
Peter Zijlstra	06249738a4	workqueue: Manually break affinity on hotplug Don't rely on the scheduler to force break affinity for us -- it will stop doing that for per-cpu-kthreads. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.464718669@infradead.org	2020-11-10 18:38:58 +01:00
Mauro Carvalho Chehab	3eb6b31bfb	workqueue: fix a kernel-doc warning As warned by Sphinx: ./Documentation/core-api/workqueue:400: ./kernel/workqueue.c:1218: WARNING: Unexpected indentation. the return code table is currently not recognized, as it lacks markups. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2020-10-16 07:28:20 +02:00
Stephen Boyd	f9e62f318f	treewide: Make all debug_obj_descriptors const This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by moving the structure to read-only memory. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org	2020-09-24 21:56:25 +02:00
Christoph Hellwig	fe557319aa	maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-17 10:57:41 -07:00
Lai Jiangshan	10cdb15759	workqueue: use BUILD_BUG_ON() for compile time test instead of WARN_ON() Any runtime WARN_ON() has to be fixed, and BUILD_BUG_ON() can help you nitice it earlier. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-06-01 11:02:42 -04:00
Lai Jiangshan	b8f06b0444	workqueue: remove useless unlock() and lock() in series This is no point to unlock() and then lock() the same mutex back to back. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:25:23 -04:00
Lai Jiangshan	4f3f4cf388	workqueue: void unneeded requeuing the pwq in rescuer thread `008847f66c` ("workqueue: allow rescuer thread to do more work.") made the rescuer worker requeue the pwq immediately if there may be more work items which need rescuing instead of waiting for the next mayday timer expiration. Unfortunately, it checks only whether the pool needs help from rescuers, but it doesn't check whether the pwq has work items in the pool (the real reason that this rescuer can help for the pool). The patch adds the check and void unneeded requeuing. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:22:10 -04:00
Sebastian Andrzej Siewior	a9b8a98529	workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t The workqueue code has it's internal spinlocks (pool::lock), which are acquired on most workqueue operations. These spinlocks are converted to 'sleeping' spinlocks on a RT-kernel. Workqueue functions can be invoked from contexts which are truly atomic even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such contexts is forbidden. The pool::lock hold times are bound and the code sections are relatively short, which allows to convert pool::lock and as a consequence wq_mayday_lock to raw spinlocks which are truly spinning locks even on a PREEMPT_RT kernel. With the previous conversion of the manager waitqueue to a simple waitqueue workqueues are now fully RT compliant. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:03:47 -04:00
Sebastian Andrzej Siewior	d8bb65ab70	workqueue: Use rcuwait for wq_manager_wait The workqueue code has it's internal spinlock (pool::lock) and also implicit spinlock usage in the wq_manager waitqueue. These spinlocks are converted to 'sleeping' spinlocks on a RT-kernel. Workqueue functions can be invoked from contexts which are truly atomic even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such contexts is forbidden. pool::lock can be converted to a raw spinlock as the lock held times are short. But the workqueue manager waitqueue is handled inside of pool::lock held regions which again violates the lock nesting rules of raw and regular spinlocks. The manager waitqueue has no special requirements like custom wakeup callbacks or mass wakeups. While it does not use exclusive wait mode explicitly there is no strict requirement to queue the waiters in a particular order as there is only one waiter at a time. This allows to replace the waitqueue with rcuwait which solves the locking problem because rcuwait relies on existing locking. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-29 10:00:35 -04:00
Zhang Qiang	342ed2400b	workqueue: Remove unnecessary kfree() call in rcu_free_wq() The data structure member "wq->rescuer" was reset to a null pointer in one if branch. It was passed to a call of the function "kfree" in the callback function "rcu_free_wq" (which was eventually executed). The function "kfree" does not perform more meaningful data processing for a passed null pointer (besides immediately returning from such a call). Thus delete this function call which became unnecessary with the referenced software update. Fixes: `def98c84b6` ("workqueue: Fix spurious sanity check failures in destroy_workqueue()") Suggested-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Zhang Qiang <qiang.zhang@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-27 09:52:41 -04:00
Dan Carpenter	b92b36eadf	workqueue: Fix an use after free in init_rescuer() We need to preserve error code before freeing "rescuer". Fixes: `f187b6974f` ("workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-11 10:25:42 -04:00
Sean Fu	f187b6974f	workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO. Replace inline function PTR_ERR_OR_ZERO with IS_ERR and PTR_ERR to remove redundant parameter definitions and checks. Reduce code size. Before: text data bss dec hex filename 47510 5979 840 54329 d439 kernel/workqueue.o After: text data bss dec hex filename 47474 5979 840 54293 d415 kernel/workqueue.o Signed-off-by: Sean Fu <fxinrong@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-05-05 11:56:07 -04:00
Sebastian Andrzej Siewior	62849a9612	workqueue: Remove the warning in wq_worker_sleeping() The kernel test robot triggered a warning with the following race: task-ctx A interrupt-ctx B worker -> process_one_work() -> work_item() -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> ->sleeping = 1 atomic_dec_and_test(nr_running) __schedule(); interrupt async_page_fault() -> local_irq_enable(); -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> if (WARN_ON(->sleeping)) return -> __schedule() -> sched_update_worker() -> wq_worker_running() -> atomic_inc(nr_running); -> ->sleeping = 0; -> sched_update_worker() -> wq_worker_running() if (!->sleeping) return In this context the warning is pointless everything is fine. An interrupt before wq_worker_sleeping() will perform the ->sleeping assignment (0 -> 1 > 0) twice. An interrupt after wq_worker_sleeping() will trigger the warning and nr_running will be decremented (by A) and incremented once (only by B, A will skip it). This is the case until the ->sleeping is zeroed again in wq_worker_running(). Remove the WARN statement because this condition may happen. Document that preemption around wq_worker_sleeping() needs to be disabled to protect ->sleeping and not just as an optimisation. Fixes: `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian	2020-04-08 11:35:20 +02:00
Linus Torvalds	0adb8bc039	Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Nothing too interesting. Just two trivial patches" * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Mark up unlocked access to wq->first_flusher workqueue: Make workqueue_init*() return void	2020-04-03 12:27:36 -07:00
Chris Wilson	00d5d15b06	workqueue: Mark up unlocked access to wq->first_flusher [ 7329.671518] BUG: KCSAN: data-race in flush_workqueue / flush_workqueue [ 7329.671549] [ 7329.671572] write to 0xffff8881f65fb250 of 8 bytes by task 37173 on cpu 2: [ 7329.671607] flush_workqueue+0x3bc/0x9b0 (kernel/workqueue.c:2844) [ 7329.672527] [ 7329.672540] read to 0xffff8881f65fb250 of 8 bytes by task 37175 on cpu 0: [ 7329.672571] flush_workqueue+0x28d/0x9b0 (kernel/workqueue.c:2835) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-12 14:26:50 -04:00
Hillf Danton	aa202f1f56	workqueue: don't use wq_select_unbound_cpu() for bound works wq_select_unbound_cpu() is designed for unbound workqueues only, but it's wrongly called when using a bound workqueue too. Fixing this ensures work queued to a bound workqueue with cpu=WORK_CPU_UNBOUND always runs on the local CPU. Before, that would happen only if wq_unbound_cpumask happened to include it (likely almost always the case), or was empty, or we got lucky with forced round-robin placement. So restricting /sys/devices/virtual/workqueue/cpumask to a small subset of a machine's CPUs would cause some bound work items to run unexpectedly there. Fixes: `ef55718044` ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Hillf Danton <hdanton@sina.com> [dj: massage changelog] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-10 10:30:51 -04:00
Yu Chen	2333e82995	workqueue: Make workqueue_init*() return void The return values of workqueue_init() and workqueue_early_int() are always 0, and there is no usage of their return value. So just make them return void. Signed-off-by: Yu Chen <chen.yu@easystack.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-04 11:21:49 -05:00
Linus Torvalds	c677124e63	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io\|memory\|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...	2020-01-28 10:07:09 -08:00
Daniel Jordan	1c5da0ec7f	workqueue: add worker function to workqueue_execute_end tracepoint It's surprising that workqueue_execute_end includes only the work when its counterpart workqueue_execute_start has both the work and the worker function. You can't set a tracing filter or trigger based on the function, and postprocessing scripts interested in specific functions are harder to write since they have to remember the work from _start and match it up with the same field in _end. Add the function name, taking care to use the copy stashed in the worker since the work is no longer safe to touch. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2020-01-15 08:02:47 -08:00
Ingo Molnar	1e5f8a3085	Linux 5.5-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf 33WGSyI= =iEwx -----END PGP SIGNATURE----- Merge tag 'v5.5-rc3' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-12-25 10:41:37 +01:00
Sebastian Andrzej Siewior	025f50f386	sched/rt, workqueue: Use PREEMPTION CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Update the comment to use PREEMPTION because it is true for both preemption models. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20191015191821.11479-35-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-12-08 14:37:37 +01:00
Kefeng Wang	1d9a6159bd	workqueue: Use pr_warn instead of pr_warning Use pr_warn() instead of the remaining pr_warning() calls. Link: http://lkml.kernel.org/r/20191128004752.35268-2-wangkefeng.wang@huawei.com To: joe@perches.com To: linux-kernel@vger.kernel.org Cc: gregkh@linuxfoundation.org Cc: tj@kernel.org Cc: arnd@arndb.de Cc: sergey.senozhatsky@gmail.com Cc: rostedt@goodmis.org Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-12-06 09:59:30 +01:00
Linus Torvalds	1ae78780ed	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...	2019-11-26 15:42:43 -08:00
Sebastian Andrzej Siewior	49e9d1a9fa	workqueue: Add RCU annotation for pwq list walk An additional check has been recently added to ensure that a RCU related lock is held while the RCU list is iterated. The `pwqs' are sometimes iterated without a RCU lock but with the &wq->mutex acquired leading to a warning. Teach list_for_each_entry_rcu() that the RCU usage is okay if &wq->mutex is acquired during the list traversal. Fixes: `28875945ba` ("rcu: Add support for consolidated-RCU reader checking") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-11-15 11:53:35 -08:00
Joel Fernandes (Google)	5a6446626d	workqueue: Convert for_each_wq to use built-in list check Because list_for_each_entry_rcu() can now check for holding a lock as well as for being in an RCU read-side critical section, this commit replaces the workqueue_sysfs_unregister() function's use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with list_for_each_entry_rcu() augmented with a lockdep_is_held() optional argument. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-30 08:34:10 -07:00
Tejun Heo	e66b39af00	workqueue: Fix pwq ref leak in rescuer_thread() `008847f66c` ("workqueue: allow rescuer thread to do more work.") made the rescuer worker requeue the pwq immediately if there may be more work items which need rescuing instead of waiting for the next mayday timer expiration. Unfortunately, it doesn't check whether the pwq is already on the mayday list and unconditionally gets the ref and moves it onto the list. This doesn't corrupt the list but creates an additional reference to the pwq. It got queued twice but will only be removed once. This leak later can trigger pwq refcnt warning on workqueue destruction and prevent freeing of the workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # v3.19+	2019-10-04 10:23:11 -07:00
Tejun Heo	c29eb85386	workqueue: more destroy_workqueue() fixes destroy_workqueue() warnings still, at a lower frequency, trigger spuriously. The problem seems to be in-flight operations which haven't reached put_pwq() yet. * Make sanity check grab all the related locks so that it's synchronized against operations which puts pwq at the end. * Always print out the offending pwq. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>	2019-10-04 10:23:01 -07:00
Tejun Heo	30ae2fc0a7	workqueue: Minor follow-ups to the rescuer destruction change * Now that wq->rescuer may be cleared while rescuer is still there, switch show_pwq() debug printout to test worker->rescue_wq to identify rescuers intead of testing wq->rescuer. * Update comment on ->rescuer locking. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>	2019-09-20 14:09:14 -07:00
Tejun Heo	8efe1223d7	workqueue: Fix missing kfree(rescuer) in destroy_workqueue() Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Qian Cai <cai@lca.pw> Fixes: `def98c84b6` ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")	2019-09-20 13:39:57 -07:00
Tejun Heo	def98c84b6	workqueue: Fix spurious sanity check failures in destroy_workqueue() Before actually destrying a workqueue, destroy_workqueue() checks whether it's actually idle. If it isn't, it prints out a bunch of warning messages and leaves the workqueue dangling. It unfortunately has a couple issues. * Mayday list queueing increments pwq's refcnts which gets detected as busy and fails the sanity checks. However, because mayday list queueing is asynchronous, this condition can happen without any actual work items left in the workqueue. * Sanity check failure leaves the sysfs interface behind too which can lead to init failure of newer instances of the workqueue. This patch fixes the above two by * If a workqueue has a rescuer, disable and kill the rescuer before sanity checks. Disabling and killing is guaranteed to flush the existing mayday list. * Remove sysfs interface before sanity checks. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Marcin Pawlowski <mpawlowski@fb.com> Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com> Cc: stable@vger.kernel.org	2019-09-18 18:45:23 -07:00
Daniel Jordan	509b320489	workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs Change the calling convention for apply_workqueue_attrs to require CPU hotplug read exclusion. Avoids lockdep complaints about nested calls to get_online_cpus in a future patch where padata calls apply_workqueue_attrs when changing other CPU-hotplug-sensitive data structures with the CPU read lock already held. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2019-09-13 21:15:40 +10:00
Daniel Jordan	513c98d086	workqueue: unconfine alloc/apply/free_workqueue_attrs() padata will use these these interfaces in a later patch, so unconfine them. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2019-09-13 21:15:39 +10:00
Thomas Gleixner	be69d00d97	workqueue: Remove GPF argument from alloc_workqueue_attrs() All callers use GFP_KERNEL. No point in having that argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-06-27 14:12:19 -07:00
Thomas Gleixner	2c9858ecbe	workqueue: Make alloc/apply/free_workqueue_attrs() static None of those functions have any users outside of workqueue.c. Confine them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-06-27 14:12:15 -07:00
Thomas Gleixner	457c899653	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:45 +02:00
Linus Torvalds	23c970608a	Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Only three commits, of which two are trivial. The non-trivial chagne is Thomas's patch to switch workqueue from sched RCU to regular one. The use of sched RCU is mostly historic and doesn't really buy us anything noticeable" * 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use normal rcu kernel/workqueue: Document wq_worker_last_func() argument kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue'	2019-05-09 13:48:52 -07:00
Linus Torvalds	0968621917	Printk changes for 5.2 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY= =WZfC -----END PGP SIGNATURE----- Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()	2019-05-07 09:18:12 -07:00
Thomas Gleixner	6d25be5782	sched/core, workqueues: Distangle worker accounting from rq lock The worker accounting for CPU bound workers is plugged into the core scheduler code and the wakeup code. This is not a hard requirement and can be avoided by keeping track of the state in the workqueue code itself. Keep track of the sleeping state in the worker itself and call the notifier before entering the core scheduler. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from scheduling and being woken immediately after switching away. When nr_running is updated when the task is retunrning from schedule() then it is later compared when it is done from ttwu(). [ bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de Oliveira ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-16 16:55:15 +02:00
Sakari Ailus	d75f773c86	treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' \| grep -v '^$tools\\|Documentation$/' \| \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-04-09 14:19:06 +02:00
Thomas Gleixner	24acfb7182	workqueue: Use normal rcu There is no need for sched_rcu. The undocumented reason why sched_rcu is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by the fact that sched_rcu reader side critical sections are also protected by preempt or irq disabled regions. Replace rcu_read_lock_sched with rcu_read_lock and acquire the RCU lock where it is not yet explicit acquired. Replace local_irq_disable() with rcu_read_lock(). Update asserts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: mangle changelog a little] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-04-08 12:37:43 -07:00
Bart Van Assche	82efcab3b9	workqueue: Only unregister a registered lockdep key The recent change to prevent use after free and a memory leak introduced an unconditional call to wq_unregister_lockdep() in the error handling path. If the lockdep key had not been registered yet, then the lockdep core emits a warning. Only call wq_unregister_lockdep() if wq_register_lockdep() has been called first. Fixes: `009bb421b6` ("workqueue, lockdep: Fix an alloc_workqueue() error path") Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Qian Cai <cai@lca.pw> Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org	2019-03-21 12:00:18 +01:00
Bart Van Assche	8194fe94ab	kernel/workqueue: Document wq_worker_last_func() argument This patch avoids that the following warning is reported when building with W=1: kernel/workqueue.c:938: warning: Function parameter or member 'task' not described in 'wq_worker_last_func' Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-03-19 10:48:20 -07:00
Mathieu Malaterre	a2775bbc1d	kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue' Silence warnings (triggered at W=1) by adding relevant __printf attributes. kernel/workqueue.c:4249:2: warning: function 'alloc_workqueue' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-03-15 08:47:22 -07:00
Linus Torvalds	9e55f87c0e	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A few fixes for lockdep: - initialize lockdep internal RCU head after initializing RCU - prevent use after free in a alloc_workqueue() error handling path - plug a memory leak in the workqueue core which fails to free a dynamically allocated lock name. - make Clang happy" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: workqueue, lockdep: Fix a memory leak in wq->lock_name workqueue, lockdep: Fix an alloc_workqueue() error path locking/lockdep: Only call init_rcu_head() after RCU has been initialized locking/lockdep: Avoid a Clang warning	2019-03-10 13:48:14 -07:00
Qian Cai	69a106c00e	workqueue, lockdep: Fix a memory leak in wq->lock_name The following commit: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name), but wq_init_lockdep() does not point wq->lock_name to the newly allocated slab object. This can be reproduced by running LTP fallocate04 followed by oom01 tests: unreferenced object 0xc0000005876384d8 (size 64): comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s) hex dump (first 32 bytes): 28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e 78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi backtrace: [<00000000cb452883>] kvasprintf+0x6c/0xe0 [<000000004654ddac>] kasprintf+0x34/0x60 [<000000001c68f311>] alloc_workqueue+0x1f8/0x6ac [<0000000003c2ad83>] ext4_fill_super+0x23d4/0x3c80 [ext4] [<0000000006610538>] mount_bdev+0x25c/0x290 [<00000000bcf955ec>] ext4_mount+0x28/0x50 [ext4] [<0000000016e08fd3>] legacy_get_tree+0x4c/0xb0 [<0000000042b6a5fc>] vfs_get_tree+0x6c/0x190 [<00000000268ab022>] do_mount+0xb9c/0x1100 [<00000000698e6898>] ksys_mount+0x158/0x180 [<0000000064e391fd>] sys_mount+0x20/0x30 [<00000000ba378f12>] system_call+0x5c/0x70 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: catalin.marinas@arm.com Cc: jiangshanlai@gmail.com Cc: tj@kernel.org Fixes: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-03-09 14:15:52 +01:00
Bart Van Assche	009bb421b6	workqueue, lockdep: Fix an alloc_workqueue() error path This patch fixes a use-after-free and a memory leak in an alloc_workqueue() error path. Repoted by syzkaller and KASAN: BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline] BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858 CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 __read_once_size include/linux/compiler.h:197 [inline] lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 wq_init_lockdep kernel/workqueue.c:3444 [inline] alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511 __do_kmalloc mm/slab.c:3726 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3735 kmalloc include/linux/slab.h:553 [inline] kzalloc include/linux/slab.h:743 [inline] alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3821 alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888090fc2580 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 280 bytes inside of 512-byte region [ffff888090fc2580, ffff888090fc2780) Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: `669de8bda8` ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-03-09 14:15:52 +01:00
Linus Torvalds	b5dd0c658c	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - some of the rest of MM - various misc things - dynamic-debug updates - checkpatch - some epoll speedups - autofs - rapidio - lib/, lib/lzo/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) samples/mic/mpssd/mpssd.h: remove duplicate header kernel/fork.c: remove duplicated include include/linux/relay.h: fix percpu annotation in struct rchan arch/nios2/mm/fault.c: remove duplicate include unicore32: stop printing the virtual memory layout MAINTAINERS: fix GTA02 entry and mark as orphan mm: create the new vm_fault_t type arm, s390, unicore32: remove oneliner wrappers for memblock_alloc() arch: simplify several early memory allocations openrisc: simplify pte_alloc_one_kernel() sh: prefer memblock APIs returning virtual address microblaze: prefer memblock API returning virtual address powerpc: prefer memblock APIs returning virtual address lib/lzo: separate lzo-rle from lzo lib/lzo: implement run-length encoding lib/lzo: fast 8-byte copy on arm64 lib/lzo: 64-bit CTZ on arm64 lib/lzo: tidy-up ifdefs ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size ipc: annotate implicit fall through ...	2019-03-07 19:25:37 -08:00
Johannes Weiner	4b04700275	kernel: workqueue: clarify wq_worker_last_func() caller requirements This function can only be called safely from very specific scheduler contexts. Document those. Link: http://lkml.kernel.org/r/20190206150528.31198-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-07 18:32:01 -08:00
Linus Torvalds	abf7c3d8dd	Merge branch 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "All trivial. Two comment updates and one more initialization sanity check in flush_work()" * 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix spelling in source code comments workqueue: fix typo in comment workqueue: Try to catch flush_work() without INIT_WORK().	2019-03-07 10:09:52 -08:00
Linus Torvalds	e431f2d74e	Driver core patches for 5.1-rc1 Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+euQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynyTgCfbV8CLums843sBnT8NnWrTMTdTCcAn1K4re0m ep8g+6oRLxJy414hogxQ =bLs2 -----END PGP SIGNATURE----- Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...	2019-03-06 14:52:48 -08:00

1 2 3 4 5 ...

890 Commits