Commit Graph

774 Commits

Author SHA1 Message Date
Linus Torvalds 7ab044a4f4 workqueue: Changes for v6.5
* Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and excluded
   from concurrency management. Reporting on such work items can also be
   enabled through a config option.
 
 * Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.
 
 * Includes Arnd's minimal fix for gcc-13 enum warning on 32bit compiles.
   This conflicts with afa4bb778e ("workqueue: clean up WORK_* constant
   types, clarify masking") in master. Can be resolved by picking the master
   version.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoGvw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGZu0AP9IGK2opAzO9i3i1/Ys81b3sHi9PwrYWH3g252T
 Oe3O6QD/Wh0wYBVl0o7IdW6BGdd5iNwIEs420G53UmmPrATqsgQ=
 =TffY
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:

 - Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and
   excluded from concurrency management. Reporting on such work items
   can also be enabled through a config option.

 - Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.

 - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
   superseded by commit afa4bb778e in mainline.

* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
  workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  workqueue: fix enum type for gcc-13
  workqueue: Track and monitor per-workqueue CPU time usage
  workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
  workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
  workqueue: Improve locking rule description for worker fields
  workqueue: Move worker_set/clr_flags() upwards
  workqueue: Re-order struct worker fields
  workqueue: Add pwq->stats[] and a monitoring script
  Further upgrade queue_work_on() comment
2023-06-27 16:32:52 -07:00
Linus Torvalds afa4bb778e workqueue: clean up WORK_* constant types, clarify masking
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:

  kernel/workqueue.c: In function ‘get_work_pwq’:
  kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    713 |                 return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
        |                        ^
  [ ... a couple of other cases ... ]

and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.

Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.

The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused.  The mask needs to be 'unsigned long', not some unspecified
enum type.

To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.

That's now how we roll in the kernel.

So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.

Incidentally, this magically makes clang generate better code.  That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.

Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-23 12:08:14 -07:00
Zqiang 18c8ae8131 workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
If workqueue.cpu_intensive_thresh_us is set to 0, the detection mechanism
for CPU-hogging per-cpu work item will keep triggering spuriously:

  workqueue: process_srcu hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: gc_worker hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: gc_worker hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND
  workqueue: wait_rcu_exp_gp hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: kfree_rcu_monitor hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: kfree_rcu_monitor hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND
  workqueue: reg_todo hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND

This commit therefore disables the CPU-hog detection mechanism when
workqueue.cpu_intensive_thresh_us is set to 0.

tj: Patch description updated and the condition check on
    cpu_intensive_thresh_us separated into a separate if statement for
    readability.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-25 10:46:53 -10:00
Zqiang c8f6219be2 workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
Currently, pool->nr_running can be modified from timer tick, that means the
timer tick can run nested inside a not-irq-protected section that's in the
process of modifying nr_running. Consider the following scenario:

CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;  (1)

   process_one_work()
   ->worker->current_func(work);
     ->schedule()
       ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;  (0)
           ....
       ->wq_worker_running()
               ....
               CPU0 by interrupt:
               wq_worker_tick()
               ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
                 ->pool->nr_running--;  (-1)
	         ->worker->flags |= WORKER_CPU_INTENSIVE;
               ....
         ->if (!(worker->flags & WORKER_NOT_RUNNING))
           ->pool->nr_running++;    (will not execute)
         ->worker->sleeping = 0;
         ....
    ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
      ->pool->nr_running++;  (0)
    ....
    worker_set_flags(worker, WORKER_PREP);
    ->pool->nr_running--;   (-1)
    ....
    worker_enter_idle()
    ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);

if the nr_workers is equal to nr_idle, due to the nr_running is not zero,
will trigger WARN_ON_ONCE().

[    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[    2.462163] Modules linked in:
[    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[    2.465127] Workqueue:  0x0 (events)
[    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[    2.472614] Call Trace:
[    2.473152]  <TASK>
[    2.474182]  worker_thread+0x71/0x430
[    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[    2.475263]  kthread+0x103/0x120
[    2.475493]  ? __pfx_worker_thread+0x10/0x10
[    2.476355]  ? __pfx_kthread+0x10/0x10
[    2.476635]  ret_from_fork+0x2c/0x50
[    2.477051]  </TASK>

This commit therefore add the check of worker->sleeping in wq_worker_tick(),
if the worker->sleeping is not zero, directly return.

tj: Updated comment and description.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-24 11:57:26 -10:00
Tejun Heo 8a1dd1e547 workqueue: Track and monitor per-workqueue CPU time usage
Now that wq_worker_tick() is there, we can easily track the rough CPU time
consumption of each workqueue by charging the whole tick whenever a tick
hits an active workqueue. While not super accurate, it provides reasonable
visibility into the workqueues that consume a lot of CPU cycles.
wq_monitor.py is updated to report the per-workqueue CPU times.

v2: wq_monitor.py was using "cputime" as the key when outputting in json
    format. Use "cpu_time" instead for consistency with other fields.

Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-17 17:02:09 -10:00
Tejun Heo 6363845005 workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the thershold, it likely needs to be switched to use an
unbound workqueue.

This patch adds a debug mechanism which tracks the work functions which
trigger the automatic CPU_INTENSIVE mechanism and report them using
pr_warn() with exponential backoff.

v3: Documentation update.

v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
    introducing circular locking dependency through printk but not effective
    as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
    just print directly using printk_deferred().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-17 17:02:08 -10:00
Tejun Heo 616db8779b workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
If a per-cpu work item hogs the CPU, it can prevent other work items from
starting through concurrency management. A per-cpu workqueue which intends
to host such CPU-hogging work items can choose to not participate in
concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be
error-prone and difficult to debug when missed.

This patch adds an automatic CPU usage based detection. If a
concurrency-managed work item consumes more CPU time than the threshold
(10ms by default) continuously without intervening sleeps, wq_worker_tick()
which is called from scheduler_tick() will detect the condition and
automatically mark it CPU_INTENSIVE.

The mechanism isn't foolproof:

* Detection depends on tick hitting the work item. Getting preempted at the
  right timings may allow a violating work item to evade detection at least
  temporarily.

* nohz_full CPUs may not be running ticks and thus can fail detection.

* Even when detection is working, the 10ms detection delays can add up if
  many CPU-hogging work items are queued at the same time.

However, in vast majority of cases, this should be able to detect violations
reliably and provide reasonable protection with a small increase in code
complexity.

If some work items trigger this condition repeatedly, the bigger problem
likely is the CPU being saturated with such per-cpu work items and the
solution would be making them UNBOUND. The next patch will add a debug
mechanism to help spot such cases.

v4: Documentation for workqueue.cpu_intensive_thresh_us added to
    kernel-parameters.txt.

v3: Switch to use wq_worker_tick() instead of hooking into preemptions as
    suggested by Peter.

v2: Lai pointed out that wq_worker_stopping() also needs to be called from
    preemption and rtlock paths and an earlier patch was updated
    accordingly. This patch adds a comment describing the risk of infinte
    recursions and how they're avoided.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Tejun Heo bdf8b9bfc1 workqueue: Improve locking rule description for worker fields
* Some worker fields are modified only by the worker itself while holding
  pool->lock thus making them safe to read from self, IRQ context if the CPU
  is running the worker or while holding pool->lock. Add 'K' locking rule
  for them.

* worker->sleeping is currently marked "None" which isn't very descriptive.
  It's used only by the worker itself. Add 'S' locking rule for it.

A future patch will depend on the 'K' rule to access worker->current_* from
the scheduler ticks.

Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-17 17:02:08 -10:00
Tejun Heo c54d5046a0 workqueue: Move worker_set/clr_flags() upwards
They are going to be used in wq_worker_stopping(). Move them upwards.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Tejun Heo 725e8ec59c workqueue: Add pwq->stats[] and a monitoring script
Currently, the only way to peer into workqueue operations is through
tracing. While possible, it isn't easy or convenient to monitor
per-workqueue behaviors over time this way. Let's add pwq->stats[] that
track relevant events and a drgn monitoring script -
tools/workqueue/wq_monitor.py.

It's arguable whether this needs to be configurable. However, it currently
only has several counters and the runtime overhead shouldn't be noticeable
given that they're on pwq's which are per-cpu on per-cpu workqueues and
per-numa-node on unbound ones. Let's keep it simple for the time being.

v2: Patch reordered to earlier with fewer fields. Field will be added back
    gradually. Help message improved.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Paul E. McKenney 854f5cc5b7 Further upgrade queue_work_on() comment
The current queue_work_on() docbook comment says that the caller must
ensure that the specified CPU can't go away, and further says that the
penalty for failing to nail down the specified CPU is that the workqueue
handler might find itself executing on some other CPU.  This is true
as far as it goes, but fails to note what happens if the specified CPU
never was online.  Therefore, further expand this comment to say that
specifying a CPU that was never online will result in a splat.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-09 05:37:42 -10:00
Linus Torvalds cd546fa325 workqueue changes for v6.4-rc1
Mostly changes from Petr to improve warning and error reporting. Workqueue
 now reports more of the relevant failures with better context which should
 help debugging.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErbOg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfipAP9BXrUBI99uZL7XjSEe6rLWa0STODyWh67ikR+0
 nUO22QEA+ttt5ecLZyopYB18Le06VBBW7J5cs0Cg83bCBTGb4wE=
 =gHbc
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "Mostly changes from Petr to improve warning and error reporting.

  Workqueue now reports more of the relevant failures with better
  context which should help debugging"

* tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Introduce show_freezable_workqueues
  workqueue: Print backtraces from CPUs with hung CPU bound workqueues
  workqueue: Warn when a rescuer could not be created
  workqueue: Interrupted create_worker() is not a repeated event
  workqueue: Warn when a new worker could not be created
  workqueue: Fix hung time report of worker pools
  workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
  MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry
2023-04-29 09:48:52 -07:00
Jungseung Lee 704bc669e1 workqueue: Introduce show_freezable_workqueues
Currently show_all_workqueue is called if freeze fails at the time of
freeze the workqueues, which shows the status of all workqueues and of
all worker pools. In this cases we may only need to dump state of only
workqueues that are freezable and busy.

This patch defines show_freezable_workqueues, which uses
show_one_workqueue, a granular function that shows the state of individual
workqueues, so that dump only the state of freezable workqueues
at that time.

tj: Minor message adjustment.

Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-23 15:55:38 -10:00
Petr Mladek cd2440d66f workqueue: Print backtraces from CPUs with hung CPU bound workqueues
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. The progress means that a pending
work item starts being proceed.

Worker pools for unbound workqueues always wake up an idle worker and
try to process the work immediately. The last idle worker has to create
new worker first. The stall might happen only when a new worker could
not be created in which case an error should get printed. Another problem
might be too high load. In this case, workers are victims of a global
system problem.

Worker pools for CPU bound workqueues are designed for lightweight
work items that do not need much CPU time. They are proceed one by
one on a single worker. New worker is used only when a work is sleeping.
It creates one additional scenario. The stall might happen when
the CPU-bound workqueue is used for CPU-intensive work.

More precisely, the stall is detected when a CPU-bound worker is in
the TASK_RUNNING state for too long. In this case, it might be useful
to see the backtrace from the problematic worker.

The information how long a worker is in the running state is not available.
But the CPU-bound worker pools do not have many workers in the running
state by definition. And only few pools are typically blocked.

It should be acceptable to print backtraces from all workers in
TASK_RUNNING state in the stalled worker pools. The number of false
positives should be very low.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:47 -10:00
Petr Mladek 4c0736a76a workqueue: Warn when a rescuer could not be created
Rescuers are created when a workqueue with WQ_MEM_RECLAIM is allocated.
It typically happens during the system boot.

systemd switches the root filesystem from initrd to the booted system
during boot. It kills processes that block the switch for too long.
One of the process might be modprobe that tries to create a workqueue.

These problems are hard to reproduce. Also alloc_workqueue() does not
pass the error code. Make the debugging easier by printing an error,
similar to create_worker().

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek 60f540389a workqueue: Interrupted create_worker() is not a repeated event
kthread_create_on_node() might get interrupted(). It is rare but realistic.
For example, when an unbound workqueue is allocated in module_init()
callback. It is done in the context of the "modprobe" process. And,
for example, systemd might kill pending processes when switching root
from initrd to the booted system.

The interrupt is a one-off event and the race might be hard to reproduce.
It is always worth printing.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek 3f0ea0b864 workqueue: Warn when a new worker could not be created
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. The progress means that a pending
work item starts being proceed.

The progress is guaranteed by using idle workers or creating new workers
for pending work items.

There are several reasons why a new worker could not be created:

   + there is not enough memory

   + there is no free pool ID (IDR API)

   + the system reached PID limit

   + the process creating the new worker was interrupted

   + the last idle worker (manager) has not been scheduled for a long
     time. It was not able to even start creating the kthread.

None of these failures is reported at the moment. The only clue is that
show_one_worker_pool() prints that there is a manager. It is the last
idle worker that is responsible for creating a new one. But it is not
clear if create_worker() is failing and why.

Make the debugging easier by printing errors in create_worker().

The error code is important, especially from kthread_create_on_node().
It helps to distinguish the various reasons. For example, reaching
memory limit (-ENOMEM), other system limits (-EAGAIN), or process
interrupted (-EINTR).

Use pr_once() to avoid repeating the same error every CREATE_COOLDOWN
for each stuck worker pool.

Ratelimited printk() might be better. It would help to know if the problem
remains. It would be more clear if the create_worker() errors and workqueue
stalls are related. Also old messages might get lost when the internal log
buffer is full. The problem is that printk() might touch the watchdog.
For example, see touch_nmi_watchdog() in serial8250_console_write().
It would require synchronization of the begin and length of the ratelimit
interval with the workqueue watchdog. Otherwise, the error messages
might break the watchdog. This does not look worth the complexity.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek 335a42ebb0 workqueue: Fix hung time report of worker pools
The workqueue watchdog prints a warning when there is no progress in
a worker pool. Where the progress means that the pool started processing
a pending work item.

Note that it is perfectly fine to process work items much longer.
The progress should be guaranteed by waking up or creating idle
workers.

show_one_worker_pool() prints state of non-idle worker pool. It shows
a delay since the last pool->watchdog_ts.

The timestamp is updated when a first pending work is queued in
__queue_work(). Also it is updated when a work is dequeued for
processing in worker_thread() and rescuer_thread().

The delay is misleading when there is no pending work item. In this
case it shows how long the last work item is being proceed. Show
zero instead. There is no stall if there is no pending work.

Fixes: 82607adcf9 ("workqueue: implement lockup detector")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Ammar Faizi a8ec5880bd workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
Use pr_warn_once() to achieve the same thing. It's simpler.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 11:49:03 -10:00
Greg Kroah-Hartman 686f669780 workqueue: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230313182918.1312597-8-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:23 +01:00
Valentin Schneider c63a2e52d5 workqueue: Fold rebind_worker() within rebind_workers()
!CONFIG_SMP builds complain about rebind_worker() being unused. Its only
user, rebind_workers() is indeed only defined for CONFIG_SMP, so just fold
the two lines back up there.

Link: http://lore.kernel.org/r/20230113143102.2e94d74f@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-13 07:50:40 -10:00
Valentin Schneider e02b931248 workqueue: Unbind kworkers before sending them to exit()
It has been reported that isolated CPUs can suffer from interference due to
per-CPU kworkers waking up just to die.

A surge of workqueue activity during initial setup of a latency-sensitive
application (refresh_vm_stats() being one of the culprits) can cause extra
per-CPU kworkers to be spawned. Then, said latency-sensitive task can be
running merrily on an isolated CPU only to be interrupted sometime later by
a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, 5 minutes after last
kworker activity).

Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't
contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking
them with WORKER_DIE.

Changing the affinity does require a sleepable context, leverage the newly
introduced pool->idle_cull_work to get that.

Remove dying workers from pool->workers and keep track of them in a
separate list. This intentionally prevents for_each_loop_worker() from
iterating over workers that are marked for death.

Rename destroy_worker() to set_working_dying() to better reflect its
effects and relationship with wake_dying_workers().

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider 9ab03be42b workqueue: Don't hold any lock while rcuwait'ing for !POOL_MANAGER_ACTIVE
put_unbound_pool() currently passes wq_manager_inactive() as exit condition
to rcuwait_wait_event(), which grabs pool->lock to check for

  pool->flags & POOL_MANAGER_ACTIVE

A later patch will require destroy_worker() to be invoked with
wq_pool_attach_mutex held, which needs to be acquired before
pool->lock. A mutex cannot be acquired within rcuwait_wait_event(), as
it could clobber the task state set by rcuwait_wait_event()

Instead, restructure the waiting logic to acquire any necessary lock
outside of rcuwait_wait_event().

Since further work cannot be inserted into unbound pwqs that have reached
->refcnt==0, this is bound to make forward progress as eventually the
worklist will be drained and need_more_worker(pool) will remain false,
preventing any worker from stealing the manager position from us.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider 3f959aa3b3 workqueue: Convert the idle_timer to a timer + work_struct
A later patch will require a sleepable context in the idle worker timeout
function. Converting worker_pool.idle_timer to a delayed_work gives us just
that, however this would imply turning all idle_timer expiries into
scheduler events (waking up a worker to handle the dwork).

Instead, implement a "custom dwork" where the timer callback does some
extra checks before queuing the associated work.

No change in functionality intended.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider 793777bc19 workqueue: Factorize unbind/rebind_workers() logic
Later patches will reuse this code, move it into reusable functions.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Lai Jiangshan 99c621ef24 workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.

Make wq_unbound_cpumask protected with wq_pool_attach_mutex and also
remove the need of temporary saved_cpumask.

Fixes: 10a5a651e3 ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Reported-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:48 -10:00
Paul E. McKenney c76feb0d5d workqueue: Make show_pwq() use run-length encoding
The show_pwq() function dumps out a pool_workqueue structure's activity,
including the pending work-queue handlers:

 Showing busy workqueues and worker pools:
 workqueue events: flags=0x0
   pwq 0: cpus=0 node=0 flags=0x1 nice=0 active=10/256 refcnt=11
     in-flight: 7:test_work_func, 64:test_work_func, 249:test_work_func
     pending: test_work_func, test_work_func, test_work_func1, test_work_func1, test_work_func1, test_work_func1, test_work_func1

When large systems are facing certain types of hang conditions, it is not
unusual for this "pending" list to contain runs of hundreds of identical
function names.  This "wall of text" is difficult to read, and worse yet,
it can be interleaved with other output such as stack traces.

Therefore, make show_pwq() use run-length encoding so that the above
printout instead looks like this:

 Showing busy workqueues and worker pools:
 workqueue events: flags=0x0
   pwq 0: cpus=0 node=0 flags=0x1 nice=0 active=10/256 refcnt=11
     in-flight: 7:test_work_func, 64:test_work_func, 249:test_work_func
     pending: 2*test_work_func, 5*test_work_func1

When no comma would be printed, including the WORK_STRUCT_LINKED case,
a new run is started unconditionally.

This output is more readable, places less stress on the hardware,
firmware, and software on the console-log path, and reduces interference
with other output.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-06 14:23:24 -10:00
Richard Clark 33e3f0a335 workqueue: Add a new flag to spot the potential UAF error
Currently if the user queues a new work item unintentionally
into a wq after the destroy_workqueue(wq), the work still can
be queued and scheduled without any noticeable kernel message
before the end of a RCU grace period.

As a debug-aid facility, this commit adds a new flag
__WQ_DESTROYING to spot that issue by triggering a kernel WARN
message.

Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-04 12:25:29 -10:00
Uladzislau Rezki a7e30c0e9a workqueue: Make queue_rcu_work() use call_rcu_hurry()
Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be a good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one
in queue_rcu_work(), given that callers to queue_rcu_work() are
not necessarily OK with long delays.

Therefore, make queue_rcu_work() use call_rcu_hurry() in order to revert
to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-30 13:17:05 -08:00
Linus Torvalds 865dad2022 kcfi updates for v6.1-rc1
This replaces the prior support for Clang's standard Control Flow
 Integrity (CFI) instrumentation, which has required a lot of special
 conditions (e.g. LTO) and work-arounds. The current implementation
 ("Kernel CFI") is specific to C, directly designed for the Linux kernel,
 and takes advantage of architectural features like x86's IBT. This
 series retains arm64 support and adds x86 support. Additional "generic"
 architectural support is expected soon:
 https://github.com/samitolvanen/llvm-project/commits/kcfi_generic
 
 - treewide: Remove old CFI support details
 
 - arm64: Replace Clang CFI support with Clang KCFI support
 
 - x86: Introduce Clang KCFI support
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4aAUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkgWD/4mUgb7xewNIG/+fuipGd620Iao
 K0T8q4BNxLNRltOxNc3Q0WMDCggX0qJGCeds7EdFQJQOGxWcbifM8MAS4idAGM0G
 fc3Gxl1imC/oF6goCAbQgndA6jYFIWXGsv8LsRjAXRidWLFr3GFAqVqYJyokSySr
 8zMQsEDuF4I1gQnOhEWdtPZbV3MQ4ZjfFzpv+33agbq6Gb72vKvDh3G6g2VXlxjt
 1qnMtS+eEpbBU65cJkOi4MSLgymWbnIAeTMb0dbsV4kJ08YoTl8uz1B+weeH6GgT
 WP73ZJ4nqh1kkkT9EqS9oKozNB9fObhvCokEuAjuQ7i1eCEZsbShvRc0iL7OKTGG
 UfuTJa5qQ4h7Z0JS35FCSJETa+fcG0lTyEd133nLXLMZP9K2antf+A6O//fd0J1V
 Jg4VN7DQmZ+UNGOzRkL6dTtQUy4PkxhniIloaClfSYXxhNirA+v//sHTnTK3z2Bl
 6qceYqmFmns2Laual7+lvnZgt6egMBcmAL/MOdbU74+KIR9Xw76wxQjifktHX+WF
 FEUQkUJDB5XcUyKlbvHoqobRMxvEZ8RIlC5DIkgFiPRE3TI0MqfzNSFnQ/6+lFNg
 Y0AS9HYJmcj8sVzAJ7ji24WPFCXzsbFn6baJa9usDNbWyQZokYeiv7ZPNPHPDVrv
 YEBP6aYko0lVSUS9qw==
 =Li4D
 -----END PGP SIGNATURE-----

Merge tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kcfi updates from Kees Cook:
 "This replaces the prior support for Clang's standard Control Flow
  Integrity (CFI) instrumentation, which has required a lot of special
  conditions (e.g. LTO) and work-arounds.

  The new implementation ("Kernel CFI") is specific to C, directly
  designed for the Linux kernel, and takes advantage of architectural
  features like x86's IBT. This series retains arm64 support and adds
  x86 support.

  GCC support is expected in the future[1], and additional "generic"
  architectural support is expected soon[2].

  Summary:

   - treewide: Remove old CFI support details

   - arm64: Replace Clang CFI support with Clang KCFI support

   - x86: Introduce Clang KCFI support"

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048 [1]
Link: https://github.com/samitolvanen/llvm-project/commits/kcfi_generic [2]

* tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  x86: Add support for CONFIG_CFI_CLANG
  x86/purgatory: Disable CFI
  x86: Add types to indirectly called assembly functions
  x86/tools/relocs: Ignore __kcfi_typeid_ relocations
  kallsyms: Drop CONFIG_CFI_CLANG workarounds
  objtool: Disable CFI warnings
  objtool: Preserve special st_shndx indexes in elf_update_symbol
  treewide: Drop __cficanonical
  treewide: Drop WARN_ON_FUNCTION_MISMATCH
  treewide: Drop function_nocfi
  init: Drop __nocfi from __init
  arm64: Drop unneeded __nocfi attributes
  arm64: Add CFI error handling
  arm64: Add types to indirect called assembly functions
  psci: Fix the function type for psci_initcall_t
  lkdtm: Emit an indirect call for CFI tests
  cfi: Add type helper macros
  cfi: Switch to -fsanitize=kcfi
  cfi: Drop __CFI_ADDRESSABLE
  cfi: Remove CONFIG_CFI_CLANG_SHADOW
  ...
2022-10-03 17:11:07 -07:00
Sami Tolvanen 4b24356312 treewide: Drop WARN_ON_FUNCTION_MISMATCH
CONFIG_CFI_CLANG no longer breaks cross-module function address
equality, which makes WARN_ON_FUNCTION_MISMATCH unnecessary. Remove
the definition and switch back to WARN_ON_ONCE.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-15-samitolvanen@google.com
2022-09-26 10:13:14 -07:00
Tetsuo Handa c0feea594e workqueue: don't skip lockdep work dependency in cancel_work_sync()
Like Hillf Danton mentioned

  syzbot should have been able to catch cancel_work_sync() in work context
  by checking lockdep_map in __flush_work() for both flush and cancel.

in [1], being unable to report an obvious deadlock scenario shown below is
broken. From locking dependency perspective, sync version of cancel request
should behave as if flush request, for it waits for completion of work if
that work has already started execution.

  ----------
  #include <linux/module.h>
  #include <linux/sched.h>
  static DEFINE_MUTEX(mutex);
  static void work_fn(struct work_struct *work)
  {
    schedule_timeout_uninterruptible(HZ / 5);
    mutex_lock(&mutex);
    mutex_unlock(&mutex);
  }
  static DECLARE_WORK(work, work_fn);
  static int __init test_init(void)
  {
    schedule_work(&work);
    schedule_timeout_uninterruptible(HZ / 10);
    mutex_lock(&mutex);
    cancel_work_sync(&work);
    mutex_unlock(&mutex);
    return -EINVAL;
  }
  module_init(test_init);
  MODULE_LICENSE("GPL");
  ----------

The check this patch restores was added by commit 0976dfc1d0
("workqueue: Catch more locking problems with flush_work()").

Then, lockdep's crossrelease feature was added by commit b09be676e0
("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
this check was once removed by commit fd1a5b04df ("workqueue: Remove
now redundant lock acquisitions wrt. workqueue flushes").

But lockdep's crossrelease feature was removed by commit e966eaeeb6
("locking/lockdep: Remove the cross-release locking checks"). At this
point, this check should have been restored.

Then, commit d6e89786be ("workqueue: skip lockdep wq dependency in
cancel_work_sync()") introduced a boolean flag in order to distinguish
flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
dependency when called from cancel_work_sync() was causing false positives.

Then, commit 87915adc3f ("workqueue: re-add lockdep dependencies for
flushing") tried to restore "struct work_struct" dependency check, but by
error checked this boolean flag. Like an example shown above indicates,
"struct work_struct" dependency needs to be checked for both flush_work()
and cancel_work_sync().

Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
Reported-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 87915adc3f ("workqueue: re-add lockdep dependencies for flushing")
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-16 06:27:35 -10:00
Linus Torvalds b44f2fd879 drm for 5.20/6.0
New driver:
 - logicvc
 
 vfio:
 - use aperture API
 
 core:
 - of: Add data-lane helpers and convert drivers
 - connector: Remove deprecated ida_simple_get()
 
 media:
 - Add various RGB666 and RGB888 format constants
 
 panel:
 - Add HannStar HSD101PWW
 - Add ETML0700Y5DHA
 
 dma-buf:
 - add sync-file API
 - set dma mask for udmabuf devices
 
 fbcon:
 - Improve scrolling performance
 - Sanitize input
 
 fbdev:
 - device unregistering fixes
 - vesa: Support COMPILE_TEST
 - Disable firmware-device registration when first native driver loads
 
 aperture:
 - fix segfault during hot-unplug
 - export for use with other subsystems
 
 client:
 - use driver validated modes
 
 dp:
 - aux: make probing more reliable
 - mst: Read extended DPCD capabilities during system resume
 - Support waiting for HDP signal
 - Port-validation fixes
 
 edid:
 - CEA data-block iterators
 - struct drm_edid introduction
 - implement HF-EEODB extension
 
 gem:
 - don't use fb format non-existing planes
 
 probe-helper:
 - use 640x480 as displayport fallback
 
 scheduler:
 - don't kill jobs in interrupt context
 
 bridge:
 - Add support for i.MX8qxp and i.MX8qm
 - lots of fixes/cleanups
 - Add TI-DLPC3433
 - fy07024di26a30d: Optional GPIO reset
 - ldb: Add reg and reg-name properties to bindings, Kconfig fixes
 - lt9611: Fix display sensing;
 - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
 - tc358775: Fix clock settings
 - ti-sn65dsi83: Allow GPIO to sleep
 - adv7511: I2C fixes
 - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
 - fsl-ldb: Drop DE flip
 - ti-sn65dsi86: Convert to atomic modesetting
 
 amdgpu:
 - use atomic fence helpers in DM
 - fix VRAM address calculations
 - export CRTC bpc via debugfs
 - Initial devcoredump support
 - Enable high priority gfx queue on asics which support it
 - Adjust GART size on newer APUs for S/G display
 - Soft reset for GFX 11 / SDMA 6
 - Add gfxoff status query for vangogh
 - Fix timestamps for cursor only commits
 - Adjust GART size on newer APUs for S/G display
 - fix buddy memory corruption
 
 amdkfd:
 - MMU notifier fixes
 - P2P DMA support using dma-buf
 - Add available memory IOCTL
 - HMM profiler support
 - Simplify GPUVM validation
 - Unified memory for CWSR save/restore area
 
 i915:
 - General driver clean-up
 - DG2 enabling (still under force probe)
   - DG2 small BAR memory support
   - HuC loading support
   - DG2 workarounds
   - DG2/ATS-M device IDs added
 - Ponte Vecchio prep work and new blitter engines
 - add Meteorlake support
 - Fix sparse warnings
 - DMC MMIO range checks
 - Audio related fixes
 - Runtime PM fixes
 - PSR fixes
 - Media freq factor and per-gt enhancements
 - DSI fixes for ICL+
 - Disable DMC flip queue handlers
 - ADL_P voltage swing updates
 - Use more the VBT for panel information
 - Fix on Type-C ports with TBT mode
 - Improve fastset and allow seamless M/N changes
 - Accept more fixed modes with VRR/DMRRS panels
 - Disable connector polling for a headless SKU
 - ADL-S display PLL w/a
 - Enable THP on Icelake and beyond
 - Fix i915_gem_object_ggtt_pin_ww regression on old platforms
 - Expose per tile media freq factor in sysfs
 - Fix dma_resv fence handling in multi-batch execbuf
 - Improve on suspend / resume time with VT-d enabled
 - export CRTC bpc settings via debugfs
 
 msm:
 - gpu: a619 support
 - gpu: Fix for unclocked GMU register access
 - gpu: Devcore dump enhancements
 - client utilization via fdinfo support
 - fix fence rollover issue
 - gem: Lockdep false-positive warning fix
 - gem: Switch to pfn mappings
 - WB support on sc7180
 - dp: dropped custom bulk clock implementation
 - fix link retraining on resolution change
 - hdmi: dropped obsolete GPIO support
 
 tegra:
 - context isolation for host1x engines
 - tegra234 soc support
 
 mediatek:
 - add vdosys0/1 for mt8195
 - add MT8195 dp_intf driver
 
 exynos:
 - Fix resume function issue of exynos decon driver by calling
   clk_disable_unprepare() properly if clk_prepare_enable() failed.
 
 nouveau:
 - set of misc fixes/cleanups
 - display cleanups
 
 gma500:
 - Cleanup connector I2C handling
 
 hyperv:
 - Unify VRAM allocation of Gen1 and Gen2
 
 meson:
 - Support YUV422 output; Refcount fixes
 
 mgag200:
 - Support damage clipping
 - Support gamma handling
 - Protect concurrent HW access
 - Fixes to connector
 - Store model-specific limits in device-info structure
 - fix PCI register init
 
 panfrost:
 - Valhall support
 
 r128:
 - Fix bit-shift overflow
 
 rockchip:
 - Locking fixes in error path
 
 ssd130x:
 - Fix built-in linkage
 
 udl:
 - Always advertize VGA connector
 
 ast:
 - Support multiple outputs
 - fix black screen on resume
 
 sun4i:
 - HDMI PHY cleanups
 
 vc4:
 - Add support for BCM2711
 
 vkms:
 - Allocate output buffer with vmalloc()
 
 mcde:
 - Fix ref-count leak
 
 mxsfb/lcdif:
 - Support i.MX8MP LCD controller
 
 stm/ltdc:
 - Support dynamic Z order
 - Support mirroring
 
 ingenic:
 - Fix display at maximum resolution
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmLp/7YACgkQDHTzWXnE
 hr7NjhAAnefa+72EG42OAqajbwTQMENOtFfqyL3k6ueK2ciYbsj/wklw/xc4Ok3o
 DM5kG54t+nA9L1M7UyE7eaO36/XcuvS8Ea0uKKkamWt+3Ux4g1Vo1J37nP5sK5jI
 GT/wceKA5sk3nuYly2lBby6mVTGuhAX+3edTAFeOwmd0WvQzzpy4vV+nCAgfshUs
 ql4gfQPdQdP+wiovUzCIEu6exCSCAI/Oc944fd3AJi5bZbOPFXRS4rMMOLSrdoXV
 9P44EZExPbYrDuVUCx/UaZtN8D9myyyBfZe62CtdgNyTYUHXnHCBYue+7D/s5O+y
 GaLWcP128MsqZNmJNhmcWFIlgqowO24YkKUH68JH0UtBLSWich8rfdEsrxIidYED
 0ma1jodRapjyZOjrHEJ3N5deKpoflMmqvCMpvIk1Ev6pT8KX9a6u34kLgsOVCV41
 2bDEYD+DbRW2FexGR79yB2huXHGSnguco6069ca1oy9RF4q8cX6Pb1w2u42oS7zX
 lIgLIashilVR2AYg/qi6IPHavmOQ9ItSXPC+4YasYiMGp/mwePqpmL63b/wkhg0D
 nXn6/F8Bm6wle2FFbkLGwo1fF1Hn7RzTHSlqRWDKSEaMLhCus6M09VsobFCB19i0
 lO4FNVTL8ZtryR94bgVmgi616w9hOhDhM9A+C0kJ9KBkDnDYUJU=
 =HQ9U
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights:

   - New driver for logicvc - which is a display IP core.

   - EDID parser rework to add new extensions

   - fbcon scrolling improvements

   - i915 has some more DG2 work but not enabled by default, but should
     have enough features for userspace to work now.

  Otherwise it's lots of work all over the place. Detailed summary:

  New driver:
   - logicvc

  vfio:
   - use aperture API

  core:
   - of: Add data-lane helpers and convert drivers
   - connector: Remove deprecated ida_simple_get()

  media:
   - Add various RGB666 and RGB888 format constants

  panel:
   - Add HannStar HSD101PWW
   - Add ETML0700Y5DHA

  dma-buf:
   - add sync-file API
   - set dma mask for udmabuf devices

  fbcon:
   - Improve scrolling performance
   - Sanitize input

  fbdev:
   - device unregistering fixes
   - vesa: Support COMPILE_TEST
   - Disable firmware-device registration when first native driver loads

  aperture:
   - fix segfault during hot-unplug
   - export for use with other subsystems

  client:
   - use driver validated modes

  dp:
   - aux: make probing more reliable
   - mst: Read extended DPCD capabilities during system resume
   - Support waiting for HDP signal
   - Port-validation fixes

  edid:
   - CEA data-block iterators
   - struct drm_edid introduction
   - implement HF-EEODB extension

  gem:
   - don't use fb format non-existing planes

  probe-helper:
   - use 640x480 as displayport fallback

  scheduler:
   - don't kill jobs in interrupt context

  bridge:
   - Add support for i.MX8qxp and i.MX8qm
   - lots of fixes/cleanups
   - Add TI-DLPC3433
   - fy07024di26a30d: Optional GPIO reset
   - ldb: Add reg and reg-name properties to bindings, Kconfig fixes
   - lt9611: Fix display sensing;
   - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
   - tc358775: Fix clock settings
   - ti-sn65dsi83: Allow GPIO to sleep
   - adv7511: I2C fixes
   - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
   - fsl-ldb: Drop DE flip
   - ti-sn65dsi86: Convert to atomic modesetting

  amdgpu:
   - use atomic fence helpers in DM
   - fix VRAM address calculations
   - export CRTC bpc via debugfs
   - Initial devcoredump support
   - Enable high priority gfx queue on asics which support it
   - Adjust GART size on newer APUs for S/G display
   - Soft reset for GFX 11 / SDMA 6
   - Add gfxoff status query for vangogh
   - Fix timestamps for cursor only commits
   - Adjust GART size on newer APUs for S/G display
   - fix buddy memory corruption

  amdkfd:
   - MMU notifier fixes
   - P2P DMA support using dma-buf
   - Add available memory IOCTL
   - HMM profiler support
   - Simplify GPUVM validation
   - Unified memory for CWSR save/restore area

  i915:
   - General driver clean-up
   - DG2 enabling (still under force probe)
       - DG2 small BAR memory support
       - HuC loading support
       - DG2 workarounds
       - DG2/ATS-M device IDs added
   - Ponte Vecchio prep work and new blitter engines
   - add Meteorlake support
   - Fix sparse warnings
   - DMC MMIO range checks
   - Audio related fixes
   - Runtime PM fixes
   - PSR fixes
   - Media freq factor and per-gt enhancements
   - DSI fixes for ICL+
   - Disable DMC flip queue handlers
   - ADL_P voltage swing updates
   - Use more the VBT for panel information
   - Fix on Type-C ports with TBT mode
   - Improve fastset and allow seamless M/N changes
   - Accept more fixed modes with VRR/DMRRS panels
   - Disable connector polling for a headless SKU
   - ADL-S display PLL w/a
   - Enable THP on Icelake and beyond
   - Fix i915_gem_object_ggtt_pin_ww regression on old platforms
   - Expose per tile media freq factor in sysfs
   - Fix dma_resv fence handling in multi-batch execbuf
   - Improve on suspend / resume time with VT-d enabled
   - export CRTC bpc settings via debugfs

  msm:
   - gpu: a619 support
   - gpu: Fix for unclocked GMU register access
   - gpu: Devcore dump enhancements
   - client utilization via fdinfo support
   - fix fence rollover issue
   - gem: Lockdep false-positive warning fix
   - gem: Switch to pfn mappings
   - WB support on sc7180
   - dp: dropped custom bulk clock implementation
   - fix link retraining on resolution change
   - hdmi: dropped obsolete GPIO support

  tegra:
   - context isolation for host1x engines
   - tegra234 soc support

  mediatek:
   - add vdosys0/1 for mt8195
   - add MT8195 dp_intf driver

  exynos:
   - Fix resume function issue of exynos decon driver by calling
     clk_disable_unprepare() properly if clk_prepare_enable() failed.

  nouveau:
   - set of misc fixes/cleanups
   - display cleanups

  gma500:
   - Cleanup connector I2C handling

  hyperv:
   - Unify VRAM allocation of Gen1 and Gen2

  meson:
   - Support YUV422 output; Refcount fixes

  mgag200:
   - Support damage clipping
   - Support gamma handling
   - Protect concurrent HW access
   - Fixes to connector
   - Store model-specific limits in device-info structure
   - fix PCI register init

  panfrost:
   - Valhall support

  r128:
   - Fix bit-shift overflow

  rockchip:
   - Locking fixes in error path

  ssd130x:
   - Fix built-in linkage

  udl:
   - Always advertize VGA connector

  ast:
   - Support multiple outputs
   - fix black screen on resume

  sun4i:
   - HDMI PHY cleanups

  vc4:
   - Add support for BCM2711

  vkms:
   - Allocate output buffer with vmalloc()

  mcde:
   - Fix ref-count leak

  mxsfb/lcdif:
   - Support i.MX8MP LCD controller

  stm/ltdc:
   - Support dynamic Z order
   - Support mirroring

  ingenic:
   - Fix display at maximum resolution"

* tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits)
  drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code
  drm/amdgpu: enable support for psp 13.0.4 block
  drm/amdgpu: add files for PSP 13.0.4
  drm/amdgpu: add header files for MP 13.0.4
  drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index
  drm/amdgpu: send msg to IMU for the front-door loading
  drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"
  drm/amdgpu: fix hive reference leak when reflecting psp topology info
  drm/amd/pm: enable GFX ULV feature support for SMU13.0.0
  drm/amd/pm: update driver if header for SMU 13.0.0
  drm/amdgpu: move mes self test after drm sched re-started
  drm/amdgpu: drop non-necessary call trace dump
  drm/amdgpu: enable VCN cg and JPEG cg/pg
  drm/amdgpu: vcn_4_0_2 video codec query
  drm/amdgpu: add VCN_4_0_2 firmware support
  drm/amdgpu: add VCN function in NBIO v7.7
  drm/amdgpu: fix a vcn4 boot poll bug in emulation mode
  drm/amd/amdgpu: add memory training support for PSP_V13
  drm/amdkfd: remove an unnecessary amdgpu_bo_ref
  drm/amd/pm: Add get_gfx_off_status interface for yellow carp
  ...
2022-08-03 19:52:08 -07:00
Lai Jiangshan 46a4d679ef workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.

Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.

It is very easy to trigger the warning:
  Set wq_unbound_cpumask to a small set of CPUs.
  Offline all the CPUs of wq_unbound_cpumask.
  Offline an extra CPU and trigger the warning.

Fixes: 10a5a651e3 ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-07-29 07:49:02 -10:00
Dave Airlie 344feb7ccf Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.20-2022-07-05:

amdgpu:
- Various spelling and grammer fixes
- Various eDP fixes
- Various DMCUB fixes
- VCN fixes
- GMC 11 fixes
- RAS fixes
- TMZ support for GC 10.3.7
- GPUVM TLB flush fixes
- SMU 13.0.x updates
- DCN 3.2 Support
- DCN 3.2.1 Support
- MES updates
- GFX11 modifiers support
- USB-C fixes
- MMHUB 3.0.1 support
- SDMA 6.0 doorbell fixes
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Enable GPU reset for SMU 13.0.4
- OLED display fixes
- MPO fixes
- DC frame size fixes
- ASPM support for PCIE 7.4/7.6
- GPU reset support for SMU 13.0.0
- GFX11 updates
- VCN JPEG fix
- BACO support for SMU 13.0.7
- VCN instance handling fix
- GFX8 GPUVM TLB flush fix
- GPU reset rework
- VCN 4.0.2 support
- GTT size fixes
- DP link training fixes
- LSDMA 6.0.1 support
- Various backlight fixes
- Color encoding fixes
- Backlight config cleanup
- VCN 4.x unified queue cleanup

amdkfd:
- MMU notifier fixes
- Updates for GC 10.3.6 and 10.3.7
- P2P DMA support using dma-buf
- Add available memory IOCTL
- SDMA 6.0.1 fix
- MES fixes
- HMM profiler support

radeon:
- License fix
- Backlight config cleanup

UAPI:
- Add available memory IOCTL to amdkfd
  Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html
- HMM profiler support for amdkfd
  Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com
2022-07-12 11:07:32 +10:00
Andrey Grodzovsky 73b4b53276 Revert "workqueue: remove unused cancel_work()"
This reverts commit 6417250d3f.

amdpgu need this function in order to prematurly stop pending
reset works when another reset work already in progress.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Lai Jiangshan<jiangshanlai@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10 15:24:38 -04:00
Tetsuo Handa c4f135d643 workqueue: Wrap flush_workqueue() using a macro
Since flush operation synchronously waits for completion, flushing
system-wide WQs (e.g. system_wq) might introduce possibility of deadlock
due to unexpected locking dependency. Tejun Heo commented at [1] that it
makes no sense at all to call flush_workqueue() on the shared WQs as the
caller has no idea what it's gonna end up waiting for.

Although there is flush_scheduled_work() which flushes system_wq WQ with
"Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning message, syzbot found a
circular locking dependency caused by flushing system_wq WQ [2].

Therefore, let's change the direction to that developers had better use
their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is
inevitable.

Steps for converting system-wide WQs into local WQs are explained at [3],
and a conversion to stop flushing system-wide WQs is in progress. Now we
want some mechanism for preventing developers who are not aware of this
conversion from again start flushing system-wide WQs.

Since I found that WARN_ON() is complete but awkward approach for teaching
developers about this problem, let's use __compiletime_warning() for
incomplete but handy approach. For completeness, we will also insert
WARN_ON() into __flush_workqueue() after all in-tree users stopped calling
flush_scheduled_work().

Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1]
Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2]
Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-06-07 07:07:14 -10:00
Zqiang 10a5a651e3 workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs
When a CPU is going offline, all workers on the CPU's pool will have their
cpus_allowed cleared to cpu_possible_mask and can run on any CPUs including
the isolated ones. Instead, set cpus_allowed to wq_unbound_cpumask so that
the can avoid isolated CPUs.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-04-21 12:31:04 -10:00
Linus Torvalds 7838316260 Merge branch 'for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing major. Just follow-up cleanups from Lai after the earlier
  synchronization simplification"

* 'for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Convert the type of pool->nr_running to int
  workqueue: Use wake_up_worker() in wq_worker_sleeping() instead of open code
  workqueue: Change the comments of the synchronization about the idle_list
  workqueue: Remove the mb() pair between wq_worker_sleeping() and insert_work()
2022-03-23 12:40:51 -07:00
Frederic Weisbecker 04d4e665a6 sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
2022-02-16 15:57:55 +01:00
Frederic Weisbecker 7b45b51e77 workqueue: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
To prepare for supporting each feature of the housekeeping cpumask
toward cpuset, prepare each of the HK_FLAG_* entries to move to their
own cpumask with enforcing to fetch them individually. The new
constraint is that multiple HK_FLAG_* entries can't be mixed together
anymore in a single call to housekeeping cpumask().

This will later allow, for example, to runtime modify the cpulist passed
through "isolcpus=", "nohz_full=" and "rcu_nocbs=" kernel boot
parameters.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220207155910.527133-3-frederic@kernel.org
2022-02-16 15:57:54 +01:00
Lai Jiangshan bc35f7ef96 workqueue: Convert the type of pool->nr_running to int
It is only modified in associated CPU, so it doesn't need to be atomic.

tj: Comment updated.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-12 07:46:36 -10:00
Lai Jiangshan cc5bff3846 workqueue: Use wake_up_worker() in wq_worker_sleeping() instead of open code
The wakeup code in wq_worker_sleeping() is the same as wake_up_worker().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-12 07:40:32 -10:00
Lai Jiangshan 2c1f1a9180 workqueue: Change the comments of the synchronization about the idle_list
The access to idle_list in wq_worker_sleeping() is changed to be
protected by pool->lock, so the comments above idle_list can be changed
to "L:" which is the meaning of "access with pool->lock held".

And the outdated comments in wq_worker_sleeping() is removed since
the function is not called with rq lock held any more, idle_list is
dereferenced with pool lock now.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-12 07:39:15 -10:00
Lai Jiangshan 21b195c05c workqueue: Remove the mb() pair between wq_worker_sleeping() and insert_work()
In wq_worker_sleeping(), the access to worklist is protected by the
pool->lock, so the memory barrier is unneeded.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-12 07:37:28 -10:00
Tejun Heo 2a8ab0fbd1 Merge branch 'workqueue/for-5.16-fixes' into workqueue/for-5.17
for-5.16-fixes contains two subtle race conditions which were introduced by
scheduler side code cleanups. The branch didn't get pushed out, so merge
into for-5.17.
2022-01-10 07:54:04 -10:00
Lai Jiangshan 84f91c62d6 workqueue: Remove the cacheline_aligned for nr_running
nr_running is never modified remotely after the schedule callback in
wakeup path is removed.

Rather nr_running is often accessed with other fields in the pool
together, so the cacheline_aligned for nr_running isn't needed.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-09 12:26:54 -10:00
Lai Jiangshan 989442d737 workqueue: Move the code of waking a worker up in unbind_workers()
In unbind_workers(), there are two pool->lock held sections separated
by the code of zapping nr_running.  wake_up_worker() needs to be in
pool->lock held section and after zapping nr_running.  And zapping
nr_running had to be after schedule() when the local wake up
functionality was in use.  Now, the call to schedule() has been removed
along with the local wake up functionality, so the code can be merged
into the same pool->lock held section.

The diffstat shows that it is other code moved down because the diff
tools can not know the meaning of merging lock sections by swapping
two code blocks.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-09 12:23:15 -10:00
Lai Jiangshan b4ac9384ac workqueue: Remove schedule() in unbind_workers()
The commit 6d25be5782 ("sched/core, workqueues: Distangle worker
accounting from rq lock") changed the schedule callbacks for workqueue
and moved the schedule callback from the wakeup code to at end of
schedule() in the worker's process context.

It means that the callback wq_worker_running() is guaranteed that
it sees the %WORKER_UNBOUND flag after scheduled since unbind_workers()
is running on the same CPU that all the pool's workers bound to.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-09 12:20:24 -10:00
Lai Jiangshan 11b45b0bf4 workqueue: Remove outdated comment about exceptional workers in unbind_workers()
Long time before, workers are not ALL bound after CPU_ONLINE, they can
still be running in other CPUs before self rebinding.

But the commit a9ab775bca ("workqueue: directly restore CPU affinity
of workers from CPU_ONLINE") makes rebind_workers() bind them all.

So all workers are on the CPU before the CPU is down.

And the comment in unbind_workers() refers to the workers "which are
still executing works from before the last CPU down" is outdated.
Just removed it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-09 12:16:08 -10:00