linux-stable/kernel/locking
Waiman Long 810507fe6f locking/lockdep: Reuse freed chain_hlocks entries
Once a lock class is zapped, all the lock chains that include the zapped
class are essentially useless. The lock_chain structure itself can be
reused, but not the corresponding chain_hlocks[] entries. Over time,
we will run out of chain_hlocks entries while there are still plenty
of other lockdep array entries available.

To fix this imbalance, we have to make chain_hlocks entries reusable
just like the others. As the freed chain_hlocks entries are in blocks of
various lengths. A simple bitmap like the one used in the other reusable
lockdep arrays isn't applicable. Instead the chain_hlocks entries are
put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks.  Bucket 0
is the variable size bucket which houses chain blocks of size larger than
MAX_CHAIN_BUCKETS sorted in decreasing size order.  Initially, the whole
array is in one chain block (the primordial chain block) in bucket 0.

The minimum size of a chain block is 2 chain_hlocks entries. That will
be the minimum allocation size. In other word, allocation requests
for one chain_hlocks entry will cause 2-entry block to be returned and
hence 1 entry will be wasted.

Allocation requests for the chain_hlocks are fulfilled first by looking
for chain block of matching size. If not found, the first chain block
from bucket[0] (the largest one) is split. That can cause hlock entries
fragmentation and reduce allocation efficiency if a chain block of size >
MAX_CHAIN_BUCKETS is ever zapped and put back to after the primordial
chain block. So the MAX_CHAIN_BUCKETS must be large enough that this
should seldom happen.

By reusing the chain_hlocks entries, we are able to handle workloads
that add and zap a lot of lock classes without the risk of running out
of chain_hlocks entries as long as the total number of outstanding lock
classes at any time remain within a reasonable limit.

Two new tracking counters, nr_free_chain_hlocks & nr_large_chain_blocks,
are added to track the total number of chain_hlocks entries in the
free bucketed lists and the number of large chain blocks in buckets[0]
respectively. The nr_free_chain_hlocks replaces nr_chain_hlocks.

The nr_large_chain_blocks counter enables to see if we should increase
the number of buckets (MAX_CHAIN_BUCKETS) available so as to avoid to
avoid the fragmentation problem in bucket[0].

An internal nfsd test that ran for more than an hour and kept on
loading and unloading kernel modules could cause the following message
to be displayed.

  [ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

The patched kernel was able to complete the test with a lot of free
chain_hlocks entries to spare:

  # cat /proc/lockdep_stats
     :
   dependency chains:                   18867 [max: 65536]
   dependency chain hlocks:             74926 [max: 327680]
   dependency chain hlocks lost:            0
     :
   zapped classes:                       1541
   zapped lock chains:                  56765
   large chain blocks:                      1

By changing MAX_CHAIN_BUCKETS to 3 and add a counter for the size of the
largest chain block. The system still worked and We got the following
lockdep_stats data:

   dependency chains:                   18601 [max: 65536]
   dependency chain hlocks used:        73133 [max: 327680]
   dependency chain hlocks lost:            0
     :
   zapped classes:                       1541
   zapped lock chains:                  56702
   large chain blocks:                  45165
   large chain block size:              20165

By running the test again, I was indeed able to cause chain_hlocks
entries to get lost:

   dependency chain hlocks used:        74806 [max: 327680]
   dependency chain hlocks lost:          575
     :
   large chain blocks:                  48737
   large chain block size:                  7

Due to the fragmentation, it is possible that the
"MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot of
of chain_hlocks entries appear to be free.

Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that
few variable sized chain blocks, other than the initial one, should
ever be present in bucket 0.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
2020-02-11 13:10:52 +01:00
..
lock_events.c locking/lock_events: Don't show pvqspinlock events on bare metal 2019-04-10 10:56:05 +02:00
lock_events.h locking/lock_events: Use raw_cpu_{add,inc}() for stats 2019-06-03 12:32:56 +02:00
lock_events_list.h locking/rwsem: Adaptive disabling of reader optimistic spinning 2019-06-17 12:28:09 +02:00
lockdep.c locking/lockdep: Reuse freed chain_hlocks entries 2020-02-11 13:10:52 +01:00
lockdep_internals.h locking/lockdep: Reuse freed chain_hlocks entries 2020-02-11 13:10:52 +01:00
lockdep_proc.c locking/lockdep: Reuse freed chain_hlocks entries 2020-02-11 13:10:52 +01:00
lockdep_states.h
locktorture.c locking: locktorture: Do not include rwlock.h directly 2019-10-05 11:50:24 -07:00
Makefile locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c 2019-06-17 12:27:57 +02:00
mcs_spinlock.h locking/mcs: Use smp_cond_load_acquire() in MCS spin loop 2018-04-27 09:48:49 +02:00
mutex-debug.c locking/mutex: Replace spin_is_locked() with lockdep 2018-11-12 09:06:22 -08:00
mutex-debug.h
mutex.c Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts" 2019-12-11 00:27:43 +01:00
mutex.h mutex: Fix up mutex_waiter usage 2019-08-08 09:09:25 +02:00
osq_lock.c locking/osq: Use optimized spinning loop for arm64 2020-01-17 10:19:30 +01:00
percpu-rwsem.c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2019-06-28 19:46:47 +02:00
qrwlock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
qspinlock.c locking/qspinlock: Fix inaccessible URL of MCS lock paper 2020-01-17 10:19:30 +01:00
qspinlock_paravirt.h Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-09-25 10:22:37 +02:00
qspinlock_stat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex.c locking/lockdep: Remove unused @nested argument from lock_release() 2019-10-09 12:46:10 +02:00
rtmutex.h
rtmutex_common.h locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter() 2018-03-28 23:01:30 +02:00
rwsem.c locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN 2020-01-17 10:19:27 +01:00
rwsem.h locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c 2019-06-17 12:27:57 +02:00
semaphore.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 2019-06-05 17:37:17 +02:00
spinlock.c asm-generic/mmiowb: Add generic implementation of mmiowb() tracking 2019-04-08 11:59:39 +01:00
spinlock_debug.c locking/spinlock/debug: Fix various data races 2019-11-29 08:03:27 +01:00
test-ww_mutex.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 2019-05-21 11:28:40 +02:00