linux-stable/arch/s390
Alexander Gordeev 9d28671017 s390/mm: fix 2KB pgtable release race
commit c2c224932f upstream.

There is a race on concurrent 2KB-pgtables release paths when
both upper and lower halves of the containing parent page are
freed, one via page_table_free_rcu() + __tlb_remove_table(),
and the other via page_table_free(). The race might lead to a
corruption as result of remove of list item in page_table_free()
concurrently with __free_page() in __tlb_remove_table().

Let's assume first the lower and next the upper 2KB-pgtables are
freed from a page. Since both halves of the page are allocated
the tracking byte (bits 24-31 of the page _refcount) has value
of 0x03 initially:

CPU0				CPU1
----				----

page_table_free_rcu() // lower half
{
	// _refcount[31..24] == 0x03
	...
	atomic_xor_bits(&page->_refcount,
			0x11U << (0 + 24));
	// _refcount[31..24] <= 0x12
	...
	table = table | (1U << 0);
	tlb_remove_table(tlb, table);
}
...
__tlb_remove_table()
{
	// _refcount[31..24] == 0x12
	mask = _table & 3;
	// mask <= 0x01
	...

				page_table_free() // upper half
				{
					// _refcount[31..24] == 0x12
					...
					atomic_xor_bits(
						&page->_refcount,
						1U << (1 + 24));
					// _refcount[31..24] <= 0x10
					// mask <= 0x10
					...
	atomic_xor_bits(&page->_refcount,
			mask << (4 + 24));
	// _refcount[31..24] <= 0x00
	// mask <= 0x00
	...
	if (mask != 0) // == false
		break;
	fallthrough;
	...
					if (mask & 3) // == false
						...
					else
	__free_page(page);			list_del(&page->lru);
	^^^^^^^^^^^^^^^^^^	RACE!		^^^^^^^^^^^^^^^^^^^^^
}					...
				}

The problem is page_table_free() releases the page as result of
lower nibble unset and __tlb_remove_table() observing zero too
early. With this update page_table_free() will use the similar
logic as page_table_free_rcu() + __tlb_remove_table(), and mark
the fragment as pending for removal in the upper nibble until
after the list_del().

In other words, the parent page is considered as unreferenced and
safe to release only when the lower nibble is cleared already and
unsetting a bit in upper nibble results in that nibble turned zero.

Cc: stable@vger.kernel.org
Suggested-by: Vlastimil Babka <vbabka@suse.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 11:05:10 +01:00
..
appldata
boot s390/boot: simplify and fix kernel memory layout setup 2021-11-25 09:48:45 +01:00
configs s390: update defconfigs 2021-09-15 14:29:21 +02:00
crypto s390/archrandom: add parameter check for s390_arch_random_generate 2021-04-21 12:32:12 +02:00
hypfs s390: rename dma section to amode31 2021-08-05 14:10:53 +02:00
include s390/pci: move pseudo-MMIO to prevent MIO overlap 2021-12-08 09:04:42 +01:00
kernel s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add() 2022-01-16 09:12:41 +01:00
kvm KVM: s390: Clarify SIGP orders versus STOP/RESTART 2022-01-20 09:13:14 +01:00
lib s390/test_unwind: use raw opcode instead of invalid instruction 2021-12-17 10:30:14 +01:00
mm s390/mm: fix 2KB pgtable release race 2022-01-27 11:05:10 +01:00
net bpf, s390: Fix potential memory leak about jit_data 2021-10-04 09:49:10 +02:00
pci s390/pci: fix zpci_zdev_put() on reserve 2021-10-04 09:49:10 +02:00
purgatory s390: enable KCSAN 2021-07-30 17:09:23 +02:00
tools s390/disassembler: add instructions 2021-07-27 09:39:19 +02:00
Kbuild s390/numa: move code to arch/s390/kernel 2020-08-11 18:16:55 +02:00
Kconfig s390/boot: simplify and fix kernel memory layout setup 2021-11-25 09:48:45 +01:00
Kconfig.debug tracing: Refactor TRACE_IRQFLAGS_SUPPORT in Kconfig 2021-08-16 11:37:21 -04:00
Makefile s390/vdso: filter out -mstack-guard and -mstack-size 2021-11-25 09:48:45 +01:00