From bd20e20c4d64e131498ec91f1b066cd4708088b3 Mon Sep 17 00:00:00 2001 From: Yin Fengwei Date: Tue, 8 Aug 2023 10:09:17 +0800 Subject: [PATCH] madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check commit 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 upstream. Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping. It's not correct for large folios. folio_mapcount() returns the total mapcount of large folio which is not suitable to detect whether the folio is shared. Use folio_estimated_sharers() which returns a estimated number of shares. That means it's not 100% correct. It should be OK for madvise case here. User-visible effects is that the THP is skipped when user call madvise. But the correct behavior is THP should be split and processed then. NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready. Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") Signed-off-by: Yin Fengwei Reviewed-by: Yu Zhao Reviewed-by: Ryan Roberts Cc: David Hildenbrand Cc: Kefeng Wang Cc: Matthew Wilcox Cc: Minchan Kim Cc: Vishal Moola (Oracle) Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 19 +++++++++++++++++++ mm/madvise.c | 4 ++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f83a1c9ec8e4..104ec00823da 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1727,6 +1727,25 @@ static inline size_t folio_size(struct folio *folio) return PAGE_SIZE << folio_order(folio); } +/** + * folio_estimated_sharers - Estimate the number of sharers of a folio. + * @folio: The folio. + * + * folio_estimated_sharers() aims to serve as a function to efficiently + * estimate the number of processes sharing a folio. This is done by + * looking at the precise mapcount of the first subpage in the folio, and + * assuming the other subpages are the same. This may not be true for large + * folios. If you want exact mapcounts for exact calculations, look at + * page_mapcount() or folio_total_mapcount(). + * + * Return: The estimated number of processes sharing a folio. + */ +static inline int folio_estimated_sharers(struct folio *folio) +{ + return page_mapcount(folio_page(folio, 0)); +} + + #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE static inline int arch_make_page_accessible(struct page *page) { diff --git a/mm/madvise.c b/mm/madvise.c index d03e149ffe6e..5973399b2f9b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -654,8 +654,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, * deactivate all pages. */ if (folio_test_large(folio)) { - if (folio_mapcount(folio) != 1) - goto out; + if (folio_estimated_sharers(folio) != 1) + break; folio_get(folio); if (!folio_trylock(folio)) { folio_put(folio);