linux-stable/include/linux/migrate.h
Alistair Popple 16ce101db8 mm/memory.c: fix race when faulting a device private page
Patch series "Fix several device private page reference counting issues",
v2

This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages.  These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems.  However
without these fixes it is possible to crash the kernel from userspace. 
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting. 
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. 
Unfortunately I lack hardware to test either of those so any help there
would be appreciated.  The changes mimic what is done in for both Nouveau
and hmm-tests though so I doubt they will cause problems.


This patch (of 8):

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called.  However no
reference is taken on the faulting page.  Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap.  This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid.  It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed.  Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail.  To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.

[mpe@ellerman.id.au: fix build]
  Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00

215 lines
7 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_MIGRATE_H
#define _LINUX_MIGRATE_H
#include <linux/mm.h>
#include <linux/mempolicy.h>
#include <linux/migrate_mode.h>
#include <linux/hugetlb.h>
typedef struct page *new_page_t(struct page *page, unsigned long private);
typedef void free_page_t(struct page *page, unsigned long private);
struct migration_target_control;
/*
* Return values from addresss_space_operations.migratepage():
* - negative errno on page migration failure;
* - zero on page migration success;
*/
#define MIGRATEPAGE_SUCCESS 0
/**
* struct movable_operations - Driver page migration
* @isolate_page:
* The VM calls this function to prepare the page to be moved. The page
* is locked and the driver should not unlock it. The driver should
* return ``true`` if the page is movable and ``false`` if it is not
* currently movable. After this function returns, the VM uses the
* page->lru field, so the driver must preserve any information which
* is usually stored here.
*
* @migrate_page:
* After isolation, the VM calls this function with the isolated
* @src page. The driver should copy the contents of the
* @src page to the @dst page and set up the fields of @dst page.
* Both pages are locked.
* If page migration is successful, the driver should call
* __ClearPageMovable(@src) and return MIGRATEPAGE_SUCCESS.
* If the driver cannot migrate the page at the moment, it can return
* -EAGAIN. The VM interprets this as a temporary migration failure and
* will retry it later. Any other error value is a permanent migration
* failure and migration will not be retried.
* The driver shouldn't touch the @src->lru field while in the
* migrate_page() function. It may write to @dst->lru.
*
* @putback_page:
* If migration fails on the isolated page, the VM informs the driver
* that the page is no longer a candidate for migration by calling
* this function. The driver should put the isolated page back into
* its own data structure.
*/
struct movable_operations {
bool (*isolate_page)(struct page *, isolate_mode_t);
int (*migrate_page)(struct page *dst, struct page *src,
enum migrate_mode);
void (*putback_page)(struct page *);
};
/* Defined in mm/debug.c: */
extern const char *migrate_reason_names[MR_TYPES];
#ifdef CONFIG_MIGRATION
extern void putback_movable_pages(struct list_head *l);
int migrate_folio_extra(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode, int extra_count);
int migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason,
unsigned int *ret_succeeded);
extern struct page *alloc_migration_target(struct page *page, unsigned long private);
extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
int migrate_huge_page_move_mapping(struct address_space *mapping,
struct folio *dst, struct folio *src);
void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
spinlock_t *ptl);
void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
int folio_migrate_mapping(struct address_space *mapping,
struct folio *newfolio, struct folio *folio, int extra_count);
#else
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason, unsigned int *ret_succeeded)
{ return -ENOSYS; }
static inline struct page *alloc_migration_target(struct page *page,
unsigned long private)
{ return NULL; }
static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
struct folio *dst, struct folio *src)
{
return -ENOSYS;
}
#endif /* CONFIG_MIGRATION */
#ifdef CONFIG_COMPACTION
bool PageMovable(struct page *page);
void __SetPageMovable(struct page *page, const struct movable_operations *ops);
void __ClearPageMovable(struct page *page);
#else
static inline bool PageMovable(struct page *page) { return false; }
static inline void __SetPageMovable(struct page *page,
const struct movable_operations *ops)
{
}
static inline void __ClearPageMovable(struct page *page)
{
}
#endif
static inline bool folio_test_movable(struct folio *folio)
{
return PageMovable(&folio->page);
}
static inline
const struct movable_operations *page_movable_ops(struct page *page)
{
VM_BUG_ON(!__PageMovable(page));
return (const struct movable_operations *)
((unsigned long)page->mapping - PAGE_MAPPING_MOVABLE);
}
#ifdef CONFIG_NUMA_BALANCING
extern int migrate_misplaced_page(struct page *page,
struct vm_area_struct *vma, int node);
#else
static inline int migrate_misplaced_page(struct page *page,
struct vm_area_struct *vma, int node)
{
return -EAGAIN; /* can't migrate now */
}
#endif /* CONFIG_NUMA_BALANCING */
#ifdef CONFIG_MIGRATION
/*
* Watch out for PAE architecture, which has an unsigned long, and might not
* have enough bits to store all physical address and flags. So far we have
* enough room for all our flags.
*/
#define MIGRATE_PFN_VALID (1UL << 0)
#define MIGRATE_PFN_MIGRATE (1UL << 1)
#define MIGRATE_PFN_WRITE (1UL << 3)
#define MIGRATE_PFN_SHIFT 6
static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
{
if (!(mpfn & MIGRATE_PFN_VALID))
return NULL;
return pfn_to_page(mpfn >> MIGRATE_PFN_SHIFT);
}
static inline unsigned long migrate_pfn(unsigned long pfn)
{
return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
}
enum migrate_vma_direction {
MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2,
};
struct migrate_vma {
struct vm_area_struct *vma;
/*
* Both src and dst array must be big enough for
* (end - start) >> PAGE_SHIFT entries.
*
* The src array must not be modified by the caller after
* migrate_vma_setup(), and must not change the dst array after
* migrate_vma_pages() returns.
*/
unsigned long *dst;
unsigned long *src;
unsigned long cpages;
unsigned long npages;
unsigned long start;
unsigned long end;
/*
* Set to the owner value also stored in page->pgmap->owner for
* migrating out of device private memory. The flags also need to
* be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE.
* The caller should always set this field when using mmu notifier
* callbacks to avoid device MMU invalidations for device private
* pages that are not being migrated.
*/
void *pgmap_owner;
unsigned long flags;
/*
* Set to vmf->page if this is being called to migrate a page as part of
* a migrate_to_ram() callback.
*/
struct page *fault_page;
};
int migrate_vma_setup(struct migrate_vma *args);
void migrate_vma_pages(struct migrate_vma *migrate);
void migrate_vma_finalize(struct migrate_vma *migrate);
#endif /* CONFIG_MIGRATION */
#endif /* _LINUX_MIGRATE_H */