From 51b8c1fe250d1bd70c1722dc3c414f5cff2d7cca Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 8 Nov 2021 18:31:24 -0800 Subject: [PATCH 01/87] vfs: keep inodes with page cache off the inode shrinker LRU Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could put fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit a76cf1a474d7 ("mm: don't reclaim inodes with many attached pages")) and failed (commit 69056ee6a8a3 ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Cc: Roman Gushchin Cc: Tejun Heo Cc: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/inode.c | 46 +++++++++++++++++++++---------------- fs/internal.h | 1 - include/linux/fs.h | 1 + include/linux/pagemap.h | 50 +++++++++++++++++++++++++++++++++++++++++ mm/filemap.c | 8 +++++++ mm/truncate.c | 19 ++++++++++++++-- mm/vmscan.c | 7 ++++++ mm/workingset.c | 10 +++++++++ 8 files changed, 120 insertions(+), 22 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index ed0cab8a32db..a49695f57e1e 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -428,11 +428,20 @@ void ihold(struct inode *inode) } EXPORT_SYMBOL(ihold); -static void inode_lru_list_add(struct inode *inode) +static void __inode_add_lru(struct inode *inode, bool rotate) { + if (inode->i_state & (I_DIRTY_ALL | I_SYNC | I_FREEING | I_WILL_FREE)) + return; + if (atomic_read(&inode->i_count)) + return; + if (!(inode->i_sb->s_flags & SB_ACTIVE)) + return; + if (!mapping_shrinkable(&inode->i_data)) + return; + if (list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru)) this_cpu_inc(nr_unused); - else + else if (rotate) inode->i_state |= I_REFERENCED; } @@ -443,16 +452,11 @@ static void inode_lru_list_add(struct inode *inode) */ void inode_add_lru(struct inode *inode) { - if (!(inode->i_state & (I_DIRTY_ALL | I_SYNC | - I_FREEING | I_WILL_FREE)) && - !atomic_read(&inode->i_count) && inode->i_sb->s_flags & SB_ACTIVE) - inode_lru_list_add(inode); + __inode_add_lru(inode, false); } - static void inode_lru_list_del(struct inode *inode) { - if (list_lru_del(&inode->i_sb->s_inode_lru, &inode->i_lru)) this_cpu_dec(nr_unused); } @@ -728,10 +732,6 @@ again: /* * Isolate the inode from the LRU in preparation for freeing it. * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. If the inode has metadata buffers attached to - * mapping->private_list then try to remove them. - * * If the inode has the I_REFERENCED flag set, then it means that it has been * used recently - the flag is set in iput_final(). When we encounter such an * inode, clear the flag and move it to the back of the LRU so it gets another @@ -747,31 +747,39 @@ static enum lru_status inode_lru_isolate(struct list_head *item, struct inode *inode = container_of(item, struct inode, i_lru); /* - * we are inverting the lru lock/inode->i_lock here, so use a trylock. - * If we fail to get the lock, just skip it. + * We are inverting the lru lock/inode->i_lock here, so use a + * trylock. If we fail to get the lock, just skip it. */ if (!spin_trylock(&inode->i_lock)) return LRU_SKIP; /* - * Referenced or dirty inodes are still in use. Give them another pass - * through the LRU as we canot reclaim them now. + * Inodes can get referenced, redirtied, or repopulated while + * they're already on the LRU, and this can make them + * unreclaimable for a while. Remove them lazily here; iput, + * sync, or the last page cache deletion will requeue them. */ if (atomic_read(&inode->i_count) || - (inode->i_state & ~I_REFERENCED)) { + (inode->i_state & ~I_REFERENCED) || + !mapping_shrinkable(&inode->i_data)) { list_lru_isolate(lru, &inode->i_lru); spin_unlock(&inode->i_lock); this_cpu_dec(nr_unused); return LRU_REMOVED; } - /* recently referenced inodes get one more pass */ + /* Recently referenced inodes get one more pass */ if (inode->i_state & I_REFERENCED) { inode->i_state &= ~I_REFERENCED; spin_unlock(&inode->i_lock); return LRU_ROTATE; } + /* + * On highmem systems, mapping_shrinkable() permits dropping + * page cache in order to free up struct inodes: lowmem might + * be under pressure before the cache inside the highmem zone. + */ if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) { __iget(inode); spin_unlock(&inode->i_lock); @@ -1638,7 +1646,7 @@ static void iput_final(struct inode *inode) if (!drop && !(inode->i_state & I_DONTCACHE) && (sb->s_flags & SB_ACTIVE)) { - inode_add_lru(inode); + __inode_add_lru(inode, true); spin_unlock(&inode->i_lock); return; } diff --git a/fs/internal.h b/fs/internal.h index 3cd065c8a66b..2854ff29f116 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -149,7 +149,6 @@ extern int vfs_open(const struct path *, struct file *); * inode.c */ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc); -extern void inode_add_lru(struct inode *inode); extern int dentry_needs_remove_privs(struct dentry *dentry); /* diff --git a/include/linux/fs.h b/include/linux/fs.h index 226de651f52e..de35a2640e70 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3193,6 +3193,7 @@ static inline void remove_inode_hash(struct inode *inode) } extern void inode_sb_list_add(struct inode *inode); +extern void inode_add_lru(struct inode *inode); extern int sb_set_blocksize(struct super_block *, int); extern int sb_min_blocksize(struct super_block *, int); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 62db6b0176b9..5c74a45ff97a 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -23,6 +23,56 @@ static inline bool mapping_empty(struct address_space *mapping) return xa_empty(&mapping->i_pages); } +/* + * mapping_shrinkable - test if page cache state allows inode reclaim + * @mapping: the page cache mapping + * + * This checks the mapping's cache state for the pupose of inode + * reclaim and LRU management. + * + * The caller is expected to hold the i_lock, but is not required to + * hold the i_pages lock, which usually protects cache state. That's + * because the i_lock and the list_lru lock that protect the inode and + * its LRU state don't nest inside the irq-safe i_pages lock. + * + * Cache deletions are performed under the i_lock, which ensures that + * when an inode goes empty, it will reliably get queued on the LRU. + * + * Cache additions do not acquire the i_lock and may race with this + * check, in which case we'll report the inode as shrinkable when it + * has cache pages. This is okay: the shrinker also checks the + * refcount and the referenced bit, which will be elevated or set in + * the process of adding new cache pages to an inode. + */ +static inline bool mapping_shrinkable(struct address_space *mapping) +{ + void *head; + + /* + * On highmem systems, there could be lowmem pressure from the + * inodes before there is highmem pressure from the page + * cache. Make inodes shrinkable regardless of cache state. + */ + if (IS_ENABLED(CONFIG_HIGHMEM)) + return true; + + /* Cache completely empty? Shrink away. */ + head = rcu_access_pointer(mapping->i_pages.xa_head); + if (!head) + return true; + + /* + * The xarray stores single offset-0 entries directly in the + * head pointer, which allows non-resident page cache entries + * to escape the shadow shrinker's list of xarray nodes. The + * inode shrinker needs to pick them up under memory pressure. + */ + if (!xa_is_node(head) && xa_is_value(head)) + return true; + + return false; +} + /* * Bits in mapping->flags. */ diff --git a/mm/filemap.c b/mm/filemap.c index b6140debc2da..06de9a17499f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -262,9 +262,13 @@ void delete_from_page_cache(struct page *page) struct address_space *mapping = page_mapping(page); BUG_ON(!PageLocked(page)); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); __delete_from_page_cache(page, NULL); xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); page_cache_free_page(mapping, page); } @@ -340,6 +344,7 @@ void delete_from_page_cache_batch(struct address_space *mapping, if (!pagevec_count(pvec)) return; + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); for (i = 0; i < pagevec_count(pvec); i++) { trace_mm_filemap_delete_from_page_cache(pvec->pages[i]); @@ -348,6 +353,9 @@ void delete_from_page_cache_batch(struct address_space *mapping, } page_cache_delete_batch(mapping, pvec); xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); for (i = 0; i < pagevec_count(pvec); i++) page_cache_free_page(mapping, pvec->pages[i]); diff --git a/mm/truncate.c b/mm/truncate.c index 714eaf19821d..cc83a3f7c1ad 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -45,9 +45,13 @@ static inline void __clear_shadow_entry(struct address_space *mapping, static void clear_shadow_entry(struct address_space *mapping, pgoff_t index, void *entry) { + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); __clear_shadow_entry(mapping, index, entry); xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); } /* @@ -73,8 +77,10 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping, return; dax = dax_mapping(mapping); - if (!dax) + if (!dax) { + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); + } for (i = j; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; @@ -93,8 +99,12 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping, __clear_shadow_entry(mapping, index, page); } - if (!dax) + if (!dax) { xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); + } pvec->nr = j; } @@ -567,6 +577,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page) if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL)) return 0; + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); if (PageDirty(page)) goto failed; @@ -574,6 +585,9 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page) BUG_ON(page_has_private(page)); __delete_from_page_cache(page, NULL); xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); if (mapping->a_ops->freepage) mapping->a_ops->freepage(page); @@ -582,6 +596,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page) return 1; failed: xa_unlock_irq(&mapping->i_pages); + spin_unlock(&mapping->host->i_lock); return 0; } diff --git a/mm/vmscan.c b/mm/vmscan.c index 41f5f6007c30..26f07518d7a4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1190,6 +1190,8 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, BUG_ON(!PageLocked(page)); BUG_ON(mapping != page_mapping(page)); + if (!PageSwapCache(page)) + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); /* * The non racy check for a busy page. @@ -1258,6 +1260,9 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, shadow = workingset_eviction(page, target_memcg); __delete_from_page_cache(page, shadow); xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); if (freepage != NULL) freepage(page); @@ -1267,6 +1272,8 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, cannot_free: xa_unlock_irq(&mapping->i_pages); + if (!PageSwapCache(page)) + spin_unlock(&mapping->host->i_lock); return 0; } diff --git a/mm/workingset.c b/mm/workingset.c index d5b81e4f4cbe..23df60ce2e10 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -543,6 +543,13 @@ static enum lru_status shadow_lru_isolate(struct list_head *item, goto out; } + if (!spin_trylock(&mapping->host->i_lock)) { + xa_unlock(&mapping->i_pages); + spin_unlock_irq(lru_lock); + ret = LRU_RETRY; + goto out; + } + list_lru_isolate(lru, item); __dec_lruvec_kmem_state(node, WORKINGSET_NODES); @@ -562,6 +569,9 @@ static enum lru_status shadow_lru_isolate(struct list_head *item, out_invalid: xa_unlock_irq(&mapping->i_pages); + if (mapping_shrinkable(mapping)) + inode_add_lru(mapping->host); + spin_unlock(&mapping->host->i_lock); ret = LRU_REMOVED_RETRY; out: cond_resched(); From 83c1fd763b3220458652f7d656d5047292ebcde4 Mon Sep 17 00:00:00 2001 From: zhangyiru Date: Mon, 8 Nov 2021 18:31:27 -0800 Subject: [PATCH 02/87] mm,hugetlb: remove mlock ulimit for SHM_HUGETLB Commit 21a3c273f88c ("mm, hugetlb: add thread name and pid to SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012, but it is not deleted yet. Mike says he still sees that message in log files on occasion, so maybe we should preserve this warning. Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the user_shm_unlock after out. Link: https://lkml.kernel.org/r/20211103105857.25041-1-zhangyiru3@huawei.com Signed-off-by: zhangyiru Reviewed-by: Mike Kravetz Cc: Hugh Dickins Cc: Liu Zixian Cc: Michal Hocko Cc: wuxu.wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hugetlbfs/inode.c | 23 ++++++++--------------- include/linux/hugetlb.h | 6 ++---- ipc/shm.c | 8 +------- mm/memfd.c | 4 +--- mm/mmap.c | 3 +-- 5 files changed, 13 insertions(+), 31 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index cdfb1ae78a3f..49d2e686be74 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1446,8 +1446,8 @@ static int get_hstate_idx(int page_size_log) * otherwise hugetlb_reserve_pages reserves one less hugepages than intended. */ struct file *hugetlb_file_setup(const char *name, size_t size, - vm_flags_t acctflag, struct ucounts **ucounts, - int creat_flags, int page_size_log) + vm_flags_t acctflag, int creat_flags, + int page_size_log) { struct inode *inode; struct vfsmount *mnt; @@ -1458,22 +1458,19 @@ struct file *hugetlb_file_setup(const char *name, size_t size, if (hstate_idx < 0) return ERR_PTR(-ENODEV); - *ucounts = NULL; mnt = hugetlbfs_vfsmount[hstate_idx]; if (!mnt) return ERR_PTR(-ENOENT); if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) { - *ucounts = current_ucounts(); - if (user_shm_lock(size, *ucounts)) { - task_lock(current); - pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n", + struct ucounts *ucounts = current_ucounts(); + + if (user_shm_lock(size, ucounts)) { + pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is obsolete\n", current->comm, current->pid); - task_unlock(current); - } else { - *ucounts = NULL; - return ERR_PTR(-EPERM); + user_shm_unlock(size, ucounts); } + return ERR_PTR(-EPERM); } file = ERR_PTR(-ENOSPC); @@ -1498,10 +1495,6 @@ struct file *hugetlb_file_setup(const char *name, size_t size, iput(inode); out: - if (*ucounts) { - user_shm_unlock(size, *ucounts); - *ucounts = NULL; - } return file; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 44c2ab0dfa59..00351ccb49a3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -477,8 +477,7 @@ static inline struct hugetlbfs_inode_info *HUGETLBFS_I(struct inode *inode) extern const struct file_operations hugetlbfs_file_operations; extern const struct vm_operations_struct hugetlb_vm_ops; struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct, - struct ucounts **ucounts, int creat_flags, - int page_size_log); + int creat_flags, int page_size_log); static inline bool is_file_hugepages(struct file *file) { @@ -497,8 +496,7 @@ static inline struct hstate *hstate_inode(struct inode *i) #define is_file_hugepages(file) false static inline struct file * hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag, - struct ucounts **ucounts, int creat_flags, - int page_size_log) + int creat_flags, int page_size_log) { return ERR_PTR(-ENOSYS); } diff --git a/ipc/shm.c b/ipc/shm.c index ab749be6d8b7..4942bdd65748 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -287,9 +287,6 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp) shm_unlock(shp); if (!is_file_hugepages(shm_file)) shmem_lock(shm_file, 0, shp->mlock_ucounts); - else if (shp->mlock_ucounts) - user_shm_unlock(i_size_read(file_inode(shm_file)), - shp->mlock_ucounts); fput(shm_file); ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); @@ -650,8 +647,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params) if (shmflg & SHM_NORESERVE) acctflag = VM_NORESERVE; file = hugetlb_file_setup(name, hugesize, acctflag, - &shp->mlock_ucounts, HUGETLB_SHMFS_INODE, - (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK); + HUGETLB_SHMFS_INODE, (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK); } else { /* * Do not allow no accounting for OVERCOMMIT_NEVER, even @@ -698,8 +694,6 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params) no_id: ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); - if (is_file_hugepages(file) && shp->mlock_ucounts) - user_shm_unlock(size, shp->mlock_ucounts); fput(file); ipc_rcu_putref(&shp->shm_perm, shm_rcu_free); return error; diff --git a/mm/memfd.c b/mm/memfd.c index 081dd33e6a61..9f80f162791a 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -297,9 +297,7 @@ SYSCALL_DEFINE2(memfd_create, } if (flags & MFD_HUGETLB) { - struct ucounts *ucounts = NULL; - - file = hugetlb_file_setup(name, 0, VM_NORESERVE, &ucounts, + file = hugetlb_file_setup(name, 0, VM_NORESERVE, HUGETLB_ANONHUGE_INODE, (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK); diff --git a/mm/mmap.c b/mm/mmap.c index b22a07f5e761..bfb0ea164a90 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1599,7 +1599,6 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, goto out_fput; } } else if (flags & MAP_HUGETLB) { - struct ucounts *ucounts = NULL; struct hstate *hs; hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); @@ -1615,7 +1614,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, */ file = hugetlb_file_setup(HUGETLB_ANON_FILE, len, VM_NORESERVE, - &ucounts, HUGETLB_ANONHUGE_INODE, + HUGETLB_ANONHUGE_INODE, (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); if (IS_ERR(file)) return PTR_ERR(file); From 0658a0961b0ace06b4cf0e1b73a4f20e349f4346 Mon Sep 17 00:00:00 2001 From: Florian Weimer Date: Mon, 8 Nov 2021 18:31:30 -0800 Subject: [PATCH 03/87] procfs: do not list TID 0 in /proc//task If a task exits concurrently, task_pid_nr_ns may return 0. [akpm@linux-foundation.org: coding style tweaks] [adobriyan@gmail.com: test that /proc/*/task doesn't contain "0"] Link: https://lkml.kernel.org/r/YV88AnVzHxPafQ9o@localhost.localdomain Link: https://lkml.kernel.org/r/8735pn5dx7.fsf@oldenburg.str.redhat.com Signed-off-by: Florian Weimer Signed-off-by: Alexey Dobriyan Acked-by: Christian Brauner Reviewed-by: Alexey Dobriyan Cc: Kees Cook Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 3 + tools/testing/selftests/proc/.gitignore | 1 + tools/testing/selftests/proc/Makefile | 2 + tools/testing/selftests/proc/proc-tid0.c | 81 ++++++++++++++++++++++++ 4 files changed, 87 insertions(+) create mode 100644 tools/testing/selftests/proc/proc-tid0.c diff --git a/fs/proc/base.c b/fs/proc/base.c index 533d5836eb9a..5541de99809c 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3799,7 +3799,10 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx) task = next_tid(task), ctx->pos++) { char name[10 + 1]; unsigned int len; + tid = task_pid_nr_ns(task, ns); + if (!tid) + continue; /* The task has just exited. */ len = snprintf(name, sizeof(name), "%u", tid); if (!proc_fill_cache(file, ctx, name, len, proc_task_instantiate, task, NULL)) { diff --git a/tools/testing/selftests/proc/.gitignore b/tools/testing/selftests/proc/.gitignore index 8f3e72e626fa..c4e6a34f9657 100644 --- a/tools/testing/selftests/proc/.gitignore +++ b/tools/testing/selftests/proc/.gitignore @@ -11,6 +11,7 @@ /proc-self-syscall /proc-self-wchan /proc-subset-pid +/proc-tid0 /proc-uptime-001 /proc-uptime-002 /read diff --git a/tools/testing/selftests/proc/Makefile b/tools/testing/selftests/proc/Makefile index 1054e40a499a..219fc6113847 100644 --- a/tools/testing/selftests/proc/Makefile +++ b/tools/testing/selftests/proc/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only CFLAGS += -Wall -O2 -Wno-unused-function CFLAGS += -D_GNU_SOURCE +LDFLAGS += -pthread TEST_GEN_PROGS := TEST_GEN_PROGS += fd-001-lookup @@ -13,6 +14,7 @@ TEST_GEN_PROGS += proc-self-map-files-002 TEST_GEN_PROGS += proc-self-syscall TEST_GEN_PROGS += proc-self-wchan TEST_GEN_PROGS += proc-subset-pid +TEST_GEN_PROGS += proc-tid0 TEST_GEN_PROGS += proc-uptime-001 TEST_GEN_PROGS += proc-uptime-002 TEST_GEN_PROGS += read diff --git a/tools/testing/selftests/proc/proc-tid0.c b/tools/testing/selftests/proc/proc-tid0.c new file mode 100644 index 000000000000..58c1d7c90a8e --- /dev/null +++ b/tools/testing/selftests/proc/proc-tid0.c @@ -0,0 +1,81 @@ +/* + * Copyright (c) 2021 Alexey Dobriyan + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ +// Test that /proc/*/task never contains "0". +#include +#include +#include +#include +#include +#include +#include +#include + +static pid_t pid = -1; + +static void atexit_hook(void) +{ + if (pid > 0) { + kill(pid, SIGKILL); + } +} + +static void *f(void *_) +{ + return NULL; +} + +static void sigalrm(int _) +{ + exit(0); +} + +int main(void) +{ + pid = fork(); + if (pid == 0) { + /* child */ + while (1) { + pthread_t pth; + pthread_create(&pth, NULL, f, NULL); + pthread_join(pth, NULL); + } + } else if (pid > 0) { + /* parent */ + atexit(atexit_hook); + + char buf[64]; + snprintf(buf, sizeof(buf), "/proc/%u/task", pid); + + signal(SIGALRM, sigalrm); + alarm(1); + + while (1) { + DIR *d = opendir(buf); + struct dirent *de; + while ((de = readdir(d))) { + if (strcmp(de->d_name, "0") == 0) { + exit(1); + } + } + closedir(d); + } + + return 0; + } else { + perror("fork"); + return 1; + } +} From 434b90f39e662f0cae658f7701f77d9b09457796 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:33 -0800 Subject: [PATCH 04/87] x86/xen: update xen_oldmem_pfn_is_ram() documentation After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: "Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut" This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [2] https://github.com/dracutdevs/dracut/pull/1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/20211005121430.30136-1-david@redhat.com Link: https://lkml.kernel.org/r/20211005121430.30136-2-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Boris Ostrovsky Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Juergen Gross Cc: Stefano Stabellini Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Dave Young Cc: Baoquan He Cc: Vivek Goyal Cc: Michal Hocko Cc: Oscar Salvador Cc: Mike Rapoport Cc: "Rafael J. Wysocki" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/xen/mmu_hvm.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c index 57409373750f..b242d1f4b426 100644 --- a/arch/x86/xen/mmu_hvm.c +++ b/arch/x86/xen/mmu_hvm.c @@ -9,12 +9,9 @@ #ifdef CONFIG_PROC_VMCORE /* - * This function is used in two contexts: - * - the kdump kernel has to check whether a pfn of the crashed kernel - * was a ballooned page. vmcore is using this function to decide - * whether to access a pfn of the crashed kernel. - * - the kexec kernel has to check whether a pfn was ballooned by the - * previous kernel. If the pfn is ballooned, handle it properly. + * The kdump kernel has to check whether a pfn of the crashed kernel + * was a ballooned page. vmcore is using this function to decide + * whether to access a pfn of the crashed kernel. * Returns 0 if the pfn is not backed by a RAM page, the caller may * handle the pfn special in this case. */ From d452a4894983a94c57fa15eb34dbd88228280936 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:37 -0800 Subject: [PATCH 05/87] x86/xen: simplify xen_oldmem_pfn_is_ram() Let's simplify return handling. Link: https://lkml.kernel.org/r/20211005121430.30136-3-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/xen/mmu_hvm.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c index b242d1f4b426..d1b38c77352b 100644 --- a/arch/x86/xen/mmu_hvm.c +++ b/arch/x86/xen/mmu_hvm.c @@ -21,23 +21,10 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn) .domid = DOMID_SELF, .pfn = pfn, }; - int ram; if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a)) return -ENXIO; - - switch (a.mem_type) { - case HVMMEM_mmio_dm: - ram = 0; - break; - case HVMMEM_ram_rw: - case HVMMEM_ram_ro: - default: - ram = 1; - break; - } - - return ram; + return a.mem_type != HVMMEM_mmio_dm; } #endif From 934fadf438b33f3fe952924702eb3254e1884d2f Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:41 -0800 Subject: [PATCH 06/87] x86/xen: print a warning when HVMOP_get_mem_type fails HVMOP_get_mem_type is not expected to fail, "This call failing is indication of something going quite wrong and it would be good to know about this." [1] Let's add a pr_warn_once(). Link: https://lkml.kernel.org/r/3b935aa0-6d85-0bcd-100e-15098add3c4c@oracle.com [1] Link: https://lkml.kernel.org/r/20211005121430.30136-4-david@redhat.com Signed-off-by: David Hildenbrand Suggested-by: Boris Ostrovsky Cc: Baoquan He Cc: Borislav Petkov Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/xen/mmu_hvm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c index d1b38c77352b..6ba8826dcdcc 100644 --- a/arch/x86/xen/mmu_hvm.c +++ b/arch/x86/xen/mmu_hvm.c @@ -22,8 +22,10 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn) .pfn = pfn, }; - if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a)) + if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a)) { + pr_warn_once("Unexpected HVMOP_get_mem_type failure\n"); return -ENXIO; + } return a.mem_type != HVMMEM_mmio_dm; } #endif From 2c9feeaedfe1f34c4e327cb6f9b8b90052edced1 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:44 -0800 Subject: [PATCH 07/87] proc/vmcore: let pfn_is_ram() return a bool The callback should deal with errors internally, it doesn't make sense to expose these via pfn_is_ram(). We'll rework the callbacks next. Right now we consider errors as if "it's RAM"; no functional change. Link: https://lkml.kernel.org/r/20211005121430.30136-5-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/vmcore.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 9a15334da208..a9bd80ab670e 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -84,11 +84,11 @@ void unregister_oldmem_pfn_is_ram(void) } EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram); -static int pfn_is_ram(unsigned long pfn) +static bool pfn_is_ram(unsigned long pfn) { int (*fn)(unsigned long pfn); /* pfn is ram unless fn() checks pagetype */ - int ret = 1; + bool ret = true; /* * Ask hypervisor if the pfn is really ram. @@ -97,7 +97,7 @@ static int pfn_is_ram(unsigned long pfn) */ fn = oldmem_pfn_is_ram; if (fn) - ret = fn(pfn); + ret = !!fn(pfn); return ret; } @@ -124,7 +124,7 @@ ssize_t read_from_oldmem(char *buf, size_t count, nr_bytes = count; /* If pfn is not ram, return zeros for sparse dump files */ - if (pfn_is_ram(pfn) == 0) + if (!pfn_is_ram(pfn)) memset(buf, 0, nr_bytes); else { if (encrypted) From cc5f2704c93456e6f1c671c5cf7b582a55df5484 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:48 -0800 Subject: [PATCH 08/87] proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks Let's support multiple registered callbacks, making sure that registering vmcore callbacks cannot fail. Make the callback return a bool instead of an int, handling how to deal with errors internally. Drop unused HAVE_OLDMEM_PFN_IS_RAM. We soon want to make use of this infrastructure from other drivers: virtio-mem, registering one callback for each virtio-mem device, to prevent reading unplugged virtio-mem memory. Handle it via a generic vmcore_cb structure, prepared for future extensions: for example, once we support virtio-mem on s390x where the vmcore is completely constructed in the second kernel, we want to detect and add plugged virtio-mem memory ranges to the vmcore in order for them to get dumped properly. Handle corner cases that are unexpected and shouldn't happen in sane setups: registering a callback after the vmcore has already been opened (warn only) and unregistering a callback after the vmcore has already been opened (warn and essentially read only zeroes from that point on). Link: https://lkml.kernel.org/r/20211005121430.30136-6-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/kernel/aperture_64.c | 13 ++++- arch/x86/xen/mmu_hvm.c | 11 ++-- fs/proc/vmcore.c | 97 ++++++++++++++++++++++++----------- include/linux/crash_dump.h | 26 ++++++++-- 4 files changed, 110 insertions(+), 37 deletions(-) diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c index 10562885f5fc..af3ba08b684b 100644 --- a/arch/x86/kernel/aperture_64.c +++ b/arch/x86/kernel/aperture_64.c @@ -73,12 +73,23 @@ static int gart_mem_pfn_is_ram(unsigned long pfn) (pfn >= aperture_pfn_start + aperture_page_count)); } +#ifdef CONFIG_PROC_VMCORE +static bool gart_oldmem_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn) +{ + return !!gart_mem_pfn_is_ram(pfn); +} + +static struct vmcore_cb gart_vmcore_cb = { + .pfn_is_ram = gart_oldmem_pfn_is_ram, +}; +#endif + static void __init exclude_from_core(u64 aper_base, u32 aper_order) { aperture_pfn_start = aper_base >> PAGE_SHIFT; aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT; #ifdef CONFIG_PROC_VMCORE - WARN_ON(register_oldmem_pfn_is_ram(&gart_mem_pfn_is_ram)); + register_vmcore_cb(&gart_vmcore_cb); #endif #ifdef CONFIG_PROC_KCORE WARN_ON(register_mem_pfn_is_ram(&gart_mem_pfn_is_ram)); diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c index 6ba8826dcdcc..509bdee3ab90 100644 --- a/arch/x86/xen/mmu_hvm.c +++ b/arch/x86/xen/mmu_hvm.c @@ -12,10 +12,10 @@ * The kdump kernel has to check whether a pfn of the crashed kernel * was a ballooned page. vmcore is using this function to decide * whether to access a pfn of the crashed kernel. - * Returns 0 if the pfn is not backed by a RAM page, the caller may + * Returns "false" if the pfn is not backed by a RAM page, the caller may * handle the pfn special in this case. */ -static int xen_oldmem_pfn_is_ram(unsigned long pfn) +static bool xen_vmcore_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn) { struct xen_hvm_get_mem_type a = { .domid = DOMID_SELF, @@ -24,10 +24,13 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn) if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a)) { pr_warn_once("Unexpected HVMOP_get_mem_type failure\n"); - return -ENXIO; + return true; } return a.mem_type != HVMMEM_mmio_dm; } +static struct vmcore_cb xen_vmcore_cb = { + .pfn_is_ram = xen_vmcore_pfn_is_ram, +}; #endif static void xen_hvm_exit_mmap(struct mm_struct *mm) @@ -61,6 +64,6 @@ void __init xen_hvm_init_mmu_ops(void) if (is_pagetable_dying_supported()) pv_ops.mmu.exit_mmap = xen_hvm_exit_mmap; #ifdef CONFIG_PROC_VMCORE - WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram)); + register_vmcore_cb(&xen_vmcore_cb); #endif } diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index a9bd80ab670e..7a04b2eca287 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -62,46 +62,75 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0); /* Device Dump Size */ static size_t vmcoredd_orig_sz; -/* - * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error - * The called function has to take care of module refcounting. - */ -static int (*oldmem_pfn_is_ram)(unsigned long pfn); +static DECLARE_RWSEM(vmcore_cb_rwsem); +/* List of registered vmcore callbacks. */ +static LIST_HEAD(vmcore_cb_list); +/* Whether we had a surprise unregistration of a callback. */ +static bool vmcore_cb_unstable; +/* Whether the vmcore has been opened once. */ +static bool vmcore_opened; -int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn)) +void register_vmcore_cb(struct vmcore_cb *cb) { - if (oldmem_pfn_is_ram) - return -EBUSY; - oldmem_pfn_is_ram = fn; - return 0; + down_write(&vmcore_cb_rwsem); + INIT_LIST_HEAD(&cb->next); + list_add_tail(&cb->next, &vmcore_cb_list); + /* + * Registering a vmcore callback after the vmcore was opened is + * very unusual (e.g., manual driver loading). + */ + if (vmcore_opened) + pr_warn_once("Unexpected vmcore callback registration\n"); + up_write(&vmcore_cb_rwsem); } -EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram); +EXPORT_SYMBOL_GPL(register_vmcore_cb); -void unregister_oldmem_pfn_is_ram(void) +void unregister_vmcore_cb(struct vmcore_cb *cb) { - oldmem_pfn_is_ram = NULL; - wmb(); + down_write(&vmcore_cb_rwsem); + list_del(&cb->next); + /* + * Unregistering a vmcore callback after the vmcore was opened is + * very unusual (e.g., forced driver removal), but we cannot stop + * unregistering. + */ + if (vmcore_opened) { + pr_warn_once("Unexpected vmcore callback unregistration\n"); + vmcore_cb_unstable = true; + } + up_write(&vmcore_cb_rwsem); } -EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram); +EXPORT_SYMBOL_GPL(unregister_vmcore_cb); static bool pfn_is_ram(unsigned long pfn) { - int (*fn)(unsigned long pfn); - /* pfn is ram unless fn() checks pagetype */ + struct vmcore_cb *cb; bool ret = true; - /* - * Ask hypervisor if the pfn is really ram. - * A ballooned page contains no data and reading from such a page - * will cause high load in the hypervisor. - */ - fn = oldmem_pfn_is_ram; - if (fn) - ret = !!fn(pfn); + lockdep_assert_held_read(&vmcore_cb_rwsem); + if (unlikely(vmcore_cb_unstable)) + return false; + + list_for_each_entry(cb, &vmcore_cb_list, next) { + if (unlikely(!cb->pfn_is_ram)) + continue; + ret = cb->pfn_is_ram(cb, pfn); + if (!ret) + break; + } return ret; } +static int open_vmcore(struct inode *inode, struct file *file) +{ + down_read(&vmcore_cb_rwsem); + vmcore_opened = true; + up_read(&vmcore_cb_rwsem); + + return 0; +} + /* Reads a page from the oldmem device from given offset. */ ssize_t read_from_oldmem(char *buf, size_t count, u64 *ppos, int userbuf, @@ -117,6 +146,7 @@ ssize_t read_from_oldmem(char *buf, size_t count, offset = (unsigned long)(*ppos % PAGE_SIZE); pfn = (unsigned long)(*ppos / PAGE_SIZE); + down_read(&vmcore_cb_rwsem); do { if (count > (PAGE_SIZE - offset)) nr_bytes = PAGE_SIZE - offset; @@ -136,8 +166,10 @@ ssize_t read_from_oldmem(char *buf, size_t count, tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf); - if (tmp < 0) + if (tmp < 0) { + up_read(&vmcore_cb_rwsem); return tmp; + } } *ppos += nr_bytes; count -= nr_bytes; @@ -147,6 +179,7 @@ ssize_t read_from_oldmem(char *buf, size_t count, offset = 0; } while (count); + up_read(&vmcore_cb_rwsem); return read; } @@ -537,14 +570,19 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, unsigned long from, unsigned long pfn, unsigned long size, pgprot_t prot) { + int ret; + /* * Check if oldmem_pfn_is_ram was registered to avoid * looping over all pages without a reason. */ - if (oldmem_pfn_is_ram) - return remap_oldmem_pfn_checked(vma, from, pfn, size, prot); + down_read(&vmcore_cb_rwsem); + if (!list_empty(&vmcore_cb_list) || vmcore_cb_unstable) + ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot); else - return remap_oldmem_pfn_range(vma, from, pfn, size, prot); + ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot); + up_read(&vmcore_cb_rwsem); + return ret; } static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) @@ -668,6 +706,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) #endif static const struct proc_ops vmcore_proc_ops = { + .proc_open = open_vmcore, .proc_read = read_vmcore, .proc_lseek = default_llseek, .proc_mmap = mmap_vmcore, diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index 2618577a4d6d..0c547d866f1e 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -91,9 +91,29 @@ static inline void vmcore_unusable(void) elfcorehdr_addr = ELFCORE_ADDR_ERR; } -#define HAVE_OLDMEM_PFN_IS_RAM 1 -extern int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn)); -extern void unregister_oldmem_pfn_is_ram(void); +/** + * struct vmcore_cb - driver callbacks for /proc/vmcore handling + * @pfn_is_ram: check whether a PFN really is RAM and should be accessed when + * reading the vmcore. Will return "true" if it is RAM or if the + * callback cannot tell. If any callback returns "false", it's not + * RAM and the page must not be accessed; zeroes should be + * indicated in the vmcore instead. For example, a ballooned page + * contains no data and reading from such a page will cause high + * load in the hypervisor. + * @next: List head to manage registered callbacks internally; initialized by + * register_vmcore_cb(). + * + * vmcore callbacks allow drivers managing physical memory ranges to + * coordinate with vmcore handling code, for example, to prevent accessing + * physical memory ranges that should not be accessed when reading the vmcore, + * although included in the vmcore header as memory ranges to dump. + */ +struct vmcore_cb { + bool (*pfn_is_ram)(struct vmcore_cb *cb, unsigned long pfn); + struct list_head next; +}; +extern void register_vmcore_cb(struct vmcore_cb *cb); +extern void unregister_vmcore_cb(struct vmcore_cb *cb); #else /* !CONFIG_CRASH_DUMP */ static inline bool is_kdump_kernel(void) { return 0; } From 94300fcf4cef8e56f9e25fc75692e7db6d00e232 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:51 -0800 Subject: [PATCH 09/87] virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virtio/virtio_mem.c | 81 ++++++++++++++++++++----------------- 1 file changed, 44 insertions(+), 37 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index bef8ad6bf466..2ba7e8d6ba8d 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -2392,41 +2392,10 @@ static int virtio_mem_init_vq(struct virtio_mem *vm) return 0; } -static int virtio_mem_init(struct virtio_mem *vm) +static int virtio_mem_init_hotplug(struct virtio_mem *vm) { const struct range pluggable_range = mhp_get_pluggable_range(true); uint64_t sb_size, addr; - uint16_t node_id; - - if (!vm->vdev->config->get) { - dev_err(&vm->vdev->dev, "config access disabled\n"); - return -EINVAL; - } - - /* - * We don't want to (un)plug or reuse any memory when in kdump. The - * memory is still accessible (but not mapped). - */ - if (is_kdump_kernel()) { - dev_warn(&vm->vdev->dev, "disabled in kdump kernel\n"); - return -EBUSY; - } - - /* Fetch all properties that can't change. */ - virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size, - &vm->plugged_size); - virtio_cread_le(vm->vdev, struct virtio_mem_config, block_size, - &vm->device_block_size); - virtio_cread_le(vm->vdev, struct virtio_mem_config, node_id, - &node_id); - vm->nid = virtio_mem_translate_node_id(vm, node_id); - virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, &vm->addr); - virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size, - &vm->region_size); - - /* Determine the nid for the device based on the lowest address. */ - if (vm->nid == NUMA_NO_NODE) - vm->nid = memory_add_physaddr_to_nid(vm->addr); /* bad device setup - warn only */ if (!IS_ALIGNED(vm->addr, memory_block_size_bytes())) @@ -2496,10 +2465,6 @@ static int virtio_mem_init(struct virtio_mem *vm) vm->offline_threshold); } - dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr); - dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size); - dev_info(&vm->vdev->dev, "device block size: 0x%llx", - (unsigned long long)vm->device_block_size); dev_info(&vm->vdev->dev, "memory block size: 0x%lx", memory_block_size_bytes()); if (vm->in_sbm) @@ -2508,10 +2473,52 @@ static int virtio_mem_init(struct virtio_mem *vm) else dev_info(&vm->vdev->dev, "big block size: 0x%llx", (unsigned long long)vm->bbm.bb_size); + + return 0; +} + +static int virtio_mem_init(struct virtio_mem *vm) +{ + uint16_t node_id; + + if (!vm->vdev->config->get) { + dev_err(&vm->vdev->dev, "config access disabled\n"); + return -EINVAL; + } + + /* + * We don't want to (un)plug or reuse any memory when in kdump. The + * memory is still accessible (but not mapped). + */ + if (is_kdump_kernel()) { + dev_warn(&vm->vdev->dev, "disabled in kdump kernel\n"); + return -EBUSY; + } + + /* Fetch all properties that can't change. */ + virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size, + &vm->plugged_size); + virtio_cread_le(vm->vdev, struct virtio_mem_config, block_size, + &vm->device_block_size); + virtio_cread_le(vm->vdev, struct virtio_mem_config, node_id, + &node_id); + vm->nid = virtio_mem_translate_node_id(vm, node_id); + virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, &vm->addr); + virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size, + &vm->region_size); + + /* Determine the nid for the device based on the lowest address. */ + if (vm->nid == NUMA_NO_NODE) + vm->nid = memory_add_physaddr_to_nid(vm->addr); + + dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr); + dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size); + dev_info(&vm->vdev->dev, "device block size: 0x%llx", + (unsigned long long)vm->device_block_size); if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA)) dev_info(&vm->vdev->dev, "nid: %d", vm->nid); - return 0; + return virtio_mem_init_hotplug(vm); } static int virtio_mem_create_resource(struct virtio_mem *vm) From 84e17e684eefa9d84d9d46e5436d46de129cd6da Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:55 -0800 Subject: [PATCH 10/87] virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virtio/virtio_mem.c | 87 +++++++++++++++++++------------------ 1 file changed, 45 insertions(+), 42 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 2ba7e8d6ba8d..1be3ee7f684d 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -260,6 +260,8 @@ static void virtio_mem_fake_offline_going_offline(unsigned long pfn, static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn, unsigned long nr_pages); static void virtio_mem_retry(struct virtio_mem *vm); +static int virtio_mem_create_resource(struct virtio_mem *vm); +static void virtio_mem_delete_resource(struct virtio_mem *vm); /* * Register a virtio-mem device so it will be considered for the online_page @@ -2395,7 +2397,8 @@ static int virtio_mem_init_vq(struct virtio_mem *vm) static int virtio_mem_init_hotplug(struct virtio_mem *vm) { const struct range pluggable_range = mhp_get_pluggable_range(true); - uint64_t sb_size, addr; + uint64_t unit_pages, sb_size, addr; + int rc; /* bad device setup - warn only */ if (!IS_ALIGNED(vm->addr, memory_block_size_bytes())) @@ -2474,7 +2477,48 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm) dev_info(&vm->vdev->dev, "big block size: 0x%llx", (unsigned long long)vm->bbm.bb_size); + /* create the parent resource for all memory */ + rc = virtio_mem_create_resource(vm); + if (rc) + return rc; + + /* use a single dynamic memory group to cover the whole memory device */ + if (vm->in_sbm) + unit_pages = PHYS_PFN(memory_block_size_bytes()); + else + unit_pages = PHYS_PFN(vm->bbm.bb_size); + rc = memory_group_register_dynamic(vm->nid, unit_pages); + if (rc < 0) + goto out_del_resource; + vm->mgid = rc; + + /* + * If we still have memory plugged, we have to unplug all memory first. + * Registering our parent resource makes sure that this memory isn't + * actually in use (e.g., trying to reload the driver). + */ + if (vm->plugged_size) { + vm->unplug_all_required = true; + dev_info(&vm->vdev->dev, "unplugging all memory is required\n"); + } + + /* register callbacks */ + vm->memory_notifier.notifier_call = virtio_mem_memory_notifier_cb; + rc = register_memory_notifier(&vm->memory_notifier); + if (rc) + goto out_unreg_group; + rc = register_virtio_mem_device(vm); + if (rc) + goto out_unreg_mem; + return 0; +out_unreg_mem: + unregister_memory_notifier(&vm->memory_notifier); +out_unreg_group: + memory_group_unregister(vm->mgid); +out_del_resource: + virtio_mem_delete_resource(vm); + return rc; } static int virtio_mem_init(struct virtio_mem *vm) @@ -2578,7 +2622,6 @@ static bool virtio_mem_has_memory_added(struct virtio_mem *vm) static int virtio_mem_probe(struct virtio_device *vdev) { struct virtio_mem *vm; - uint64_t unit_pages; int rc; BUILD_BUG_ON(sizeof(struct virtio_mem_req) != 24); @@ -2608,40 +2651,6 @@ static int virtio_mem_probe(struct virtio_device *vdev) if (rc) goto out_del_vq; - /* create the parent resource for all memory */ - rc = virtio_mem_create_resource(vm); - if (rc) - goto out_del_vq; - - /* use a single dynamic memory group to cover the whole memory device */ - if (vm->in_sbm) - unit_pages = PHYS_PFN(memory_block_size_bytes()); - else - unit_pages = PHYS_PFN(vm->bbm.bb_size); - rc = memory_group_register_dynamic(vm->nid, unit_pages); - if (rc < 0) - goto out_del_resource; - vm->mgid = rc; - - /* - * If we still have memory plugged, we have to unplug all memory first. - * Registering our parent resource makes sure that this memory isn't - * actually in use (e.g., trying to reload the driver). - */ - if (vm->plugged_size) { - vm->unplug_all_required = true; - dev_info(&vm->vdev->dev, "unplugging all memory is required\n"); - } - - /* register callbacks */ - vm->memory_notifier.notifier_call = virtio_mem_memory_notifier_cb; - rc = register_memory_notifier(&vm->memory_notifier); - if (rc) - goto out_unreg_group; - rc = register_virtio_mem_device(vm); - if (rc) - goto out_unreg_mem; - virtio_device_ready(vdev); /* trigger a config update to start processing the requested_size */ @@ -2649,12 +2658,6 @@ static int virtio_mem_probe(struct virtio_device *vdev) queue_work(system_freezable_wq, &vm->wq); return 0; -out_unreg_mem: - unregister_memory_notifier(&vm->memory_notifier); -out_unreg_group: - memory_group_unregister(vm->mgid); -out_del_resource: - virtio_mem_delete_resource(vm); out_del_vq: vdev->config->del_vqs(vdev); out_free_vm: From ffc763d0c334c397ef04c02f6bd512ae23bfa1aa Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:31:58 -0800 Subject: [PATCH 11/87] virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virtio/virtio_mem.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 1be3ee7f684d..76d8aef3cfd2 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -2667,9 +2667,8 @@ out_free_vm: return rc; } -static void virtio_mem_remove(struct virtio_device *vdev) +static void virtio_mem_deinit_hotplug(struct virtio_mem *vm) { - struct virtio_mem *vm = vdev->priv; unsigned long mb_id; int rc; @@ -2716,7 +2715,8 @@ static void virtio_mem_remove(struct virtio_device *vdev) * away. Warn at least. */ if (virtio_mem_has_memory_added(vm)) { - dev_warn(&vdev->dev, "device still has system memory added\n"); + dev_warn(&vm->vdev->dev, + "device still has system memory added\n"); } else { virtio_mem_delete_resource(vm); kfree_const(vm->resource_name); @@ -2730,6 +2730,13 @@ static void virtio_mem_remove(struct virtio_device *vdev) } else { vfree(vm->bbm.bb_states); } +} + +static void virtio_mem_remove(struct virtio_device *vdev) +{ + struct virtio_mem *vm = vdev->priv; + + virtio_mem_deinit_hotplug(vm); /* reset the device and cleanup the queues */ vdev->config->reset(vdev); From ce2814622e844fd8c71d38c458e04d765dadfd04 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:32:02 -0800 Subject: [PATCH 12/87] virtio-mem: kdump mode to sanitize /proc/vmcore access Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory via PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) and a recent dracut version (including the virtio_mem module in the kdump initrd). Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1] Link: https://github.com/dracutdevs/dracut/pull/1157 [2] Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com Signed-off-by: David Hildenbrand Cc: Baoquan He Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Young Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jason Wang Cc: Juergen Gross Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Rapoport Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virtio/virtio_mem.c | 136 ++++++++++++++++++++++++++++++++---- 1 file changed, 124 insertions(+), 12 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 76d8aef3cfd2..3573b78f6b05 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -223,6 +223,9 @@ struct virtio_mem { * When this lock is held the pointers can't change, ONLINE and * OFFLINE blocks can't change the state and no subblocks will get * plugged/unplugged. + * + * In kdump mode, used to serialize requests, last_block_addr and + * last_block_plugged. */ struct mutex hotplug_mutex; bool hotplug_active; @@ -230,6 +233,9 @@ struct virtio_mem { /* An error occurred we cannot handle - stop processing requests. */ bool broken; + /* Cached valued of is_kdump_kernel() when the device was probed. */ + bool in_kdump; + /* The driver is being removed. */ spinlock_t removal_lock; bool removing; @@ -243,6 +249,13 @@ struct virtio_mem { /* Memory notifier (online/offline events). */ struct notifier_block memory_notifier; +#ifdef CONFIG_PROC_VMCORE + /* vmcore callback for /proc/vmcore handling in kdump mode */ + struct vmcore_cb vmcore_cb; + uint64_t last_block_addr; + bool last_block_plugged; +#endif /* CONFIG_PROC_VMCORE */ + /* Next device in the list of virtio-mem devices. */ struct list_head next; }; @@ -2293,6 +2306,12 @@ static void virtio_mem_run_wq(struct work_struct *work) uint64_t diff; int rc; + if (unlikely(vm->in_kdump)) { + dev_warn_once(&vm->vdev->dev, + "unexpected workqueue run in kdump kernel\n"); + return; + } + hrtimer_cancel(&vm->retry_timer); if (vm->broken) @@ -2521,6 +2540,86 @@ out_del_resource: return rc; } +#ifdef CONFIG_PROC_VMCORE +static int virtio_mem_send_state_request(struct virtio_mem *vm, uint64_t addr, + uint64_t size) +{ + const uint64_t nb_vm_blocks = size / vm->device_block_size; + const struct virtio_mem_req req = { + .type = cpu_to_virtio16(vm->vdev, VIRTIO_MEM_REQ_STATE), + .u.state.addr = cpu_to_virtio64(vm->vdev, addr), + .u.state.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks), + }; + int rc = -ENOMEM; + + dev_dbg(&vm->vdev->dev, "requesting state: 0x%llx - 0x%llx\n", addr, + addr + size - 1); + + switch (virtio_mem_send_request(vm, &req)) { + case VIRTIO_MEM_RESP_ACK: + return virtio16_to_cpu(vm->vdev, vm->resp.u.state.state); + case VIRTIO_MEM_RESP_ERROR: + rc = -EINVAL; + break; + default: + break; + } + + dev_dbg(&vm->vdev->dev, "requesting state failed: %d\n", rc); + return rc; +} + +static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb, + unsigned long pfn) +{ + struct virtio_mem *vm = container_of(cb, struct virtio_mem, + vmcore_cb); + uint64_t addr = PFN_PHYS(pfn); + bool is_ram; + int rc; + + if (!virtio_mem_contains_range(vm, addr, PAGE_SIZE)) + return true; + if (!vm->plugged_size) + return false; + + /* + * We have to serialize device requests and access to the information + * about the block queried last. + */ + mutex_lock(&vm->hotplug_mutex); + + addr = ALIGN_DOWN(addr, vm->device_block_size); + if (addr != vm->last_block_addr) { + rc = virtio_mem_send_state_request(vm, addr, + vm->device_block_size); + /* On any kind of error, we're going to signal !ram. */ + if (rc == VIRTIO_MEM_STATE_PLUGGED) + vm->last_block_plugged = true; + else + vm->last_block_plugged = false; + vm->last_block_addr = addr; + } + + is_ram = vm->last_block_plugged; + mutex_unlock(&vm->hotplug_mutex); + return is_ram; +} +#endif /* CONFIG_PROC_VMCORE */ + +static int virtio_mem_init_kdump(struct virtio_mem *vm) +{ +#ifdef CONFIG_PROC_VMCORE + dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n"); + vm->vmcore_cb.pfn_is_ram = virtio_mem_vmcore_pfn_is_ram; + register_vmcore_cb(&vm->vmcore_cb); + return 0; +#else /* CONFIG_PROC_VMCORE */ + dev_warn(&vm->vdev->dev, "disabled in kdump kernel without vmcore\n"); + return -EBUSY; +#endif /* CONFIG_PROC_VMCORE */ +} + static int virtio_mem_init(struct virtio_mem *vm) { uint16_t node_id; @@ -2530,15 +2629,6 @@ static int virtio_mem_init(struct virtio_mem *vm) return -EINVAL; } - /* - * We don't want to (un)plug or reuse any memory when in kdump. The - * memory is still accessible (but not mapped). - */ - if (is_kdump_kernel()) { - dev_warn(&vm->vdev->dev, "disabled in kdump kernel\n"); - return -EBUSY; - } - /* Fetch all properties that can't change. */ virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size, &vm->plugged_size); @@ -2562,6 +2652,12 @@ static int virtio_mem_init(struct virtio_mem *vm) if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA)) dev_info(&vm->vdev->dev, "nid: %d", vm->nid); + /* + * We don't want to (un)plug or reuse any memory when in kdump. The + * memory is still accessible (but not exposed to Linux). + */ + if (vm->in_kdump) + return virtio_mem_init_kdump(vm); return virtio_mem_init_hotplug(vm); } @@ -2640,6 +2736,7 @@ static int virtio_mem_probe(struct virtio_device *vdev) hrtimer_init(&vm->retry_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); vm->retry_timer.function = virtio_mem_timer_expired; vm->retry_timer_ms = VIRTIO_MEM_RETRY_TIMER_MIN_MS; + vm->in_kdump = is_kdump_kernel(); /* register the virtqueue */ rc = virtio_mem_init_vq(vm); @@ -2654,8 +2751,10 @@ static int virtio_mem_probe(struct virtio_device *vdev) virtio_device_ready(vdev); /* trigger a config update to start processing the requested_size */ - atomic_set(&vm->config_changed, 1); - queue_work(system_freezable_wq, &vm->wq); + if (!vm->in_kdump) { + atomic_set(&vm->config_changed, 1); + queue_work(system_freezable_wq, &vm->wq); + } return 0; out_del_vq: @@ -2732,11 +2831,21 @@ static void virtio_mem_deinit_hotplug(struct virtio_mem *vm) } } +static void virtio_mem_deinit_kdump(struct virtio_mem *vm) +{ +#ifdef CONFIG_PROC_VMCORE + unregister_vmcore_cb(&vm->vmcore_cb); +#endif /* CONFIG_PROC_VMCORE */ +} + static void virtio_mem_remove(struct virtio_device *vdev) { struct virtio_mem *vm = vdev->priv; - virtio_mem_deinit_hotplug(vm); + if (vm->in_kdump) + virtio_mem_deinit_kdump(vm); + else + virtio_mem_deinit_hotplug(vm); /* reset the device and cleanup the queues */ vdev->config->reset(vdev); @@ -2750,6 +2859,9 @@ static void virtio_mem_config_changed(struct virtio_device *vdev) { struct virtio_mem *vm = vdev->priv; + if (unlikely(vm->in_kdump)) + return; + atomic_set(&vm->config_changed, 1); virtio_mem_retry(vm); } From da4d6b9cf80ae5b0083f640133b85b68b53b6497 Mon Sep 17 00:00:00 2001 From: Stephen Brennan Date: Mon, 8 Nov 2021 18:32:05 -0800 Subject: [PATCH 13/87] proc: allow pid_revalidate() during LOOKUP_RCU Problem Description: When running running ~128 parallel instances of TZ=/etc/localtime ps -fe >/dev/null on a 128CPU machine, the %sys utilization reaches 97%, and perf shows the following code path as being responsible for heavy contention on the d_lockref spinlock: walk_component() lookup_fast() d_revalidate() pid_revalidate() // returns -ECHILD unlazy_child() lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention The reason is that pid_revalidate() is triggering a drop from RCU to ref path walk mode. All concurrent path lookups thus try to grab a reference to the dentry for /proc/, before re-executing pid_revalidate() and then stepping into the /proc/$pid directory. Thus there is huge spinlock contention. This patch allows pid_revalidate() to execute in RCU mode, meaning that the path lookup can successfully enter the /proc/$pid directory while still in RCU mode. Later on, the path lookup may still drop into ref mode, but the contention will be much reduced at this point. By applying this patch, %sys utilization falls to around 85% under the same workload, and the number of ps processes executed per unit time increases by 3x-4x. Although this particular workload is a bit contrived, we have seen some large collections of eager monitoring scripts which produced similarly high %sys time due to contention in the /proc directory. As a result this patch, Al noted that several procfs methods which were only called in ref-walk mode could now be called from RCU mode. To ensure that this patch is safe, I audited all the inode get_link and permission() implementations, as well as dentry d_revalidate() implementations, in fs/proc. The purpose here is to ensure that they either are safe to call in RCU (i.e. don't sleep) or correctly bail out of RCU mode if they don't support it. My analysis shows that all at-risk procfs methods are safe to call under RCU, and thus this patch is safe. Procfs RCU-walk Analysis: This analysis is up-to-date with 5.15-rc3. When called under RCU mode, these functions have arguments as follows: * get_link() receives a NULL dentry pointer when called in RCU mode. * permission() receives MAY_NOT_BLOCK in the mode parameter when called from RCU. * d_revalidate() receives LOOKUP_RCU in flags. For the following functions, either they are trivially RCU safe, or they explicitly bail at the beginning of the function when they run: proc_ns_get_link (bails out) proc_get_link (RCU safe) proc_pid_get_link (bails out) map_files_d_revalidate (bails out) map_misc_d_revalidate (bails out) proc_net_d_revalidate (RCU safe) proc_sys_revalidate (bails out, also not under /proc/$pid) tid_fd_revalidate (bails out) proc_sys_permission (not under /proc/$pid) The remainder of the functions require a bit more detail: * proc_fd_permission: RCU safe. All of the body of this function is under rcu_read_lock(), except generic_permission() which declares itself RCU safe in its documentation string. * proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware and otherwise looks safe. The same is true of proc_thread_self_get_link. * proc_map_files_get_link: calls ns_capable, which calls capable(), and thus calls into the audit code (see note #1 below). The remainder is just a call to the trivially safe proc_pid_get_link(). * proc_pid_permission: calls ptrace_may_access(), which appears RCU safe, although it does call into the "security_ptrace_access_check()" hook, which looks safe under smack and selinux. Just the audit code is of concern. Also uses get_task_struct() and put_task_struct(), see note #2 below. * proc_tid_comm_permission: Appears safe, though calls put_task_struct (see note #2 below). Note #1: Most of the concern of RCU safety has centered around the audit code. However, since b17ec22fb339 ("selinux: slow_avc_audit has become non-blocking"), it's safe to call this code under RCU. So all of the above are safe by my estimation. Note #2: get_task_struct() and put_task_struct(): The majority of get_task_struct() is under RCU read lock, and in any case it is a simple increment. But put_task_struct() is complex, given that it could at some point free the task struct, and this process has many steps which I couldn't manually verify. However, several other places call put_task_struct() under RCU, so it appears safe to use here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296) Patch description: pid_revalidate() drops from RCU into REF lookup mode. When many threads are resolving paths within /proc in parallel, this can result in heavy spinlock contention on d_lockref as each thread tries to grab a reference to the /proc dentry (and drop it shortly thereafter). Investigation indicates that it is not necessary to drop RCU in pid_revalidate(), as no RCU data is modified and the function never sleeps. So, remove the LOOKUP_RCU check. Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.com Signed-off-by: Stephen Brennan Cc: Konrad Wilk Cc: Alexander Viro Cc: Matthew Wilcox Cc: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 5541de99809c..264509e584e3 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1979,19 +1979,21 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags) { struct inode *inode; struct task_struct *task; + int ret = 0; - if (flags & LOOKUP_RCU) - return -ECHILD; - - inode = d_inode(dentry); - task = get_proc_task(inode); + rcu_read_lock(); + inode = d_inode_rcu(dentry); + if (!inode) + goto out; + task = pid_task(proc_pid(inode), PIDTYPE_PID); if (task) { pid_update_inode(task, inode); - put_task_struct(task); - return 1; + ret = 1; } - return 0; +out: + rcu_read_unlock(); + return ret; } static inline bool proc_inode_is_dead(struct inode *inode) From f5d80614844a725cbf66fd6524e01c218ede2141 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:08 -0800 Subject: [PATCH 14/87] kernel.h: drop unneeded inclusion from other headers Patch series "kernel.h further split", v5. kernel.h is a set of something which is not related to each other and often used in non-crossed compilation units, especially when drivers need only one or two macro definitions from it. This patch (of 7): There is no evidence we need kernel.h inclusion in certain headers. Drop unneeded inclusion from other headers. [sfr@canb.auug.org.au: bottom_half.h needs kernel] Link: https://lkml.kernel.org/r/20211015202908.1c417ae2@canb.auug.org.au Link: https://lkml.kernel.org/r/20211013170417.87909-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20211013170417.87909-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Stephen Rothwell Cc: Thomas Gleixner Cc: Brendan Higgins Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: Sakari Ailus Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Jonathan Cameron Cc: Rasmus Villemoes Cc: Thorsten Leemhuis Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bottom_half.h | 1 + include/linux/rwsem.h | 1 - include/linux/smp.h | 1 - include/linux/spinlock.h | 1 - 4 files changed, 1 insertion(+), 3 deletions(-) diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h index eed86eb0a1de..11d107d88d03 100644 --- a/include/linux/bottom_half.h +++ b/include/linux/bottom_half.h @@ -2,6 +2,7 @@ #ifndef _LINUX_BH_H #define _LINUX_BH_H +#include #include #if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_TRACE_IRQFLAGS) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 352c6127cb90..f9348769e558 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -11,7 +11,6 @@ #include #include -#include #include #include #include diff --git a/include/linux/smp.h b/include/linux/smp.h index 510519e8a1eb..a80ab58ae3f1 100644 --- a/include/linux/smp.h +++ b/include/linux/smp.h @@ -108,7 +108,6 @@ static inline void on_each_cpu_cond(smp_cond_func_t cond_func, #ifdef CONFIG_SMP #include -#include #include #include #include diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 45310ea1b1d7..22d672f81705 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -57,7 +57,6 @@ #include #include #include -#include #include #include #include From d2a8ebbf8192b84b11f1b204c4f7c602df32aeac Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:12 -0800 Subject: [PATCH 15/87] kernel.h: split out container_of() and typeof_member() macros kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt cleaning it up by splitting out container_of() and typeof_member() macros. For time being include new header back to kernel.h to avoid twisted indirected includes for existing users. Note, there are _a lot_ of headers and modules that include kernel.h solely for one of these macros and this allows to unburden compiler for the twisted inclusion paths and to make new code cleaner in the future. Link: https://lkml.kernel.org/r/20211013170417.87909-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sakari Ailus Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/container_of.h | 40 ++++++++++++++++++++++++++++++++++++ include/linux/kernel.h | 33 +---------------------------- 2 files changed, 41 insertions(+), 32 deletions(-) create mode 100644 include/linux/container_of.h diff --git a/include/linux/container_of.h b/include/linux/container_of.h new file mode 100644 index 000000000000..dd56019838c6 --- /dev/null +++ b/include/linux/container_of.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CONTAINER_OF_H +#define _LINUX_CONTAINER_OF_H + +#include +#include + +#define typeof_member(T, m) typeof(((T*)0)->m) + +/** + * container_of - cast a member of a structure out to the containing structure + * @ptr: the pointer to the member. + * @type: the type of the container struct this is embedded in. + * @member: the name of the member within the struct. + * + */ +#define container_of(ptr, type, member) ({ \ + void *__mptr = (void *)(ptr); \ + BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ + !__same_type(*(ptr), void), \ + "pointer type mismatch in container_of()"); \ + ((type *)(__mptr - offsetof(type, member))); }) + +/** + * container_of_safe - cast a member of a structure out to the containing structure + * @ptr: the pointer to the member. + * @type: the type of the container struct this is embedded in. + * @member: the name of the member within the struct. + * + * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged. + */ +#define container_of_safe(ptr, type, member) ({ \ + void *__mptr = (void *)(ptr); \ + BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ + !__same_type(*(ptr), void), \ + "pointer type mismatch in container_of()"); \ + IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) : \ + ((type *)(__mptr - offsetof(type, member))); }) + +#endif /* _LINUX_CONTAINER_OF_H */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 471bc0593679..ed4465757cec 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -52,8 +53,6 @@ } \ ) -#define typeof_member(T, m) typeof(((T*)0)->m) - #define _RET_IP_ (unsigned long)__builtin_return_address(0) #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) @@ -484,36 +483,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { } #define __CONCAT(a, b) a ## b #define CONCATENATE(a, b) __CONCAT(a, b) -/** - * container_of - cast a member of a structure out to the containing structure - * @ptr: the pointer to the member. - * @type: the type of the container struct this is embedded in. - * @member: the name of the member within the struct. - * - */ -#define container_of(ptr, type, member) ({ \ - void *__mptr = (void *)(ptr); \ - BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ - !__same_type(*(ptr), void), \ - "pointer type mismatch in container_of()"); \ - ((type *)(__mptr - offsetof(type, member))); }) - -/** - * container_of_safe - cast a member of a structure out to the containing structure - * @ptr: the pointer to the member. - * @type: the type of the container struct this is embedded in. - * @member: the name of the member within the struct. - * - * If IS_ERR_OR_NULL(ptr), ptr is returned unchanged. - */ -#define container_of_safe(ptr, type, member) ({ \ - void *__mptr = (void *)(ptr); \ - BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ - !__same_type(*(ptr), void), \ - "pointer type mismatch in container_of()"); \ - IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) : \ - ((type *)(__mptr - offsetof(type, member))); }) - /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */ #ifdef CONFIG_FTRACE_MCOUNT_RECORD # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD From ec54c2892064c9975d8be8ef8ee5957cb762f9f6 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:15 -0800 Subject: [PATCH 16/87] include/kunit/test.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-4-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sakari Ailus Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/kunit/test.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/include/kunit/test.h b/include/kunit/test.h index 018e776a34b9..b26400731c02 100644 --- a/include/kunit/test.h +++ b/include/kunit/test.h @@ -11,11 +11,20 @@ #include #include -#include + +#include +#include +#include +#include +#include +#include #include #include +#include +#include #include -#include + +#include struct kunit_resource; From cd7187e112c91186185c0553b1f9d033ed399d3a Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:19 -0800 Subject: [PATCH 17/87] include/linux/list.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-5-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sakari Ailus Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/list.h b/include/linux/list.h index f2af4b4aa4e9..6636fc07f918 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -2,11 +2,13 @@ #ifndef _LINUX_LIST_H #define _LINUX_LIST_H +#include #include #include #include #include -#include + +#include /* * Circular doubly linked list implementation. From 50b09d6145da04e19fa448670e4918ce496f1c77 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:22 -0800 Subject: [PATCH 18/87] include/linux/llist.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-6-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sakari Ailus Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/llist.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/llist.h b/include/linux/llist.h index 24f207b0190b..85bda2d02d65 100644 --- a/include/linux/llist.h +++ b/include/linux/llist.h @@ -49,7 +49,9 @@ */ #include -#include +#include +#include +#include struct llist_head { struct llist_node *first; From c540f9595956ff3c4b2cca1145c893941f921bcd Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:25 -0800 Subject: [PATCH 19/87] include/linux/plist.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-7-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sakari Ailus Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/plist.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/plist.h b/include/linux/plist.h index 66bab1bca35c..0f352c1d3c80 100644 --- a/include/linux/plist.h +++ b/include/linux/plist.h @@ -73,8 +73,11 @@ #ifndef _LINUX_PLIST_H_ #define _LINUX_PLIST_H_ -#include +#include #include +#include + +#include struct plist_head { struct list_head node_list; From 28b2e8f32023142522a0fe53bd64ba6cfa3c6ad7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:29 -0800 Subject: [PATCH 20/87] include/media/media-entity.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-8-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Acked-by: Sakari Ailus Cc: Boqun Feng Cc: Brendan Higgins Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Miguel Ojeda Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/media/media-entity.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/media/media-entity.h b/include/media/media-entity.h index 09737b47881f..fea489f03d57 100644 --- a/include/media/media-entity.h +++ b/include/media/media-entity.h @@ -13,10 +13,11 @@ #include #include +#include #include -#include #include #include +#include /* Enums used internally at the media controller to represent graphs */ From 5f6286a608107a68646241b57d2bd2a4fc139ef9 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:32 -0800 Subject: [PATCH 21/87] include/linux/delay.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. [akpm@linux-foundation.org: cxd2880_common.h needs bits.h for GENMASK()] [andriy.shevchenko@linux.intel.com: delay.h: fix for removed kernel.h] Link: https://lkml.kernel.org/r/20211028170143.56523-1-andriy.shevchenko@linux.intel.com [akpm@linux-foundation.org: include/linux/fwnode.h needs bits.h for BIT()] Link: https://lkml.kernel.org/r/20211027150324.79827-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/riscv/lib/delay.c | 4 ++++ arch/s390/include/asm/facility.h | 4 ++++ drivers/media/dvb-frontends/cxd2880/cxd2880_common.h | 1 + include/linux/delay.h | 2 +- include/linux/fwnode.h | 1 + 5 files changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/riscv/lib/delay.c b/arch/riscv/lib/delay.c index f51c9a03bca1..49d510ba75fd 100644 --- a/arch/riscv/lib/delay.c +++ b/arch/riscv/lib/delay.c @@ -4,10 +4,14 @@ */ #include +#include #include #include +#include #include +#include + /* * This is copies from arch/arm/include/asm/delay.h * diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h index e3aa354ab9f4..94b6919026df 100644 --- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -9,8 +9,12 @@ #define __ASM_FACILITY_H #include + +#include #include +#include #include + #include #define MAX_FACILITY_BIT (sizeof(stfle_fac_list) * 8) diff --git a/drivers/media/dvb-frontends/cxd2880/cxd2880_common.h b/drivers/media/dvb-frontends/cxd2880/cxd2880_common.h index b05bce71ab35..9dc15a5a9683 100644 --- a/drivers/media/dvb-frontends/cxd2880/cxd2880_common.h +++ b/drivers/media/dvb-frontends/cxd2880/cxd2880_common.h @@ -12,6 +12,7 @@ #include #include #include +#include #include int cxd2880_convert2s_complement(u32 value, u32 bitlen); diff --git a/include/linux/delay.h b/include/linux/delay.h index 1d0e2ce6b6d9..8eacf67eb212 100644 --- a/include/linux/delay.h +++ b/include/linux/delay.h @@ -19,7 +19,7 @@ * https://lists.openwall.net/linux-kernel/2011/01/09/56 */ -#include +#include extern unsigned long loops_per_jiffy; diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 9f4ad719bfe3..3a532ba66f6c 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -11,6 +11,7 @@ #include #include +#include #include struct fwnode_operations; From 1fcbd5deac51f30a7ed3f10bea422d08a2b3aba6 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:35 -0800 Subject: [PATCH 22/87] include/linux/sbitmap.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211027150437.79921-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sbitmap.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 2713e689ad66..8121034d1f13 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -9,8 +9,17 @@ #ifndef __LINUX_SCALE_BITMAP_H #define __LINUX_SCALE_BITMAP_H -#include +#include +#include +#include +#include +#include +#include +#include #include +#include +#include +#include struct seq_file; From 98e1385ef24bd607586fbf44e73decd5112d3d4a Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:38 -0800 Subject: [PATCH 23/87] include/linux/radix-tree.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211027150528.80003-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/radix-tree.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 64ad900ac742..f7c1d21c2f39 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -9,8 +9,10 @@ #define _LINUX_RADIX_TREE_H #include -#include +#include #include +#include +#include #include #include #include From b4b87651104dd32ab258d097d47b97e243a95222 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:32:41 -0800 Subject: [PATCH 24/87] include/linux/generic-radix-tree.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. [akpm@linux-foundation.org: include math.h for round_up()] Link: https://lkml.kernel.org/r/20211027150548.80042-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/generic-radix-tree.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/generic-radix-tree.h b/include/linux/generic-radix-tree.h index bfd00320c7f3..107613f7d792 100644 --- a/include/linux/generic-radix-tree.h +++ b/include/linux/generic-radix-tree.h @@ -38,8 +38,9 @@ #include #include -#include #include +#include +#include struct genradix_root; From e52340de11d8bca3ba35e5a72fea4f2a5b6abbbb Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Mon, 8 Nov 2021 18:32:43 -0800 Subject: [PATCH 25/87] kernel.h: split out instruction pointer accessors bottom_half.h needs _THIS_IP_ to be standalone, so split that and _RET_IP_ out from kernel.h into the new instruction_pointer.h. kernel.h directly needs them, so include it there and replace the include of kernel.h with this new file in bottom_half.h. Link: https://lkml.kernel.org/r/20211028161248.45232-1-andriy.shevchenko@linux.intel.com Signed-off-by: Stephen Rothwell Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bottom_half.h | 2 +- include/linux/instruction_pointer.h | 8 ++++++++ include/linux/kernel.h | 4 +--- 3 files changed, 10 insertions(+), 4 deletions(-) create mode 100644 include/linux/instruction_pointer.h diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h index 11d107d88d03..fc53e0ad56d9 100644 --- a/include/linux/bottom_half.h +++ b/include/linux/bottom_half.h @@ -2,7 +2,7 @@ #ifndef _LINUX_BH_H #define _LINUX_BH_H -#include +#include #include #if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_TRACE_IRQFLAGS) diff --git a/include/linux/instruction_pointer.h b/include/linux/instruction_pointer.h new file mode 100644 index 000000000000..cda1f706eaeb --- /dev/null +++ b/include/linux/instruction_pointer.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_INSTRUCTION_POINTER_H +#define _LINUX_INSTRUCTION_POINTER_H + +#define _RET_IP_ (unsigned long)__builtin_return_address(0) +#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) + +#endif /* _LINUX_INSTRUCTION_POINTER_H */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index ed4465757cec..46ca4404fb93 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -53,9 +54,6 @@ } \ ) -#define _RET_IP_ (unsigned long)__builtin_return_address(0) -#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) - /** * upper_32_bits - return bits 32-63 of a number * @n: the number we're accessing From e1edc277e6f6dfb372216522dfc57f9381c39e35 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Mon, 8 Nov 2021 18:32:46 -0800 Subject: [PATCH 26/87] linux/container_of.h: switch to static_assert _Static_assert() is evaluated already in the compiler's frontend, and gives a somehat more to-the-point error, compared to the BUILD_BUG_ON macro, which only fires after the optimizer has had a chance to eliminate calls to functions marked with __attribute__((error)). In theory, this might make builds a tiny bit faster. There's also a little less gunk in the error message emitted: lib/sort.c: In function `foo': include/linux/build_bug.h:78:41: error: static assertion failed: "pointer type mismatch in container_of()" 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) compared to lib/sort.c: In function `foo': include/linux/compiler_types.h:322:38: error: call to `__compiletime_assert_2' declared with attribute error: pointer type mismatch in container_of() 322 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) While at it, fix the copy-pasto in container_of_safe(). Link: https://lkml.kernel.org/r/20211015090530.2774079-1-linux@rasmusvillemoes.dk Link: https://lore.kernel.org/lkml/20211014132331.GA4811@kernel.org/T/ Signed-off-by: Rasmus Villemoes Reviewed-by: Miguel Ojeda Acked-by: Andy Shevchenko Reviewed-by: Nick Desaulniers Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/container_of.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/container_of.h b/include/linux/container_of.h index dd56019838c6..2f4944b791b8 100644 --- a/include/linux/container_of.h +++ b/include/linux/container_of.h @@ -16,9 +16,9 @@ */ #define container_of(ptr, type, member) ({ \ void *__mptr = (void *)(ptr); \ - BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ - !__same_type(*(ptr), void), \ - "pointer type mismatch in container_of()"); \ + static_assert(__same_type(*(ptr), ((type *)0)->member) || \ + __same_type(*(ptr), void), \ + "pointer type mismatch in container_of()"); \ ((type *)(__mptr - offsetof(type, member))); }) /** @@ -31,9 +31,9 @@ */ #define container_of_safe(ptr, type, member) ({ \ void *__mptr = (void *)(ptr); \ - BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ - !__same_type(*(ptr), void), \ - "pointer type mismatch in container_of()"); \ + static_assert(__same_type(*(ptr), ((type *)0)->member) || \ + __same_type(*(ptr), void), \ + "pointer type mismatch in container_of_safe()"); \ IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) : \ ((type *)(__mptr - offsetof(type, member))); }) From 7d60ac00979297d2d303a590891b6d282667c389 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Mon, 8 Nov 2021 18:32:49 -0800 Subject: [PATCH 27/87] mailmap: update email address for Colin King Colin King has moved to Intel to update gmail and Canonical email addresses. Link: https://lkml.kernel.org/r/20211102231617.78569-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index 90e614d2bf7e..bccab013ee1c 100644 --- a/.mailmap +++ b/.mailmap @@ -73,6 +73,8 @@ Chris Chiu Chris Chiu Christophe Ricard Christoph Hellwig +Colin Ian King +Colin Ian King Corey Minyard Damian Hobson-Garcia Daniel Borkmann From b15be237a95f18609c6149b0c6a11d6ce4715228 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 Nov 2021 18:32:52 -0800 Subject: [PATCH 28/87] MAINTAINERS: add "exec & binfmt" section with myself and Eric I'd like more continuity of review for the exec and binfmt (and ELF) stuff. Eric and I have been the most active lately, so list us as reviewers. Link: https://lkml.kernel.org/r/20211006180200.1178142-1-keescook@chromium.org Signed-off-by: Kees Cook Cc: Eric Biederman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 3cf33e113b9b..44949a650c1b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7037,6 +7037,20 @@ F: include/trace/events/mdio.h F: include/uapi/linux/mdio.h F: include/uapi/linux/mii.h +EXEC & BINFMT API +R: Eric Biederman +R: Kees Cook +F: arch/alpha/kernel/binfmt_loader.c +F: arch/x86/ia32/ia32_aout.c +F: fs/*binfmt_*.c +F: fs/exec.c +F: include/linux/binfmts.h +F: include/linux/elf.h +F: include/uapi/linux/binfmts.h +F: tools/testing/selftests/exec/ +N: asm/elf.h +N: binfmt + EXFAT FILE SYSTEM M: Namjae Jeon M: Sungjong Seo From 46bfa85fc888b466860b02b29dd2456598af8474 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Mon, 8 Nov 2021 18:32:55 -0800 Subject: [PATCH 29/87] MAINTAINERS: rectify entry for ARM/TOSHIBA VISCONTI ARCHITECTURE Patch series "Rectify file references for dt-bindings in MAINTAINERS", v5. A patch series that cleans up some file references for dt-bindings in MAINTAINERS. This patch (of 4): Commit 836863a08c99 ("MAINTAINERS: Add information for Toshiba Visconti ARM SoCs") refers to the non-existing file toshiba,tmpv7700-pinctrl.yaml in ./Documentation/devicetree/bindings/pinctrl/. Commit 1825c1fe0057 ("pinctrl: Add DT bindings for Toshiba Visconti TMPV7700 SoC") originating from the same patch series however adds the file toshiba,visconti-pinctrl.yaml in that directory instead. So, refer to toshiba,visconti-pinctrl.yaml in the ARM/TOSHIBA VISCONTI ARCHITECTURE section instead. Link: https://lkml.kernel.org/r/20211026141902.4865-1-lukas.bulwahn@gmail.com Link: https://lkml.kernel.org/r/20211026141902.4865-2-lukas.bulwahn@gmail.com Fixes: 836863a08c99 ("MAINTAINERS: Add information for Toshiba Visconti ARM SoCs") Signed-off-by: Lukas Bulwahn Acked-by: Nobuhiro Iwamatsu Reviewed-by: Nobuhiro Iwamatsu Cc: Rob Herring Cc: Punit Agrawal Cc: Anitha Chrisanthus Cc: Wilken Gottwalt Cc: Greg Kroah-Hartman Cc: John Stultz Cc: Mauro Carvalho Chehab Cc: Yu Chen Cc: Sam Ravnborg Cc: Edmund Dea Cc: Joe Perches Cc: Ralf Ramsauer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 44949a650c1b..2aadfec11190 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2745,7 +2745,7 @@ F: Documentation/devicetree/bindings/arm/toshiba.yaml F: Documentation/devicetree/bindings/net/toshiba,visconti-dwmac.yaml F: Documentation/devicetree/bindings/gpio/toshiba,gpio-visconti.yaml F: Documentation/devicetree/bindings/pci/toshiba,visconti-pcie.yaml -F: Documentation/devicetree/bindings/pinctrl/toshiba,tmpv7700-pinctrl.yaml +F: Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml F: Documentation/devicetree/bindings/watchdog/toshiba,visconti-wdt.yaml F: arch/arm64/boot/dts/toshiba/ F: drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c From b39c920665c070bec5a0266af1a5c021d07ec183 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Mon, 8 Nov 2021 18:32:59 -0800 Subject: [PATCH 30/87] MAINTAINERS: rectify entry for HIKEY960 ONBOARD USB GPIO HUB DRIVER Commit 7a6ff4c4cbc3 ("misc: hisi_hikey_usb: Driver to support onboard USB gpio hub on Hikey960") refers to the non-existing file Documentation/devicetree/bindings/misc/hisilicon-hikey-usb.yaml, but this commit's patch series does not add any related devicetree binding in misc. So, just drop this file reference in HIKEY960 ONBOARD USB GPIO HUB DRIVER. Link: https://lkml.kernel.org/r/20211026141902.4865-3-lukas.bulwahn@gmail.com Fixes: 7a6ff4c4cbc3 ("misc: hisi_hikey_usb: Driver to support onboard USB gpio hub on Hikey960") Signed-off-by: Lukas Bulwahn Cc: Anitha Chrisanthus Cc: Edmund Dea Cc: Greg Kroah-Hartman Cc: Joe Perches Cc: John Stultz Cc: Mauro Carvalho Chehab Cc: Nobuhiro Iwamatsu Cc: Punit Agrawal Cc: Ralf Ramsauer Cc: Rob Herring Cc: Sam Ravnborg Cc: Wilken Gottwalt Cc: Yu Chen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2aadfec11190..e78cc3b239d5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8483,7 +8483,6 @@ M: John Stultz L: linux-kernel@vger.kernel.org S: Maintained F: drivers/misc/hisi_hikey_usb.c -F: Documentation/devicetree/bindings/misc/hisilicon-hikey-usb.yaml HISILICON PMU DRIVER M: Shaokun Zhang From 65e5acbb135ecd351ab55573056661eb010111e2 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Mon, 8 Nov 2021 18:33:02 -0800 Subject: [PATCH 31/87] MAINTAINERS: rectify entry for INTEL KEEM BAY DRM DRIVER Commit ed794057b052 ("drm/kmb: Build files for KeemBay Display driver") refers to the non-existing file intel,kmb_display.yaml in Documentation/devicetree/bindings/display/. Commit 5a76b1ed73b9 ("dt-bindings: display: Add support for Intel KeemBay Display") originating from the same patch series however adds the file intel,keembay-display.yaml in that directory instead. So, refer to intel,keembay-display.yaml in the INTEL KEEM BAY DRM DRIVER section instead. Link: https://lkml.kernel.org/r/20211026141902.4865-4-lukas.bulwahn@gmail.com Fixes: ed794057b052 ("drm/kmb: Build files for KeemBay Display driver") Signed-off-by: Lukas Bulwahn Cc: Anitha Chrisanthus Cc: Edmund Dea Cc: Greg Kroah-Hartman Cc: Joe Perches Cc: John Stultz Cc: Mauro Carvalho Chehab Cc: Nobuhiro Iwamatsu Cc: Punit Agrawal Cc: Ralf Ramsauer Cc: Rob Herring Cc: Sam Ravnborg Cc: Wilken Gottwalt Cc: Yu Chen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e78cc3b239d5..35c4a0ba7eb1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9530,7 +9530,7 @@ INTEL KEEM BAY DRM DRIVER M: Anitha Chrisanthus M: Edmund Dea S: Maintained -F: Documentation/devicetree/bindings/display/intel,kmb_display.yaml +F: Documentation/devicetree/bindings/display/intel,keembay-display.yaml F: drivers/gpu/drm/kmb/ INTEL KEEM BAY OCS AES/SM4 CRYPTO DRIVER From 57235b6e783c74a411b2bf1face4a4df4e8a8f68 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Mon, 8 Nov 2021 18:33:06 -0800 Subject: [PATCH 32/87] MAINTAINERS: rectify entry for ALLWINNER HARDWARE SPINLOCK SUPPORT Commit f9e784dcb63f ("dt-bindings: hwlock: add sun6i_hwspinlock") adds Documentation/devicetree/bindings/hwlock/allwinner,sun6i-a31-hwspinlock.yaml, but the related commit 3c881e05c814 ("hwspinlock: add sun6i hardware spinlock support") adds a file reference to allwinner,sun6i-hwspinlock.yaml instead. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains: warning: no file matches F: Documentation/devicetree/bindings/hwlock/allwinner,sun6i-hwspinlock.yaml Rectify this file reference in ALLWINNER HARDWARE SPINLOCK SUPPORT. Link: https://lkml.kernel.org/r/20211026141902.4865-5-lukas.bulwahn@gmail.com Reviewed-by: Wilken Gottwalt Signed-off-by: Lukas Bulwahn Cc: Anitha Chrisanthus Cc: Edmund Dea Cc: Greg Kroah-Hartman Cc: Joe Perches Cc: John Stultz Cc: Mauro Carvalho Chehab Cc: Nobuhiro Iwamatsu Cc: Punit Agrawal Cc: Ralf Ramsauer Cc: Rob Herring Cc: Sam Ravnborg Cc: Yu Chen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 35c4a0ba7eb1..1cfe2b4e094e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -761,7 +761,7 @@ F: drivers/crypto/allwinner/ ALLWINNER HARDWARE SPINLOCK SUPPORT M: Wilken Gottwalt S: Maintained -F: Documentation/devicetree/bindings/hwlock/allwinner,sun6i-hwspinlock.yaml +F: Documentation/devicetree/bindings/hwlock/allwinner,sun6i-a31-hwspinlock.yaml F: drivers/hwspinlock/sun6i_hwspinlock.c ALLWINNER THERMAL DRIVER From 4d4712c1a4ac331574fb4542e4adcfd24fdb07d0 Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Mon, 8 Nov 2021 18:33:09 -0800 Subject: [PATCH 33/87] lib, stackdepot: check stackdepot handle before accessing slabs Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2. PATCH-1: Checks validity of a stackdepot handle before proceeding to access stackdepot slab/objects. PATCH-2: Adds a helper in stackdepot, to allow users to print stack entries just by specifying the stackdepot handle. It also changes such users to use this new interface. PATCH-3: Adds a helper in stackdepot, to allow users to print stack entries into buffers just by specifying the stackdepot handle and destination buffer. It also changes such users to use this new interface. This patch (of 3): stack_depot_save allocates slabs that will be used for storing objects in future.If this slab allocation fails we may get to a situation where space allocation for a new stack_record fails, causing stack_depot_save to return 0 as handle. If user of this handle ends up invoking stack_depot_fetch with this handle value, current implementation of stack_depot_fetch will end up using slab from wrong index. To avoid this check handle value at the beginning. Link: https://lkml.kernel.org/r/20210915175321.3472770-1-imran.f.khan@oracle.com Link: https://lkml.kernel.org/r/20210915014806.3206938-1-imran.f.khan@oracle.com Link: https://lkml.kernel.org/r/20210915014806.3206938-2-imran.f.khan@oracle.com Signed-off-by: Imran Khan Suggested-by: Vlastimil Babka Acked-by: Vlastimil Babka Cc: Geert Uytterhoeven Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Dmitry Vyukov Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/stackdepot.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 09485dc5bd12..f034f095c627 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -231,6 +231,9 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle, struct stack_record *stack; *entries = NULL; + if (!handle) + return 0; + if (parts.slabindex > depot_index) { WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n", parts.slabindex, depot_index, handle); From 505be48165fa2f2cb795659b4fc61f35563d105e Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Mon, 8 Nov 2021 18:33:12 -0800 Subject: [PATCH 34/87] lib, stackdepot: add helper to print stack entries To print a stack entries, users of stackdepot, first use stack_depot_fetch to get a list of stack entries and then use stack_trace_print to print this list. Provide a helper in stackdepot to print stack entries based on stackdepot handle. Also change above mentioned users to use this helper. Link: https://lkml.kernel.org/r/20210915014806.3206938-3-imran.f.khan@oracle.com Signed-off-by: Imran Khan Suggested-by: Vlastimil Babka Acked-by: Vlastimil Babka Reviewed-by: Alexander Potapenko Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Daniel Vetter Cc: David Airlie Cc: Dmitry Vyukov Cc: Geert Uytterhoeven Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/stackdepot.h | 2 ++ lib/stackdepot.c | 18 ++++++++++++++++++ mm/kasan/report.c | 15 +++------------ mm/page_owner.c | 13 ++++--------- 4 files changed, 27 insertions(+), 21 deletions(-) diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index d29860966bc9..8ab37500fcba 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -25,6 +25,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries); +void stack_depot_print(depot_stack_handle_t stack); + #ifdef CONFIG_STACKDEPOT int stack_depot_init(void); #else diff --git a/lib/stackdepot.c b/lib/stackdepot.c index f034f095c627..4e1f2982d0fa 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -213,6 +213,24 @@ static inline struct stack_record *find_stack(struct stack_record *bucket, return NULL; } +/** + * stack_depot_print - print stack entries from a depot + * + * @stack: Stack depot handle which was returned from + * stack_depot_save(). + * + */ +void stack_depot_print(depot_stack_handle_t stack) +{ + unsigned long *entries; + unsigned int nr_entries; + + nr_entries = stack_depot_fetch(stack, &entries); + if (nr_entries > 0) + stack_trace_print(entries, nr_entries, 0); +} +EXPORT_SYMBOL_GPL(stack_depot_print); + /** * stack_depot_fetch - Fetch stack entries from a depot * diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 884a950c7026..3239fd8f8747 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -132,20 +132,11 @@ static void end_report(unsigned long *flags, unsigned long addr) kasan_enable_current(); } -static void print_stack(depot_stack_handle_t stack) -{ - unsigned long *entries; - unsigned int nr_entries; - - nr_entries = stack_depot_fetch(stack, &entries); - stack_trace_print(entries, nr_entries, 0); -} - static void print_track(struct kasan_track *track, const char *prefix) { pr_err("%s by task %u:\n", prefix, track->pid); if (track->stack) { - print_stack(track->stack); + stack_depot_print(track->stack); } else { pr_err("(stack is not available)\n"); } @@ -214,12 +205,12 @@ static void describe_object_stacks(struct kmem_cache *cache, void *object, return; if (alloc_meta->aux_stack[0]) { pr_err("Last potentially related work creation:\n"); - print_stack(alloc_meta->aux_stack[0]); + stack_depot_print(alloc_meta->aux_stack[0]); pr_err("\n"); } if (alloc_meta->aux_stack[1]) { pr_err("Second to last potentially related work creation:\n"); - print_stack(alloc_meta->aux_stack[1]); + stack_depot_print(alloc_meta->aux_stack[1]); pr_err("\n"); } #endif diff --git a/mm/page_owner.c b/mm/page_owner.c index 62402d22539b..eff29be1218b 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -394,8 +394,6 @@ void __dump_page_owner(const struct page *page) struct page_ext *page_ext = lookup_page_ext(page); struct page_owner *page_owner; depot_stack_handle_t handle; - unsigned long *entries; - unsigned int nr_entries; gfp_t gfp_mask; int mt; @@ -423,20 +421,17 @@ void __dump_page_owner(const struct page *page) page_owner->pid, page_owner->ts_nsec, page_owner->free_ts_nsec); handle = READ_ONCE(page_owner->handle); - if (!handle) { + if (!handle) pr_alert("page_owner allocation stack trace missing\n"); - } else { - nr_entries = stack_depot_fetch(handle, &entries); - stack_trace_print(entries, nr_entries, 0); - } + else + stack_depot_print(handle); handle = READ_ONCE(page_owner->free_handle); if (!handle) { pr_alert("page_owner free stack trace missing\n"); } else { - nr_entries = stack_depot_fetch(handle, &entries); pr_alert("page last free stack trace:\n"); - stack_trace_print(entries, nr_entries, 0); + stack_depot_print(handle); } if (page_owner->last_migrate_reason != -1) From 0f68d45ef41abb618a9ca33996348ae73800a106 Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Mon, 8 Nov 2021 18:33:16 -0800 Subject: [PATCH 35/87] lib, stackdepot: add helper to print stack entries into buffer To print stack entries into a buffer, users of stackdepot, first get a list of stack entries using stack_depot_fetch and then print this list into a buffer using stack_trace_snprint. Provide a helper in stackdepot for this purpose. Also change above mentioned users to use this helper. [imran.f.khan@oracle.com: fix build error] Link: https://lkml.kernel.org/r/20210915175321.3472770-4-imran.f.khan@oracle.com [imran.f.khan@oracle.com: export stack_depot_snprint() to modules] Link: https://lkml.kernel.org/r/20210916133535.3592491-4-imran.f.khan@oracle.com Link: https://lkml.kernel.org/r/20210915014806.3206938-4-imran.f.khan@oracle.com Signed-off-by: Imran Khan Suggested-by: Vlastimil Babka Acked-by: Vlastimil Babka Acked-by: Jani Nikula [i915] Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Daniel Vetter Cc: David Airlie Cc: Dmitry Vyukov Cc: Geert Uytterhoeven Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpu/drm/drm_dp_mst_topology.c | 5 +---- drivers/gpu/drm/drm_mm.c | 5 +---- drivers/gpu/drm/i915/i915_vma.c | 5 +---- drivers/gpu/drm/i915/intel_runtime_pm.c | 20 +++++--------------- include/linux/stackdepot.h | 3 +++ lib/stackdepot.c | 25 +++++++++++++++++++++++++ mm/page_owner.c | 5 +---- 7 files changed, 37 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 86d13d6bc463..2d1adab9e360 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1668,13 +1668,10 @@ __dump_topology_ref_history(struct drm_dp_mst_topology_ref_history *history, for (i = 0; i < history->len; i++) { const struct drm_dp_mst_topology_ref_entry *entry = &history->entries[i]; - ulong *entries; - uint nr_entries; u64 ts_nsec = entry->ts_nsec; u32 rem_nsec = do_div(ts_nsec, 1000000000); - nr_entries = stack_depot_fetch(entry->backtrace, &entries); - stack_trace_snprint(buf, PAGE_SIZE, entries, nr_entries, 4); + stack_depot_snprint(entry->backtrace, buf, PAGE_SIZE, 4); drm_printf(&p, " %d %ss (last at %5llu.%06u):\n%s", entry->count, diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index 93d48a6f04ab..7d1c578388d3 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -118,8 +118,6 @@ static noinline void save_stack(struct drm_mm_node *node) static void show_leaks(struct drm_mm *mm) { struct drm_mm_node *node; - unsigned long *entries; - unsigned int nr_entries; char *buf; buf = kmalloc(BUFSZ, GFP_KERNEL); @@ -133,8 +131,7 @@ static void show_leaks(struct drm_mm *mm) continue; } - nr_entries = stack_depot_fetch(node->stack, &entries); - stack_trace_snprint(buf, BUFSZ, entries, nr_entries, 0); + stack_depot_snprint(node->stack, buf, BUFSZ, 0); DRM_ERROR("node [%08llx + %08llx]: inserted at\n%s", node->start, node->size, buf); } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 4b7fc4647e46..f2d9ed375109 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -56,8 +56,6 @@ void i915_vma_free(struct i915_vma *vma) static void vma_print_allocator(struct i915_vma *vma, const char *reason) { - unsigned long *entries; - unsigned int nr_entries; char buf[512]; if (!vma->node.stack) { @@ -66,8 +64,7 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) return; } - nr_entries = stack_depot_fetch(vma->node.stack, &entries); - stack_trace_snprint(buf, sizeof(buf), entries, nr_entries, 0); + stack_depot_snprint(vma->node.stack, buf, sizeof(buf), 0); DRM_DEBUG_DRIVER("vma.node [%08llx + %08llx] %s: inserted at %s\n", vma->node.start, vma->node.size, reason, buf); } diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index eaf7688f517d..0d85f3c5c526 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -65,16 +65,6 @@ static noinline depot_stack_handle_t __save_depot_stack(void) return stack_depot_save(entries, n, GFP_NOWAIT | __GFP_NOWARN); } -static void __print_depot_stack(depot_stack_handle_t stack, - char *buf, int sz, int indent) -{ - unsigned long *entries; - unsigned int nr_entries; - - nr_entries = stack_depot_fetch(stack, &entries); - stack_trace_snprint(buf, sz, entries, nr_entries, indent); -} - static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); @@ -146,12 +136,12 @@ static void untrack_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm, if (!buf) return; - __print_depot_stack(stack, buf, PAGE_SIZE, 2); + stack_depot_snprint(stack, buf, PAGE_SIZE, 2); DRM_DEBUG_DRIVER("wakeref %x from\n%s", stack, buf); stack = READ_ONCE(rpm->debug.last_release); if (stack) { - __print_depot_stack(stack, buf, PAGE_SIZE, 2); + stack_depot_snprint(stack, buf, PAGE_SIZE, 2); DRM_DEBUG_DRIVER("wakeref last released at\n%s", buf); } @@ -183,12 +173,12 @@ __print_intel_runtime_pm_wakeref(struct drm_printer *p, return; if (dbg->last_acquire) { - __print_depot_stack(dbg->last_acquire, buf, PAGE_SIZE, 2); + stack_depot_snprint(dbg->last_acquire, buf, PAGE_SIZE, 2); drm_printf(p, "Wakeref last acquired:\n%s", buf); } if (dbg->last_release) { - __print_depot_stack(dbg->last_release, buf, PAGE_SIZE, 2); + stack_depot_snprint(dbg->last_release, buf, PAGE_SIZE, 2); drm_printf(p, "Wakeref last released:\n%s", buf); } @@ -203,7 +193,7 @@ __print_intel_runtime_pm_wakeref(struct drm_printer *p, rep = 1; while (i + 1 < dbg->count && dbg->owners[i + 1] == stack) rep++, i++; - __print_depot_stack(stack, buf, PAGE_SIZE, 2); + stack_depot_snprint(stack, buf, PAGE_SIZE, 2); drm_printf(p, "Wakeref x%lu taken at:\n%s", rep, buf); } diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 8ab37500fcba..c34b55a6e554 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -25,6 +25,9 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries); +int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size, + int spaces); + void stack_depot_print(depot_stack_handle_t stack); #ifdef CONFIG_STACKDEPOT diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 4e1f2982d0fa..b437ae79aca1 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -213,6 +213,31 @@ static inline struct stack_record *find_stack(struct stack_record *bucket, return NULL; } +/** + * stack_depot_snprint - print stack entries from a depot into a buffer + * + * @handle: Stack depot handle which was returned from + * stack_depot_save(). + * @buf: Pointer to the print buffer + * + * @size: Size of the print buffer + * + * @spaces: Number of leading spaces to print + * + * Return: Number of bytes printed. + */ +int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size, + int spaces) +{ + unsigned long *entries; + unsigned int nr_entries; + + nr_entries = stack_depot_fetch(handle, &entries); + return nr_entries ? stack_trace_snprint(buf, size, entries, nr_entries, + spaces) : 0; +} +EXPORT_SYMBOL_GPL(stack_depot_snprint); + /** * stack_depot_print - print stack entries from a depot * diff --git a/mm/page_owner.c b/mm/page_owner.c index eff29be1218b..1653040d1133 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -329,8 +329,6 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, depot_stack_handle_t handle) { int ret, pageblock_mt, page_mt; - unsigned long *entries; - unsigned int nr_entries; char *kbuf; count = min_t(size_t, count, PAGE_SIZE); @@ -361,8 +359,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, if (ret >= count) goto err; - nr_entries = stack_depot_fetch(handle, &entries); - ret += stack_trace_snprint(kbuf + ret, count - ret, entries, nr_entries, 0); + ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0); if (ret >= count) goto err; From bfb3ba32061da1a9217ef6d02fbcffb528e4c8df Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Mon, 8 Nov 2021 18:33:19 -0800 Subject: [PATCH 36/87] include/linux/string_helpers.h: add linux/string.h for strlen() linux/string_helpers.h uses strlen(), so include the correponding header. Otherwise we get a compilation error if it's not also included by whoever included the helper. Link: https://lkml.kernel.org/r/20211005212634.3223113-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/string_helpers.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 68189c4a2eb1..4ba39e1403b2 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -4,6 +4,7 @@ #include #include +#include #include struct file; From 839b395eb9c13ae56ea5fc3ca9802734a72293f0 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Mon, 8 Nov 2021 18:33:22 -0800 Subject: [PATCH 37/87] lib: uninline simple_strntoull() as well Codegen become bloated again after simple_strntoull() introduction add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-224 (-224) Function old new delta simple_strtoul 5 2 -3 simple_strtol 23 20 -3 simple_strtoull 119 15 -104 simple_strtoll 155 41 -114 Link: https://lkml.kernel.org/r/YVmlB9yY4lvbNKYt@localhost.localdomain Signed-off-by: Alexey Dobriyan Cc: Richard Fitzgerald Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/vsprintf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index d7ad44f2c8f5..25d79ed53226 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -53,8 +53,7 @@ #include #include "kstrtox.h" -static unsigned long long simple_strntoull(const char *startp, size_t max_chars, - char **endp, unsigned int base) +static noinline unsigned long long simple_strntoull(const char *startp, size_t max_chars, char **endp, unsigned int base) { const char *cp; unsigned long long result = 0ULL; From 723aca2085166bb7213bf6af1729ddfd94c25a3e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 8 Nov 2021 18:33:25 -0800 Subject: [PATCH 38/87] mm/scatterlist: replace the !preemptible warning in sg_miter_stop() sg_miter_stop() checks for disabled preemption before unmapping a page via kunmap_atomic(). The kernel doc mentions under context that preemption must be disabled if SG_MITER_ATOMIC is set. There is no active requirement for the caller to have preemption disabled before invoking sg_mitter_stop(). The sg_mitter_*() implementation itself has no such requirement. In fact, preemption is disabled by kmap_atomic() as part of sg_miter_next() and remains disabled as long as there is an active SG_MITER_ATOMIC mapping. This is a consequence of kmap_atomic() and not a requirement for sg_mitter_*() itself. The user chooses SG_MITER_ATOMIC because it uses the API in a context where blocking is not possible or blocking is possible but he chooses a lower weight mapping which is not available on all CPUs and so it might need less overhead to setup at a price that now preemption will be disabled. The kmap_atomic() implementation on PREEMPT_RT does not disable preemption. It simply disables CPU migration to ensure that the task remains on the same CPU while the caller remains preemptible. This in turn triggers the warning in sg_miter_stop() because preemption is allowed. The PREEMPT_RT and !PREEMPT_RT implementation of kmap_atomic() disable pagefaults as a requirement. It is sufficient to check for this instead of disabled preemption. Check for disabled pagefault handler in the SG_MITER_ATOMIC case. Remove the "preemption disabled" part from the kernel doc as the sg_milter*() implementation does not care. [bigeasy@linutronix.de: commit description] Link: https://lkml.kernel.org/r/20211015211409.cqopacv3pxdwn2ty@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/scatterlist.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/lib/scatterlist.c b/lib/scatterlist.c index abb3432ed744..d5e82e4a57ad 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -828,8 +828,7 @@ static bool sg_miter_get_next_page(struct sg_mapping_iter *miter) * stops @miter. * * Context: - * Don't care if @miter is stopped, or not proceeded yet. - * Otherwise, preemption disabled if the SG_MITER_ATOMIC is set. + * Don't care. * * Returns: * true if @miter contains the valid mapping. false if end of sg @@ -865,8 +864,7 @@ EXPORT_SYMBOL(sg_miter_skip); * @miter->addr and @miter->length point to the current mapping. * * Context: - * Preemption disabled if SG_MITER_ATOMIC. Preemption must stay disabled - * till @miter is stopped. May sleep if !SG_MITER_ATOMIC. + * May sleep if !SG_MITER_ATOMIC. * * Returns: * true if @miter contains the next mapping. false if end of sg @@ -906,8 +904,7 @@ EXPORT_SYMBOL(sg_miter_next); * need to be released during iteration. * * Context: - * Preemption disabled if the SG_MITER_ATOMIC is set. Don't care - * otherwise. + * Don't care otherwise. */ void sg_miter_stop(struct sg_mapping_iter *miter) { @@ -922,7 +919,7 @@ void sg_miter_stop(struct sg_mapping_iter *miter) flush_dcache_page(miter->page); if (miter->__flags & SG_MITER_ATOMIC) { - WARN_ON_ONCE(preemptible()); + WARN_ON_ONCE(!pagefault_disabled()); kunmap_atomic(miter->addr); } else kunmap(miter->page); From 3e421469dd77ebbe7d5403827ecd5a74bcbcbb0a Mon Sep 17 00:00:00 2001 From: Rikard Falkeborn Date: Mon, 8 Nov 2021 18:33:28 -0800 Subject: [PATCH 39/87] const_structs.checkpatch: add a few sound ops structs Add a couple of commonly used (>50 instances) sound ops structs that are typically const. Link: https://lkml.kernel.org/r/20210922211042.38017-1-rikard.falkeborn@gmail.com Signed-off-by: Rikard Falkeborn Cc: Joe Perches Cc: Liam Girdwood Cc: Mark Brown Cc: Jaroslav Kysela Cc: Takashi Iwai Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/const_structs.checkpatch | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/scripts/const_structs.checkpatch b/scripts/const_structs.checkpatch index 1aae4f4fdacc..3980985205a0 100644 --- a/scripts/const_structs.checkpatch +++ b/scripts/const_structs.checkpatch @@ -54,7 +54,11 @@ sd_desc seq_operations sirfsoc_padmux snd_ac97_build_ops +snd_pcm_ops +snd_rawmidi_ops snd_soc_component_driver +snd_soc_dai_ops +snd_soc_ops soc_pcmcia_socket_ops stacktrace_ops sysfs_ops From 70a11659f590b6ac32e6260d5c64a3e83b1272d4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 Nov 2021 18:33:31 -0800 Subject: [PATCH 40/87] checkpatch: improve EXPORT_SYMBOL test for EXPORT_SYMBOL_NS uses The EXPORT_SYMBOL test expects a single argument but definitions of EXPORT_SYMBOL_NS have multiple arguments. Update the test to extract only the first argument from any EXPORT_SYMBOL related definition. Link: https://lkml.kernel.org/r/9e8f251b42e405f460f26a23ba9b33ef45a94adc.camel@perches.com Signed-off-by: Joe Perches Reported-by: Ian Pilcher Cc: Dwaipayan Ray Cc: Lukas Bulwahn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 88cb294dc447..91798b07c6cb 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -4449,6 +4449,7 @@ sub process { # XXX(foo); # EXPORT_SYMBOL(something_foo); my $name = $1; + $name =~ s/^\s*($Ident).*/$1/; if ($stat =~ /^(?:.\s*}\s*\n)?.([A-Z_]+)\s*\(\s*($Ident)/ && $name =~ /^${Ident}_$2/) { #print "FOO C name<$name>\n"; From 0ee3e7b8893eca232cb118cb4874324ebfe9f767 Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Mon, 8 Nov 2021 18:33:34 -0800 Subject: [PATCH 41/87] checkpatch: get default codespell dictionary path from package location The standard location of dictionary.txt is under codespell's package, on my machine atm (codespell 2.1, Artix Linux): /usr/lib/python3.9/site-packages/codespell_lib/data/dictionary.txt Since we enable the codespell by default for SOF I have constant: No codespell typos will be found - file '/usr/share/codespell/dictionary.txt': No such file or directory The patch proposes to try to fix up the path following the recommendation found here: https://github.com/codespell-project/codespell/issues/1540 Link: https://lkml.kernel.org/r/29e25d1364c8ad7f7657cc0660f60c568074d438.camel@perches.com Signed-off-by: Peter Ujfalusi Acked-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 91798b07c6cb..1784921c645d 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -63,6 +63,7 @@ my $min_conf_desc_length = 4; my $spelling_file = "$D/spelling.txt"; my $codespell = 0; my $codespellfile = "/usr/share/codespell/dictionary.txt"; +my $user_codespellfile = ""; my $conststructsfile = "$D/const_structs.checkpatch"; my $docsfile = "$D/../Documentation/dev-tools/checkpatch.rst"; my $typedefsfile; @@ -130,7 +131,7 @@ Options: --ignore-perl-version override checking of perl version. expect runtime errors. --codespell Use the codespell dictionary for spelling/typos - (default:/usr/share/codespell/dictionary.txt) + (default:$codespellfile) --codespellfile Use this codespell dictionary --typedefsfile Read additional types from this file --color[=WHEN] Use colors 'always', 'never', or only when output @@ -317,7 +318,7 @@ GetOptions( 'debug=s' => \%debug, 'test-only=s' => \$tst_only, 'codespell!' => \$codespell, - 'codespellfile=s' => \$codespellfile, + 'codespellfile=s' => \$user_codespellfile, 'typedefsfile=s' => \$typedefsfile, 'color=s' => \$color, 'no-color' => \$color, #keep old behaviors of -nocolor @@ -325,9 +326,32 @@ GetOptions( 'kconfig-prefix=s' => \${CONFIG_}, 'h|help' => \$help, 'version' => \$help -) or help(1); +) or $help = 2; -help(0) if ($help); +if ($user_codespellfile) { + # Use the user provided codespell file unconditionally + $codespellfile = $user_codespellfile; +} elsif (!(-f $codespellfile)) { + # If /usr/share/codespell/dictionary.txt is not present, try to find it + # under codespell's install directory: /data/dictionary.txt + if (($codespell || $help) && which("codespell") ne "" && which("python") ne "") { + my $python_codespell_dict = << "EOF"; + +import os.path as op +import codespell_lib +codespell_dir = op.dirname(codespell_lib.__file__) +codespell_file = op.join(codespell_dir, 'data', 'dictionary.txt') +print(codespell_file, end='') +EOF + + my $codespell_dict = `python -c "$python_codespell_dict" 2> /dev/null`; + $codespellfile = $codespell_dict if (-f $codespell_dict); + } +} + +# $help is 1 if either -h, --help or --version is passed as option - exitcode: 0 +# $help is 2 if invalid option is passed - exitcode: 1 +help($help - 1) if ($help); die "$P: --git cannot be used with --file or --fix\n" if ($git && ($file || $fix)); die "$P: --verbose cannot be used with --terse\n" if ($verbose && $terse); From 5f501d555653f8968011a1e65ebb121c8b43c144 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 Nov 2021 18:33:37 -0800 Subject: [PATCH 42/87] binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE Commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") reverted back to using MAP_FIXED to map ELF LOAD segments because it was found that the segments in some binaries overlap and can cause MAP_FIXED_NOREPLACE to fail. The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to prevent the silent clobbering of an existing mapping (e.g. stack) by the ELF image, which could lead to exploitable conditions. Quoting commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map"), which originally introduced the use of MAP_FIXED_NOREPLACE in the loader: Both load_elf_interp and load_elf_binary rely on elf_map to map segments [to a specific] address and they use MAP_FIXED to enforce that. This is however [a] dangerous thing prone to silent data corruption which can be even exploitable. ... Let's take CVE-2017-1000253 as an example ... we could end up mapping [the executable] over the existing stack ... The [stack layout] issue has been fixed since then ... So we should be safe and any [similar] attack should be impractical. On the other hand this is just too subtle [an] assumption ... it can break quite easily and [be] hard to spot. ... Address this [weakness] by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one [instead of silently] clobbering it. Then processing ET_DYN binaries the loader already calculates a total size for the image when the first segment is mapped, maps the entire image, and then unmaps the remainder before the remaining segments are then individually mapped. To avoid the earlier problems (legitimate overlapping LOAD segments specified in the ELF), apply the same logic to ET_EXEC binaries as well. For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the initial total size mapping and then use MAP_FIXED to build the final (possibly legitimately overlapping) mappings. For ET_DYN w/out INTERP, continue to map at a system-selected address in the mmap region. Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.com Co-developed-by: Anthony Yznaga Signed-off-by: Anthony Yznaga Signed-off-by: Kees Cook Cc: Russell King Cc: Michal Hocko Cc: Eric Biederman Cc: Chen Jingwen Cc: Alexander Viro Cc: Andrei Vagin Cc: Khalid Aziz Cc: Michael Ellerman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/binfmt_elf.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index a813b70f594e..7537a19f0484 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1074,20 +1074,26 @@ out_free_interp: vaddr = elf_ppnt->p_vaddr; /* - * If we are loading ET_EXEC or we have already performed - * the ET_DYN load_addr calculations, proceed normally. + * The first time through the loop, load_addr_set is false: + * layout will be calculated. Once set, use MAP_FIXED since + * we know we've already safely mapped the entire region with + * MAP_FIXED_NOREPLACE in the once-per-binary logic following. */ - if (elf_ex->e_type == ET_EXEC || load_addr_set) { + if (load_addr_set) { elf_flags |= MAP_FIXED; + } else if (elf_ex->e_type == ET_EXEC) { + /* + * This logic is run once for the first LOAD Program + * Header for ET_EXEC binaries. No special handling + * is needed. + */ + elf_flags |= MAP_FIXED_NOREPLACE; } else if (elf_ex->e_type == ET_DYN) { /* * This logic is run once for the first LOAD Program * Header for ET_DYN binaries to calculate the * randomization (load_bias) for all the LOAD - * Program Headers, and to calculate the entire - * size of the ELF mapping (total_size). (Note that - * load_addr_set is set to true later once the - * initial mapping is performed.) + * Program Headers. * * There are effectively two types of ET_DYN * binaries: programs (i.e. PIE: ET_DYN with INTERP) @@ -1108,7 +1114,7 @@ out_free_interp: * Therefore, programs are loaded offset from * ELF_ET_DYN_BASE and loaders are loaded into the * independently randomized mmap region (0 load_bias - * without MAP_FIXED). + * without MAP_FIXED nor MAP_FIXED_NOREPLACE). */ if (interpreter) { load_bias = ELF_ET_DYN_BASE; @@ -1117,7 +1123,7 @@ out_free_interp: alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum); if (alignment) load_bias &= ~(alignment - 1); - elf_flags |= MAP_FIXED; + elf_flags |= MAP_FIXED_NOREPLACE; } else load_bias = 0; @@ -1129,7 +1135,14 @@ out_free_interp: * is then page aligned. */ load_bias = ELF_PAGESTART(load_bias - vaddr); + } + /* + * Calculate the entire size of the ELF mapping (total_size). + * (Note that load_addr_set is set to true later once the + * initial mapping is performed.) + */ + if (!load_addr_set) { total_size = total_mapping_size(elf_phdata, elf_ex->e_phnum); if (!total_size) { From a43e5e3a0227435e580a9e58f57c5face82368ee Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Mon, 8 Nov 2021 18:33:40 -0800 Subject: [PATCH 43/87] ELF: simplify STACK_ALLOC macro "A -= B; A" is equivalent to "A -= B". Link: https://lkml.kernel.org/r/YVmcP256fRMqCwgK@localhost.localdomain Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/binfmt_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 7537a19f0484..824e26a35b77 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -156,7 +156,7 @@ static int padzero(unsigned long elf_bss) #define STACK_ADD(sp, items) ((elf_addr_t __user *)(sp) - (items)) #define STACK_ROUND(sp, items) \ (((unsigned long) (sp - items)) &~ 15UL) -#define STACK_ALLOC(sp, len) ({ sp -= len ; sp; }) +#define STACK_ALLOC(sp, len) (sp -= len) #endif #ifndef ELF_BASE_PLATFORM From 1b1ad288b8f1b11f83396e537003722897ecc12b Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:33:43 -0800 Subject: [PATCH 44/87] kallsyms: remove arch specific text and data check Patch series "sections: Unify kernel sections range check and use", v4. There are three head files(kallsyms.h, kernel.h and sections.h) which include the kernel sections range check, let's make some cleanup and unify them. 1. cleanup arch specific text/data check and fix address boundary check in kallsyms.h 2. make all the basic/core kernel range check function into sections.h 3. update all the callers, and use the helper in sections.h to simplify the code After this series, we have 5 APIs about kernel sections range check in sections.h * is_kernel_rodata() --- already in sections.h * is_kernel_core_data() --- come from core_kernel_data() in kernel.h * is_kernel_inittext() --- come from kernel.h and kallsyms.h * __is_kernel_text() --- add new internal helper * __is_kernel() --- add new internal helper Note: For the last two helpers, people should not use directly, consider to use corresponding function in kallsyms.h. This patch (of 11): Remove arch specific text and data check after commit 4ba66a976072 ("arch: remove blackfin port"), no need arch-specific text/data check. Link: https://lkml.kernel.org/r/20210930071143.63410-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20210930071143.63410-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Arnd Bergmann Cc: Steven Rostedt Cc: Ingo Molnar Cc: David S. Miller Cc: Alexei Starovoitov Cc: Andrey Ryabinin Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christophe Leroy Cc: Kefeng Wang Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michal Simek Cc: Petr Mladek Cc: Richard Henderson Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-generic/sections.h | 16 ---------------- include/linux/kallsyms.h | 3 +-- kernel/locking/lockdep.c | 3 --- 3 files changed, 1 insertion(+), 21 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 596ab2092289..614fc809de34 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end; #define dereference_kernel_function_descriptor(p) ((void *)(p)) #endif -/* random extra sections (if any). Override - * in asm/sections.h */ -#ifndef arch_is_kernel_text -static inline int arch_is_kernel_text(unsigned long addr) -{ - return 0; -} -#endif - -#ifndef arch_is_kernel_data -static inline int arch_is_kernel_data(unsigned long addr) -{ - return 0; -} -#endif - /** * memory_contains - checks if an object is contained within a memory region * @begin: virtual address of the beginning of the memory region diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index a1d6fc82d7f0..bac340972670 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr) static inline int is_kernel_text(unsigned long addr) { - if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) || - arch_is_kernel_text(addr)) + if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext)) return 1; return in_gate_area_no_mm(addr); } diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 8e118caf835e..f2d3bf4960f4 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -818,9 +818,6 @@ static int static_obj(const void *obj) if ((addr >= start) && (addr < end)) return 1; - if (arch_is_kernel_data(addr)) - return 1; - /* * in-kernel percpu var? */ From e7d5c4b0eb9be6ea5c2cfb510a9d6f56925118eb Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:33:47 -0800 Subject: [PATCH 45/87] kallsyms: fix address-checks for kernel related range The is_kernel_inittext/is_kernel_text/is_kernel function should not include the end address(the labels _einittext, _etext and _end) when check the address range, the issue exists since Linux v2.6.12. Link: https://lkml.kernel.org/r/20210930071143.63410-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Petr Mladek Reviewed-by: Steven Rostedt (VMware) Acked-by: Sergey Senozhatsky Cc: Arnd Bergmann Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Richard Henderson Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kallsyms.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index bac340972670..62ce22b27ea8 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -27,21 +27,21 @@ struct module; static inline int is_kernel_inittext(unsigned long addr) { if (addr >= (unsigned long)_sinittext - && addr <= (unsigned long)_einittext) + && addr < (unsigned long)_einittext) return 1; return 0; } static inline int is_kernel_text(unsigned long addr) { - if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext)) + if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) return 1; return in_gate_area_no_mm(addr); } static inline int is_kernel(unsigned long addr) { - if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end) + if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) return 1; return in_gate_area_no_mm(addr); } From a20deb3a348719adaf8c12e1bf4b599bfc51836e Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:33:51 -0800 Subject: [PATCH 46/87] sections: move and rename core_kernel_data() to is_kernel_core_data() Move core_kernel_data() into sections.h and rename it to is_kernel_core_data(), also make it return bool value, then update all the callers. Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Arnd Bergmann Cc: Steven Rostedt Cc: Ingo Molnar Cc: "David S. Miller" Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: Dmitry Vyukov Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-generic/sections.h | 16 ++++++++++++++++ include/linux/kernel.h | 1 - kernel/extable.c | 18 ------------------ kernel/trace/ftrace.c | 2 +- net/sysctl_net.c | 2 +- 5 files changed, 18 insertions(+), 21 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 614fc809de34..7cba8ff10d3a 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -128,6 +128,22 @@ static inline bool init_section_intersects(void *virt, size_t size) return memory_intersects(__init_begin, __init_end, virt, size); } +/** + * is_kernel_core_data - checks if the pointer address is located in the + * .data section + * + * @addr: address to check + * + * Returns: true if the address is located in .data, false otherwise. + * Note: On some archs it may return true for core RODATA, and false + * for others. But will always be true for core RW data. + */ +static inline bool is_kernel_core_data(unsigned long addr) +{ + return addr >= (unsigned long)_sdata && + addr < (unsigned long)_edata; +} + /** * is_kernel_rodata - checks if the pointer address is located in the * .rodata section diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 46ca4404fb93..23f57a2d5a13 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -227,7 +227,6 @@ extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); extern int init_kernel_text(unsigned long addr); -extern int core_kernel_data(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index 290661f68e6b..0e3412b48bba 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr) return 0; } -/** - * core_kernel_data - tell if addr points to kernel data - * @addr: address to test - * - * Returns true if @addr passed in is from the core kernel data - * section. - * - * Note: On some archs it may return true for core RODATA, and false - * for others. But will always be true for core RW data. - */ -int core_kernel_data(unsigned long addr) -{ - if (addr >= (unsigned long)_sdata && - addr < (unsigned long)_edata) - return 1; - return 0; -} - int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index feebf57c6458..7aaef2b78760 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops) if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT)) return -EBUSY; - if (!core_kernel_data((unsigned long)ops)) + if (!is_kernel_core_data((unsigned long)ops)) ops->flags |= FTRACE_OPS_FL_DYNAMIC; add_ftrace_ops(&ftrace_ops_list, ops); diff --git a/net/sysctl_net.c b/net/sysctl_net.c index f6cb0d4d114c..4b45ed631eb8 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path, addr = (unsigned long)ent->data; if (is_module_address(addr)) where = "module"; - else if (core_kernel_data(addr)) + else if (is_kernel_core_data(addr)) where = "kernel"; else continue; From b9ad8fe7b8cab3814d1de41e8ddca602b2646f4b Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:33:54 -0800 Subject: [PATCH 47/87] sections: move is_kernel_inittext() into sections.h The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, then update all the callers. Link: https://lkml.kernel.org/r/20210930071143.63410-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Arnd Bergmann Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/kernel/unwind_orc.c | 2 +- include/asm-generic/sections.h | 14 ++++++++++++++ include/linux/kallsyms.h | 8 -------- include/linux/kernel.h | 1 - kernel/extable.c | 12 ++---------- 5 files changed, 17 insertions(+), 20 deletions(-) diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index a1202536fc57..d92ec2ced059 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip) } /* vmlinux .init slow lookup: */ - if (init_kernel_text(ip)) + if (is_kernel_inittext(ip)) return __orc_find(__start_orc_unwind_ip, __start_orc_unwind, __stop_orc_unwind_ip - __start_orc_unwind_ip, ip); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 7cba8ff10d3a..f82806ca8756 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -158,4 +158,18 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +/** + * is_kernel_inittext - checks if the pointer address is located in the + * .init.text section + * + * @addr: address to check + * + * Returns: true if the address is located in .init.text, false otherwise. + */ +static inline bool is_kernel_inittext(unsigned long addr) +{ + return addr >= (unsigned long)_sinittext && + addr < (unsigned long)_einittext; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 62ce22b27ea8..bb5d9ec78e13 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -24,14 +24,6 @@ struct cred; struct module; -static inline int is_kernel_inittext(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext - && addr < (unsigned long)_einittext) - return 1; - return 0; -} - static inline int is_kernel_text(unsigned long addr) { if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 23f57a2d5a13..be84ab369650 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -226,7 +226,6 @@ extern bool parse_option_str(const char *str, const char *option); extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); -extern int init_kernel_text(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index 0e3412b48bba..6505207aa7e6 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -62,14 +62,6 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) return e; } -int init_kernel_text(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext && - addr < (unsigned long)_einittext) - return 1; - return 0; -} - int notrace core_kernel_text(unsigned long addr) { if (addr >= (unsigned long)_stext && @@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr) return 1; if (system_state < SYSTEM_FREEING_INITMEM && - init_kernel_text(addr)) + is_kernel_inittext(addr)) return 1; return 0; } @@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr) * Since we are after the module-symbols check, there's * no danger of address overlap: */ - if (init_kernel_text(addr)) + if (is_kernel_inittext(addr)) return 1; return 0; } From 0a96c902d46cf7abcf00c96df7259ec92b5d9e29 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:33:58 -0800 Subject: [PATCH 48/87] x86: mm: rename __is_kernel_text() to is_x86_32_kernel_text() Commit b56cd05c55a1 ("x86/mm: Rename is_kernel_text to __is_kernel_text"), add '__' prefix not to get in conflict with existing is_kernel_text() in . We will add __is_kernel_text() for the basic kernel text range check in the next patch, so use private is_x86_32_kernel_text() naming for x86 special check. Link: https://lkml.kernel.org/r/20210930071143.63410-6-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Ingo Molnar Cc: Borislav Petkov Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/mm/init_32.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 5cd7ea6d645c..d4e2648a1dfb 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -238,11 +238,7 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base) } } -/* - * The already defines is_kernel_text, - * using '__' prefix not to get in conflict. - */ -static inline int __is_kernel_text(unsigned long addr) +static inline int is_x86_32_kernel_text(unsigned long addr) { if (addr >= (unsigned long)_text && addr <= (unsigned long)__init_end) return 1; @@ -333,8 +329,8 @@ repeat: addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE + PAGE_OFFSET + PAGE_SIZE-1; - if (__is_kernel_text(addr) || - __is_kernel_text(addr2)) + if (is_x86_32_kernel_text(addr) || + is_x86_32_kernel_text(addr2)) prot = PAGE_KERNEL_LARGE_EXEC; pages_2m++; @@ -359,7 +355,7 @@ repeat: */ pgprot_t init_prot = __pgprot(PTE_IDENT_ATTR); - if (__is_kernel_text(addr)) + if (is_x86_32_kernel_text(addr)) prot = PAGE_KERNEL_EXEC; pages_4k++; @@ -789,7 +785,7 @@ static void mark_nxdata_nx(void) */ unsigned long start = PFN_ALIGN(_etext); /* - * This comes from __is_kernel_text upper limit. Also HPAGE where used: + * This comes from is_x86_32_kernel_text upper limit. Also HPAGE where used: */ unsigned long size = (((unsigned long)__init_end + HPAGE_SIZE) & HPAGE_MASK) - start; From 8f6e42e83362650007ed298070a69d084f2c8baf Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:02 -0800 Subject: [PATCH 49/87] sections: provide internal __is_kernel() and __is_kernel_text() helper An internal __is_kernel() helper which only check the kernel address ranges, and an internal __is_kernel_text() helper which only check text section ranges. Link: https://lkml.kernel.org/r/20210930071143.63410-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-generic/sections.h | 29 +++++++++++++++++++++++++++++ include/linux/kallsyms.h | 4 ++-- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index f82806ca8756..1dfadb2e878d 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -172,4 +172,33 @@ static inline bool is_kernel_inittext(unsigned long addr) addr < (unsigned long)_einittext; } +/** + * __is_kernel_text - checks if the pointer address is located in the + * .text section + * + * @addr: address to check + * + * Returns: true if the address is located in .text, false otherwise. + * Note: an internal helper, only check the range of _stext to _etext. + */ +static inline bool __is_kernel_text(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_etext; +} + +/** + * __is_kernel - checks if the pointer address is located in the kernel range + * + * @addr: address to check + * + * Returns: true if the address is located in the kernel range, false otherwise. + * Note: an internal helper, only check the range of _stext to _end. + */ +static inline bool __is_kernel(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_end; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index bb5d9ec78e13..4176c7eca7b5 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -26,14 +26,14 @@ struct module; static inline int is_kernel_text(unsigned long addr) { - if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) + if (__is_kernel_text(addr)) return 1; return in_gate_area_no_mm(addr); } static inline int is_kernel(unsigned long addr) { - if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) + if (__is_kernel(addr)) return 1; return in_gate_area_no_mm(addr); } From 3298cbe8046a964d62bb920af3b64121fba50b55 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:05 -0800 Subject: [PATCH 50/87] mm: kasan: use is_kernel() helper Directly use is_kernel() helper in kernel_or_module_addr(). Link: https://lkml.kernel.org/r/20210930071143.63410-8-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Alexander Potapenko Reviewed-by: Andrey Konovalov Reviewed-by: Sergey Senozhatsky Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Alexei Starovoitov Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/report.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 3239fd8f8747..1c955e1c98d5 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -226,7 +226,7 @@ static void describe_object(struct kmem_cache *cache, void *object, static inline bool kernel_or_module_addr(const void *addr) { - if (addr >= (void *)_stext && addr < (void *)_end) + if (is_kernel((unsigned long)addr)) return true; if (is_module_address((unsigned long)addr)) return true; From 808b64565b028bd072c985624d5cbe4b6f5a5a35 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:09 -0800 Subject: [PATCH 51/87] extable: use is_kernel_text() helper The core_kernel_text() should check the gate area, as it is part of kernel text range, use is_kernel_text() in core_kernel_text(). Link: https://lkml.kernel.org/r/20210930071143.63410-9-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Steven Rostedt Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/extable.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/extable.c b/kernel/extable.c index 6505207aa7e6..b6f330f0fe74 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) int notrace core_kernel_text(unsigned long addr) { - if (addr >= (unsigned long)_stext && - addr < (unsigned long)_etext) + if (is_kernel_text(addr)) return 1; if (system_state < SYSTEM_FREEING_INITMEM && From 843a1ffaf6f2eafd333e03af16eb9c4ec68febc4 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:13 -0800 Subject: [PATCH 52/87] powerpc/mm: use core_kernel_text() helper Use core_kernel_text() helper to simplify code, also drop etext, _stext, _sinittext, _einittext declaration which already declared in section.h. Link: https://lkml.kernel.org/r/20210930071143.63410-10-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christophe Leroy Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Borislav Petkov Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michal Simek Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/powerpc/mm/pgtable_32.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index dcf5ecca19d9..079abbf45a33 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -33,8 +33,6 @@ #include -extern char etext[], _stext[], _sinittext[], _einittext[]; - static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data; notrace void __init early_ioremap_init(void) @@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top) { unsigned long v, s; phys_addr_t p; - int ktext; + bool ktext; s = offset; v = PAGE_OFFSET + s; p = memstart_addr + s; for (; s < top; s += PAGE_SIZE) { - ktext = ((char *)v >= _stext && (char *)v < etext) || - ((char *)v >= _sinittext && (char *)v < _einittext); + ktext = core_kernel_text(v); map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL); v += PAGE_SIZE; p += PAGE_SIZE; From 4b5ef1e11421af5ddd43e7989b70bbfcdd3e8d5b Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:17 -0800 Subject: [PATCH 53/87] microblaze: use is_kernel_text() helper Use is_kernel_text() helper to simplify code. Link: https://lkml.kernel.org/r/20210930071143.63410-11-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Michal Simek Reviewed-by: Sergey Senozhatsky Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Michael Ellerman Cc: Paul Mackerras Cc: Petr Mladek Cc: Richard Henderson Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/microblaze/mm/pgtable.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c index c1833b159d3b..9f73265aad4e 100644 --- a/arch/microblaze/mm/pgtable.c +++ b/arch/microblaze/mm/pgtable.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -171,7 +172,7 @@ void __init mapin_ram(void) for (s = 0; s < lowmem_size; s += PAGE_SIZE) { f = _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_SHARED | _PAGE_HWEXEC; - if ((char *) v < _stext || (char *) v >= _etext) + if (!is_kernel_text(v)) f |= _PAGE_WRENABLE; else /* On the MicroBlaze, no user access From 2d93a5835a37684bb6171af71a2657514539df3f Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 8 Nov 2021 18:34:20 -0800 Subject: [PATCH 54/87] alpha: use is_kernel_text() helper Use is_kernel_text() helper to simplify code. Link: https://lkml.kernel.org/r/20210930071143.63410-12-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Sergey Senozhatsky Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Alexander Potapenko Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christophe Leroy Cc: "David S. Miller" Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Paul Mackerras Cc: Petr Mladek Cc: Steven Rostedt Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/alpha/kernel/traps.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c index e805106409f7..2ae34702456c 100644 --- a/arch/alpha/kernel/traps.c +++ b/arch/alpha/kernel/traps.c @@ -129,9 +129,7 @@ dik_show_trace(unsigned long *sp, const char *loglvl) extern char _stext[], _etext[]; unsigned long tmp = *sp; sp++; - if (tmp < (unsigned long) &_stext) - continue; - if (tmp >= (unsigned long) &_etext) + if (!is_kernel_text(tmp)) continue; printk("%s[<%lx>] %pSR\n", loglvl, tmp, (void *)tmp); if (i > 40) { From 0858d7da8a09e440fb192a0239d20249a2d16af8 Mon Sep 17 00:00:00 2001 From: yangerkun Date: Mon, 8 Nov 2021 18:34:24 -0800 Subject: [PATCH 55/87] ramfs: fix mount source show for ramfs ramfs_parse_param does not parse key "source", and will convert -ENOPARAM to 0. This will skip vfs_parse_fs_param_source in vfs_parse_fs_param, which lead always "none" mount source for ramfs. Fix it by parsing "source" in ramfs_parse_param like cgroup1_parse_param does. Link: https://lkml.kernel.org/r/20210924091756.1906118-1-yangerkun@huawei.com Signed-off-by: yangerkun Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ramfs/inode.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c index 65e7e56005b8..df4515d39cd4 100644 --- a/fs/ramfs/inode.c +++ b/fs/ramfs/inode.c @@ -203,17 +203,20 @@ static int ramfs_parse_param(struct fs_context *fc, struct fs_parameter *param) int opt; opt = fs_parse(fc, ramfs_fs_parameters, param, &result); - if (opt < 0) { + if (opt == -ENOPARAM) { + opt = vfs_parse_fs_param_source(fc, param); + if (opt != -ENOPARAM) + return opt; /* * We might like to report bad mount options here; * but traditionally ramfs has ignored all mount options, * and as it is used as a !CONFIG_SHMEM simple substitute * for tmpfs, better continue to ignore other mount options. */ - if (opt == -ENOPARAM) - opt = 0; - return opt; + return 0; } + if (opt < 0) + return opt; switch (opt) { case Opt_mode: From 8bc2b3dca7292347d8e715fb723c587134abe013 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Mon, 8 Nov 2021 18:34:27 -0800 Subject: [PATCH 56/87] init: make unknown command line param message clearer The prior message is confusing users, which is the exact opposite of the goal. If the message is being seen, one of the following situations is happening: 1. the param is misspelled 2. the param is not valid due to the kernel configuration 3. the param is intended for init but isn't after the '--' delineator on the command line To make that more clear to the user, explicitly mention "kernel command line" and also note that the params are still passed to user space to avoid causing any alarm over params intended for init. Link: https://lkml.kernel.org/r/20211013223502.96756-1-ahalaney@redhat.com Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters") Signed-off-by: Andrew Halaney Suggested-by: Steven Rostedt (VMware) Acked-by: Randy Dunlap Cc: Borislav Petkov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/init/main.c b/init/main.c index f0001af8ebb9..158ba0827106 100644 --- a/init/main.c +++ b/init/main.c @@ -924,7 +924,9 @@ static void __init print_unknown_bootoptions(void) for (p = &envp_init[2]; *p; p++) end += sprintf(end, " %s", *p); - pr_notice("Unknown command line parameters:%s\n", unknown_options); + /* Start at unknown_options[1] to skip the initial space */ + pr_notice("Unknown kernel command line parameters \"%s\", will be passed to user space.\n", + &unknown_options[1]); memblock_free(unknown_options, len); } From 18319cb478de23340fdcb6385b0cc074a5416da7 Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:30 -0800 Subject: [PATCH 57/87] coda: avoid NULL pointer dereference from a bad inode Patch series "Coda updates for -next". The following patch series contains some fixes for the Coda kernel module I've had sitting around and were tested extensively in a development version of the Coda kernel module that lives outside of the main kernel. This patch (of 9): Avoid accessing coda_inode_info from a dentry with a bad inode. Link: https://lkml.kernel.org/r/20210908140308.18491-1-jaharkes@cs.cmu.edu Link: https://lkml.kernel.org/r/20210908140308.18491-2-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/dir.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/coda/dir.c b/fs/coda/dir.c index d69989c1bac3..3fd085009f26 100644 --- a/fs/coda/dir.c +++ b/fs/coda/dir.c @@ -499,15 +499,20 @@ out: */ static int coda_dentry_delete(const struct dentry * dentry) { - int flags; + struct inode *inode; + struct coda_inode_info *cii; if (d_really_is_negative(dentry)) return 0; - flags = (ITOC(d_inode(dentry))->c_flags) & C_PURGE; - if (is_bad_inode(d_inode(dentry)) || flags) { + inode = d_inode(dentry); + if (!inode || is_bad_inode(inode)) return 1; - } + + cii = ITOC(inode); + if (cii->c_flags & C_PURGE) + return 1; + return 0; } From 3d8e72d97411370aab662e85e2c3a7b26555179c Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:33 -0800 Subject: [PATCH 58/87] coda: check for async upcall request using local state Originally flagged by Smatch because the code implicitly assumed outSize is not NULL for non-async upcalls because of a flag that was (not) set in req->uc_flags. However req->uc_flags field is in shared state and although the current code will not allow it to be changed before the async request check the code is more robust when it tests against the local outSize variable. Link: https://lkml.kernel.org/r/20210908140308.18491-3-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/upcall.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index eb3b1898da46..59f6cfd06f96 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -744,7 +744,8 @@ static int coda_upcall(struct venus_comm *vcp, list_add_tail(&req->uc_chain, &vcp->vc_pending); wake_up_interruptible(&vcp->vc_waitq); - if (req->uc_flags & CODA_REQ_ASYNC) { + /* We can return early on asynchronous requests */ + if (outSize == NULL) { mutex_unlock(&vcp->vc_mutex); return 0; } From b1deb685b07945bb374fa75857033bb658a1253b Mon Sep 17 00:00:00 2001 From: Alex Shi Date: Mon, 8 Nov 2021 18:34:36 -0800 Subject: [PATCH 59/87] coda: remove err which no one care No one care 'err' in func coda_release, so better remove it. Link: https://lkml.kernel.org/r/20210908140308.18491-4-jaharkes@cs.cmu.edu Signed-off-by: Alex Shi Signed-off-by: Jan Harkes Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/file.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/coda/file.c b/fs/coda/file.c index ef5ca22bfb3e..52deab784667 100644 --- a/fs/coda/file.c +++ b/fs/coda/file.c @@ -238,11 +238,10 @@ int coda_release(struct inode *coda_inode, struct file *coda_file) struct coda_file_info *cfi; struct coda_inode_info *cii; struct inode *host_inode; - int err; cfi = coda_ftoc(coda_file); - err = venus_close(coda_inode->i_sb, coda_i2f(coda_inode), + venus_close(coda_inode->i_sb, coda_i2f(coda_inode), coda_flags, coda_file->f_cred->fsuid); host_inode = file_inode(cfi->cfi_container); From 76097eb7a48a2ddcf4755773bd501c7aa14cbb7d Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:39 -0800 Subject: [PATCH 60/87] coda: avoid flagging NULL inodes Somehow we hit a negative dentry in coda_rename even after checking with d_really_is_positive. Maybe something raced and turned the new_dentry negative while we were fixing up directory link counts. Link: https://lkml.kernel.org/r/20210908140308.18491-5-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/coda_linux.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/coda/coda_linux.h b/fs/coda/coda_linux.h index e7b27754ce78..3c2947bba5e5 100644 --- a/fs/coda/coda_linux.h +++ b/fs/coda/coda_linux.h @@ -83,6 +83,9 @@ static __inline__ void coda_flag_inode(struct inode *inode, int flag) { struct coda_inode_info *cii = ITOC(inode); + if (!inode) + return; + spin_lock(&cii->c_lock); cii->c_flags |= flag; spin_unlock(&cii->c_lock); From b2e36228367a8a097c7d15670fad47d77098f56d Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:42 -0800 Subject: [PATCH 61/87] coda: avoid hidden code duplication in rename We were actually fixing up the directory mtime in both branches after the negative dentry test, it was just that one branch was only flagging the directory inodes to refresh their attributes while the other branch used the optional optimization to set mtime to the current time and not go back to the Coda client. Link: https://lkml.kernel.org/r/20210908140308.18491-6-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/dir.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/coda/dir.c b/fs/coda/dir.c index 3fd085009f26..328d7a684b63 100644 --- a/fs/coda/dir.c +++ b/fs/coda/dir.c @@ -317,13 +317,10 @@ static int coda_rename(struct user_namespace *mnt_userns, struct inode *old_dir, coda_dir_drop_nlink(old_dir); coda_dir_inc_nlink(new_dir); } - coda_dir_update_mtime(old_dir); - coda_dir_update_mtime(new_dir); coda_flag_inode(d_inode(new_dentry), C_VATTR); - } else { - coda_flag_inode(old_dir, C_VATTR); - coda_flag_inode(new_dir, C_VATTR); } + coda_dir_update_mtime(old_dir); + coda_dir_update_mtime(new_dir); } return error; } From 5a646fb3a3e2de8bbab19d41826b3e54b09969bc Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:45 -0800 Subject: [PATCH 62/87] coda: avoid doing bad things on inode type changes during revalidation When Coda discovers an inconsistent object, it turns it into a symlink. However we can't just follow this change in the kernel on an existing file or directory inode that may still have references. This patch removes the inconsistent inode from the inode hash and allocates a new inode for the symlink object. Link: https://lkml.kernel.org/r/20210908140308.18491-7-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/cnode.c | 13 +++++++++---- fs/coda/coda_linux.c | 39 +++++++++++++++++++-------------------- fs/coda/coda_linux.h | 3 ++- 3 files changed, 30 insertions(+), 25 deletions(-) diff --git a/fs/coda/cnode.c b/fs/coda/cnode.c index 06855f6c7902..62a3d2565c26 100644 --- a/fs/coda/cnode.c +++ b/fs/coda/cnode.c @@ -63,9 +63,10 @@ struct inode * coda_iget(struct super_block * sb, struct CodaFid * fid, struct inode *inode; struct coda_inode_info *cii; unsigned long hash = coda_f2i(fid); + umode_t inode_type = coda_inode_type(attr); +retry: inode = iget5_locked(sb, hash, coda_test_inode, coda_set_inode, fid); - if (!inode) return ERR_PTR(-ENOMEM); @@ -75,11 +76,15 @@ struct inode * coda_iget(struct super_block * sb, struct CodaFid * fid, inode->i_ino = hash; /* inode is locked and unique, no need to grab cii->c_lock */ cii->c_mapcount = 0; + coda_fill_inode(inode, attr); unlock_new_inode(inode); + } else if ((inode->i_mode & S_IFMT) != inode_type) { + /* Inode has changed type, mark bad and grab a new one */ + remove_inode_hash(inode); + coda_flag_inode(inode, C_PURGE); + iput(inode); + goto retry; } - - /* always replace the attributes, type might have changed */ - coda_fill_inode(inode, attr); return inode; } diff --git a/fs/coda/coda_linux.c b/fs/coda/coda_linux.c index 2e1a5a192074..903ca8fa4b9b 100644 --- a/fs/coda/coda_linux.c +++ b/fs/coda/coda_linux.c @@ -87,28 +87,27 @@ static struct coda_timespec timespec64_to_coda(struct timespec64 ts64) } /* utility functions below */ +umode_t coda_inode_type(struct coda_vattr *attr) +{ + switch (attr->va_type) { + case C_VREG: + return S_IFREG; + case C_VDIR: + return S_IFDIR; + case C_VLNK: + return S_IFLNK; + case C_VNON: + default: + return 0; + } +} + void coda_vattr_to_iattr(struct inode *inode, struct coda_vattr *attr) { - int inode_type; - /* inode's i_flags, i_ino are set by iget - XXX: is this all we need ?? - */ - switch (attr->va_type) { - case C_VNON: - inode_type = 0; - break; - case C_VREG: - inode_type = S_IFREG; - break; - case C_VDIR: - inode_type = S_IFDIR; - break; - case C_VLNK: - inode_type = S_IFLNK; - break; - default: - inode_type = 0; - } + /* inode's i_flags, i_ino are set by iget + * XXX: is this all we need ?? + */ + umode_t inode_type = coda_inode_type(attr); inode->i_mode |= inode_type; if (attr->va_mode != (u_short) -1) diff --git a/fs/coda/coda_linux.h b/fs/coda/coda_linux.h index 3c2947bba5e5..9be281bbcc06 100644 --- a/fs/coda/coda_linux.h +++ b/fs/coda/coda_linux.h @@ -53,10 +53,11 @@ int coda_getattr(struct user_namespace *, const struct path *, struct kstat *, u32, unsigned int); int coda_setattr(struct user_namespace *, struct dentry *, struct iattr *); -/* this file: heloers */ +/* this file: helpers */ char *coda_f2s(struct CodaFid *f); int coda_iscontrol(const char *name, size_t length); +umode_t coda_inode_type(struct coda_vattr *attr); void coda_vattr_to_iattr(struct inode *, struct coda_vattr *); void coda_iattr_to_vattr(struct iattr *, struct coda_vattr *); unsigned short coda_flags_to_cflags(unsigned short); From 1077c2857791076a7e81b0ba91571f136cee08e4 Mon Sep 17 00:00:00 2001 From: Xiyu Yang Date: Mon, 8 Nov 2021 18:34:48 -0800 Subject: [PATCH 63/87] coda: convert from atomic_t to refcount_t on coda_vm_ops->refcnt refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations. Link: https://lkml.kernel.org/r/20210908140308.18491-8-jaharkes@cs.cmu.edu Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/file.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/coda/file.c b/fs/coda/file.c index 52deab784667..29dd87be2fb8 100644 --- a/fs/coda/file.c +++ b/fs/coda/file.c @@ -8,6 +8,7 @@ * to the Coda project. Contact Peter Braam . */ +#include #include #include #include @@ -28,7 +29,7 @@ #include "coda_int.h" struct coda_vm_ops { - atomic_t refcnt; + refcount_t refcnt; struct file *coda_file; const struct vm_operations_struct *host_vm_ops; struct vm_operations_struct vm_ops; @@ -98,7 +99,7 @@ coda_vm_open(struct vm_area_struct *vma) struct coda_vm_ops *cvm_ops = container_of(vma->vm_ops, struct coda_vm_ops, vm_ops); - atomic_inc(&cvm_ops->refcnt); + refcount_inc(&cvm_ops->refcnt); if (cvm_ops->host_vm_ops && cvm_ops->host_vm_ops->open) cvm_ops->host_vm_ops->open(vma); @@ -113,7 +114,7 @@ coda_vm_close(struct vm_area_struct *vma) if (cvm_ops->host_vm_ops && cvm_ops->host_vm_ops->close) cvm_ops->host_vm_ops->close(vma); - if (atomic_dec_and_test(&cvm_ops->refcnt)) { + if (refcount_dec_and_test(&cvm_ops->refcnt)) { vma->vm_ops = cvm_ops->host_vm_ops; fput(cvm_ops->coda_file); kfree(cvm_ops); @@ -189,7 +190,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma) cvm_ops->vm_ops.open = coda_vm_open; cvm_ops->vm_ops.close = coda_vm_close; cvm_ops->coda_file = coda_file; - atomic_set(&cvm_ops->refcnt, 1); + refcount_set(&cvm_ops->refcnt, 1); vma->vm_ops = &cvm_ops->vm_ops; } From 118b7ee169d2e469a080bf1d2fa68cd31452bdad Mon Sep 17 00:00:00 2001 From: Jing Yangyang Date: Mon, 8 Nov 2021 18:34:52 -0800 Subject: [PATCH 64/87] coda: use vmemdup_user to replace the open code vmemdup_user is better than duplicating its implementation, So just replace the open code. fs/coda/psdev.c:125:10-18:WARNING:opportunity for vmemdup_user The issue is detected with the help of Coccinelle. Link: https://lkml.kernel.org/r/20210908140308.18491-9-jaharkes@cs.cmu.edu Reported-by: Zeal Robot Signed-off-by: Jing Yangyang Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Xin Tan Cc: Xiyu Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/psdev.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/coda/psdev.c b/fs/coda/psdev.c index 240669f51eac..7e23cb22d394 100644 --- a/fs/coda/psdev.c +++ b/fs/coda/psdev.c @@ -122,14 +122,10 @@ static ssize_t coda_psdev_write(struct file *file, const char __user *buf, hdr.opcode, hdr.unique); nbytes = size; } - dcbuf = kvmalloc(nbytes, GFP_KERNEL); - if (!dcbuf) { - retval = -ENOMEM; - goto out; - } - if (copy_from_user(dcbuf, buf, nbytes)) { - kvfree(dcbuf); - retval = -EFAULT; + + dcbuf = vmemdup_user(buf, nbytes); + if (IS_ERR(dcbuf)) { + retval = PTR_ERR(dcbuf); goto out; } From 98d5b61ef5fae7681df27065ad95ee6e30c42243 Mon Sep 17 00:00:00 2001 From: Jan Harkes Date: Mon, 8 Nov 2021 18:34:55 -0800 Subject: [PATCH 65/87] coda: bump module version to 7.2 Helps with tracking which patches have been propagated upstream and if users are running the latest known version. Link: https://lkml.kernel.org/r/20210908140308.18491-10-jaharkes@cs.cmu.edu Signed-off-by: Jan Harkes Cc: Alex Shi Cc: Jing Yangyang Cc: Xin Tan Cc: Xiyu Yang Cc: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coda/psdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/coda/psdev.c b/fs/coda/psdev.c index 7e23cb22d394..b39580ad4ce5 100644 --- a/fs/coda/psdev.c +++ b/fs/coda/psdev.c @@ -384,7 +384,7 @@ MODULE_AUTHOR("Jan Harkes, Peter J. Braam"); MODULE_DESCRIPTION("Coda Distributed File System VFS interface"); MODULE_ALIAS_CHARDEV_MAJOR(CODA_PSDEV_MAJOR); MODULE_LICENSE("GPL"); -MODULE_VERSION("7.0"); +MODULE_VERSION("7.2"); static int __init init_coda(void) { From 3bcd6c5bd483287f4a09d3d59a012d47677b6edc Mon Sep 17 00:00:00 2001 From: Qing Wang Date: Mon, 8 Nov 2021 18:34:58 -0800 Subject: [PATCH 66/87] nilfs2: replace snprintf in show functions with sysfs_emit Patch series "nilfs2 updates". This patch (of 2): coccicheck complains about the use of snprintf() in sysfs show functions. Fix the coccicheck warning: WARNING: use scnprintf or sprintf. Use sysfs_emit instead of scnprintf or sprintf makes more sense. Link: https://lkml.kernel.org/r/1635151862-11547-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1634095759-4625-1-git-send-email-wangqing@vivo.com Link: https://lkml.kernel.org/r/1635151862-11547-2-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Qing Wang Signed-off-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/nilfs2/sysfs.c | 76 +++++++++++++++++++++++------------------------ 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 62f8a7ac19c8..be407072def7 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -95,7 +95,7 @@ static ssize_t nilfs_snapshot_inodes_count_show(struct nilfs_snapshot_attr *attr, struct nilfs_root *root, char *buf) { - return snprintf(buf, PAGE_SIZE, "%llu\n", + return sysfs_emit(buf, "%llu\n", (unsigned long long)atomic64_read(&root->inodes_count)); } @@ -103,7 +103,7 @@ static ssize_t nilfs_snapshot_blocks_count_show(struct nilfs_snapshot_attr *attr, struct nilfs_root *root, char *buf) { - return snprintf(buf, PAGE_SIZE, "%llu\n", + return sysfs_emit(buf, "%llu\n", (unsigned long long)atomic64_read(&root->blocks_count)); } @@ -116,7 +116,7 @@ static ssize_t nilfs_snapshot_README_show(struct nilfs_snapshot_attr *attr, struct nilfs_root *root, char *buf) { - return snprintf(buf, PAGE_SIZE, snapshot_readme_str); + return sysfs_emit(buf, snapshot_readme_str); } NILFS_SNAPSHOT_RO_ATTR(inodes_count); @@ -217,7 +217,7 @@ static ssize_t nilfs_mounted_snapshots_README_show(struct nilfs_mounted_snapshots_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, mounted_snapshots_readme_str); + return sysfs_emit(buf, mounted_snapshots_readme_str); } NILFS_MOUNTED_SNAPSHOTS_RO_ATTR(README); @@ -255,7 +255,7 @@ nilfs_checkpoints_checkpoints_number_show(struct nilfs_checkpoints_attr *attr, ncheckpoints = cpstat.cs_ncps; - return snprintf(buf, PAGE_SIZE, "%llu\n", ncheckpoints); + return sysfs_emit(buf, "%llu\n", ncheckpoints); } static ssize_t @@ -278,7 +278,7 @@ nilfs_checkpoints_snapshots_number_show(struct nilfs_checkpoints_attr *attr, nsnapshots = cpstat.cs_nsss; - return snprintf(buf, PAGE_SIZE, "%llu\n", nsnapshots); + return sysfs_emit(buf, "%llu\n", nsnapshots); } static ssize_t @@ -292,7 +292,7 @@ nilfs_checkpoints_last_seg_checkpoint_show(struct nilfs_checkpoints_attr *attr, last_cno = nilfs->ns_last_cno; spin_unlock(&nilfs->ns_last_segment_lock); - return snprintf(buf, PAGE_SIZE, "%llu\n", last_cno); + return sysfs_emit(buf, "%llu\n", last_cno); } static ssize_t @@ -306,7 +306,7 @@ nilfs_checkpoints_next_checkpoint_show(struct nilfs_checkpoints_attr *attr, cno = nilfs->ns_cno; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", cno); + return sysfs_emit(buf, "%llu\n", cno); } static const char checkpoints_readme_str[] = @@ -322,7 +322,7 @@ static ssize_t nilfs_checkpoints_README_show(struct nilfs_checkpoints_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, checkpoints_readme_str); + return sysfs_emit(buf, checkpoints_readme_str); } NILFS_CHECKPOINTS_RO_ATTR(checkpoints_number); @@ -353,7 +353,7 @@ nilfs_segments_segments_number_show(struct nilfs_segments_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, "%lu\n", nilfs->ns_nsegments); + return sysfs_emit(buf, "%lu\n", nilfs->ns_nsegments); } static ssize_t @@ -361,7 +361,7 @@ nilfs_segments_blocks_per_segment_show(struct nilfs_segments_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, "%lu\n", nilfs->ns_blocks_per_segment); + return sysfs_emit(buf, "%lu\n", nilfs->ns_blocks_per_segment); } static ssize_t @@ -375,7 +375,7 @@ nilfs_segments_clean_segments_show(struct nilfs_segments_attr *attr, ncleansegs = nilfs_sufile_get_ncleansegs(nilfs->ns_sufile); up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem); - return snprintf(buf, PAGE_SIZE, "%lu\n", ncleansegs); + return sysfs_emit(buf, "%lu\n", ncleansegs); } static ssize_t @@ -395,7 +395,7 @@ nilfs_segments_dirty_segments_show(struct nilfs_segments_attr *attr, return err; } - return snprintf(buf, PAGE_SIZE, "%llu\n", sustat.ss_ndirtysegs); + return sysfs_emit(buf, "%llu\n", sustat.ss_ndirtysegs); } static const char segments_readme_str[] = @@ -411,7 +411,7 @@ nilfs_segments_README_show(struct nilfs_segments_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, segments_readme_str); + return sysfs_emit(buf, segments_readme_str); } NILFS_SEGMENTS_RO_ATTR(segments_number); @@ -448,7 +448,7 @@ nilfs_segctor_last_pseg_block_show(struct nilfs_segctor_attr *attr, last_pseg = nilfs->ns_last_pseg; spin_unlock(&nilfs->ns_last_segment_lock); - return snprintf(buf, PAGE_SIZE, "%llu\n", + return sysfs_emit(buf, "%llu\n", (unsigned long long)last_pseg); } @@ -463,7 +463,7 @@ nilfs_segctor_last_seg_sequence_show(struct nilfs_segctor_attr *attr, last_seq = nilfs->ns_last_seq; spin_unlock(&nilfs->ns_last_segment_lock); - return snprintf(buf, PAGE_SIZE, "%llu\n", last_seq); + return sysfs_emit(buf, "%llu\n", last_seq); } static ssize_t @@ -477,7 +477,7 @@ nilfs_segctor_last_seg_checkpoint_show(struct nilfs_segctor_attr *attr, last_cno = nilfs->ns_last_cno; spin_unlock(&nilfs->ns_last_segment_lock); - return snprintf(buf, PAGE_SIZE, "%llu\n", last_cno); + return sysfs_emit(buf, "%llu\n", last_cno); } static ssize_t @@ -491,7 +491,7 @@ nilfs_segctor_current_seg_sequence_show(struct nilfs_segctor_attr *attr, seg_seq = nilfs->ns_seg_seq; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", seg_seq); + return sysfs_emit(buf, "%llu\n", seg_seq); } static ssize_t @@ -505,7 +505,7 @@ nilfs_segctor_current_last_full_seg_show(struct nilfs_segctor_attr *attr, segnum = nilfs->ns_segnum; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", segnum); + return sysfs_emit(buf, "%llu\n", segnum); } static ssize_t @@ -519,7 +519,7 @@ nilfs_segctor_next_full_seg_show(struct nilfs_segctor_attr *attr, nextnum = nilfs->ns_nextnum; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", nextnum); + return sysfs_emit(buf, "%llu\n", nextnum); } static ssize_t @@ -533,7 +533,7 @@ nilfs_segctor_next_pseg_offset_show(struct nilfs_segctor_attr *attr, pseg_offset = nilfs->ns_pseg_offset; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%lu\n", pseg_offset); + return sysfs_emit(buf, "%lu\n", pseg_offset); } static ssize_t @@ -547,7 +547,7 @@ nilfs_segctor_next_checkpoint_show(struct nilfs_segctor_attr *attr, cno = nilfs->ns_cno; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", cno); + return sysfs_emit(buf, "%llu\n", cno); } static ssize_t @@ -575,7 +575,7 @@ nilfs_segctor_last_seg_write_time_secs_show(struct nilfs_segctor_attr *attr, ctime = nilfs->ns_ctime; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", ctime); + return sysfs_emit(buf, "%llu\n", ctime); } static ssize_t @@ -603,7 +603,7 @@ nilfs_segctor_last_nongc_write_time_secs_show(struct nilfs_segctor_attr *attr, nongc_ctime = nilfs->ns_nongc_ctime; up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", nongc_ctime); + return sysfs_emit(buf, "%llu\n", nongc_ctime); } static ssize_t @@ -617,7 +617,7 @@ nilfs_segctor_dirty_data_blocks_count_show(struct nilfs_segctor_attr *attr, ndirtyblks = atomic_read(&nilfs->ns_ndirtyblks); up_read(&nilfs->ns_segctor_sem); - return snprintf(buf, PAGE_SIZE, "%u\n", ndirtyblks); + return sysfs_emit(buf, "%u\n", ndirtyblks); } static const char segctor_readme_str[] = @@ -654,7 +654,7 @@ static ssize_t nilfs_segctor_README_show(struct nilfs_segctor_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, segctor_readme_str); + return sysfs_emit(buf, segctor_readme_str); } NILFS_SEGCTOR_RO_ATTR(last_pseg_block); @@ -723,7 +723,7 @@ nilfs_superblock_sb_write_time_secs_show(struct nilfs_superblock_attr *attr, sbwtime = nilfs->ns_sbwtime; up_read(&nilfs->ns_sem); - return snprintf(buf, PAGE_SIZE, "%llu\n", sbwtime); + return sysfs_emit(buf, "%llu\n", sbwtime); } static ssize_t @@ -737,7 +737,7 @@ nilfs_superblock_sb_write_count_show(struct nilfs_superblock_attr *attr, sbwcount = nilfs->ns_sbwcount; up_read(&nilfs->ns_sem); - return snprintf(buf, PAGE_SIZE, "%u\n", sbwcount); + return sysfs_emit(buf, "%u\n", sbwcount); } static ssize_t @@ -751,7 +751,7 @@ nilfs_superblock_sb_update_frequency_show(struct nilfs_superblock_attr *attr, sb_update_freq = nilfs->ns_sb_update_freq; up_read(&nilfs->ns_sem); - return snprintf(buf, PAGE_SIZE, "%u\n", sb_update_freq); + return sysfs_emit(buf, "%u\n", sb_update_freq); } static ssize_t @@ -799,7 +799,7 @@ static ssize_t nilfs_superblock_README_show(struct nilfs_superblock_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, sb_readme_str); + return sysfs_emit(buf, sb_readme_str); } NILFS_SUPERBLOCK_RO_ATTR(sb_write_time); @@ -834,7 +834,7 @@ ssize_t nilfs_dev_revision_show(struct nilfs_dev_attr *attr, u32 major = le32_to_cpu(sbp[0]->s_rev_level); u16 minor = le16_to_cpu(sbp[0]->s_minor_rev_level); - return snprintf(buf, PAGE_SIZE, "%d.%d\n", major, minor); + return sysfs_emit(buf, "%d.%d\n", major, minor); } static @@ -842,7 +842,7 @@ ssize_t nilfs_dev_blocksize_show(struct nilfs_dev_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, "%u\n", nilfs->ns_blocksize); + return sysfs_emit(buf, "%u\n", nilfs->ns_blocksize); } static @@ -853,7 +853,7 @@ ssize_t nilfs_dev_device_size_show(struct nilfs_dev_attr *attr, struct nilfs_super_block **sbp = nilfs->ns_sbp; u64 dev_size = le64_to_cpu(sbp[0]->s_dev_size); - return snprintf(buf, PAGE_SIZE, "%llu\n", dev_size); + return sysfs_emit(buf, "%llu\n", dev_size); } static @@ -864,7 +864,7 @@ ssize_t nilfs_dev_free_blocks_show(struct nilfs_dev_attr *attr, sector_t free_blocks = 0; nilfs_count_free_blocks(nilfs, &free_blocks); - return snprintf(buf, PAGE_SIZE, "%llu\n", + return sysfs_emit(buf, "%llu\n", (unsigned long long)free_blocks); } @@ -875,7 +875,7 @@ ssize_t nilfs_dev_uuid_show(struct nilfs_dev_attr *attr, { struct nilfs_super_block **sbp = nilfs->ns_sbp; - return snprintf(buf, PAGE_SIZE, "%pUb\n", sbp[0]->s_uuid); + return sysfs_emit(buf, "%pUb\n", sbp[0]->s_uuid); } static @@ -903,7 +903,7 @@ static ssize_t nilfs_dev_README_show(struct nilfs_dev_attr *attr, struct the_nilfs *nilfs, char *buf) { - return snprintf(buf, PAGE_SIZE, dev_readme_str); + return sysfs_emit(buf, dev_readme_str); } NILFS_DEV_RO_ATTR(revision); @@ -1047,7 +1047,7 @@ void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) static ssize_t nilfs_feature_revision_show(struct kobject *kobj, struct attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%d.%d\n", + return sysfs_emit(buf, "%d.%d\n", NILFS_CURRENT_REV, NILFS_MINOR_REV); } @@ -1060,7 +1060,7 @@ static ssize_t nilfs_feature_README_show(struct kobject *kobj, struct attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, features_readme_str); + return sysfs_emit(buf, features_readme_str); } NILFS_FEATURE_RO_ATTR(revision); From 94ee1d91514a1e02db87fb09b903b51d86903771 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Mon, 8 Nov 2021 18:35:01 -0800 Subject: [PATCH 67/87] nilfs2: remove filenames from file comments Remove filenames that are not particularly useful in file comments, and suppress checkpatch warnings WARNING: It's generally not useful to have the filename in the file Link: https://lkml.kernel.org/r/1635151862-11547-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Cc: Qing Wang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/nilfs2/alloc.c | 2 +- fs/nilfs2/alloc.h | 2 +- fs/nilfs2/bmap.c | 2 +- fs/nilfs2/bmap.h | 2 +- fs/nilfs2/btnode.c | 2 +- fs/nilfs2/btnode.h | 2 +- fs/nilfs2/btree.c | 2 +- fs/nilfs2/btree.h | 2 +- fs/nilfs2/cpfile.c | 2 +- fs/nilfs2/cpfile.h | 2 +- fs/nilfs2/dat.c | 2 +- fs/nilfs2/dat.h | 2 +- fs/nilfs2/dir.c | 2 +- fs/nilfs2/direct.c | 2 +- fs/nilfs2/direct.h | 2 +- fs/nilfs2/file.c | 2 +- fs/nilfs2/gcinode.c | 2 +- fs/nilfs2/ifile.c | 2 +- fs/nilfs2/ifile.h | 2 +- fs/nilfs2/inode.c | 2 +- fs/nilfs2/ioctl.c | 2 +- fs/nilfs2/mdt.c | 2 +- fs/nilfs2/mdt.h | 2 +- fs/nilfs2/namei.c | 2 +- fs/nilfs2/nilfs.h | 2 +- fs/nilfs2/page.c | 2 +- fs/nilfs2/page.h | 2 +- fs/nilfs2/recovery.c | 2 +- fs/nilfs2/segbuf.c | 2 +- fs/nilfs2/segbuf.h | 2 +- fs/nilfs2/segment.c | 2 +- fs/nilfs2/segment.h | 2 +- fs/nilfs2/sufile.c | 2 +- fs/nilfs2/sufile.h | 2 +- fs/nilfs2/super.c | 2 +- fs/nilfs2/sysfs.c | 2 +- fs/nilfs2/sysfs.h | 2 +- fs/nilfs2/the_nilfs.c | 2 +- fs/nilfs2/the_nilfs.h | 2 +- 39 files changed, 39 insertions(+), 39 deletions(-) diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c index adf3bb0a8048..6ce8617b562d 100644 --- a/fs/nilfs2/alloc.c +++ b/fs/nilfs2/alloc.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * alloc.c - NILFS dat/inode allocator + * NILFS dat/inode allocator * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h index 0303c3968cee..b667e869ac07 100644 --- a/fs/nilfs2/alloc.h +++ b/fs/nilfs2/alloc.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * alloc.h - persistent object (dat entry/disk inode) allocator/deallocator + * Persistent object (dat entry/disk inode) allocator/deallocator * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c index 5900879d5693..798a2c1b38c6 100644 --- a/fs/nilfs2/bmap.c +++ b/fs/nilfs2/bmap.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * bmap.c - NILFS block mapping. + * NILFS block mapping. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/bmap.h b/fs/nilfs2/bmap.h index 2c63858e81c9..608168a5cb88 100644 --- a/fs/nilfs2/bmap.h +++ b/fs/nilfs2/bmap.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * bmap.h - NILFS block mapping. + * NILFS block mapping. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c index 4391fd3abd8f..66bdaa2cf496 100644 --- a/fs/nilfs2/btnode.c +++ b/fs/nilfs2/btnode.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * btnode.c - NILFS B-tree node cache + * NILFS B-tree node cache * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/btnode.h b/fs/nilfs2/btnode.h index 0f88dbc9bcb3..11663650add7 100644 --- a/fs/nilfs2/btnode.h +++ b/fs/nilfs2/btnode.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * btnode.h - NILFS B-tree node cache + * NILFS B-tree node cache * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c index ab9ec073330f..3594eabe1419 100644 --- a/fs/nilfs2/btree.c +++ b/fs/nilfs2/btree.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * btree.c - NILFS B-tree. + * NILFS B-tree. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/btree.h b/fs/nilfs2/btree.h index d1421b646ce4..92868e1a48ca 100644 --- a/fs/nilfs2/btree.h +++ b/fs/nilfs2/btree.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * btree.h - NILFS B-tree. + * NILFS B-tree. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c index ce144776b4ef..9ebefb3acb0e 100644 --- a/fs/nilfs2/cpfile.c +++ b/fs/nilfs2/cpfile.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * cpfile.c - NILFS checkpoint file. + * NILFS checkpoint file. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/cpfile.h b/fs/nilfs2/cpfile.h index 6336222df24a..edabb2dc5756 100644 --- a/fs/nilfs2/cpfile.h +++ b/fs/nilfs2/cpfile.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * cpfile.h - NILFS checkpoint file. + * NILFS checkpoint file. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c index 8bccdf1158fc..dc51d3b7a7bf 100644 --- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * dat.c - NILFS disk address translation. + * NILFS disk address translation. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h index b17ee34580ae..468c82d26183 100644 --- a/fs/nilfs2/dat.h +++ b/fs/nilfs2/dat.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * dat.h - NILFS disk address translation. + * NILFS disk address translation. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 81394e22d0a0..f8f4c2ff52f4 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * dir.c - NILFS directory entry operations + * NILFS directory entry operations * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c index f353101955e3..a35f2795b242 100644 --- a/fs/nilfs2/direct.c +++ b/fs/nilfs2/direct.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * direct.c - NILFS direct block pointer. + * NILFS direct block pointer. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/direct.h b/fs/nilfs2/direct.h index ec9a23c77994..b7ca896269af 100644 --- a/fs/nilfs2/direct.h +++ b/fs/nilfs2/direct.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * direct.h - NILFS direct block pointer. + * NILFS direct block pointer. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c index 7cf765258fda..a265d391ffe9 100644 --- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * file.c - NILFS regular file handling primitives including fsync(). + * NILFS regular file handling primitives including fsync(). * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c index 448320496856..a8f5315f01e3 100644 --- a/fs/nilfs2/gcinode.c +++ b/fs/nilfs2/gcinode.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * gcinode.c - dummy inodes to buffer blocks for garbage collection + * Dummy inodes to buffer blocks for garbage collection * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c index 02727ed3a7c6..a8a4bc8490b4 100644 --- a/fs/nilfs2/ifile.c +++ b/fs/nilfs2/ifile.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * ifile.c - NILFS inode file + * NILFS inode file * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/ifile.h b/fs/nilfs2/ifile.h index a1e1e5711a05..35c5273f4821 100644 --- a/fs/nilfs2/ifile.h +++ b/fs/nilfs2/ifile.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * ifile.h - NILFS inode file + * NILFS inode file * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index 2e8eb263cf0f..e3d807d5b83a 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * inode.c - NILFS inode operations. + * NILFS inode operations. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 640ac8fe891e..efa76a80ee25 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * ioctl.c - NILFS ioctl operations. + * NILFS ioctl operations. * * Copyright (C) 2007, 2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c index 97769fe4d588..4b3d33cf0041 100644 --- a/fs/nilfs2/mdt.c +++ b/fs/nilfs2/mdt.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * mdt.c - meta data file for NILFS + * Meta data file for NILFS * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/mdt.h b/fs/nilfs2/mdt.h index e77aea4bb921..8f86080a436d 100644 --- a/fs/nilfs2/mdt.h +++ b/fs/nilfs2/mdt.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * mdt.h - NILFS meta data file prototype and definitions + * NILFS meta data file prototype and definitions * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c index 91eebeb0c48b..23899e0ae850 100644 --- a/fs/nilfs2/namei.c +++ b/fs/nilfs2/namei.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * namei.c - NILFS pathname lookup operations. + * NILFS pathname lookup operations. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index 60b21b6eeac0..a7b81755c350 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * nilfs.h - NILFS local header file. + * NILFS local header file. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 171fb5cd427f..bc3e2cd4117f 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * page.c - buffer/page management specific to NILFS + * Buffer/page management specific to NILFS * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h index 62b9bb469e92..569263b23c0c 100644 --- a/fs/nilfs2/page.h +++ b/fs/nilfs2/page.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * page.h - buffer/page management specific to NILFS + * Buffer/page management specific to NILFS * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c index 2217f904a7cf..9e2ed76c0f25 100644 --- a/fs/nilfs2/recovery.c +++ b/fs/nilfs2/recovery.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * recovery.c - NILFS recovery logic + * NILFS recovery logic * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c index 56872e93823d..43287b0d3e9b 100644 --- a/fs/nilfs2/segbuf.c +++ b/fs/nilfs2/segbuf.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * segbuf.c - NILFS segment buffer + * NILFS segment buffer * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/segbuf.h b/fs/nilfs2/segbuf.h index 9bea1bd59041..e20091ededba 100644 --- a/fs/nilfs2/segbuf.h +++ b/fs/nilfs2/segbuf.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * segbuf.h - NILFS Segment buffer prototypes and definitions + * NILFS Segment buffer prototypes and definitions * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 686c8ee7b29c..85a853334771 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * segment.c - NILFS segment constructor. + * NILFS segment constructor. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h index f5cf5308f3fc..1060f72ebf5a 100644 --- a/fs/nilfs2/segment.h +++ b/fs/nilfs2/segment.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * segment.h - NILFS Segment constructor prototypes and definitions + * NILFS Segment constructor prototypes and definitions * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c index 63722475e17e..e385cca2004a 100644 --- a/fs/nilfs2/sufile.c +++ b/fs/nilfs2/sufile.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * sufile.c - NILFS segment usage file. + * NILFS segment usage file. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h index c4e2c7a7add1..8e8a1a5a0402 100644 --- a/fs/nilfs2/sufile.h +++ b/fs/nilfs2/sufile.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * sufile.h - NILFS segment usage file. + * NILFS segment usage file. * * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index f6b2d280aab5..63cffe010c6e 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * super.c - NILFS module and super block management. + * NILFS module and super block management. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index be407072def7..81f35c5b5a40 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * sysfs.c - sysfs support implementation. + * Sysfs support implementation. * * Copyright (C) 2005-2014 Nippon Telegraph and Telephone Corporation. * Copyright (C) 2014 HGST, Inc., a Western Digital Company. diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index d001eb862dae..78a87a016928 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * sysfs.h - sysfs support declarations. + * Sysfs support declarations. * * Copyright (C) 2005-2014 Nippon Telegraph and Telephone Corporation. * Copyright (C) 2014 HGST, Inc., a Western Digital Company. diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index c8bfc01da5d7..2044a2058be7 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* - * the_nilfs.c - the_nilfs shared structure. + * the_nilfs shared structure. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 987c8ab02aee..47c7dfbb7ea5 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0+ */ /* - * the_nilfs.h - the_nilfs shared structure. + * the_nilfs shared structure. * * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation. * From 55d1cbbbb29e6656c662ee8f73ba1fc4777532eb Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 8 Nov 2021 18:35:04 -0800 Subject: [PATCH 68/87] hfs/hfsplus: use WARN_ON for sanity check gcc warns about a couple of instances in which a sanity check exists but the author wasn't sure how to react to it failing, which makes it look like a possible bug: fs/hfsplus/inode.c: In function 'hfsplus_cat_read_inode': fs/hfsplus/inode.c:503:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 503 | /* panic? */; | ^ fs/hfsplus/inode.c:524:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 524 | /* panic? */; | ^ fs/hfsplus/inode.c: In function 'hfsplus_cat_write_inode': fs/hfsplus/inode.c:582:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 582 | /* panic? */; | ^ fs/hfsplus/inode.c:608:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 608 | /* panic? */; | ^ fs/hfs/inode.c: In function 'hfs_write_inode': fs/hfs/inode.c:464:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 464 | /* panic? */; | ^ fs/hfs/inode.c:485:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 485 | /* panic? */; | ^ panic() is probably not the correct choice here, but a WARN_ON seems appropriate and avoids the compile-time warning. Link: https://lkml.kernel.org/r/20210927102149.1809384-1-arnd@kernel.org Link: https://lore.kernel.org/all/20210322223249.2632268-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann Reviewed-by: Christian Brauner Cc: Alexander Viro Cc: Christian Brauner Cc: Greg Kroah-Hartman Cc: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hfs/inode.c | 6 ++---- fs/hfsplus/inode.c | 12 ++++-------- 2 files changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c index 4a95a92546a0..2a5143246282 100644 --- a/fs/hfs/inode.c +++ b/fs/hfs/inode.c @@ -462,8 +462,7 @@ int hfs_write_inode(struct inode *inode, struct writeback_control *wbc) goto out; if (S_ISDIR(main_inode->i_mode)) { - if (fd.entrylength < sizeof(struct hfs_cat_dir)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfs_cat_dir)); hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_dir)); if (rec.type != HFS_CDR_DIR || @@ -483,8 +482,7 @@ int hfs_write_inode(struct inode *inode, struct writeback_control *wbc) hfs_bnode_write(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); } else { - if (fd.entrylength < sizeof(struct hfs_cat_file)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfs_cat_file)); hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); if (rec.type != HFS_CDR_FIL || diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c index 6fef67c2a9f0..d08a8d1d40a4 100644 --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -509,8 +509,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd) if (type == HFSPLUS_FOLDER) { struct hfsplus_cat_folder *folder = &entry.folder; - if (fd->entrylength < sizeof(struct hfsplus_cat_folder)) - /* panic? */; + WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_folder)); hfs_bnode_read(fd->bnode, &entry, fd->entryoffset, sizeof(struct hfsplus_cat_folder)); hfsplus_get_perms(inode, &folder->permissions, 1); @@ -530,8 +529,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd) } else if (type == HFSPLUS_FILE) { struct hfsplus_cat_file *file = &entry.file; - if (fd->entrylength < sizeof(struct hfsplus_cat_file)) - /* panic? */; + WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_file)); hfs_bnode_read(fd->bnode, &entry, fd->entryoffset, sizeof(struct hfsplus_cat_file)); @@ -588,8 +586,7 @@ int hfsplus_cat_write_inode(struct inode *inode) if (S_ISDIR(main_inode->i_mode)) { struct hfsplus_cat_folder *folder = &entry.folder; - if (fd.entrylength < sizeof(struct hfsplus_cat_folder)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfsplus_cat_folder)); hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, sizeof(struct hfsplus_cat_folder)); /* simple node checks? */ @@ -614,8 +611,7 @@ int hfsplus_cat_write_inode(struct inode *inode) } else { struct hfsplus_cat_file *file = &entry.file; - if (fd.entrylength < sizeof(struct hfsplus_cat_file)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfsplus_cat_file)); hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, sizeof(struct hfsplus_cat_file)); hfsplus_inode_write_fork(inode, &file->data_fork); From 5605f41917c6ad766814a8e417a4481e9157c991 Mon Sep 17 00:00:00 2001 From: Changcheng Deng Date: Mon, 8 Nov 2021 18:35:07 -0800 Subject: [PATCH 69/87] crash_dump: fix boolreturn.cocci warning ./include/linux/crash_dump.h: 119: 50-51: WARNING: return of 0/1 in function 'is_kdump_kernel' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Link: https://lkml.kernel.org/r/20211020083905.1037952-1-deng.changcheng@zte.com.cn Signed-off-by: Changcheng Deng Reported-by: Zeal Robot Reviewed-by: Simon Horman Acked-by: Baoquan He Cc: Dave Young Cc: Mike Rapoport Cc: Vivek Goyal Cc: Ye Guojin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/crash_dump.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index 0c547d866f1e..979c26176c1d 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -116,7 +116,7 @@ extern void register_vmcore_cb(struct vmcore_cb *cb); extern void unregister_vmcore_cb(struct vmcore_cb *cb); #else /* !CONFIG_CRASH_DUMP */ -static inline bool is_kdump_kernel(void) { return 0; } +static inline bool is_kdump_kernel(void) { return false; } #endif /* CONFIG_CRASH_DUMP */ /* Device Dump information to be filled by drivers */ From a10677a028b853c25054a4b8be04013ccb55682f Mon Sep 17 00:00:00 2001 From: Ye Guojin Date: Mon, 8 Nov 2021 18:35:10 -0800 Subject: [PATCH 70/87] crash_dump: remove duplicate include in crash_dump.h In crash_dump.h, header file is included twice. This duplication was introduced in commit 65fddcfca8ad("mm: reorder includes after introduction of linux/pgtable.h") where the order of the header files is adjusted, while the old one was not removed. Clean it up here. Link: https://lkml.kernel.org/r/20211020090659.1038877-1-ye.guojin@zte.com.cn Signed-off-by: Ye Guojin Reported-by: Zeal Robot Acked-by: Baoquan He Cc: Dave Young Cc: Mike Rapoport Cc: Vivek Goyal Cc: Changcheng Deng Cc: Simon Horman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/crash_dump.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index 979c26176c1d..620821549b23 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -8,8 +8,6 @@ #include #include -#include /* for pgprot_t */ - /* For IS_ENABLED(CONFIG_CRASH_DUMP) */ #define ELFCORE_ADDR_MAX (-1ULL) #define ELFCORE_ADDR_ERR (-2ULL) From f26663684e76773ea86e2df13fb18f9d66c91151 Mon Sep 17 00:00:00 2001 From: Ye Guojin Date: Mon, 8 Nov 2021 18:35:13 -0800 Subject: [PATCH 71/87] signal: remove duplicate include in signal.h 'linux/string.h' included in 'signal.h' is duplicated. it's also included at line 7. Link: https://lkml.kernel.org/r/20211019024934.973008-1-ye.guojin@zte.com.cn Signed-off-by: Ye Guojin Reported-by: Zeal Robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/signal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/signal.h b/include/linux/signal.h index 3f96a6374e4f..37f5080c070a 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -126,7 +126,6 @@ static inline int sigequalsets(const sigset_t *set1, const sigset_t *set2) #define sigmask(sig) (1UL << ((sig) - 1)) #ifndef __HAVE_ARCH_SIG_SETOPS -#include #define _SIG_SET_BINOP(name, op) \ static inline void name(sigset_t *r, const sigset_t *a, const sigset_t *b) \ From 372904c080be44629d84bb15ed5e12eed44b5f9f Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Nov 2021 18:35:16 -0800 Subject: [PATCH 72/87] seq_file: move seq_escape() to a header Move seq_escape() to the header as inliner, for a small kernel text size reduction. Link: https://lkml.kernel.org/r/20211001122917.67228-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/seq_file.c | 16 ---------------- include/linux/seq_file.h | 17 ++++++++++++++++- 2 files changed, 16 insertions(+), 17 deletions(-) diff --git a/fs/seq_file.c b/fs/seq_file.c index 4a2cda04d3e2..f8e1f4ee87ff 100644 --- a/fs/seq_file.c +++ b/fs/seq_file.c @@ -383,22 +383,6 @@ void seq_escape_mem(struct seq_file *m, const char *src, size_t len, } EXPORT_SYMBOL(seq_escape_mem); -/** - * seq_escape - print string into buffer, escaping some characters - * @m: target buffer - * @s: string - * @esc: set of characters that need escaping - * - * Puts string into buffer, replacing each occurrence of character from - * @esc with usual octal escape. - * Use seq_has_overflowed() to check for errors. - */ -void seq_escape(struct seq_file *m, const char *s, const char *esc) -{ - seq_escape_str(m, s, ESCAPE_OCTAL, esc); -} -EXPORT_SYMBOL(seq_escape); - void seq_vprintf(struct seq_file *m, const char *f, va_list args) { int len; diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index dd99569595fd..103776e18555 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -4,6 +4,7 @@ #include #include +#include #include #include #include @@ -135,7 +136,21 @@ static inline void seq_escape_str(struct seq_file *m, const char *src, seq_escape_mem(m, src, strlen(src), flags, esc); } -void seq_escape(struct seq_file *m, const char *s, const char *esc); +/** + * seq_escape - print string into buffer, escaping some characters + * @m: target buffer + * @s: NULL-terminated string + * @esc: set of characters that need escaping + * + * Puts string into buffer, replacing each occurrence of character from + * @esc with usual octal escape. + * + * Use seq_has_overflowed() to check for errors. + */ +static inline void seq_escape(struct seq_file *m, const char *s, const char *esc) +{ + seq_escape_str(m, s, ESCAPE_OCTAL, esc); +} void seq_hex_dump(struct seq_file *m, const char *prefix_str, int prefix_type, int rowsize, int groupsize, const void *buf, size_t len, From 10a6de19cad6efb9b49883513afb810dc265fca2 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Mon, 8 Nov 2021 18:35:19 -0800 Subject: [PATCH 73/87] seq_file: fix passing wrong private data DEFINE_PROC_SHOW_ATTRIBUTE() is supposed to be used to define a series of functions and variables to register proc file easily. And the users can use proc_create_data() to pass their own private data and get it via seq->private in the callback. Unfortunately, the proc file system use PDE_DATA() to get private data instead of inode->i_private. So fix it. Fortunately, there only one user of it which does not pass any private data, so this bug does not break any in-tree codes. Link: https://lkml.kernel.org/r/20211029032638.84884-1-songmuchun@bytedance.com Fixes: 97a32539b956 ("proc: convert everything to "struct proc_ops"") Signed-off-by: Muchun Song Cc: Andy Shevchenko Cc: Stephen Rothwell Cc: Florent Revest Cc: Alexey Dobriyan Cc: Christian Brauner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/seq_file.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index 103776e18555..72dbb44a4573 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -209,7 +209,7 @@ static const struct file_operations __name ## _fops = { \ #define DEFINE_PROC_SHOW_ATTRIBUTE(__name) \ static int __name ## _open(struct inode *inode, struct file *file) \ { \ - return single_open(file, __name ## _show, inode->i_private); \ + return single_open(file, __name ## _show, PDE_DATA(inode)); \ } \ \ static const struct proc_ops __name ## _proc_ops = { \ From ba1f70ddd1809c03660ddcfb757e1c1ced8b676d Mon Sep 17 00:00:00 2001 From: Ran Xiaokai Date: Mon, 8 Nov 2021 18:35:22 -0800 Subject: [PATCH 74/87] kernel/fork.c: unshare(): use swap() to make code cleaner Use swap() instead of reimplementing it. Link: https://lkml.kernel.org/r/20210909022046.8151-1-ran.xiaokai@zte.com.cn Signed-off-by: Ran Xiaokai Cc: Gabriel Krisman Bertazi Cc: Peter Zijlstra Cc: Eric W. Biederman Cc: Jens Axboe Cc: Alexey Gladkov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 38681ad44c76..acf94e660ce3 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3027,7 +3027,7 @@ int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, int ksys_unshare(unsigned long unshare_flags) { struct fs_struct *fs, *new_fs = NULL; - struct files_struct *fd, *new_fd = NULL; + struct files_struct *new_fd = NULL; struct cred *new_cred = NULL; struct nsproxy *new_nsproxy = NULL; int do_sysvsem = 0; @@ -3114,11 +3114,8 @@ int ksys_unshare(unsigned long unshare_flags) spin_unlock(&fs->lock); } - if (new_fd) { - fd = current->files; - current->files = new_fd; - new_fd = fd; - } + if (new_fd) + swap(current->files, new_fd); task_unlock(current); From 7eb0e28c1d3147e2234512e2beeb99183ac99f58 Mon Sep 17 00:00:00 2001 From: Pavel Skripkin Date: Mon, 8 Nov 2021 18:35:25 -0800 Subject: [PATCH 75/87] sysv: use BUILD_BUG_ON instead of runtime check There were runtime checks about sizes of struct v7_super_block and struct sysv_inode. If one of these checks fail the kernel will panic. Since these values are known at compile time let's use BUILD_BUG_ON(), because it's a standard mechanism for validation checking at build time Link: https://lkml.kernel.org/r/20210813123020.22971-1-paskripkin@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Pavel Skripkin Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/sysv/super.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/sysv/super.c b/fs/sysv/super.c index cc8e2ed155c8..d1def0771a40 100644 --- a/fs/sysv/super.c +++ b/fs/sysv/super.c @@ -474,10 +474,8 @@ static int v7_fill_super(struct super_block *sb, void *data, int silent) struct sysv_sb_info *sbi; struct buffer_head *bh; - if (440 != sizeof (struct v7_super_block)) - panic("V7 FS: bad super-block size"); - if (64 != sizeof (struct sysv_inode)) - panic("sysv fs: bad i-node size"); + BUILD_BUG_ON(sizeof(struct v7_super_block) != 440); + BUILD_BUG_ON(sizeof(struct sysv_inode) != 64); sbi = kzalloc(sizeof(struct sysv_sb_info), GFP_KERNEL); if (!sbi) From d687a9ccf2648c38a6a7097371d94cd1bb1414af Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 8 Nov 2021 18:35:28 -0800 Subject: [PATCH 76/87] Documentation/kcov: include types.h in the example Patch series "kcov: PREEMPT_RT fixup + misc", v2. The last patch in series is follow-up to address the PREEMPT_RT issue within in kcov reported by Clark [1]. Patches 1-3 are smaller things that I noticed while staring at it. Patch 4 is small change which makes replacement in #5 simpler / more obvious. [1] https://lkml.kernel.org/r/20210809155909.333073de@theseus.lan This patch (of 5): The first example code has includes at the top, the following two example share that part. The last example (remote coverage collection) requires the linux/types.h header file due its __aligned_u64 usage. Add the linux/types.h to the top most example and a comment that the header files from above are required as it is done in the second example. Link: https://lkml.kernel.org/r/20210923164741.1859522-1-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-2-bigeasy@linutronix.de Link: https://lkml.kernel.org/r/20210923164741.1859522-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Acked-by: Dmitry Vyukov Acked-by: Marco Elver Tested-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Steven Rostedt Cc: Clark Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kcov.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst index d2c4c27e1702..347f3b6de8d4 100644 --- a/Documentation/dev-tools/kcov.rst +++ b/Documentation/dev-tools/kcov.rst @@ -50,6 +50,7 @@ program using kcov: #include #include #include + #include #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) #define KCOV_ENABLE _IO('c', 100) @@ -251,6 +252,8 @@ selectively from different subsystems. .. code-block:: c + /* Same includes and defines as above. */ + struct kcov_remote_arg { __u32 trace_mode; __u32 area_size; From 6f1d34bd491c1bd0ac23100347e81080a2ec209d Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 8 Nov 2021 18:35:31 -0800 Subject: [PATCH 77/87] Documentation/kcov: define `ip' in the example The example code uses the variable `ip' but never declares it. Declare `ip' as a 64bit variable which is the same type as the array from which it loads its value. Link: https://lkml.kernel.org/r/20210923164741.1859522-3-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Acked-by: Dmitry Vyukov Acked-by: Marco Elver Tested-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Clark Williams Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kcov.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst index 347f3b6de8d4..d83c9ab49427 100644 --- a/Documentation/dev-tools/kcov.rst +++ b/Documentation/dev-tools/kcov.rst @@ -178,6 +178,8 @@ Comparison operands collection is similar to coverage collection: /* Read number of comparisons collected. */ n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED); for (i = 0; i < n; i++) { + uint64_t ip; + type = cover[i * KCOV_WORDS_PER_CMP + 1]; /* arg1 and arg2 - operands of the comparison. */ arg1 = cover[i * KCOV_WORDS_PER_CMP + 2]; From 741ddd4519c4d21eb7313e89a2c4ccc44a3dd6b9 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 8 Nov 2021 18:35:34 -0800 Subject: [PATCH 78/87] kcov: allocate per-CPU memory on the relevant node During boot kcov allocates per-CPU memory which is used later if remote/ softirq processing is enabled. Allocate the per-CPU memory on the CPU local node to avoid cross node memory access. Link: https://lkml.kernel.org/r/20210923164741.1859522-4-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Acked-by: Dmitry Vyukov Acked-by: Marco Elver Tested-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Clark Williams Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kcov.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/kcov.c b/kernel/kcov.c index 80bfe71bbe13..4f910231d99a 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -1034,8 +1034,8 @@ static int __init kcov_init(void) int cpu; for_each_possible_cpu(cpu) { - void *area = vmalloc(CONFIG_KCOV_IRQ_AREA_SIZE * - sizeof(unsigned long)); + void *area = vmalloc_node(CONFIG_KCOV_IRQ_AREA_SIZE * + sizeof(unsigned long), cpu_to_node(cpu)); if (!area) return -ENOMEM; per_cpu_ptr(&kcov_percpu_data, cpu)->irq_area = area; From 22036abe17c9f6e295bd9d767312cfb92fc9cf0a Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 8 Nov 2021 18:35:37 -0800 Subject: [PATCH 79/87] kcov: avoid enable+disable interrupts if !in_task() kcov_remote_start() may need to allocate memory in the in_task() case (otherwise per-CPU memory has been pre-allocated) and therefore requires enabled interrupts. The interrupts are enabled before checking if the allocation is required so if no allocation is required then the interrupts are needlessly enabled and disabled again. Enable interrupts only if memory allocation is performed. Link: https://lkml.kernel.org/r/20210923164741.1859522-5-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Acked-by: Dmitry Vyukov Acked-by: Marco Elver Tested-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Clark Williams Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kcov.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/kcov.c b/kernel/kcov.c index 4f910231d99a..620dc4ffeb68 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -869,19 +869,19 @@ void kcov_remote_start(u64 handle) size = CONFIG_KCOV_IRQ_AREA_SIZE; area = this_cpu_ptr(&kcov_percpu_data)->irq_area; } - spin_unlock_irqrestore(&kcov_remote_lock, flags); + spin_unlock(&kcov_remote_lock); /* Can only happen when in_task(). */ if (!area) { + local_irqrestore(flags); area = vmalloc(size * sizeof(unsigned long)); if (!area) { kcov_put(kcov); return; } + local_irq_save(flags); } - local_irq_save(flags); - /* Reset coverage size. */ *(u64 *)area = 0; From d5d2c51f1e5f56ed01d2c773974630c007e5e5f5 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 8 Nov 2021 18:35:40 -0800 Subject: [PATCH 80/87] kcov: replace local_irq_save() with a local_lock_t The kcov code mixes local_irq_save() and spin_lock() in kcov_remote_{start|end}(). This creates a warning on PREEMPT_RT because local_irq_save() disables interrupts and spin_lock_t is turned into a sleeping lock which can not be acquired in a section with disabled interrupts. The kcov_remote_lock is used to synchronize the access to the hash-list kcov_remote_map. The local_irq_save() block protects access to the per-CPU data kcov_percpu_data. There is no compelling reason to change the lock type to raw_spin_lock_t to make it work with local_irq_save(). Changing it would require to move memory allocation (in kcov_remote_add()) and deallocation outside of the locked section. Adding an unlimited amount of entries to the hashlist will increase the IRQ-off time during lookup. It could be argued that this is debug code and the latency does not matter. There is however no need to do so and it would allow to use this facility in an RT enabled build. Using a local_lock_t instead of local_irq_save() has the befit of adding a protection scope within the source which makes it obvious what is protected. On a !PREEMPT_RT && !LOCKDEP build the local_lock_irqsave() maps directly to local_irq_save() so there is overhead at runtime. Replace the local_irq_save() section with a local_lock_t. Link: https://lkml.kernel.org/r/20210923164741.1859522-6-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-6-bigeasy@linutronix.de Reported-by: Clark Williams Signed-off-by: Sebastian Andrzej Siewior Acked-by: Dmitry Vyukov Acked-by: Marco Elver Tested-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kcov.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/kernel/kcov.c b/kernel/kcov.c index 620dc4ffeb68..36ca640c4f8e 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -88,6 +88,7 @@ static struct list_head kcov_remote_areas = LIST_HEAD_INIT(kcov_remote_areas); struct kcov_percpu_data { void *irq_area; + local_lock_t lock; unsigned int saved_mode; unsigned int saved_size; @@ -96,7 +97,9 @@ struct kcov_percpu_data { int saved_sequence; }; -static DEFINE_PER_CPU(struct kcov_percpu_data, kcov_percpu_data); +static DEFINE_PER_CPU(struct kcov_percpu_data, kcov_percpu_data) = { + .lock = INIT_LOCAL_LOCK(lock), +}; /* Must be called with kcov_remote_lock locked. */ static struct kcov_remote *kcov_remote_find(u64 handle) @@ -824,7 +827,7 @@ void kcov_remote_start(u64 handle) if (!in_task() && !in_serving_softirq()) return; - local_irq_save(flags); + local_lock_irqsave(&kcov_percpu_data.lock, flags); /* * Check that kcov_remote_start() is not called twice in background @@ -832,7 +835,7 @@ void kcov_remote_start(u64 handle) */ mode = READ_ONCE(t->kcov_mode); if (WARN_ON(in_task() && kcov_mode_enabled(mode))) { - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } /* @@ -841,14 +844,15 @@ void kcov_remote_start(u64 handle) * happened while collecting coverage from a background thread. */ if (WARN_ON(in_serving_softirq() && t->kcov_softirq)) { - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } spin_lock(&kcov_remote_lock); remote = kcov_remote_find(handle); if (!remote) { - spin_unlock_irqrestore(&kcov_remote_lock, flags); + spin_unlock(&kcov_remote_lock); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } kcov_debug("handle = %llx, context: %s\n", handle, @@ -873,13 +877,13 @@ void kcov_remote_start(u64 handle) /* Can only happen when in_task(). */ if (!area) { - local_irqrestore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); area = vmalloc(size * sizeof(unsigned long)); if (!area) { kcov_put(kcov); return; } - local_irq_save(flags); + local_lock_irqsave(&kcov_percpu_data.lock, flags); } /* Reset coverage size. */ @@ -891,7 +895,7 @@ void kcov_remote_start(u64 handle) } kcov_start(t, kcov, size, area, mode, sequence); - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); } EXPORT_SYMBOL(kcov_remote_start); @@ -965,12 +969,12 @@ void kcov_remote_stop(void) if (!in_task() && !in_serving_softirq()) return; - local_irq_save(flags); + local_lock_irqsave(&kcov_percpu_data.lock, flags); mode = READ_ONCE(t->kcov_mode); barrier(); if (!kcov_mode_enabled(mode)) { - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } /* @@ -978,12 +982,12 @@ void kcov_remote_stop(void) * actually found the remote handle and started collecting coverage. */ if (in_serving_softirq() && !t->kcov_softirq) { - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } /* Make sure that kcov_softirq is only set when in softirq. */ if (WARN_ON(!in_serving_softirq() && t->kcov_softirq)) { - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); return; } @@ -1013,7 +1017,7 @@ void kcov_remote_stop(void) spin_unlock(&kcov_remote_lock); } - local_irq_restore(flags); + local_unlock_irqrestore(&kcov_percpu_data.lock, flags); /* Get in kcov_remote_start(). */ kcov_put(kcov); From 3b2941188e010e77552ee3718bf6163f4dbd9c08 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Mon, 8 Nov 2021 18:35:43 -0800 Subject: [PATCH 81/87] scripts/gdb: handle split debug for vmlinux This is related to two previous changes. Commit dfe4529ee4d3 ("scripts/gdb: find vmlinux where it was before") and commit da036ae14762 ("scripts/gdb: handle split debug"). Although Chrome OS has been using the debug suffix for modules for a while, it has just recently started using it for vmlinux as well. That means we've now got to improve the detection of "vmlinux" to also handle that it might end with ".debug". Link: https://lkml.kernel.org/r/20211028151120.v2.1.Ie6bd5a232f770acd8c9ffae487a02170bad3e963@changeid Signed-off-by: Douglas Anderson Reviewed-by: Stephen Boyd Cc: Jan Kiszka Cc: Kieran Bingham Cc: Johannes Berg Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/gdb/linux/symbols.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py index 08d264ac328b..46f7542db08c 100644 --- a/scripts/gdb/linux/symbols.py +++ b/scripts/gdb/linux/symbols.py @@ -148,7 +148,8 @@ lx-symbols command.""" # drop all current symbols and reload vmlinux orig_vmlinux = 'vmlinux' for obj in gdb.objfiles(): - if obj.filename.endswith('vmlinux'): + if (obj.filename.endswith('vmlinux') or + obj.filename.endswith('vmlinux.debug')): orig_vmlinux = obj.filename gdb.execute("symbol-file", to_string=True) gdb.execute("symbol-file {0}".format(orig_vmlinux)) From b78dfa059fdd7d0f791132dc0cef39ceafdfd62e Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:35:46 -0800 Subject: [PATCH 82/87] kernel/resource: clean up and optimize iomem_is_exclusive() Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5. Let's add the basic infrastructure to exclude some physical memory regions marked as "IORESOURCE_SYSTEM_RAM" completely from /dev/mem access, even though they are not marked IORESOURCE_BUSY and even though "iomem=relaxed" is set. Resource IORESOURCE_EXCLUSIVE for that purpose instead of adding new flags to express something similar to "soft-busy" or "not busy yet, but already prepared by a driver and not to be mapped by user space". Use it for virtio-mem, to disallow mapping any virtio-mem memory via /dev/mem to user space after the virtio-mem driver was loaded. This patch (of 3): We end up traversing subtrees of ranges we are not interested in; let's optimize this case, skipping such subtrees, cleaning up the function a bit. For example, in the following configuration (/proc/iomem): 00000000-00000fff : Reserved 00001000-00057fff : System RAM 00058000-00058fff : Reserved 00059000-0009cfff : System RAM 0009d000-000fffff : Reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c3fff : PCI Bus 0000:00 000c4000-000c7fff : PCI Bus 0000:00 000c8000-000cbfff : PCI Bus 0000:00 000cc000-000cffff : PCI Bus 0000:00 000d0000-000d3fff : PCI Bus 0000:00 000d4000-000d7fff : PCI Bus 0000:00 000d8000-000dbfff : PCI Bus 0000:00 000dc000-000dffff : PCI Bus 0000:00 000e0000-000e3fff : PCI Bus 0000:00 000e4000-000e7fff : PCI Bus 0000:00 000e8000-000ebfff : PCI Bus 0000:00 000ec000-000effff : PCI Bus 0000:00 000f0000-000fffff : PCI Bus 0000:00 000f0000-000fffff : System ROM 00100000-3fffffff : System RAM 40000000-403fffff : Reserved 40000000-403fffff : pnp 00:00 40400000-80a79fff : System RAM ... We don't have to look at any children of "0009d000-000fffff : Reserved" if we can just skip these 15 items directly because the parent range is not of interest. Link: https://lkml.kernel.org/r/20210920142856.17758-1-david@redhat.com Link: https://lkml.kernel.org/r/20210920142856.17758-2-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Dan Williams Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: "Rafael J. Wysocki" Cc: Hanjun Guo Cc: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/resource.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index ca9f5198a01f..2999f57da38c 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -73,6 +73,18 @@ static struct resource *next_resource(struct resource *p) return p->sibling; } +static struct resource *next_resource_skip_children(struct resource *p) +{ + while (!p->sibling && p->parent) + p = p->parent; + return p->sibling; +} + +#define for_each_resource(_root, _p, _skip_children) \ + for ((_p) = (_root)->child; (_p); \ + (_p) = (_skip_children) ? next_resource_skip_children(_p) : \ + next_resource(_p)) + static void *r_next(struct seq_file *m, void *v, loff_t *pos) { struct resource *p = v; @@ -1712,10 +1724,9 @@ static int strict_iomem_checks; */ bool iomem_is_exclusive(u64 addr) { - struct resource *p = &iomem_resource; - bool err = false; - loff_t l; + bool skip_children = false, err = false; int size = PAGE_SIZE; + struct resource *p; if (!strict_iomem_checks) return false; @@ -1723,15 +1734,19 @@ bool iomem_is_exclusive(u64 addr) addr = addr & PAGE_MASK; read_lock(&resource_lock); - for (p = p->child; p ; p = r_next(NULL, p, &l)) { + for_each_resource(&iomem_resource, p, skip_children) { /* * We can probably skip the resources without * IORESOURCE_IO attribute? */ if (p->start >= addr + size) break; - if (p->end < addr) + if (p->end < addr) { + skip_children = true; continue; + } + skip_children = false; + /* * A resource is exclusive if IORESOURCE_EXCLUSIVE is set * or CONFIG_IO_STRICT_DEVMEM is enabled and the From a9e7b8d4f663d13975e9bbe849735c8659c2a1f1 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:35:50 -0800 Subject: [PATCH 83/87] kernel/resource: disallow access to exclusive system RAM regions virtio-mem dynamically exposes memory inside a device memory region as system RAM to Linux, coordinating with the hypervisor which parts are actually "plugged" and consequently usable/accessible. On the one hand, the virtio-mem driver adds/removes whole memory blocks, creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other hand, it logically (un)plugs memory inside added memory blocks, dynamically either exposing them to the buddy or hiding them from the buddy and marking them PG_offline. In contrast to physical devices, like a DIMM, the virtio-mem driver is required to actually make use of any of the device-provided memory, because it performs the handshake with the hypervisor. virtio-mem memory cannot simply be access via /dev/mem without a driver. There is no safe way to: a) Access plugged memory blocks via /dev/mem, as they might contain unplugged holes or might get silently unplugged by the virtio-mem driver and consequently turned inaccessible. b) Access unplugged memory blocks via /dev/mem because the virtio-mem driver is required to make them actually accessible first. The virtio-spec states that unplugged memory blocks MUST NOT be written, and only selected unplugged memory blocks MAY be read. We want to make sure, this is the case in sane environments -- where the virtio-mem driver was loaded. We want to make sure that in a sane environment, nobody "accidentially" accesses unplugged memory inside the device managed region. For example, a user might spot a memory region in /proc/iomem and try accessing it via /dev/mem via gdb or dumping it via something else. By the time the mmap() happens, the memory might already have been removed by the virtio-mem driver silently: the mmap() would succeeed and user space might accidentially access unplugged memory. So once the driver was loaded and detected the device along the device-managed region, we just want to disallow any access via /dev/mem to it. In an ideal world, we would mark the whole region as busy ("owned by a driver") and exclude it; however, that would be wrong, as we don't really have actual system RAM at these ranges added to Linux ("busy system RAM"). Instead, we want to mark such ranges as "not actual busy system RAM but still soft-reserved and prepared by a driver for future use." Let's teach iomem_is_exclusive() to reject access to any range with "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it easier for applicable drivers to depend on this setting in their Kconfig. For now, there are no applicable ranges and we'll modify virtio-mem next to properly set IORESOURCE_EXCLUSIVE on the parent resource container it creates to contain all actual busy system RAM added via add_memory_driver_managed(). Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Dan Williams Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: Hanjun Guo Cc: Jason Wang Cc: "Michael S. Tsirkin" Cc: "Rafael J. Wysocki" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/resource.c | 29 +++++++++++++++++++---------- mm/Kconfig | 7 +++++++ 2 files changed, 26 insertions(+), 10 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 2999f57da38c..5ad3eba619ba 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1719,26 +1719,23 @@ static int strict_iomem_checks; #endif /* - * check if an address is reserved in the iomem resource tree - * returns true if reserved, false if not reserved. + * Check if an address is exclusive to the kernel and must not be mapped to + * user space, for example, via /dev/mem. + * + * Returns true if exclusive to the kernel, otherwise returns false. */ bool iomem_is_exclusive(u64 addr) { + const unsigned int exclusive_system_ram = IORESOURCE_SYSTEM_RAM | + IORESOURCE_EXCLUSIVE; bool skip_children = false, err = false; int size = PAGE_SIZE; struct resource *p; - if (!strict_iomem_checks) - return false; - addr = addr & PAGE_MASK; read_lock(&resource_lock); for_each_resource(&iomem_resource, p, skip_children) { - /* - * We can probably skip the resources without - * IORESOURCE_IO attribute? - */ if (p->start >= addr + size) break; if (p->end < addr) { @@ -1747,12 +1744,24 @@ bool iomem_is_exclusive(u64 addr) } skip_children = false; + /* + * IORESOURCE_SYSTEM_RAM resources are exclusive if + * IORESOURCE_EXCLUSIVE is set, even if they + * are not busy and even if "iomem=relaxed" is set. The + * responsible driver dynamically adds/removes system RAM within + * such an area and uncontrolled access is dangerous. + */ + if ((p->flags & exclusive_system_ram) == exclusive_system_ram) { + err = true; + break; + } + /* * A resource is exclusive if IORESOURCE_EXCLUSIVE is set * or CONFIG_IO_STRICT_DEVMEM is enabled and the * resource is busy. */ - if ((p->flags & IORESOURCE_BUSY) == 0) + if (!strict_iomem_checks || !(p->flags & IORESOURCE_BUSY)) continue; if (IS_ENABLED(CONFIG_IO_STRICT_DEVMEM) || p->flags & IORESOURCE_EXCLUSIVE) { diff --git a/mm/Kconfig b/mm/Kconfig index ae1f151c2924..068ce591a13a 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -109,6 +109,13 @@ config NUMA_KEEP_MEMINFO config MEMORY_ISOLATION bool +# IORESOURCE_SYSTEM_RAM regions in the kernel resource tree that are marked +# IORESOURCE_EXCLUSIVE cannot be mapped to user space, for example, via +# /dev/mem. +config EXCLUSIVE_SYSTEM_RAM + def_bool y + depends on !DEVMEM || STRICT_DEVMEM + # # Only be set on architectures that have completely implemented memory hotplug # feature. If you are not sure, don't touch it. From 2128f4e21aa283945e6f0fb183e70fdfdc0d66f0 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 8 Nov 2021 18:35:53 -0800 Subject: [PATCH 84/87] virtio-mem: disallow mapping virtio-mem memory via /dev/mem We don't want user space to be able to map virtio-mem device memory directly (e.g., via /dev/mem) in order to have guarantees that in a sane setup we'll never accidentially access unplugged memory within the device-managed region of a virtio-mem device, just as required by the virtio-spec. As soon as the virtio-mem driver is loaded, the device region is visible in /proc/iomem via the parent device region. From that point on user space is aware of the device region and we want to disallow mapping anything inside that region (where we will dynamically (un)plug memory) until the driver has been unloaded cleanly and e.g., another driver might take over. By creating our parent IORESOURCE_SYSTEM_RAM resource with IORESOURCE_EXCLUSIVE, we will disallow any /dev/mem access to our device region until the driver was unloaded cleanly and removed the parent region. This will work even though only some memory blocks are actually currently added to Linux and appear as busy in the resource tree. So access to the region from user space is only possible a) if we don't load the virtio-mem driver. b) after unloading the virtio-mem driver cleanly. Don't build virtio-mem if access to /dev/mem cannot be restricticted -- if we have CONFIG_DEVMEM=y but CONFIG_STRICT_DEVMEM is not set. Link: https://lkml.kernel.org/r/20210920142856.17758-4-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Dan Williams Acked-by: Michael S. Tsirkin Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: Hanjun Guo Cc: Jason Wang Cc: "Rafael J. Wysocki" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virtio/Kconfig | 1 + drivers/virtio/virtio_mem.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 3654def9915c..44a082639439 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -101,6 +101,7 @@ config VIRTIO_MEM depends on MEMORY_HOTPLUG depends on MEMORY_HOTREMOVE depends on CONTIG_ALLOC + depends on EXCLUSIVE_SYSTEM_RAM help This driver provides access to virtio-mem paravirtualized memory devices, allowing to hotplug and hotunplug memory. diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 3573b78f6b05..0da0af251c73 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -2672,8 +2672,10 @@ static int virtio_mem_create_resource(struct virtio_mem *vm) if (!name) return -ENOMEM; + /* Disallow mapping device memory via /dev/mem completely. */ vm->parent_resource = __request_mem_region(vm->addr, vm->region_size, - name, IORESOURCE_SYSTEM_RAM); + name, IORESOURCE_SYSTEM_RAM | + IORESOURCE_EXCLUSIVE); if (!vm->parent_resource) { kfree(name); dev_warn(&vm->vdev->dev, "could not reserve device region\n"); From 303f8e2d02002dbe331cab7813ee091aead3cd39 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 8 Nov 2021 18:35:56 -0800 Subject: [PATCH 85/87] selftests/kselftest/runner/run_one(): allow running non-executable files When running a test program, 'run_one()' checks if the program has the execution permission and fails if it doesn't. However, it's easy to mistakenly lose the permissions, as some common tools like 'diff' don't support the permission change well[1]. Compared to that, making mistakes in the test program's path would only rare, as those are explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense to resolve the situation on our own and run the program. For this reason, this commit makes the test program runner function still print the warning message but to try parsing the interpreter of the program and to explicitly run it with the interpreter, in this case. [1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/ Link: https://lkml.kernel.org/r/20210810164534.25902-1-sj38.park@gmail.com Signed-off-by: SeongJae Park Suggested-by: Greg Kroah-Hartman Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++-------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh index cc9c846585f0..a9ba782d8ca0 100644 --- a/tools/testing/selftests/kselftest/runner.sh +++ b/tools/testing/selftests/kselftest/runner.sh @@ -33,9 +33,9 @@ tap_timeout() { # Make sure tests will time out if utility is available. if [ -x /usr/bin/timeout ] ; then - /usr/bin/timeout --foreground "$kselftest_timeout" "$1" + /usr/bin/timeout --foreground "$kselftest_timeout" $1 else - "$1" + $1 fi } @@ -65,17 +65,25 @@ run_one() TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST" echo "# $TEST_HDR_MSG" - if [ ! -x "$TEST" ]; then - echo -n "# Warning: file $TEST is " - if [ ! -e "$TEST" ]; then - echo "missing!" - else - echo "not executable, correct this." - fi + if [ ! -e "$TEST" ]; then + echo "# Warning: file $TEST is missing!" echo "not ok $test_num $TEST_HDR_MSG" else + cmd="./$BASENAME_TEST" + if [ ! -x "$TEST" ]; then + echo "# Warning: file $TEST is not executable" + + if [ $(head -n 1 "$TEST" | cut -c -2) = "#!" ] + then + interpreter=$(head -n 1 "$TEST" | cut -c 3-) + cmd="$interpreter ./$BASENAME_TEST" + else + echo "not ok $test_num $TEST_HDR_MSG" + return + fi + fi cd `dirname $TEST` > /dev/null - ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) | + ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) | tap_prefix >&4) 3>&1) | (read xs; exit $xs)) 4>>"$logfile" && echo "ok $test_num $TEST_HDR_MSG") || From 5563cabdde7ee53c34ec7e5e0283bfcc9a1bc893 Mon Sep 17 00:00:00 2001 From: Michal Clapinski Date: Mon, 8 Nov 2021 18:35:59 -0800 Subject: [PATCH 86/87] ipc: check checkpoint_restore_ns_capable() to modify C/R proc files This commit removes the requirement to be root to modify sem_next_id, msg_next_id and shm_next_id and checks checkpoint_restore_ns_capable instead. Since those files are specific to the IPC namespace, there is no reason they should require root privileges. This is similar to ns_last_pid, which also only checks checkpoint_restore_ns_capable. [akpm@linux-foundation.org: ipc/ipc_sysctl.c needs capability.h for checkpoint_restore_ns_capable()] Link: https://lkml.kernel.org/r/20210916163717.3179496-1-mclapinski@google.com Signed-off-by: Michal Clapinski Reviewed-by: Davidlohr Bueso Reviewed-by: Manfred Spraul Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- ipc/ipc_sysctl.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 3f312bf2b116..345e4d673e61 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include "util.h" @@ -104,6 +105,19 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, return ret; } +#ifdef CONFIG_CHECKPOINT_RESTORE +static int proc_ipc_dointvec_minmax_checkpoint_restore(struct ctl_table *table, + int write, void *buffer, size_t *lenp, loff_t *ppos) +{ + struct user_namespace *user_ns = current->nsproxy->ipc_ns->user_ns; + + if (write && !checkpoint_restore_ns_capable(user_ns)) + return -EPERM; + + return proc_ipc_dointvec_minmax(table, write, buffer, lenp, ppos); +} +#endif + #else #define proc_ipc_doulongvec_minmax NULL #define proc_ipc_dointvec NULL @@ -111,6 +125,9 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, #define proc_ipc_dointvec_minmax_orphans NULL #define proc_ipc_auto_msgmni NULL #define proc_ipc_sem_dointvec NULL +#ifdef CONFIG_CHECKPOINT_RESTORE +#define proc_ipc_dointvec_minmax_checkpoint_restore NULL +#endif /* CONFIG_CHECKPOINT_RESTORE */ #endif int ipc_mni = IPCMNI; @@ -198,8 +215,8 @@ static struct ctl_table ipc_kern_table[] = { .procname = "sem_next_id", .data = &init_ipc_ns.ids[IPC_SEM_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_SEM_IDS].next_id), - .mode = 0644, - .proc_handler = proc_ipc_dointvec_minmax, + .mode = 0666, + .proc_handler = proc_ipc_dointvec_minmax_checkpoint_restore, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, }, @@ -207,8 +224,8 @@ static struct ctl_table ipc_kern_table[] = { .procname = "msg_next_id", .data = &init_ipc_ns.ids[IPC_MSG_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_MSG_IDS].next_id), - .mode = 0644, - .proc_handler = proc_ipc_dointvec_minmax, + .mode = 0666, + .proc_handler = proc_ipc_dointvec_minmax_checkpoint_restore, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, }, @@ -216,8 +233,8 @@ static struct ctl_table ipc_kern_table[] = { .procname = "shm_next_id", .data = &init_ipc_ns.ids[IPC_SHM_IDS].next_id, .maxlen = sizeof(init_ipc_ns.ids[IPC_SHM_IDS].next_id), - .mode = 0644, - .proc_handler = proc_ipc_dointvec_minmax, + .mode = 0666, + .proc_handler = proc_ipc_dointvec_minmax_checkpoint_restore, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, }, From 0e9beb8a96f21a6df1579cb3a679e150e3269d80 Mon Sep 17 00:00:00 2001 From: Manfred Spraul Date: Mon, 8 Nov 2021 18:36:02 -0800 Subject: [PATCH 87/87] ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL Compilation of ipc/ipc_sysctl.c is controlled by obj-$(CONFIG_SYSVIPC_SYSCTL) [see ipc/Makefile] And CONFIG_SYSVIPC_SYSCTL depends on SYSCTL [see init/Kconfig] An SYSCTL is selected by PROC_SYSCTL. [see fs/proc/Kconfig] Thus: #ifndef CONFIG_PROC_SYSCTL in ipc/ipc_sysctl.c is impossible, the fallback can be removed. Link: https://lkml.kernel.org/r/20210918145337.3369-1-manfred@colorfullife.com Signed-off-by: Manfred Spraul Reviewed-by: "Eric W. Biederman" Acked-by: Davidlohr Bueso Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- ipc/ipc_sysctl.c | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 345e4d673e61..f101c171753f 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -23,7 +23,6 @@ static void *get_ipc(struct ctl_table *table) return which; } -#ifdef CONFIG_PROC_SYSCTL static int proc_ipc_dointvec(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -118,18 +117,6 @@ static int proc_ipc_dointvec_minmax_checkpoint_restore(struct ctl_table *table, } #endif -#else -#define proc_ipc_doulongvec_minmax NULL -#define proc_ipc_dointvec NULL -#define proc_ipc_dointvec_minmax NULL -#define proc_ipc_dointvec_minmax_orphans NULL -#define proc_ipc_auto_msgmni NULL -#define proc_ipc_sem_dointvec NULL -#ifdef CONFIG_CHECKPOINT_RESTORE -#define proc_ipc_dointvec_minmax_checkpoint_restore NULL -#endif /* CONFIG_CHECKPOINT_RESTORE */ -#endif - int ipc_mni = IPCMNI; int ipc_mni_shift = IPCMNI_SHIFT; int ipc_min_cycle = RADIX_TREE_MAP_SIZE;