linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-22 02:20:40 +00:00

Author	SHA1	Message	Date
Veerendranath Jakkam	fcfb8ceafb	wifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_done [ Upstream commit `baeaabf970` ] Individual MLO links connection status is not copied to EVENT_CONNECT_RESULT data while processing the connect response information in cfg80211_connect_done(). Due to this failed links are wrongly indicated with success status in EVENT_CONNECT_RESULT. To fix this, copy the individual MLO links status to the EVENT_CONNECT_RESULT data. Fixes: `53ad07e982` ("wifi: cfg80211: support reporting failed links") Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20240724125327.3495874-1-quic_vjakkam@quicinc.com [commit message editorial changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:19 +02:00
Eric Dumazet	d7cc186d09	sched: act_ct: take care of padding in struct zones_ht_key [ Upstream commit `2191a54f63` ] Blamed commit increased lookup key size from 2 bytes to 16 bytes, because zones_ht_key got a struct net pointer. Make sure rhashtable_lookup() is not using the padding bytes which are not initialized. BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline] BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 rht_ptr_rcu include/linux/rhashtable.h:376 [inline] __rhashtable_lookup include/linux/rhashtable.h:607 [inline] rhashtable_lookup include/linux/rhashtable.h:646 [inline] rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425 tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488 tcf_action_add net/sched/act_api.c:2061 [inline] tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118 rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2597 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651 __sys_sendmsg net/socket.c:2680 [inline] __do_sys_sendmsg net/socket.c:2689 [inline] __se_sys_sendmsg net/socket.c:2687 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687 x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable key created at: tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 Fixes: `88c67aeb14` ("sched: act_ct: add netns into the key of tcf_ct_flow_table") Reported-by: syzbot+1b5e4e187cc586d05ea0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Ian Forbes	c98d6c23fb	drm/vmwgfx: Trigger a modeset when the screen moves [ Upstream commit `75c3e8a26a` ] When multi-monitor is cycled the X,Y position of the Screen Target will likely change but the resolution will not. We need to trigger a modeset when this occurs in order to recreate the Screen Target with the correct X,Y position. Fixes a bug where multiple displays are shown in a single scrollable host window rather than in 2+ windows on separate host displays. Fixes: `4268269331` ("drm/vmwgfx: Filter modes which exceed graphics memory") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240624205951.23343-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Ian Forbes	b67643bffe	drm/vmwgfx: Fix overlay when using Screen Targets [ Upstream commit `cb372a505a` ] This code was never updated to support Screen Targets. Fixes a bug where Xv playback displays a green screen instead of actual video contents when 3D acceleration is disabled in the guest. Fixes: `c8261a961e` ("vmwgfx: Major KMS refactoring / cleanup in preparation of screen targets") Reported-by: Doug Brown <doug@schmorgal.com> Closes: https://lore.kernel.org/all/bd9cb3c7-90e8-435d-bc28-0e38fee58977@schmorgal.com Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Tested-by: Doug Brown <doug@schmorgal.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719163627.20888-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Danilo Krummrich	f23cd66933	drm/nouveau: prime: fix refcount underflow [ Upstream commit `a9bf3efc33` ] Calling nouveau_bo_ref() on a nouveau_bo without initializing it (and hence the backing ttm_bo) leads to a refcount underflow. Instead of calling nouveau_bo_ref() in the unwind path of drm_gem_object_init(), clean things up manually. Fixes: `ab9ccb96a6` ("drm/nouveau: use prime helpers") Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240718165959.3983-2-dakr@kernel.org (cherry picked from commit `1b93f3e89d`) Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Casey Chen	1b46b23561	perf tool: fix dereferencing NULL al->maps [ Upstream commit `4c17736689` ] With `0dd5041c9a` ("perf addr_location: Add init/exit/copy functions"), when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR), thread__find_map() could return with al->maps being NULL. The path below could add a callchain_cursor_node with NULL ms.maps. add_callchain_ip() thread__find_symbol(.., &al) thread__find_map(.., &al) // al->maps becomes NULL ms.maps = maps__get(al.maps) callchain_cursor_append(..., &ms, ...) node->ms.maps = maps__get(ms->maps) Then the path below would dereference NULL maps and get segfault. fill_callchain_info() maps__machine(node->ms.maps); Fix it by checking if maps is NULL in fill_callchain_info(). Fixes: `0dd5041c9a` ("perf addr_location: Add init/exit/copy functions") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: yzhong@purestorage.com Link: https://lore.kernel.org/r/20240722211548.61455-1-cachen@purestorage.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Basavaraj Natikar	0a5ca73bab	HID: amd_sfh: Move sensor discovery before HID device initialization [ Upstream commit `8031b001da` ] Sensors discovery is independent of HID device initialization. If sensor discovery fails after HID initialization, then the HID device needs to be deinitialized. Therefore, sensors discovery should be moved before HID device initialization. Fixes: `7bcfdab3f0` ("HID: amd_sfh: if no sensors are enabled, clean up") Tested-by: Aurinko <petrvelicka@tuta.io> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Link: https://patch.msgid.link/20240718111616.3012155-1-Basavaraj.Natikar@amd.com Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Jinjie Ruan	181f9b5619	ARM: 9406/1: Fix callchain_trace() return value [ Upstream commit `4e7b4ff2dc` ] perf_callchain_store() return 0 on success, -1 otherwise, fix callchain_trace() to return correct bool value. So walk_stackframe() can have a chance to stop walking the stack ahead. Fixes: `70ccc7c066` ("ARM: 9258/1: stacktrace: Make stack walk callback consistent with generic code") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:18 +02:00
Jiaxun Yang	f4675c8ee7	MIPS: dts: loongson: Fix ls2k1000-rtc interrupt [ Upstream commit `f70fd92df7` ] The correct interrupt line for RTC is line 8 on liointc1. Fixes: `e47084e116` ("MIPS: Loongson64: DTS: Add RTC support to Loongson-2K1000") Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Jiaxun Yang	3544efb889	MIPS: dts: loongson: Fix liointc IRQ polarity [ Upstream commit `dbb69b9d62` ] All internal liointc interrupts are high level triggered. Fixes: `b1a792601f` ("MIPS: Loongson64: DeviceTree for Loongson-2K1000") Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Jiaxun Yang	fcf20dc293	MIPS: Loongson64: DTS: Fix PCIe port nodes for ls7a [ Upstream commit `d89a415ff8` ] Add various required properties to silent warnings: arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi:116.16-297.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider arch/mips/boot/dts/loongson/loongson64_2core_2k1000.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider' Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Stable-dep-of: `dbb69b9d62` ("MIPS: dts: loongson: Fix liointc IRQ polarity") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Xu Yang	0bcd599a0f	perf: imx_perf: fix counter start and config sequence [ Upstream commit `ac9aa295f7` ] In current driver, the counter will start firstly and then be configured. This sequence is not correct for AXI filter events since the correct AXI_MASK and AXI_ID are not set yet. Then the results may be inaccurate. Reviewed-by: Frank Li <Frank.Li@nxp.com> Fixes: `55691f99d4` ("drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver") cc: stable@vger.kernel.org Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240529080358.703784-5-xu.yang_2@nxp.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Joy Zou	c91c8d3830	dmaengine: fsl-edma: change the memory access from local into remote mode in i.MX 8QM [ Upstream commit `8ddad55899` ] Fix the issue where MEM_TO_MEM fail on i.MX8QM due to the requirement that both source and destination addresses need pass through the IOMMU. Typically, peripheral FIFO addresses bypass the IOMMU, necessitating only one of the source or destination to go through it. Set "is_remote" to true to ensure both source and destination addresses pass through the IOMMU. iMX8 Spec define "Local" and "Remote" bus as below. Local bus: bypass IOMMU to directly access other peripheral register, such as FIFO. Remote bus: go through IOMMU to access system memory. The test fail log as follow: [ 66.268506] dmatest: dma0chan0-copy0: result #1: 'test timed out' with src_off=0x100 dst_off=0x80 len=0x3ec0 (0) [ 66.278785] dmatest: dma0chan0-copy0: summary 1 tests, 1 failures 0.32 iops 4 KB/s (0) Fixes: `72f5801a4e` ("dmaengine: fsl-edma: integrate v3 support") Signed-off-by: Joy Zou <joy.zou@nxp.com> Cc: stable@vger.kernel.org Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240510030959.703663-1-joy.zou@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Joy Zou	417b64e4c8	dmaengine: fsl-edma: clean up unused "fsl,imx8qm-adma" compatible string [ Upstream commit `77584368a0` ] The eDMA hardware issue only exist imx8QM A0. A0 never mass production. So remove the workaround safely. Signed-off-by: Joy Zou <joy.zou@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240424064508.1886764-2-joy.zou@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: `8ddad55899` ("dmaengine: fsl-edma: change the memory access from local into remote mode in i.MX 8QM") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Joy Zou	ba20b7f28e	dmaengine: fsl-edma: add i.MX8ULP edma support [ Upstream commit `d8d4355861` ] Add support for the i.MX8ULP platform to the eDMA driver. Introduce the use of the correct FSL_EDMA_DRV_HAS_CHCLK flag to handle per-channel clock configurations. Signed-off-by: Joy Zou <joy.zou@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240323-8ulp_edma-v3-5-c0e981027c05@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: `8ddad55899` ("dmaengine: fsl-edma: change the memory access from local into remote mode in i.MX 8QM") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Frank Li	5f8de773d4	dmaengine: fsl-edma: add address for channel mux register in fsl_edma_chan [ Upstream commit `e0a08ed254` ] iMX95 move channel mux register to management page address space. This prepare to support iMX95. Add mux_addr in struct fsl_edma_chan. No function change. Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20231221153528.1588049-4-Frank.Li@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: `8ddad55899` ("dmaengine: fsl-edma: change the memory access from local into remote mode in i.MX 8QM") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:17 +02:00
Jaegeuk Kim	4239571c5d	f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid [ Upstream commit `8cb1f4080d` ] mkdir /mnt/test/comp f2fs_io setflags compression /mnt/test/comp dd if=/dev/zero of=/mnt/test/comp/testfile bs=16k count=1 truncate --size 13 /mnt/test/comp/testfile In the above scenario, we can get a BUG_ON. kernel BUG at fs/f2fs/segment.c:3589! Call Trace: do_write_page+0x78/0x390 [f2fs] f2fs_outplace_write_data+0x62/0xb0 [f2fs] f2fs_do_write_data_page+0x275/0x740 [f2fs] f2fs_write_single_data_page+0x1dc/0x8f0 [f2fs] f2fs_write_multi_pages+0x1e5/0xae0 [f2fs] f2fs_write_cache_pages+0xab1/0xc60 [f2fs] f2fs_write_data_pages+0x2d8/0x330 [f2fs] do_writepages+0xcf/0x270 __writeback_single_inode+0x44/0x350 writeback_sb_inodes+0x242/0x530 __writeback_inodes_wb+0x54/0xf0 wb_writeback+0x192/0x310 wb_workfn+0x30d/0x400 The reason is we gave CURSEG_ALL_DATA_ATGC to COMPR_ADDR where the page was set the gcing flag by set_cluster_dirty(). Cc: stable@vger.kernel.org Fixes: `4961acdd65` ("f2fs: fix to tag gcing flag on page during block migration") Reviewed-by: Chao Yu <chao@kernel.org> Tested-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Zhiguo Niu	f911be1165	f2fs: fix to avoid use SSR allocate when do defragment [ Upstream commit `21327a042d` ] SSR allocate mode will be used when doing file defragment if ATGC is working at the same time, that is because set_page_private_gcing may make CURSEG_ALL_DATA_ATGC segment type got in f2fs_allocate_data_block when defragment page is writeback, which may cause file fragmentation is worse. A file with 2 fragmentations is changed as following after defragment: ----------------file info------------------- sensorsdata : -------------------------------------------- dev [254:48] ino [0x 3029 : 12329] mode [0x 81b0 : 33200] nlink [0x 1 : 1] uid [0x 27e6 : 10214] gid [0x 27e6 : 10214] size [0x 242000 : 2367488] blksize [0x 1000 : 4096] blocks [0x 1210 : 4624] -------------------------------------------- file_pos start_blk end_blk blks 0 11361121 11361207 87 356352 11361215 11361216 2 364544 11361218 11361218 1 368640 11361220 11361221 2 376832 11361224 11361225 2 385024 11361227 11361238 12 434176 11361240 11361252 13 487424 11361254 11361254 1 491520 11361271 11361279 9 528384 3681794 3681795 2 536576 3681797 3681797 1 540672 `3681799` `3681799` 1 544768 3681803 3681803 1 548864 `3681805` `3681805` 1 552960 3681807 3681807 1 557056 3681809 3681809 1 Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: `8cb1f4080d` ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Li Zhijian	00fbc7ba49	mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() [ Upstream commit `66eca1021a` ] It's expected that no page should be left in pcp_list after calling zone_pcp_disable() in offline_pages(). Previously, it's observed that offline_pages() gets stuck [1] due to some pages remaining in pcp_list. Cause: There is a race condition between drain_pages_zone() and __rmqueue_pcplist() involving the pcp->count variable. See below scenario: CPU0 CPU1 ---------------- --------------- spin_lock(&pcp->lock); __rmqueue_pcplist() { zone_pcp_disable() { /* list is empty / if (list_empty(list)) { / add pages to pcp_list / alloced = rmqueue_bulk() mutex_lock(&pcp_batch_high_lock) ... __drain_all_pages() { drain_pages_zone() { / read pcp->count, it's 0 here / count = READ_ONCE(pcp->count) / 0 means nothing to drain / / update pcp->count */ pcp->count += alloced << order; ... ... spin_unlock(&pcp->lock); In this case, after calling zone_pcp_disable() though, there are still some pages in pcp_list. And these pages in pcp_list are neither movable nor isolated, offline_pages() gets stuck as a result. Solution: Expand the scope of the pcp->lock to also protect pcp->count in drain_pages_zone(), to ensure no pages are left in the pcp list after zone_pcp_disable() [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/ Link: https://lkml.kernel.org/r/20240723064428.1179519-1-lizhijian@fujitsu.com Fixes: `4b23a68f95` ("mm/page_alloc: protect PCP lists with a spinlock") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Lucas Stach	4abfa277c2	mm: page_alloc: control latency caused by zone PCP draining [ Upstream commit `55f77df7d7` ] Patch series "mm/treewide: Remove pXd_huge() API", v2. In previous work [1], we removed the pXd_large() API, which is arch specific. This patchset further removes the hugetlb pXd_huge() API. Hugetlb was never special on creating huge mappings when compared with other huge mappings. Having a standalone API just to detect such pgtable entries is more or less redundant, especially after the pXd_leaf() API set is introduced with/without CONFIG_HUGETLB_PAGE. When looking at this problem, a few issues are also exposed that we don't have a clear definition of the _huge() variance API. This patchset started by cleaning these issues first, then replace all _huge() users to use _leaf(), then drop all _huge() code. On x86/sparc, swap entries will be reported "true" in pXd_huge(), while for all the rest archs they're reported "false" instead. This part is done in patch 1-5, in which I suspect patch 1 can be seen as a bug fix, but I'll leave that to hmm experts to decide. Besides, there are three archs (arm, arm64, powerpc) that have slightly different definitions between the _huge() v.s. _leaf() variances. I tackled them separately so that it'll be easier for arch experts to chim in when necessary. This part is done in patch 6-9. The final patches 10-14 do the rest on the final removal, since _leaf() will be the ultimate API in the future, and we seem to have quite some confusions on how _huge() APIs can be defined, provide a rich comment for *_leaf() API set to define them properly to avoid future misuse, and hopefully that'll also help new archs to start support huge mappings and avoid traps (like either swap entries, or PROT_NONE entry checks). [1] https://lore.kernel.org/r/20240305043750.93762-1-peterx@redhat.com This patch (of 14): When the complete PCP is drained a much larger number of pages than the usual batch size might be freed at once, causing large IRQ and preemption latency spikes, as they are all freed while holding the pcp and zone spinlocks. To avoid those latency spikes, limit the number of pages freed in a single bulk operation to common batch limits. Link: https://lkml.kernel.org/r/20240318200404.448346-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20240318200736.2835502-1-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bjorn Andersson <andersson@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fabio Estevam <festevam@denx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Mark Salter <msalter@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: `66eca1021a` ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Huang Ying	dde5e5343d	mm: restrict the pcp batch scale factor to avoid too long latency [ Upstream commit `52166607ec` ] In page allocator, PCP (Per-CPU Pageset) is refilled and drained in batches to increase page allocation throughput, reduce page allocation/freeing latency per page, and reduce zone lock contention. But too large batch size will cause too long maximal allocation/freeing latency, which may punish arbitrary users. So the default batch size is chosen carefully (in zone_batchsize(), the value is 63 for zone > 1GB) to avoid that. In commit `3b12e7e979` ("mm/page_alloc: scale the number of pages that are batch freed"), the batch size will be scaled for large number of page freeing to improve page freeing performance and reduce zone lock contention. Similar optimization can be used for large number of pages allocation too. To find out a suitable max batch scale factor (that is, max effective batch size), some tests and measurement on some machines were done as follows. A set of debug patches are implemented as follows, - Set PCP high to be 2 * batch to reduce the effect of PCP high - Disable free batch size scaling to get the raw performance. - The code with zone lock held is extracted from rmqueue_bulk() and free_pcppages_bulk() to 2 separate functions to make it easy to measure the function run time with ftrace function_graph tracer. - The batch size is hard coded to be 63 (default), 127, 255, 511, 1023, 2047, 4095. Then will-it-scale/page_fault1 is used to generate the page allocation/freeing workload. The page allocation/freeing throughput (page/s) is measured via will-it-scale. The page allocation/freeing average latency (alloc/free latency avg, in us) and allocation/freeing latency at 99 percentile (alloc/free latency 99%, in us) are measured with ftrace function_graph tracer. The test results are as follows, Sapphire Rapids Server ====================== Batch throughput free latency free latency alloc latency alloc latency page/s avg / us 99% / us avg / us 99% / us ----- ---------- ------------ ------------ ------------- ------------- 63 513633.4 2.33 3.57 2.67 6.83 127 517616.7 4.35 6.65 4.22 13.03 255 520822.8 8.29 13.32 7.52 25.24 511 524122.0 15.79 23.42 14.02 49.35 1023 525980.5 30.25 44.19 25.36 94.88 2047 526793.6 59.39 84.50 45.22 140.81 Ice Lake Server =============== Batch throughput free latency free latency alloc latency alloc latency page/s avg / us 99% / us avg / us 99% / us ----- ---------- ------------ ------------ ------------- ------------- 63 620210.3 2.21 3.68 2.02 4.35 127 627003.0 4.09 6.86 3.51 8.28 255 630777.5 7.70 13.50 6.17 15.97 511 633651.5 14.85 22.62 11.66 31.08 1023 637071.1 28.55 42.02 20.81 54.36 2047 638089.7 56.54 84.06 39.28 91.68 Cascade Lake Server =================== Batch throughput free latency free latency alloc latency alloc latency page/s avg / us 99% / us avg / us 99% / us ----- ---------- ------------ ------------ ------------- ------------- 63 404706.7 3.29 5.03 3.53 4.75 127 422475.2 6.12 9.09 6.36 8.76 255 411522.2 11.68 16.97 10.90 16.39 511 428124.1 22.54 31.28 19.86 32.25 1023 414718.4 43.39 62.52 40.00 66.33 2047 429848.7 86.64 120.34 71.14 106.08 Commet Lake Desktop =================== Batch throughput free latency free latency alloc latency alloc latency page/s avg / us 99% / us avg / us 99% / us ----- ---------- ------------ ------------ ------------- ------------- 63 795183.13 2.18 3.55 2.03 3.05 127 803067.85 3.91 6.56 3.85 5.52 255 812771.10 7.35 10.80 7.14 10.20 511 817723.48 14.17 27.54 13.43 30.31 1023 818870.19 27.72 40.10 27.89 46.28 Coffee Lake Desktop =================== Batch throughput free latency free latency alloc latency alloc latency page/s avg / us 99% / us avg / us 99% / us ----- ---------- ------------ ------------ ------------- ------------- 63 510542.8 3.13 4.40 2.48 3.43 127 514288.6 5.97 7.89 4.65 6.04 255 516889.7 11.86 15.58 8.96 12.55 511 519802.4 23.10 28.81 16.95 26.19 1023 520802.7 45.30 52.51 33.19 45.95 2047 519997.1 90.63 104.00 65.26 81.74 From the above data, to restrict the allocation/freeing latency to be less than 100 us in most times, the max batch scale factor needs to be less than or equal to 5. Although it is reasonable to use 5 as max batch scale factor for the systems tested, there are also slower systems. Where smaller value should be used to constrain the page allocation/freeing latency. So, in this patch, a new kconfig option (PCP_BATCH_SCALE_MAX) is added to set the max batch scale factor. Whose default value is 5, and users can reduce it when necessary. Link: https://lkml.kernel.org/r/20231016053002.756205-5-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: `66eca1021a` ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Thomas Zimmermann	340bbe90cc	fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes [ Upstream commit `c2bc958b2b` ] Test the vesa_attributes field in struct screen_info for compatibility with VGA hardware. Vesafb currently tests bit 1 in screen_info's capabilities field which indicates a 64-bit lfb address and is unrelated to VGA compatibility. Section 4.4 of the Vesa VBE 2.0 specifications defines that bit 5 in the mode's attributes field signals VGA compatibility. The mode is compatible with VGA hardware if the bit is clear. In that case, the driver can access VGA state of the VBE's underlying hardware. The vesafb driver uses this feature to program the color LUT in palette modes. Without, colors might be incorrect. The problem got introduced in commit `89ec4c238e` ("[PATCH] vesafb: Fix incorrect logo colors in x86_64"). It incorrectly stores the mode attributes in the screen_info's capabilities field and updates vesafb accordingly. Later, commit `5e8ddcbe86` ("Video mode probing support for the new x86 setup code") fixed the screen_info, but did not update vesafb. Color output still tends to work, because bit 1 in capabilities is usually 0. Besides fixing the bug in vesafb, this commit introduces a helper that reads the correct bit from screen_info. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: `5e8ddcbe86` ("Video mode probing support for the new x86 setup code") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Cc: <stable@vger.kernel.org> # v2.6.23+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Thomas Zimmermann	a168da3182	firmware/sysfb: Update screen_info for relocated EFI framebuffers [ Upstream commit `78aa89d1df` ] On ARM PCI systems, the PCI hierarchy might be reconfigured during boot and the firmware framebuffer might move as a result of that. The values in screen_info will then be invalid. Work around this problem by tracking the framebuffer's initial location before it get relocated; then fix the screen_info state between reloaction and creating the firmware framebuffer's device. This functionality has been lifted from efifb. See the commit message of commit `55d728a40d` ("efi/fb: Avoid reconfiguration of BAR that covers the framebuffer") for more information. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-8-tzimmermann@suse.de Stable-dep-of: `c2bc958b2b` ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:16 +02:00
Thomas Zimmermann	f5dce77f3f	video: Provide screen_info_get_pci_dev() to find screen_info's PCI device [ Upstream commit `036105e3a7` ] Add screen_info_get_pci_dev() to find the PCI device of an instance of screen_info. Does nothing on systems without PCI bus. v3: * search PCI device with pci_get_base_class() (Sui) v2: * remove ret from screen_info_pci_dev() (Javier) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-3-tzimmermann@suse.de Stable-dep-of: `c2bc958b2b` ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Thomas Zimmermann	5b4d995dfd	video: Add helpers for decoding screen_info [ Upstream commit `75fa9b7e37` ] The plain values as stored in struct screen_info need to be decoded before being used. Add helpers that decode the type of video output and the framebuffer I/O aperture. Old or non-x86 systems may not set the type of video directly, but only indicate the presence by storing 0x01 in orig_video_isVGA. The decoding logic in screen_info_video_type() takes this into account. It then follows similar code in vgacon's vgacon_startup() to detect the video type from the given values. A call to screen_info_resources() returns all known resources of the given screen_info. The resources' values have been taken from existing code in vgacon and vga16fb. These drivers can later be converted to use the new interfaces. v2: * return ssize_t from screen_info_resources() * don't call __screen_info_has_lfb() unnecessarily Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-2-tzimmermann@suse.de Stable-dep-of: `c2bc958b2b` ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Thomas Zimmermann	bab0a82854	fbdev/vesafb: Replace references to global screen_info by local pointer [ Upstream commit `3218286bbb` ] Get the global screen_info's address once and access the data via this pointer. Limits the use of global state. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206135153.2599-4-tzimmermann@suse.de Stable-dep-of: `c2bc958b2b` ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sui Jingfeng	ccab04dc57	PCI: Add pci_get_base_class() helper [ Upstream commit `d427da2323` ] There is no function to get all PCI devices in a system by matching against the base class code only, ignoring the sub-class code and the programming interface. Add pci_get_base_class() to suit the need. For example, if a driver wants to process all PCI display devices in a system, it can do so like this: pdev = NULL; while ((pdev = pci_get_base_class(PCI_BASE_CLASS_DISPLAY, pdev))) { do_something_for_pci_display_device(pdev); } Link: https://lore.kernel.org/r/20230825062714.6325-2-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> [bhelgaas: reword commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: `c2bc958b2b` ("fbdev: vesafb: Detect VGA compatibility from screen info's VESA attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sean Christopherson	43e73206cf	KVM: nVMX: Check for pending posted interrupts when looking for nested events [ Upstream commit `27c4fa42b1` ] Check for pending (and notified!) posted interrupts when checking if L2 has a pending wake event, as fully posted/notified virtual interrupt is a valid wake event for HLT. Note that KVM must check vmx->nested.pi_pending to avoid prematurely waking L2, e.g. even if KVM sees a non-zero PID.PIR and PID.0N=1, the virtual interrupt won't actually be recognized until a notification IRQ is received by the vCPU or the vCPU does (nested) VM-Enter. Fixes: `26844fee6a` ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Jim Mattson <jmattson@google.com> Closes: https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@google.com Link: https://lore.kernel.org/r/20240607172609.3205077-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Sean Christopherson	459403bc66	KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector [ Upstream commit `d83c36d822` ] Add a helper to retrieve the highest pending vector given a Posted Interrupt descriptor. While the actual operation is straightforward, it's surprisingly easy to mess up, e.g. if one tries to reuse lapic.c's find_highest_vector(), which doesn't work with PID.PIR due to the APIC's IRR and ISR component registers being physically discontiguous (they're 4-byte registers aligned at 16-byte intervals). To make PIR handling more consistent with respect to IRR and ISR handling, return -1 to indicate "no interrupt pending". Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:15 +02:00
Jacob Pan	65b2514e03	KVM: VMX: Move posted interrupt descriptor out of VMX code [ Upstream commit `699f67512f` ] To prepare native usage of posted interrupts, move the PID declarations out of VMX code such that they can be shared. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20240423174114.526704-2-jacob.jun.pan@linux.intel.com Stable-dep-of: `d83c36d822` ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Vitaly Kuznetsov	ebfed7bebd	KVM: VMX: Split off vmx_onhyperv.{ch} from hyperv.{ch} [ Upstream commit `50a82b0eb8` ] hyperv.{ch} is currently a mix of stuff which is needed by both Hyper-V on KVM and KVM on Hyper-V. As a preparation to making Hyper-V emulation optional, put KVM-on-Hyper-V specific code into dedicated files. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231205103630.1391318-4-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: `d83c36d822` ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Thomas Weißschuh	93ac74cd6f	leds: triggers: Flush pending brightness before activating trigger [ Upstream commit `ab477b766e` ] The race fixed in timer_trig_activate() between a blocking set_brightness() call and trigger->activate() can affect any trigger. So move the call to flush_work() into led_trigger_set() where it can avoid the race for all triggers. Fixes: `0db37915d9` ("leds: avoid races with workqueue") Fixes: `8c0f693c6e` ("leds: avoid flush_work in atomic context") Cc: stable@vger.kernel.org Tested-by: Dustin L. Howett <dustin@howett.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20240613-led-trigger-flush-v2-1-f4f970799d77@weissschuh.net Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Hans de Goede	9ce3c14f0d	leds: trigger: Call synchronize_rcu() before calling trig->activate() [ Upstream commit `b1bbd20f35` ] Some triggers call led_trigger_event() from their activate() callback to initialize the brightness of the LED for which the trigger is being activated. In order for the LED's initial state to be set correctly this requires that the led_trigger_event() call uses the new version of trigger->led_cdevs, which has the new LED. AFAICT led_trigger_event() will always use the new version when it is running on the same CPU as where the list_add_tail_rcu() call was made, which is why the missing synchronize_rcu() has not lead to bug reports. But if activate() is pre-empted, sleeps or uses a worker then the led_trigger_event() call may run on another CPU which may still use the old trigger->led_cdevs list. Add a synchronize_rcu() call to ensure that any led_trigger_event() calls done from activate() always use the new list. Triggers using led_trigger_event() from their activate() callback are: net/bluetooth/leds.c, net/rfkill/core.c and drivers/tty/vt/keyboard.c. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240531120124.75662-1-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: `ab477b766e` ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Heiner Kallweit	587cf9c0f7	leds: trigger: Store brightness set by led_trigger_event() [ Upstream commit `822c91e72e` ] If a simple trigger is assigned to a LED, then the LED may be off until the next led_trigger_event() call. This may be an issue for simple triggers with rare led_trigger_event() calls, e.g. power supply charging indicators (drivers/power/supply/power_supply_leds.c). Therefore persist the brightness value of the last led_trigger_event() call and use this value if the trigger is assigned to a LED. In addition add a getter for the trigger brightness value. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/b1358b25-3f30-458d-8240-5705ae007a8a@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: `ab477b766e` ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Heiner Kallweit	73a26eada5	leds: trigger: Remove unused function led_trigger_rename_static() [ Upstream commit `c82a1662d4` ] This function was added with `a8df7b1ab7` ("leds: add led_trigger_rename function") 11 yrs ago, but it has no users. So remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/d90f30be-f661-4db7-b0b5-d09d07a78a68@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: `ab477b766e` ("leds: triggers: Flush pending brightness before activating trigger") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Javier Carrasco	e3fd01a810	cpufreq: qcom-nvmem: fix memory leaks in probe error paths [ Upstream commit `d01c84b97f` ] The code refactoring added new error paths between the np device node allocation and the call to of_node_put(), which leads to memory leaks if any of those errors occur. Add the missing of_node_put() in the error paths that require it. Cc: stable@vger.kernel.org Fixes: `57f2f8b4aa` ("cpufreq: qcom: Refactor the driver to make it easier to extend") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:14 +02:00
Stephan Gerhold	51a45209a8	cpufreq: qcom-nvmem: Simplify driver data allocation [ Upstream commit `2a5d46c3ad` ] Simplify the allocation and cleanup of driver data by using devm together with a flexible array. Prepare for adding additional per-CPU data by defining a struct qcom_cpufreq_drv_cpu instead of storing the opp_tokens directly. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Stable-dep-of: `d01c84b97f` ("cpufreq: qcom-nvmem: fix memory leaks in probe error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	df7363307e	ext4: check the extent status again before inserting delalloc block [ Upstream commit `0ea6560abb` ] ext4_da_map_blocks looks up for any extent entry in the extent status tree (w/o i_data_sem) and then the looks up for any ondisk extent mapping (with i_data_sem in read mode). If it finds a hole in the extent status tree or if it couldn't find any entry at all, it then takes the i_data_sem in write mode to add a da entry into the extent status tree. This can actually race with page mkwrite & fallocate path. Note that this is ok between 1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the folio lock 2. ext4 buffered write path v/s ext4 fallocate because of the inode lock. But this can race between ext4_page_mkwrite() & ext4 fallocate path ext4_page_mkwrite() ext4_fallocate() block_page_mkwrite() ext4_da_map_blocks() //find hole in extent status tree ext4_alloc_file_blocks() ext4_map_blocks() //allocate block and unwritten extent ext4_insert_delayed_block() ext4_da_reserve_space() //reserve one more block ext4_es_insert_delayed_block() //drop unwritten extent and add delayed extent by mistake Then, the delalloc extent is wrong until writeback and the extra reserved block can't be released any more and it triggers below warning: EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared! Fix the problem by looking up extent status tree again while the i_data_sem is held in write mode. If it still can't find any entry, then we insert a new da entry into the extent status tree. Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240517124005.347221-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	f12fbb9599	ext4: factor out a common helper to query extent map [ Upstream commit `8e4e5cdf2f` ] Factor out a new common helper ext4_map_query_blocks() from the ext4_da_map_blocks(), it query and return the extent map status on the inode's extent path, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240517124005.347221-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `0ea6560abb` ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	c6cba59072	ext4: convert to exclusive lock while inserting delalloc extents [ Upstream commit `acf795dc16` ] ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem when inserting delalloc extents, it could be raced by another querying path of ext4_map_blocks() without i_rwsem, .e.g buffered read path. Suppose we buffered read a file containing just a hole, and without any cached extents tree, then it is raced by another delayed buffered write to the same area or the near area belongs to the same hole, and the new delalloc extent could be overwritten to a hole extent. pread() pwrite() filemap_read_folio() ext4_mpage_readpages() ext4_map_blocks() down_read(i_data_sem) ext4_ext_determine_hole() //find hole ext4_ext_put_gap_in_cache() ext4_es_find_extent_range() //no delalloc extent ext4_da_map_blocks() down_read(i_data_sem) ext4_insert_delayed_block() //insert delalloc extent ext4_es_insert_extent() //overwrite delalloc extent to hole This race could lead to inconsistent delalloc extents tree and incorrect reserved space counter. Fix this by converting to hold i_data_sem in exclusive mode when adding a new delalloc extent in ext4_da_map_blocks(). Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `0ea6560abb` ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Zhang Yi	7849e9b5ba	ext4: refactor ext4_da_map_blocks() [ Upstream commit `3fcc2b887a` ] Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary parameters and branches, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `0ea6560abb` ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Thomas Weißschuh	ffde3af4b2	sysctl: always initialize i_uid/i_gid [ Upstream commit `98ca62ba9e` ] Always initialize i_uid/i_gid inside the sysfs core so set_ownership() can safely skip setting them. Commit `5ec27ec735` ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") added defaults for i_uid/i_gid when set_ownership() was not implemented. It also missed adjusting net_ctl_set_ownership() to use the same default values in case the computation of a better value failed. Fixes: `5ec27ec735` ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Thomas Weißschuh	96f1d909cd	sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) [ Upstream commit `520713a93d` ] Remove the 'table' argument from set_ownership as it is never used. This change is a step towards putting "struct ctl_table" into .rodata and eventually having sysctl core only use "const struct ctl_table". The patch was created with the following coccinelle script: @@ identifier func, head, table, uid, gid; @@ void func( struct ctl_table_header head, - struct ctl_table table, kuid_t uid, kgid_t gid) { ... } No additional occurrences of 'set_ownership' were found after doing a tree-wide search. Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Stable-dep-of: `98ca62ba9e` ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:13 +02:00
Alexey Gladkov	13886221ad	sysctl: allow to change limits for posix messages queues [ Upstream commit `f9436a5d04` ] All parameters of posix messages queues (queues_max/msg_max/msgsize_max) end up being limited by RLIMIT_MSGQUEUE. The code in mqueue_get_inode is where that limiting happens. The RLIMIT_MSGQUEUE is bound to the user namespace and is counted hierarchically. We can allow root in the user namespace to modify the posix messages queues parameters. Link: https://lkml.kernel.org/r/6ad67f23d1459a4f4339f74aa73bac0ecf3995e1.1705333426.git.legion@kernel.org Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/7eb21211c8622e91d226e63416b1b93c079f60ee.1663756794.git.legion@kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Joel Granados <joel.granados@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: `98ca62ba9e` ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Alexey Gladkov	8d5b1a9ff8	sysctl: allow change system v ipc sysctls inside ipc namespace [ Upstream commit `50ec499b9a` ] Patch series "Allow to change ipc/mq sysctls inside ipc namespace", v3. Right now ipc and mq limits count as per ipc namespace, but only real root can change them. By default, the current values of these limits are such that it can only be reduced. Since only root can change the values, it is impossible to reduce these limits in the rootless container. We can allow limit changes within ipc namespace because mq parameters are limited by RLIMIT_MSGQUEUE and ipc parameters are not limited to anything other than cgroups. This patch (of 3): Rootless containers are not allowed to modify kernel IPC parameters. All default limits are set to such high values that in fact there are no limits at all. All limits are not inherited and are initialized to default values when a new ipc_namespace is created. For new ipc_namespace: size_t ipc_ns.shm_ctlmax = SHMMAX; // (ULONG_MAX - (1UL << 24)) size_t ipc_ns.shm_ctlall = SHMALL; // (ULONG_MAX - (1UL << 24)) int ipc_ns.shm_ctlmni = IPCMNI; // (1 << 15) int ipc_ns.shm_rmid_forced = 0; unsigned int ipc_ns.msg_ctlmax = MSGMAX; // 8192 unsigned int ipc_ns.msg_ctlmni = MSGMNI; // 32000 unsigned int ipc_ns.msg_ctlmnb = MSGMNB; // 16384 The shm_tot (total amount of shared pages) has also ceased to be global, it is located in ipc_namespace and is not inherited from anywhere. In such conditions, it cannot be said that these limits limit anything. The real limiter for them is cgroups. If we allow rootless containers to change these parameters, then it can only be reduced. Link: https://lkml.kernel.org/r/cover.1705333426.git.legion@kernel.org Link: https://lkml.kernel.org/r/d2f4603305cbfed58a24755aa61d027314b73a45.1705333426.git.legion@kernel.org Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/e2d84d3ec0172cfff759e6065da84ce0cc2736f8.1663756794.git.legion@kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: Joel Granados <joel.granados@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: `98ca62ba9e` ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Krzysztof Kozlowski	34e788045d	thermal/drivers/broadcom: Fix race between removal and clock disable [ Upstream commit `e90c369cc2` ] During the probe, driver enables clocks necessary to access registers (in get_temp()) and then registers thermal zone with managed-resources (devm) interface. Removal of device is not done in reversed order, because: 1. Clock will be disabled in driver remove() callback - thermal zone is still registered and accessible to users, 2. devm interface will unregister thermal zone. This leaves short window between (1) and (2) for accessing the get_temp() callback with disabled clock. Fix this by enabling clock also via devm-interface, so entire cleanup path will be in proper, reversed order. Fixes: `8454c8c09c` ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Uwe Kleine-König	103881e636	thermal: bcm2835: Convert to platform remove callback returning void [ Upstream commit `f29ecd3748` ] The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: `e90c369cc2` ("thermal/drivers/broadcom: Fix race between removal and clock disable") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Krishna Kurapati	0b4e4da51e	arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB [ Upstream commit `cf4d6d54ea` ] For Gen-1 targets like SDM845, it is seen that stressing out the controller in host mode results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instance in park mode for SDM845 to mitigate this issue. Cc: stable@vger.kernel.org Fixes: `ca4db2b538` ("arm64: dts: qcom: sdm845: Add USB-related nodes") Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240704152848.3380602-9-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Dmitry Baryshkov	a27753e685	arm64: dts: qcom: sdm845: switch USB QMP PHY to new style of bindings [ Upstream commit `ca5ca568d7` ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230824211952.1397699-12-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: `cf4d6d54ea` ("arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00
Dmitry Baryshkov	affc4de945	arm64: dts: qcom: sdm845: switch USB+DP QMP PHY to new style of bindings [ Upstream commit `a9ecdec45a` ] Change the USB QMP PHY to use newer style of QMP PHY bindings (single resource region, no per-PHY subnodes). Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230711120916.4165894-9-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: `cf4d6d54ea` ("arm64: dts: qcom: sdm845: Disable SS instance in Parkmode for USB") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-11 12:47:12 +02:00

... 2 3 4 5 6 ...

1226736 commits