linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-21 18:11:39 +00:00

Author	SHA1	Message	Date
Michal Schmidt	5379866664	iavf: in iavf_down, disable queues when removing the driver In iavf_down, we're skipping the scheduling of certain operations if the driver is being removed. However, the IAVF_FLAG_AQ_DISABLE_QUEUES request must not be skipped in this case, because iavf_close waits for the transition to the __IAVF_DOWN state, which happens in iavf_virtchnl_completion after the queues are released. Without this fix, "rmmod iavf" takes half a second per interface that's up and prints the "Device resources not yet released" warning. Fixes: `c8de44b577` ("iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025183213.874283-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-25 17:48:31 -07:00
Jakub Kicinski	5e5d8b94a4	netfilter pull request 23-10-25 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmU45rAACgkQ1V2XiooU IORtEQ//U91FHPqc1KpJi5lAnXXAXaDji6RjZ080bwk4H3oXc2moc71SiGEgggGs POZEnN0sNJXfUacdG23pQGLnrT1iQpG927mzV01W9HhyZEopO4g+mRt5ymt/qmvO Q9MKWuNOlJCD5blPyKxU7VF3LsQynaPST1IbuPI1NVKiqNUpIpAWC1G+Ofpt67QY Tq7KiJDX0yc+51OFT9Ahs3piSbzC5bl0yC4iynajPxziv+rUiJW5ym2GM24G2rNh /SD4EeJkArdFa3I4Kf15Hnj9809qQP22PDhoQ2Hzzr7XbveArmPjaI0UQ39uV5Jr 1/lFP3iQMBsj04dI/xRLBHJHb2WZvlNa+btV/RHuaw1TEnYevdarMl3Lh0q7p5sT 3M4JBbk0+bq7ZXWmDBT48ZQs4S5UqMscunZXKg2k0fZPn/rSlASAZ3TAXZuF0avp KLQGQsjeBX/zgmQqhq37/oD+YV13LCtEqC0xz4WgX9WpVvgyMR3LFcsHQcZBAVUN PJenvgmpdo8sbhABOXsURJPVDo0JzS4xZhrPyIKaojTo33KfQ/1Z5Ef0EOkbs75+ 6wMoUTdvcZK+Y5f6hvMQ/XOu7XNz0sVZlfBjAhFrVU/TsbprviQCN8QB1IQNHclm 5A93VnID0WPCSAmOmaIdMlcJka4wKv4irI+Iv8vNlQXqV7dXuzQ= =r+0z -----END PGP SIGNATURE----- Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net This patch contains two late Netfilter's flowtable fixes for net: 1) Flowtable GC pushes back packets to classic path in every GC run, ie. every second. This is because NF_FLOW_HW_ESTABLISHED is only used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset by the time the flow is offloaded (this status bit is only reliable in the sched/act_ct datapath). 2) sched/act_ct logic to push back packets to classic path to reevaluate if UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is set on and no hardware offload request is pending to be handled. From Vlad Buslov. These two patches fixes two problems that were introduced in the previous 6.5 development cycle. * tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path ==================== Link: https://lore.kernel.org/r/20231025100819.2664-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-25 16:02:06 -07:00
Alexandru Matei	53b08c4985	vsock/virtio: initialize the_virtio_vsock before using VQs Once VQs are filled with empty buffers and we kick the host, it can send connection requests. If the_virtio_vsock is not initialized before, replies are silently dropped and do not reach the host. virtio_transport_send_pkt() can queue packets once the_virtio_vsock is set, but they won't be processed until vsock->tx_run is set to true. We queue vsock->send_pkt_work when initialization finishes to send those packets queued earlier. Fixes: `0deab087b1` ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock") Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231024191742.14259-1-alexandru.matei@uipath.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-25 15:49:47 -07:00
Christian Brauner	61d4fb0b34	file, i915: fix file reference for mmap_singleton() Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below. So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics. The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release(): static int singleton_release(struct inode inode, struct file file) { struct drm_i915_private *i915 = file->private_data; cmpxchg(&i915->gem.mmap_singleton, file, NULL); drm_dev_put(&i915->drm); return 0; } The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton: While mmap_singleton() does rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); it allocates a new file via anon_inode_getfile() and does smp_store_mb(i915->gem.mmap_singleton, file); So, then what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero leaving the pointer in i915->gem.mmap_singleton in tact. Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL. Say concurrently another task hits mmap_singleton() and does: rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton expecting it to be NULL, assuming it has comparable semantics as we expect in __fget_files_rcu(). But it hasn't so it reloads the same pointer again, trying the same atomic_inc_not_zero() again and doing so until file->f_op->release::singleton_release() of the old file has been called. So, in contrast to __fget_files_rcu() here we want to not retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload. <3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238 <3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20) <6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002 <6> [511.395962] Call Trace: <6> [511.395966] <TASK> <6> [511.395974] ? __schedule+0x3a8/0xd70 <6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140 <6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396029] ? get_file_rcu+0x10/0x30 <6> [511.396039] ? get_file_rcu+0x10/0x30 <6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915] <6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915] <6> [511.396903] ? mmap_region+0x253/0xb60 <6> [511.396925] ? do_mmap+0x334/0x5c0 <6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0 <6> [511.396949] ? rcu_is_watching+0x11/0x50 <6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915] <6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915] <6> [511.397762] ? igt_mmap+0x11e/0x150 [i915] <6> [511.398139] ? __trace_bprintk+0x76/0x90 <6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915] <6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915] <6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915] <6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915] <6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915] <6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915] <6> [511.400692] ? pci_device_probe+0x95/0x120 <6> [511.400704] ? really_probe+0x164/0x3c0 <6> [511.400715] ? __pfx___driver_attach+0x10/0x10 <6> [511.400722] ? __driver_probe_device+0x73/0x160 <6> [511.400731] ? driver_probe_device+0x19/0xa0 <6> [511.400741] ? __driver_attach+0xb6/0x180 <6> [511.400749] ? __pfx___driver_attach+0x10/0x10 <6> [511.400756] ? bus_for_each_dev+0x77/0xd0 <6> [511.400770] ? bus_add_driver+0x114/0x210 <6> [511.400781] ? driver_register+0x5b/0x110 <6> [511.400791] ? i915_init+0x23/0xc0 [i915] <6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915] <6> [511.401503] ? do_one_initcall+0x57/0x270 <6> [511.401515] ? rcu_is_watching+0x11/0x50 <6> [511.401521] ? kmalloc_trace+0xa3/0xb0 <6> [511.401532] ? do_init_module+0x5f/0x210 <6> [511.401544] ? load_module+0x1d00/0x1f60 <6> [511.401581] ? init_module_from_file+0x86/0xd0 <6> [511.401590] ? init_module_from_file+0x86/0xd0 <6> [511.401613] ? idempotent_init_module+0x17c/0x230 <6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0 <6> [511.401650] ? do_syscall_64+0x3c/0x90 <6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 <6> [511.401684] </TASK> Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963 Cc: Jann Horn <jannh@google.com>, Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-25 22:17:04 +02:00
Marc Zyngier	f199bf5bf8	irqchip/gic-v3-its: Don't override quirk settings with default values When splitting the allocation of the ITS node from its configuration, some of the default settings were kept in the latter instead of being moved to the former. This has the side effect of negating some of the quirk detections that have happened in between, amongst which the dreaded Synquacer hack (that also affect Dominic's TI platform). Move the initialisation of these fields early, so that they can again be overriden by the Synquacer quirk. Fixes: `9585a495ac` ("irqchip/gic-v3-its: Split allocation from initialisation of its_node") Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net> Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org	2023-10-25 21:44:49 +02:00
Linus Torvalds	611da07b89	ACPI fix for 6.6-rc8 Unbreak the ACPI NFIT driver after a recent change that inadvertently altered its behavior (Xiang Chen). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmU5VEASHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxLJMQAIS8rYBX13iyD7GcN1f3ct7sgG06qasM 6TAys0OWNIm3tqiMT+EGEpxIlk7Ou0Ztf9kQ6DWAYLY9os7TlnfxseLgy3hYaM/N Svx9EYPNOSYNe/U4HKEsJidvUVDLk5pw6XyNY6eVNAa02/OtM7RiQYGD3QKu6E6z qeLQ+WTk9jRCsA2SHYhhoCBArk6i+uaWuBd1IZ5T2XlBydTzGpQ5I3iYzoBcgFPa KuPhN5ky8+uuP/Bo8Wo2zVTg8RNQRV4P5+kr1HTh7nO4NEiXfCzsbSEKfsyVMWAl OGuO4qry4LqaqzFdDS+Ls2B3z9jPKnoKly87O+0SjHOhF4bMBvpL+zvqYxHb67lQ 4CyCHKl2LZko3trB/aPbQs8kjEtJ+OCXajbJVy32VX0CX6AYaQVBGJEVttOlCA5v Vohnv+Uwcc9daN7MPinhPyfK02A4pC2KlCNyQJ7/cnN/Gk5Er4XyTHd0EIGYzfb9 UY4hXpxOk9YpOn0uSGJQqdzRJU1nGgV29ZUEPTg/Wr94tZLGgPhk5p9azbkUgXlp UHE0iZlTkViSy0TfX0PPXOjXjC1ZEUTT8rO6AlteOeOeuA3X+vPLRgmkCF5IswB8 BJZozhmR/hLwX2UemNGQDfHCpfW7o5xx0QDyy3b57U6W9qZyvMlixs13A+Zp9d2r RGDYWuJeMGA8 =W0DD -----END PGP SIGNATURE----- Merge tag 'acpi-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Unbreak the ACPI NFIT driver after a recent change that inadvertently altered its behavior (Xiang Chen)" * tag 'acpi-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: NFIT: Install Notify() handler before getting NFIT table	2023-10-25 07:51:56 -10:00
Petr Tesarik	d5090484b0	swiotlb: do not try to allocate a TLB bigger than MAX_ORDER pages When allocating a new pool at runtime, reduce the number of slabs so that the allocation order is at most MAX_ORDER. This avoids a kernel warning in __alloc_pages(). The warning is relatively benign, because the pool size is subsequently reduced when allocation fails, but it is silly to start with a request that is known to fail, especially since this is the default behavior if the kernel is built with CONFIG_SWIOTLB_DYNAMIC=y and booted without any swiotlb= parameter. Reported-by: Ben Greear <greearb@candelatech.com> Closes: https://lore.kernel.org/netdev/4f173dd2-324a-0240-ff8d-abf5c191be18@candelatech.com/ Fixes: `1aaa736815` ("swiotlb: allocate a new memory pool when existing pools are full") Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2023-10-25 16:26:20 +02:00
Jens Axboe	838b35bb6a	io_uring/rw: disable IOCB_DIO_CALLER_COMP If an application does O_DIRECT writes with io_uring and the file system supports IOCB_DIO_CALLER_COMP, then completions of the dio write side is done from the task_work that will post the completion event for said write as well. Whenever a dio write is done against a file, the inode i_dio_count is elevated. This enables other callers to use inode_dio_wait() to wait for previous writes to complete. If we defer the full dio completion to task_work, we are dependent on that task_work being run before the inode i_dio_count can be decremented. If the same task that issues io_uring dio writes with IOCB_DIO_CALLER_COMP performs a synchronous system call that calls inode_dio_wait(), then we can deadlock as we're blocked sleeping on the event to become true, but not processing the completions that will result in the inode i_dio_count being decremented. Until we can guarantee that this is the case, then disable the deferred caller completions. Fixes: `099ada2c87` ("io_uring/rw: add write support for IOCB_DIO_CALLER_COMP") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-10-25 08:02:29 -06:00
Mario Limonciello	64ffd2f1d0	drm/amd: Disable ASPM for VI w/ all Intel systems Originally we were quirking ASPM disabled specifically for VI when used with Alder Lake, but it appears to have problems with Rocket Lake as well. Like we've done in the case of dpm for newer platforms, disable ASPM for all Intel systems. Cc: stable@vger.kernel.org # 5.15+ Fixes: `0064b0ce85` ("drm/amd/pm: enable ASPM by default") Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-25 09:53:17 -04:00
Jens Axboe	7644b1a1c9	io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid We could race with SQ thread exit, and if we do, we'll hit a NULL pointer dereference when the thread is cleared. Grab the SQPOLL data lock before attempting to get the task cpu and pid for fdinfo, this ensures we have a stable view of it. Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032 Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-10-25 07:44:14 -06:00
Umesh Nerlige Ramappa	4cbed7702e	drm/i915/pmu: Check if pmu is closed before stopping event When the driver unbinds, pmu is unregistered and i915->uabi_engines is set to RB_ROOT. Due to this, when i915 PMU tries to stop the engine events, it issues a warn_on because engine lookup fails. All perf hooks are taking care of this using a pmu->closed flag that is set when PMU unregisters. The stop event seems to have been left out. Check for pmu->closed in pmu_event_stop as well. Based on discussion here - https://patchwork.freedesktop.org/patch/492079/?series=105790&rev=2 v2: s/is/if/ in commit title v3: Add fixes tag and cc stable Cc: <stable@vger.kernel.org> # v5.11+ Fixes: `b00bccb3f0` ("drm/i915/pmu: Handle PCI unbind") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020152441.3764850-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit `31f6a06f0c`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-10-25 08:44:30 -04:00
Matt Roper	78cc55e0b6	drm/i915/mcr: Hold GT forcewake during steering operations The steering control and semaphore registers are inside an "always on" power domain with respect to RC6. However there are some issues if higher-level platform sleep states are entering/exiting at the same time these registers are accessed. Grabbing GT forcewake and holding it over the entire lock/steer/unlock cycle ensures that those sleep states have been fully exited before we access these registers. This is expected to become a formally documented/numbered workaround soon. Note that this patch alone isn't expected to have an immediately noticeable impact on MCR (mis)behavior; an upcoming pcode firmware update will also be necessary to provide the other half of this workaround. v2: - Move the forcewake inside the Xe_LPG-specific IP version check. This should only be necessary on platforms that have a steering semaphore. Fixes: `3100240bf8` ("drm/i915/mtl: Add hardware-level lock for steering") Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231019170241.2102037-2-matthew.d.roper@intel.com (cherry picked from commit `8fa1c7cd1f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-10-25 08:44:26 -04:00
Sui Jingfeng	4e6c38c387	drm/logicvc: Kconfig: select REGMAP and REGMAP_MMIO drm/logicvc driver is depend on REGMAP and REGMAP_MMIO, should select this two kconfig option, otherwise the driver failed to compile on platform without REGMAP_MMIO selected: ERROR: modpost: "__devm_regmap_init_mmio_clk" [drivers/gpu/drm/logicvc/logicvc-drm.ko] undefined! make[1]: * [scripts/Makefile.modpost:136: Module.symvers] Error 1 make: * [Makefile:1978: modpost] Error 2 Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Fixes: `efeeaefe9b` ("drm: Add support for the LogiCVC display controller") Link: https://patchwork.freedesktop.org/patch/msgid/20230608024207.581401-1-suijingfeng@loongson.cn Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2023-10-25 12:05:25 +02:00
Deming Wang	1711435e3e	net: ipv6: fix typo in comments The word "advertize" should be replaced by "advertise". Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-25 10:38:07 +01:00
Deming Wang	197f9fba96	net: ipv4: fix typo in comments The word "advertize" should be replaced by "advertise". Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-25 10:38:07 +01:00
Vlad Buslov	a63b662212	net/sched: act_ct: additional checks for outdated flows Current nf_flow_is_outdated() implementation considers any flow table flow which state diverged from its underlying CT connection status for teardown which can be problematic in the following cases: - Flow has never been offloaded to hardware in the first place either because flow table has hardware offload disabled (flag NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add' workqueue to be offloaded for the first time. The former is incorrect, the later generates excessive deletions and additions of flows. - Flow is already pending to be updated on the workqueue. Tearing down such flows will also generate excessive removals from the flow table, especially on highly loaded system where the latency to re-offload a flow via 'add' workqueue can be quite high. When considering a flow for teardown as outdated verify that it is both offloaded to hardware and doesn't have any pending updates. Fixes: `41f2c7c342` ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-10-25 11:35:57 +02:00
Pablo Neira Ayuso	735795f68b	netfilter: flowtable: GC pushes back packets to classic path Since `41f2c7c342` ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY back to classic path in every run, ie. every second. This is because of a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct. In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on and IPS_SEEN_REPLY is unreliable since users decide when to offload the flow before, such bit might be set on at a later stage. Fix it by adding a custom .gc handler that sched/act_ct can use to deal with its NF_FLOW_HW_ESTABLISHED bit. Fixes: `41f2c7c342` ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reported-by: Vladimir Smelhaus <vl.sm@email.cz> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-10-25 11:35:46 +02:00
Aneesh Kumar K.V	47b8def935	powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes With commit `9fee28baa6` ("powerpc: implement the new page table range API") we added set_ptes to powerpc architecture. The implementation included calling arch_enter/leave_lazy_mmu() calls. The patch removes the usage of arch_enter/leave_lazy_mmu() because set_pte is not supposed to be used when updating a pte entry. Powerpc architecture uses this rule to skip the expensive tlb invalidate which is not needed when you are setting up the pte for the first time. See commit `56eecdb912` ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit") for more details The patch also makes sure we are not using the interface to update a valid/present pte entry by adding VM_WARN_ON check all the ptes we are setting up. Furthermore, we add a comment to set_pte_filter to clarify it can only update folio-related flags and cannot filter pfn specific details in pte filtering. Removal of arch_enter/leave_lazy_mmu() also will avoid nesting of these functions that are not supported. For ex: remap_pte_range() -> arch_enter_lazy_mmu() -> set_ptes() -> arch_enter_lazy_mmu() -> arch_leave_lazy_mmu() -> arch_leave_lazy_mmu() Fixes: `9fee28baa6` ("powerpc: implement the new page table range API") Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231024143604.16749-1-aneesh.kumar@linux.ibm.com	2023-10-25 16:08:46 +11:00
Ivan Vecera	77a8c982ff	i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR The I40E_TXR_FLAGS_WB_ON_ITR is i40e_ring flag and not i40e_pf one. Fixes: `8e0764b4d6` ("i40e/i40evf: Add support for writeback on ITR feature for X722") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231023212714.178032-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-24 17:03:29 -07:00
Jakub Kicinski	00d67093e4	Three more fixes: - don't drop all unprotected public action frames since some don't have a protected dual - fix pointer confusion in scanning code - fix warning in some connections with multiple links -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmU3naIACgkQ10qiO8sP aADUNA//clFAsaH6A94tD6Hgyafi9idBpsHERkYE4RaGmiID34yOYDInvkoDmsy1 WG7wEdNjnsYDBrX0eG1x7WSrQRLhs76U0HnBP9tFYIeygnLuul2/UNRFkK6EwQfn OJbVJdjQdL/c8p129DUr5JKhavbc4ovY2acECLRY54n1fAYlnn6u7SWsOyCu0zl7 wXSQ5pzYHu5lFM5LSFj6mC7U8b/aFQ5r9XsNHwGz4YVvd5cEdLYc/y5bAK6xAIxz jJcJLV088QikAcYmgIS7MNQuKrMudNjCEDWtqM23N9pO//QjsbOag2q02JmfxWyv 4YJy42G/0K0/wjwCpIZig2OOE5iKDKCJ+dvNBUaCnnTn1ARQSXSDgAYkmWReCrZu DUpvn8Be3fgULtaUC0QQ3R1oCTVJyKYTD55Ofcy3Pj1qt+1lmhgLp1qyezemJcfJ p2sv5GLwyPLcOUTjeqTgP57xoJl2JV9vUVey9xvk2dMl0vS5qpfIf3FR7R0+HtlZ UIrveQOLMsKAaamk59RaSpfg4vJoCqaabu97f/lHRc5WdaeURSlUw0rU9xdqc/P+ GTX7ubKoMiMEx11v25JdTE3eniFGxu28cojqScryvFo6WIlkYp4cbNxtRb4i9rOX ZJXQWCE6YJZ90VlR/a8lpJnTXjntQT5vBtH7vhqAneN2TJN74h8= =AvmQ -----END PGP SIGNATURE----- Merge tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Three more fixes: - don't drop all unprotected public action frames since some don't have a protected dual - fix pointer confusion in scanning code - fix warning in some connections with multiple links * tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() ==================== Link: https://lore.kernel.org/r/20231024103540.19198-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-24 13:10:53 -07:00
Linus Torvalds	4f82870119	20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues or aren't considered necessary for earlier kernel versions. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZTfz/QAKCRDdBJ7gKXxA joMyAP99hLaLYeJbjlf+4tLJzhlpbVoFra1ieun2D+ZgFE78xQD/T4T3PYrZhYqD WdrxGT9fiKOykXM5pmQRH9Zr4EvJBA0= =Obbk -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues or aren't considered necessary for earlier kernel versions" * tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: add GFP_KERNEL to allocations in mas_expected_entries() selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier mailmap: correct email aliasing for Oleksij Rempel mailmap: map Bartosz's old address to the current one mm/damon/sysfs: check DAMOS regions update progress from before_terminate() MAINTAINERS: Ondrej has moved kasan: disable kasan_non_canonical_hook() for HW tags kasan: print the original fault addr when access invalid shadow hugetlbfs: close race between MADV_DONTNEED and page fault hugetlbfs: extend hugetlb_vma_lock to private VMAs hugetlbfs: clear resv_map pointer if mmap fails mm: zswap: fix pool refcount bug around shrink_worker() mm/migrate: fix do_pages_move for compat pointers riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set riscv: handle VM_FAULT_[HWPOISON\|HWPOISON_LARGE] faults instead of panicking mmap: fix error paths with dup_anon_vma() mmap: fix vma_iterator in error path of vma_merge() mm: fix vm_brk_flags() to not bail out while holding lock mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer mm/page_alloc: correct start page when guard page debug is enabled	2023-10-24 09:52:16 -10:00
Jinjie Ruan	28926daf73	fpga: Fix memory leak for fpga_region_test_class_find() fpga_region_class_find() in fpga_region_test_class_find() will call get_device() if the data is matched, which will increment refcount for dev->kobj, so it should call put_device() to decrement refcount for dev->kobj to free the region, because fpga_region_unregister() will call fpga_region_dev_release() only when the refcount for dev->kobj is zero but fpga_region_test_init() call device_register() in fpga_region_register_full(), which also increment refcount. So call put_device() after calling fpga_region_class_find() in fpga_region_test_class_find(). After applying this patch, the following memory leak is never detected. unreferenced object 0xffff88810c8ef000 (size 1024): comm "kunit_try_catch", pid 1875, jiffies 4294715298 (age 836.836s) hex dump (first 32 bytes): b8 d1 fb 05 81 88 ff ff 08 f0 8e 0c 81 88 ff ff ................ 08 f0 8e 0c 81 88 ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff817ebad7>] kmalloc_trace+0x27/0xa0 [<ffffffffa02385e1>] fpga_region_register_full+0x51/0x430 [fpga_region] [<ffffffffa0228e47>] 0xffffffffa0228e47 [<ffffffff829c479d>] kunit_try_run_case+0xdd/0x250 [<ffffffff829c9f2a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81238b85>] kthread+0x2b5/0x380 [<ffffffff81097ded>] ret_from_fork+0x2d/0x70 [<ffffffff810034d1>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888105fbd1b8 (size 8): comm "kunit_try_catch", pid 1875, jiffies 4294715298 (age 836.836s) hex dump (first 8 bytes): 72 65 67 69 6f 6e 30 00 region0. backtrace: [<ffffffff817ec023>] __kmalloc_node_track_caller+0x53/0x150 [<ffffffff82995590>] kvasprintf+0xb0/0x130 [<ffffffff83f713b1>] kobject_set_name_vargs+0x41/0x110 [<ffffffff8304ac1b>] dev_set_name+0xab/0xe0 [<ffffffffa02388a2>] fpga_region_register_full+0x312/0x430 [fpga_region] [<ffffffffa0228e47>] 0xffffffffa0228e47 [<ffffffff829c479d>] kunit_try_run_case+0xdd/0x250 [<ffffffff829c9f2a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81238b85>] kthread+0x2b5/0x380 [<ffffffff81097ded>] ret_from_fork+0x2d/0x70 [<ffffffff810034d1>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810b3b8a00 (size 256): comm "kunit_try_catch", pid 1875, jiffies 4294715298 (age 836.836s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 8a 3b 0b 81 88 ff ff ..........;..... 08 8a 3b 0b 81 88 ff ff e0 ac 04 83 ff ff ff ff ..;............. backtrace: [<ffffffff817ebad7>] kmalloc_trace+0x27/0xa0 [<ffffffff83056d7a>] device_add+0xa2a/0x15e0 [<ffffffffa02388b1>] fpga_region_register_full+0x321/0x430 [fpga_region] [<ffffffffa0228e47>] 0xffffffffa0228e47 [<ffffffff829c479d>] kunit_try_run_case+0xdd/0x250 [<ffffffff829c9f2a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81238b85>] kthread+0x2b5/0x380 [<ffffffff81097ded>] ret_from_fork+0x2d/0x70 [<ffffffff810034d1>] ret_from_fork_asm+0x11/0x20 Fixes: `64a5f972c9` ("fpga: add an initial KUnit suite for the FPGA Region") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Marco Pagani <marpagan@redhat.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20231007094321.3447084-1-ruanjinjie@huawei.com [yilun.xu@intel.com: slightly changes the commit message] Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Link: https://lore.kernel.org/r/20231023032857.902699-3-yilun.xu@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-10-24 19:32:39 +02:00
Russ Weight	1e55c5200d	fpga: m10bmc-sec: Change contact for secure update driver Change the maintainer for the Intel MAX10 BMC Secure Update driver from Russ Weight to Peter Colberg. Update the ABI documentation contact information as well. Signed-off-by: Russ Weight <russell.h.weight@intel.com> Acked-by: Peter Colberg <peter.colberg@intel.com> Link: https://lore.kernel.org/r/20230928164753.278684-1-russell.h.weight@intel.com Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Link: https://lore.kernel.org/r/20231023032857.902699-2-yilun.xu@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-10-24 19:32:39 +02:00
Umesh Nerlige Ramappa	cba94bbcff	drm/i915/perf: Determine context valid in OA reports When supporting OA for TGL, it was seen that the context valid bit in the report ID was not defined, however revisiting the spec seems to have this bit defined. The bit is used to determine if a context is valid on a context switch and is essential to determine active and idle periods for a context. Re-enable the context valid bit for gen12 platforms. BSpec: 52196 (description of report_id) v2: Include BSpec reference (Ashutosh) Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230802202854.1224547-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit `7eeaedf799`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-10-24 09:41:56 -04:00
Peter Zijlstra	a71ef31485	perf/core: Fix potential NULL deref Smatch is awesome. Fixes: `32671e3799` ("perf: Disallow mis-matched inherited group reads") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2023-10-24 12:15:12 +02:00
Paolo Abeni	cd8892c078	Merge branch 'gtp-tunnel-driver-fixes' Pablo Neira Ayuso says: ==================== GTP tunnel driver fixes The following patchset contains two fixes for the GTP tunnel driver: 1) Incorrect GTPA_MAX definition in UAPI headers. This is updating an existing UAPI definition but for a good reason, this is certainly broken. Similar fixes for incorrect _MAX definition in netlink headers were applied in the past too. 2) Fix GTP driver PMTU with GRO packets, add missing call to skb_gso_validate_network_len() to handle GRO packets. ==================== Link: https://lore.kernel.org/r/20231022202519.659526-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-24 12:02:04 +02:00
Pablo Neira Ayuso	4530e5b8e2	gtp: fix fragmentation needed check with gso Call skb_gso_validate_network_len() to check if packet is over PMTU. Fixes: `459aa660eb` ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-24 12:02:02 +02:00
Pablo Neira Ayuso	adc8df12d9	gtp: uapi: fix GTPA_MAX Subtract one to __GTPA_MAX, otherwise GTPA_MAX is off by 2. Fixes: `459aa660eb` ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-24 12:02:02 +02:00
Ian Kent	d3c5006176	autofs: fix add autofs_parse_fd() We are seeing systemd hang on its autofs direct mount at /proc/sys/fs/binfmt_misc. Historically this was due to a mismatch in the communication structure size between a 64 bit kernel and a 32 bit user space and was fixed by making the pipe communication record oriented. During autofs v5 development I decided to stay with the existing usage instead of changing to a packed structure for autofs <=> user space communications which turned out to be a mistake on my part. Problems arose and they were fixed by allowing for the 64 bit to 32 bit size difference in the automount(8) code. Along the way systemd started to use autofs and eventually encountered this problem too. systemd refused to compensate for the length difference insisting it be fixed in the kernel. Fortunately Linus implemented the packetized pipe which resolved the problem in a straight forward and simple way. In the autofs mount api conversion series I inadvertatly dropped the packet pipe flag settings when adding the autofs_parse_fd() function. This patch fixes that omission. Fixes: `546694b8f6` ("autofs: add autofs_parse_fd()") Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20231023093359.64265-1-raven@themaw.net Tested-by: Anders Roxell <anders.roxell@linaro.org> Cc: Bill O'Donnell <bodonnel@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 11:04:45 +02:00
Anjali Kulkarni	9644bc4970	Fix NULL pointer dereference in cn_filter() Check that sk_user_data is not NULL, else return from cn_filter(). Could not reproduce this issue, but Oliver Sang verified it has fixed the "Closes" problem below. Fixes: `2aa1f7a1f4` ("connector/cn_proc: Add filtering to fix some bugs") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202309201456.84c19e27-oliver.sang@intel.com/ Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Link: https://lore.kernel.org/r/20231020234058.2232347-1-anjali.k.kulkarni@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-24 10:53:45 +02:00
Bernd Schubert	c04d905f6c	vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups The calling code actually handles -ECHILD, so this BUG_ON can be converted to WARN_ON_ONCE. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Link: https://lore.kernel.org/r/20231023184718.11143-1-bschubert@ddn.com Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 10:51:05 +02:00
Peter Zijlstra	984ffb6a43	sched/fair: Remove SIS_PROP SIS_UTIL seems to work well, lets remove the old thing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20231020134337.GD33965@noisy.programming.kicks-ass.net	2023-10-24 10:38:44 +02:00
Yicong Yang	22165f61d0	sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup Chen Yu reports a hackbench regression of cluster wakeup when hackbench threads equal to the CPU number [1]. Analysis shows it's because we wake up more on the target CPU even if the prev_cpu is a good wakeup candidate and leads to the decrease of the CPU utilization. Generally if the task's prev_cpu is idle we'll wake up the task on it without scanning. On cluster machines we'll try to wake up the task in the same cluster of the target for better cache affinity, so if the prev_cpu is idle but not sharing the same cluster with the target we'll still try to find an idle CPU within the cluster. This will improve the performance at low loads on cluster machines. But in the issue above, if the prev_cpu is idle but not in the cluster with the target CPU, we'll try to scan an idle one in the cluster. But since the system is busy, we're likely to fail the scanning and use target instead, even if the prev_cpu is idle. Then leads to the regression. This patch solves this in 2 steps: o record the prev_cpu/recent_used_cpu if they're good wakeup candidates but not sharing the cluster with the target. o on scanning failure use the prev_cpu/recent_used_cpu if they're recorded as idle [1] https://lore.kernel.org/all/ZGzDLuVaHR1PAYDt@chenyu5-mobl1/ Closes: https://lore.kernel.org/all/ZGsLy83wPIpamy6x@chenyu5-mobl1/ Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Tested-and-reviewed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20231019033323.54147-4-yangyicong@huawei.com	2023-10-24 10:38:43 +02:00
Barry Song	8881e1639f	sched/fair: Scan cluster before scanning LLC in wake-up path For platforms having clusters like Kunpeng920, CPUs within the same cluster have lower latency when synchronizing and accessing shared resources like cache. Thus, this patch tries to find an idle cpu within the cluster of the target CPU before scanning the whole LLC to gain lower latency. This will be implemented in 2 steps in select_idle_sibling(): 1. When the prev_cpu/recent_used_cpu are good wakeup candidates, use them if they're sharing cluster with the target CPU. Otherwise trying to scan for an idle CPU in the target's cluster. 2. Scanning the cluster prior to the LLC of the target CPU for an idle CPU to wakeup. Testing has been done on Kunpeng920 by pinning tasks to one numa and two numa. On Kunpeng920, Each numa has 8 clusters and each cluster has 4 CPUs. With this patch, We noticed enhancement on tbench and netperf within one numa or cross two numa on top of tip-sched-core commit 9b46f1abc6d4 ("sched/debug: Print 'tgid' in sched_show_task()") tbench results (node 0): baseline patched 1: 327.2833 372.4623 ( 13.80%) 4: 1320.5933 1479.8833 ( 12.06%) 8: 2638.4867 2921.5267 ( 10.73%) 16: 5282.7133 5891.5633 ( 11.53%) 32: 9810.6733 9877.3400 ( 0.68%) 64: 7408.9367 7447.9900 ( 0.53%) 128: 6203.2600 6191.6500 ( -0.19%) tbench results (node 0-1): baseline patched 1: 332.0433 372.7223 ( 12.25%) 4: 1325.4667 1477.6733 ( 11.48%) 8: 2622.9433 2897.9967 ( 10.49%) 16: 5218.6100 5878.2967 ( 12.64%) 32: 10211.7000 11494.4000 ( 12.56%) 64: 13313.7333 16740.0333 ( 25.74%) 128: 13959.1000 14533.9000 ( 4.12%) netperf results TCP_RR (node 0): baseline patched 1: 76546.5033 90649.9867 ( 18.42%) 4: 77292.4450 90932.7175 ( 17.65%) 8: 77367.7254 90882.3467 ( 17.47%) 16: 78519.9048 90938.8344 ( 15.82%) 32: 72169.5035 72851.6730 ( 0.95%) 64: 25911.2457 25882.2315 ( -0.11%) 128: 10752.6572 10768.6038 ( 0.15%) netperf results TCP_RR (node 0-1): baseline patched 1: 76857.6667 90892.2767 ( 18.26%) 4: 78236.6475 90767.3017 ( 16.02%) 8: 77929.6096 90684.1633 ( 16.37%) 16: 77438.5873 90502.5787 ( 16.87%) 32: 74205.6635 88301.5612 ( 19.00%) 64: 69827.8535 71787.6706 ( 2.81%) 128: 25281.4366 25771.3023 ( 1.94%) netperf results UDP_RR (node 0): baseline patched 1: 96869.8400 110800.8467 ( 14.38%) 4: 97744.9750 109680.5425 ( 12.21%) 8: 98783.9863 110409.9637 ( 11.77%) 16: 99575.0235 110636.2435 ( 11.11%) 32: 95044.7250 97622.8887 ( 2.71%) 64: 32925.2146 32644.4991 ( -0.85%) 128: 12859.2343 12824.0051 ( -0.27%) netperf results UDP_RR (node 0-1): baseline patched 1: 97202.4733 110190.1200 ( 13.36%) 4: 95954.0558 106245.7258 ( 10.73%) 8: 96277.1958 105206.5304 ( 9.27%) 16: 97692.7810 107927.2125 ( 10.48%) 32: 79999.6702 103550.2999 ( 29.44%) 64: 80592.7413 87284.0856 ( 8.30%) 128: 27701.5770 29914.5820 ( 7.99%) Note neither Kunpeng920 nor x86 Jacobsville supports SMT, so the SMT branch in the code has not been tested but it supposed to work. Chen Yu also noticed this will improve the performance of tbench and netperf on a 24 CPUs Jacobsville machine, there are 4 CPUs in one cluster sharing L2 Cache. [https://lore.kernel.org/lkml/Ytfjs+m1kUs0ScSn@worktop.programming.kicks-ass.net] Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-and-reviewed-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lkml.kernel.org/r/20231019033323.54147-3-yangyicong@huawei.com	2023-10-24 10:38:43 +02:00
Barry Song	b95303e0ae	sched: Add cpus_share_resources API Add cpus_share_resources() API. This is the preparation for the optimization of select_idle_cpu() on platforms with cluster scheduler level. On a machine with clusters cpus_share_resources() will test whether two cpus are within the same cluster. On a non-cluster machine it will behaves the same as cpus_share_cache(). So we use "resources" here for cache resources. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-and-reviewed-by: Chen Yu <yu.c.chen@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20231019033323.54147-2-yangyicong@huawei.com	2023-10-24 10:38:42 +02:00
Hao Jia	5ebde09d91	sched/core: Fix RQCF_ACT_SKIP leak Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning. This warning may be triggered in the following situations: CPU0 CPU1 __schedule() rq->clock_update_flags <<= 1; unregister_fair_sched_group() pick_next_task_fair+0x4a/0x410 destroy_cfs_bandwidth() newidle_balance+0x115/0x3e0 for_each_possible_cpu(i) i=0 rq_unpin_lock(this_rq, rf) __cfsb_csd_unthrottle() raw_spin_rq_unlock(this_rq) rq_lock(CPU0_rq, &rf) rq_clock_start_loop_update() rq->clock_update_flags & RQCF_ACT_SKIP <-- raw_spin_rq_lock(this_rq) The purpose of RQCF_ACT_SKIP is to skip the update rq clock, but the update is very early in __schedule(), but we clear RQCF__SKIP very late, causing it to span that gap above and triggering this warning. In __schedule() we can clear the RQCF__SKIP flag immediately after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning. And set rq->clock_update_flags to RQCF_UPDATED to avoid rq->clock_update_flags < RQCF_ACT_SKIP warning that may be triggered later. Fixes: `ebb83d84e4` ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()") Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com Reported-by: Igor Raits <igor.raits@gmail.com> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com	2023-10-24 10:38:42 +02:00
Linus Torvalds	d88520ad73	fix for lock_rename() misuse in nfsd -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZS4N8wAKCRBZ7Krx/gZQ 65q9AQDhucfo26czFALs6aOceZ1K+FUu3OzgU0gbQaCCLhuubwD/Uu3GXL2KrVaj uMk7Wv6a68/j1VXwtNMpSb0MV09j/wM= =xKoB -----END PGP SIGNATURE----- Merge tag 'pull-nfsd-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nfsd fix from Al Viro: "Catch from lock_rename() audit; nfsd_rename() checked that both directories belonged to the same filesystem, but only after having done lock_rename(). Trivial fix, tested and acked by nfs folks" * tag 'pull-nfsd-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: lock_rename() needs both directories to live on the same fs	2023-10-23 20:40:04 -10:00
Linus Torvalds	84186fcb83	Urgent pull request for nolibc into v6.6 This pull request contains the following fixes: o tools/nolibc: i386: Fix a stack misalign bug on _start o MAINTAINERS: nolibc: update tree location o tools/nolibc: mark start_c as weak to avoid linker errors -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmUtvC8THHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jMyED/9nsHjSYKUvzdn8kb8Xjr+OkUlx6DCl ITRqAScxl/Q+TTAKTSL508b/fVB56+h0ZmqOHeV0+askVI9c3G2wmLYCJ06P1bpI siy6pqBtcaDVvU38ielbAVYtAeSahj9Jro44gCwBD9OE2TPi4ehl7PMIsX1vG39a hmlbOSw3GG6jFHc5HkTlrOiOy1UB7oIPFI7qfH0XsKJ35vvmDSWPpiHIGwZyx3iv hInVPV4kEBREAXONjru7Ginn9dnxZXFqOwQeqW3ZfYudDUHeKzLMrtYsE6pqZxRP UEyFiI6bhr2fDPoXHGYSm63OrYAm3uZ0jsgC72ZjrLH2ISY0oCuGGf6HP5SjkRfP jPcqMD5h3K+aEHLN3XZV0v3FwelE3qjHVcDQhpu+nJCNDlWK3DNMOjwadynqDOHJ 5FIbusDk5h4rCgOh607zuRPBv0EmtLw3oXGzBzLl8bqRaj58iZP1Te+tnGZEYVMj YydEqXPlKWNaSeBL82gyCpWneYT+vdMjJdbl9b/EKXQxogYLFx4yC4z3h+7K7G56 InDXbxBu0BIMYMgJTWn7nsBgdjenho4PUrper3v6VMr6TKuXuEzmhpqgovaHLM1g ITmdl/+ExPBUBI8u2s1qqLIdEFe5SXkOhFpYnC1E7RsS+GRCho9XCmtfcaPWfiU5 KcFVXKBlIJ4cNQ== =PDcP -----END PGP SIGNATURE----- Merge tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc fixes from Paul McKenney: - tools/nolibc: i386: Fix a stack misalign bug on _start - MAINTAINERS: nolibc: update tree location - tools/nolibc: mark start_c as weak to avoid linker errors * tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/nolibc: mark start_c as weak MAINTAINERS: nolibc: update tree location tools/nolibc: i386: Fix a stack misalign bug on _start	2023-10-23 14:19:11 -10:00
Pieter Jansen van Vuuren	d788c93383	sfc: cleanup and reduce netlink error messages Reduce the length of netlink error messages as they are likely to be truncated anyway. Additionally, reword netlink error messages so they are more consistent with previous messages. Fixes: `9dbc8d2b9a` ("sfc: add decrement ipv6 hop limit by offloading set hop limit actions") Fixes: `3c9561c0a5` ("sfc: support TC decap rules matching on enc_ip_tos") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310202136.4u7bv0hp-lkp@intel.com/ Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20231020140149.30490-1-pieter.jansen-van-vuuren@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-23 15:47:47 -07:00
Arnd Bergmann	291c0d3a98	mvebu fixes for 6.6 (part 1) Update MAINTAINERS for eDPU board -----BEGIN PGP SIGNATURE----- iF0EABECAB0WIQQYqXDMF3cvSLY+g9cLBhiOFHI71QUCZTFFCwAKCRALBhiOFHI7 1RJkAKCMO2Kg8LWDDOnwcfdCkc20dXiw6gCgjCtxy0ZVkJ9Lh+OKE42KcfTUpsY= =pipD -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmU2wycACgkQYKtH/8kJ UidjHg/9HmfBoFlOALtHBhJlgMN21t5ANFsmP7V2KXd+20QYX1L6S5PVCS1Fv1Vm NCZPY4xrLfbX4VRKUfJkFO716ReT2I59vdXeLhbtIT2c7WgY3Ne5MaD4jQWb2e50 sX/Rgwj7pt7gdzcQVq10yvqRD6zLfsbDQqKwbcmfQ5YSRIQgTeuZt27u+nza7dDQ xMiMgKmjlycoGM3cwkK1Zx1orIH7Gf5OOK6+XljaAseOyx0WD/3A44IkmZxq40SA AVIPRyP3tx3ozuaYAAS0KzkdvSMWpYxU4gg++VWCE94tz6UqRHEPKFi0h0ZK+5wk jufjDgav6ngR84N2DZiXnUnZQ/qvZDNs8N+O1nMq3Mf6YZNKo4B+M5lFRaFEh5SZ ywBKxumrzC2FecImWPfp6581Fh15VlmCOeFTYuVi1s/n0I1sPNbzwUXzrCVTVKd+ gQsnpwBh2exswHcOruAd74I+HeDCKBC3jv6vVE9bRshSV2ODeFNxdJR5YUl7reQM Y7NaWibXxg95UgTNn0FtVcpqI1smvXsZ6bpaLkZu/wA6sxtScSPfog4wjvu/Hm0s L9drJ7sW0JjGDyb2Sc9q+bogvBSAZDtYp/AgJyaIaINj8ECrXy7MY0QoambW2o5X 7Ta4RPHPPIJcnzke6CBjAyXsDCfTBGQBbKJSJfPQip7l6QvYS2c= =+uXX -----END PGP SIGNATURE----- Merge tag 'mvebu-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/fixes mvebu fixes for 6.6 (part 1) Update MAINTAINERS for eDPU board * tag 'mvebu-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: MAINTAINERS: uDPU: add remaining Methode boards MAINTAINERS: uDPU: make myself maintainer of it Link: https://lore.kernel.org/r/875y32abqe.fsf@BL-laptop Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-10-23 21:01:59 +02:00
Linus Torvalds	e017769f4c	for-6.6-rc7-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmU2lLEACgkQxWXV+ddt WDvCThAApe+zMNdEhQ/cgrvfzP/X91Q53PXQsdVsrujPyUV8eEV4oJzEwVbJhRdw 3ukIQtvyAMNiWhEBhOQRwxjuUoTCApGAeEEEl1cWWEqQ7G2/2LS4+bcWzgQ3Vu32 dzYL37ddsfe4n7OgfnymtMrnv7kge0XbAlY3GbavaDccZDQDqcD5wSAOyOhfIsH7 kcu4sA5Fi44wVSfAJX1Dms+wXfsmQu/sd3c9Gcyce9Hpy1cEW3vWbApLBE4K0aKX /JHTdmkAJ20a4APQsfGH+UymyuZgr8d2eGmL9rVYKhT/c+Dow0lNAWYkvGf/MawM CX3GdP6f6ZOR/anCPZ8nqZCE5AoFykGazvpCCSrvCOpU7o7GqxbAQkWWFcMp1FHW 9TFrj81WK18DeCfCNw7lR3sdMy/2o2nnSUAw3DFY4n/3Lek7FUmrBTHvXlWDot7T TM9CzYGF840QhL5s5SMYS09YmeI0I34L7HJAi/+qli48SooGuL9RZ29TmzHIX69Y 2bgpS64j06p/AGEnfHAcT1LbpiFCPmO5cpXKv/t40GL5QO5d4WV698ysDGoPYUPO 8CPL85Y8cao56KGJLyOroGz0P1bo+RdNe5bN6xJJoTRn1Y9oUA+bQSnN8x9iuunF 9QZrAIHzNyDcRGzoqgDW+3bivOvIus/Dto/u1P3ap68kP2HTVsY= =gOyi -----END PGP SIGNATURE----- Merge tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix for a problem with snapshot of a newly created subvolume that can lead to inconsistent data under some circumstances. Kernel 6.5 added a performance optimization to skip transaction commit for subvolume creation but this could end up with newer data on disk but not linked to other structures. The fix itself is an added condition, the rest of the patch is a parameter added to several functions" * tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix unwritten extent buffer after snapshotting a new subvolume	2023-10-23 07:59:13 -10:00
Linus Torvalds	7c14564010	virtio: last minute fixes a collection of small fixes that look like worth having in this release. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmUv/JMPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpO/kH/j/uunE6oOE/BhtfO1USciebjRhLJ7lvoAvS OD4/bcA45GRGLGIZaJtkcCIOOb9djUWLsS3QqA2UUFX+NN2/teEX6lsnv1tJTjdC a2DkDS6AVYwp+rpzxSE5PUn/ImpiDt0/+R0ZbN56R3rHTOl7nFeXvutMbzxNXZvL eWLcSDmRg7nmAdF+YbZ5omdgSL11Wi+dBFEJ0unEsecyu8pO7WcAGYvU6x/x04XJ uLrjsaAGKr3rtoLLZ1DtnSmoED/b/lwDwzVR5REsg4kf2aiHxj1+kKGNXfrtqMl5 2OVxZEorcLufHM212LW4KT3Ncw4KE4xJzjt2mzEwO/ztgtomnBM= =Rhxy -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "A collection of small fixes that look like worth having in this release" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: fix the common cfg map size virtio-crypto: handle config changed by work queue vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE vdpa/mlx5: Fix firmware error on creation of 1k VQs virtio_balloon: Fix endless deflation and inflation on arm64 vdpa/mlx5: Fix double release of debugfs entry virtio-mmio: fix memory leak of vm_dev vdpa_sim_blk: Fix the potential leak of mgmt_dev tools/virtio: Add dma sync api for virtio test	2023-10-23 07:42:48 -10:00
Shubhrajyoti Datta	6f15b178cd	EDAC/versal: Add a Xilinx Versal memory controller driver Add a EDAC driver for the RAS capabilities on the Xilinx integrated DDR Memory Controllers (DDRMCs) which support both DDR4 and LPDDR4/4X memory interfaces. It has four programmable Network-on-Chip (NoC) interface ports and is designed to handle multiple streams of traffic. The driver reports correctable and uncorrectable errors, and also creates debugfs entries for testing through error injection. [ bp: - Add a pointer to the documentation about the register unlock code. - Squash in a fix for a Smatch static checker issue as reported by Dan Carpenter: https://lore.kernel.org/r/a4db6f93-8e5f-4d55-a7b8-b5a987d48a58@moroto.mountain ] Co-developed-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231005101242.14621-3-shubhrajyoti.datta@amd.com	2023-10-23 19:41:27 +02:00
Moritz Wanzenböck	7798b59409	net/handshake: fix file ref count in handshake_nl_accept_doit() If req->hr_proto->hp_accept() fail, we call fput() twice: Once in the error path, but also a second time because sock->file is at that point already associated with the file descriptor. Once the task exits, as it would probably do after receiving an error reading from netlink, the fd is closed, calling fput() a second time. To fix, we move installing the file after the error path for the hp_accept() call. In the case of errors we simply put the unused fd. In case of success we can use fd_install() to link the sock->file to the reserved fd. Fixes: `7ea9c1ec66` ("net/handshake: Fix handshake_dup() ref counting") Signed-off-by: Moritz Wanzenböck <moritz.wanzenboeck@linbit.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20231019125847.276443-1-moritz.wanzenboeck@linbit.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-23 10:19:33 -07:00
Will Deacon	60fd39af33	scripts/faddr2line: Skip over mapping symbols in output from readelf Mapping symbols emitted in the readelf output can confuse the 'faddr2line' symbol size calculation, resulting in the erroneous rejection of valid offsets. This is especially prevalent when building an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are prefixed with a 32-bit data value in a '$d.n' section. For example: 447538: ffff800080014b80 548 FUNC GLOBAL DEFAULT 2 do_one_initcall 104: ffff800080014c74 0 NOTYPE LOCAL DEFAULT 2 $x.73 106: ffff800080014d30 0 NOTYPE LOCAL DEFAULT 2 $x.75 111: ffff800080014da4 0 NOTYPE LOCAL DEFAULT 2 $d.78 112: ffff800080014da8 0 NOTYPE LOCAL DEFAULT 2 $x.79 36: ffff800080014de0 200 FUNC LOCAL DEFAULT 2 run_init_process Adding a warning to do_one_initcall() results in: \| WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260 Which 'faddr2line' refuses to accept: $ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260 skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224) no match for do_one_initcall+0xf4/0x260 Filter out these entries from readelf using a shell reimplementation of is_mapping_symbol(), so that the size of a symbol is calculated as a delta to the next symbol present in ksymtab. Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20231002165750.1661-4-will@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>	2023-10-23 08:36:46 -07:00
Will Deacon	86bf86e19d	scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1 GNU utilities cannot necessarily parse objects built by LLVM, which can result in confusing errors when using 'faddr2line': $ CROSS_COMPILE=aarch64-linux-gnu- ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260 aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn' aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25 do_one_initcall+0xf4/0x260: aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn' aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25 $x.73 at main.c:? Although this can be worked around by setting CROSS_COMPILE to "llvm=-", it's cleaner to follow the same syntax as the top-level Makefile and accept LLVM= as an indication to use the llvm- tools, optionally specifying their location or specific version number. Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20231002165750.1661-3-will@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>	2023-10-23 08:36:33 -07:00
Will Deacon	180af1a5bd	scripts/faddr2line: Don't filter out non-function symbols from readelf As Josh points out in 20230724234734.zy67gm674vl3p3wv@treble: > Problem is, I think the kernel's symbol printing code prints the > nearest kallsyms symbol, and there are some valid non-FUNC code > symbols. For example, syscall_return_via_sysret. so we shouldn't be considering only 'FUNC'-type symbols in the output from readelf. Drop the function symbol type filtering from the faddr2line outer loop. Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230724234734.zy67gm674vl3p3wv@treble Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20231002165750.1661-2-will@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>	2023-10-23 08:35:01 -07:00
Filipe Manana	eb96e22193	btrfs: fix unwritten extent buffer after snapshotting a new subvolume When creating a snapshot of a subvolume that was created in the current transaction, we can end up not persisting a dirty extent buffer that is referenced by the snapshot, resulting in IO errors due to checksum failures when trying to read the extent buffer later from disk. A sequence of steps that leads to this is the following: 1) At ioctl.c:create_subvol() we allocate an extent buffer, with logical address 36007936, for the leaf/root of a new subvolume that has an ID of 291. We mark the extent buffer as dirty, and at this point the subvolume tree has a single node/leaf which is also its root (level 0); 2) We no longer commit the transaction used to create the subvolume at create_subvol(). We used to, but that was recently removed in commit `1b53e51a4a` ("btrfs: don't commit transaction for every subvol create"); 3) The transaction used to create the subvolume has an ID of 33, so the extent buffer 36007936 has a generation of 33; 4) Several updates happen to subvolume 291 during transaction 33, several files created and its tree height changes from 0 to 1, so we end up with a new root at level 1 and the extent buffer 36007936 is now a leaf of that new root node, which is extent buffer 36048896. The commit root remains as 36007936, since we are still at transaction 33; 5) Creation of a snapshot of subvolume 291, with an ID of 292, starts at ioctl.c:create_snapshot(). This triggers a commit of transaction 33 and we end up at transaction.c:create_pending_snapshot(), in the critical section of a transaction commit. There we COW the root of subvolume 291, which is extent buffer 36048896. The COW operation returns extent buffer 36048896, since there's no need to COW because the extent buffer was created in this transaction and it was not written yet. The we call btrfs_copy_root() against the root node 36048896. During this operation we allocate a new extent buffer to turn into the root node of the snapshot, copy the contents of the root node 36048896 into this snapshot root extent buffer, set the owner to 292 (the ID of the snapshot), etc, and then we call btrfs_inc_ref(). This will create a delayed reference for each leaf pointed by the root node with a reference root of 292 - this includes a reference for the leaf 36007936. After that we set the bit BTRFS_ROOT_FORCE_COW in the root's state. Then we call btrfs_insert_dir_item(), to create the directory entry in in the tree of subvolume 291 that points to the snapshot. This ends up needing to modify leaf 36007936 to insert the respective directory items. Because the bit BTRFS_ROOT_FORCE_COW is set for the root's state, we need to COW the leaf. We end up at btrfs_force_cow_block() and then at update_ref_for_cow(). At update_ref_for_cow() we call btrfs_block_can_be_shared() which returns false, despite the fact the leaf 36007936 is shared - the subvolume's root and the snapshot's root point to that leaf. The reason that it incorrectly returns false is because the commit root of the subvolume is extent buffer 36007936 - it was the initial root of the subvolume when we created it. So btrfs_block_can_be_shared() which has the following logic: int btrfs_block_can_be_shared(struct btrfs_root root, struct extent_buffer buf) { if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && buf != root->node && buf != root->commit_root && (btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item) \|\| btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC))) return 1; return 0; } Returns false (0) since 'buf' (extent buffer 36007936) matches the root's commit root. As a result, at update_ref_for_cow(), we don't check for the number of references for extent buffer 36007936, we just assume it's not shared and therefore that it has only 1 reference, so we set the local variable 'refs' to 1. Later on, in the final if-else statement at update_ref_for_cow(): static noinline int update_ref_for_cow(struct btrfs_trans_handle trans, struct btrfs_root root, struct extent_buffer buf, struct extent_buffer cow, int last_ref) { (...) if (refs > 1) { (...) } else { (...) btrfs_clear_buffer_dirty(trans, buf); last_ref = 1; } } So we mark the extent buffer 36007936 as not dirty, and as a result we don't write it to disk later in the transaction commit, despite the fact that the snapshot's root points to it. Attempting to access the leaf or dumping the tree for example shows that the extent buffer was not written: $ btrfs inspect-internal dump-tree -t 292 /dev/sdb btrfs-progs v6.2.2 file tree key (292 ROOT_ITEM 33) node 36110336 level 1 items 2 free space 119 generation 33 owner 292 node 36110336 flags 0x1(WRITTEN) backref revision 1 checksum stored a8103e3e checksum calced a8103e3e fs uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 chunk uuid e8c9c885-78f4-4d31-85fe-89e5f5fd4a07 key (256 INODE_ITEM 0) block 36007936 gen 33 key (257 EXTENT_DATA 0) block 36052992 gen 33 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 total bytes 107374182400 bytes used 38572032 uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 The respective on disk region is full of zeroes as the device was trimmed at mkfs time. Obviously 'btrfs check' also detects and complains about this: $ btrfs check /dev/sdb Opening filesystem to check... Checking filesystem on /dev/sdb UUID: 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 generation: 33 (33) [1/7] checking root items [2/7] checking extents checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 owner ref check failed [36007936 4096] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 The following tree block(s) is corrupted in tree 292: tree block bytenr: 36110336, level: 1, node key: (256, 1, 0) root 292 root dir 256 not found ERROR: errors found in fs roots found 38572032 bytes used, error(s) found total csum bytes: 16048 total tree bytes: 1265664 total fs tree bytes: 1118208 total extent tree bytes: 65536 btree space waste bytes: 562598 file data blocks allocated: 65978368 referenced 36569088 Fix this by updating btrfs_block_can_be_shared() to consider that an extent buffer may be shared if it matches the commit root and if its generation matches the current transaction's generation. This can be reproduced with the following script: $ cat test.sh #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi # Use a filesystem with a 64K node size so that we have the same node # size on every machine regardless of its page size (on x86_64 default # node size is 16K due to the 4K page size, while on PPC it's 64K by # default). This way we can make sure we are able to create a btree for # the subvolume with a height of 2. mkfs.btrfs -f -n 64K $DEV mount $DEV $MNT btrfs subvolume create $MNT/subvol # Create a few empty files on the subvolume, this bumps its btree # height to 2 (root node at level 1 and 2 leaves). for ((i = 1; i <= 300; i++)); do echo -n > $MNT/subvol/file_$i done btrfs subvolume snapshot -r $MNT/subvol $MNT/subvol/snap umount $DEV btrfs check $DEV Running it on a 6.5 kernel (or any 6.6-rc kernel at the moment): $ ./test.sh Create subvolume '/mnt/sdi/subvol' Create a readonly snapshot of '/mnt/sdi/subvol' in '/mnt/sdi/subvol/snap' Opening filesystem to check... Checking filesystem on /dev/sdi UUID: bbdde2ff-7d02-45ca-8a73-3c36f23755a1 [1/7] checking root items [2/7] checking extents parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure owner ref check failed [30539776 65536] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure Wrong key of child node/leaf, wanted: (256, 1, 0), have: (2, 132, 0) Wrong generation of child node/leaf, wanted: 5, have: 7 root 257 root dir 256 not found ERROR: errors found in fs roots found 917504 bytes used, error(s) found total csum bytes: 0 total tree bytes: 851968 total fs tree bytes: 393216 total extent tree bytes: 65536 btree space waste bytes: 736550 file data blocks allocated: 0 referenced 0 A test case for fstests will follow soon. Fixes: `1b53e51a4a` ("btrfs: don't commit transaction for every subvol create") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-23 17:17:30 +02:00
Christian König	4984fc578a	drm/amdkfd: reserve a fence slot while locking the BO Looks like the KFD still needs this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `8abc1eb298` ("drm/amdkfd: switch over to using drm_exec v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com	2023-10-23 14:48:47 +02:00
Michael Ellerman	daa9ada209	powerpc/mm: Fix boot crash with FLATMEM Erhard reported that his G5 was crashing with v6.6-rc kernels: mpic: Setting up HT PICs workarounds for U3/U4 BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe Faulting instruction address: 0xc00000000005dc40 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 6.6.0-rc3-PMacGS #1 Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c00000000005dc40 LR: c000000000066660 CTR: c000000000007730 REGS: c0000000022bf510 TRAP: 0380 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44004242 XER: 00000000 IRQMASK: 3 GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300 GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000 GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8 GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000 GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000 GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac NIP hash_page_do_lazy_icache+0x50/0x100 LR __hash_page_4K+0x420/0x590 Call Trace: hash_page_mm+0x364/0x6f0 do_hash_fault+0x114/0x2b0 data_access_common_virt+0x198/0x1f0 --- interrupt: 300 at mpic_init+0x4bc/0x10c4 NIP: c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000 REGS: c0000000022bf9f0 TRAP: 0300 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 24004248 XER: 00000000 DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1 ... NIP mpic_init+0x4bc/0x10c4 LR mpic_init+0x464/0x10c4 --- interrupt: 300 pmac_setup_one_mpic+0x258/0x2dc pmac_pic_init+0x28c/0x3d8 init_IRQ+0x90/0x140 start_kernel+0x57c/0x78c start_here_common+0x1c/0x20 A bisect pointed to the breakage beginning with commit `9fee28baa6` ("powerpc: implement the new page table range API"). Analysis of the oops pointed to a struct page with a corrupted compound_head being loaded via page_folio() -> _compound_head() in hash_page_do_lazy_icache(). The access by the mpic code is to an MMIO address, so the expectation is that the struct page for that address would be initialised by init_unavailable_range(), as pointed out by Aneesh. Instrumentation showed that was not the case, which eventually lead to the realisation that pfn_valid() was returning false for that address, causing the struct page to not be initialised. Because the system is using FLATMEM, the version of pfn_valid() in memory_model.h is used: static inline int pfn_valid(unsigned long pfn) { ... return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr; } Which relies on max_mapnr being initialised. Early in boot max_mapnr is zero meaning no PFNs are valid. max_mapnr is initialised in mem_init() called via: start_kernel() mm_core_init() # init/main.c:928 mem_init() But that is too late for the usage in init_unavailable_range() called via: start_kernel() setup_arch() # init/main.c:893 paging_init() free_area_init() init_unavailable_range() Although max_mapnr is currently set in mem_init(), the value is actually already available much earlier, as soon as mem_topology_setup() has completed, which is also before paging_init() is called. So move the initialisation there, which causes paging_init() to correctly initialise the struct page and fixes the bug. This bug seems to have been lurking for years, but went unnoticed because the pre-folio code was inspecting the uninitialised page->flags but not dereferencing it. Thanks to Erhard and Aneesh for help debugging. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au	2023-10-23 22:50:15 +11:00

1 2 3 4 5 ...

1220776 commits