linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-24 19:35:58 +00:00

Author	SHA1	Message	Date
Linus Torvalds	3eab830189	uselib: remove use of __FMODE_EXEC Jann Horn points out that uselib() really shouldn't trigger the new FMODE_EXEC logic introduced by commit `4759ff71f2` ("exec: __FMODE_EXEC instead of in_execve for LSMs"). In fact, it shouldn't even have ever triggered the old pre-existing logic for __FMODE_EXEC (like the NFS code that makes executables not need read permissions). Unlike a real execve(), that can work even with files that are purely executable by the user (not readable), uselib() has that MAY_READ requirement because it's really just a convenience wrapper around mmap() for legacy shared libraries. The whole FMODE_EXEC bit was originally introduced by commit `b500531e6f` ("[PATCH] Introduce FMODE_EXEC file flag"), primarily to give ETXTBUSY error returns for distributed filesystems. It has since grown a few other warts (like that NFS thing), but there really isn't any reason to use it for uselib(), and now that we are trying to use it to replace the horrid 'tsk->in_execve' flag, it's actively wrong. Of course, as Jann Horn also points out, nobody should be enabling CONFIG_USELIB in the first place in this day and age, but that's a different discussion entirely. Reported-by: Jann Horn <jannh@google.com> Fixes: `4759ff71f2` ("exec: __FMODE_EXEC instead of in_execve for LSMs") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-01-24 13:12:20 -08:00
Mimi Zohar	1ed4b56310	Revert "KEYS: encrypted: Add check for strsep" This reverts commit `b4af096b5d`. New encrypted keys are created either from kernel-generated random numbers or user-provided decrypted data. Revert the change requiring user-provided decrypted data. Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>	2024-01-24 16:11:59 -05:00
Jenishkumar Maheshbhai Patel	9f538b415d	net: mvpp2: clear BM pool before initialization Register value persist after booting the kernel using kexec which results in kernel panic. Thus clear the BM pool registers before initialisation to fix the issue. Fixes: `3f518509de` ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Jenishkumar Maheshbhai Patel <jpatel2@marvell.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://lore.kernel.org/r/20240119035914.2595665-1-jpatel2@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 12:27:33 -08:00
Bernd Edlinger	a5f5eee282	net: stmmac: Wait a bit for the reset to take effect otherwise the synopsys_id value may be read out wrong, because the GMAC_VERSION register might still be in reset state, for at least 1 us after the reset is de-asserted. Add a wait for 10 us before continuing to be on the safe side. > From what have you got that delay value? Just try and error, with very old linux versions and old gcc versions the synopsys_id was read out correctly most of the time (but not always), with recent linux versions and recent gcc versions it was read out wrongly most of the time, but again not always. I don't have access to the VHDL code in question, so I cannot tell why it takes so long to get the correct values, I also do not have more than a few hardware samples, so I cannot tell how long this timeout must be in worst case. Experimentally I can tell that the register is read several times as zero immediately after the reset is de-asserted, also adding several no-ops is not enough, adding a printk is enough, also udelay(1) seems to be enough but I tried that not very often, and I have not access to many hardware samples to be 100% sure about the necessary delay. And since the udelay here is only executed once per device instance, it seems acceptable to delay the boot for 10 us. BTW: my hardware's synopsys id is 0x37. Fixes: `c5e4ddbdfa` ("net: stmmac: Add support for optional reset control") Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 12:19:59 -08:00
Linus Torvalds	443b349019	samples/cgroup: add .gitignore file for generated samples Make 'git status' quietly happy again after a full allmodconfig build. Fixes: `60433a9d03` ("samples: introduce new samples subdir for cgroup") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-01-24 11:52:40 -08:00
Kees Cook	90383cc078	exec: Distinguish in_execve from in_exec Just to help distinguish the fs->in_exec flag from the current->in_execve flag, add comments in check_unsafe_exec() and copy_fs() for more context. Also note that in_execve is only used by TOMOYO now. Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>	2024-01-24 11:48:52 -08:00
Kees Cook	4759ff71f2	exec: Check __FMODE_EXEC instead of in_execve for LSMs After commit `978ffcbf00` ("execve: open the executable file before doing anything else"), current->in_execve was no longer in sync with the open(). This broke AppArmor and TOMOYO which depend on this flag to distinguish "open" operations from being "exec" operations. Instead of moving around in_execve, switch to using __FMODE_EXEC, which is where the "is this an exec?" intent is stored. Note that TOMOYO still uses in_execve around cred handling. Reported-by: Kevin Locke <kevin@kevinlocke.name> Closes: https://lore.kernel.org/all/ZbE4qn9_h14OqADK@kevinlocke.name Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: `978ffcbf00` ("execve: open the executable file before doing anything else") Cc: Josh Triplett <josh@joshtriplett.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: <linux-mm@kvack.org> Cc: <apparmor@lists.ubuntu.com> Cc: <linux-security-module@vger.kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-01-24 11:38:58 -08:00
Pablo Neira Ayuso	d0009effa8	netfilter: nf_tables: validate NFPROTO_* family Several expressions explicitly refer to NF_INET_* hook definitions from expr->ops->validate, however, family is not validated. Bail out with EOPNOTSUPP in case they are used from unsupported families. Fixes: `0ca743a559` ("netfilter: nf_tables: add compatibility layer for x_tables") Fixes: `a3c90f7a23` ("netfilter: nf_tables: flow offload expression") Fixes: `2fa841938c` ("netfilter: nf_tables: introduce routing expression") Fixes: `554ced0a6e` ("netfilter: nf_tables: add support for native socket matching") Fixes: `ad49d86e07` ("netfilter: nf_tables: Add synproxy support") Fixes: `4ed8eb6570` ("netfilter: nf_tables: Add native tproxy support") Fixes: `6c47260250` ("netfilter: nf_tables: add xfrm expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:40 +01:00
Florian Westphal	f342de4e2f	netfilter: nf_tables: reject QUEUE/DROP verdict parameters This reverts commit `e0abdadcc6`. core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar, or 0. Due to the reverted commit, it's possible to provide a positive value, e.g. NF_ACCEPT (1), which results in use-after-free. It's not clear to me why this commit was made. NF_QUEUE is not used by nftables; "queue" rules in nftables will result in use of "nft_queue" expression. If we later need to allow specifying errno values from userspace (do not know why), this has to call NF_DROP_GETERR and check that "err <= 0" holds true. Fixes: `e0abdadcc6` ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters") Cc: stable@vger.kernel.org Reported-by: Notselwyn <notselwyn@pwning.tech> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:39 +01:00
Florian Westphal	b462579b2b	netfilter: nf_tables: restrict anonymous set and map names to 16 bytes nftables has two types of sets/maps, one where userspace defines the name, and anonymous sets/maps, where userspace defines a template name. For the latter, kernel requires presence of exactly one "%d". nftables uses "__set%d" and "__map%d" for this. The kernel will expand the format specifier and replace it with the smallest unused number. As-is, userspace could define a template name that allows to move the set name past the 256 bytes upper limit (post-expansion). I don't see how this could be a problem, but I would prefer if userspace cannot do this, so add a limit of 16 bytes for the '%d' template name. 16 bytes is the old total upper limit for set names that existed when nf_tables was merged initially. Fixes: `387454901b` ("netfilter: nf_tables: Allow set names of up to 255 chars") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:30 +01:00
Florian Westphal	c9d9eb9c53	netfilter: nft_limit: reject configurations that cause integer overflow Reject bogus configs where internal token counter wraps around. This only occurs with very very large requests, such as 17gbyte/s. It's better to reject this rather than having incorrect ratelimit. Fixes: `d2168e849e` ("netfilter: nft_limit: add per-byte limiting") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:01:16 +01:00
Pablo Neira Ayuso	01acb2e866	netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER event is reported, otherwise a stale reference to netdevice remains in the hook list. Fixes: `60a3815da7` ("netfilter: add inet ingress support") Cc: stable@vger.kernel.org Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 19:50:21 +01:00
George Guo	b253d87fd7	netfilter: nf_tables: cleanup documentation - Correct comments for nlpid, family, udlen and udata in struct nft_table, and afinfo is no longer a member of enum nft_set_class. - Add comment for data in struct nft_set_elem. - Add comment for flags in struct nft_ctx. - Add comments for timeout in struct nft_set_iter, and flags is not a member of struct nft_set_iter, remove the comment for it. - Add comments for commit, abort, estimate and gc_init in struct nft_set_ops. - Add comments for pending_update, num_exprs, exprs and catchall_list in struct nft_set. - Add comment for ext_len in struct nft_set_ext_tmpl. - Add comment for inner_ops in struct nft_expr_type. - Add comments for clone, destroy_clone, reduce, gc, offload, offload_action, offload_stats in struct nft_expr_ops. - Add comments for blob_gen_0, blob_gen_1, bound, genmask, udlen, udata, blob_next in struct nft_chain. - Add comment for flags in struct nft_base_chain. - Add comments for udlen, udata in struct nft_object. - Add comment for type in struct nft_object_ops. - Add comment for hook_list in struct nft_flowtable, and remove comments for dev_name and ops which are not members of struct nft_flowtable. Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 19:50:20 +01:00
Charles Keepax	f9f4b0c642	spi: cs42l43: Handle error from devm_pm_runtime_enable As devm_pm_runtime_enable can fail due to memory allocations, it is best to handle the error. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://msgid.link/r/20240124174101.2270249-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>	2024-01-24 18:08:50 +00:00
Frederic Weisbecker	e787644caf	rcu: Defer RCU kthreads wakeup when CPU is dying When the CPU goes idle for the last time during the CPU down hotplug process, RCU reports a final quiescent state for the current CPU. If this quiescent state propagates up to the top, some tasks may then be woken up to complete the grace period: the main grace period kthread and/or the expedited main workqueue (or kworker). If those kthreads have a SCHED_FIFO policy, the wake up can indirectly arm the RT bandwidth timer to the local offline CPU. Since this happens after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the timer gets ignored. Therefore if the RCU kthreads are waiting for RT bandwidth to be available, they may never be actually scheduled. This triggers TREE03 rcutorture hangs: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved) rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6) rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000 Call Trace: <TASK> __schedule+0x2eb/0xa80 schedule+0x1f/0x90 schedule_timeout+0x163/0x270 ? __pfx_process_timeout+0x10/0x10 rcu_gp_fqs_loop+0x37c/0x5b0 ? __pfx_rcu_gp_kthread+0x10/0x10 rcu_gp_kthread+0x17c/0x200 kthread+0xde/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> The situation can't be solved with just unpinning the timer. The hrtimer infrastructure and the nohz heuristics involved in finding the best remote target for an unpinned timer would then also need to handle enqueues from an offline CPU in the most horrendous way.
So fix this on the RCU side instead and defer the wake up to an online CPU if it's too late for the local one. Reported-by: Paul E. McKenney <paulmck@kernel.org> Fixes: `5c0930ccaa` ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>	2024-01-24 22:46:17 +05:30
Linus Torvalds	1110ebe058	fbdev fixes and cleanups for 6.8-rc2: - stifb: Fix crash in stifb_blank() - savage/sis: Error out if pixclock equals zero - minor trivial cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZbEw6wAKCRD3ErUQojoP X/8UAQCt7qn3lty18BTgChgYboMNquc0NVTj9cU0+EkwBa4LnAD+ML4QJPIa5HN2 LetnHIXp03dBM6JAR16+H6HIWBDePwA= =whwV -----END PGP SIGNATURE----- Merge tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes and cleanups from Helge Deller: "A crash fix in stifb which was missed to be included in the drm-misc tree, two checks to prevent wrong userspace input in sisfb and savagefb and two trivial printk cleanups: - stifb: Fix crash in stifb_blank() - savage/sis: Error out if pixclock equals zero - minor trivial cleanups" * tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: stifb: Fix crash in stifb_blank() fbcon: Fix incorrect printed function name in fbcon_prepare_logo() fbdev: sis: Error out if pixclock equals zero fbdev: savage: Error out if pixclock equals zero fbdev: vt8500lcdfb: Remove unnecessary print function dev_err()	2024-01-24 08:55:51 -08:00
Hans Verkuil	b32431b753	media: vb2: refactor setting flags and caps, fix missing cap Several functions implementing VIDIOC_REQBUFS and _CREATE_BUFS all use almost the same code to fill in the flags and capability fields. Refactor this into a new vb2_set_flags_and_caps() function that replaces the old fill_buf_caps() and validate_memory_flags() functions. This also fixes a bug where vb2_ioctl_create_bufs() would not set the V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS cap and also not fill in the max_num_buffers field. Fixes: `d055a76c00` ("media: core: Report the maximum possible number of buffers for the queue") Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Acked-by: Tomasz Figa <tfiga@chromium.org>	2024-01-24 17:27:51 +01:00
Benjamin Gaignard	78e23c3e91	media: media videobuf2: Stop direct calls to queue num_buffers field Use vb2_get_num_buffers() to avoid using queue num_buffers field directly. This allows us to change how the number of buffers is computed in the future. Fixes: `d055a76c00` ("media: core: Report the maximum possible number of buffers for the queue") Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Acked-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>	2024-01-24 17:27:51 +01:00
Brandon Brnich	c14d17a325	media: chips-media: wave5: Remove K3 References Change compatible string to match dt bindings for TI devices. K3 family prefix should not be included as it deviates from naming convention. Fixes: `9707a6254a` ("media: chips-media: wave5: Add the v4l2 layer") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdUYOq=q1j=d+Eac28hthOUAaNUkuvxmRu-mUN1pLKq69g@mail.gmail.com/ Signed-off-by: Brandon Brnich <b-brnich@ti.com> Reviewed-by: Nishanth Menon <nm@ti.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>	2024-01-24 17:27:51 +01:00
Brandon Brnich	520d970897	dt-bindings: media: Remove K3 Family Prefix from Compatible K3 family prefix is not included in other TI compatible strings. Remove this prefix to keep naming convention consistent. Fixes: `de4b9f7e37` ("dt-bindings: media: wave5: add yaml devicetree bindings") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdUYOq=q1j=d+Eac28hthOUAaNUkuvxmRu-mUN1pLKq69g@mail.gmail.com/ Signed-off-by: Brandon Brnich <b-brnich@ti.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Nishanth Menon <nm@ti.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>	2024-01-24 17:27:51 +01:00
Thomas Zimmermann	d1b163aa07	Revert "drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync" This reverts commit `60aebc9559`. Commit `60aebc9559` ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync") messes up initialization order of the graphics drivers and leads to blank displays on some systems. So revert the commit. To make the display drivers fully independent from initialization order requires to track framebuffer memory by device and independently from the loaded drivers. The kernel currently lacks the infrastructure to do so. Reported-by: Jaak Ristioja <jaak@ristioja.ee> Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t Reported-by: Huacai Chen <chenhuacai@loongson.cn> Closes: https://lore.kernel.org/dri-devel/20231108024613.2898921-1-chenhuacai@loongson.cn/ Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10133 Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240123120937.27736-1-tzimmermann@suse.de	2024-01-24 17:03:44 +01:00
Maksim Kiselev	e169bd4fb2	aoe: avoid potential deadlock at set_capacity Move set_capacity() outside of the section protected by (&d->lock). To avoid possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- [1] lock(&bdev->bd_size_lock); local_irq_disable(); [2] lock(&d->lock); [3] lock(&bdev->bd_size_lock); <Interrupt> [4] lock(&d->lock); * DEADLOCK * Where [1](&bdev->bd_size_lock) held by zram_add()->set_capacity(). [2]lock(&d->lock) held by aoeblk_gdalloc(). And aoeblk_gdalloc() is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call. In this situation an attempt to acquire [4]lock(&d->lock) from aoecmd_cfg_rsp() will lead to deadlock. So the simplest solution is breaking lock dependency [2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity() outside. Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240124072436.3745720-2-bigunclemax@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-01-24 08:30:43 -07:00
Mark Brown	2f8c7c3715	spi: Raise limit on number of chip selects As reported by Guenter the limit we've got on the number of chip selects is set too low for some systems, raise the limit. We should really remove the hard coded limit but this is needed as a fix so let's do the simple thing and raise the limit for now. Fixes: `4d8ff6b099` ("spi: Add multi-cs memories support in SPI core") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/r/20240124-spi-multi-cs-max-v2-1-df6fc5ab1abc@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>	2024-01-24 15:11:38 +00:00
NeilBrown	edcf972515	nfsd: fix RELEASE_LOCKOWNER The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep. First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour. If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error. The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request. When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID. So clearly it is harmful to allow a false positive, which testing so_count allows. The test is nonsense because ... well... it doesn't mean anything. so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks. When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be. 
In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail. In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success. The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe. So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check it is safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'. Fixes: `ce3c4ad7f4` ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-24 09:49:11 -05:00
Dawei Li	b184c8c288	genirq: Initialize resend_node hlist for all interrupt descriptors For a CONFIG_SPARSE_IRQ=n kernel, early_irq_init() is supposed to initialize all interrupt descriptors. It does except for irq_desc::resend_node, which is only initialized for the first descriptor. Use the indexed descriptor and not the base pointer to address that. Fixes: `bc06a9e087` ("genirq: Use hlist for managing resend handlers") Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122085716.2999875-5-dawei.li@shingroup.cn	2024-01-24 14:15:41 +01:00
Richard Palethorpe	56062d60f1	x86/entry/ia32: Ensure s32 is sign extended to s64 Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended. This takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation specific behavior. However then casts to int or unsigned int as necessary. So that the following cast to long sign extends the value. This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events. It doesn't appear other systemcalls with signed arguments are affected because they all have compat variants defined and wired up. Fixes: `ebeb8c82ff` ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/	2024-01-24 11:49:19 +01:00
Lucas De Marchi	9e3a13f3ee	drm/xe: Remove PVC from xe_wa kunit tests Since the PCI IDs for PVC weren't added to the xe driver, the xe_wa tests should not try to create a fake PVC device since they can't find the right PCI ID. Fix bugs when running kunit: # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (B0) # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (B1) # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111 Expected ret == 0, but ret == -19 (0xffffffffffffffed) [FAILED] PVC (C0) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240123031242.3548724-1-lucas.demarchi@intel.com (cherry picked from commit `ab5ae65fb2`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:55 +01:00
Moti Haimovski	d186e51b0e	drm/xe/vm: bugfix in xe_vm_create_ioctl Fix xe_vm_create_ioctl routine not freeing the vm-id allocated to it when the function fails. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122102424.4008095-1-mhaimovski@habana.ai (cherry picked from commit `f6bf0424ca`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:41 +01:00
Himal Prasad Ghimiray	c0e2508cb1	drm/xe/xe2: Use XE_CACHE_WB pat index The pat table entry associated with XE_CACHE_WB is coherent whereas XE_CACHE_NONE is non coherent. Migration expects the coherency with cpu therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to flat ccs region the issue is not related to coherency with cpu. The hardware expects the pat index associated with GPUVA for indirect access to be compression enabled hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `65ef8dbad1` ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com (cherry picked from commit `6a02867560`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:33 +01:00
Lucas De Marchi	981460d8ee	drm/xe/display: Avoid calling readq() readq() is not available in 32bits and i915_gem_object_read_from_page() is supposed to allow reading arbitrary sizes determined by the `size` argument. Currently the only caller only passes a size == 8 so the second problem is not that big. Migrate to calling memcpy()/memcpy_fromio() to allow possible changes in the display side and to fix the build on 32b architectures. v2: Use memcpy/memcpy_fromio directly rather than using iosys-map with the same size == 8 bytes restriction (Matt Roper) Fixes: `44e694958b` ("drm/xe/display: Implement display support") Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-4-lucas.demarchi@intel.com (cherry picked from commit `406663f777`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:25 +01:00
Lucas De Marchi	52e8948c6b	drm/xe/mmio: Cast to u64 when printing resource_size_t uses %pa format in printk since the size varies depending on build options. However to keep the io_size/physical_size addition in the same call we can't pass the address without adding yet another variable in these function. Simply cast it to u64 and keep using %llx. Fixes: `286089ce69` ("drm/xe: Improve vram info debug printing") Cc: Oak Zeng <oak.zeng@intel.com> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-3-lucas.demarchi@intel.com (cherry picked from commit `6d8d038364`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:15 +01:00
Lucas De Marchi	32f6c33257	drm/xe: Use _ULL for u64 division Use DIV_ROUND_UP_ULL() so it also works on 32bit build. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-2-lucas.demarchi@intel.com (cherry picked from commit `7b5bdb447b`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:06 +01:00
Thomas Hellström	03b72dbbd4	drm/xe: Use a NULL pointer instead of 0. The last argument of xe_pcode_read() is a pointer. Use NULL instead of 0. Fixes: `92d44a422d` ("drm/xe/hwmon: Expose card reactive critical power") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-6-thomas.hellstrom@linux.intel.com (cherry picked from commit `79f8eacbdf`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:12:59 +01:00
Thomas Hellström	3213b8070a	drm/xe/dmabuf: Make xe_dmabuf_ops static It is not referenced outside of the xe_dma_buf.c source file. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `e2dc52f849`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:12:45 +01:00
Dinghao Liu	aef855df7e	net/mlx5e: fix a potential double-free in fs_any_create_groups When kcalloc() for ft->g succeeds but kvzalloc() for in fails, fs_any_create_groups() will free ft->g. However, its caller fs_any_create_table() will free ft->g again through calling mlx5e_destroy_flow_table(), which will lead to a double-free. Fix this by setting ft->g to NULL in fs_any_create_groups(). Fixes: `0f575c20bf` ("net/mlx5e: Introduce Flow Steering ANY API") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:38 -08:00
Zhipeng Lu	3c6d518924	net/mlx5e: fix a double-free in arfs_create_groups When the kvzalloc() allocation of `in` fails, arfs_create_groups() frees ft->g and returns an error. However, arfs_create_table(), the only caller of arfs_create_groups(), handles this error by calling mlx5e_destroy_flow_table(), in which ft->g is freed again. Fixes: `1cabe6b096` ("net/mlx5e: Create aRFS flow tables") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:38 -08:00
Leon Romanovsky	315a597f9b	net/mlx5e: Ignore IPsec replay window values on sender side The XFRM stack doesn't prevent users from configuring a replay window on the TX side, and strongswan sets replay_window to 1. This causes failures in the validation logic when trying to offload the SA. The replay window is not relevant on the TX side and should be ignored. Fixes: `cded6d8012` ("net/mlx5e: Store replay window in XFRM attributes") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:37 -08:00
Leon Romanovsky	20f5468a79	net/mlx5e: Allow software parsing when IPsec crypto is enabled All ConnectX devices have software parsing capability enabled, but it is more correct to set allow_swp only if capability exists, which for IPsec means that crypto offload is supported. Fixes: `2451da081a` ("net/mlx5: Unify device IPsec capabilities check") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:37 -08:00
Rahul Rameshbabu	20cbf8cbb8	net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO mlx5 devices have specific constants for choosing the CQ period mode. These constants do not have to match the constants used by the kernel software API for DIM period mode selection. Fixes: `cdd04f4d4d` ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:37 -08:00
Yevgeny Kliteynik	5b2a2523ee	net/mlx5: DR, Can't go to uplink vport on RX rule Go-To-Vport action on RX is not allowed when the vport is uplink. In such a case, the packet should be dropped. Fixes: `9db810ed2d` ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:36 -08:00
Yevgeny Kliteynik	5665954293	net/mlx5: DR, Use the right GVMI number for drop action When FW provides ICM addresses for drop RX/TX, the provided capability is 64 bits that contain its GVMI as well as the ICM address itself. In case of TX DROP this GVMI is different from the GVMI that the domain is operating on. This patch fixes the action to use these GVMI IDs, as provided by FW. Fixes: `9db810ed2d` ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:36 -08:00
Moshe Shemesh	ec7cc38ef9	net/mlx5: Bridge, fix multicast packets sent to uplink To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules. Fixes: `18c2916cee` ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:35 -08:00
Yishai Hadas	cc80915877	net/mlx5: Fix a WARN upon a callback command failure The below WARN [1] is reported once a callback command failed. As a callback runs under an interrupt context, needs to use the IRQ save/restore variant. [1] DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()) WARNING: CPU: 15 PID: 0 at kernel/locking/lockdep.c:4353 lockdep_hardirqs_on_prepare+0x11b/0x180 Modules linked in: vhost_net vhost tap mlx5_vfio_pci vfio_pci vfio_pci_core vfio_iommu_type1 vfio mlx5_vdpa vringh vhost_iotlb vdpa nfnetlink_cttimeout openvswitch nsh ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrackxt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 15 PID: 0 Comm: swapper/15 Tainted: G W 6.7.0-rc4+ #1587 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:lockdep_hardirqs_on_prepare+0x11b/0x180 Code: 00 5b c3 c3 e8 e6 0d 58 00 85 c0 74 d6 8b 15 f0 c3 76 01 85 d2 75 cc 48 c7 c6 04 a5 3b 82 48 c7 c7 f1 e9 39 82 e8 95 12 f9 ff <0f> 0b 5b c3 e8 bc 0d 58 00 85 c0 74 ac 8b 3d c6 c3 76 01 85 ff 75 RSP: 0018:ffffc900003ecd18 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffff88885fbdb880 RDI: ffff88885fbdb888 RBP: 00000000ffffff87 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 284e4f5f4e524157 R12: 00000000002c9aa1 R13: ffff88810aace980 R14: ffff88810aace9b8 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff88885fbc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f731436f4c8 CR3: 000000010aae6001 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 
DR7: 0000000000000400 Call Trace: <IRQ> ? __warn+0x81/0x170 ? lockdep_hardirqs_on_prepare+0x11b/0x180 ? report_bug+0xf8/0x1c0 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? lockdep_hardirqs_on_prepare+0x11b/0x180 ? lockdep_hardirqs_on_prepare+0x11b/0x180 trace_hardirqs_on+0x4a/0xa0 raw_spin_unlock_irq+0x24/0x30 cmd_status_err+0xc0/0x1a0 [mlx5_core] cmd_status_err+0x1a0/0x1a0 [mlx5_core] mlx5_cmd_exec_cb_handler+0x24/0x40 [mlx5_core] mlx5_cmd_comp_handler+0x129/0x4b0 [mlx5_core] cmd_comp_notifier+0x1a/0x20 [mlx5_core] notifier_call_chain+0x3e/0xe0 atomic_notifier_call_chain+0x5f/0x130 mlx5_eq_async_int+0xe7/0x200 [mlx5_core] notifier_call_chain+0x3e/0xe0 atomic_notifier_call_chain+0x5f/0x130 irq_int_handler+0x11/0x20 [mlx5_core] __handle_irq_event_percpu+0x99/0x220 ? tick_irq_enter+0x5d/0x80 handle_irq_event_percpu+0xf/0x40 handle_irq_event+0x3a/0x60 handle_edge_irq+0xa2/0x1c0 __common_interrupt+0x55/0x140 common_interrupt+0x7d/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x13/0x20 Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 ea 08 25 01 85 c0 7e 07 0f 00 2d 7f b0 26 00 fb f4 <fa> c3 90 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 80 d0 02 00 RSP: 0018:ffffc9000010fec8 EFLAGS: 00000242 RAX: 0000000000000001 RBX: 000000000000000f RCX: 4000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff811c410c RBP: ffffffff829478c0 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? 
do_idle+0x1ec/0x210 default_idle_call+0x6c/0x90 do_idle+0x1ec/0x210 cpu_startup_entry+0x26/0x30 start_secondary+0x11b/0x150 secondary_startup_64_no_verify+0x165/0x16b </TASK> irq event stamp: 833284 hardirqs last enabled at (833283): [<ffffffff811c410c>] do_idle+0x1ec/0x210 hardirqs last disabled at (833284): [<ffffffff81daf9ef>] common_interrupt+0xf/0xa0 softirqs last enabled at (833224): [<ffffffff81dc199f>] __do_softirq+0x2bf/0x40e softirqs last disabled at (833177): [<ffffffff81178ddf>] irq_exit_rcu+0x7f/0xa0 Fixes: `34f46ae0d4` ("net/mlx5: Add command failures data to debugfs") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:35 -08:00
Vlad Buslov	d76fdd31f9	net/mlx5e: Fix peer flow lists handling The cited change refactored mlx5e_tc_del_fdb_peer_flow() to only clear DUP flag when list of peer flows has become empty. However, if any concurrent user holds a reference to a peer flow (for example, the neighbor update workqueue task is updating peer flow's parent encap entry concurrently), then the flow will not be removed from the peer list and, consecutively, DUP flag will remain set. Since mlx5e_tc_del_fdb_peers_flow() calls mlx5e_tc_del_fdb_peer_flow() for every possible peer index the algorithm will try to remove the flow from eswitch instances that it has never peered with causing either NULL pointer dereference when trying to remove the flow peer list head of peer_index that was never initialized or a warning if the list debug config is enabled[0]. Fix the issue by always removing the peer flow from the list even when not releasing the last reference to it. [0]: [ 3102.985806] ------------[ cut here ]------------ [ 3102.986223] list_del corruption, ffff888139110698->next is NULL [ 3102.986757] WARNING: CPU: 2 PID: 22109 at lib/list_debug.c:53 __list_del_entry_valid_or_report+0x4f/0xc0 [ 3102.987561] Modules linked in: act_ct nf_flow_table bonding act_tunnel_key act_mirred act_skbedit vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcg ss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core [last unloaded: bonding] [ 3102.991113] CPU: 2 PID: 22109 Comm: revalidator28 Not tainted 6.6.0-rc6+ #3 [ 3102.991695] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 3102.992605] RIP: 0010:__list_del_entry_valid_or_report+0x4f/0xc0 [ 3102.993122] Code: 
39 c2 74 56 48 8b 32 48 39 fe 75 62 48 8b 51 08 48 39 f2 75 73 b8 01 00 00 00 c3 48 89 fe 48 c7 c7 48 fd 0a 82 e8 41 0b ad ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 fd 0a 82 e8 2d 0b ad ff 0f 0b [ 3102.994615] RSP: 0018:ffff8881383e7710 EFLAGS: 00010286 [ 3102.995078] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 [ 3102.995670] RDX: 0000000000000001 RSI: ffff88885f89b640 RDI: ffff88885f89b640 [ 3102.997188] DEL flow 00000000be367878 on port 0 [ 3102.998594] RBP: dead000000000122 R08: 0000000000000000 R09: c0000000ffffdfff [ 3102.999604] R10: 0000000000000008 R11: ffff8881383e7598 R12: dead000000000100 [ 3103.000198] R13: 0000000000000002 R14: ffff888139110000 R15: ffff888101901240 [ 3103.000790] FS: 00007f424cde4700(0000) GS:ffff88885f880000(0000) knlGS:0000000000000000 [ 3103.001486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3103.001986] CR2: 00007fd42e8dcb70 CR3: 000000011e68a003 CR4: 0000000000370ea0 [ 3103.002596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3103.003190] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3103.003787] Call Trace: [ 3103.004055] <TASK> [ 3103.004297] ? __warn+0x7d/0x130 [ 3103.004623] ? __list_del_entry_valid_or_report+0x4f/0xc0 [ 3103.005094] ? report_bug+0xf1/0x1c0 [ 3103.005439] ? console_unlock+0x4a/0xd0 [ 3103.005806] ? handle_bug+0x3f/0x70 [ 3103.006149] ? exc_invalid_op+0x13/0x60 [ 3103.006531] ? asm_exc_invalid_op+0x16/0x20 [ 3103.007430] ? 
__list_del_entry_valid_or_report+0x4f/0xc0 [ 3103.007910] mlx5e_tc_del_fdb_peers_flow+0xcf/0x240 [mlx5_core] [ 3103.008463] mlx5e_tc_del_flow+0x46/0x270 [mlx5_core] [ 3103.008944] mlx5e_flow_put+0x26/0x50 [mlx5_core] [ 3103.009401] mlx5e_delete_flower+0x25f/0x380 [mlx5_core] [ 3103.009901] tc_setup_cb_destroy+0xab/0x180 [ 3103.010292] fl_hw_destroy_filter+0x99/0xc0 [cls_flower] [ 3103.010779] __fl_delete+0x2d4/0x2f0 [cls_flower] [ 3103.011207] fl_delete+0x36/0x80 [cls_flower] [ 3103.011614] tc_del_tfilter+0x56f/0x750 [ 3103.011982] rtnetlink_rcv_msg+0xff/0x3a0 [ 3103.012362] ? netlink_ack+0x1c7/0x4e0 [ 3103.012719] ? rtnl_calcit.isra.44+0x130/0x130 [ 3103.013134] netlink_rcv_skb+0x54/0x100 [ 3103.013533] netlink_unicast+0x1ca/0x2b0 [ 3103.013902] netlink_sendmsg+0x361/0x4d0 [ 3103.014269] __sock_sendmsg+0x38/0x60 [ 3103.014643] ____sys_sendmsg+0x1f2/0x200 [ 3103.015018] ? copy_msghdr_from_user+0x72/0xa0 [ 3103.015265] ___sys_sendmsg+0x87/0xd0 [ 3103.016608] ? copy_msghdr_from_user+0x72/0xa0 [ 3103.017014] ? ___sys_recvmsg+0x9b/0xd0 [ 3103.017381] ? ttwu_do_activate.isra.137+0x58/0x180 [ 3103.017821] ? wake_up_q+0x49/0x90 [ 3103.018157] ? futex_wake+0x137/0x160 [ 3103.018521] ? __sys_sendmsg+0x51/0x90 [ 3103.018882] __sys_sendmsg+0x51/0x90 [ 3103.019230] ? 
exit_to_user_mode_prepare+0x56/0x130 [ 3103.019670] do_syscall_64+0x3c/0x80 [ 3103.020017] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 3103.020469] RIP: 0033:0x7f4254811ef4 [ 3103.020816] Code: 89 f3 48 83 ec 10 48 89 7c 24 08 48 89 14 24 e8 42 eb ff ff 48 8b 14 24 41 89 c0 48 89 de 48 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 48 89 04 24 e8 78 eb ff ff 48 8b [ 3103.022290] RSP: 002b:00007f424cdd9480 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 3103.022970] RAX: ffffffffffffffda RBX: 00007f424cdd9510 RCX: 00007f4254811ef4 [ 3103.023564] RDX: 0000000000000000 RSI: 00007f424cdd9510 RDI: 0000000000000012 [ 3103.024158] RBP: 00007f424cdda238 R08: 0000000000000000 R09: 00007f41d801a4b0 [ 3103.024748] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001 [ 3103.025341] R13: 00007f424cdd9510 R14: 00007f424cdda240 R15: 00007f424cdd99a0 [ 3103.025931] </TASK> [ 3103.026182] ---[ end trace 0000000000000000 ]--- [ 3103.027033] ------------[ cut here ]------------ Fixes: `9be6c21fdc` ("net/mlx5e: Handle offloads flows per peer") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:34 -08:00
Tariq Toukan	c20767fd45	net/mlx5e: Fix inconsistent hairpin RQT sizes The processing of traffic in hairpin queues occurs in HW/FW and does not involve the cpus, hence the upper bound on max num channels does not apply to them. Using this bound for the hairpin RQT max_table_size is wrong. It could be too small, and cause the error below [1]. As the RQT size provided on init does not get modified later, use the same value for both actual and max table sizes. [1] mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1200): CREATE_RQT(0x916) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x538faf), err(-22) Fixes: `74a8dadac1` ("net/mlx5e: Preparations for supporting larger number of channels") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:34 -08:00
Rahul Rameshbabu	3876638b2c	net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context Indirection (*) is of lower precedence than postfix increment (++). The logic in napi_poll context would cause an out-of-bounds read by first incrementing the pointer address by byte address space and then dereferencing the value. Rather, the intended logic was to dereference first and then increment the underlying value. Fixes: `92214be597` ("net/mlx5e: Update doorbell for port timestamping CQ before the software counter") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:33 -08:00
Tariq Toukan	cfbc3608a8	net/mlx5: Fix query of sd_group field The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: `f5e9563299` ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:33 -08:00
Saeed Mahameed	25461ce8b3	net/mlx5e: Use the correct lag ports number when creating TISes The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over the MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with fewer than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also, IPoIB interfaces create their own TISes and don't use the eth TISes, so pass a flag to indicate that. This fixes the following errors that might appear in the kernel log: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22) mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22 Fixes: `b25bd37c85` ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-01-24 00:15:32 -08:00
Conrad Kostecki	0077a504e1	ahci: asm1166: correct count of reported ports The ASM1166 SATA host controller always wrongly reports that it has 32 ports, but in reality it only has six. This seems to be a hardware issue, as all tested ASM1166 SATA host controllers report such a high port count. Example output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0xffffff3f impl SATA mode. By adjusting the port_map, the count is limited to six ports. New output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0x3f impl SATA mode. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=211873 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218346 Signed-off-by: Conrad Kostecki <conikost@gentoo.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-01-24 08:53:55 +01:00
Shyam Prasad N	993d1c346b	cifs: fix stray unlock in cifs_chan_skip_or_disable A recent change moved the code that decides to skip a channel or disable multichannel entirely, into a helper function. During this, a mutex_unlock of the session_mutex should have been removed. Doing that here. Fixes: `f591062bdb` ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-23 20:23:29 -06:00
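
The precedence issue described in commit 3876638b2c above can be illustrated with a small user-space sketch. This is not the driver code: `bump_buggy()` and `bump_fixed()` are hypothetical helpers invented for illustration. Because postfix `++` binds tighter than unary `*`, the expression `*p++` parses as `*(p++)`, so the pointer advances and the counter it pointed at is left unchanged, whereas `(*p)++` increments the value.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the buggy form. `*p++` parses as `*(p++)`:
 * it reads the old element, then advances the pointer by one element.
 * The counter itself is never modified. Returns how far the pointer moved. */
static ptrdiff_t bump_buggy(unsigned int *counter)
{
    unsigned int *p = counter;
    unsigned int old = *p++;   /* reads *counter, then moves p */

    (void)old;                 /* value is discarded, as in the buggy pattern */
    return p - counter;        /* 1: the pointer moved, not the value */
}

/* Hypothetical helper showing the intended form: dereference first,
 * then increment the underlying value. */
static void bump_fixed(unsigned int *counter)
{
    (*counter)++;
}
```

Calling `bump_buggy(&c)` leaves `c` untouched, while `bump_fixed(&c)` increments it. In real code, a later dereference of the advanced pointer is where an out-of-bounds read would arise.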

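
The two mlx5e double-free fixes listed above (aef855df7e and 3c6d518924) share one pattern: a helper frees `ft->g` on its own error path, and the caller's teardown then frees it again. Below is a minimal user-space sketch of that pattern and of the fix named in aef855df7e, which sets the pointer to NULL after freeing. All `demo_*` names and the `fail_second_alloc` switch are hypothetical, and `calloc()`/`malloc()`/`free()` stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a flow table that owns a group array. */
struct demo_table {
    int *g;
};

/* Teardown path that the caller always runs on error.
 * free(NULL) is a no-op, so a cleared pointer is safe here. */
static void demo_destroy_table(struct demo_table *ft)
{
    free(ft->g);
    ft->g = NULL;
}

/* Allocation helper: if the second allocation fails, it releases ft->g
 * itself and clears the pointer so the caller's teardown cannot free
 * the same memory a second time. */
static int demo_create_groups(struct demo_table *ft, int fail_second_alloc)
{
    void *in;

    ft->g = calloc(4, sizeof(*ft->g));
    if (!ft->g)
        return -1;

    in = fail_second_alloc ? NULL : malloc(64);
    if (!in) {
        free(ft->g);
        ft->g = NULL;   /* without this line, demo_destroy_table() double-frees */
        return -1;
    }

    free(in);
    return 0;
}

/* Caller: on any error from the helper, run the common teardown. */
static int demo_create_table(struct demo_table *ft, int fail_second_alloc)
{
    int err = demo_create_groups(ft, fail_second_alloc);

    if (err)
        demo_destroy_table(ft);
    return err;
}
```

Clearing the pointer keeps ownership in one place: the teardown function stays the single owner of the final free, and the helper's early cleanup becomes harmless.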

1249107 commits