linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-21 18:11:39 +00:00

Author	SHA1	Message	Date
Dave Gerlach	3d66a67f73	arm64: dts: ti: k3-am65: Add clocks to dwc3 nodes commit `a81e5442d7` upstream. The TI sci-clk driver can scan the DT for all clocks provided by system firmware and does this by checking the clocks property of all nodes, so we must add this to the dwc3 nodes so USB clocks are available. Without this USB does not work with latest system firmware i.e. [ 1.714662] clk: couldn't get parent clock 0 for /interconnect@100000/dwc3@4020000 Fixes: `cc54a99464` ("arm64: dts: ti: k3-am6: add USB suppor") Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: stable@kernel.org Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:19 +02:00
Marek Szyprowski	9d971b0059	ARM: dts: exynos: Fix polarity of the LCD SPI bus on UniversalC210 board commit `32a1671ff8` upstream. Recent changes in the SPI core and the SPI-GPIO driver revealed that the GPIO lines for the LD9040 LCD controller on the UniversalC210 board are defined incorrectly. Fix the polarity for those lines to match the old behavior and hardware requirements to fix LCD panel operation with recent kernels. Cc: <stable@vger.kernel.org> # 5.0.x Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:19 +02:00
James Smart	e5b9c1027e	scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error path commit `0ab384a49c` upstream. If a call to lpfc_get_cmd_rsp_buf_per_hdwq returns NULL (memory allocation failure), a previously allocated lpfc_io_buf resource is leaked. Fix by releasing the lpfc_io_buf resource in the failure path. Fixes: `d79c9e9d4b` ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200128002312.16346-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:19 +02:00
Stanley Chu	73a122c263	scsi: ufs: fix Auto-Hibern8 error detection commit `5a244e0ea6` upstream. Auto-Hibern8 may be disabled by some vendors or sysfs in runtime even if Auto-Hibern8 capability is supported by host. If Auto-Hibern8 capability is supported by host but not actually enabled, Auto-Hibern8 error shall not happen. To fix this, provide a way to detect if Auto-Hibern8 is actually enabled first, and bypass Auto-Hibern8 disabling case in ufshcd_is_auto_hibern8_error(). Fixes: `8217444039` ("scsi: ufs: Add error-handling of Auto-Hibernate") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200129105251.12466-4-stanley.chu@mediatek.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:19 +02:00
Steffen Maier	0ad68e6212	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point commit `819732be9f` upstream. v2.6.27 commit `cc8c282963` ("[SCSI] zfcp: Automatically attach remote ports") introduced zfcp automatic port scan. Before that, the user had to use the sysfs attribute "port_add" of an FCP device (adapter) to add and open remote (target) ports, even for the remote peer port in point-to-point topology. That code path did a proper port open recovery trigger taking the erp_lock. Since above commit, a new helper function zfcp_erp_open_ptp_port() performed an UNlocked port open recovery trigger. This can race with other parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt e.g. adapter->erp_total_count or adapter->erp_ready_head. As already found for fabric topology in v4.17 commit `fa89adba19` ("scsi: zfcp: fix infinite iteration on ERP ready list"), there was an endless loop during tracing of rport (un)block. A subsequent v4.18 commit `9e156c54ac` ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger") introduced a lockdep assertion for that case. As a side effect, that lockdep assertion now uncovered the unlocked code path for PtP. It is from within an adapter ERP action: zfcp_erp_strategy[1479] intentionally DROPs erp lock around zfcp_erp_strategy_do_action() zfcp_erp_strategy_do_action[1441] NO erp lock zfcp_erp_adapter_strategy[876] NO erp lock zfcp_erp_adapter_strategy_open[855] NO erp lock zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around zfcp_erp_action_to_running(), BUT _not_ around zfcp_erp_enqueue_ptp_port() zfcp_erp_enqueue_ptp_port[728] BUG: _not_ taking erp lock _zfcp_erp_port_reopen[432] assumes to be called with erp lock zfcp_erp_action_enqueue[314] assumes to be called with erp lock zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock: lockdep_assert_held(&adapter->erp_lock); It causes the following lockdep warning: WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x16a/0x188 no locks held by zfcperp0.0.17c0/775. Fix this by using the proper locked recovery trigger helper function. Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com Fixes: `cc8c282963` ("[SCSI] zfcp: Automatically attach remote ports") Cc: <stable@vger.kernel.org> #v2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Gilad Ben-Yossef	8179a26031	crypto: ccree - dec auth tag size from cryptlen map commit `8962c6d2c2` upstream. Remove the auth tag size from cryptlen before mapping the destination in out-of-place AEAD decryption thus resolving a crash with extended testmgr tests. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: stable@vger.kernel.org # v4.19+ Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Gilad Ben-Yossef	9135cd1b0f	crypto: ccree - only try to map auth tag if needed commit `504e84abec` upstream. Make sure to only add the size of the auth tag to the source mapping for encryption if it is an in-place operation. Failing to do this previously caused us to try and map auth size len bytes from a NULL mapping and crashing if both the cryptlen and assoclen are zero. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Gilad Ben-Yossef	a867446427	crypto: ccree - protect against empty or NULL scatterlists commit `ce0fc6db38` upstream. Deal gracefully with a NULL or empty scatterlist which can happen if both cryptlen and assoclen are zero and we're doing in-place AEAD encryption. This fixes a crash when this causes us to try and map a NULL page, at least with some platforms / DMA mapping configs. Cc: stable@vger.kernel.org # v4.19+ Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Andrei Botila	f3f13f9794	crypto: caam - update xts sector size for large input length commit `3f142b6a7b` upstream. Since in the software implementation of XTS-AES there is no notion of sector every input length is processed the same way. CAAM implementation has the notion of sector which causes different results between the software implementation and the one in CAAM for input lengths bigger than 512 bytes. Increase sector size to maximum value on 16 bits. Fixes: `c6415a6016` ("crypto: caam - add support for acipher xts(aes)") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Andrei Botila <andrei.botila@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Horia Geantă	bc8413b626	crypto: caam/qi2 - fix chacha20 data size error commit `3a5a9e1ef3` upstream. HW generates a Data Size error for chacha20 requests that are not a multiple of 64B, since algorithm state (AS) does not have the FINAL bit set. Since updating req->iv (for chaining) is not required, modify skcipher descriptors to set the FINAL bit for chacha20. [Note that for skcipher decryption we know that ctx1_iv_off is 0, which allows for an optimization by not checking algorithm type, since append_dec_op1() sets FINAL bit for all algorithms except AES.] Also drop the descriptor operations that save the IV. However, in order to keep code logic simple, things like S/G tables generation etc. are not touched. Cc: <stable@vger.kernel.org> # v5.3+ Fixes: `334d37c9e2` ("crypto: caam - update IV using HW support") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Tested-by: Valentin Ciocoi Radulescu <valentin.ciocoi@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Matthew Wilcox (Oracle)	07378b0991	xarray: Fix early termination of xas_for_each_marked commit `7e934cf5ac` upstream. xas_for_each_marked() is using entry == NULL as a termination condition of the iteration. When xas_for_each_marked() is used protected only by RCU, this can however race with xas_store(xas, NULL) in the following way: TASK1 TASK2 page_cache_delete() find_get_pages_range_tag() xas_for_each_marked() xas_find_marked() off = xas_find_chunk() xas_store(&xas, NULL) xas_init_marks(&xas); ... rcu_assign_pointer(*slot, NULL); entry = xa_entry(off); And thus xas_for_each_marked() terminates prematurely possibly leading to missed entries in the iteration (translating to missing writeback of some pages or a similar problem). If we find a NULL entry that has been marked, skip it (unless we're trying to allocate an entry). Reported-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org Fixes: `ef8e5717db` ("page cache: Convert delete_batch to XArray") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Matthew Wilcox (Oracle)	8f4c8e92bd	XArray: Fix xas_pause for large multi-index entries commit `c36d451ad3` upstream. Inspired by the recent Coverity report, I looked for other places where the offset wasn't being converted to an unsigned long before being shifted, and I found one in xas_pause() when the entry being paused is of order >32. Fixes: `b803b42823` ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Nikos Tsironis	a1ffc47f22	dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() commit `81d5553d12` upstream. dm_clone_nr_of_hydrated_regions() returns the number of regions that have been hydrated so far. In order to do so it employs bitmap_weight(). Until now, the return type of dm_clone_nr_of_hydrated_regions() was unsigned long. Because bitmap_weight() returns an int, in case BITS_PER_LONG == 64 and the return value of bitmap_weight() is 2^31 (the maximum allowed number of regions for a device), the result is sign extended from 32 bits to 64 bits and an incorrect value is displayed, in the status output of dm-clone, as the number of hydrated regions. Fix this by having dm_clone_nr_of_hydrated_regions() return an unsigned int. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:18 +02:00
Nikos Tsironis	996f8f1ba7	dm clone: Add overflow check for number of regions commit `cd481c1226` upstream. Add overflow check for clone->nr_regions variable, which holds the number of regions of the target. The overflow can occur with sufficiently large devices, if BITS_PER_LONG == 32. E.g., if the region size is 8 sectors (4K), the overflow would occur for device sizes > 34359738360 sectors (~16TB). This could result in multiple device sectors wrongly mapping to the same region number, due to the truncation from 64 bits to 32 bits, which would lead to data corruption. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Shetty, Harshini X (EXT-Sony Mobile)	2e70305934	dm verity fec: fix memory leak in verity_fec_dtr commit `75fa601934` upstream. Fix below kmemleak detected in verity_fec_ctr. output_pool is allocated for each dm-verity-fec device. But it is not freed when dm-table for the verity target is removed. Hence free the output mempool in destructor function verity_fec_dtr. unreferenced object 0xffffffffa574d000 (size 4096): comm "init", pid 1667, jiffies 4294894890 (age 307.168s) hex dump (first 32 bytes): 8e 36 00 98 66 a8 0b 9b 00 00 00 00 00 00 00 00 .6..f........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000060e82407>] __kmalloc+0x2b4/0x340 [<00000000dd99488f>] mempool_kmalloc+0x18/0x20 [<000000002560172b>] mempool_init_node+0x98/0x118 [<000000006c3574d2>] mempool_init+0x14/0x20 [<0000000008cb266e>] verity_fec_ctr+0x388/0x3b0 [<000000000887261b>] verity_ctr+0x87c/0x8d0 [<000000002b1e1c62>] dm_table_add_target+0x174/0x348 [<000000002ad89eda>] table_load+0xe4/0x328 [<000000001f06f5e9>] dm_ctl_ioctl+0x3b4/0x5a0 [<00000000bee5fbb7>] do_vfs_ioctl+0x5dc/0x928 [<00000000b475b8f5>] __arm64_sys_ioctl+0x70/0x98 [<000000005361e2e8>] el0_svc_common+0xa0/0x158 [<000000001374818f>] el0_svc_handler+0x6c/0x88 [<000000003364e9f4>] el0_svc+0x8/0xc [<000000009d84cec9>] 0xffffffffffffffff Fixes: `a739ff3f54` ("dm verity: add support for forward error correction") Depends-on: `6f1c819c21` ("dm: convert to bioset_init()/mempool_init()") Cc: stable@vger.kernel.org Signed-off-by: Harshini Shetty <harshini.x.shetty@sony.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Mikulas Patocka	833309f3fb	dm integrity: fix a crash with unusually large tag size commit `b93b6643e9` upstream. If the user specifies tag size larger than HASH_MAX_DIGESTSIZE, there's a crash in integrity_metadata(). Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Mikulas Patocka	bef0d2f5fd	dm writecache: add cond_resched to avoid CPU hangs commit `1edaa447d9` upstream. Initializing a dm-writecache device can take a long time when the persistent memory device is large. Add cond_resched() to a few loops to avoid warnings that the CPU is stuck. Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Jakub Kicinski	5c84ab9c96	mm, memcg: do not high throttle allocators based on wraparound commit `9b8b17541f` upstream. If a cgroup violates its memory.high constraints, we may end up unduly penalising it. For example, for the following hierarchy: A: max high, 20 usage A/B: 9 high, 10 usage A/C: max high, 10 usage We would end up doing the following calculation below when calculating high delay for A/B: A/B: 10 - 9 = 1... A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21. This gets worse with higher disparities in usage in the parent. I have no idea how this disappeared from the final version of the patch, but it is certainly Not Good(tm). This wasn't obvious in testing because, for a simple cgroup hierarchy with only one child, the result is usually roughly the same. It's only in more complex hierarchies that things go really awry (although still, the effects are limited to a maximum of 2 seconds in schedule_timeout_killable at a maximum). [chris@chrisdown.name: changelog] Fixes: `e26733e0d0` ("mm, memcg: throttle allocators based on ancestral memory.high") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [5.4.x] Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Maxime Ripard	935e87b20c	arm64: dts: allwinner: h5: Fix PMU compatible commit `4ae7a3c3d7` upstream. The commit `c35a516a46` ("arm64: dts: allwinner: H5: Add PMU node") introduced support for the PMU found on the Allwinner H5. However, the binding only allows for a single compatible, while the patch was adding two. Make sure we follow the binding. Fixes: `c35a516a46` ("arm64: dts: allwinner: H5: Add PMU node") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Scott Wood	1dbfae0095	sched/core: Remove duplicate assignment in sched_tick_remote() commit `82e0516ce3` upstream. A redundant "curr = rq->curr" was added; remove it. Fixes: `ebc0f83c78` ("timers/nohz: Update NOHZ load in remote tick") Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1580776558-12882-1-git-send-email-swood@redhat.com Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Maxime Ripard	8b06804632	arm64: dts: allwinner: h6: Fix PMU compatible commit `4c7eeb9af3` upstream. The commit `7aa9b9eb7d` ("arm64: dts: allwinner: H6: Add PMU mode") introduced support for the PMU found on the Allwinner H6. However, the binding only allows for a single compatible, while the patch was adding two. Make sure we follow the binding. Fixes: `7aa9b9eb7d` ("arm64: dts: allwinner: H6: Add PMU mode") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Subash Abhinov Kasiviswanathan	27dbb36338	net: qualcomm: rmnet: Allow configuration updates to existing devices commit `2abb579238` upstream. This allows the changelink operation to succeed if the mux_id was specified as an argument. Note that the mux_id must match the existing mux_id of the rmnet device or should be an unused mux_id. Fixes: `1dc49e9d16` ("net: rmnet: do not allow to change mux id if mux id is duplicated") Reported-and-tested-by: Alex Elder <elder@linaro.org> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
Anssi Hannula	add09c86cd	tools: gpio: Fix out-of-tree build regression commit `82f04bfe2a` upstream. Commit `0161a94e2d` ("tools: gpio: Correctly add make dependencies for gpio_utils") added a make rule for gpio-utils-in.o but used $(output) instead of the correct $(OUTPUT) for the output directory, breaking out-of-tree build (O=xx) with the following error: No rule to make target 'out/tools/gpio/gpio-utils-in.o', needed by 'out/tools/gpio/lsgpio-in.o'. Stop. Fix that. Fixes: `0161a94e2d` ("tools: gpio: Correctly add make dependencies for gpio_utils") Cc: <stable@vger.kernel.org> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Link: https://lore.kernel.org/r/20200325103154.32235-1-anssi.hannula@bitwise.fi Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:17 +02:00
YueHaibing	a0f079ac13	powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init() commit `11dd34f3ea` upstream. There is no need to have the 'struct dentry *vpa_dir' variable static since new value always be assigned before use it. Fixes: `c6c26fb55e` ("powerpc/pseries: Export raw per-CPU VPA data via debugfs") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190218125644.87448-1-yuehaibing@huawei.com Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Yangbo Lu	e0ae9da3fb	mmc: sdhci-of-esdhc: fix esdhc_reset() for different controller versions commit `2aa3d826ad` upstream. This patch is to fix operating in esdhc_reset() for different controller versions, and to add bus-width restoring after data reset for eSDHC (verdor version <= 2.2). Also add annotation for understanding. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20200108040713.38888-1-yangbo.lu@nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Jens Axboe	7661469ef5	io_uring: honor original task RLIMIT_FSIZE commit `4ed734b0d0` upstream. With the previous fixes for number of files open checking, I added some debug code to see if we had other spots where we're checking rlimit() against the async io-wq workers. The only one I found was file size checking, which we should also honor. During write and fallocate prep, store the max file size and override that for the current ask if we're in io-wq worker context. Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Gao Xiang	a181a74610	erofs: correct the remaining shrink objects commit `9d5a09c6f3` upstream. The remaining count should not include successful shrink attempts. Fixes: `e7e9a307be` ("staging: erofs: introduce workstation for decompression") Cc: <stable@vger.kernel.org> # 4.19+ Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Rosioru Dragos	433868b19c	crypto: mxs-dcp - fix scatterlist linearization for hash commit `fa03481b6e` upstream. The incorrect traversal of the scatterlist, during the linearization phase lead to computing the hash value of the wrong input buffer. New implementation uses scatterwalk_map_and_copy() to address this issue. Cc: <stable@vger.kernel.org> Fixes: `15b59e7c37` ("crypto: mxs - Add Freescale MXS DCP driver") Signed-off-by: Rosioru Dragos <dragos.rosioru@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Dan Carpenter	248414f505	crypto: rng - Fix a refcounting bug in crypto_rng_reset() commit `eed74b3eba` upstream. We need to decrement this refcounter on these error paths. Fixes: `f7d76e05d0` ("crypto: user - fix use_after_free of struct xxx_request") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Nikita Shubin	6b936b1872	remoteproc: Fix NULL pointer dereference in rproc_virtio_notify commit `791c13b709` upstream. Undefined rproc_ops .kick method in remoteproc driver will result in "Unable to handle kernel NULL pointer dereference" in rproc_virtio_notify, after firmware loading if: 1) .kick method wasn't defined in driver 2) resource_table exists in firmware and has "Virtio device entry" defined Let's refuse to register an rproc-induced virtio device if no kick method was defined for rproc. [ 13.180049][ T415] 8<--- cut here --- [ 13.190558][ T415] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 13.212544][ T415] pgd = (ptrval) [ 13.217052][ T415] [00000000] *pgd=00000000 [ 13.224692][ T415] Internal error: Oops: 80000005 [#1] PREEMPT SMP ARM [ 13.231318][ T415] Modules linked in: rpmsg_char imx_rproc virtio_rpmsg_bus rpmsg_core [last unloaded: imx_rproc] [ 13.241687][ T415] CPU: 0 PID: 415 Comm: unload-load.sh Not tainted 5.5.2-00002-g707df13bbbdd #6 [ 13.250561][ T415] Hardware name: Freescale i.MX7 Dual (Device Tree) [ 13.257009][ T415] PC is at 0x0 [ 13.260249][ T415] LR is at rproc_virtio_notify+0x2c/0x54 [ 13.265738][ T415] pc : [<00000000>] lr : [<8050f6b0>] psr: 60010113 [ 13.272702][ T415] sp : b8d47c48 ip : 00000001 fp : bc04de00 [ 13.278625][ T415] r10: bc04c000 r9 : 00000cc0 r8 : b8d46000 [ 13.284548][ T415] r7 : 00000000 r6 : b898f200 r5 : 00000000 r4 : b8a29800 [ 13.291773][ T415] r3 : 00000000 r2 : 990a3ad4 r1 : 00000000 r0 : b8a29800 [ 13.299000][ T415] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 13.306833][ T415] Control: 10c5387d Table: b8b4806a DAC: 00000051 [ 13.313278][ T415] Process unload-load.sh (pid: 415, stack limit = 0x(ptrval)) [ 13.320591][ T415] Stack: (0xb8d47c48 to 0xb8d48000) [ 13.325651][ T415] 7c40: b895b680 00000001 b898f200 803c6430 b895bc80 7f00ae18 [ 13.334531][ T415] 7c60: 00000035 00000000 00000000 b9393200 80b3ed80 00004000 b9393268 bbf5a9a2 [ 13.343410][ T415] 7c80: 00000e00 00000200 00000000 7f00aff0 7f00a014 b895b680 b895b800 990a3ad4 [ 13.352290][ T415] 7ca0: 00000001 b898f210 b898f200 00000000 00000000 7f00e000 00000001 00000000 [ 13.361170][ T415] 7cc0: 00000000 803c62e0 80b2169c 802a0924 b898f210 00000000 00000000 b898f210 [ 13.370049][ T415] 7ce0: 80b9ba44 00000000 80b9ba48 00000000 7f00e000 00000008 80b2169c 80400114 [ 13.378929][ T415] 7d00: 80b2169c 8061fd64 b898f210 7f00e000 80400744 b8d46000 80b21634 80b21634 [ 13.387809][ T415] 7d20: 80b2169c 80400614 80b21634 80400718 7f00e000 00000000 b8d47d7c 80400744 [ 13.396689][ T415] 7d40: b8d46000 80b21634 80b21634 803fe338 b898f254 b80fe76c b8d32e38 990a3ad4 [ 13.405569][ T415] 7d60: fffffff3 b898f210 b8d46000 00000001 b898f254 803ffe7c 80857a90 b898f210 [ 13.414449][ T415] 7d80: 00000001 990a3ad4 b8d46000 b898f210 b898f210 80b17aec b8a29c20 803ff0a4 [ 13.423328][ T415] 7da0: b898f210 00000000 b8d46000 803fb8e0 b898f200 00000000 80b17aec b898f210 [ 13.432209][ T415] 7dc0: b8a29c20 990a3ad4 b895b900 b898f200 8050fb7c 80b17aec b898f210 b8a29c20 [ 13.441088][ T415] 7de0: b8a29800 b895b900 b8a29a04 803c5ec0 b8a29c00 b898f200 b8a29a20 00000007 [ 13.449968][ T415] 7e00: b8a29c20 8050fd78 b8a29800 00000000 b8a29a20 b8a29c04 b8a29820 b8a299d0 [ 13.458848][ T415] 7e20: b895b900 8050e5a4 b8a29800 b8a299d8 b8d46000 b8a299e0 b8a29820 b8a299d0 [ 13.467728][ T415] 7e40: b895b900 8050e008 000041ed 00000000 b8b8c440 b8a299d8 b8a299e0 b8a299d8 [ 13.476608][ T415] 7e60: b8b8c440 990a3ad4 00000000 b8a29820 b8b8c400 00000006 b8a29800 b895b880 [ 13.485487][ T415] 7e80: b8d47f78 00000000 00000000 8050f4b4 00000006 b895b890 b8b8c400 008fbea0 [ 13.494367][ T415] 7ea0: b895b880 8029f530 00000000 00000000 b8d46000 00000006 b8d46000 008fbea0 [ 13.503246][ T415] 7ec0: 8029f434 00000000 b8d46000 00000000 00000000 8021e2e4 0000000a 8061fd0c [ 13.512125][ T415] 7ee0: 0000000a b8af0c00 0000000a b8af0c40 00000001 b8af0c40 00000000 8061f910 [ 13.521005][ T415] 7f00: 0000000a 80240af4 00000002 b8d46000 00000000 8061fd0c 00000002 80232d7c [ 13.529884][ T415] 7f20: 00000000 b8d46000 00000000 990a3ad4 00000000 00000006 b8a62d80 008fbea0 [ 13.538764][ T415] 7f40: b8d47f78 00000000 b8d46000 00000000 00000000 802210c0 b88f2900 00000000 [ 13.547644][ T415] 7f60: b8a62d80 b8a62d80 b8d46000 00000006 008fbea0 80221320 00000000 00000000 [ 13.556524][ T415] 7f80: b8af0c00 990a3ad4 0000006c 008fbea0 76f1cda0 00000004 80101204 00000004 [ 13.565403][ T415] 7fa0: 00000000 80101000 0000006c 008fbea0 00000001 008fbea0 00000006 00000000 [ 13.574283][ T415] 7fc0: 0000006c 008fbea0 76f1cda0 00000004 00000006 00000006 00000000 00000000 [ 13.583162][ T415] 7fe0: 00000004 7ebaf7d0 76eb4c0b 76e3f206 600d0030 00000001 00000000 00000000 [ 13.592056][ T415] [<8050f6b0>] (rproc_virtio_notify) from [<803c6430>] (virtqueue_notify+0x1c/0x34) [ 13.601298][ T415] [<803c6430>] (virtqueue_notify) from [<7f00ae18>] (rpmsg_probe+0x280/0x380 [virtio_rpmsg_bus]) [ 13.611663][ T415] [<7f00ae18>] (rpmsg_probe [virtio_rpmsg_bus]) from [<803c62e0>] (virtio_dev_probe+0x1f8/0x2c4) [ 13.622022][ T415] [<803c62e0>] (virtio_dev_probe) from [<80400114>] (really_probe+0x200/0x450) [ 13.630817][ T415] [<80400114>] (really_probe) from [<80400614>] (driver_probe_device+0x16c/0x1ac) [ 13.639873][ T415] [<80400614>] (driver_probe_device) from [<803fe338>] (bus_for_each_drv+0x84/0xc8) [ 13.649102][ T415] [<803fe338>] (bus_for_each_drv) from [<803ffe7c>] (__device_attach+0xd4/0x164) [ 13.658069][ T415] [<803ffe7c>] (__device_attach) from [<803ff0a4>] (bus_probe_device+0x84/0x8c) [ 13.666950][ T415] [<803ff0a4>] (bus_probe_device) from [<803fb8e0>] (device_add+0x444/0x768) [ 13.675572][ T415] [<803fb8e0>] (device_add) from [<803c5ec0>] (register_virtio_device+0xa4/0xfc) [ 13.684541][ T415] [<803c5ec0>] (register_virtio_device) from [<8050fd78>] (rproc_add_virtio_dev+0xcc/0x1b8) [ 13.694466][ T415] [<8050fd78>] (rproc_add_virtio_dev) from [<8050e5a4>] (rproc_start+0x148/0x200) [ 13.703521][ T415] [<8050e5a4>] (rproc_start) from [<8050e008>] (rproc_boot+0x384/0x5c0) [ 13.711708][ T415] [<8050e008>] (rproc_boot) from [<8050f4b4>] (state_store+0x3c/0xc8) [ 13.719723][ T415] [<8050f4b4>] (state_store) from [<8029f530>] (kernfs_fop_write+0xfc/0x214) [ 13.728348][ T415] [<8029f530>] (kernfs_fop_write) from [<8021e2e4>] (__vfs_write+0x30/0x1cc) [ 13.736971][ T415] [<8021e2e4>] (__vfs_write) from [<802210c0>] (vfs_write+0xac/0x17c) [ 13.744985][ T415] [<802210c0>] (vfs_write) from [<80221320>] (ksys_write+0x64/0xe4) [ 13.752825][ T415] [<80221320>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54) [ 13.761178][ T415] Exception stack(0xb8d47fa8 to 0xb8d47ff0) [ 13.766932][ T415] 7fa0: 0000006c 008fbea0 00000001 008fbea0 00000006 00000000 [ 13.775811][ T415] 7fc0: 0000006c 008fbea0 76f1cda0 00000004 00000006 00000006 00000000 00000000 [ 13.784687][ T415] 7fe0: 00000004 7ebaf7d0 76eb4c0b 76e3f206 [ 13.790442][ T415] Code: bad PC value [ 13.839214][ T415] ---[ end trace 1fe21ecfc9f28852 ]--- Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Nikita Shubin <NShubin@topcon.com> Fixes: `7a18694162` ("remoteproc: remove the single rpmsg vdev limitation") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200306072452.24743-1-NShubin@topcon.com Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Sibi Sankar	5b677eddc5	remoteproc: qcom_q6v5_mss: Reload the mba region on coredump commit `d96f2571dc` upstream. On secure devices after a wdog/fatal interrupt, the mba region has to be refreshed in order to prevent the following errors during mba load. Err Logs: remoteproc remoteproc2: stopped remote processor 4080000.remoteproc qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284031232 qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284031232 .... qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284031232 qcom-q6v5-mss 4080000.remoteproc: MBA booted, loading mpss Fixes: `7dd8ade24d` ("remoteproc: qcom: q6v5-mss: Add custom dump function for modem") Cc: stable@vger.kernel.org Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20200304194729.27979-4-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:16 +02:00
Bjorn Andersson	241f681d19	remoteproc: qcom_q6v5_mss: Don't reassign mpss region on shutdown commit `900fc60df2` upstream. Trying to reclaim mpss memory while the mba is not running causes the system to crash on devices with security fuses blown, so leave it assigned to the remote on shutdown and recover it on a subsequent boot. Fixes: `6c5a9dc248` ("remoteproc: qcom: Make secure world call for mem ownership switch") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20200304194729.27979-2-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Josef Bacik	87a9058d55	btrfs: use nofs allocations for running delayed items commit `351cbf6e44` upstream. Zygo reported the following lockdep splat while testing the balance patches ====================================================== WARNING: possible circular locking dependency detected 5.6.0-c6f0579d496a+ #53 Not tainted ------------------------------------------------------ kswapd0/1133 is trying to acquire lock: ffff888092f622c0 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node+0x7c/0x5b0 but task is already holding lock: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.91+0x29/0x30 fs_reclaim_acquire+0x19/0x20 kmem_cache_alloc_trace+0x32/0x740 add_block_entry+0x45/0x260 btrfs_ref_tree_mod+0x6e2/0x8b0 btrfs_alloc_tree_block+0x789/0x880 alloc_tree_block_no_bg_flush+0xc6/0xf0 __btrfs_cow_block+0x270/0x940 btrfs_cow_block+0x1ba/0x3a0 btrfs_search_slot+0x999/0x1030 btrfs_insert_empty_items+0x81/0xe0 btrfs_insert_delayed_items+0x128/0x7d0 __btrfs_run_delayed_items+0xf4/0x2a0 btrfs_run_delayed_items+0x13/0x20 btrfs_commit_transaction+0x5cc/0x1390 insert_balance_item.isra.39+0x6b2/0x6e0 btrfs_balance+0x72d/0x18d0 btrfs_ioctl_balance+0x3de/0x4c0 btrfs_ioctl+0x30ab/0x44a0 ksys_ioctl+0xa1/0xe0 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x77/0x2c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0x197e/0x2550 lock_acquire+0x103/0x220 __mutex_lock+0x13d/0xce0 mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 kswapd+0x35a/0x800 kthread+0x1e9/0x210 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&delayed_node->mutex); lock(fs_reclaim); lock(&delayed_node->mutex); * DEADLOCK * 3 locks held by kswapd0/1133: #0: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8fc380d8 (shrinker_rwsem){++++}, at: shrink_slab+0x1e8/0x410 #2: ffff8881e0e6c0e8 (&type->s_umount_key#42){++++}, at: trylock_super+0x1b/0x70 stack backtrace: CPU: 2 PID: 1133 Comm: kswapd0 Not tainted 5.6.0-c6f0579d496a+ #53 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xc1/0x11a print_circular_bug.isra.38.cold.57+0x145/0x14a check_noncircular+0x2a9/0x2f0 ? print_circular_bug.isra.38+0x130/0x130 ? stack_trace_consume_entry+0x90/0x90 ? save_trace+0x3cc/0x420 __lock_acquire+0x197e/0x2550 ? btrfs_inode_clear_file_extent_range+0x9b/0xb0 ? register_lock_class+0x960/0x960 lock_acquire+0x103/0x220 ? __btrfs_release_delayed_node+0x7c/0x5b0 __mutex_lock+0x13d/0xce0 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? __asan_loadN+0xf/0x20 ? pvclock_clocksource_read+0xeb/0x190 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? mutex_lock_io_nested+0xc20/0xc20 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1e6/0x2e0 mutex_lock_nested+0x1b/0x20 ? mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 ? btrfs_setattr+0x840/0x840 ? do_raw_spin_unlock+0xa8/0x140 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 ? invalidate_inodes+0x310/0x310 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 ? do_shrink_slab+0x530/0x530 ? do_shrink_slab+0x530/0x530 ? __kasan_check_read+0x11/0x20 ? mem_cgroup_protected+0x13d/0x260 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 ? mem_cgroup_shrink_node+0x490/0x490 ? _raw_spin_unlock_irq+0x27/0x40 ? finish_task_switch+0xce/0x390 ? rcu_read_lock_bh_held+0xb0/0xb0 kswapd+0x35a/0x800 ? _raw_spin_unlock_irqrestore+0x4c/0x60 ? balance_pgdat+0x8a0/0x8a0 ? finish_wait+0x110/0x110 ? __kasan_check_read+0x11/0x20 ? __kthread_parkme+0xc6/0xe0 ? balance_pgdat+0x8a0/0x8a0 kthread+0x1e9/0x210 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 This is because we hold that delayed node's mutex while doing tree operations. Fix this by just wrapping the searches in nofs. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Robbie Ko	0425813c22	btrfs: fix missing semaphore unlock in btrfs_sync_file commit `6ff06729c2` upstream. Ordered ops are started twice in sync file, once outside of inode mutex and once inside, taking the dio semaphore. There was one error path missing the semaphore unlock. Fixes: `aab15e8ec2` ("Btrfs: fix rare chances for data loss when doing a fast fsync") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Josef Bacik	08e69ab983	btrfs: unset reloc control if we fail to recover commit `fb2d83eefe` upstream. If we fail to load an fs root, or fail to start a transaction we can bail without unsetting the reloc control, which leads to problems later when we free the reloc control but still have it attached to the file system. In the normal path we'll end up calling unset_reloc_control() twice, but all it does is set fs_info->reloc_control = NULL, and we can only have one balance at a time so it's not racey. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Filipe Manana	098d3da1ad	btrfs: fix missing file extent item for hole after ranged fsync commit `95418ed1d1` upstream. When doing a fast fsync for a range that starts at an offset greater than zero, we can end up with a log that when replayed causes the respective inode miss a file extent item representing a hole if we are not using the NO_HOLES feature. This is because for fast fsyncs we don't log any extents that cover a range different from the one requested in the fsync. Example scenario to trigger it: $ mkfs.btrfs -O ^no-holes -f /dev/sdd $ mount /dev/sdd /mnt # Create a file with a single 256K and fsync it to clear to full sync # bit in the inode - we want the msync below to trigger a fast fsync. $ xfs_io -f -c "pwrite -S 0xab 0 256K" -c "fsync" /mnt/foo # Force a transaction commit and wipe out the log tree. $ sync # Dirty 768K of data, increasing the file size to 1Mb, and flush only # the range from 256K to 512K without updating the log tree # (sync_file_range() does not trigger fsync, it only starts writeback # and waits for it to finish). $ xfs_io -c "pwrite -S 0xcd 256K 768K" /mnt/foo $ xfs_io -c "sync_range -abw 256K 256K" /mnt/foo # Now dirty the range from 768K to 1M again and sync that range. $ xfs_io -c "mmap -w 768K 256K" \ -c "mwrite -S 0xef 768K 256K" \ -c "msync -s 768K 256K" \ -c "munmap" \ /mnt/foo <power fail> # Mount to replay the log. $ mount /dev/sdd /mnt $ umount /mnt $ btrfs check /dev/sdd Opening filesystem to check... Checking filesystem on /dev/sdd UUID: 482fb574-b288-478e-a190-a9c44a78fca6 [1/7] checking root items [2/7] checking extents [3/7] checking free space cache [4/7] checking fs roots root 5 inode 257 errors 100, file extent discount Found file extent holes: start: 262144, len: 524288 ERROR: errors found in fs roots found 720896 bytes used, error(s) found total csum bytes: 512 total tree bytes: 131072 total fs tree bytes: 32768 total extent tree bytes: 16384 btree space waste bytes: 123514 file data blocks allocated: 589824 referenced 589824 Fix this issue by setting the range to full (0 to LLONG_MAX) when the NO_HOLES feature is not enabled. This results in extra work being done but it gives the guarantee we don't end up with missing holes after replaying the log. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Josef Bacik	b436fbff6f	btrfs: drop block from cache on error in relocation commit `8e19c9732a` upstream. If we have an error while building the backref tree in relocation we'll process all the pending edges and then free the node. However if we integrated some edges into the cache we'll lose our link to those edges by simply freeing this node, which means we'll leak memory and references to any roots that we've found. Instead we need to use remove_backref_node(), which walks through all of the edges that are still linked to this node and free's them up and drops any root references we may be holding. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Josef Bacik	dd68ba0d73	btrfs: set update the uuid generation as soon as possible commit `75ec1db871` upstream. In my EIO stress testing I noticed I was getting forced to rescan the uuid tree pretty often, which was weird. This is because my error injection stuff would sometimes inject an error after log replay but before we loaded the UUID tree. If log replay committed the transaction it wouldn't have updated the uuid tree generation, but the tree was valid and didn't change, so there's no reason to not update the generation here. Fix this by setting the BTRFS_FS_UPDATE_UUID_TREE_GEN bit immediately after reading all the fs roots if the uuid tree generation matches the fs generation. Then any transaction commits that happen during mount won't screw up our uuid tree state, forcing us to do needless uuid rescans. Fixes: `70f8017547` ("Btrfs: check UUID tree during mount if required") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Josef Bacik	441b83a842	btrfs: reloc: clean dirty subvols if we fail to start a transaction commit `6217b0fadd` upstream. If we do merge_reloc_roots() we could insert a few roots onto the dirty subvol roots list, where we hold a ref on them. If we fail to start the transaction we need to run clean_dirty_subvols() in order to cleanup the refs. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Filipe Manana	1bd44cada4	Btrfs: fix crash during unmount due to race with delayed inode workers commit `f0cc2cd701` upstream. During unmount we can have a job from the delayed inode items work queue still running, that can lead to at least two bad things: 1) A crash, because the worker can try to create a transaction just after the fs roots were freed; 2) A transaction leak, because the worker can create a transaction before the fs roots are freed and just after we committed the last transaction and after we stopped the transaction kthread. A stack trace example of the crash: [79011.691214] kernel BUG at lib/radix-tree.c:982! [79011.692056] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [79011.693180] CPU: 3 PID: 1394 Comm: kworker/u8:2 Tainted: G W 5.6.0-rc2-btrfs-next-54 #2 (...) [79011.696789] Workqueue: btrfs-delayed-meta btrfs_work_helper [btrfs] [79011.697904] RIP: 0010:radix_tree_tag_set+0xe7/0x170 (...) [79011.702014] RSP: 0018:ffffb3c84a317ca0 EFLAGS: 00010293 [79011.702949] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [79011.704202] RDX: ffffb3c84a317cb0 RSI: ffffb3c84a317ca8 RDI: ffff8db3931340a0 [79011.705463] RBP: 0000000000000005 R08: 0000000000000005 R09: ffffffff974629d0 [79011.706756] R10: ffffb3c84a317bc0 R11: 0000000000000001 R12: ffff8db393134000 [79011.708010] R13: ffff8db3931340a0 R14: ffff8db393134068 R15: 0000000000000001 [79011.709270] FS: 0000000000000000(0000) GS:ffff8db3b6a00000(0000) knlGS:0000000000000000 [79011.710699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [79011.711710] CR2: 00007f22c2a0a000 CR3: 0000000232ad4005 CR4: 00000000003606e0 [79011.712958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [79011.714205] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [79011.715448] Call Trace: [79011.715925] record_root_in_trans+0x72/0xf0 [btrfs] [79011.716819] btrfs_record_root_in_trans+0x4b/0x70 [btrfs] [79011.717925] start_transaction+0xdd/0x5c0 [btrfs] [79011.718829] btrfs_async_run_delayed_root+0x17e/0x2b0 [btrfs] [79011.719915] btrfs_work_helper+0xaa/0x720 [btrfs] [79011.720773] process_one_work+0x26d/0x6a0 [79011.721497] worker_thread+0x4f/0x3e0 [79011.722153] ? process_one_work+0x6a0/0x6a0 [79011.722901] kthread+0x103/0x140 [79011.723481] ? kthread_create_worker_on_cpu+0x70/0x70 [79011.724379] ret_from_fork+0x3a/0x50 (...) The following diagram shows a sequence of steps that lead to the crash during ummount of the filesystem: CPU 1 CPU 2 CPU 3 btrfs_punch_hole() btrfs_btree_balance_dirty() btrfs_balance_delayed_items() --> sees fs_info->delayed_root->items with value 200, which is greater than BTRFS_DELAYED_BACKGROUND (128) and smaller than BTRFS_DELAYED_WRITEBACK (512) btrfs_wq_run_delayed_node() --> queues a job for fs_info->delayed_workers to run btrfs_async_run_delayed_root() btrfs_async_run_delayed_root() --> job queued by CPU 1 --> starts picking and running delayed nodes from the prepare_list list close_ctree() btrfs_delete_unused_bgs() btrfs_commit_super() btrfs_join_transaction() --> gets transaction N btrfs_commit_transaction(N) --> set transaction state to TRANTS_STATE_COMMIT_START btrfs_first_prepared_delayed_node() --> picks delayed node X through the prepared_list list btrfs_run_delayed_items() btrfs_first_delayed_node() --> also picks delayed node X but through the node_list list __btrfs_commit_inode_delayed_items() --> runs all delayed items from this node and drops the node's item count to 0 through call to btrfs_release_delayed_inode() --> finishes running any remaining delayed nodes --> finishes transaction commit --> stops cleaner and transaction threads btrfs_free_fs_roots() --> frees all roots and removes them from the radix tree fs_info->fs_roots_radix btrfs_join_transaction() start_transaction() btrfs_record_root_in_trans() record_root_in_trans() radix_tree_tag_set() --> crashes because the root is not in the radix tree anymore If the worker is able to call btrfs_join_transaction() before the unmount task frees the fs roots, we end up leaking a transaction and all its resources, since after the call to btrfs_commit_super() and stopping the transaction kthread, we don't expect to have any transaction open anymore. When this situation happens the worker has a delayed node that has no more items to run, since the task calling btrfs_run_delayed_items(), which is doing a transaction commit, picks the same node and runs all its items first. We can not wait for the worker to complete when running delayed items through btrfs_run_delayed_items(), because we call that function in several phases of a transaction commit, and that could cause a deadlock because the worker calls btrfs_join_transaction() and the task doing the transaction commit may have already set the transaction state to TRANS_STATE_COMMIT_DOING. Also it's not possible to get into a situation where only some of the items of a delayed node are added to the fs/subvolume tree in the current transaction and the remaining ones in the next transaction, because when running the items of a delayed inode we lock its mutex, effectively waiting for the worker if the worker is running the items of the delayed node already. Since this can only cause issues when unmounting a filesystem, fix it in a simple way by waiting for any jobs on the delayed workers queue before calling btrfs_commit_supper() at close_ctree(). This works because at this point no one can call btrfs_btree_balance_dirty() or btrfs_balance_delayed_items(), and if we end up waiting for any worker to complete, btrfs_commit_super() will commit the transaction created by the worker. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Qu Wenruo	941dabde6c	btrfs: Don't submit any btree write bio if the fs has errors commit `b3ff8f1d38` upstream. [BUG] There is a fuzzed image which could cause KASAN report at unmount time. BUG: KASAN: use-after-free in btrfs_queue_work+0x2c1/0x390 Read of size 8 at addr ffff888067cf6848 by task umount/1922 CPU: 0 PID: 1922 Comm: umount Tainted: G W 5.0.21 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0x5b/0x8b print_address_description+0x70/0x280 kasan_report+0x13a/0x19b btrfs_queue_work+0x2c1/0x390 btrfs_wq_submit_bio+0x1cd/0x240 btree_submit_bio_hook+0x18c/0x2a0 submit_one_bio+0x1be/0x320 flush_write_bio.isra.41+0x2c/0x70 btree_write_cache_pages+0x3bb/0x7f0 do_writepages+0x5c/0x130 __writeback_single_inode+0xa3/0x9a0 writeback_single_inode+0x23d/0x390 write_inode_now+0x1b5/0x280 iput+0x2ef/0x600 close_ctree+0x341/0x750 generic_shutdown_super+0x126/0x370 kill_anon_super+0x31/0x50 btrfs_kill_super+0x36/0x2b0 deactivate_locked_super+0x80/0xc0 deactivate_super+0x13c/0x150 cleanup_mnt+0x9a/0x130 task_work_run+0x11a/0x1b0 exit_to_usermode_loop+0x107/0x130 do_syscall_64+0x1e5/0x280 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [CAUSE] The fuzzed image has a completely screwd up extent tree: leaf 29421568 gen 8 total ptrs 6 free space 3587 owner EXTENT_TREE refs 2 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 5938 item 0 key (12587008 168 4096) itemoff 3942 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 259 offset 0 count 1 item 1 key (12591104 168 8192) itemoff 3889 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 271 offset 0 count 1 item 2 key (`12599296` 168 4096) itemoff 3836 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 259 offset 4096 count 1 item 3 key (29360128 169 0) itemoff 3803 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 item 4 key (29368320 169 1) itemoff 3770 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 item 5 key (29372416 169 0) itemoff 3737 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 Note that leaf 29421568 doesn't have its backref in the extent tree. Thus extent allocator can re-allocate leaf 29421568 for other trees. In short, the bug is caused by: - Existing tree block gets allocated to log tree This got its generation bumped. - Log tree balance cleaned dirty bit of offending tree block It will not be written back to disk, thus no WRITTEN flag. - Original owner of the tree block gets COWed Since the tree block has higher transid, no WRITTEN flag, it's reused, and not traced by transaction::dirty_pages. - Transaction aborted Tree blocks get cleaned according to transaction::dirty_pages. But the offending tree block is not recorded at all. - Filesystem unmount All pages are assumed to be are clean, destroying all workqueue, then call iput(btree_inode). But offending tree block is still dirty, which triggers writeback, and causes use-after-free bug. The detailed sequence looks like this: - Initial status eb: 29421568, header=WRITTEN bflags_dirty=0, page_dirty=0, gen=8, not traced by any dirty extent_iot_tree. - New tree block is allocated Since there is no backref for 29421568, it's re-allocated as new tree block. Keep in mind that tree block 29421568 is still referred by extent tree. - Tree block 29421568 is filled for log tree eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 << (gen bumped) traced by btrfs_root::dirty_log_pages - Some log tree operations Since the fs is using node size 4096, the log tree can easily go a level higher. - Log tree needs balance Tree block 29421568 gets all its content pushed to right, thus now it is empty, and we don't need it. btrfs_clean_tree_block() from __push_leaf_right() get called. eb: 29421568, header=0 bflags_dirty=0, page_dirty=0, gen=9 traced by btrfs_root::dirty_log_pages - Log tree write back btree_write_cache_pages() goes through dirty pages ranges, but since page of tree block 29421568 gets cleaned already, it's not written back to disk. Thus it doesn't have WRITTEN bit set. But ranges in dirty_log_pages are cleared. eb: 29421568, header=0 bflags_dirty=0, page_dirty=0, gen=9 not traced by any dirty extent_iot_tree. - Extent tree update when committing transaction Since tree block 29421568 has transid equal to running trans, and has no WRITTEN bit, should_cow_block() will use it directly without adding it to btrfs_transaction::dirty_pages. eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 not traced by any dirty extent_iot_tree. At this stage, we're doomed. We have a dirty eb not tracked by any extent io tree. - Transaction gets aborted due to corrupted extent tree Btrfs cleans up dirty pages according to transaction::dirty_pages and btrfs_root::dirty_log_pages. But since tree block 29421568 is not tracked by neither of them, it's still dirty. eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 not traced by any dirty extent_iot_tree. - Filesystem unmount Since all cleanup is assumed to be done, all workqueus are destroyed. Then iput(btree_inode) is called, expecting no dirty pages. But tree 29421568 is still dirty, thus triggering writeback. Since all workqueues are already freed, we cause use-after-free. This shows us that, log tree blocks + bad extent tree can cause wild dirty pages. [FIX] To fix the problem, don't submit any btree write bio if the filesytem has any error. This is the last safe net, just in case other cleanup haven't caught catch it. Link: https://github.com/bobfuzzer/CVE/tree/master/CVE-2019-19377 CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:15 +02:00
Frieder Schrempf	0297b7f984	mtd: spinand: Do not erase the block before writing a bad block marker commit `b645ad39d5` upstream. Currently when marking a block, we use spinand_erase_op() to erase the block before writing the marker to the OOB area. Doing so without waiting for the operation to finish can lead to the marking failing silently and no bad block marker being written to the flash. In fact we don't need to do an erase at all before writing the BBM. The ECC is disabled for raw accesses to the OOB data and we don't need to work around any issues with chips reporting ECC errors as it is known to be the case for raw NAND. Fixes: `7529df4652` ("mtd: nand: Add core infrastructure to support SPI NANDs") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20200218100432.32433-4-frieder.schrempf@kontron.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Frieder Schrempf	4da7c98c30	mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers commit `2148937501` upstream. For reading and writing the bad block markers, spinand->oobbuf is currently used as a buffer for the marker bytes. During the underlying read and write operations to actually get/set the content of the OOB area, the content of spinand->oobbuf is reused and changed by accessing it through spinand->oobbuf and/or spinand->databuf. This is a flaw in the original design of the SPI NAND core and at the latest from `13c15e07ee` ("mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache") on, it results in not having the bad block marker written at all, as the spinand->oobbuf is cleared to 0xff after setting the marker bytes to zero. To fix it, we now just store the two bytes for the marker on the stack and let the read/write operations copy it from/to the page buffer later. Fixes: `7529df4652` ("mtd: nand: Add core infrastructure to support SPI NANDs") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20200218100432.32433-2-frieder.schrempf@kontron.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Yilu Lin	c138ad0741	CIFS: Fix bug which the return value by asynchronous read is error commit `97adda8b3a` upstream. This patch is used to fix the bug in collect_uncached_read_data() that rc is automatically converted from a signed number to an unsigned number when the CIFS asynchronous read fails. It will cause ctx->rc is error. Example: Share a directory and create a file on the Windows OS. Mount the directory to the Linux OS using CIFS. On the CIFS client of the Linux OS, invoke the pread interface to deliver the read request. The size of the read length plus offset of the read request is greater than the maximum file size. In this case, the CIFS server on the Windows OS returns a failure message (for example, the return value of smb2.nt_status is STATUS_INVALID_PARAMETER). After receiving the response message, the CIFS client parses smb2.nt_status to STATUS_INVALID_PARAMETER and converts it to the Linux error code (rdata->result=-22). Then the CIFS client invokes the collect_uncached_read_data function to assign the value of rdata->result to rc, that is, rc=rdata->result=-22. The type of the ctx->total_len variable is unsigned integer, the type of the rc variable is integer, and the type of the ctx->rc variable is ssize_t. Therefore, during the ternary operation, the value of rc is automatically converted to an unsigned number. The final result is ctx->rc=4294967274. However, the expected result is ctx->rc=-22. Signed-off-by: Yilu Lin <linyilu@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Steve French	9b35348318	smb3: fix performance regression with setting mtime commit `cf5371ae46` upstream. There are cases when we don't want to send the SMB2 flush operation (e.g. when user specifies mount parm "nostrictsync") and it can be a very expensive operation on the server. In most cases in order to set mtime, we simply need to flush (write) the dirtry pages from the client and send the writes to the server not also send a flush protocol operation to the server. Fixes: `aa081859b1` ("cifs: flush before set-info if we have writeable handles") CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Vitaly Kuznetsov	40888c31ac	KVM: VMX: fix crash cleanup when KVM wasn't used commit `dbef2808af` upstream. If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: `31603d4fc2` ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Sean Christopherson	93a2b73688	KVM: VMX: Add a trampoline to fix VMREAD error handling commit `842f4be958` upstream. Add a hand coded assembly trampoline to preserve volatile registers across vmread_error(), and to handle the calling convention differences between 64-bit and 32-bit due to asmlinkage on vmread_error(). Pass @field and @fault on the stack when invoking the trampoline to avoid clobbering volatile registers in the context of the inline assembly. Calling vmread_error() directly from inline assembly is partially broken on 64-bit, and completely broken on 32-bit. On 64-bit, it will clobber %rdi and %rsi (used to pass @field and @fault) and any volatile regs written by vmread_error(). On 32-bit, asmlinkage means vmread_error() expects the parameters to be passed on the stack, not via regs. Opportunistically zero out the result in the trampoline to save a few bytes of code for every VMREAD. A happy side effect of the trampoline is that the inline code footprint is reduced by three bytes on 64-bit due to PUSH/POP being more efficent (in terms of opcode bytes) than MOV. Fixes: `6e2020977e` ("KVM: VMX: Add error handling to VMREAD helper") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200326160712.28803-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Sean Christopherson	771b9374a5	KVM: x86: Gracefully handle __vmalloc() failure during VM allocation commit `d18b2f43b9` upstream. Check the result of __vmalloc() to avoid dereferencing a NULL pointer in the event that allocation failres. Fixes: `d1e5b0e98e` ("kvm: Make VM ioctl do valloc for some archs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Sean Christopherson	455f37affe	KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support commit `31603d4fc2` upstream. VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown interrupted a KVM update of the percpu in-use VMCS list. Because NMIs are not blocked by disabling IRQs, it's possible that crash_vmclear_local_loaded_vmcss() could be called while the percpu list of VMCSes is being modified, e.g. in the middle of list_add() in vmx_vcpu_load_vmcs(). This potential corner case was called out in the original commit[], but the analysis of its impact was wrong. Skipping the VMCLEARs is wrong because it all but guarantees that a loaded, and therefore cached, VMCS will live across kexec and corrupt memory in the new kernel. Corruption will occur because the CPU's VMCS cache is non-coherent, i.e. not snooped, and so the writeback of VMCS memory on its eviction will overwrite random memory in the new kernel. The VMCS will live because the NMI shootdown also disables VMX, i.e. the in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the VMCS cache on VMXOFF. Furthermore, interrupting list_add() and list_del() is safe due to crash_vmclear_local_loaded_vmcss() using forward iteration. list_add() ensures the new entry is not visible to forward iteration unless the entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev" pointer could be observed if the NMI shootdown interrupted list_del() or list_add(), but list_for_each_entry() does not consume ->prev. In addition to removing the temporary disabling of VMCLEAR, open code loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that the VMCS is deleted from the list only after it's been VMCLEAR'd. Deleting the VMCS before VMCLEAR would allow a race where the NMI shootdown could arrive between list_del() and vmcs_clear() and thus neither flow would execute a successful VMCLEAR. Alternatively, more code could be moved into loaded_vmcs_init(), but that gets rather silly as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb() and would need to work around the list_del(). Update the smp_() comments related to the list manipulation, and opportunistically reword them to improve clarity. [*] https://patchwork.kernel.org/patch/1675731/#3720461 Fixes: `8f536b7697` ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:14 +02:00
Sean Christopherson	bcd1d7462a	KVM: x86: Allocate new rmap and large page tracking when moving memslot commit `edd4fa37ba` upstream. Reallocate a rmap array and recalcuate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: `05da45583d` ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-04-17 10:50:13 +02:00

1 2 3 4 5 ...

877816 commits