From 13bcb80b7ee79431fce361e060611134cb19e209 Mon Sep 17 00:00:00 2001 From: Zhenyu Wang Date: Wed, 20 Feb 2019 16:25:04 +0800 Subject: [PATCH 001/468] drm/i915/gvt: Fix MI_FLUSH_DW parsing with correct index check When MI_FLUSH_DW post write hw status page in index mode, the index value is in dword step and turned into address offset in cmd dword1. As status page size is 4K, so can't exceed that. This fixed upper bound check in cmd parser code which incorrectly stopped VM for reason of invalid MI_FLUSH_DW write index. v2: - Fix upper bound as 4K page size because index value is address offset. Fixes: be1da7070aea ("drm/i915/gvt: vGPU command scanner") Cc: stable@vger.kernel.org # v4.10+ Cc: "Zhao, Yan Y" Reviewed-by: Yan Zhao Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c index 77ae634eb11c..bd95fd6b4ac8 100644 --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c @@ -1446,7 +1446,7 @@ static inline int cmd_address_audit(struct parser_exec_state *s, } if (index_mode) { - if (guest_gma >= I915_GTT_PAGE_SIZE / sizeof(u64)) { + if (guest_gma >= I915_GTT_PAGE_SIZE) { ret = -EFAULT; goto err; } From 1e8b15a1988ed3c7429402017d589422628cdf47 Mon Sep 17 00:00:00 2001 From: Colin Xu Date: Fri, 22 Feb 2019 14:13:42 +0800 Subject: [PATCH 002/468] drm/i915/gvt: Add in context mmio 0x20D8 to gen9 mmio list Depends on GEN family and I915_PARAM_HAS_CONTEXT_ISOLATION, Mesa driver will decide whether constant buffer 0 address is relative or absolute, and load GPU initial state by lri to context mmio INSTPM (GEN8) or 0x20D8 (>=GEN9). Mesa Commit fa8a764b62 ("i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.") INSTPM is already added to gen8_engine_mmio_list, but 0x20D8 is missed in gen9_engine_mmio_list. From GVT point of view, different guest could have different context so should switch those mmio accordingly. v2: Update fixes commit ID. Fixes: 178657139307 ("drm/i915/gvt: vGPU context switch") Reviewed-by: Zhenyu Wang Signed-off-by: Colin Xu Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/mmio_context.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index d6e02c15ef97..38492450552a 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -132,6 +132,7 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS, GEN9_GAMT_ECO_REG_RW_IA, 0x0, false}, /* 0x4ab0 */ {RCS, GEN9_CSFE_CHICKEN1_RCS, 0xffff, false}, /* 0x20d4 */ + {RCS, _MMIO(0x20D8), 0xffff, true}, /* 0x20d8 */ {RCS, GEN8_GARBCNTL, 0x0, false}, /* 0xb004 */ {RCS, GEN7_FF_THREAD_MODE, 0x0, false}, /* 0x20a0 */ From e20119f7eaaaf6aad5b44f35155ce500429e17f6 Mon Sep 17 00:00:00 2001 From: Takeshi Kihara Date: Thu, 21 Feb 2019 13:59:38 +0100 Subject: [PATCH 003/468] arm64: dts: renesas: r8a77990: Fix SCIF5 DMA channels According to the R-Car Gen3 Hardware Manual Errata for Rev 1.50 of Feb 12, 2019, the DMA channels for SCIF5 are corrected from 16..47 to 0..15 on R-Car E3. Signed-off-by: Takeshi Kihara Fixes: a5ebe5e49a862e21 ("arm64: dts: renesas: r8a77990: Add SCIF-{0,1,3,4,5} device nodes") Signed-off-by: Geert Uytterhoeven Reviewed-by: Fabrizio Castro Signed-off-by: Simon Horman --- arch/arm64/boot/dts/renesas/r8a77990.dtsi | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/renesas/r8a77990.dtsi b/arch/arm64/boot/dts/renesas/r8a77990.dtsi index 786178cf1ffd..36f409091cfc 100644 --- a/arch/arm64/boot/dts/renesas/r8a77990.dtsi +++ b/arch/arm64/boot/dts/renesas/r8a77990.dtsi @@ -2,7 +2,7 @@ /* * Device Tree Source for the R-Car E3 (R8A77990) SoC * - * Copyright (C) 2018 Renesas Electronics Corp. + * Copyright (C) 2018-2019 Renesas Electronics Corp. */ #include @@ -1042,9 +1042,8 @@ scif5: serial@e6f30000 { <&cpg CPG_CORE R8A77990_CLK_S3D1C>, <&scif_clk>; clock-names = "fck", "brg_int", "scif_clk"; - dmas = <&dmac1 0x5b>, <&dmac1 0x5a>, - <&dmac2 0x5b>, <&dmac2 0x5a>; - dma-names = "tx", "rx", "tx", "rx"; + dmas = <&dmac0 0x5b>, <&dmac0 0x5a>; + dma-names = "tx", "rx"; power-domains = <&sysc R8A77990_PD_ALWAYS_ON>; resets = <&cpg 202>; status = "disabled"; From c21cd4ae82e169b394139243684aaacf31bfcdf8 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 21 Feb 2019 15:04:28 +0100 Subject: [PATCH 004/468] arm64: dts: renesas: r8a774c0: Fix SCIF5 DMA channels Correct the DMA channels for SCIF5 from 16..47 to 0..15, as was done for R-Car E3. Signed-off-by: Takeshi Kihara Fixes: 2660a6af690ebbb4 ("arm64: dts: renesas: r8a774c0: Add SCIF and HSCIF nodes") Signed-off-by: Geert Uytterhoeven Reviewed-by: Fabrizio Castro Signed-off-by: Simon Horman --- arch/arm64/boot/dts/renesas/r8a774c0.dtsi | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/renesas/r8a774c0.dtsi b/arch/arm64/boot/dts/renesas/r8a774c0.dtsi index f2e390f7f1d5..6590b63268b0 100644 --- a/arch/arm64/boot/dts/renesas/r8a774c0.dtsi +++ b/arch/arm64/boot/dts/renesas/r8a774c0.dtsi @@ -2,7 +2,7 @@ /* * Device Tree Source for the RZ/G2E (R8A774C0) SoC * - * Copyright (C) 2018 Renesas Electronics Corp. + * Copyright (C) 2018-2019 Renesas Electronics Corp. */ #include @@ -990,9 +990,8 @@ scif5: serial@e6f30000 { <&cpg CPG_CORE R8A774C0_CLK_S3D1C>, <&scif_clk>; clock-names = "fck", "brg_int", "scif_clk"; - dmas = <&dmac1 0x5b>, <&dmac1 0x5a>, - <&dmac2 0x5b>, <&dmac2 0x5a>; - dma-names = "tx", "rx", "tx", "rx"; + dmas = <&dmac0 0x5b>, <&dmac0 0x5a>; + dma-names = "tx", "rx"; power-domains = <&sysc R8A774C0_PD_ALWAYS_ON>; resets = <&cpg 202>; status = "disabled"; From 9f4984773240cf004cbffb99316af3269bbb5e26 Mon Sep 17 00:00:00 2001 From: Weinan Li Date: Wed, 27 Feb 2019 15:36:58 +0800 Subject: [PATCH 005/468] drm/i915/gvt: stop scheduling workload when vgpu is inactive There is one corner case that workload_thread may pick and dispatch one workload of vgpu after it's already deactivated. Below is the scenario: 1. deactive_vgpu got the vgpu_lock, it found pending workload was submitted, then it released the vgpu_lock and wait for vgpu idle. 2. before deactive_vgpu got the vgpu_lock back, workload_thread might pick one new valid workload, then it was blocked by the vgpu_lock. 3. deactive_vgpu got the vgpu_lock again, finished the last processes of deactivating, then release the vgpu_lock. 4. workload_thread got the vgpu_lock, then it will try to dispatch the fetched workload. It's not expected one workload of deactivated vgpu is dispatched. The solution is to add condition check of the vgpu's active flag and stop to schedule when it's inactive. Reviewed-by: Zhenyu Wang Signed-off-by: Weinan Li Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/scheduler.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 55bb7885e228..af97df10495e 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -738,7 +738,8 @@ static struct intel_vgpu_workload *pick_next_workload( goto out; } - if (list_empty(workload_q_head(scheduler->current_vgpu, ring_id))) + if (!scheduler->current_vgpu->active || + list_empty(workload_q_head(scheduler->current_vgpu, ring_id))) goto out; /* From f552e7bd028f93157b206bd1c7e3e195e919e8c2 Mon Sep 17 00:00:00 2001 From: Zhenyu Wang Date: Fri, 1 Mar 2019 15:55:04 +0800 Subject: [PATCH 006/468] drm/i915/gvt: Don't submit request for error workload dispatch As vGPU shadow ctx is loaded with guest context state, arbitrarily submitting request in error workload dispatch path would cause trouble. So don't try to submit in error path now like in previous code. This is to fix VM failure when GPU hang happens. Fixes: f0e994372518 ("drm/i915/gvt: Fix workload request allocation before request add") Reviewed-by: Xiong Zhang Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/scheduler.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index af97df10495e..817d46f25f09 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -677,6 +677,7 @@ static int dispatch_workload(struct intel_vgpu_workload *workload) { struct intel_vgpu *vgpu = workload->vgpu; struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; + struct i915_request *rq; int ring_id = workload->ring_id; int ret; @@ -702,6 +703,14 @@ static int dispatch_workload(struct intel_vgpu_workload *workload) ret = prepare_workload(workload); out: + if (ret) { + /* We might still need to add request with + * clean ctx to retire it properly.. + */ + rq = fetch_and_zero(&workload->req); + i915_request_put(rq); + } + if (!IS_ERR_OR_NULL(workload->req)) { gvt_dbg_sched("ring id %d submit workload to i915 %p\n", ring_id, workload->req); From bfb1ce1259ca201b50aa4ab5ec7e19266ef46896 Mon Sep 17 00:00:00 2001 From: Omer Shpigelman Date: Tue, 5 Mar 2019 10:59:16 +0200 Subject: [PATCH 007/468] habanalabs: fix MMU number of pages calculation The requested allocation size is 64bit, hence the number of requested pages and the total requested size should 64bit as well. This patch fixes all places where these are treated as 32bit. Signed-off-by: Omer Shpigelman Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/debugfs.c | 7 ++++--- drivers/misc/habanalabs/habanalabs.h | 8 ++++---- drivers/misc/habanalabs/memory.c | 29 ++++++++++++++-------------- 3 files changed, 23 insertions(+), 21 deletions(-) diff --git a/drivers/misc/habanalabs/debugfs.c b/drivers/misc/habanalabs/debugfs.c index a53c12aff6ad..974a87789bd8 100644 --- a/drivers/misc/habanalabs/debugfs.c +++ b/drivers/misc/habanalabs/debugfs.c @@ -232,6 +232,7 @@ static int vm_show(struct seq_file *s, void *data) struct hl_vm_phys_pg_pack *phys_pg_pack = NULL; enum vm_type_t *vm_type; bool once = true; + u64 j; int i; if (!dev_entry->hdev->mmu_enable) @@ -260,7 +261,7 @@ static int vm_show(struct seq_file *s, void *data) } else { phys_pg_pack = hnode->ptr; seq_printf(s, - " 0x%-14llx %-10u %-4u\n", + " 0x%-14llx %-10llu %-4u\n", hnode->vaddr, phys_pg_pack->total_size, phys_pg_pack->handle); } @@ -282,9 +283,9 @@ static int vm_show(struct seq_file *s, void *data) phys_pg_pack->page_size); seq_puts(s, " physical address\n"); seq_puts(s, "---------------------\n"); - for (i = 0 ; i < phys_pg_pack->npages ; i++) { + for (j = 0 ; j < phys_pg_pack->npages ; j++) { seq_printf(s, " 0x%-14llx\n", - phys_pg_pack->pages[i]); + phys_pg_pack->pages[j]); } } spin_unlock(&vm->idr_lock); diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h index a7c95e9f9b9a..806f0a5ee4d8 100644 --- a/drivers/misc/habanalabs/habanalabs.h +++ b/drivers/misc/habanalabs/habanalabs.h @@ -793,11 +793,11 @@ struct hl_vm_hash_node { * struct hl_vm_phys_pg_pack - physical page pack. * @vm_type: describes the type of the virtual area descriptor. * @pages: the physical page array. + * @npages: num physical pages in the pack. + * @total_size: total size of all the pages in this list. * @mapping_cnt: number of shared mappings. * @asid: the context related to this list. - * @npages: num physical pages in the pack. * @page_size: size of each page in the pack. - * @total_size: total size of all the pages in this list. * @flags: HL_MEM_* flags related to this list. * @handle: the provided handle related to this list. * @offset: offset from the first page. @@ -807,11 +807,11 @@ struct hl_vm_hash_node { struct hl_vm_phys_pg_pack { enum vm_type_t vm_type; /* must be first */ u64 *pages; + u64 npages; + u64 total_size; atomic_t mapping_cnt; u32 asid; - u32 npages; u32 page_size; - u32 total_size; u32 flags; u32 handle; u32 offset; diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c index 3a12fd1a5274..4f8c968e441a 100644 --- a/drivers/misc/habanalabs/memory.c +++ b/drivers/misc/habanalabs/memory.c @@ -56,9 +56,9 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args, struct hl_device *hdev = ctx->hdev; struct hl_vm *vm = &hdev->vm; struct hl_vm_phys_pg_pack *phys_pg_pack; - u64 paddr = 0; - u32 total_size, num_pgs, num_curr_pgs, page_size, page_shift; - int handle, rc, i; + u64 paddr = 0, total_size, num_pgs, i; + u32 num_curr_pgs, page_size, page_shift; + int handle, rc; bool contiguous; num_curr_pgs = 0; @@ -73,7 +73,7 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args, paddr = (u64) gen_pool_alloc(vm->dram_pg_pool, total_size); if (!paddr) { dev_err(hdev->dev, - "failed to allocate %u huge contiguous pages\n", + "failed to allocate %llu huge contiguous pages\n", num_pgs); return -ENOMEM; } @@ -267,7 +267,7 @@ static void free_phys_pg_pack(struct hl_device *hdev, struct hl_vm_phys_pg_pack *phys_pg_pack) { struct hl_vm *vm = &hdev->vm; - int i; + u64 i; if (!phys_pg_pack->created_from_userptr) { if (phys_pg_pack->contiguous) { @@ -519,7 +519,7 @@ static inline int add_va_block(struct hl_device *hdev, * - Return the start address of the virtual block */ static u64 get_va_block(struct hl_device *hdev, - struct hl_va_range *va_range, u32 size, u64 hint_addr, + struct hl_va_range *va_range, u64 size, u64 hint_addr, bool is_userptr) { struct hl_vm_va_block *va_block, *new_va_block = NULL; @@ -577,7 +577,8 @@ static u64 get_va_block(struct hl_device *hdev, } if (!new_va_block) { - dev_err(hdev->dev, "no available va block for size %u\n", size); + dev_err(hdev->dev, "no available va block for size %llu\n", + size); goto out; } @@ -648,8 +649,8 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx, struct hl_vm_phys_pg_pack *phys_pg_pack; struct scatterlist *sg; dma_addr_t dma_addr; - u64 page_mask; - u32 npages, total_npages, page_size = PAGE_SIZE; + u64 page_mask, total_npages; + u32 npages, page_size = PAGE_SIZE; bool first = true, is_huge_page_opt = true; int rc, i, j; @@ -750,9 +751,9 @@ static int map_phys_page_pack(struct hl_ctx *ctx, u64 vaddr, struct hl_vm_phys_pg_pack *phys_pg_pack) { struct hl_device *hdev = ctx->hdev; - u64 next_vaddr = vaddr, paddr; + u64 next_vaddr = vaddr, paddr, mapped_pg_cnt = 0, i; u32 page_size = phys_pg_pack->page_size; - int i, rc = 0, mapped_pg_cnt = 0; + int rc = 0; for (i = 0 ; i < phys_pg_pack->npages ; i++) { paddr = phys_pg_pack->pages[i]; @@ -764,7 +765,7 @@ static int map_phys_page_pack(struct hl_ctx *ctx, u64 vaddr, rc = hl_mmu_map(ctx, next_vaddr, paddr, page_size); if (rc) { dev_err(hdev->dev, - "map failed for handle %u, npages: %d, mapped: %d", + "map failed for handle %u, npages: %llu, mapped: %llu", phys_pg_pack->handle, phys_pg_pack->npages, mapped_pg_cnt); goto err; @@ -985,10 +986,10 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr) struct hl_vm_hash_node *hnode = NULL; struct hl_userptr *userptr = NULL; enum vm_type_t *vm_type; - u64 next_vaddr; + u64 next_vaddr, i; u32 page_size; bool is_userptr; - int i, rc; + int rc; /* protect from double entrance */ mutex_lock(&ctx->mem_hash_lock); From 4eb1d1253ddd95e985c57fc99e9de6802dd2d867 Mon Sep 17 00:00:00 2001 From: Omer Shpigelman Date: Thu, 7 Mar 2019 15:47:19 +0200 Subject: [PATCH 008/468] habanalabs: fix bug when mapping very large memory area This patch fixes a bug of allocating a too big memory size with kmalloc, which causes a failure. In case of mapping a large memory block, an array of the relevant physical page addresses is allocated. If there are many pages the array might be too big to allocate with kmalloc, hence changing to kvmalloc. Signed-off-by: Omer Shpigelman Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/memory.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c index 4f8c968e441a..ce1fda40a8b8 100644 --- a/drivers/misc/habanalabs/memory.c +++ b/drivers/misc/habanalabs/memory.c @@ -93,7 +93,7 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args, phys_pg_pack->flags = args->flags; phys_pg_pack->contiguous = contiguous; - phys_pg_pack->pages = kcalloc(num_pgs, sizeof(u64), GFP_KERNEL); + phys_pg_pack->pages = kvmalloc_array(num_pgs, sizeof(u64), GFP_KERNEL); if (!phys_pg_pack->pages) { rc = -ENOMEM; goto pages_arr_err; @@ -148,7 +148,7 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args, gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[i], page_size); - kfree(phys_pg_pack->pages); + kvfree(phys_pg_pack->pages); pages_arr_err: kfree(phys_pg_pack); pages_pack_err: @@ -288,7 +288,7 @@ static void free_phys_pg_pack(struct hl_device *hdev, } } - kfree(phys_pg_pack->pages); + kvfree(phys_pg_pack->pages); kfree(phys_pg_pack); } @@ -692,7 +692,8 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx, page_mask = ~(((u64) page_size) - 1); - phys_pg_pack->pages = kcalloc(total_npages, sizeof(u64), GFP_KERNEL); + phys_pg_pack->pages = kvmalloc_array(total_npages, sizeof(u64), + GFP_KERNEL); if (!phys_pg_pack->pages) { rc = -ENOMEM; goto page_pack_arr_mem_err; From f650a95b71026f5940804f273f9c36b60634131f Mon Sep 17 00:00:00 2001 From: Omer Shpigelman Date: Wed, 13 Mar 2019 13:36:28 +0200 Subject: [PATCH 009/468] habanalabs: complete user context cleanup before hard reset This patch fixes a bug which led to a crash during hard reset flow. Before a hard reset is executed, we wait a few seconds for the user context cleanup to complete. If it wasn't completed, we kill the user process and move on to the reset flow. Upon killing the user process, the context cleanup flow begins and may take a while due to MMU unmaps. Meanwhile, in the driver reset flow, we change the PCI DRAM bar location which can interfere with the MMU that uses the bar. If the context cleanup flow didn't finish quickly, a crash may occur due to PCI DRAM bar mislocation during the MMU unmap. Hence adding a wait between killing the user process and the start of the reset flow. Signed-off-by: Omer Shpigelman Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/device.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c index de46aa6ed154..93d67983ddba 100644 --- a/drivers/misc/habanalabs/device.c +++ b/drivers/misc/habanalabs/device.c @@ -11,6 +11,8 @@ #include #include +#define HL_PLDM_PENDING_RESET_PER_SEC (HL_PENDING_RESET_PER_SEC * 10) + bool hl_device_disabled_or_in_reset(struct hl_device *hdev) { if ((hdev->disabled) || (atomic_read(&hdev->in_reset))) @@ -462,9 +464,16 @@ static void hl_device_hard_reset_pending(struct work_struct *work) struct hl_device_reset_work *device_reset_work = container_of(work, struct hl_device_reset_work, reset_work); struct hl_device *hdev = device_reset_work->hdev; - u16 pending_cnt = HL_PENDING_RESET_PER_SEC; + u16 pending_total, pending_cnt; struct task_struct *task = NULL; + if (hdev->pldm) + pending_total = HL_PLDM_PENDING_RESET_PER_SEC; + else + pending_total = HL_PENDING_RESET_PER_SEC; + + pending_cnt = pending_total; + /* Flush all processes that are inside hl_open */ mutex_lock(&hdev->fd_open_cnt_lock); @@ -489,6 +498,19 @@ static void hl_device_hard_reset_pending(struct work_struct *work) } } + pending_cnt = pending_total; + + while ((atomic_read(&hdev->fd_open_cnt)) && (pending_cnt)) { + + pending_cnt--; + + ssleep(1); + } + + if (atomic_read(&hdev->fd_open_cnt)) + dev_crit(hdev->dev, + "Going to hard reset with open user contexts\n"); + mutex_unlock(&hdev->fd_open_cnt_lock); hl_device_reset(hdev, true, true); From d12a5e2458d49aad2b7d25766794eec95ae8f6f1 Mon Sep 17 00:00:00 2001 From: Omer Shpigelman Date: Thu, 14 Mar 2019 16:54:45 +0200 Subject: [PATCH 010/468] habanalabs: fix mapping with page size bigger than 4KB This patch fixes the mapping of virtual address to physical addresses on architectures where PAGE_SIZE is bigger than 4KB. The break down to the device page size was done only for the virtual address while it should have been done for the physical address as well. As a result virtual addresses were mapped to wrong physical address. The fix is to apply the break down for the physical addresses as well in order to get correct mappings. Signed-off-by: Omer Shpigelman Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/mmu.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c index 2f2e99cb2743..3a5a2cec8305 100644 --- a/drivers/misc/habanalabs/mmu.c +++ b/drivers/misc/habanalabs/mmu.c @@ -832,7 +832,7 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr, int hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr, u32 page_size) { struct hl_device *hdev = ctx->hdev; - u64 real_virt_addr; + u64 real_virt_addr, real_phys_addr; u32 real_page_size, npages; int i, rc, mapped_cnt = 0; @@ -857,14 +857,16 @@ int hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr, u32 page_size) npages = page_size / real_page_size; real_virt_addr = virt_addr; + real_phys_addr = phys_addr; for (i = 0 ; i < npages ; i++) { - rc = _hl_mmu_map(ctx, real_virt_addr, phys_addr, + rc = _hl_mmu_map(ctx, real_virt_addr, real_phys_addr, real_page_size); if (rc) goto err; real_virt_addr += real_page_size; + real_phys_addr += real_page_size; mapped_cnt++; } From cbaa99ed1b697072f089693a7fe2d649d08bf317 Mon Sep 17 00:00:00 2001 From: Oded Gabbay Date: Sun, 3 Mar 2019 15:13:15 +0200 Subject: [PATCH 011/468] habanalabs: perform accounting for active CS This patch adds accounting for active CS. Active means that the CS was submitted to the H/W queues and was not completed yet. This is necessary to support suspend operation. Because the device will be reset upon suspend, we can only suspend after all active CS have been completed. Hence, we need to perform accounting on their number. Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/command_submission.c | 6 ++++++ drivers/misc/habanalabs/device.c | 1 + drivers/misc/habanalabs/habanalabs.h | 13 ++++++++----- drivers/misc/habanalabs/hw_queue.c | 5 +++-- 4 files changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c index 3525236ed8d9..19c84214a7ea 100644 --- a/drivers/misc/habanalabs/command_submission.c +++ b/drivers/misc/habanalabs/command_submission.c @@ -179,6 +179,12 @@ static void cs_do_release(struct kref *ref) /* We also need to update CI for internal queues */ if (cs->submitted) { + int cs_cnt = atomic_dec_return(&hdev->cs_active_cnt); + + WARN_ONCE((cs_cnt < 0), + "hl%d: error in CS active cnt %d\n", + hdev->id, cs_cnt); + hl_int_hw_queue_update_ci(cs); spin_lock(&hdev->hw_queues_mirror_lock); diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c index 93d67983ddba..470d8005b50e 100644 --- a/drivers/misc/habanalabs/device.c +++ b/drivers/misc/habanalabs/device.c @@ -218,6 +218,7 @@ static int device_early_init(struct hl_device *hdev) spin_lock_init(&hdev->hw_queues_mirror_lock); atomic_set(&hdev->in_reset, 0); atomic_set(&hdev->fd_open_cnt, 0); + atomic_set(&hdev->cs_active_cnt, 0); return 0; diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h index 806f0a5ee4d8..a8ee52c880cd 100644 --- a/drivers/misc/habanalabs/habanalabs.h +++ b/drivers/misc/habanalabs/habanalabs.h @@ -1056,13 +1056,15 @@ struct hl_device_reset_work { * @cb_pool_lock: protects the CB pool. * @user_ctx: current user context executing. * @dram_used_mem: current DRAM memory consumption. - * @in_reset: is device in reset flow. - * @curr_pll_profile: current PLL profile. - * @fd_open_cnt: number of open user processes. * @timeout_jiffies: device CS timeout value. * @max_power: the max power of the device, as configured by the sysadmin. This * value is saved so in case of hard-reset, KMD will restore this * value and update the F/W after the re-initialization + * @in_reset: is device in reset flow. + * @curr_pll_profile: current PLL profile. + * @fd_open_cnt: number of open user processes. + * @cs_active_cnt: number of active command submissions on this device (active + * means already in H/W queues) * @major: habanalabs KMD major. * @high_pll: high PLL profile frequency. * @soft_reset_cnt: number of soft reset since KMD loading. @@ -1128,11 +1130,12 @@ struct hl_device { struct hl_ctx *user_ctx; atomic64_t dram_used_mem; + u64 timeout_jiffies; + u64 max_power; atomic_t in_reset; atomic_t curr_pll_profile; atomic_t fd_open_cnt; - u64 timeout_jiffies; - u64 max_power; + atomic_t cs_active_cnt; u32 major; u32 high_pll; u32 soft_reset_cnt; diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c index 67bece26417c..ef3bb6951360 100644 --- a/drivers/misc/habanalabs/hw_queue.c +++ b/drivers/misc/habanalabs/hw_queue.c @@ -370,12 +370,13 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs) spin_unlock(&hdev->hw_queues_mirror_lock); } - list_for_each_entry_safe(job, tmp, &cs->job_list, cs_node) { + atomic_inc(&hdev->cs_active_cnt); + + list_for_each_entry_safe(job, tmp, &cs->job_list, cs_node) if (job->ext_queue) ext_hw_queue_schedule_job(job); else int_hw_queue_schedule_job(job); - } cs->submitted = true; From 7cb5101ee0107376f8eace195a138f99174e80ff Mon Sep 17 00:00:00 2001 From: Oded Gabbay Date: Sun, 3 Mar 2019 22:29:20 +0200 Subject: [PATCH 012/468] habanalabs: prevent host crash during suspend/resume This patch fixes the implementation of suspend/resume of the device so that upon resume of the device, the host won't crash due to PCI completion timeout. Upon suspend, the device is being reset due to PERST. Therefore, upon resume, the driver must initialize the PCI controller as if the driver was loaded. If the controller is not initialized and the device tries to access the device through the PCI bars, the host will crash with PCI completion timeout error. Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/device.c | 46 +++++++++++++++++++-- drivers/misc/habanalabs/goya/goya.c | 63 +---------------------------- 2 files changed, 43 insertions(+), 66 deletions(-) diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c index 470d8005b50e..77d51be66c7e 100644 --- a/drivers/misc/habanalabs/device.c +++ b/drivers/misc/habanalabs/device.c @@ -416,6 +416,27 @@ int hl_device_suspend(struct hl_device *hdev) pci_save_state(hdev->pdev); + /* Block future CS/VM/JOB completion operations */ + rc = atomic_cmpxchg(&hdev->in_reset, 0, 1); + if (rc) { + dev_err(hdev->dev, "Can't suspend while in reset\n"); + return -EIO; + } + + /* This blocks all other stuff that is not blocked by in_reset */ + hdev->disabled = true; + + /* + * Flush anyone that is inside the critical section of enqueue + * jobs to the H/W + */ + hdev->asic_funcs->hw_queues_lock(hdev); + hdev->asic_funcs->hw_queues_unlock(hdev); + + /* Flush processes that are sending message to CPU */ + mutex_lock(&hdev->send_cpu_message_lock); + mutex_unlock(&hdev->send_cpu_message_lock); + rc = hdev->asic_funcs->suspend(hdev); if (rc) dev_err(hdev->dev, @@ -443,21 +464,38 @@ int hl_device_resume(struct hl_device *hdev) pci_set_power_state(hdev->pdev, PCI_D0); pci_restore_state(hdev->pdev); - rc = pci_enable_device(hdev->pdev); + rc = pci_enable_device_mem(hdev->pdev); if (rc) { dev_err(hdev->dev, "Failed to enable PCI device in resume\n"); return rc; } + pci_set_master(hdev->pdev); + rc = hdev->asic_funcs->resume(hdev); if (rc) { - dev_err(hdev->dev, - "Failed to enable PCI access from device CPU\n"); - return rc; + dev_err(hdev->dev, "Failed to resume device after suspend\n"); + goto disable_device; + } + + + hdev->disabled = false; + atomic_set(&hdev->in_reset, 0); + + rc = hl_device_reset(hdev, true, false); + if (rc) { + dev_err(hdev->dev, "Failed to reset device during resume\n"); + goto disable_device; } return 0; + +disable_device: + pci_clear_master(hdev->pdev); + pci_disable_device(hdev->pdev); + + return rc; } static void hl_device_hard_reset_pending(struct work_struct *work) diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c index 238dd57c541b..538d8d59d9dc 100644 --- a/drivers/misc/habanalabs/goya/goya.c +++ b/drivers/misc/habanalabs/goya/goya.c @@ -1201,15 +1201,6 @@ static int goya_stop_external_queues(struct hl_device *hdev) return retval; } -static void goya_resume_external_queues(struct hl_device *hdev) -{ - WREG32(mmDMA_QM_0_GLBL_CFG1, 0); - WREG32(mmDMA_QM_1_GLBL_CFG1, 0); - WREG32(mmDMA_QM_2_GLBL_CFG1, 0); - WREG32(mmDMA_QM_3_GLBL_CFG1, 0); - WREG32(mmDMA_QM_4_GLBL_CFG1, 0); -} - /* * goya_init_cpu_queues - Initialize PQ/CQ/EQ of CPU * @@ -2178,36 +2169,6 @@ static int goya_stop_internal_queues(struct hl_device *hdev) return retval; } -static void goya_resume_internal_queues(struct hl_device *hdev) -{ - WREG32(mmMME_QM_GLBL_CFG1, 0); - WREG32(mmMME_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC0_QM_GLBL_CFG1, 0); - WREG32(mmTPC0_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC1_QM_GLBL_CFG1, 0); - WREG32(mmTPC1_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC2_QM_GLBL_CFG1, 0); - WREG32(mmTPC2_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC3_QM_GLBL_CFG1, 0); - WREG32(mmTPC3_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC4_QM_GLBL_CFG1, 0); - WREG32(mmTPC4_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC5_QM_GLBL_CFG1, 0); - WREG32(mmTPC5_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC6_QM_GLBL_CFG1, 0); - WREG32(mmTPC6_CMDQ_GLBL_CFG1, 0); - - WREG32(mmTPC7_QM_GLBL_CFG1, 0); - WREG32(mmTPC7_CMDQ_GLBL_CFG1, 0); -} - static void goya_dma_stall(struct hl_device *hdev) { WREG32(mmDMA_QM_0_GLBL_CFG1, 1 << DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT); @@ -2905,20 +2866,6 @@ int goya_suspend(struct hl_device *hdev) { int rc; - rc = goya_stop_internal_queues(hdev); - - if (rc) { - dev_err(hdev->dev, "failed to stop internal queues\n"); - return rc; - } - - rc = goya_stop_external_queues(hdev); - - if (rc) { - dev_err(hdev->dev, "failed to stop external queues\n"); - return rc; - } - rc = goya_send_pci_access_msg(hdev, ARMCP_PACKET_DISABLE_PCI_ACCESS); if (rc) dev_err(hdev->dev, "Failed to disable PCI access from CPU\n"); @@ -2928,15 +2875,7 @@ int goya_suspend(struct hl_device *hdev) int goya_resume(struct hl_device *hdev) { - int rc; - - goya_resume_external_queues(hdev); - goya_resume_internal_queues(hdev); - - rc = goya_send_pci_access_msg(hdev, ARMCP_PACKET_ENABLE_PCI_ACCESS); - if (rc) - dev_err(hdev->dev, "Failed to enable PCI access from CPU\n"); - return rc; + return goya_init_iatu(hdev); } static int goya_cb_mmap(struct hl_device *hdev, struct vm_area_struct *vma, From 7c22278edd0a931c565a8511dfc1bc57ffbb9166 Mon Sep 17 00:00:00 2001 From: Oded Gabbay Date: Sun, 3 Mar 2019 10:23:29 +0200 Subject: [PATCH 013/468] habanalabs: cast to expected type This patch fix the following sparse warning: drivers/misc/habanalabs/goya/goya.c:3646:14: warning: incorrect type in assignment (different address spaces) drivers/misc/habanalabs/goya/goya.c:3646:14: expected void *base drivers/misc/habanalabs/goya/goya.c:3646:14: got void [noderef] * Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/goya/goya.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c index 538d8d59d9dc..ea979ebd62fb 100644 --- a/drivers/misc/habanalabs/goya/goya.c +++ b/drivers/misc/habanalabs/goya/goya.c @@ -3009,7 +3009,7 @@ void *goya_get_int_queue_base(struct hl_device *hdev, u32 queue_id, *dma_handle = hdev->asic_prop.sram_base_address; - base = hdev->pcie_bar[SRAM_CFG_BAR_ID]; + base = (void *) hdev->pcie_bar[SRAM_CFG_BAR_ID]; switch (queue_id) { case GOYA_QUEUE_ID_MME: From 1e18d5e6731d674fee0bb4b66f5ea61e504452a3 Mon Sep 17 00:00:00 2001 From: Zhenyu Wang Date: Fri, 1 Mar 2019 15:04:12 +0800 Subject: [PATCH 014/468] drm/i915/gvt: Only assign ppgtt root at dispatch time This moves ppgtt root hook out of scan and shadow function, as it's only required at dispatch time. Also make sure this checks against shadow mm to be ready, otherwise bail to fail earlier. Reviewed-by: Xiong Zhang Cc: Xiong Zhang Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/scheduler.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 817d46f25f09..0f2f2c43c0e0 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -345,7 +345,7 @@ static int set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload, int i = 0; if (mm->type != INTEL_GVT_MM_PPGTT || !mm->ppgtt_mm.shadowed) - return -1; + return -EINVAL; if (mm->ppgtt_mm.root_entry_type == GTT_TYPE_PPGTT_ROOT_L4_ENTRY) { px_dma(&ppgtt->pml4) = mm->ppgtt_mm.shadow_pdps[0]; @@ -409,12 +409,6 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload) if (workload->shadow) return 0; - ret = set_context_ppgtt_from_shadow(workload, shadow_ctx); - if (ret < 0) { - gvt_vgpu_err("workload shadow ppgtt isn't ready\n"); - return ret; - } - /* pin shadow context by gvt even the shadow context will be pinned * when i915 alloc request. That is because gvt will update the guest * context from shadow context when workload is completed, and at that @@ -677,6 +671,8 @@ static int dispatch_workload(struct intel_vgpu_workload *workload) { struct intel_vgpu *vgpu = workload->vgpu; struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; + struct intel_vgpu_submission *s = &vgpu->submission; + struct i915_gem_context *shadow_ctx = s->shadow_ctx; struct i915_request *rq; int ring_id = workload->ring_id; int ret; @@ -687,6 +683,12 @@ static int dispatch_workload(struct intel_vgpu_workload *workload) mutex_lock(&vgpu->vgpu_lock); mutex_lock(&dev_priv->drm.struct_mutex); + ret = set_context_ppgtt_from_shadow(workload, shadow_ctx); + if (ret < 0) { + gvt_vgpu_err("workload shadow ppgtt isn't ready\n"); + goto err_req; + } + ret = intel_gvt_workload_req_alloc(workload); if (ret) goto err_req; From 72aabfb862e40ee83c136c4f87877c207e6859b7 Mon Sep 17 00:00:00 2001 From: Zhenyu Wang Date: Fri, 1 Mar 2019 15:04:13 +0800 Subject: [PATCH 015/468] drm/i915/gvt: Add mutual lock for ppgtt mm LRU list This adds mutex to guard against update of global ppgtt mm LRU list. To resolve error found as below warning. [73130.012162] ------------[ cut here ]------------ [73130.012168] list_add corruption. prev->next should be next (ffff995f970cca50), but was 0000000000000000. (prev=ffff995f0dc5bdf8). [73130.012181] WARNING: CPU: 3 PID: 82 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 [73130.012183] Modules linked in: btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) dm_mod(E) kvmgt(E) fuse(E) xt_addrtype(E) nft_compat(E) xt_conntrack(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) devlink(E) nf_tables(E) nfnetlink(E) loop(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) mei_me(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) intel_cstate(E) intel_uncore(E) mei(E) intel_pch_thermal(E) intel_rapl_perf(E) pcspkr(E) iTCO_wdt(E) iTCO_vendor_support(E) idma64(E) sg(E) virt_dma(E) acpi_pad(E) evdev(E) binfmt_misc(E) ip_tables(E) x_tables(E) ipv6(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) xhci_pci(E) sdhci_pci(E) cqhci(E) intel_lpss_pci(E) intel_lpss(E) crc32c_intel(E) xhci_hcd(E) sdhci(E) i2c_i801(E) e1000e(E) mmc_core(E) [73130.012218] ptp(E) pps_core(E) usbcore(E) mfd_core(E) sd_mod(E) fan(E) thermal(E) [73130.012227] CPU: 3 PID: 82 Comm: gvt workload 0 Tainted: G W E 5.0.0-rc7-staging-190226+ #282 [73130.012228] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0039.2016.0316.1747 03/16/2016 [73130.012232] RIP: 0010:__list_add_valid+0x4d/0x70 [73130.012234] Code: c3 48 89 d1 48 c7 c7 e0 82 91 bb 48 89 c2 e8 44 8a cc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 30 83 91 bb e8 2d 8a cc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 83 91 bb e8 [73130.012236] RSP: 0018:ffffa4924107fdd0 EFLAGS: 00010286 [73130.012238] RAX: 0000000000000000 RBX: ffff995d8a5ccf00 RCX: 0000000000000006 [73130.012240] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff995faad96680 [73130.012241] RBP: 0000000000000000 R08: 0000000000213a28 R09: 0000000000000084 [73130.012243] R10: 0000000000000000 R11: ffffa4924107fc70 R12: ffff995d8a5ccf78 [73130.012245] R13: ffff995f970c8000 R14: ffff995f0dc5bdf8 R15: ffff995f970cca50 [73130.012247] FS: 0000000000000000(0000) GS:ffff995faad80000(0000) knlGS:0000000000000000 [73130.012249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [73130.012250] CR2: 00000222e1891000 CR3: 0000000116848002 CR4: 00000000003626e0 [73130.012252] Call Trace: [73130.012258] intel_vgpu_pin_mm+0x7a/0xa0 [73130.012262] workload_thread+0x683/0x12a0 [73130.012266] ? do_wait_intr_irq+0xb0/0xb0 [73130.012269] ? finish_wait+0x80/0x80 [73130.012271] ? intel_vgpu_clean_workloads+0x110/0x110 [73130.012274] kthread+0x116/0x130 [73130.012276] ? kthread_bind+0x30/0x30 [73130.012280] ret_from_fork+0x35/0x40 [73130.012285] WARNING: CPU: 3 PID: 82 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 [73130.012286] ---[ end trace 458a2e792eec21c0 ]--- v2: - simplify lock handling Reviewed-by: Xiong Zhang Cc: Xiong Zhang Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/gtt.c | 14 +++++++++++++- drivers/gpu/drm/i915/gvt/gtt.h | 1 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index c7103dd2d8d5..d7052ab7908c 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1882,7 +1882,11 @@ struct intel_vgpu_mm *intel_vgpu_create_ppgtt_mm(struct intel_vgpu *vgpu, } list_add_tail(&mm->ppgtt_mm.list, &vgpu->gtt.ppgtt_mm_list_head); + + mutex_lock(&gvt->gtt.ppgtt_mm_lock); list_add_tail(&mm->ppgtt_mm.lru_list, &gvt->gtt.ppgtt_mm_lru_list_head); + mutex_unlock(&gvt->gtt.ppgtt_mm_lock); + return mm; } @@ -1967,9 +1971,10 @@ int intel_vgpu_pin_mm(struct intel_vgpu_mm *mm) if (ret) return ret; + mutex_lock(&mm->vgpu->gvt->gtt.ppgtt_mm_lock); list_move_tail(&mm->ppgtt_mm.lru_list, &mm->vgpu->gvt->gtt.ppgtt_mm_lru_list_head); - + mutex_unlock(&mm->vgpu->gvt->gtt.ppgtt_mm_lock); } return 0; @@ -1980,6 +1985,8 @@ static int reclaim_one_ppgtt_mm(struct intel_gvt *gvt) struct intel_vgpu_mm *mm; struct list_head *pos, *n; + mutex_lock(&gvt->gtt.ppgtt_mm_lock); + list_for_each_safe(pos, n, &gvt->gtt.ppgtt_mm_lru_list_head) { mm = container_of(pos, struct intel_vgpu_mm, ppgtt_mm.lru_list); @@ -1987,9 +1994,11 @@ static int reclaim_one_ppgtt_mm(struct intel_gvt *gvt) continue; list_del_init(&mm->ppgtt_mm.lru_list); + mutex_unlock(&gvt->gtt.ppgtt_mm_lock); invalidate_ppgtt_mm(mm); return 1; } + mutex_unlock(&gvt->gtt.ppgtt_mm_lock); return 0; } @@ -2659,6 +2668,7 @@ int intel_gvt_init_gtt(struct intel_gvt *gvt) } } INIT_LIST_HEAD(&gvt->gtt.ppgtt_mm_lru_list_head); + mutex_init(&gvt->gtt.ppgtt_mm_lock); return 0; } @@ -2699,7 +2709,9 @@ void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu) list_for_each_safe(pos, n, &vgpu->gtt.ppgtt_mm_list_head) { mm = container_of(pos, struct intel_vgpu_mm, ppgtt_mm.list); if (mm->type == INTEL_GVT_MM_PPGTT) { + mutex_lock(&vgpu->gvt->gtt.ppgtt_mm_lock); list_del_init(&mm->ppgtt_mm.lru_list); + mutex_unlock(&vgpu->gvt->gtt.ppgtt_mm_lock); if (mm->ppgtt_mm.shadowed) invalidate_ppgtt_mm(mm); } diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h index d8cb04cc946d..edb610dc5d86 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.h +++ b/drivers/gpu/drm/i915/gvt/gtt.h @@ -88,6 +88,7 @@ struct intel_gvt_gtt { void (*mm_free_page_table)(struct intel_vgpu_mm *mm); struct list_head oos_page_use_list_head; struct list_head oos_page_free_list_head; + struct mutex ppgtt_mm_lock; struct list_head ppgtt_mm_lru_list_head; struct page *scratch_page; From 7f3d6c8e8f5f041c86c0a9f64e4b4ab7c6373ac2 Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Wed, 20 Feb 2019 10:19:50 -0800 Subject: [PATCH 016/468] soc: bcm: bcm2835-pm: Fix PM_IMAGE_PERI power domain support. We don't have ASB master/slave regs for this domain, so just skip that step. Signed-off-by: Eric Anholt Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.") --- drivers/soc/bcm/bcm2835-power.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c index 48412957ec7a..4a1b99b773c0 100644 --- a/drivers/soc/bcm/bcm2835-power.c +++ b/drivers/soc/bcm/bcm2835-power.c @@ -150,7 +150,12 @@ struct bcm2835_power { static int bcm2835_asb_enable(struct bcm2835_power *power, u32 reg) { - u64 start = ktime_get_ns(); + u64 start; + + if (!reg) + return 0; + + start = ktime_get_ns(); /* Enable the module's async AXI bridges. */ ASB_WRITE(reg, ASB_READ(reg) & ~ASB_REQ_STOP); @@ -165,7 +170,12 @@ static int bcm2835_asb_enable(struct bcm2835_power *power, u32 reg) static int bcm2835_asb_disable(struct bcm2835_power *power, u32 reg) { - u64 start = ktime_get_ns(); + u64 start; + + if (!reg) + return 0; + + start = ktime_get_ns(); /* Enable the module's async AXI bridges. */ ASB_WRITE(reg, ASB_READ(reg) | ASB_REQ_STOP); From 4deabfae643d8852c643664d9088a647abfaa5d0 Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Wed, 20 Feb 2019 10:19:51 -0800 Subject: [PATCH 017/468] soc: bcm: bcm2835-pm: Fix error paths of initialization. The clock driver may probe after ours and so we need to pass the -EPROBE_DEFER out. Fix the other error path while we're here. v2: Use dom->name instead of dom->gov as the flag for initialized domains, since we aren't setting up a governor. Make sure to clear ->clk when no clk is present in the DT. Signed-off-by: Eric Anholt Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.") --- drivers/soc/bcm/bcm2835-power.c | 35 ++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c index 4a1b99b773c0..241c4ed80899 100644 --- a/drivers/soc/bcm/bcm2835-power.c +++ b/drivers/soc/bcm/bcm2835-power.c @@ -485,7 +485,7 @@ static int bcm2835_power_pd_power_off(struct generic_pm_domain *domain) } } -static void +static int bcm2835_init_power_domain(struct bcm2835_power *power, int pd_xlate_index, const char *name) { @@ -493,6 +493,17 @@ bcm2835_init_power_domain(struct bcm2835_power *power, struct bcm2835_power_domain *dom = &power->domains[pd_xlate_index]; dom->clk = devm_clk_get(dev->parent, name); + if (IS_ERR(dom->clk)) { + int ret = PTR_ERR(dom->clk); + + if (ret == -EPROBE_DEFER) + return ret; + + /* Some domains don't have a clk, so make sure that we + * don't deref an error pointer later. + */ + dom->clk = NULL; + } dom->base.name = name; dom->base.power_on = bcm2835_power_pd_power_on; @@ -505,6 +516,8 @@ bcm2835_init_power_domain(struct bcm2835_power *power, pm_genpd_init(&dom->base, NULL, true); power->pd_xlate.domains[pd_xlate_index] = &dom->base; + + return 0; } /** bcm2835_reset_reset - Resets a block that has a reset line in the @@ -602,7 +615,7 @@ static int bcm2835_power_probe(struct platform_device *pdev) { BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM0 }, { BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM1 }, }; - int ret, i; + int ret = 0, i; u32 id; power = devm_kzalloc(dev, sizeof(*power), GFP_KERNEL); @@ -629,8 +642,11 @@ static int bcm2835_power_probe(struct platform_device *pdev) power->pd_xlate.num_domains = ARRAY_SIZE(power_domain_names); - for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) - bcm2835_init_power_domain(power, i, power_domain_names[i]); + for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) { + ret = bcm2835_init_power_domain(power, i, power_domain_names[i]); + if (ret) + goto fail; + } for (i = 0; i < ARRAY_SIZE(domain_deps); i++) { pm_genpd_add_subdomain(&power->domains[domain_deps[i].parent].base, @@ -644,12 +660,21 @@ static int bcm2835_power_probe(struct platform_device *pdev) ret = devm_reset_controller_register(dev, &power->reset); if (ret) - return ret; + goto fail; of_genpd_add_provider_onecell(dev->parent->of_node, &power->pd_xlate); dev_info(dev, "Broadcom BCM2835 power domains driver"); return 0; + +fail: + for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) { + struct generic_pm_domain *dom = &power->domains[i].base; + + if (dom->name) + pm_genpd_remove(dom); + } + return ret; } static int bcm2835_power_remove(struct platform_device *pdev) From 544e784188f1dd7c797c70b213385e67d92005b6 Mon Sep 17 00:00:00 2001 From: Helen Koike Date: Mon, 4 Mar 2019 18:48:37 -0300 Subject: [PATCH 018/468] ARM: dts: bcm283x: Fix hdmi hpd gpio pull Raspberry pi board model B revison 2 have the hot plug detector gpio active high (and not low as it was in the dts). Signed-off-by: Helen Koike Fixes: 49ac67e0c39c ("ARM: bcm2835: Add VC4 to the device tree.") Reviewed-by: Eric Anholt Signed-off-by: Eric Anholt --- arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts b/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts index 5641d162dfdb..28e7513ce617 100644 --- a/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts +++ b/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts @@ -93,7 +93,7 @@ i2s_alt2: i2s_alt2 { }; &hdmi { - hpd-gpios = <&gpio 46 GPIO_ACTIVE_LOW>; + hpd-gpios = <&gpio 46 GPIO_ACTIVE_HIGH>; }; &pwm { From cd479eccd2e057116d504852814402a1e68ead80 Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Mon, 4 Mar 2019 12:33:28 +0100 Subject: [PATCH 019/468] s390: limit brk randomization to 32MB For a 64-bit process the randomization of the program break is quite large with 1GB. That is as big as the randomization of the anonymous mapping base, for a test case started with '/lib/ld64.so.1 ' it can happen that the heap is placed after the stack. To avoid this limit the program break randomization to 32MB for 64-bit and keep 8MB for 31-bit. Reported-by: Stefan Liebler Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/elf.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index 7d22a474a040..f74639a05f0f 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ -252,11 +252,14 @@ do { \ /* * Cache aliasing on the latest machines calls for a mapping granularity - * of 512KB. For 64-bit processes use a 512KB alignment and a randomization - * of up to 1GB. For 31-bit processes the virtual address space is limited, - * use no alignment and limit the randomization to 8MB. + * of 512KB for the anonymous mapping base. For 64-bit processes use a + * 512KB alignment and a randomization of up to 1GB. For 31-bit processes + * the virtual address space is limited, use no alignment and limit the + * randomization to 8MB. + * For the additional randomization of the program break use 32MB for + * 64-bit and 8MB for 31-bit. */ -#define BRK_RND_MASK (is_compat_task() ? 0x7ffUL : 0x3ffffUL) +#define BRK_RND_MASK (is_compat_task() ? 0x7ffUL : 0x1fffUL) #define MMAP_RND_MASK (is_compat_task() ? 0x7ffUL : 0x3ff80UL) #define MMAP_ALIGN_MASK (is_compat_task() ? 0 : 0x7fUL) #define STACK_RND_MASK MMAP_RND_MASK From 01396a374c3d31bc5f8b693026cfa9a657319624 Mon Sep 17 00:00:00 2001 From: Harald Freudenberger Date: Fri, 22 Feb 2019 17:24:11 +0100 Subject: [PATCH 020/468] s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger Reviewed-by: Ingo Franzki Signed-off-by: Martin Schwidefsky --- drivers/s390/crypto/ap_bus.c | 9 ++++++++- drivers/s390/crypto/ap_bus.h | 2 ++ drivers/s390/crypto/ap_queue.c | 28 ++++++++++++++++++++++------ drivers/s390/crypto/zcrypt_api.c | 30 ++++++++++++++++++------------ 4 files changed, 50 insertions(+), 19 deletions(-) diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index e15816ff1265..033a1acabf48 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -810,11 +810,18 @@ static int ap_device_remove(struct device *dev) struct ap_device *ap_dev = to_ap_dev(dev); struct ap_driver *ap_drv = ap_dev->drv; + /* prepare ap queue device removal */ if (is_queue_dev(dev)) - ap_queue_remove(to_ap_queue(dev)); + ap_queue_prepare_remove(to_ap_queue(dev)); + + /* driver's chance to clean up gracefully */ if (ap_drv->remove) ap_drv->remove(ap_dev); + /* now do the ap queue device remove */ + if (is_queue_dev(dev)) + ap_queue_remove(to_ap_queue(dev)); + /* Remove queue/card from list of active queues/cards */ spin_lock_bh(&ap_list_lock); if (is_card_dev(dev)) diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h index d0059eae5d94..15a98a673c5c 100644 --- a/drivers/s390/crypto/ap_bus.h +++ b/drivers/s390/crypto/ap_bus.h @@ -91,6 +91,7 @@ enum ap_state { AP_STATE_WORKING, AP_STATE_QUEUE_FULL, AP_STATE_SUSPEND_WAIT, + AP_STATE_REMOVE, /* about to be removed from driver */ AP_STATE_UNBOUND, /* momentary not bound to a driver */ AP_STATE_BORKED, /* broken */ NR_AP_STATES @@ -252,6 +253,7 @@ void ap_bus_force_rescan(void); void ap_queue_init_reply(struct ap_queue *aq, struct ap_message *ap_msg); struct ap_queue *ap_queue_create(ap_qid_t qid, int device_type); +void ap_queue_prepare_remove(struct ap_queue *aq); void ap_queue_remove(struct ap_queue *aq); void ap_queue_suspend(struct ap_device *ap_dev); void ap_queue_resume(struct ap_device *ap_dev); diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c index ba261210c6da..6a340f2c3556 100644 --- a/drivers/s390/crypto/ap_queue.c +++ b/drivers/s390/crypto/ap_queue.c @@ -420,6 +420,10 @@ static ap_func_t *ap_jumptable[NR_AP_STATES][NR_AP_EVENTS] = { [AP_EVENT_POLL] = ap_sm_suspend_read, [AP_EVENT_TIMEOUT] = ap_sm_nop, }, + [AP_STATE_REMOVE] = { + [AP_EVENT_POLL] = ap_sm_nop, + [AP_EVENT_TIMEOUT] = ap_sm_nop, + }, [AP_STATE_UNBOUND] = { [AP_EVENT_POLL] = ap_sm_nop, [AP_EVENT_TIMEOUT] = ap_sm_nop, @@ -740,18 +744,31 @@ void ap_flush_queue(struct ap_queue *aq) } EXPORT_SYMBOL(ap_flush_queue); +void ap_queue_prepare_remove(struct ap_queue *aq) +{ + spin_lock_bh(&aq->lock); + /* flush queue */ + __ap_flush_queue(aq); + /* set REMOVE state to prevent new messages are queued in */ + aq->state = AP_STATE_REMOVE; + del_timer_sync(&aq->timeout); + spin_unlock_bh(&aq->lock); +} + void ap_queue_remove(struct ap_queue *aq) { - ap_flush_queue(aq); - del_timer_sync(&aq->timeout); - - /* reset with zero, also clears irq registration */ + /* + * all messages have been flushed and the state is + * AP_STATE_REMOVE. Now reset with zero which also + * clears the irq registration and move the state + * to AP_STATE_UNBOUND to signal that this queue + * is not used by any driver currently. + */ spin_lock_bh(&aq->lock); ap_zapq(aq->qid); aq->state = AP_STATE_UNBOUND; spin_unlock_bh(&aq->lock); } -EXPORT_SYMBOL(ap_queue_remove); void ap_queue_reinit_state(struct ap_queue *aq) { @@ -760,4 +777,3 @@ void ap_queue_reinit_state(struct ap_queue *aq) ap_wait(ap_sm_event(aq, AP_EVENT_POLL)); spin_unlock_bh(&aq->lock); } -EXPORT_SYMBOL(ap_queue_reinit_state); diff --git a/drivers/s390/crypto/zcrypt_api.c b/drivers/s390/crypto/zcrypt_api.c index eb93c2d27d0a..689c2af7026a 100644 --- a/drivers/s390/crypto/zcrypt_api.c +++ b/drivers/s390/crypto/zcrypt_api.c @@ -586,6 +586,7 @@ static inline bool zcrypt_check_queue(struct ap_perms *perms, int queue) static inline struct zcrypt_queue *zcrypt_pick_queue(struct zcrypt_card *zc, struct zcrypt_queue *zq, + struct module **pmod, unsigned int weight) { if (!zq || !try_module_get(zq->queue->ap_dev.drv->driver.owner)) @@ -595,15 +596,15 @@ static inline struct zcrypt_queue *zcrypt_pick_queue(struct zcrypt_card *zc, atomic_add(weight, &zc->load); atomic_add(weight, &zq->load); zq->request_count++; + *pmod = zq->queue->ap_dev.drv->driver.owner; return zq; } static inline void zcrypt_drop_queue(struct zcrypt_card *zc, struct zcrypt_queue *zq, + struct module *mod, unsigned int weight) { - struct module *mod = zq->queue->ap_dev.drv->driver.owner; - zq->request_count--; atomic_sub(weight, &zc->load); atomic_sub(weight, &zq->load); @@ -653,6 +654,7 @@ static long zcrypt_rsa_modexpo(struct ap_perms *perms, unsigned int weight, pref_weight; unsigned int func_code; int qid = 0, rc = -ENODEV; + struct module *mod; trace_s390_zcrypt_req(mex, TP_ICARSAMODEXPO); @@ -706,7 +708,7 @@ static long zcrypt_rsa_modexpo(struct ap_perms *perms, pref_weight = weight; } } - pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, weight); + pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, &mod, weight); spin_unlock(&zcrypt_list_lock); if (!pref_zq) { @@ -718,7 +720,7 @@ static long zcrypt_rsa_modexpo(struct ap_perms *perms, rc = pref_zq->ops->rsa_modexpo(pref_zq, mex); spin_lock(&zcrypt_list_lock); - zcrypt_drop_queue(pref_zc, pref_zq, weight); + zcrypt_drop_queue(pref_zc, pref_zq, mod, weight); spin_unlock(&zcrypt_list_lock); out: @@ -735,6 +737,7 @@ static long zcrypt_rsa_crt(struct ap_perms *perms, unsigned int weight, pref_weight; unsigned int func_code; int qid = 0, rc = -ENODEV; + struct module *mod; trace_s390_zcrypt_req(crt, TP_ICARSACRT); @@ -788,7 +791,7 @@ static long zcrypt_rsa_crt(struct ap_perms *perms, pref_weight = weight; } } - pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, weight); + pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, &mod, weight); spin_unlock(&zcrypt_list_lock); if (!pref_zq) { @@ -800,7 +803,7 @@ static long zcrypt_rsa_crt(struct ap_perms *perms, rc = pref_zq->ops->rsa_modexpo_crt(pref_zq, crt); spin_lock(&zcrypt_list_lock); - zcrypt_drop_queue(pref_zc, pref_zq, weight); + zcrypt_drop_queue(pref_zc, pref_zq, mod, weight); spin_unlock(&zcrypt_list_lock); out: @@ -819,6 +822,7 @@ static long _zcrypt_send_cprb(struct ap_perms *perms, unsigned int func_code; unsigned short *domain; int qid = 0, rc = -ENODEV; + struct module *mod; trace_s390_zcrypt_req(xcRB, TB_ZSECSENDCPRB); @@ -865,7 +869,7 @@ static long _zcrypt_send_cprb(struct ap_perms *perms, pref_weight = weight; } } - pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, weight); + pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, &mod, weight); spin_unlock(&zcrypt_list_lock); if (!pref_zq) { @@ -881,7 +885,7 @@ static long _zcrypt_send_cprb(struct ap_perms *perms, rc = pref_zq->ops->send_cprb(pref_zq, xcRB, &ap_msg); spin_lock(&zcrypt_list_lock); - zcrypt_drop_queue(pref_zc, pref_zq, weight); + zcrypt_drop_queue(pref_zc, pref_zq, mod, weight); spin_unlock(&zcrypt_list_lock); out: @@ -932,6 +936,7 @@ static long zcrypt_send_ep11_cprb(struct ap_perms *perms, unsigned int func_code; struct ap_message ap_msg; int qid = 0, rc = -ENODEV; + struct module *mod; trace_s390_zcrypt_req(xcrb, TP_ZSENDEP11CPRB); @@ -1000,7 +1005,7 @@ static long zcrypt_send_ep11_cprb(struct ap_perms *perms, pref_weight = weight; } } - pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, weight); + pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, &mod, weight); spin_unlock(&zcrypt_list_lock); if (!pref_zq) { @@ -1012,7 +1017,7 @@ static long zcrypt_send_ep11_cprb(struct ap_perms *perms, rc = pref_zq->ops->send_ep11_cprb(pref_zq, xcrb, &ap_msg); spin_lock(&zcrypt_list_lock); - zcrypt_drop_queue(pref_zc, pref_zq, weight); + zcrypt_drop_queue(pref_zc, pref_zq, mod, weight); spin_unlock(&zcrypt_list_lock); out_free: @@ -1033,6 +1038,7 @@ static long zcrypt_rng(char *buffer) struct ap_message ap_msg; unsigned int domain; int qid = 0, rc = -ENODEV; + struct module *mod; trace_s390_zcrypt_req(buffer, TP_HWRNGCPRB); @@ -1064,7 +1070,7 @@ static long zcrypt_rng(char *buffer) pref_weight = weight; } } - pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, weight); + pref_zq = zcrypt_pick_queue(pref_zc, pref_zq, &mod, weight); spin_unlock(&zcrypt_list_lock); if (!pref_zq) { @@ -1076,7 +1082,7 @@ static long zcrypt_rng(char *buffer) rc = pref_zq->ops->rng(pref_zq, buffer, &ap_msg); spin_lock(&zcrypt_list_lock); - zcrypt_drop_queue(pref_zc, pref_zq, weight); + zcrypt_drop_queue(pref_zc, pref_zq, mod, weight); spin_unlock(&zcrypt_list_lock); out: From 152e9b8676c6e788c6bff095c1eaae7b86df5003 Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Wed, 6 Mar 2019 13:31:21 +0200 Subject: [PATCH 021/468] s390/vtime: steal time exponential moving average To be able to judge the current overcommitment ratio for a CPU add a lowcore field with the exponential moving average of the steal time. The average is updated every tick. Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/lowcore.h | 61 +++++++++++++++++---------------- arch/s390/kernel/smp.c | 3 +- arch/s390/kernel/vtime.c | 19 ++++++---- 3 files changed, 45 insertions(+), 38 deletions(-) diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h index cc0947e08b6f..5b9f10b1e55d 100644 --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -91,52 +91,53 @@ struct lowcore { __u64 hardirq_timer; /* 0x02e8 */ __u64 softirq_timer; /* 0x02f0 */ __u64 steal_timer; /* 0x02f8 */ - __u64 last_update_timer; /* 0x0300 */ - __u64 last_update_clock; /* 0x0308 */ - __u64 int_clock; /* 0x0310 */ - __u64 mcck_clock; /* 0x0318 */ - __u64 clock_comparator; /* 0x0320 */ - __u64 boot_clock[2]; /* 0x0328 */ + __u64 avg_steal_timer; /* 0x0300 */ + __u64 last_update_timer; /* 0x0308 */ + __u64 last_update_clock; /* 0x0310 */ + __u64 int_clock; /* 0x0318*/ + __u64 mcck_clock; /* 0x0320 */ + __u64 clock_comparator; /* 0x0328 */ + __u64 boot_clock[2]; /* 0x0330 */ /* Current process. */ - __u64 current_task; /* 0x0338 */ - __u64 kernel_stack; /* 0x0340 */ + __u64 current_task; /* 0x0340 */ + __u64 kernel_stack; /* 0x0348 */ /* Interrupt, DAT-off and restartstack. */ - __u64 async_stack; /* 0x0348 */ - __u64 nodat_stack; /* 0x0350 */ - __u64 restart_stack; /* 0x0358 */ + __u64 async_stack; /* 0x0350 */ + __u64 nodat_stack; /* 0x0358 */ + __u64 restart_stack; /* 0x0360 */ /* Restart function and parameter. */ - __u64 restart_fn; /* 0x0360 */ - __u64 restart_data; /* 0x0368 */ - __u64 restart_source; /* 0x0370 */ + __u64 restart_fn; /* 0x0368 */ + __u64 restart_data; /* 0x0370 */ + __u64 restart_source; /* 0x0378 */ /* Address space pointer. */ - __u64 kernel_asce; /* 0x0378 */ - __u64 user_asce; /* 0x0380 */ - __u64 vdso_asce; /* 0x0388 */ + __u64 kernel_asce; /* 0x0380 */ + __u64 user_asce; /* 0x0388 */ + __u64 vdso_asce; /* 0x0390 */ /* * The lpp and current_pid fields form a * 64-bit value that is set as program * parameter with the LPP instruction. */ - __u32 lpp; /* 0x0390 */ - __u32 current_pid; /* 0x0394 */ + __u32 lpp; /* 0x0398 */ + __u32 current_pid; /* 0x039c */ /* SMP info area */ - __u32 cpu_nr; /* 0x0398 */ - __u32 softirq_pending; /* 0x039c */ - __u32 preempt_count; /* 0x03a0 */ - __u32 spinlock_lockval; /* 0x03a4 */ - __u32 spinlock_index; /* 0x03a8 */ - __u32 fpu_flags; /* 0x03ac */ - __u64 percpu_offset; /* 0x03b0 */ - __u64 vdso_per_cpu_data; /* 0x03b8 */ - __u64 machine_flags; /* 0x03c0 */ - __u64 gmap; /* 0x03c8 */ - __u8 pad_0x03d0[0x0400-0x03d0]; /* 0x03d0 */ + __u32 cpu_nr; /* 0x03a0 */ + __u32 softirq_pending; /* 0x03a4 */ + __u32 preempt_count; /* 0x03a8 */ + __u32 spinlock_lockval; /* 0x03ac */ + __u32 spinlock_index; /* 0x03b0 */ + __u32 fpu_flags; /* 0x03b4 */ + __u64 percpu_offset; /* 0x03b8 */ + __u64 vdso_per_cpu_data; /* 0x03c0 */ + __u64 machine_flags; /* 0x03c8 */ + __u64 gmap; /* 0x03d0 */ + __u8 pad_0x03d8[0x0400-0x03d8]; /* 0x03d8 */ /* br %r1 trampoline */ __u16 br_r1_trampoline; /* 0x0400 */ diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index b198ece2aad6..b8eb99685546 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -266,7 +266,8 @@ static void pcpu_prepare_secondary(struct pcpu *pcpu, int cpu) lc->percpu_offset = __per_cpu_offset[cpu]; lc->kernel_asce = S390_lowcore.kernel_asce; lc->machine_flags = S390_lowcore.machine_flags; - lc->user_timer = lc->system_timer = lc->steal_timer = 0; + lc->user_timer = lc->system_timer = + lc->steal_timer = lc->avg_steal_timer = 0; __ctl_store(lc->cregs_save_area, 0, 15); save_access_regs((unsigned int *) lc->access_regs_save_area); memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list, diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c index 98f850e00008..a69a0911ed0e 100644 --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -124,7 +124,7 @@ static void account_system_index_scaled(struct task_struct *p, u64 cputime, */ static int do_account_vtime(struct task_struct *tsk) { - u64 timer, clock, user, guest, system, hardirq, softirq, steal; + u64 timer, clock, user, guest, system, hardirq, softirq; timer = S390_lowcore.last_update_timer; clock = S390_lowcore.last_update_clock; @@ -182,12 +182,6 @@ static int do_account_vtime(struct task_struct *tsk) if (softirq) account_system_index_scaled(tsk, softirq, CPUTIME_SOFTIRQ); - steal = S390_lowcore.steal_timer; - if ((s64) steal > 0) { - S390_lowcore.steal_timer = 0; - account_steal_time(cputime_to_nsecs(steal)); - } - return virt_timer_forward(user + guest + system + hardirq + softirq); } @@ -213,8 +207,19 @@ void vtime_task_switch(struct task_struct *prev) */ void vtime_flush(struct task_struct *tsk) { + u64 steal, avg_steal; + if (do_account_vtime(tsk)) virt_timer_expire(); + + steal = S390_lowcore.steal_timer; + avg_steal = S390_lowcore.avg_steal_timer / 2; + if ((s64) steal > 0) { + S390_lowcore.steal_timer = 0; + account_steal_time(steal); + avg_steal += steal; + } + S390_lowcore.avg_steal_timer = avg_steal; } /* From cd44bc40a1f1eb4e259889579d599f30b1287828 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Thu, 28 Feb 2019 14:31:31 +0100 Subject: [PATCH 022/468] mt76: introduce q->stopped parameter Introduce mt76_queue stopped parameter in order to run ieee80211_wake_queue only when mac80211 queues have been previously stopped and avoid to disable interrupts when it is not necessary Signed-off-by: Lorenzo Bianconi Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/dma.c | 5 ++++- drivers/net/wireless/mediatek/mt76/mt76.h | 1 + drivers/net/wireless/mediatek/mt76/tx.c | 5 ++++- drivers/net/wireless/mediatek/mt76/usb.c | 6 +++++- 4 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index 6eedc0ec7661..99e341cb1f92 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -180,7 +180,10 @@ mt76_dma_tx_cleanup(struct mt76_dev *dev, enum mt76_txq_id qid, bool flush) else mt76_dma_sync_idx(dev, q); - wake = wake && qid < IEEE80211_NUM_ACS && q->queued < q->ndesc - 8; + wake = wake && q->stopped && + qid < IEEE80211_NUM_ACS && q->queued < q->ndesc - 8; + if (wake) + q->stopped = false; if (!q->queued) wake_up(&dev->tx_wait); diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h index 5dfb0601f101..3152b8c73c4a 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76.h +++ b/drivers/net/wireless/mediatek/mt76/mt76.h @@ -126,6 +126,7 @@ struct mt76_queue { int ndesc; int queued; int buf_size; + bool stopped; u8 buf_offset; u8 hw_idx; diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index 5a349fe3e576..4d38a54014e8 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -289,8 +289,11 @@ mt76_tx(struct mt76_dev *dev, struct ieee80211_sta *sta, dev->queue_ops->tx_queue_skb(dev, q, skb, wcid, sta); dev->queue_ops->kick(dev, q); - if (q->queued > q->ndesc - 8) + if (q->queued > q->ndesc - 8 && !q->stopped) { ieee80211_stop_queue(dev->hw, skb_get_queue_mapping(skb)); + q->stopped = true; + } + spin_unlock_bh(&q->lock); } EXPORT_SYMBOL_GPL(mt76_tx); diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c index ae6ada370597..4c1abd492405 100644 --- a/drivers/net/wireless/mediatek/mt76/usb.c +++ b/drivers/net/wireless/mediatek/mt76/usb.c @@ -655,7 +655,11 @@ static void mt76u_tx_tasklet(unsigned long data) spin_lock_bh(&q->lock); } mt76_txq_schedule(dev, q); - wake = i < IEEE80211_NUM_ACS && q->queued < q->ndesc - 8; + + wake = q->stopped && q->queued < q->ndesc - 8; + if (wake) + q->stopped = false; + if (!q->queued) wake_up(&dev->tx_wait); From fc7801021733b9fbf213ae2bde5dc5e73896a9c7 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Wed, 27 Feb 2019 19:38:29 +0100 Subject: [PATCH 023/468] mt76: rewrite dma descriptor base and ring size on queue reset Useful in case the hardware reset clobbers these values Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/dma.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index 99e341cb1f92..76629b98c78d 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -130,6 +130,8 @@ mt76_dma_tx_cleanup_idx(struct mt76_dev *dev, struct mt76_queue *q, int idx, static void mt76_dma_sync_idx(struct mt76_dev *dev, struct mt76_queue *q) { + iowrite32(q->desc_dma, &q->regs->desc_base); + iowrite32(q->ndesc, &q->regs->ring_size); q->head = ioread32(&q->regs->dma_idx); q->tail = q->head; iowrite32(q->head, &q->regs->cpu_idx); From de3c2af15fce23c42407ad0a868ac47df2e7279a Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Wed, 27 Feb 2019 19:40:27 +0100 Subject: [PATCH 024/468] mt76: mt76x02: when setting a key, use PN from mac80211 Preparation for full device restart support Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x02_mac.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c index 91ff6598eccf..8109bac5aee6 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c @@ -73,6 +73,7 @@ int mt76x02_mac_wcid_set_key(struct mt76x02_dev *dev, u8 idx, enum mt76x02_cipher_type cipher; u8 key_data[32]; u8 iv_data[8]; + u64 pn; cipher = mt76x02_mac_get_key_info(key, key_data); if (cipher == MT_CIPHER_NONE && key) @@ -85,9 +86,22 @@ int mt76x02_mac_wcid_set_key(struct mt76x02_dev *dev, u8 idx, if (key) { mt76_rmw_field(dev, MT_WCID_ATTR(idx), MT_WCID_ATTR_PAIRWISE, !!(key->flags & IEEE80211_KEY_FLAG_PAIRWISE)); + + pn = atomic64_read(&key->tx_pn); + iv_data[3] = key->keyidx << 6; - if (cipher >= MT_CIPHER_TKIP) + if (cipher >= MT_CIPHER_TKIP) { iv_data[3] |= 0x20; + put_unaligned_le32(pn >> 16, &iv_data[4]); + } + + if (cipher == MT_CIPHER_TKIP) { + iv_data[0] = (pn >> 8) & 0xff; + iv_data[1] = (iv_data[0] | 0x20) & 0x7f; + iv_data[2] = pn & 0xff; + } else if (cipher >= MT_CIPHER_AES_CCMP) { + put_unaligned_le16((pn & 0xffff), &iv_data[0]); + } } mt76_wr_copy(dev, MT_WCID_IV(idx), iv_data, sizeof(iv_data)); From 004960423fe17dfff93753017b7081dab36c7180 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Wed, 27 Feb 2019 19:42:39 +0100 Subject: [PATCH 025/468] mt76: mt76x2: implement full device restart on watchdog reset Restart the firmware and re-initialize the MAC to be able to recover from more kinds of hang states Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76.h | 1 + .../net/wireless/mediatek/mt76/mt76x02_mac.c | 26 ++++++ .../net/wireless/mediatek/mt76/mt76x02_mac.h | 2 + .../net/wireless/mediatek/mt76/mt76x02_mmio.c | 81 +++++++++++++++++-- .../net/wireless/mediatek/mt76/mt76x02_util.c | 4 + .../wireless/mediatek/mt76/mt76x2/mt76x2.h | 1 + .../wireless/mediatek/mt76/mt76x2/pci_init.c | 2 +- .../wireless/mediatek/mt76/mt76x2/pci_mcu.c | 21 +++++ drivers/net/wireless/mediatek/mt76/tx.c | 3 + 9 files changed, 132 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h index 3152b8c73c4a..cb50e96e736b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76.h +++ b/drivers/net/wireless/mediatek/mt76/mt76.h @@ -144,6 +144,7 @@ struct mt76_mcu_ops { const struct mt76_reg_pair *rp, int len); int (*mcu_rd_rp)(struct mt76_dev *dev, u32 base, struct mt76_reg_pair *rp, int len); + int (*mcu_restart)(struct mt76_dev *dev); }; struct mt76_queue_ops { diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c index 8109bac5aee6..e1e0c8da5a8c 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c @@ -67,6 +67,32 @@ int mt76x02_mac_shared_key_setup(struct mt76x02_dev *dev, u8 vif_idx, } EXPORT_SYMBOL_GPL(mt76x02_mac_shared_key_setup); +void mt76x02_mac_wcid_sync_pn(struct mt76x02_dev *dev, u8 idx, + struct ieee80211_key_conf *key) +{ + enum mt76x02_cipher_type cipher; + u8 key_data[32]; + u32 iv, eiv; + u64 pn; + + cipher = mt76x02_mac_get_key_info(key, key_data); + iv = mt76_rr(dev, MT_WCID_IV(idx)); + eiv = mt76_rr(dev, MT_WCID_IV(idx) + 4); + + pn = (u64)eiv << 16; + if (cipher == MT_CIPHER_TKIP) { + pn |= (iv >> 16) & 0xff; + pn |= (iv & 0xff) << 8; + } else if (cipher >= MT_CIPHER_AES_CCMP) { + pn |= iv & 0xffff; + } else { + return; + } + + atomic64_set(&key->tx_pn, pn); +} + + int mt76x02_mac_wcid_set_key(struct mt76x02_dev *dev, u8 idx, struct ieee80211_key_conf *key) { diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.h b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.h index 6b1f25d2f64c..caeeef96c42f 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.h @@ -177,6 +177,8 @@ int mt76x02_mac_shared_key_setup(struct mt76x02_dev *dev, u8 vif_idx, u8 key_idx, struct ieee80211_key_conf *key); int mt76x02_mac_wcid_set_key(struct mt76x02_dev *dev, u8 idx, struct ieee80211_key_conf *key); +void mt76x02_mac_wcid_sync_pn(struct mt76x02_dev *dev, u8 idx, + struct ieee80211_key_conf *key); void mt76x02_mac_wcid_setup(struct mt76x02_dev *dev, u8 idx, u8 vif_idx, u8 *mac); void mt76x02_mac_wcid_set_drop(struct mt76x02_dev *dev, u8 idx, bool drop); diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c index 1229f19f2b02..9de015bcd4f9 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c @@ -19,6 +19,7 @@ #include #include "mt76x02.h" +#include "mt76x02_mcu.h" #include "mt76x02_trace.h" struct beacon_bc_data { @@ -418,9 +419,65 @@ static bool mt76x02_tx_hang(struct mt76x02_dev *dev) return i < 4; } +static void mt76x02_key_sync(struct ieee80211_hw *hw, struct ieee80211_vif *vif, + struct ieee80211_sta *sta, + struct ieee80211_key_conf *key, void *data) +{ + struct mt76x02_dev *dev = hw->priv; + struct mt76_wcid *wcid; + + if (!sta) + return; + + wcid = (struct mt76_wcid *) sta->drv_priv; + + if (wcid->hw_key_idx != key->keyidx || wcid->sw_iv) + return; + + mt76x02_mac_wcid_sync_pn(dev, wcid->idx, key); +} + +static void mt76x02_reset_state(struct mt76x02_dev *dev) +{ + int i; + + clear_bit(MT76_STATE_RUNNING, &dev->mt76.state); + + rcu_read_lock(); + + ieee80211_iter_keys_rcu(dev->mt76.hw, NULL, mt76x02_key_sync, NULL); + + for (i = 0; i < ARRAY_SIZE(dev->mt76.wcid); i++) { + struct mt76_wcid *wcid = rcu_dereference(dev->mt76.wcid[i]); + struct mt76x02_sta *msta; + struct ieee80211_sta *sta; + struct ieee80211_vif *vif; + void *priv; + + if (!wcid) + continue; + + priv = msta = container_of(wcid, struct mt76x02_sta, wcid); + sta = container_of(priv, struct ieee80211_sta, drv_priv); + + priv = msta->vif; + vif = container_of(priv, struct ieee80211_vif, drv_priv); + + mt76_sta_state(dev->mt76.hw, vif, sta, + IEEE80211_STA_NONE, IEEE80211_STA_NOTEXIST); + memset(msta, 0, sizeof(*msta)); + } + + rcu_read_unlock(); + + dev->vif_mask = 0; + dev->beacon_mask = 0; +} + static void mt76x02_watchdog_reset(struct mt76x02_dev *dev) { u32 mask = dev->mt76.mmio.irqmask; + bool restart = dev->mt76.mcu_ops->mcu_restart; int i; ieee80211_stop_queues(dev->mt76.hw); @@ -432,6 +489,9 @@ static void mt76x02_watchdog_reset(struct mt76x02_dev *dev) for (i = 0; i < ARRAY_SIZE(dev->mt76.napi); i++) napi_disable(&dev->mt76.napi[i]); + if (restart) + mt76x02_reset_state(dev); + mutex_lock(&dev->mt76.mutex); if (dev->beacon_mask) @@ -452,20 +512,21 @@ static void mt76x02_watchdog_reset(struct mt76x02_dev *dev) /* let fw reset DMA */ mt76_set(dev, 0x734, 0x3); + if (restart) + dev->mt76.mcu_ops->mcu_restart(&dev->mt76); + for (i = 0; i < ARRAY_SIZE(dev->mt76.q_tx); i++) mt76_queue_tx_cleanup(dev, i, true); for (i = 0; i < ARRAY_SIZE(dev->mt76.q_rx); i++) mt76_queue_rx_reset(dev, i); - mt76_wr(dev, MT_MAC_SYS_CTRL, - MT_MAC_SYS_CTRL_ENABLE_TX | MT_MAC_SYS_CTRL_ENABLE_RX); - mt76_set(dev, MT_WPDMA_GLO_CFG, - MT_WPDMA_GLO_CFG_TX_DMA_EN | MT_WPDMA_GLO_CFG_RX_DMA_EN); + mt76x02_mac_start(dev); + if (dev->ed_monitor) mt76_set(dev, MT_TXOP_CTRL_CFG, MT_TXOP_ED_CCA_EN); - if (dev->beacon_mask) + if (dev->beacon_mask && !restart) mt76_set(dev, MT_BEACON_TIME_CFG, MT_BEACON_TIME_CFG_BEACON_TX | MT_BEACON_TIME_CFG_TBTT_EN); @@ -486,9 +547,13 @@ static void mt76x02_watchdog_reset(struct mt76x02_dev *dev) napi_schedule(&dev->mt76.napi[i]); } - ieee80211_wake_queues(dev->mt76.hw); - - mt76_txq_schedule_all(&dev->mt76); + if (restart) { + mt76x02_mcu_function_select(dev, Q_SELECT, 1); + ieee80211_restart_hw(dev->mt76.hw); + } else { + ieee80211_wake_queues(dev->mt76.hw); + mt76_txq_schedule_all(&dev->mt76); + } } static void mt76x02_check_tx_hang(struct mt76x02_dev *dev) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c index a48c261b0c63..28eccb3119d1 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c @@ -237,6 +237,8 @@ int mt76x02_sta_add(struct mt76_dev *mdev, struct ieee80211_vif *vif, struct mt76x02_vif *mvif = (struct mt76x02_vif *)vif->drv_priv; int idx = 0; + memset(msta, 0, sizeof(*msta)); + idx = mt76_wcid_alloc(dev->mt76.wcid_mask, ARRAY_SIZE(dev->mt76.wcid)); if (idx < 0) return -ENOSPC; @@ -274,6 +276,8 @@ mt76x02_vif_init(struct mt76x02_dev *dev, struct ieee80211_vif *vif, struct mt76x02_vif *mvif = (struct mt76x02_vif *)vif->drv_priv; struct mt76_txq *mtxq; + memset(mvif, 0, sizeof(*mvif)); + mvif->idx = idx; mvif->group_wcid.idx = MT_VIF_WCID(idx); mvif->group_wcid.hw_key_idx = -1; diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/mt76x2.h b/drivers/net/wireless/mediatek/mt76/mt76x2/mt76x2.h index 6c619f1c65c9..d7abe3d73bad 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/mt76x2.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/mt76x2.h @@ -71,6 +71,7 @@ int mt76x2_mcu_load_cr(struct mt76x02_dev *dev, u8 type, u8 temp_level, void mt76x2_cleanup(struct mt76x02_dev *dev); +int mt76x2_mac_reset(struct mt76x02_dev *dev, bool hard); void mt76x2_reset_wlan(struct mt76x02_dev *dev, bool enable); void mt76x2_init_txpower(struct mt76x02_dev *dev, struct ieee80211_supported_band *sband); diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci_init.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci_init.c index 984d9c4c2e1a..d3927a13e92e 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci_init.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci_init.c @@ -77,7 +77,7 @@ mt76x2_fixup_xtal(struct mt76x02_dev *dev) } } -static int mt76x2_mac_reset(struct mt76x02_dev *dev, bool hard) +int mt76x2_mac_reset(struct mt76x02_dev *dev, bool hard) { const u8 *macaddr = dev->mt76.macaddr; u32 val; diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci_mcu.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci_mcu.c index 03e24ae7f66c..605dc66ae83b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci_mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci_mcu.c @@ -165,9 +165,30 @@ mt76pci_load_firmware(struct mt76x02_dev *dev) return -ENOENT; } +static int +mt76pci_mcu_restart(struct mt76_dev *mdev) +{ + struct mt76x02_dev *dev; + int ret; + + dev = container_of(mdev, struct mt76x02_dev, mt76); + + mt76x02_mcu_cleanup(dev); + mt76x2_mac_reset(dev, true); + + ret = mt76pci_load_firmware(dev); + if (ret) + return ret; + + mt76_wr(dev, MT_WPDMA_RST_IDX, ~0); + + return 0; +} + int mt76x2_mcu_init(struct mt76x02_dev *dev) { static const struct mt76_mcu_ops mt76x2_mcu_ops = { + .mcu_restart = mt76pci_mcu_restart, .mcu_send_msg = mt76x02_mcu_msg_send, }; int ret; diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index 4d38a54014e8..fc7dffe066be 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -580,6 +580,9 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) struct mt76_txq *mtxq = (struct mt76_txq *) txq->drv_priv; struct mt76_queue *hwq = mtxq->hwq; + if (!test_bit(MT76_STATE_RUNNING, &dev->state)) + return; + spin_lock_bh(&hwq->lock); if (list_empty(&mtxq->list)) list_add_tail(&mtxq->list, &hwq->swq); From 7b25d3b8e485c7721cba9c71b44d1c286e61c8e7 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Thu, 28 Feb 2019 16:11:06 +0100 Subject: [PATCH 026/468] mt76x02: fix hdr pointer in write txwi for USB Since we add txwi at the begining of skb->data, it no longer point to ieee80211_hdr. This breaks settings TS bit for probe response and beacons. Acked-by: Lorenzo Bianconi Signed-off-by: Stanislaw Gruszka Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c index 43f07461c8d3..6fb52b596d42 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c @@ -85,8 +85,9 @@ int mt76x02u_tx_prepare_skb(struct mt76_dev *mdev, void *data, mt76x02_insert_hdr_pad(skb); - txwi = skb_push(skb, sizeof(struct mt76x02_txwi)); + txwi = (struct mt76x02_txwi *)(skb->data - sizeof(struct mt76x02_txwi)); mt76x02_mac_write_txwi(dev, txwi, skb, wcid, sta, len); + skb_push(skb, sizeof(struct mt76x02_txwi)); pid = mt76_tx_status_skb_add(mdev, wcid, skb); txwi->pktid = pid; From 3fd0824a2f800a2870569d385917ae1102647055 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 1 Mar 2019 16:51:03 +0100 Subject: [PATCH 027/468] mt76: mt76x02: only update the base mac address if necessary Also update the mask first before calculating the vif index. Fixes an issue where adding back the same interfaces in a different order fails because of duplicate vif index use Fixes: 06662264ce2ad ("mt76x02: use mask for vifs") Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x02_util.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c index 28eccb3119d1..cd072ac614f7 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c @@ -293,6 +293,12 @@ mt76x02_add_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif) struct mt76x02_dev *dev = hw->priv; unsigned int idx = 0; + /* Allow to change address in HW if we create first interface. */ + if (!dev->vif_mask && + (((vif->addr[0] ^ dev->mt76.macaddr[0]) & ~GENMASK(4, 1)) || + memcmp(vif->addr + 1, dev->mt76.macaddr + 1, ETH_ALEN - 1))) + mt76x02_mac_setaddr(dev, vif->addr); + if (vif->addr[0] & BIT(1)) idx = 1 + (((dev->mt76.macaddr[0] ^ vif->addr[0]) >> 2) & 7); @@ -315,10 +321,6 @@ mt76x02_add_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif) if (dev->vif_mask & BIT(idx)) return -EBUSY; - /* Allow to change address in HW if we create first interface. */ - if (!dev->vif_mask && !ether_addr_equal(dev->mt76.macaddr, vif->addr)) - mt76x02_mac_setaddr(dev, vif->addr); - dev->vif_mask |= BIT(idx); mt76x02_vif_init(dev, vif, idx); From a0ac806109277bd865b1048ec521f708b195670b Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sat, 2 Mar 2019 18:19:20 +0100 Subject: [PATCH 028/468] mt76: mt76x02: reduce false positives in ED/CCA tx blocking Full tx blocking (as opposed to CCA blocking) should only happen if there is a continuous non-802.11 signal above the energy detect threshold. Unfortunately the ED/CCA counter can't detect that, as it also counts 802.11 signals as busy. Similar to the vendor code, implement a learning mode that waits until the AGC gain has already been adjusted to the lowest value (due to false CCA events), and the number of false CCA events still remains high, and the blocking threshold is exceeded for more than 5 seconds. Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x02.h | 3 +++ .../net/wireless/mediatek/mt76/mt76x02_mac.c | 25 ++++++++++++++++--- .../net/wireless/mediatek/mt76/mt76x02_phy.c | 2 ++ 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02.h b/drivers/net/wireless/mediatek/mt76/mt76x02.h index 6915cce5def9..42cc9da68f7b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x02.h @@ -51,6 +51,7 @@ struct mt76x02_calibration { u16 false_cca; s8 avg_rssi_all; s8 agc_gain_adjust; + s8 agc_lowest_gain; s8 low_gain; s8 temp_vco; @@ -114,8 +115,10 @@ struct mt76x02_dev { struct mt76x02_dfs_pattern_detector dfs_pd; /* edcca monitor */ + unsigned long ed_trigger_timeout; bool ed_tx_blocked; bool ed_monitor; + u8 ed_monitor_learning; u8 ed_trigger; u8 ed_silent; ktime_t ed_time; diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c index e1e0c8da5a8c..9ed231abe916 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c @@ -960,6 +960,7 @@ void mt76x02_edcca_init(struct mt76x02_dev *dev, bool enable) } } mt76x02_edcca_tx_enable(dev, true); + dev->ed_monitor_learning = true; /* clear previous CCA timer value */ mt76_rr(dev, MT_ED_CCA_TIMER); @@ -969,6 +970,10 @@ EXPORT_SYMBOL_GPL(mt76x02_edcca_init); #define MT_EDCCA_TH 92 #define MT_EDCCA_BLOCK_TH 2 +#define MT_EDCCA_LEARN_TH 50 +#define MT_EDCCA_LEARN_CCA 180 +#define MT_EDCCA_LEARN_TIMEOUT (20 * HZ) + static void mt76x02_edcca_check(struct mt76x02_dev *dev) { ktime_t cur_time; @@ -991,11 +996,23 @@ static void mt76x02_edcca_check(struct mt76x02_dev *dev) dev->ed_trigger = 0; } - if (dev->ed_trigger > MT_EDCCA_BLOCK_TH && - !dev->ed_tx_blocked) + if (dev->cal.agc_lowest_gain && + dev->cal.false_cca > MT_EDCCA_LEARN_CCA && + dev->ed_trigger > MT_EDCCA_LEARN_TH) { + dev->ed_monitor_learning = false; + dev->ed_trigger_timeout = jiffies + 20 * HZ; + } else if (!dev->ed_monitor_learning && + time_is_after_jiffies(dev->ed_trigger_timeout)) { + dev->ed_monitor_learning = true; + mt76x02_edcca_tx_enable(dev, true); + } + + if (dev->ed_monitor_learning) + return; + + if (dev->ed_trigger > MT_EDCCA_BLOCK_TH && !dev->ed_tx_blocked) mt76x02_edcca_tx_enable(dev, false); - else if (dev->ed_silent > MT_EDCCA_BLOCK_TH && - dev->ed_tx_blocked) + else if (dev->ed_silent > MT_EDCCA_BLOCK_TH && dev->ed_tx_blocked) mt76x02_edcca_tx_enable(dev, true); } diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_phy.c b/drivers/net/wireless/mediatek/mt76/mt76x02_phy.c index a020c757ba5c..a54b63a96eae 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_phy.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_phy.c @@ -194,6 +194,8 @@ bool mt76x02_phy_adjust_vga_gain(struct mt76x02_dev *dev) ret = true; } + dev->cal.agc_lowest_gain = dev->cal.agc_gain_adjust >= limit; + return ret; } EXPORT_SYMBOL_GPL(mt76x02_phy_adjust_vga_gain); From 7635276989a183bdb424f9e930f836b6264d54dc Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 11:06:19 +0100 Subject: [PATCH 029/468] mt76: mt7603: fix tx status HT rate validation Use the correct variable in the check. Fixes an uninitialized variable warning Reported-by: Gustavo A. R. Silva Fixes: c8846e1015022 ("mt76: add driver for MT7603E and MT7628/7688") Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/mac.c b/drivers/net/wireless/mediatek/mt76/mt7603/mac.c index 0a0115861b51..5e31d7da96fc 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/mac.c @@ -1072,7 +1072,7 @@ mt7603_fill_txs(struct mt7603_dev *dev, struct mt7603_sta *sta, case MT_PHY_TYPE_HT: final_rate_flags |= IEEE80211_TX_RC_MCS; final_rate &= GENMASK(5, 0); - if (i > 15) + if (final_rate > 15) return false; break; default: From 45a042e3026824a7e910db7a4dd38fef0540b902 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 15:10:00 +0100 Subject: [PATCH 030/468] mt76: mt76x2: fix external LNA gain settings Devices with external LNA need different values for AGC registers 8 and 9 Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x2/phy.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c b/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c index 1848e8ab2e21..c7e71f2ba2a7 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c @@ -260,10 +260,15 @@ mt76x2_phy_set_gain_val(struct mt76x02_dev *dev) gain_val[0] = dev->cal.agc_gain_cur[0] - dev->cal.agc_gain_adjust; gain_val[1] = dev->cal.agc_gain_cur[1] - dev->cal.agc_gain_adjust; - if (dev->mt76.chandef.width >= NL80211_CHAN_WIDTH_40) + val = 0x1836 << 16; + if (!mt76x2_has_ext_lna(dev) && + dev->mt76.chandef.width >= NL80211_CHAN_WIDTH_40) val = 0x1e42 << 16; - else - val = 0x1836 << 16; + + if (mt76x2_has_ext_lna(dev) && + dev->mt76.chandef.chan->band == NL80211_BAND_2GHZ && + dev->mt76.chandef.width < NL80211_CHAN_WIDTH_40) + val = 0x0f36 << 16; val |= 0xf8; From b8cfd87ac24273e36fbd3ecda631f3ba6566d493 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 15:12:14 +0100 Subject: [PATCH 031/468] mt76: mt76x2: fix 2.4 GHz channel gain settings AGC register 35, 37 override for the low gain setting should only be done on 5 GHz. Also, 2.4 GHz needs a different value for register 35 Signed-off-by: Felix Fietkau --- .../net/wireless/mediatek/mt76/mt76x2/phy.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c b/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c index c7e71f2ba2a7..769a9b972044 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/phy.c @@ -285,6 +285,7 @@ void mt76x2_phy_update_channel_gain(struct mt76x02_dev *dev) { u8 *gain = dev->cal.agc_gain_init; u8 low_gain_delta, gain_delta; + u32 agc_35, agc_37; bool gain_change; int low_gain; u32 val; @@ -323,6 +324,16 @@ void mt76x2_phy_update_channel_gain(struct mt76x02_dev *dev) else low_gain_delta = 14; + agc_37 = 0x2121262c; + if (dev->mt76.chandef.chan->band == NL80211_BAND_2GHZ) + agc_35 = 0x11111516; + else if (low_gain == 2) + agc_35 = agc_37 = 0x08080808; + else if (dev->mt76.chandef.width == NL80211_CHAN_WIDTH_80) + agc_35 = 0x10101014; + else + agc_35 = 0x11111116; + if (low_gain == 2) { mt76_wr(dev, MT_BBP(RXO, 18), 0xf000a990); mt76_wr(dev, MT_BBP(AGC, 35), 0x08080808); @@ -331,15 +342,13 @@ void mt76x2_phy_update_channel_gain(struct mt76x02_dev *dev) dev->cal.agc_gain_adjust = 0; } else { mt76_wr(dev, MT_BBP(RXO, 18), 0xf000a991); - if (dev->mt76.chandef.width == NL80211_CHAN_WIDTH_80) - mt76_wr(dev, MT_BBP(AGC, 35), 0x10101014); - else - mt76_wr(dev, MT_BBP(AGC, 35), 0x11111116); - mt76_wr(dev, MT_BBP(AGC, 37), 0x2121262C); gain_delta = 0; dev->cal.agc_gain_adjust = low_gain_delta; } + mt76_wr(dev, MT_BBP(AGC, 35), agc_35); + mt76_wr(dev, MT_BBP(AGC, 37), agc_37); + dev->cal.agc_gain_cur[0] = gain[0] - gain_delta; dev->cal.agc_gain_cur[1] = gain[1] - gain_delta; mt76x2_phy_set_gain_val(dev); From f25e813bf48dce541c29b12cfc19a2c25b6db915 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 18:39:08 +0100 Subject: [PATCH 032/468] mt76: mt7603: clear ps filtering mode before releasing buffered frames Fixes sending them, otherwise they loop back right into the buffer Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/main.c b/drivers/net/wireless/mediatek/mt76/mt7603/main.c index b10775ed92e6..8da0b8707d24 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/main.c @@ -399,6 +399,8 @@ mt7603_release_buffered_frames(struct ieee80211_hw *hw, __skb_queue_head_init(&list); + mt7603_wtbl_set_ps(dev, msta, false); + spin_lock_bh(&dev->ps_lock); skb_queue_walk_safe(&msta->psq, skb, tmp) { if (!nframes) From fca9615f1a436d1e9f64042cda08a34fe32ce668 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 18:40:18 +0100 Subject: [PATCH 033/468] mt76: mt7603: fix up hardware queue index for PS filtered packets Make the queue index match the hardware queue on which they get sent out Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c index d69e82c66ab2..494a48e023ef 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c @@ -50,6 +50,10 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) val = le32_to_cpu(txd[0]); skb_set_queue_mapping(skb, FIELD_GET(MT_TXD0_Q_IDX, val)); + val &= ~(MT_TXD0_P_IDX | MT_TXD0_Q_IDX); + val |= FIELD_PREP(MT_TXD0_Q_IDX, MT_TX_HW_QUEUE_MGMT); + txd[0] = cpu_to_le32(val); + spin_lock_bh(&dev->ps_lock); __skb_queue_tail(&msta->psq, skb); if (skb_queue_len(&msta->psq) >= 64) { From e004b7006600258615b969f05c4d008c1d68973b Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 18:51:35 +0100 Subject: [PATCH 034/468] mt76: mt7603: notify mac80211 about buffered frames in ps queue Also fix the size check for filtered powersave frames Fixes a corner case with waking up clients Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c index 494a48e023ef..b3ae0aaea62a 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c @@ -27,12 +27,16 @@ static void mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) { __le32 *txd = (__le32 *)skb->data; + struct ieee80211_hdr *hdr; + struct ieee80211_sta *sta; struct mt7603_sta *msta; struct mt76_wcid *wcid; + void *priv; int idx; u32 val; + u8 tid; - if (skb->len < sizeof(MT_TXD_SIZE) + sizeof(struct ieee80211_hdr)) + if (skb->len < MT_TXD_SIZE + sizeof(struct ieee80211_hdr)) goto free; val = le32_to_cpu(txd[1]); @@ -46,7 +50,7 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) if (!wcid) goto free; - msta = container_of(wcid, struct mt7603_sta, wcid); + priv = msta = container_of(wcid, struct mt7603_sta, wcid); val = le32_to_cpu(txd[0]); skb_set_queue_mapping(skb, FIELD_GET(MT_TXD0_Q_IDX, val)); @@ -54,6 +58,11 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) val |= FIELD_PREP(MT_TXD0_Q_IDX, MT_TX_HW_QUEUE_MGMT); txd[0] = cpu_to_le32(val); + sta = container_of(priv, struct ieee80211_sta, drv_priv); + hdr = (struct ieee80211_hdr *) &skb->data[MT_TXD_SIZE]; + tid = *ieee80211_get_qos_ctl(hdr) & IEEE80211_QOS_CTL_TID_MASK; + ieee80211_sta_set_buffered(sta, tid, true); + spin_lock_bh(&dev->ps_lock); __skb_queue_tail(&msta->psq, skb); if (skb_queue_len(&msta->psq) >= 64) { From b7001f46085e06a74e4677b44ac55566f66e55aa Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 19:16:03 +0100 Subject: [PATCH 035/468] mt76: mt7603: clear the service period on releasing PS filtered packets These packets have no txwi entry in the ring, so tracking via tx status does not work. To prevent PS poll requests from being unanswered, end the service period right away Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/main.c b/drivers/net/wireless/mediatek/mt76/mt7603/main.c index 8da0b8707d24..ea25eff5e81c 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/main.c @@ -416,6 +416,9 @@ mt7603_release_buffered_frames(struct ieee80211_hw *hw, } spin_unlock_bh(&dev->ps_lock); + if (!skb_queue_empty(&list)) + ieee80211_sta_eosp(sta); + mt7603_ps_tx_list(dev, &list); if (nframes) From ffc9a7ff59a41df9a0a265b2dd04d97f2b35893a Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 19:19:21 +0100 Subject: [PATCH 036/468] mt76: when releasing PS frames, end the service period if no frame was found Fixes a rare corner case if the txq dequeue attempt fails, but mac80211 still has PS buffered packets Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/tx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index fc7dffe066be..2585df512335 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -377,7 +377,10 @@ mt76_release_buffered_frames(struct ieee80211_hw *hw, struct ieee80211_sta *sta, if (last_skb) { mt76_queue_ps_skb(dev, sta, last_skb, true); dev->queue_ops->kick(dev, hwq); + } else { + ieee80211_sta_eosp(sta); } + spin_unlock_bh(&hwq->lock); } EXPORT_SYMBOL_GPL(mt76_release_buffered_frames); From 643749d4a82b150afd2f87b29ca3dda885f8e384 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Sun, 3 Mar 2019 19:40:36 +0100 Subject: [PATCH 037/468] mt76: mt76x02: disable ED/CCA by default This feature has been reported to cause stability issues on several systems. Disable it until it has been fixed and verified. It can still be enabled through debugfs Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x02.h | 1 + .../wireless/mediatek/mt76/mt76x02_debugfs.c | 27 +++++++++++++++++++ .../net/wireless/mediatek/mt76/mt76x02_dfs.c | 3 ++- 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02.h b/drivers/net/wireless/mediatek/mt76/mt76x02.h index 42cc9da68f7b..b5a36d9fb2bd 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x02.h @@ -118,6 +118,7 @@ struct mt76x02_dev { unsigned long ed_trigger_timeout; bool ed_tx_blocked; bool ed_monitor; + u8 ed_monitor_enabled; u8 ed_monitor_learning; u8 ed_trigger; u8 ed_silent; diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_debugfs.c b/drivers/net/wireless/mediatek/mt76/mt76x02_debugfs.c index 7580c5c986ff..b1d6fd4861e3 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_debugfs.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_debugfs.c @@ -116,6 +116,32 @@ static int read_agc(struct seq_file *file, void *data) return 0; } +static int +mt76_edcca_set(void *data, u64 val) +{ + struct mt76x02_dev *dev = data; + enum nl80211_dfs_regions region = dev->dfs_pd.region; + + dev->ed_monitor_enabled = !!val; + dev->ed_monitor = dev->ed_monitor_enabled && + region == NL80211_DFS_ETSI; + mt76x02_edcca_init(dev, true); + + return 0; +} + +static int +mt76_edcca_get(void *data, u64 *val) +{ + struct mt76x02_dev *dev = data; + + *val = dev->ed_monitor_enabled; + return 0; +} + +DEFINE_DEBUGFS_ATTRIBUTE(fops_edcca, mt76_edcca_get, mt76_edcca_set, + "%lld\n"); + void mt76x02_init_debugfs(struct mt76x02_dev *dev) { struct dentry *dir; @@ -127,6 +153,7 @@ void mt76x02_init_debugfs(struct mt76x02_dev *dev) debugfs_create_u8("temperature", 0400, dir, &dev->cal.temp); debugfs_create_bool("tpc", 0600, dir, &dev->enable_tpc); + debugfs_create_file("edcca", 0400, dir, dev, &fops_edcca); debugfs_create_file("ampdu_stat", 0400, dir, dev, &fops_ampdu_stat); debugfs_create_file("dfs_stats", 0400, dir, dev, &fops_dfs_stat); debugfs_create_devm_seqfile(dev->mt76.dev, "txpower", dir, diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_dfs.c b/drivers/net/wireless/mediatek/mt76/mt76x02_dfs.c index e4649103efd4..17d12d212d1b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_dfs.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_dfs.c @@ -885,7 +885,8 @@ mt76x02_dfs_set_domain(struct mt76x02_dev *dev, if (dfs_pd->region != region) { tasklet_disable(&dfs_pd->dfs_tasklet); - dev->ed_monitor = region == NL80211_DFS_ETSI; + dev->ed_monitor = dev->ed_monitor_enabled && + region == NL80211_DFS_ETSI; mt76x02_edcca_init(dev, true); dfs_pd->region = region; From b126c889743513543d007ac1c5c82b61ef003133 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 4 Mar 2019 08:36:11 +0100 Subject: [PATCH 038/468] mt76: mt7603: set moredata flag when queueing ps-filtered packets Clients should poll for more packets afterwards Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/main.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/main.c b/drivers/net/wireless/mediatek/mt76/mt7603/main.c index ea25eff5e81c..cc0fe0933b2d 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/main.c @@ -5,6 +5,7 @@ #include #include #include "mt7603.h" +#include "mac.h" #include "eeprom.h" static int @@ -385,6 +386,15 @@ mt7603_sta_ps(struct mt76_dev *mdev, struct ieee80211_sta *sta, bool ps) mt7603_ps_tx_list(dev, &list); } +static void +mt7603_ps_set_more_data(struct sk_buff *skb) +{ + struct ieee80211_hdr *hdr; + + hdr = (struct ieee80211_hdr *) &skb->data[MT_TXD_SIZE]; + hdr->frame_control |= cpu_to_le16(IEEE80211_FCTL_MOREDATA); +} + static void mt7603_release_buffered_frames(struct ieee80211_hw *hw, struct ieee80211_sta *sta, @@ -411,6 +421,7 @@ mt7603_release_buffered_frames(struct ieee80211_hw *hw, skb_set_queue_mapping(skb, MT_TXQ_PSD); __skb_unlink(skb, &msta->psq); + mt7603_ps_set_more_data(skb); __skb_queue_tail(&list, skb); nframes--; } From 7c1b998d3483acf785c45035c0e14a62023f1edb Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Mon, 4 Mar 2019 01:10:00 +0000 Subject: [PATCH 039/468] mt76: fix return value check in mt76_wmac_probe() In case of error, the function devm_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: c8846e101502 ("mt76: add driver for MT7603E and MT7628/7688") Signed-off-by: Wei Yongjun Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt7603/soc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/soc.c b/drivers/net/wireless/mediatek/mt76/mt7603/soc.c index e13fea80d970..b920be1f5718 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/soc.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/soc.c @@ -23,9 +23,9 @@ mt76_wmac_probe(struct platform_device *pdev) } mem_base = devm_ioremap_resource(&pdev->dev, res); - if (!mem_base) { + if (IS_ERR(mem_base)) { dev_err(&pdev->dev, "Failed to get memory resource\n"); - return -EINVAL; + return PTR_ERR(mem_base); } mdev = mt76_alloc_device(&pdev->dev, sizeof(*dev), &mt7603_ops, From 411e05f4e87794332f328ca3fa201b731f023db5 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 6 Mar 2019 10:51:12 +0100 Subject: [PATCH 040/468] mt76x2u: remove duplicated entry in mt76x2u_device_table Remove duplicated entry in mt76x2u_device_table since Alfa AWUS036ACM and Aukey USB-AC1200 have the same ids Fixes: 62a25dc56990a ("mt76x2u: Add support for Alfa AWUS036ACM") Signed-off-by: Lorenzo Bianconi Signed-off-by: Felix Fietkau --- drivers/net/wireless/mediatek/mt76/mt76x2/usb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c index ddb6b2c48e01..7a5d539873ca 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c @@ -21,11 +21,10 @@ #include "mt76x2u.h" static const struct usb_device_id mt76x2u_device_table[] = { - { USB_DEVICE(0x0e8d, 0x7612) }, /* Alfa AWUS036ACM */ { USB_DEVICE(0x0b05, 0x1833) }, /* Asus USB-AC54 */ { USB_DEVICE(0x0b05, 0x17eb) }, /* Asus USB-AC55 */ { USB_DEVICE(0x0b05, 0x180b) }, /* Asus USB-N53 B1 */ - { USB_DEVICE(0x0e8d, 0x7612) }, /* Aukey USB-AC1200 */ + { USB_DEVICE(0x0e8d, 0x7612) }, /* Aukey USBAC1200 - Alfa AWUS036ACM */ { USB_DEVICE(0x057c, 0x8503) }, /* Avm FRITZ!WLAN AC860 */ { USB_DEVICE(0x7392, 0xb711) }, /* Edimax EW 7722 UAC */ { USB_DEVICE(0x0846, 0x9053) }, /* Netgear A6210 */ From 688cd8bd2c0fa9dc88e5ced55a73ddc79edf875d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 4 Mar 2019 21:38:42 +0100 Subject: [PATCH 041/468] iwlwifi: fix 64-bit division do_div() expects unsigned operands and otherwise triggers a warning like: drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:465:2: error: comparison of distinct pointer types ('typeof ((rtt_avg)) *' (aka 'long long *') and 'uint64_t *' (aka 'unsigned long long *')) [-Werror,-Wcompare-distinct-pointer-types] do_div(rtt_avg, 6666); ^~~~~~~~~~~~~~~~~~~~~ include/asm-generic/div64.h:222:28: note: expanded from macro 'do_div' (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \ ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~ 1 error generated. Change the do_div() to the simpler div_s64() that can handle negative inputs correctly. Fixes: 937b10c0de68 ("iwlwifi: mvm: add debug prints for FTM") Signed-off-by: Arnd Bergmann Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c index e9822a3ec373..94132cfd1f56 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c @@ -460,9 +460,7 @@ static int iwl_mvm_ftm_range_resp_valid(struct iwl_mvm *mvm, u8 request_id, static void iwl_mvm_debug_range_resp(struct iwl_mvm *mvm, u8 index, struct cfg80211_pmsr_result *res) { - s64 rtt_avg = res->ftm.rtt_avg * 100; - - do_div(rtt_avg, 6666); + s64 rtt_avg = div_s64(res->ftm.rtt_avg * 100, 6666); IWL_DEBUG_INFO(mvm, "entry %d\n", index); IWL_DEBUG_INFO(mvm, "\tstatus: %d\n", res->status); From f38a1f0a5a5710b14c0e899628c815522c6111cf Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Fri, 8 Mar 2019 15:58:20 -0800 Subject: [PATCH 042/468] libbpf: handle BTF parsing and loading properly This patch splits and cleans up error handling logic for loading BTF data. Previously, if BTF data was parsed successfully, but failed to load into kernel, we'd report nonsensical error code, instead of error returned from btf__load(). Now btf__new() and btf__load() are handled separately with proper cleanup and warning reporting. Fixes: d29d87f7e612 ("btf: separate btf creation and loading") Reported-by: Martin KaFai Lau Signed-off-by: Andrii Nakryiko Acked-by: Martin KaFai Lau Acked-by: Yonghong Song Signed-off-by: Daniel Borkmann --- tools/lib/bpf/libbpf.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index d5b830d60601..5e977d2688da 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -835,12 +835,19 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) obj->efile.maps_shndx = idx; else if (strcmp(name, BTF_ELF_SEC) == 0) { obj->btf = btf__new(data->d_buf, data->d_size); - if (IS_ERR(obj->btf) || btf__load(obj->btf)) { + if (IS_ERR(obj->btf)) { pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n", BTF_ELF_SEC, PTR_ERR(obj->btf)); - if (!IS_ERR(obj->btf)) - btf__free(obj->btf); obj->btf = NULL; + continue; + } + err = btf__load(obj->btf); + if (err) { + pr_warning("Error loading %s into kernel: %d. Ignored and continue.\n", + BTF_ELF_SEC, err); + btf__free(obj->btf); + obj->btf = NULL; + err = 0; } } else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) { btf_ext_data = data; From 50b7f1b7236bab08ebbbecf90521e84b068d7a17 Mon Sep 17 00:00:00 2001 From: Cornelia Huck Date: Mon, 11 Mar 2019 10:59:53 +0100 Subject: [PATCH 043/468] vfio: ccw: only free cp on final interrupt When we get an interrupt for a channel program, it is not necessarily the final interrupt; for example, the issuing guest may request an intermediate interrupt by specifying the program-controlled-interrupt flag on a ccw. We must not switch the state to idle if the interrupt is not yet final; even more importantly, we must not free the translated channel program if the interrupt is not yet final, or the host can crash during cp rewind. Fixes: e5f84dbaea59 ("vfio: ccw: return I/O results asynchronously") Cc: stable@vger.kernel.org # v4.12+ Reviewed-by: Eric Farman Signed-off-by: Cornelia Huck --- drivers/s390/cio/vfio_ccw_drv.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c index a10cec0e86eb..0b3b9de45c60 100644 --- a/drivers/s390/cio/vfio_ccw_drv.c +++ b/drivers/s390/cio/vfio_ccw_drv.c @@ -72,20 +72,24 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work) { struct vfio_ccw_private *private; struct irb *irb; + bool is_final; private = container_of(work, struct vfio_ccw_private, io_work); irb = &private->irb; + is_final = !(scsw_actl(&irb->scsw) & + (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT)); if (scsw_is_solicited(&irb->scsw)) { cp_update_scsw(&private->cp, &irb->scsw); - cp_free(&private->cp); + if (is_final) + cp_free(&private->cp); } memcpy(private->io_region->irb_area, irb, sizeof(*irb)); if (private->io_trigger) eventfd_signal(private->io_trigger, 1); - if (private->mdev) + if (private->mdev && is_final) private->state = VFIO_CCW_STATE_IDLE; } From 0d9c038feff6f834ad9e5d88b66715235ab23ff3 Mon Sep 17 00:00:00 2001 From: Tony Krowiak Date: Mon, 18 Feb 2019 12:01:35 -0500 Subject: [PATCH 044/468] zcrypt: handle AP Info notification from CHSC SEI command The current AP bus implementation periodically polls the AP configuration to detect changes. When the AP configuration is dynamically changed via the SE or an SCLP instruction, the changes will not be reflected to sysfs until the next time the AP configuration is polled. The CHSC architecture provides a Store Event Information (SEI) command to make notification of an AP configuration change. This patch introduces a handler to process notification from the CHSC SEI command by immediately kicking off an AP bus scan-after-event. Signed-off-by: Tony Krowiak Reviewed-by: Halil Pasic Reviewed-by: Sebastian Ott Reviewed-by: Harald Freudenberger Reviewed-by: Cornelia Huck Signed-off-by: Sebastian Ott Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/ap.h | 11 +++++++++++ drivers/s390/cio/chsc.c | 13 +++++++++++++ drivers/s390/crypto/ap_bus.c | 10 ++++++++++ 3 files changed, 34 insertions(+) diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h index 1a6a7092d942..e94a0a28b5eb 100644 --- a/arch/s390/include/asm/ap.h +++ b/arch/s390/include/asm/ap.h @@ -360,4 +360,15 @@ static inline struct ap_queue_status ap_dqap(ap_qid_t qid, return reg1; } +/* + * Interface to tell the AP bus code that a configuration + * change has happened. The bus code should at least do + * an ap bus resource rescan. + */ +#if IS_ENABLED(CONFIG_ZCRYPT) +void ap_bus_cfg_chg(void); +#else +static inline void ap_bus_cfg_chg(void){}; +#endif + #endif /* _ASM_S390_AP_H_ */ diff --git a/drivers/s390/cio/chsc.c b/drivers/s390/cio/chsc.c index a0baee25134c..eaf4699be2c5 100644 --- a/drivers/s390/cio/chsc.c +++ b/drivers/s390/cio/chsc.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "css.h" #include "cio.h" @@ -586,6 +587,15 @@ static void chsc_process_sei_scm_avail(struct chsc_sei_nt0_area *sei_area) " failed (rc=%d).\n", ret); } +static void chsc_process_sei_ap_cfg_chg(struct chsc_sei_nt0_area *sei_area) +{ + CIO_CRW_EVENT(3, "chsc: ap config changed\n"); + if (sei_area->rs != 5) + return; + + ap_bus_cfg_chg(); +} + static void chsc_process_sei_nt2(struct chsc_sei_nt2_area *sei_area) { switch (sei_area->cc) { @@ -612,6 +622,9 @@ static void chsc_process_sei_nt0(struct chsc_sei_nt0_area *sei_area) case 2: /* i/o resource accessibility */ chsc_process_sei_res_acc(sei_area); break; + case 3: /* ap config changed */ + chsc_process_sei_ap_cfg_chg(sei_area); + break; case 7: /* channel-path-availability information */ chsc_process_sei_chp_avail(sei_area); break; diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 033a1acabf48..1546389d71db 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -867,6 +867,16 @@ void ap_bus_force_rescan(void) } EXPORT_SYMBOL(ap_bus_force_rescan); +/* +* A config change has happened, force an ap bus rescan. +*/ +void ap_bus_cfg_chg(void) +{ + AP_DBF(DBF_INFO, "%s config change, forcing bus rescan\n", __func__); + + ap_bus_force_rescan(); +} + /* * hex2bitmap() - parse hex mask string and set bitmap. * Valid strings are "0x012345678" with at least one valid hex number. From 7a9b6be9fe58194d9a349159176e8cc0d8f10ef8 Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Fri, 8 Mar 2019 13:02:16 -0800 Subject: [PATCH 045/468] arm64: bcm2835: Add missing dependency on MFD_CORE. When adding the MFD dependency for power domains and WDT in bcm2835, I added it only on the arm32 side and missed it for arm64. Fixes: 5e6acc3e678e ("bcm2835-pm: Move bcm2835-watchdog's DT probe to an MFD.") Signed-off-by: Eric Anholt Reported-by: Stefan Wahren Acked-by: Stefan Wahren --- arch/arm64/Kconfig.platforms | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms index 251ecf34cb02..50f9fb562059 100644 --- a/arch/arm64/Kconfig.platforms +++ b/arch/arm64/Kconfig.platforms @@ -27,6 +27,7 @@ config ARCH_BCM2835 bool "Broadcom BCM2835 family" select TIMER_OF select GPIOLIB + select MFD_CORE select PINCTRL select PINCTRL_BCM2835 select ARM_AMBA From d6f1837107c00fa7b2fdb606acf65b33b455dbfd Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Mon, 11 Mar 2019 22:21:09 -0700 Subject: [PATCH 046/468] selftests/bpf: fix segfault of test_progs when prog loading failed The test_progs subtests, test_spin_lock() and test_map_lock(), requires BTF present to run successfully. Currently, when BTF failed to load, test_progs will segfault, $ ./test_progs ... 12: (bf) r1 = r8 13: (85) call bpf_spin_lock#93 map 'hash_map' has to have BTF in order to use bpf_spin_lock libbpf: -- END LOG -- libbpf: failed to load program 'map_lock_demo' libbpf: failed to load object './test_map_lock.o' test_map_lock:bpf_prog_load errno 13 Segmentation fault The segfault is caused by uninitialized variable "obj", which is used in bpf_object__close(obj), when bpf prog failed to load. Initializing variable "obj" to NULL in two occasions fixed the problem. $ ./test_progs ... Summary: 219 PASSED, 2 FAILED Fixes: b4d4556c3266 ("selftests/bpf: add bpf_spin_lock verifier tests") Fixes: ba72a7b4badb ("selftests/bpf: test for BPF_F_LOCK") Reported-by: Daniel Borkmann Signed-off-by: Yonghong Song Acked-by: Song Liu Signed-off-by: Daniel Borkmann --- tools/testing/selftests/bpf/prog_tests/map_lock.c | 2 +- tools/testing/selftests/bpf/prog_tests/spinlock.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c index 90f8a206340a..ee99368c595c 100644 --- a/tools/testing/selftests/bpf/prog_tests/map_lock.c +++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c @@ -37,7 +37,7 @@ void test_map_lock(void) const char *file = "./test_map_lock.o"; int prog_fd, map_fd[2], vars[17] = {}; pthread_t thread_id[6]; - struct bpf_object *obj; + struct bpf_object *obj = NULL; int err = 0, key = 0, i; void *ret; diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c index 9a573a9675d7..114ebe6a438e 100644 --- a/tools/testing/selftests/bpf/prog_tests/spinlock.c +++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c @@ -5,7 +5,7 @@ void test_spinlock(void) { const char *file = "./test_spin_lock.o"; pthread_t thread_id[4]; - struct bpf_object *obj; + struct bpf_object *obj = NULL; int prog_fd; int err = 0, i; void *ret; From 6bf21b54a596d60905cfc7e8af8e2fe16d9fe7e9 Mon Sep 17 00:00:00 2001 From: Magnus Karlsson Date: Tue, 12 Mar 2019 09:59:45 +0100 Subject: [PATCH 047/468] libbpf: fix to reject unknown flags in xsk_socket__create() In xsk_socket__create(), the libbpf_flags field was not checked for setting currently unused/unknown flags. This patch fixes that by returning -EINVAL if the user has set any flag that is not in use at this point in time. Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Magnus Karlsson Reviewed-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann --- tools/lib/bpf/xsk.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index f98ac82c9aea..8d0078b65486 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -126,8 +126,8 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg, cfg->frame_headroom = usr_cfg->frame_headroom; } -static void xsk_set_xdp_socket_config(struct xsk_socket_config *cfg, - const struct xsk_socket_config *usr_cfg) +static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg, + const struct xsk_socket_config *usr_cfg) { if (!usr_cfg) { cfg->rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS; @@ -135,14 +135,19 @@ static void xsk_set_xdp_socket_config(struct xsk_socket_config *cfg, cfg->libbpf_flags = 0; cfg->xdp_flags = 0; cfg->bind_flags = 0; - return; + return 0; } + if (usr_cfg->libbpf_flags & ~XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD) + return -EINVAL; + cfg->rx_size = usr_cfg->rx_size; cfg->tx_size = usr_cfg->tx_size; cfg->libbpf_flags = usr_cfg->libbpf_flags; cfg->xdp_flags = usr_cfg->xdp_flags; cfg->bind_flags = usr_cfg->bind_flags; + + return 0; } int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size, @@ -557,7 +562,9 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, } strncpy(xsk->ifname, ifname, IFNAMSIZ); - xsk_set_xdp_socket_config(&xsk->config, usr_config); + err = xsk_set_xdp_socket_config(&xsk->config, usr_config); + if (err) + goto out_socket; if (rx) { err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING, From 2795e8c251614ac0784c9d41008551109f665716 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Mon, 11 Mar 2019 02:25:17 -0500 Subject: [PATCH 048/468] net: ieee802154: fix a potential NULL pointer dereference In case alloc_ordered_workqueue fails, the fix releases sources and returns -ENOMEM to avoid NULL pointer dereference. Signed-off-by: Kangjie Lu Acked-by: Michael Hennerich Signed-off-by: Stefan Schmidt --- drivers/net/ieee802154/adf7242.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ieee802154/adf7242.c b/drivers/net/ieee802154/adf7242.c index cd1d8faccca5..cd6b95e673a5 100644 --- a/drivers/net/ieee802154/adf7242.c +++ b/drivers/net/ieee802154/adf7242.c @@ -1268,6 +1268,10 @@ static int adf7242_probe(struct spi_device *spi) INIT_DELAYED_WORK(&lp->work, adf7242_rx_cal_work); lp->wqueue = alloc_ordered_workqueue(dev_name(&spi->dev), WQ_MEM_RECLAIM); + if (unlikely(!lp->wqueue)) { + ret = -ENOMEM; + goto err_hw_init; + } ret = adf7242_hw_init(lp); if (ret) From 19b39a25388e71390e059906c979f87be4ef0c71 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Tue, 19 Feb 2019 13:10:29 +0800 Subject: [PATCH 049/468] ieee802154: hwsim: propagate genlmsg_reply return code genlmsg_reply can fail, so propagate its return code Signed-off-by: Li RongQing Signed-off-by: Stefan Schmidt --- drivers/net/ieee802154/mac802154_hwsim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index b6743f03dce0..3b88846de31b 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -324,7 +324,7 @@ static int hwsim_get_radio_nl(struct sk_buff *msg, struct genl_info *info) goto out_err; } - genlmsg_reply(skb, info); + res = genlmsg_reply(skb, info); break; } From bf504110bc8aa05df48b0e5f0aa84bfb81e0574b Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 4 Mar 2019 14:06:12 +0000 Subject: [PATCH 050/468] Btrfs: fix incorrect file size after shrinking truncate and fsync If we do a shrinking truncate against an inode which is already present in the respective log tree and then rename it, as part of logging the new name we end up logging an inode item that reflects the old size of the file (the one which we previously logged) and not the new smaller size. The decision to preserve the size previously logged was added by commit 1a4bcf470c886b ("Btrfs: fix fsync data loss after adding hard link to inode") in order to avoid data loss after replaying the log. However that decision is only needed for the case the logged inode size is smaller then the current size of the inode, as explained in that commit's change log. If the current size of the inode is smaller then the previously logged size, we know a shrinking truncate happened and therefore need to use that smaller size. Example to trigger the problem: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 0 8000" /mnt/foo $ xfs_io -c "fsync" /mnt/foo $ xfs_io -c "truncate 3000" /mnt/foo $ mv /mnt/foo /mnt/bar $ xfs_io -c "fsync" /mnt/bar $ mount /dev/sdb /mnt $ od -t x1 -A d /mnt/bar 0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab * 0008000 Once we rename the file, we log its name (and inode item), and because the inode was already logged before in the current transaction, we log it with a size of 8000 bytes because that is the size we previously logged (with the first fsync). As part of the rename, besides logging the inode, we do also sync the log, which is done since commit d4682ba03ef618 ("Btrfs: sync log after logging new name"), so the next fsync against our inode is effectively a no-op, since no new changes happened since the rename operation. Even if did not sync the log during the rename operation, the same problem (fize size of 8000 bytes instead of 3000 bytes) would be visible after replaying the log if the log ended up getting synced to disk through some other means, such as for example by fsyncing some other modified file. In the example above the fsync after the rename operation is there just because not every filesystem may guarantee logging/journalling the inode (and syncing the log/journal) during the rename operation, for example it is needed for f2fs, but not for ext4 and xfs. Fix this scenario by, when logging a new name (which is triggered by rename and link operations), using the current size of the inode instead of the previously logged inode size. A test case for fstests follows soon. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202695 CC: stable@vger.kernel.org # 4.4+ Reported-by: Seulbae Kim Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/tree-log.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index f06454a55e00..5256cddf3a43 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4544,6 +4544,19 @@ static int logged_inode_size(struct btrfs_root *log, struct btrfs_inode *inode, item = btrfs_item_ptr(path->nodes[0], path->slots[0], struct btrfs_inode_item); *size_ret = btrfs_inode_size(path->nodes[0], item); + /* + * If the in-memory inode's i_size is smaller then the inode + * size stored in the btree, return the inode's i_size, so + * that we get a correct inode size after replaying the log + * when before a power failure we had a shrinking truncate + * followed by addition of a new name (rename / new hard link). + * Otherwise return the inode size from the btree, to avoid + * data loss when replaying a log due to previously doing a + * write that expands the inode's size and logging a new name + * immediately after. + */ + if (*size_ret > inode->vfs_inode.i_size) + *size_ret = inode->vfs_inode.i_size; } btrfs_release_path(path); From 2cc8334270e281815c3850c3adea363c51f21e0d Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Wed, 6 Mar 2019 17:13:04 -0500 Subject: [PATCH 051/468] btrfs: remove WARN_ON in log_dir_items When Filipe added the recursive directory logging stuff in 2f2ff0ee5e430 ("Btrfs: fix metadata inconsistencies after directory fsync") he specifically didn't take the directory i_mutex for the children directories that we need to log because of lockdep. This is generally fine, but can lead to this WARN_ON() tripping if we happen to run delayed deletion's in between our first search and our second search of dir_item/dir_indexes for this directory. We expect this to happen, so the WARN_ON() isn't necessary. Drop the WARN_ON() and add a comment so we know why this case can happen. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Signed-off-by: David Sterba --- fs/btrfs/tree-log.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 5256cddf3a43..960ea49a7a0f 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3578,9 +3578,16 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans, } btrfs_release_path(path); - /* find the first key from this transaction again */ + /* + * Find the first key from this transaction again. See the note for + * log_new_dir_dentries, if we're logging a directory recursively we + * won't be holding its i_mutex, which means we can modify the directory + * while we're logging it. If we remove an entry between our first + * search and this search we'll not find the key again and can just + * bail. + */ ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0); - if (WARN_ON(ret != 0)) + if (ret != 0) goto done; /* From 609e804d771f59dc5d45a93e5ee0053c74bbe2bf Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Wed, 27 Feb 2019 13:42:30 +0000 Subject: [PATCH 052/468] Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes When we are mixing buffered writes with direct IO writes against the same file and snapshotting is happening concurrently, we can end up with a corrupt file content in the snapshot. Example: 1) Inode/file is empty. 2) Snapshotting starts. 2) Buffered write at offset 0 length 256Kb. This updates the i_size of the inode to 256Kb, disk_i_size remains zero. This happens after the task doing the snapshot flushes all existing delalloc. 3) DIO write at offset 256Kb length 768Kb. Once the ordered extent completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and updates the inode item in the fs tree with a size of 1Mb (which is the value of disk_i_size). 4) The dealloc for the range [0, 256Kb[ did not start yet. 5) The transaction used in the DIO ordered extent completion, which updated the inode item, is committed by the snapshotting task. 6) Snapshot creation completes. 7) Dealloc for the range [0, 256Kb[ is flushed. After that when reading the file from the snapshot we always get zeroes for the range [0, 256Kb[, the file has a size of 1Mb and the data written by the direct IO write is found. From an application's point of view this is a corruption, since in the source subvolume it could never read a version of the file that included the data from the direct IO write without the data from the buffered write included as well. In the snapshot's tree, file extent items are missing for the range [0, 256Kb[. The issue, obviously, does not happen when using the -o flushoncommit mount option. Fix this by flushing delalloc for all the roots that are about to be snapshotted when committing a transaction. This guarantees total ordering when updating the disk_i_size of an inode since the flush for dealloc is done when a transaction is in the TRANS_STATE_COMMIT_START state and wait is done once no more external writers exist. This is similar to what we do when using the flushoncommit mount option, but we do it only if the transaction has snapshots to create and only for the roots of the subvolumes to be snapshotted. The bulk of the dealloc is flushed in the snapshot creation ioctl, so the flush work we do inside the transaction is minimized. This issue, involving buffered and direct IO writes with snapshotting, is often triggered by fstest btrfs/078, and got reported by fsck when not using the NO_HOLES features, for example: $ cat results/btrfs/078.full (...) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** [1/7] checking root items [2/7] checking extents [3/7] checking free space cache [4/7] checking fs roots root 258 inode 264 errors 100, file extent discount Found file extent holes: start: 524288, len: 65536 ERROR: errors found in fs roots Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/transaction.c | 49 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 43 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index acdad6d658f5..e4e665f422fc 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1886,8 +1886,10 @@ static void btrfs_cleanup_pending_block_groups(struct btrfs_trans_handle *trans) } } -static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info) +static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans) { + struct btrfs_fs_info *fs_info = trans->fs_info; + /* * We use writeback_inodes_sb here because if we used * btrfs_start_delalloc_roots we would deadlock with fs freeze. @@ -1897,15 +1899,50 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info) * from already being in a transaction and our join_transaction doesn't * have to re-take the fs freeze lock. */ - if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) + if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) { writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC); + } else { + struct btrfs_pending_snapshot *pending; + struct list_head *head = &trans->transaction->pending_snapshots; + + /* + * Flush dellaloc for any root that is going to be snapshotted. + * This is done to avoid a corrupted version of files, in the + * snapshots, that had both buffered and direct IO writes (even + * if they were done sequentially) due to an unordered update of + * the inode's size on disk. + */ + list_for_each_entry(pending, head, list) { + int ret; + + ret = btrfs_start_delalloc_snapshot(pending->root); + if (ret) + return ret; + } + } return 0; } -static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info) +static inline void btrfs_wait_delalloc_flush(struct btrfs_trans_handle *trans) { - if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) + struct btrfs_fs_info *fs_info = trans->fs_info; + + if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) { btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); + } else { + struct btrfs_pending_snapshot *pending; + struct list_head *head = &trans->transaction->pending_snapshots; + + /* + * Wait for any dellaloc that we started previously for the roots + * that are going to be snapshotted. This is to avoid a corrupted + * version of files in the snapshots that had both buffered and + * direct IO writes (even if they were done sequentially). + */ + list_for_each_entry(pending, head, list) + btrfs_wait_ordered_extents(pending->root, + U64_MAX, 0, U64_MAX); + } } int btrfs_commit_transaction(struct btrfs_trans_handle *trans) @@ -2023,7 +2060,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) extwriter_counter_dec(cur_trans, trans->type); - ret = btrfs_start_delalloc_flush(fs_info); + ret = btrfs_start_delalloc_flush(trans); if (ret) goto cleanup_transaction; @@ -2039,7 +2076,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) if (ret) goto cleanup_transaction; - btrfs_wait_delalloc_flush(fs_info); + btrfs_wait_delalloc_flush(trans); btrfs_scrub_pause(fs_info); /* From 0cc068e6ee59c1fffbfa977d8bf868b7551d80ac Mon Sep 17 00:00:00 2001 From: David Sterba Date: Thu, 7 Mar 2019 15:40:50 +0100 Subject: [PATCH 053/468] btrfs: don't report readahead errors and don't update statistics As readahead is an optimization, all errors are usually filtered out, but still properly handled when the real read call is done. The commit 5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead") added REQ_RAHEAD to readpages() because that's only used for readahead (despite what one would expect from the callback name). This causes a flood of messages and inflated read error stats, so skip reporting in case it's readahead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403 Reported-by: LimeTech Fixes: 5e9d398240b2 ("btrfs: readpages() should submit IO as read-ahead") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: David Sterba --- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9024eee889b9..db934ceae9c1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6407,7 +6407,7 @@ static void btrfs_end_bio(struct bio *bio) if (bio_op(bio) == REQ_OP_WRITE) btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); - else + else if (!(bio->bi_opf & REQ_RAHEAD)) btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS); if (bio->bi_opf & REQ_PREFLUSH) From 1b986589680a2a5b6fc1ac196ea69925a93d9dd9 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 12 Mar 2019 10:23:02 -0700 Subject: [PATCH 054/468] bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_release Lorenz Bauer [thanks!] reported that a ptr returned by bpf_tcp_sock(sk) can still be accessed after bpf_sk_release(sk). Both bpf_tcp_sock() and bpf_sk_fullsock() have the same issue. This patch addresses them together. A simple reproducer looks like this: sk = bpf_sk_lookup_tcp(); /* if (!sk) ... */ tp = bpf_tcp_sock(sk); /* if (!tp) ... */ bpf_sk_release(sk); snd_cwnd = tp->snd_cwnd; /* oops! The verifier does not complain. */ The problem is the verifier did not scrub the register's states of the tcp_sock ptr (tp) after bpf_sk_release(sk). [ Note that when calling bpf_tcp_sock(sk), the sk is not always refcount-acquired. e.g. bpf_tcp_sock(skb->sk). The verifier works fine for this case. ] Currently, the verifier does not track if a helper's return ptr (in REG_0) is "carry"-ing one of its argument's refcount status. To carry this info, the reg1->id needs to be stored in reg0. One approach was tried, like "reg0->id = reg1->id", when calling "bpf_tcp_sock()". The main idea was to avoid adding another "ref_obj_id" for the same reg. However, overlapping the NULL marking and ref tracking purpose in one "id" does not work well: ref_sk = bpf_sk_lookup_tcp(); fullsock = bpf_sk_fullsock(ref_sk); tp = bpf_tcp_sock(ref_sk); if (!fullsock) { bpf_sk_release(ref_sk); return 0; } /* fullsock_reg->id is marked for NOT-NULL. * Same for tp_reg->id because they have the same id. */ /* oops. verifier did not complain about the missing !tp check */ snd_cwnd = tp->snd_cwnd; Hence, a new "ref_obj_id" is needed in "struct bpf_reg_state". With a new ref_obj_id, when bpf_sk_release(sk) is called, the verifier can scrub all reg states which has a ref_obj_id match. It is done with the changes in release_reg_references() in this patch. While fixing it, sk_to_full_sk() is removed from bpf_tcp_sock() and bpf_sk_fullsock() to avoid these helpers from returning another ptr. It will make bpf_sk_release(tp) possible: sk = bpf_sk_lookup_tcp(); /* if (!sk) ... */ tp = bpf_tcp_sock(sk); /* if (!tp) ... */ bpf_sk_release(tp); A separate helper "bpf_get_listener_sock()" will be added in a later patch to do sk_to_full_sk(). Misc change notes: - To allow bpf_sk_release(tp), the arg of bpf_sk_release() is changed from ARG_PTR_TO_SOCKET to ARG_PTR_TO_SOCK_COMMON. ARG_PTR_TO_SOCKET is removed from bpf.h since no helper is using it. - arg_type_is_refcounted() is renamed to arg_type_may_be_refcounted() because ARG_PTR_TO_SOCK_COMMON is the only one and skb->sk is not refcounted. All bpf_sk_release(), bpf_sk_fullsock() and bpf_tcp_sock() take ARG_PTR_TO_SOCK_COMMON. - check_refcount_ok() ensures is_acquire_function() cannot take arg_type_may_be_refcounted() as its argument. - The check_func_arg() can only allow one refcount-ed arg. It is guaranteed by check_refcount_ok() which ensures at most one arg can be refcounted. Hence, it is a verifier internal error if >1 refcount arg found in check_func_arg(). - In release_reference(), release_reference_state() is called first to ensure a match on "reg->ref_obj_id" can be found before scrubbing the reg states with release_reg_references(). - reg_is_refcounted() is no longer needed. 1. In mark_ptr_or_null_regs(), its usage is replaced by "ref_obj_id && ref_obj_id == id" because, when is_null == true, release_reference_state() should only be called on the ref_obj_id obtained by a acquire helper (i.e. is_acquire_function() == true). Otherwise, the following would happen: sk = bpf_sk_lookup_tcp(); /* if (!sk) { ... } */ fullsock = bpf_sk_fullsock(sk); if (!fullsock) { /* * release_reference_state(fullsock_reg->ref_obj_id) * where fullsock_reg->ref_obj_id == sk_reg->ref_obj_id. * * Hence, the following bpf_sk_release(sk) will fail * because the ref state has already been released in the * earlier release_reference_state(fullsock_reg->ref_obj_id). */ bpf_sk_release(sk); } 2. In release_reg_references(), the current reg_is_refcounted() call is unnecessary because the id check is enough. - The type_is_refcounted() and type_is_refcounted_or_null() are no longer needed also because reg_is_refcounted() is removed. Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Lorenz Bauer Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 - include/linux/bpf_verifier.h | 40 +++++++++++ kernel/bpf/verifier.c | 133 ++++++++++++++++++++--------------- net/core/filter.c | 6 +- 4 files changed, 116 insertions(+), 64 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a2132e09dc1c..f02367faa58d 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -193,7 +193,6 @@ enum bpf_arg_type { ARG_PTR_TO_CTX, /* pointer to context */ ARG_ANYTHING, /* any (initialized) argument is ok */ - ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */ ARG_PTR_TO_SPIN_LOCK, /* pointer to bpf_spin_lock */ ARG_PTR_TO_SOCK_COMMON, /* pointer to sock_common */ }; diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 69f7a3449eda..7d8228d1c898 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -66,6 +66,46 @@ struct bpf_reg_state { * same reference to the socket, to determine proper reference freeing. */ u32 id; + /* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned + * from a pointer-cast helper, bpf_sk_fullsock() and + * bpf_tcp_sock(). + * + * Consider the following where "sk" is a reference counted + * pointer returned from "sk = bpf_sk_lookup_tcp();": + * + * 1: sk = bpf_sk_lookup_tcp(); + * 2: if (!sk) { return 0; } + * 3: fullsock = bpf_sk_fullsock(sk); + * 4: if (!fullsock) { bpf_sk_release(sk); return 0; } + * 5: tp = bpf_tcp_sock(fullsock); + * 6: if (!tp) { bpf_sk_release(sk); return 0; } + * 7: bpf_sk_release(sk); + * 8: snd_cwnd = tp->snd_cwnd; // verifier will complain + * + * After bpf_sk_release(sk) at line 7, both "fullsock" ptr and + * "tp" ptr should be invalidated also. In order to do that, + * the reg holding "fullsock" and "sk" need to remember + * the original refcounted ptr id (i.e. sk_reg->id) in ref_obj_id + * such that the verifier can reset all regs which have + * ref_obj_id matching the sk_reg->id. + * + * sk_reg->ref_obj_id is set to sk_reg->id at line 1. + * sk_reg->id will stay as NULL-marking purpose only. + * After NULL-marking is done, sk_reg->id can be reset to 0. + * + * After "fullsock = bpf_sk_fullsock(sk);" at line 3, + * fullsock_reg->ref_obj_id is set to sk_reg->ref_obj_id. + * + * After "tp = bpf_tcp_sock(fullsock);" at line 5, + * tp_reg->ref_obj_id is set to fullsock_reg->ref_obj_id + * which is the same as sk_reg->ref_obj_id. + * + * From the verifier perspective, if sk, fullsock and tp + * are not NULL, they are the same ptr with different + * reg->type. In particular, bpf_sk_release(tp) is also + * allowed and has the same effect as bpf_sk_release(sk). + */ + u32 ref_obj_id; /* For scalar types (SCALAR_VALUE), this represents our knowledge of * the actual value. * For pointer types, this represents the variable part of the offset diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ce166a002d16..86f9cd5d1c4e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -212,7 +212,7 @@ struct bpf_call_arg_meta { int access_size; s64 msize_smax_value; u64 msize_umax_value; - int ptr_id; + int ref_obj_id; int func_id; }; @@ -346,35 +346,15 @@ static bool reg_type_may_be_null(enum bpf_reg_type type) type == PTR_TO_TCP_SOCK_OR_NULL; } -static bool type_is_refcounted(enum bpf_reg_type type) -{ - return type == PTR_TO_SOCKET; -} - -static bool type_is_refcounted_or_null(enum bpf_reg_type type) -{ - return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL; -} - -static bool reg_is_refcounted(const struct bpf_reg_state *reg) -{ - return type_is_refcounted(reg->type); -} - static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { return reg->type == PTR_TO_MAP_VALUE && map_value_has_spin_lock(reg->map_ptr); } -static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg) +static bool arg_type_may_be_refcounted(enum bpf_arg_type type) { - return type_is_refcounted_or_null(reg->type); -} - -static bool arg_type_is_refcounted(enum bpf_arg_type type) -{ - return type == ARG_PTR_TO_SOCKET; + return type == ARG_PTR_TO_SOCK_COMMON; } /* Determine whether the function releases some resources allocated by another @@ -392,6 +372,12 @@ static bool is_acquire_function(enum bpf_func_id func_id) func_id == BPF_FUNC_sk_lookup_udp; } +static bool is_ptr_cast_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_tcp_sock || + func_id == BPF_FUNC_sk_fullsock; +} + /* string representation of 'enum bpf_reg_type' */ static const char * const reg_type_str[] = { [NOT_INIT] = "?", @@ -465,7 +451,8 @@ static void print_verifier_state(struct bpf_verifier_env *env, if (t == PTR_TO_STACK) verbose(env, ",call_%d", func(env, reg)->callsite); } else { - verbose(env, "(id=%d", reg->id); + verbose(env, "(id=%d ref_obj_id=%d", reg->id, + reg->ref_obj_id); if (t != SCALAR_VALUE) verbose(env, ",off=%d", reg->off); if (type_is_pkt_pointer(t)) @@ -2414,16 +2401,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno, /* Any sk pointer can be ARG_PTR_TO_SOCK_COMMON */ if (!type_is_sk_pointer(type)) goto err_type; - } else if (arg_type == ARG_PTR_TO_SOCKET) { - expected_type = PTR_TO_SOCKET; - if (type != expected_type) - goto err_type; - if (meta->ptr_id || !reg->id) { - verbose(env, "verifier internal error: mismatched references meta=%d, reg=%d\n", - meta->ptr_id, reg->id); - return -EFAULT; + if (reg->ref_obj_id) { + if (meta->ref_obj_id) { + verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n", + regno, reg->ref_obj_id, + meta->ref_obj_id); + return -EFAULT; + } + meta->ref_obj_id = reg->ref_obj_id; } - meta->ptr_id = reg->id; } else if (arg_type == ARG_PTR_TO_SPIN_LOCK) { if (meta->func_id == BPF_FUNC_spin_lock) { if (process_spin_lock(env, regno, true)) @@ -2740,32 +2726,38 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn) return true; } -static bool check_refcount_ok(const struct bpf_func_proto *fn) +static bool check_refcount_ok(const struct bpf_func_proto *fn, int func_id) { int count = 0; - if (arg_type_is_refcounted(fn->arg1_type)) + if (arg_type_may_be_refcounted(fn->arg1_type)) count++; - if (arg_type_is_refcounted(fn->arg2_type)) + if (arg_type_may_be_refcounted(fn->arg2_type)) count++; - if (arg_type_is_refcounted(fn->arg3_type)) + if (arg_type_may_be_refcounted(fn->arg3_type)) count++; - if (arg_type_is_refcounted(fn->arg4_type)) + if (arg_type_may_be_refcounted(fn->arg4_type)) count++; - if (arg_type_is_refcounted(fn->arg5_type)) + if (arg_type_may_be_refcounted(fn->arg5_type)) count++; + /* A reference acquiring function cannot acquire + * another refcounted ptr. + */ + if (is_acquire_function(func_id) && count) + return false; + /* We only support one arg being unreferenced at the moment, * which is sufficient for the helper functions we have right now. */ return count <= 1; } -static int check_func_proto(const struct bpf_func_proto *fn) +static int check_func_proto(const struct bpf_func_proto *fn, int func_id) { return check_raw_mode_ok(fn) && check_arg_pair_ok(fn) && - check_refcount_ok(fn) ? 0 : -EINVAL; + check_refcount_ok(fn, func_id) ? 0 : -EINVAL; } /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END] @@ -2799,19 +2791,20 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env) } static void release_reg_references(struct bpf_verifier_env *env, - struct bpf_func_state *state, int id) + struct bpf_func_state *state, + int ref_obj_id) { struct bpf_reg_state *regs = state->regs, *reg; int i; for (i = 0; i < MAX_BPF_REG; i++) - if (regs[i].id == id) + if (regs[i].ref_obj_id == ref_obj_id) mark_reg_unknown(env, regs, i); bpf_for_each_spilled_reg(i, state, reg) { if (!reg) continue; - if (reg_is_refcounted(reg) && reg->id == id) + if (reg->ref_obj_id == ref_obj_id) __mark_reg_unknown(reg); } } @@ -2820,15 +2813,20 @@ static void release_reg_references(struct bpf_verifier_env *env, * resources. Identify all copies of the same pointer and clear the reference. */ static int release_reference(struct bpf_verifier_env *env, - struct bpf_call_arg_meta *meta) + int ref_obj_id) { struct bpf_verifier_state *vstate = env->cur_state; + int err; int i; - for (i = 0; i <= vstate->curframe; i++) - release_reg_references(env, vstate->frame[i], meta->ptr_id); + err = release_reference_state(cur_func(env), ref_obj_id); + if (err) + return err; - return release_reference_state(cur_func(env), meta->ptr_id); + for (i = 0; i <= vstate->curframe; i++) + release_reg_references(env, vstate->frame[i], ref_obj_id); + + return 0; } static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, @@ -3047,7 +3045,7 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn memset(&meta, 0, sizeof(meta)); meta.pkt_access = fn->pkt_access; - err = check_func_proto(fn); + err = check_func_proto(fn, func_id); if (err) { verbose(env, "kernel subsystem misconfigured func %s#%d\n", func_id_name(func_id), func_id); @@ -3093,7 +3091,7 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn return err; } } else if (is_release_function(func_id)) { - err = release_reference(env, &meta); + err = release_reference(env, meta.ref_obj_id); if (err) { verbose(env, "func %s#%d reference has not been acquired before\n", func_id_name(func_id), func_id); @@ -3154,8 +3152,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn if (id < 0) return id; - /* For release_reference() */ + /* For mark_ptr_or_null_reg() */ regs[BPF_REG_0].id = id; + /* For release_reference() */ + regs[BPF_REG_0].ref_obj_id = id; } else { /* For mark_ptr_or_null_reg() */ regs[BPF_REG_0].id = ++env->id_gen; @@ -3170,6 +3170,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn return -EINVAL; } + if (is_ptr_cast_function(func_id)) + /* For release_reference() */ + regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id; + do_refine_retval_range(regs, fn->ret_type, func_id, &meta); err = check_map_func_compatibility(env, meta.map_ptr, func_id); @@ -4665,11 +4669,19 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state, } else if (reg->type == PTR_TO_TCP_SOCK_OR_NULL) { reg->type = PTR_TO_TCP_SOCK; } - if (is_null || !(reg_is_refcounted(reg) || - reg_may_point_to_spin_lock(reg))) { - /* We don't need id from this point onwards anymore, - * thus we should better reset it, so that state - * pruning has chances to take effect. + if (is_null) { + /* We don't need id and ref_obj_id from this point + * onwards anymore, thus we should better reset it, + * so that state pruning has chances to take effect. + */ + reg->id = 0; + reg->ref_obj_id = 0; + } else if (!reg_may_point_to_spin_lock(reg)) { + /* For not-NULL ptr, reg->ref_obj_id will be reset + * in release_reg_references(). + * + * reg->id is still used by spin_lock ptr. Other + * than spin_lock ptr type, reg->id can be reset. */ reg->id = 0; } @@ -4684,11 +4696,16 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, { struct bpf_func_state *state = vstate->frame[vstate->curframe]; struct bpf_reg_state *reg, *regs = state->regs; + u32 ref_obj_id = regs[regno].ref_obj_id; u32 id = regs[regno].id; int i, j; - if (reg_is_refcounted_or_null(®s[regno]) && is_null) - release_reference_state(state, id); + if (ref_obj_id && ref_obj_id == id && is_null) + /* regs[regno] is in the " == NULL" branch. + * No one could have freed the reference state before + * doing the NULL check. + */ + WARN_ON_ONCE(release_reference_state(state, id)); for (i = 0; i < MAX_BPF_REG; i++) mark_ptr_or_null_reg(state, ®s[i], id, is_null); diff --git a/net/core/filter.c b/net/core/filter.c index f274620945ff..36b6afacf83c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1796,8 +1796,6 @@ static const struct bpf_func_proto bpf_skb_pull_data_proto = { BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk) { - sk = sk_to_full_sk(sk); - return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL; } @@ -5266,7 +5264,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = { .func = bpf_sk_release, .gpl_only = false, .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_SOCKET, + .arg1_type = ARG_PTR_TO_SOCK_COMMON, }; BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx, @@ -5407,8 +5405,6 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type, BPF_CALL_1(bpf_tcp_sock, struct sock *, sk) { - sk = sk_to_full_sk(sk); - if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP) return (unsigned long)sk; From dbafd7ddd62369b2f3926ab847cbf8fc40e800b7 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 12 Mar 2019 10:23:04 -0700 Subject: [PATCH 055/468] bpf: Add bpf_get_listener_sock(struct bpf_sock *sk) helper Add a new helper "struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)" which returns a bpf_sock in TCP_LISTEN state. It will trace back to the listener sk from a request_sock if possible. It returns NULL for all other cases. No reference is taken because the helper ensures the sk is in SOCK_RCU_FREE (where the TCP_LISTEN sock should be in). Hence, bpf_sk_release() is unnecessary and the verifier does not allow bpf_sk_release(listen_sk) to be called either. The following is also allowed because the bpf_prog is run under rcu_read_lock(): sk = bpf_sk_lookup_tcp(); /* if (!sk) { ... } */ listen_sk = bpf_get_listener_sock(sk); /* if (!listen_sk) { ... } */ bpf_sk_release(sk); src_port = listen_sk->src_port; /* Allowed */ Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 11 ++++++++++- net/core/filter.c | 21 +++++++++++++++++++++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 3c38ac9a92a7..983b25cb608d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2366,6 +2366,14 @@ union bpf_attr { * current value is ect (ECN capable). Works with IPv6 and IPv4. * Return * 1 if set, 0 if not set. + * + * struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk) + * Description + * Return a **struct bpf_sock** pointer in TCP_LISTEN state. + * bpf_sk_release() is unnecessary and not allowed. + * Return + * A **struct bpf_sock** pointer on success, or NULL in + * case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2465,7 +2473,8 @@ union bpf_attr { FN(spin_unlock), \ FN(sk_fullsock), \ FN(tcp_sock), \ - FN(skb_ecn_set_ce), + FN(skb_ecn_set_ce), \ + FN(get_listener_sock), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/net/core/filter.c b/net/core/filter.c index 36b6afacf83c..647c63a7b25b 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5418,6 +5418,23 @@ static const struct bpf_func_proto bpf_tcp_sock_proto = { .arg1_type = ARG_PTR_TO_SOCK_COMMON, }; +BPF_CALL_1(bpf_get_listener_sock, struct sock *, sk) +{ + sk = sk_to_full_sk(sk); + + if (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_RCU_FREE)) + return (unsigned long)sk; + + return (unsigned long)NULL; +} + +static const struct bpf_func_proto bpf_get_listener_sock_proto = { + .func = bpf_get_listener_sock, + .gpl_only = false, + .ret_type = RET_PTR_TO_SOCKET_OR_NULL, + .arg1_type = ARG_PTR_TO_SOCK_COMMON, +}; + BPF_CALL_1(bpf_skb_ecn_set_ce, struct sk_buff *, skb) { unsigned int iphdr_len; @@ -5603,6 +5620,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) #ifdef CONFIG_INET case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; + case BPF_FUNC_get_listener_sock: + return &bpf_get_listener_sock_proto; case BPF_FUNC_skb_ecn_set_ce: return &bpf_skb_ecn_set_ce_proto; #endif @@ -5698,6 +5717,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_sk_release_proto; case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; + case BPF_FUNC_get_listener_sock: + return &bpf_get_listener_sock_proto; #endif default: return bpf_base_func_proto(func_id); From ef776a272b093fb52612e6d8f4b3e77f9bb49554 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 12 Mar 2019 10:23:06 -0700 Subject: [PATCH 056/468] bpf: Sync bpf.h to tools/ This patch sync the uapi bpf.h to tools/. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- tools/include/uapi/linux/bpf.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 3c38ac9a92a7..983b25cb608d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2366,6 +2366,14 @@ union bpf_attr { * current value is ect (ECN capable). Works with IPv6 and IPv4. * Return * 1 if set, 0 if not set. + * + * struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk) + * Description + * Return a **struct bpf_sock** pointer in TCP_LISTEN state. + * bpf_sk_release() is unnecessary and not allowed. + * Return + * A **struct bpf_sock** pointer on success, or NULL in + * case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2465,7 +2473,8 @@ union bpf_attr { FN(spin_unlock), \ FN(sk_fullsock), \ FN(tcp_sock), \ - FN(skb_ecn_set_ce), + FN(skb_ecn_set_ce), \ + FN(get_listener_sock), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call From b55aa7b04bb42274b4a894020b5b2fa059c3527e Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 12 Mar 2019 10:23:09 -0700 Subject: [PATCH 057/468] bpf: Test ref release issue in bpf_tcp_sock and bpf_sk_fullsock Adding verifier tests to ensure the ptr returned from bpf_tcp_sock() and bpf_sk_fullsock() cannot be accessed after bpf_sk_release() is called. A few of the tests are derived from a reproducer test by Lorenz Bauer. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/verifier/ref_tracking.c | 168 ++++++++++++++++++ tools/testing/selftests/bpf/verifier/sock.c | 4 +- 2 files changed, 170 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c index 3ed3593bd8b6..923f2110072d 100644 --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c @@ -605,3 +605,171 @@ .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, +{ + "reference tracking: use ptr from bpf_tcp_sock() after release", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_tcp_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_tcp_sock, snd_cwnd)), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "invalid mem access", +}, +{ + "reference tracking: use ptr from bpf_sk_fullsock() after release", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_fullsock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "invalid mem access", +}, +{ + "reference tracking: use ptr from bpf_sk_fullsock(tp) after release", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_tcp_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_fullsock), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_JMP_IMM(BPF_JNE, BPF_REG_6, 0, 1), + BPF_EXIT_INSN(), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, offsetof(struct bpf_sock, type)), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "invalid mem access", +}, +{ + "reference tracking: use sk after bpf_sk_release(tp)", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_tcp_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, offsetof(struct bpf_sock, type)), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "invalid mem access", +}, +{ + "reference tracking: use ptr from bpf_get_listener_sock() after bpf_sk_release(sk)", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_get_listener_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, offsetof(struct bpf_sock, src_port)), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, +}, +{ + "reference tracking: bpf_sk_release(listen_sk)", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_get_listener_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, offsetof(struct bpf_sock, type)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "reference has not been acquired before", +}, +{ + /* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */ + "reference tracking: tp->snd_cwnd after bpf_sk_fullsock(sk) and bpf_tcp_sock(sk)", + .insns = { + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_fullsock), + BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_tcp_sock), + BPF_MOV64_REG(BPF_REG_8, BPF_REG_0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_8, offsetof(struct bpf_tcp_sock, snd_cwnd)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "invalid mem access", +}, diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c index 0ddfdf76aba5..416436231fab 100644 --- a/tools/testing/selftests/bpf/verifier/sock.c +++ b/tools/testing/selftests/bpf/verifier/sock.c @@ -342,7 +342,7 @@ }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = REJECT, - .errstr = "type=sock_common expected=sock", + .errstr = "reference has not been acquired before", }, { "bpf_sk_release(bpf_sk_fullsock(skb->sk))", @@ -380,5 +380,5 @@ }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = REJECT, - .errstr = "type=tcp_sock expected=sock", + .errstr = "reference has not been acquired before", }, From 7681e7b2fbe2a78806423810c0d84dd230b96f94 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 12 Mar 2019 10:23:11 -0700 Subject: [PATCH 058/468] bpf: Add an example for bpf_get_listener_sock This patch adds an example in using the new helper bpf_get_listener_sock(). Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/bpf_helpers.h | 2 + .../bpf/progs/test_sock_fields_kern.c | 88 +++++++++--- .../testing/selftests/bpf/test_sock_fields.c | 136 ++++++++++++++---- 3 files changed, 181 insertions(+), 45 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index c9433a496d54..c81fc350f7ad 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -180,6 +180,8 @@ static struct bpf_sock *(*bpf_sk_fullsock)(struct bpf_sock *sk) = (void *) BPF_FUNC_sk_fullsock; static struct bpf_tcp_sock *(*bpf_tcp_sock)(struct bpf_sock *sk) = (void *) BPF_FUNC_tcp_sock; +static struct bpf_sock *(*bpf_get_listener_sock)(struct bpf_sock *sk) = + (void *) BPF_FUNC_get_listener_sock; static int (*bpf_skb_ecn_set_ce)(void *ctx) = (void *) BPF_FUNC_skb_ecn_set_ce; diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c b/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c index de1a43e8f610..37328f148538 100644 --- a/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c +++ b/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c @@ -8,38 +8,51 @@ #include "bpf_helpers.h" #include "bpf_endian.h" -enum bpf_array_idx { - SRV_IDX, - CLI_IDX, - __NR_BPF_ARRAY_IDX, +enum bpf_addr_array_idx { + ADDR_SRV_IDX, + ADDR_CLI_IDX, + __NR_BPF_ADDR_ARRAY_IDX, +}; + +enum bpf_result_array_idx { + EGRESS_SRV_IDX, + EGRESS_CLI_IDX, + INGRESS_LISTEN_IDX, + __NR_BPF_RESULT_ARRAY_IDX, +}; + +enum bpf_linum_array_idx { + EGRESS_LINUM_IDX, + INGRESS_LINUM_IDX, + __NR_BPF_LINUM_ARRAY_IDX, }; struct bpf_map_def SEC("maps") addr_map = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(__u32), .value_size = sizeof(struct sockaddr_in6), - .max_entries = __NR_BPF_ARRAY_IDX, + .max_entries = __NR_BPF_ADDR_ARRAY_IDX, }; struct bpf_map_def SEC("maps") sock_result_map = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(__u32), .value_size = sizeof(struct bpf_sock), - .max_entries = __NR_BPF_ARRAY_IDX, + .max_entries = __NR_BPF_RESULT_ARRAY_IDX, }; struct bpf_map_def SEC("maps") tcp_sock_result_map = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(__u32), .value_size = sizeof(struct bpf_tcp_sock), - .max_entries = __NR_BPF_ARRAY_IDX, + .max_entries = __NR_BPF_RESULT_ARRAY_IDX, }; struct bpf_map_def SEC("maps") linum_map = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(__u32), .value_size = sizeof(__u32), - .max_entries = 1, + .max_entries = __NR_BPF_LINUM_ARRAY_IDX, }; static bool is_loopback6(__u32 *a6) @@ -100,18 +113,20 @@ static void tpcpy(struct bpf_tcp_sock *dst, #define RETURN { \ linum = __LINE__; \ - bpf_map_update_elem(&linum_map, &idx0, &linum, 0); \ + bpf_map_update_elem(&linum_map, &linum_idx, &linum, 0); \ return 1; \ } SEC("cgroup_skb/egress") -int read_sock_fields(struct __sk_buff *skb) +int egress_read_sock_fields(struct __sk_buff *skb) { - __u32 srv_idx = SRV_IDX, cli_idx = CLI_IDX, idx; + __u32 srv_idx = ADDR_SRV_IDX, cli_idx = ADDR_CLI_IDX, result_idx; struct sockaddr_in6 *srv_sa6, *cli_sa6; struct bpf_tcp_sock *tp, *tp_ret; struct bpf_sock *sk, *sk_ret; - __u32 linum, idx0 = 0; + __u32 linum, linum_idx; + + linum_idx = EGRESS_LINUM_IDX; sk = skb->sk; if (!sk || sk->state == 10) @@ -132,14 +147,55 @@ int read_sock_fields(struct __sk_buff *skb) RETURN; if (sk->src_port == bpf_ntohs(srv_sa6->sin6_port)) - idx = srv_idx; + result_idx = EGRESS_SRV_IDX; else if (sk->src_port == bpf_ntohs(cli_sa6->sin6_port)) - idx = cli_idx; + result_idx = EGRESS_CLI_IDX; else RETURN; - sk_ret = bpf_map_lookup_elem(&sock_result_map, &idx); - tp_ret = bpf_map_lookup_elem(&tcp_sock_result_map, &idx); + sk_ret = bpf_map_lookup_elem(&sock_result_map, &result_idx); + tp_ret = bpf_map_lookup_elem(&tcp_sock_result_map, &result_idx); + if (!sk_ret || !tp_ret) + RETURN; + + skcpy(sk_ret, sk); + tpcpy(tp_ret, tp); + + RETURN; +} + +SEC("cgroup_skb/ingress") +int ingress_read_sock_fields(struct __sk_buff *skb) +{ + __u32 srv_idx = ADDR_SRV_IDX, result_idx = INGRESS_LISTEN_IDX; + struct bpf_tcp_sock *tp, *tp_ret; + struct bpf_sock *sk, *sk_ret; + struct sockaddr_in6 *srv_sa6; + __u32 linum, linum_idx; + + linum_idx = INGRESS_LINUM_IDX; + + sk = skb->sk; + if (!sk || sk->family != AF_INET6 || !is_loopback6(sk->src_ip6)) + RETURN; + + srv_sa6 = bpf_map_lookup_elem(&addr_map, &srv_idx); + if (!srv_sa6 || sk->src_port != bpf_ntohs(srv_sa6->sin6_port)) + RETURN; + + if (sk->state != 10 && sk->state != 12) + RETURN; + + sk = bpf_get_listener_sock(sk); + if (!sk) + RETURN; + + tp = bpf_tcp_sock(sk); + if (!tp) + RETURN; + + sk_ret = bpf_map_lookup_elem(&sock_result_map, &result_idx); + tp_ret = bpf_map_lookup_elem(&tcp_sock_result_map, &result_idx); if (!sk_ret || !tp_ret) RETURN; diff --git a/tools/testing/selftests/bpf/test_sock_fields.c b/tools/testing/selftests/bpf/test_sock_fields.c index bc8943938bf5..dcae7f664dce 100644 --- a/tools/testing/selftests/bpf/test_sock_fields.c +++ b/tools/testing/selftests/bpf/test_sock_fields.c @@ -16,10 +16,23 @@ #include "cgroup_helpers.h" #include "bpf_rlimit.h" -enum bpf_array_idx { - SRV_IDX, - CLI_IDX, - __NR_BPF_ARRAY_IDX, +enum bpf_addr_array_idx { + ADDR_SRV_IDX, + ADDR_CLI_IDX, + __NR_BPF_ADDR_ARRAY_IDX, +}; + +enum bpf_result_array_idx { + EGRESS_SRV_IDX, + EGRESS_CLI_IDX, + INGRESS_LISTEN_IDX, + __NR_BPF_RESULT_ARRAY_IDX, +}; + +enum bpf_linum_array_idx { + EGRESS_LINUM_IDX, + INGRESS_LINUM_IDX, + __NR_BPF_LINUM_ARRAY_IDX, }; #define CHECK(condition, tag, format...) ({ \ @@ -41,8 +54,16 @@ static int linum_map_fd; static int addr_map_fd; static int tp_map_fd; static int sk_map_fd; -static __u32 srv_idx = SRV_IDX; -static __u32 cli_idx = CLI_IDX; + +static __u32 addr_srv_idx = ADDR_SRV_IDX; +static __u32 addr_cli_idx = ADDR_CLI_IDX; + +static __u32 egress_srv_idx = EGRESS_SRV_IDX; +static __u32 egress_cli_idx = EGRESS_CLI_IDX; +static __u32 ingress_listen_idx = INGRESS_LISTEN_IDX; + +static __u32 egress_linum_idx = EGRESS_LINUM_IDX; +static __u32 ingress_linum_idx = INGRESS_LINUM_IDX; static void init_loopback6(struct sockaddr_in6 *sa6) { @@ -93,29 +114,46 @@ static void print_tp(const struct bpf_tcp_sock *tp) static void check_result(void) { - struct bpf_tcp_sock srv_tp, cli_tp; - struct bpf_sock srv_sk, cli_sk; - __u32 linum, idx0 = 0; + struct bpf_tcp_sock srv_tp, cli_tp, listen_tp; + struct bpf_sock srv_sk, cli_sk, listen_sk; + __u32 ingress_linum, egress_linum; int err; - err = bpf_map_lookup_elem(linum_map_fd, &idx0, &linum); + err = bpf_map_lookup_elem(linum_map_fd, &egress_linum_idx, + &egress_linum); CHECK(err == -1, "bpf_map_lookup_elem(linum_map_fd)", "err:%d errno:%d", err, errno); - err = bpf_map_lookup_elem(sk_map_fd, &srv_idx, &srv_sk); - CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &srv_idx)", - "err:%d errno:%d", err, errno); - err = bpf_map_lookup_elem(tp_map_fd, &srv_idx, &srv_tp); - CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &srv_idx)", + err = bpf_map_lookup_elem(linum_map_fd, &ingress_linum_idx, + &ingress_linum); + CHECK(err == -1, "bpf_map_lookup_elem(linum_map_fd)", "err:%d errno:%d", err, errno); - err = bpf_map_lookup_elem(sk_map_fd, &cli_idx, &cli_sk); - CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &cli_idx)", + err = bpf_map_lookup_elem(sk_map_fd, &egress_srv_idx, &srv_sk); + CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &egress_srv_idx)", "err:%d errno:%d", err, errno); - err = bpf_map_lookup_elem(tp_map_fd, &cli_idx, &cli_tp); - CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &cli_idx)", + err = bpf_map_lookup_elem(tp_map_fd, &egress_srv_idx, &srv_tp); + CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &egress_srv_idx)", "err:%d errno:%d", err, errno); + err = bpf_map_lookup_elem(sk_map_fd, &egress_cli_idx, &cli_sk); + CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &egress_cli_idx)", + "err:%d errno:%d", err, errno); + err = bpf_map_lookup_elem(tp_map_fd, &egress_cli_idx, &cli_tp); + CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &egress_cli_idx)", + "err:%d errno:%d", err, errno); + + err = bpf_map_lookup_elem(sk_map_fd, &ingress_listen_idx, &listen_sk); + CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &ingress_listen_idx)", + "err:%d errno:%d", err, errno); + err = bpf_map_lookup_elem(tp_map_fd, &ingress_listen_idx, &listen_tp); + CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &ingress_listen_idx)", + "err:%d errno:%d", err, errno); + + printf("listen_sk: "); + print_sk(&listen_sk); + printf("\n"); + printf("srv_sk: "); print_sk(&srv_sk); printf("\n"); @@ -124,6 +162,10 @@ static void check_result(void) print_sk(&cli_sk); printf("\n"); + printf("listen_tp: "); + print_tp(&listen_tp); + printf("\n"); + printf("srv_tp: "); print_tp(&srv_tp); printf("\n"); @@ -132,6 +174,19 @@ static void check_result(void) print_tp(&cli_tp); printf("\n"); + CHECK(listen_sk.state != 10 || + listen_sk.family != AF_INET6 || + listen_sk.protocol != IPPROTO_TCP || + memcmp(listen_sk.src_ip6, &in6addr_loopback, + sizeof(listen_sk.src_ip6)) || + listen_sk.dst_ip6[0] || listen_sk.dst_ip6[1] || + listen_sk.dst_ip6[2] || listen_sk.dst_ip6[3] || + listen_sk.src_port != ntohs(srv_sa6.sin6_port) || + listen_sk.dst_port, + "Unexpected listen_sk", + "Check listen_sk output. ingress_linum:%u", + ingress_linum); + CHECK(srv_sk.state == 10 || !srv_sk.state || srv_sk.family != AF_INET6 || @@ -142,7 +197,8 @@ static void check_result(void) sizeof(srv_sk.dst_ip6)) || srv_sk.src_port != ntohs(srv_sa6.sin6_port) || srv_sk.dst_port != cli_sa6.sin6_port, - "Unexpected srv_sk", "Check srv_sk output. linum:%u", linum); + "Unexpected srv_sk", "Check srv_sk output. egress_linum:%u", + egress_linum); CHECK(cli_sk.state == 10 || !cli_sk.state || @@ -154,21 +210,31 @@ static void check_result(void) sizeof(cli_sk.dst_ip6)) || cli_sk.src_port != ntohs(cli_sa6.sin6_port) || cli_sk.dst_port != srv_sa6.sin6_port, - "Unexpected cli_sk", "Check cli_sk output. linum:%u", linum); + "Unexpected cli_sk", "Check cli_sk output. egress_linum:%u", + egress_linum); + + CHECK(listen_tp.data_segs_out || + listen_tp.data_segs_in || + listen_tp.total_retrans || + listen_tp.bytes_acked, + "Unexpected listen_tp", "Check listen_tp output. ingress_linum:%u", + ingress_linum); CHECK(srv_tp.data_segs_out != 1 || srv_tp.data_segs_in || srv_tp.snd_cwnd != 10 || srv_tp.total_retrans || srv_tp.bytes_acked != DATA_LEN, - "Unexpected srv_tp", "Check srv_tp output. linum:%u", linum); + "Unexpected srv_tp", "Check srv_tp output. egress_linum:%u", + egress_linum); CHECK(cli_tp.data_segs_out || cli_tp.data_segs_in != 1 || cli_tp.snd_cwnd != 10 || cli_tp.total_retrans || cli_tp.bytes_received != DATA_LEN, - "Unexpected cli_tp", "Check cli_tp output. linum:%u", linum); + "Unexpected cli_tp", "Check cli_tp output. egress_linum:%u", + egress_linum); } static void test(void) @@ -211,10 +277,10 @@ static void test(void) err, errno); /* Update addr_map with srv_sa6 and cli_sa6 */ - err = bpf_map_update_elem(addr_map_fd, &srv_idx, &srv_sa6, 0); + err = bpf_map_update_elem(addr_map_fd, &addr_srv_idx, &srv_sa6, 0); CHECK(err, "map_update", "err:%d errno:%d", err, errno); - err = bpf_map_update_elem(addr_map_fd, &cli_idx, &cli_sa6, 0); + err = bpf_map_update_elem(addr_map_fd, &addr_cli_idx, &cli_sa6, 0); CHECK(err, "map_update", "err:%d errno:%d", err, errno); /* Connect from cli_sa6 to srv_sa6 */ @@ -273,9 +339,9 @@ int main(int argc, char **argv) struct bpf_prog_load_attr attr = { .file = "test_sock_fields_kern.o", .prog_type = BPF_PROG_TYPE_CGROUP_SKB, - .expected_attach_type = BPF_CGROUP_INET_EGRESS, }; - int cgroup_fd, prog_fd, err; + int cgroup_fd, egress_fd, ingress_fd, err; + struct bpf_program *ingress_prog; struct bpf_object *obj; struct bpf_map *map; @@ -293,12 +359,24 @@ int main(int argc, char **argv) err = join_cgroup(TEST_CGROUP); CHECK(err, "join_cgroup", "err:%d errno:%d", err, errno); - err = bpf_prog_load_xattr(&attr, &obj, &prog_fd); + err = bpf_prog_load_xattr(&attr, &obj, &egress_fd); CHECK(err, "bpf_prog_load_xattr()", "err:%d", err); - err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0); + ingress_prog = bpf_object__find_program_by_title(obj, + "cgroup_skb/ingress"); + CHECK(!ingress_prog, + "bpf_object__find_program_by_title(cgroup_skb/ingress)", + "not found"); + ingress_fd = bpf_program__fd(ingress_prog); + + err = bpf_prog_attach(egress_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0); CHECK(err == -1, "bpf_prog_attach(CPF_CGROUP_INET_EGRESS)", "err:%d errno%d", err, errno); + + err = bpf_prog_attach(ingress_fd, cgroup_fd, + BPF_CGROUP_INET_INGRESS, 0); + CHECK(err == -1, "bpf_prog_attach(CPF_CGROUP_INET_INGRESS)", + "err:%d errno%d", err, errno); close(cgroup_fd); map = bpf_object__find_map_by_name(obj, "addr_map"); From 9768095ba97ce946838e8210f0b44f2fd36ec31d Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Sun, 10 Mar 2019 17:44:09 -0700 Subject: [PATCH 059/468] btf: resolve enum fwds in btf_dedup GCC and clang support enum forward declarations as an extension. Such forward-declared enums will be represented as normal BTF_KIND_ENUM types with vlen=0. This patch adds ability to resolve such enums to their corresponding fully defined enums. This helps to avoid duplicated BTF type graphs which only differ by some types referencing forward-declared enum vs full enum. One such example in kernel is enum irqchip_irq_state, defined in include/linux/interrupt.h and forward-declared in include/linux/irq.h. This causes entire struct task_struct and all referenced types to be duplicated in btf_dedup output. This patch eliminates such duplication cases. Signed-off-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/btf.c | 51 +++++++++++++++++++++++++++++++++------------ 1 file changed, 38 insertions(+), 13 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 1b8d8cdd3575..87e3020ac1bc 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1602,16 +1602,12 @@ static bool btf_equal_int(struct btf_type *t1, struct btf_type *t2) /* Calculate type signature hash of ENUM. */ static __u32 btf_hash_enum(struct btf_type *t) { - struct btf_enum *member = (struct btf_enum *)(t + 1); - __u32 vlen = BTF_INFO_VLEN(t->info); - __u32 h = btf_hash_common(t); - int i; + __u32 h; - for (i = 0; i < vlen; i++) { - h = hash_combine(h, member->name_off); - h = hash_combine(h, member->val); - member++; - } + /* don't hash vlen and enum members to support enum fwd resolving */ + h = hash_combine(0, t->name_off); + h = hash_combine(h, t->info & ~0xffff); + h = hash_combine(h, t->size); return h; } @@ -1637,6 +1633,22 @@ static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2) return true; } +static inline bool btf_is_enum_fwd(struct btf_type *t) +{ + return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM && + BTF_INFO_VLEN(t->info) == 0; +} + +static bool btf_compat_enum(struct btf_type *t1, struct btf_type *t2) +{ + if (!btf_is_enum_fwd(t1) && !btf_is_enum_fwd(t2)) + return btf_equal_enum(t1, t2); + /* ignore vlen when comparing */ + return t1->name_off == t2->name_off && + (t1->info & ~0xffff) == (t2->info & ~0xffff) && + t1->size == t2->size; +} + /* * Calculate type signature hash of STRUCT/UNION, ignoring referenced type IDs, * as referenced type IDs equivalence is established separately during type @@ -1860,6 +1872,17 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id) new_id = cand_node->type_id; break; } + if (d->opts.dont_resolve_fwds) + continue; + if (btf_compat_enum(t, cand)) { + if (btf_is_enum_fwd(t)) { + /* resolve fwd to full enum */ + new_id = cand_node->type_id; + break; + } + /* resolve canonical enum fwd to full enum */ + d->map[cand_node->type_id] = type_id; + } } break; @@ -2084,15 +2107,15 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id, return fwd_kind == real_kind; } - if (cand_type->info != canon_type->info) - return 0; - switch (cand_kind) { case BTF_KIND_INT: return btf_equal_int(cand_type, canon_type); case BTF_KIND_ENUM: - return btf_equal_enum(cand_type, canon_type); + if (d->opts.dont_resolve_fwds) + return btf_equal_enum(cand_type, canon_type); + else + return btf_compat_enum(cand_type, canon_type); case BTF_KIND_FWD: return btf_equal_common(cand_type, canon_type); @@ -2103,6 +2126,8 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id, case BTF_KIND_PTR: case BTF_KIND_TYPEDEF: case BTF_KIND_FUNC: + if (cand_type->info != canon_type->info) + return 0; return btf_dedup_is_equiv(d, cand_type->type, canon_type->type); case BTF_KIND_ARRAY: { From 8fd7a61aa556ad518e897f58264a2e67f9c527f5 Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Sun, 10 Mar 2019 17:44:10 -0700 Subject: [PATCH 060/468] selftests/bpf: add fwd enum resolution test for btf_dedup This patch adds test verifying new btf_dedup logic of resolving forward-declared enums. Signed-off-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_btf.c | 44 ++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c index 38797aa627a7..23e3b314ca60 100644 --- a/tools/testing/selftests/bpf/test_btf.c +++ b/tools/testing/selftests/bpf/test_btf.c @@ -5874,6 +5874,50 @@ const struct btf_dedup_test dedup_tests[] = { .dont_resolve_fwds = false, }, }, +{ + .descr = "dedup: enum fwd resolution", + .input = { + .raw_types = { + /* [1] fwd enum 'e1' before full enum */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 0), 4), + /* [2] full enum 'e1' after fwd */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(2), 123), + /* [3] full enum 'e2' before fwd */ + BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(4), 456), + /* [4] fwd enum 'e2' after full enum */ + BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 0), 4), + /* [5] incompatible fwd enum with different size */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 0), 1), + /* [6] incompatible full enum with different value */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(2), 321), + BTF_END_RAW, + }, + BTF_STR_SEC("\0e1\0e1_val\0e2\0e2_val"), + }, + .expect = { + .raw_types = { + /* [1] full enum 'e1' */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(2), 123), + /* [2] full enum 'e2' */ + BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(4), 456), + /* [3] incompatible fwd enum with different size */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 0), 1), + /* [4] incompatible full enum with different value */ + BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_ENUM, 0, 1), 4), + BTF_ENUM_ENC(NAME_NTH(2), 321), + BTF_END_RAW, + }, + BTF_STR_SEC("\0e1\0e1_val\0e2\0e2_val"), + }, + .opts = { + .dont_resolve_fwds = false, + }, +}, }; From 62369db2df8d1edfa040878203b446e023a16802 Mon Sep 17 00:00:00 2001 From: Quentin Monnet Date: Thu, 14 Mar 2019 12:38:39 +0000 Subject: [PATCH 061/468] bpf: fix documentation for eBPF helpers Another round of minor fixes for the documentation of the BPF helpers located in the UAPI bpf.h header file. Changes include: - Moving around description of some helpers, to keep the descriptions in the same order as helpers are declared (bpf_map_push_elem(), leftover from commit 90b1023f68c7 ("bpf: fix documentation for eBPF helpers"), bpf_rc_keydown(), and bpf_skb_ancestor_cgroup_id()). - Fixing typos ("contex" -> "context"). - Harmonising return types ("void* " -> "void *", "uint64_t" -> "u64"). - Addition of the "bpf_" prefix to bpf_get_storage(). - Light additions of RST markup on some keywords. - Empty line deletion between description and return value for bpf_tcp_sock(). - Edit for the description for bpf_skb_ecn_set_ce() (capital letters, acronym expansion, no effect if ECT not set, more details on return value). Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 128 ++++++++++++++++++++------------------- 1 file changed, 65 insertions(+), 63 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 983b25cb608d..4465d00d3493 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -502,16 +502,6 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * - * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) - * Description - * Push an element *value* in *map*. *flags* is one of: - * - * **BPF_EXIST** - * If the queue/stack is full, the oldest element is removed to - * make room for this. - * Return - * 0 on success, or a negative error in case of failure. - * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -1435,14 +1425,14 @@ union bpf_attr { * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts - * *skb*, but gets socket from **struct bpf_sock_addr** contex. + * *skb*, but gets socket from **struct bpf_sock_addr** context. * Return * A 8-byte long non-decreasing number. * * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts - * *skb*, but gets socket from **struct bpf_sock_ops** contex. + * *skb*, but gets socket from **struct bpf_sock_ops** context. * Return * A 8-byte long non-decreasing number. * @@ -2098,6 +2088,25 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_rc_repeat(void *ctx) + * Description + * This helper is used in programs implementing IR decoding, to + * report a successfully decoded repeat key message. This delays + * the generation of a key up event for previously generated + * key down event. + * + * Some IR protocols like NEC have a special IR message for + * repeating last button, for when a button is held down. + * + * The *ctx* should point to the lirc sample as passed into + * the program. + * + * This helper is only available is the kernel was compiled with + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to + * "**y**". + * Return + * 0 + * * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle) * Description * This helper is used in programs implementing IR decoding, to @@ -2124,26 +2133,7 @@ union bpf_attr { * Return * 0 * - * int bpf_rc_repeat(void *ctx) - * Description - * This helper is used in programs implementing IR decoding, to - * report a successfully decoded repeat key message. This delays - * the generation of a key up event for previously generated - * key down event. - * - * Some IR protocols like NEC have a special IR message for - * repeating last button, for when a button is held down. - * - * The *ctx* should point to the lirc sample as passed into - * the program. - * - * This helper is only available is the kernel was compiled with - * the **CONFIG_BPF_LIRC_MODE2** configuration option set to - * "**y**". - * Return - * 0 - * - * uint64_t bpf_skb_cgroup_id(struct sk_buff *skb) + * u64 bpf_skb_cgroup_id(struct sk_buff *skb) * Description * Return the cgroup v2 id of the socket associated with the *skb*. * This is roughly similar to the **bpf_get_cgroup_classid**\ () @@ -2159,30 +2149,12 @@ union bpf_attr { * Return * The id is returned or 0 in case the id could not be retrieved. * - * u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level) - * Description - * Return id of cgroup v2 that is ancestor of cgroup associated - * with the *skb* at the *ancestor_level*. The root cgroup is at - * *ancestor_level* zero and each step down the hierarchy - * increments the level. If *ancestor_level* == level of cgroup - * associated with *skb*, then return value will be same as that - * of **bpf_skb_cgroup_id**\ (). - * - * The helper is useful to implement policies based on cgroups - * that are upper in hierarchy than immediate cgroup associated - * with *skb*. - * - * The format of returned id and helper limitations are same as in - * **bpf_skb_cgroup_id**\ (). - * Return - * The id is returned or 0 in case the id could not be retrieved. - * * u64 bpf_get_current_cgroup_id(void) * Return * A 64-bit integer containing the current cgroup id based * on the cgroup within which the current task is running. * - * void* get_local_storage(void *map, u64 flags) + * void *bpf_get_local_storage(void *map, u64 flags) * Description * Get the pointer to the local storage area. * The type and the size of the local storage is defined @@ -2209,6 +2181,24 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level) + * Description + * Return id of cgroup v2 that is ancestor of cgroup associated + * with the *skb* at the *ancestor_level*. The root cgroup is at + * *ancestor_level* zero and each step down the hierarchy + * increments the level. If *ancestor_level* == level of cgroup + * associated with *skb*, then return value will be same as that + * of **bpf_skb_cgroup_id**\ (). + * + * The helper is useful to implement policies based on cgroups + * that are upper in hierarchy than immediate cgroup associated + * with *skb*. + * + * The format of returned id and helper limitations are same as in + * **bpf_skb_cgroup_id**\ (). + * Return + * The id is returned or 0 in case the id could not be retrieved. + * * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags) * Description * Look for TCP socket matching *tuple*, optionally in a child @@ -2289,6 +2279,16 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is + * removed to make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * * int bpf_map_pop_elem(struct bpf_map *map, void *value) * Description * Pop an element from *map*. @@ -2346,33 +2346,35 @@ union bpf_attr { * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk) * Description * This helper gets a **struct bpf_sock** pointer such - * that all the fields in bpf_sock can be accessed. + * that all the fields in this **bpf_sock** can be accessed. * Return - * A **struct bpf_sock** pointer on success, or NULL in + * A **struct bpf_sock** pointer on success, or **NULL** in * case of failure. * * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk) * Description * This helper gets a **struct bpf_tcp_sock** pointer from a * **struct bpf_sock** pointer. - * * Return - * A **struct bpf_tcp_sock** pointer on success, or NULL in + * A **struct bpf_tcp_sock** pointer on success, or **NULL** in * case of failure. * * int bpf_skb_ecn_set_ce(struct sk_buf *skb) - * Description - * Sets ECN of IP header to ce (congestion encountered) if - * current value is ect (ECN capable). Works with IPv6 and IPv4. - * Return - * 1 if set, 0 if not set. + * Description + * Set ECN (Explicit Congestion Notification) field of IP header + * to **CE** (Congestion Encountered) if current value is **ECT** + * (ECN Capable Transport). Otherwise, do nothing. Works with IPv6 + * and IPv4. + * Return + * 1 if the **CE** flag is set (either by the current helper call + * or because it was already present), 0 if it is not set. * * struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk) * Description - * Return a **struct bpf_sock** pointer in TCP_LISTEN state. - * bpf_sk_release() is unnecessary and not allowed. + * Return a **struct bpf_sock** pointer in **TCP_LISTEN** state. + * **bpf_sk_release**\ () is unnecessary and not allowed. * Return - * A **struct bpf_sock** pointer on success, or NULL in + * A **struct bpf_sock** pointer on success, or **NULL** in * case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ From 0eb0978528d47699edd091dc2c337952ad8da436 Mon Sep 17 00:00:00 2001 From: Quentin Monnet Date: Thu, 14 Mar 2019 12:38:40 +0000 Subject: [PATCH 062/468] bpf: add documentation for helpers bpf_spin_lock(), bpf_spin_unlock() Add documentation for the BPF spinlock-related helpers to the doc in bpf.h. I added the constraints and restrictions coming with the use of spinlocks for BPF: not all of it is directly related to the use of the helper, but I thought it would be nice for users to find them in the man page. This list of restrictions is nearly a verbatim copy of the list in Alexei's commit log for those helpers. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 55 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4465d00d3493..929c8e537a14 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2343,6 +2343,61 @@ union bpf_attr { * Return * 0 * + * int bpf_spin_lock(struct bpf_spin_lock *lock) + * Description + * Acquire a spinlock represented by the pointer *lock*, which is + * stored as part of a value of a map. Taking the lock allows to + * safely update the rest of the fields in that value. The + * spinlock can (and must) later be released with a call to + * **bpf_spin_unlock**\ (\ *lock*\ ). + * + * Spinlocks in BPF programs come with a number of restrictions + * and constraints: + * + * * **bpf_spin_lock** objects are only allowed inside maps of + * types **BPF_MAP_TYPE_HASH** and **BPF_MAP_TYPE_ARRAY** (this + * list could be extended in the future). + * * BTF description of the map is mandatory. + * * The BPF program can take ONE lock at a time, since taking two + * or more could cause dead locks. + * * Only one **struct bpf_spin_lock** is allowed per map element. + * * When the lock is taken, calls (either BPF to BPF or helpers) + * are not allowed. + * * The **BPF_LD_ABS** and **BPF_LD_IND** instructions are not + * allowed inside a spinlock-ed region. + * * The BPF program MUST call **bpf_spin_unlock**\ () to release + * the lock, on all execution paths, before it returns. + * * The BPF program can access **struct bpf_spin_lock** only via + * the **bpf_spin_lock**\ () and **bpf_spin_unlock**\ () + * helpers. Loading or storing data into the **struct + * bpf_spin_lock** *lock*\ **;** field of a map is not allowed. + * * To use the **bpf_spin_lock**\ () helper, the BTF description + * of the map value must be a struct and have **struct + * bpf_spin_lock** *anyname*\ **;** field at the top level. + * Nested lock inside another struct is not allowed. + * * The **struct bpf_spin_lock** *lock* field in a map value must + * be aligned on a multiple of 4 bytes in that value. + * * Syscall with command **BPF_MAP_LOOKUP_ELEM** does not copy + * the **bpf_spin_lock** field to user space. + * * Syscall with command **BPF_MAP_UPDATE_ELEM**, or update from + * a BPF program, do not update the **bpf_spin_lock** field. + * * **bpf_spin_lock** cannot be on the stack or inside a + * networking packet (it can only be inside of a map values). + * * **bpf_spin_lock** is available to root only. + * * Tracing programs and socket filter programs cannot use + * **bpf_spin_lock**\ () due to insufficient preemption checks + * (but this may change in the future). + * * **bpf_spin_lock** is not allowed in inner maps of map-in-map. + * Return + * 0 + * + * int bpf_spin_unlock(struct bpf_spin_lock *lock) + * Description + * Release the *lock* previously locked by a call to + * **bpf_spin_lock**\ (\ *lock*\ ). + * Return + * 0 + * * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk) * Description * This helper gets a **struct bpf_sock** pointer such From ea6eced00e4b28821804eee24e80d9528f48972a Mon Sep 17 00:00:00 2001 From: Quentin Monnet Date: Thu, 14 Mar 2019 12:38:41 +0000 Subject: [PATCH 063/468] tools: bpf: synchronise BPF UAPI header with tools Synchronise the bpf.h header under tools, to report the latest fixes and additions to the documentation for the BPF helpers. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski Signed-off-by: Alexei Starovoitov --- tools/include/uapi/linux/bpf.h | 183 +++++++++++++++++++++------------ 1 file changed, 120 insertions(+), 63 deletions(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 983b25cb608d..929c8e537a14 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -502,16 +502,6 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * - * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) - * Description - * Push an element *value* in *map*. *flags* is one of: - * - * **BPF_EXIST** - * If the queue/stack is full, the oldest element is removed to - * make room for this. - * Return - * 0 on success, or a negative error in case of failure. - * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -1435,14 +1425,14 @@ union bpf_attr { * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts - * *skb*, but gets socket from **struct bpf_sock_addr** contex. + * *skb*, but gets socket from **struct bpf_sock_addr** context. * Return * A 8-byte long non-decreasing number. * * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts - * *skb*, but gets socket from **struct bpf_sock_ops** contex. + * *skb*, but gets socket from **struct bpf_sock_ops** context. * Return * A 8-byte long non-decreasing number. * @@ -2098,6 +2088,25 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_rc_repeat(void *ctx) + * Description + * This helper is used in programs implementing IR decoding, to + * report a successfully decoded repeat key message. This delays + * the generation of a key up event for previously generated + * key down event. + * + * Some IR protocols like NEC have a special IR message for + * repeating last button, for when a button is held down. + * + * The *ctx* should point to the lirc sample as passed into + * the program. + * + * This helper is only available is the kernel was compiled with + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to + * "**y**". + * Return + * 0 + * * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle) * Description * This helper is used in programs implementing IR decoding, to @@ -2124,26 +2133,7 @@ union bpf_attr { * Return * 0 * - * int bpf_rc_repeat(void *ctx) - * Description - * This helper is used in programs implementing IR decoding, to - * report a successfully decoded repeat key message. This delays - * the generation of a key up event for previously generated - * key down event. - * - * Some IR protocols like NEC have a special IR message for - * repeating last button, for when a button is held down. - * - * The *ctx* should point to the lirc sample as passed into - * the program. - * - * This helper is only available is the kernel was compiled with - * the **CONFIG_BPF_LIRC_MODE2** configuration option set to - * "**y**". - * Return - * 0 - * - * uint64_t bpf_skb_cgroup_id(struct sk_buff *skb) + * u64 bpf_skb_cgroup_id(struct sk_buff *skb) * Description * Return the cgroup v2 id of the socket associated with the *skb*. * This is roughly similar to the **bpf_get_cgroup_classid**\ () @@ -2159,30 +2149,12 @@ union bpf_attr { * Return * The id is returned or 0 in case the id could not be retrieved. * - * u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level) - * Description - * Return id of cgroup v2 that is ancestor of cgroup associated - * with the *skb* at the *ancestor_level*. The root cgroup is at - * *ancestor_level* zero and each step down the hierarchy - * increments the level. If *ancestor_level* == level of cgroup - * associated with *skb*, then return value will be same as that - * of **bpf_skb_cgroup_id**\ (). - * - * The helper is useful to implement policies based on cgroups - * that are upper in hierarchy than immediate cgroup associated - * with *skb*. - * - * The format of returned id and helper limitations are same as in - * **bpf_skb_cgroup_id**\ (). - * Return - * The id is returned or 0 in case the id could not be retrieved. - * * u64 bpf_get_current_cgroup_id(void) * Return * A 64-bit integer containing the current cgroup id based * on the cgroup within which the current task is running. * - * void* get_local_storage(void *map, u64 flags) + * void *bpf_get_local_storage(void *map, u64 flags) * Description * Get the pointer to the local storage area. * The type and the size of the local storage is defined @@ -2209,6 +2181,24 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * u64 bpf_skb_ancestor_cgroup_id(struct sk_buff *skb, int ancestor_level) + * Description + * Return id of cgroup v2 that is ancestor of cgroup associated + * with the *skb* at the *ancestor_level*. The root cgroup is at + * *ancestor_level* zero and each step down the hierarchy + * increments the level. If *ancestor_level* == level of cgroup + * associated with *skb*, then return value will be same as that + * of **bpf_skb_cgroup_id**\ (). + * + * The helper is useful to implement policies based on cgroups + * that are upper in hierarchy than immediate cgroup associated + * with *skb*. + * + * The format of returned id and helper limitations are same as in + * **bpf_skb_cgroup_id**\ (). + * Return + * The id is returned or 0 in case the id could not be retrieved. + * * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags) * Description * Look for TCP socket matching *tuple*, optionally in a child @@ -2289,6 +2279,16 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is + * removed to make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * * int bpf_map_pop_elem(struct bpf_map *map, void *value) * Description * Pop an element from *map*. @@ -2343,36 +2343,93 @@ union bpf_attr { * Return * 0 * + * int bpf_spin_lock(struct bpf_spin_lock *lock) + * Description + * Acquire a spinlock represented by the pointer *lock*, which is + * stored as part of a value of a map. Taking the lock allows to + * safely update the rest of the fields in that value. The + * spinlock can (and must) later be released with a call to + * **bpf_spin_unlock**\ (\ *lock*\ ). + * + * Spinlocks in BPF programs come with a number of restrictions + * and constraints: + * + * * **bpf_spin_lock** objects are only allowed inside maps of + * types **BPF_MAP_TYPE_HASH** and **BPF_MAP_TYPE_ARRAY** (this + * list could be extended in the future). + * * BTF description of the map is mandatory. + * * The BPF program can take ONE lock at a time, since taking two + * or more could cause dead locks. + * * Only one **struct bpf_spin_lock** is allowed per map element. + * * When the lock is taken, calls (either BPF to BPF or helpers) + * are not allowed. + * * The **BPF_LD_ABS** and **BPF_LD_IND** instructions are not + * allowed inside a spinlock-ed region. + * * The BPF program MUST call **bpf_spin_unlock**\ () to release + * the lock, on all execution paths, before it returns. + * * The BPF program can access **struct bpf_spin_lock** only via + * the **bpf_spin_lock**\ () and **bpf_spin_unlock**\ () + * helpers. Loading or storing data into the **struct + * bpf_spin_lock** *lock*\ **;** field of a map is not allowed. + * * To use the **bpf_spin_lock**\ () helper, the BTF description + * of the map value must be a struct and have **struct + * bpf_spin_lock** *anyname*\ **;** field at the top level. + * Nested lock inside another struct is not allowed. + * * The **struct bpf_spin_lock** *lock* field in a map value must + * be aligned on a multiple of 4 bytes in that value. + * * Syscall with command **BPF_MAP_LOOKUP_ELEM** does not copy + * the **bpf_spin_lock** field to user space. + * * Syscall with command **BPF_MAP_UPDATE_ELEM**, or update from + * a BPF program, do not update the **bpf_spin_lock** field. + * * **bpf_spin_lock** cannot be on the stack or inside a + * networking packet (it can only be inside of a map values). + * * **bpf_spin_lock** is available to root only. + * * Tracing programs and socket filter programs cannot use + * **bpf_spin_lock**\ () due to insufficient preemption checks + * (but this may change in the future). + * * **bpf_spin_lock** is not allowed in inner maps of map-in-map. + * Return + * 0 + * + * int bpf_spin_unlock(struct bpf_spin_lock *lock) + * Description + * Release the *lock* previously locked by a call to + * **bpf_spin_lock**\ (\ *lock*\ ). + * Return + * 0 + * * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk) * Description * This helper gets a **struct bpf_sock** pointer such - * that all the fields in bpf_sock can be accessed. + * that all the fields in this **bpf_sock** can be accessed. * Return - * A **struct bpf_sock** pointer on success, or NULL in + * A **struct bpf_sock** pointer on success, or **NULL** in * case of failure. * * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk) * Description * This helper gets a **struct bpf_tcp_sock** pointer from a * **struct bpf_sock** pointer. - * * Return - * A **struct bpf_tcp_sock** pointer on success, or NULL in + * A **struct bpf_tcp_sock** pointer on success, or **NULL** in * case of failure. * * int bpf_skb_ecn_set_ce(struct sk_buf *skb) - * Description - * Sets ECN of IP header to ce (congestion encountered) if - * current value is ect (ECN capable). Works with IPv6 and IPv4. - * Return - * 1 if set, 0 if not set. + * Description + * Set ECN (Explicit Congestion Notification) field of IP header + * to **CE** (Congestion Encountered) if current value is **ECT** + * (ECN Capable Transport). Otherwise, do nothing. Works with IPv6 + * and IPv4. + * Return + * 1 if the **CE** flag is set (either by the current helper call + * or because it was already present), 0 if it is not set. * * struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk) * Description - * Return a **struct bpf_sock** pointer in TCP_LISTEN state. - * bpf_sk_release() is unnecessary and not allowed. + * Return a **struct bpf_sock** pointer in **TCP_LISTEN** state. + * **bpf_sk_release**\ () is unnecessary and not allowed. * Return - * A **struct bpf_sock** pointer on success, or NULL in + * A **struct bpf_sock** pointer on success, or **NULL** in * case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ From 9804501fa1228048857910a6bf23e085aade37cc Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 14 Mar 2019 13:47:59 +0800 Subject: [PATCH 064/468] appletalk: Fix potential NULL pointer dereference in unregister_snap_client register_snap_client may return NULL, all the callers check it, but only print a warning. This will result in NULL pointer dereference in unregister_snap_client and other places. It has always been used like this since v2.6 Reported-by: Dan Carpenter Signed-off-by: YueHaibing Signed-off-by: David S. Miller --- include/linux/atalk.h | 2 +- net/appletalk/aarp.c | 15 ++++++++++++--- net/appletalk/ddp.c | 20 ++++++++++++-------- 3 files changed, 25 insertions(+), 12 deletions(-) diff --git a/include/linux/atalk.h b/include/linux/atalk.h index d5cfc0b15b76..f6034ba774be 100644 --- a/include/linux/atalk.h +++ b/include/linux/atalk.h @@ -108,7 +108,7 @@ static __inline__ struct elapaarp *aarp_hdr(struct sk_buff *skb) #define AARP_RESOLVE_TIME (10 * HZ) extern struct datalink_proto *ddp_dl, *aarp_dl; -extern void aarp_proto_init(void); +extern int aarp_proto_init(void); /* Inter module exports */ diff --git a/net/appletalk/aarp.c b/net/appletalk/aarp.c index 49a16cee2aae..420a98bf79b5 100644 --- a/net/appletalk/aarp.c +++ b/net/appletalk/aarp.c @@ -879,15 +879,24 @@ static struct notifier_block aarp_notifier = { static unsigned char aarp_snap_id[] = { 0x00, 0x00, 0x00, 0x80, 0xF3 }; -void __init aarp_proto_init(void) +int __init aarp_proto_init(void) { + int rc; + aarp_dl = register_snap_client(aarp_snap_id, aarp_rcv); - if (!aarp_dl) + if (!aarp_dl) { printk(KERN_CRIT "Unable to register AARP with SNAP.\n"); + return -ENOMEM; + } timer_setup(&aarp_timer, aarp_expire_timeout, 0); aarp_timer.expires = jiffies + sysctl_aarp_expiry_time; add_timer(&aarp_timer); - register_netdevice_notifier(&aarp_notifier); + rc = register_netdevice_notifier(&aarp_notifier); + if (rc) { + del_timer_sync(&aarp_timer); + unregister_snap_client(aarp_dl); + } + return rc; } /* Remove the AARP entries associated with a device. */ diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index 795fbc6c06aa..709d2542f729 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1904,9 +1904,6 @@ static unsigned char ddp_snap_id[] = { 0x08, 0x00, 0x07, 0x80, 0x9B }; EXPORT_SYMBOL(atrtr_get_dev); EXPORT_SYMBOL(atalk_find_dev_addr); -static const char atalk_err_snap[] __initconst = - KERN_CRIT "Unable to register DDP with SNAP.\n"; - /* Called by proto.c on kernel start up */ static int __init atalk_init(void) { @@ -1921,17 +1918,22 @@ static int __init atalk_init(void) goto out_proto; ddp_dl = register_snap_client(ddp_snap_id, atalk_rcv); - if (!ddp_dl) - printk(atalk_err_snap); + if (!ddp_dl) { + pr_crit("Unable to register DDP with SNAP.\n"); + goto out_sock; + } dev_add_pack(<alk_packet_type); dev_add_pack(&ppptalk_packet_type); rc = register_netdevice_notifier(&ddp_notifier); if (rc) - goto out_sock; + goto out_snap; + + rc = aarp_proto_init(); + if (rc) + goto out_dev; - aarp_proto_init(); rc = atalk_proc_init(); if (rc) goto out_aarp; @@ -1945,11 +1947,13 @@ static int __init atalk_init(void) atalk_proc_exit(); out_aarp: aarp_cleanup_module(); +out_dev: unregister_netdevice_notifier(&ddp_notifier); -out_sock: +out_snap: dev_remove_pack(&ppptalk_packet_type); dev_remove_pack(<alk_packet_type); unregister_snap_client(ddp_dl); +out_sock: sock_unregister(PF_APPLETALK); out_proto: proto_unregister(&ddp_proto); From 80acbed9f8fca1db3fbe915540b756f048aa0fd7 Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Thu, 14 Mar 2019 21:43:19 +0200 Subject: [PATCH 065/468] net: stmmac: don't set own bit too early for jumbo frames Commit 0e80bdc9a72d ("stmmac: first frame prep at the end of xmit routine") overlooked jumbo frames when re-ordering the code, and as a result the own bit was not getting set anymore for the first jumbo frame descriptor. Commit 487e2e22ab79 ("net: stmmac: Set OWN bit for jumbo frames") tried to fix this, but now the bit is getting set too early and the DMA may start while we are still setting up the remaining descriptors. And with the chain mode the own bit remains still unset. Fix by setting the own bit at the end of xmit also with jumbo frames. Fixes: 0e80bdc9a72d ("stmmac: first frame prep at the end of xmit routine") Fixes: 487e2e22ab79 ("net: stmmac: Set OWN bit for jumbo frames") Signed-off-by: Aaro Koskinen Acked-by: Jose Abreu Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 4 ++-- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++++++++------ 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c index d8c5bc412219..bc83ced94e1b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c +++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c @@ -59,7 +59,7 @@ static int jumbo_frm(void *p, struct sk_buff *skb, int csum) desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB); stmmac_prepare_tx_desc(priv, desc, 1, bmax, csum, - STMMAC_RING_MODE, 1, false, skb->len); + STMMAC_RING_MODE, 0, false, skb->len); tx_q->tx_skbuff[entry] = NULL; entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE); @@ -91,7 +91,7 @@ static int jumbo_frm(void *p, struct sk_buff *skb, int csum) tx_q->tx_skbuff_dma[entry].is_jumbo = true; desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB); stmmac_prepare_tx_desc(priv, desc, 1, nopaged_len, csum, - STMMAC_RING_MODE, 1, true, skb->len); + STMMAC_RING_MODE, 0, true, skb->len); } tx_q->cur_tx = entry; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 97c5e1aad88f..6a2e1031a62a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3216,14 +3216,16 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) stmmac_prepare_tx_desc(priv, first, 1, nopaged_len, csum_insertion, priv->mode, 1, last_segment, skb->len); - - /* The own bit must be the latest setting done when prepare the - * descriptor and then barrier is needed to make sure that - * all is coherent before granting the DMA engine. - */ - wmb(); + } else { + stmmac_set_tx_owner(priv, first); } + /* The own bit must be the latest setting done when prepare the + * descriptor and then barrier is needed to make sure that + * all is coherent before granting the DMA engine. + */ + wmb(); + netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len); stmmac_enable_dma_transmission(priv, priv->ioaddr); From 58f2ce6f61615dfd8dd3cc01c9e5bb54ed35637e Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Thu, 14 Mar 2019 21:43:20 +0200 Subject: [PATCH 066/468] net: stmmac: fix jumbo frame sending with non-linear skbs When sending non-linear skbs with jumbo frames, we set up the non-paged data and mark that as a last segment, although the paged fragments are also prepared. This will stall the TX queue and trigger a watchdog warning (a simple reproducer is to run an iperf client mode TCP test with a large MTU - networking fails instantly). Fix by checking if the skb is non-linear. Signed-off-by: Aaro Koskinen Acked-by: Jose Abreu Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c index bc83ced94e1b..f936166d8910 100644 --- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c +++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c @@ -79,7 +79,8 @@ static int jumbo_frm(void *p, struct sk_buff *skb, int csum) desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB); stmmac_prepare_tx_desc(priv, desc, 0, len, csum, - STMMAC_RING_MODE, 1, true, skb->len); + STMMAC_RING_MODE, 1, !skb_is_nonlinear(skb), + skb->len); } else { des2 = dma_map_single(priv->device, skb->data, nopaged_len, DMA_TO_DEVICE); @@ -91,7 +92,8 @@ static int jumbo_frm(void *p, struct sk_buff *skb, int csum) tx_q->tx_skbuff_dma[entry].is_jumbo = true; desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB); stmmac_prepare_tx_desc(priv, desc, 1, nopaged_len, csum, - STMMAC_RING_MODE, 0, true, skb->len); + STMMAC_RING_MODE, 0, !skb_is_nonlinear(skb), + skb->len); } tx_q->cur_tx = entry; From 5bf7295fe34a5251b1d241b9736af4697b590670 Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Thu, 14 Mar 2019 15:31:40 -0500 Subject: [PATCH 067/468] qlcnic: Avoid potential NULL pointer dereference netdev_alloc_skb can fail and return a NULL pointer which is dereferenced without a check. The patch avoids such a scenario. Signed-off-by: Aditya Pakki Signed-off-by: David S. Miller --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c index 3b0adda7cc9c..a4cd6f2cfb86 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c @@ -1048,6 +1048,8 @@ int qlcnic_do_lb_test(struct qlcnic_adapter *adapter, u8 mode) for (i = 0; i < QLCNIC_NUM_ILB_PKT; i++) { skb = netdev_alloc_skb(adapter->netdev, QLCNIC_ILB_PKT_SIZE); + if (!skb) + break; qlcnic_create_loopback_buff(skb->data, adapter->mac_addr); skb_put(skb, QLCNIC_ILB_PKT_SIZE); adapter->ahw->diag_cnt = 0; From eab2fc822af38f31fd5f4e731b5d10b94904d919 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Thu, 14 Mar 2019 23:08:22 +0100 Subject: [PATCH 068/468] sch_cake: Interpret fwmark parameter as a bitmask MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We initially interpreted the fwmark parameter as a flag that simply turned on the feature, using the whole skb->mark field as the index into the CAKE tin_order array. However, it is quite common for different applications to use different parts of the mask field for their own purposes, each using a different mask. Support this use of subsets of the mark by interpreting the TCA_CAKE_FWMARK parameter as a bitmask to apply to the fwmark field when reading it. The result will be right-shifted by the number of unset lower bits of the mask before looking up the tin. In the original commit message we also failed to credit Felix Resch with originally suggesting the fwmark feature back in 2017; so the Suggested-By in this commit covers the whole fwmark feature. Fixes: 0b5c7efdfc6e ("sch_cake: Permit use of connmarks as tin classifiers") Suggested-by: Felix Resch Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller --- net/sched/sch_cake.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 1d2a12132abc..acc9b9da985f 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -211,6 +211,9 @@ struct cake_sched_data { u8 ack_filter; u8 atm_mode; + u32 fwmark_mask; + u16 fwmark_shft; + /* time_next = time_this + ((len * rate_ns) >> rate_shft) */ u16 rate_shft; ktime_t time_next_packet; @@ -258,8 +261,7 @@ enum { CAKE_FLAG_AUTORATE_INGRESS = BIT(1), CAKE_FLAG_INGRESS = BIT(2), CAKE_FLAG_WASH = BIT(3), - CAKE_FLAG_SPLIT_GSO = BIT(4), - CAKE_FLAG_FWMARK = BIT(5) + CAKE_FLAG_SPLIT_GSO = BIT(4) }; /* COBALT operates the Codel and BLUE algorithms in parallel, in order to @@ -1543,7 +1545,7 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, struct sk_buff *skb) { struct cake_sched_data *q = qdisc_priv(sch); - u32 tin; + u32 tin, mark; u8 dscp; /* Tin selection: Default to diffserv-based selection, allow overriding @@ -1551,14 +1553,13 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, */ dscp = cake_handle_diffserv(skb, q->rate_flags & CAKE_FLAG_WASH); + mark = (skb->mark & q->fwmark_mask) >> q->fwmark_shft; if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) tin = 0; - else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ - skb->mark && - skb->mark <= q->tin_cnt) - tin = q->tin_order[skb->mark - 1]; + else if (mark && mark <= q->tin_cnt) + tin = q->tin_order[mark - 1]; else if (TC_H_MAJ(skb->priority) == sch->handle && TC_H_MIN(skb->priority) > 0 && @@ -2172,6 +2173,7 @@ static const struct nla_policy cake_policy[TCA_CAKE_MAX + 1] = { [TCA_CAKE_MPU] = { .type = NLA_U32 }, [TCA_CAKE_INGRESS] = { .type = NLA_U32 }, [TCA_CAKE_ACK_FILTER] = { .type = NLA_U32 }, + [TCA_CAKE_FWMARK] = { .type = NLA_U32 }, }; static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu, @@ -2619,10 +2621,8 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt, } if (tb[TCA_CAKE_FWMARK]) { - if (!!nla_get_u32(tb[TCA_CAKE_FWMARK])) - q->rate_flags |= CAKE_FLAG_FWMARK; - else - q->rate_flags &= ~CAKE_FLAG_FWMARK; + q->fwmark_mask = nla_get_u32(tb[TCA_CAKE_FWMARK]); + q->fwmark_shft = q->fwmark_mask ? __ffs(q->fwmark_mask) : 0; } if (q->tins) { @@ -2784,8 +2784,7 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb) !!(q->rate_flags & CAKE_FLAG_SPLIT_GSO))) goto nla_put_failure; - if (nla_put_u32(skb, TCA_CAKE_FWMARK, - !!(q->rate_flags & CAKE_FLAG_FWMARK))) + if (nla_put_u32(skb, TCA_CAKE_FWMARK, q->fwmark_mask)) goto nla_put_failure; return nla_nest_end(skb, opts); From 3d4c3cec0909dc9c40db82a74aae0cdf3f5ad138 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 14 Mar 2019 23:47:13 +0000 Subject: [PATCH 069/468] drivers: net: atp: fix various indentation issues There is a statement that is indented incorrectly; replace spaces with a tab. Signed-off-by: Colin Ian King Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/atp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/atp.c b/drivers/net/ethernet/realtek/atp.c index cfb67b746595..58e0ca9093d3 100644 --- a/drivers/net/ethernet/realtek/atp.c +++ b/drivers/net/ethernet/realtek/atp.c @@ -482,7 +482,7 @@ static void hardware_init(struct net_device *dev) write_reg_high(ioaddr, IMR, ISRh_RxErr); lp->tx_unit_busy = 0; - lp->pac_cnt_in_tx_buf = 0; + lp->pac_cnt_in_tx_buf = 0; lp->saved_tx_size = 0; } From 68cfe9a286f3ee2371de00ab666b4949ff285196 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 14 Mar 2019 23:56:35 +0000 Subject: [PATCH 070/468] net: sis900: fix indentation issues, remove some spaces There are several statements that contain extra spacing in the indentation; clean this up by removing spaces. Also add { } braces on if statement to keep to kernel coding style. Signed-off-by: Colin Ian King Signed-off-by: David S. Miller --- drivers/net/ethernet/sis/sis900.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/sis/sis900.c b/drivers/net/ethernet/sis/sis900.c index 6073387511f8..67f9bb6e941b 100644 --- a/drivers/net/ethernet/sis/sis900.c +++ b/drivers/net/ethernet/sis/sis900.c @@ -730,10 +730,10 @@ static u16 sis900_default_phy(struct net_device * net_dev) status = mdio_read(net_dev, phy->phy_addr, MII_STATUS); /* Link ON & Not select default PHY & not ghost PHY */ - if ((status & MII_STAT_LINK) && !default_phy && - (phy->phy_types != UNKNOWN)) - default_phy = phy; - else { + if ((status & MII_STAT_LINK) && !default_phy && + (phy->phy_types != UNKNOWN)) { + default_phy = phy; + } else { status = mdio_read(net_dev, phy->phy_addr, MII_CONTROL); mdio_write(net_dev, phy->phy_addr, MII_CONTROL, status | MII_CNTL_AUTO | MII_CNTL_ISOLATE); @@ -741,7 +741,7 @@ static u16 sis900_default_phy(struct net_device * net_dev) phy_home = phy; else if(phy->phy_types == LAN) phy_lan = phy; - } + } } if (!default_phy && phy_home) From 228cd2dba27cee9956c1af97e6445be056881e41 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Thu, 14 Mar 2019 23:12:06 -0500 Subject: [PATCH 071/468] net: strparser: fix a missing check for create_singlethread_workqueue In case create_singlethread_workqueue fails, the check returns an error to callers to avoid potential NULL pointer dereferences. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/strparser/strparser.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index da1a676860ca..860dcfb95ee4 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -550,6 +550,8 @@ EXPORT_SYMBOL_GPL(strp_check_rcv); static int __init strp_mod_init(void) { strp_wq = create_singlethread_workqueue("kstrp"); + if (unlikely(!strp_wq)) + return -ENOMEM; return 0; } From 8a3c245c031944f2176118270e7bc5d4fd4a1075 Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Thu, 14 Mar 2019 10:45:23 -0300 Subject: [PATCH 072/468] net: add documentation to socket.c Adds missing sphinx documentation to the socket.c's functions. Also fixes some whitespaces. I also changed the style of older documentation as an effort to have an uniform documentation style. Signed-off-by: Pedro Tammela Signed-off-by: David S. Miller --- include/linux/net.h | 6 + include/linux/socket.h | 12 +- net/socket.c | 277 ++++++++++++++++++++++++++++++++++++++--- 3 files changed, 271 insertions(+), 24 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index 651fca72286c..c606c72311d0 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -83,6 +83,12 @@ enum sock_type { #endif /* ARCH_HAS_SOCKET_TYPES */ +/** + * enum sock_shutdown_cmd - Shutdown types + * @SHUT_RD: shutdown receptions + * @SHUT_WR: shutdown transmissions + * @SHUT_RDWR: shutdown receptions/transmissions + */ enum sock_shutdown_cmd { SHUT_RD, SHUT_WR, diff --git a/include/linux/socket.h b/include/linux/socket.h index 6016daeecee4..b57cd8bf96e2 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -26,7 +26,7 @@ typedef __kernel_sa_family_t sa_family_t; /* * 1003.1g requires sa_family_t and that sa_data is char. */ - + struct sockaddr { sa_family_t sa_family; /* address family, AF_xxx */ char sa_data[14]; /* 14 bytes of protocol address */ @@ -44,7 +44,7 @@ struct linger { * system, not 4.3. Thus msg_accrights(len) are now missing. They * belong in an obscure libc emulation or the bin. */ - + struct msghdr { void *msg_name; /* ptr to socket address structure */ int msg_namelen; /* size of socket address structure */ @@ -54,7 +54,7 @@ struct msghdr { unsigned int msg_flags; /* flags on received message */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ }; - + struct user_msghdr { void __user *msg_name; /* ptr to socket address structure */ int msg_namelen; /* size of socket address structure */ @@ -122,7 +122,7 @@ struct cmsghdr { * inside range, given by msg->msg_controllen before using * ancillary object DATA. --ANK (980731) */ - + static inline struct cmsghdr * __cmsg_nxthdr(void *__ctl, __kernel_size_t __size, struct cmsghdr *__cmsg) { @@ -264,10 +264,10 @@ struct ucred { /* Maximum queue length specifiable by listen. */ #define SOMAXCONN 128 -/* Flags we can use with send/ and recv. +/* Flags we can use with send/ and recv. Added those for 1003.1g not all are supported yet */ - + #define MSG_OOB 1 #define MSG_PEEK 2 #define MSG_DONTROUTE 4 diff --git a/net/socket.c b/net/socket.c index 3c176a12fe48..8255f5bda0aa 100644 --- a/net/socket.c +++ b/net/socket.c @@ -384,6 +384,18 @@ static struct file_system_type sock_fs_type = { * but we take care of internal coherence yet. */ +/** + * sock_alloc_file - Bind a &socket to a &file + * @sock: socket + * @flags: file status flags + * @dname: protocol name + * + * Returns the &file bound with @sock, implicitly storing it + * in sock->file. If dname is %NULL, sets to "". + * On failure the return is a ERR pointer (see linux/err.h). + * This function uses GFP_KERNEL internally. + */ + struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname) { struct file *file; @@ -424,6 +436,14 @@ static int sock_map_fd(struct socket *sock, int flags) return PTR_ERR(newfile); } +/** + * sock_from_file - Return the &socket bounded to @file. + * @file: file + * @err: pointer to an error code return + * + * On failure returns %NULL and assigns -ENOTSOCK to @err. + */ + struct socket *sock_from_file(struct file *file, int *err) { if (file->f_op == &socket_file_ops) @@ -532,11 +552,11 @@ static const struct inode_operations sockfs_inode_ops = { }; /** - * sock_alloc - allocate a socket + * sock_alloc - allocate a socket * * Allocate a new inode and socket object. The two are bound together * and initialised. The socket is then returned. If we are out of inodes - * NULL is returned. + * NULL is returned. This functions uses GFP_KERNEL internally. */ struct socket *sock_alloc(void) @@ -561,7 +581,7 @@ struct socket *sock_alloc(void) EXPORT_SYMBOL(sock_alloc); /** - * sock_release - close a socket + * sock_release - close a socket * @sock: socket to close * * The socket is released from the protocol stack if it has a release @@ -617,6 +637,15 @@ void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags) } EXPORT_SYMBOL(__sock_tx_timestamp); +/** + * sock_sendmsg - send a message through @sock + * @sock: socket + * @msg: message to send + * + * Sends @msg through @sock, passing through LSM. + * Returns the number of bytes sent, or an error code. + */ + static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg) { int ret = sock->ops->sendmsg(sock, msg, msg_data_left(msg)); @@ -633,6 +662,18 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg) } EXPORT_SYMBOL(sock_sendmsg); +/** + * kernel_sendmsg - send a message through @sock (kernel-space) + * @sock: socket + * @msg: message header + * @vec: kernel vec + * @num: vec array length + * @size: total message data size + * + * Builds the message data with @vec and sends it through @sock. + * Returns the number of bytes sent, or an error code. + */ + int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t size) { @@ -641,6 +682,19 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg, } EXPORT_SYMBOL(kernel_sendmsg); +/** + * kernel_sendmsg_locked - send a message through @sock (kernel-space) + * @sk: sock + * @msg: message header + * @vec: output s/g array + * @num: output s/g array length + * @size: total message data size + * + * Builds the message data with @vec and sends it through @sock. + * Returns the number of bytes sent, or an error code. + * Caller must hold @sk. + */ + int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg, struct kvec *vec, size_t num, size_t size) { @@ -811,6 +865,16 @@ void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk, } EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops); +/** + * sock_recvmsg - receive a message from @sock + * @sock: socket + * @msg: message to receive + * @flags: message flags + * + * Receives @msg from @sock, passing through LSM. Returns the total number + * of bytes received, or an error. + */ + static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, int flags) { @@ -826,20 +890,21 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags) EXPORT_SYMBOL(sock_recvmsg); /** - * kernel_recvmsg - Receive a message from a socket (kernel space) - * @sock: The socket to receive the message from - * @msg: Received message - * @vec: Input s/g array for message data - * @num: Size of input s/g array - * @size: Number of bytes to read - * @flags: Message flags (MSG_DONTWAIT, etc...) + * kernel_recvmsg - Receive a message from a socket (kernel space) + * @sock: The socket to receive the message from + * @msg: Received message + * @vec: Input s/g array for message data + * @num: Size of input s/g array + * @size: Number of bytes to read + * @flags: Message flags (MSG_DONTWAIT, etc...) * - * On return the msg structure contains the scatter/gather array passed in the - * vec argument. The array is modified so that it consists of the unfilled - * portion of the original array. + * On return the msg structure contains the scatter/gather array passed in the + * vec argument. The array is modified so that it consists of the unfilled + * portion of the original array. * - * The returned value is the total number of bytes received, or an error. + * The returned value is the total number of bytes received, or an error. */ + int kernel_recvmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t size, int flags) { @@ -1005,6 +1070,13 @@ static long sock_do_ioctl(struct net *net, struct socket *sock, * what to do with it - that's up to the protocol still. */ +/** + * get_net_ns - increment the refcount of the network namespace + * @ns: common namespace (net) + * + * Returns the net's common namespace. + */ + struct ns_common *get_net_ns(struct ns_common *ns) { return &get_net(container_of(ns, struct net, ns))->ns; @@ -1099,6 +1171,19 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) return err; } +/** + * sock_create_lite - creates a socket + * @family: protocol family (AF_INET, ...) + * @type: communication type (SOCK_STREAM, ...) + * @protocol: protocol (0, ...) + * @res: new socket + * + * Creates a new socket and assigns it to @res, passing through LSM. + * The new socket initialization is not complete, see kernel_accept(). + * Returns 0 or an error. On failure @res is set to %NULL. + * This function internally uses GFP_KERNEL. + */ + int sock_create_lite(int family, int type, int protocol, struct socket **res) { int err; @@ -1224,6 +1309,21 @@ int sock_wake_async(struct socket_wq *wq, int how, int band) } EXPORT_SYMBOL(sock_wake_async); +/** + * __sock_create - creates a socket + * @net: net namespace + * @family: protocol family (AF_INET, ...) + * @type: communication type (SOCK_STREAM, ...) + * @protocol: protocol (0, ...) + * @res: new socket + * @kern: boolean for kernel space sockets + * + * Creates a new socket and assigns it to @res, passing through LSM. + * Returns 0 or an error. On failure @res is set to %NULL. @kern must + * be set to true if the socket resides in kernel space. + * This function internally uses GFP_KERNEL. + */ + int __sock_create(struct net *net, int family, int type, int protocol, struct socket **res, int kern) { @@ -1333,12 +1433,35 @@ int __sock_create(struct net *net, int family, int type, int protocol, } EXPORT_SYMBOL(__sock_create); +/** + * sock_create - creates a socket + * @family: protocol family (AF_INET, ...) + * @type: communication type (SOCK_STREAM, ...) + * @protocol: protocol (0, ...) + * @res: new socket + * + * A wrapper around __sock_create(). + * Returns 0 or an error. This function internally uses GFP_KERNEL. + */ + int sock_create(int family, int type, int protocol, struct socket **res) { return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0); } EXPORT_SYMBOL(sock_create); +/** + * sock_create_kern - creates a socket (kernel space) + * @net: net namespace + * @family: protocol family (AF_INET, ...) + * @type: communication type (SOCK_STREAM, ...) + * @protocol: protocol (0, ...) + * @res: new socket + * + * A wrapper around __sock_create(). + * Returns 0 or an error. This function internally uses GFP_KERNEL. + */ + int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res) { return __sock_create(net, family, type, protocol, res, 1); @@ -3322,18 +3445,46 @@ static long compat_sock_ioctl(struct file *file, unsigned int cmd, } #endif +/** + * kernel_bind - bind an address to a socket (kernel space) + * @sock: socket + * @addr: address + * @addrlen: length of address + * + * Returns 0 or an error. + */ + int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen) { return sock->ops->bind(sock, addr, addrlen); } EXPORT_SYMBOL(kernel_bind); +/** + * kernel_listen - move socket to listening state (kernel space) + * @sock: socket + * @backlog: pending connections queue size + * + * Returns 0 or an error. + */ + int kernel_listen(struct socket *sock, int backlog) { return sock->ops->listen(sock, backlog); } EXPORT_SYMBOL(kernel_listen); +/** + * kernel_accept - accept a connection (kernel space) + * @sock: listening socket + * @newsock: new connected socket + * @flags: flags + * + * @flags must be SOCK_CLOEXEC, SOCK_NONBLOCK or 0. + * If it fails, @newsock is guaranteed to be %NULL. + * Returns 0 or an error. + */ + int kernel_accept(struct socket *sock, struct socket **newsock, int flags) { struct sock *sk = sock->sk; @@ -3359,6 +3510,19 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags) } EXPORT_SYMBOL(kernel_accept); +/** + * kernel_connect - connect a socket (kernel space) + * @sock: socket + * @addr: address + * @addrlen: address length + * @flags: flags (O_NONBLOCK, ...) + * + * For datagram sockets, @addr is the addres to which datagrams are sent + * by default, and the only address from which datagrams are received. + * For stream sockets, attempts to connect to @addr. + * Returns 0 or an error code. + */ + int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, int flags) { @@ -3366,18 +3530,48 @@ int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, } EXPORT_SYMBOL(kernel_connect); +/** + * kernel_getsockname - get the address which the socket is bound (kernel space) + * @sock: socket + * @addr: address holder + * + * Fills the @addr pointer with the address which the socket is bound. + * Returns 0 or an error code. + */ + int kernel_getsockname(struct socket *sock, struct sockaddr *addr) { return sock->ops->getname(sock, addr, 0); } EXPORT_SYMBOL(kernel_getsockname); +/** + * kernel_peername - get the address which the socket is connected (kernel space) + * @sock: socket + * @addr: address holder + * + * Fills the @addr pointer with the address which the socket is connected. + * Returns 0 or an error code. + */ + int kernel_getpeername(struct socket *sock, struct sockaddr *addr) { return sock->ops->getname(sock, addr, 1); } EXPORT_SYMBOL(kernel_getpeername); +/** + * kernel_getsockopt - get a socket option (kernel space) + * @sock: socket + * @level: API level (SOL_SOCKET, ...) + * @optname: option tag + * @optval: option value + * @optlen: option length + * + * Assigns the option length to @optlen. + * Returns 0 or an error. + */ + int kernel_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen) { @@ -3400,6 +3594,17 @@ int kernel_getsockopt(struct socket *sock, int level, int optname, } EXPORT_SYMBOL(kernel_getsockopt); +/** + * kernel_setsockopt - set a socket option (kernel space) + * @sock: socket + * @level: API level (SOL_SOCKET, ...) + * @optname: option tag + * @optval: option value + * @optlen: option length + * + * Returns 0 or an error. + */ + int kernel_setsockopt(struct socket *sock, int level, int optname, char *optval, unsigned int optlen) { @@ -3420,6 +3625,17 @@ int kernel_setsockopt(struct socket *sock, int level, int optname, } EXPORT_SYMBOL(kernel_setsockopt); +/** + * kernel_sendpage - send a &page through a socket (kernel space) + * @sock: socket + * @page: page + * @offset: page offset + * @size: total size in bytes + * @flags: flags (MSG_DONTWAIT, ...) + * + * Returns the total amount sent in bytes or an error. + */ + int kernel_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { @@ -3430,6 +3646,18 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset, } EXPORT_SYMBOL(kernel_sendpage); +/** + * kernel_sendpage_locked - send a &page through the locked sock (kernel space) + * @sk: sock + * @page: page + * @offset: page offset + * @size: total size in bytes + * @flags: flags (MSG_DONTWAIT, ...) + * + * Returns the total amount sent in bytes or an error. + * Caller must hold @sk. + */ + int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset, size_t size, int flags) { @@ -3443,17 +3671,30 @@ int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset, } EXPORT_SYMBOL(kernel_sendpage_locked); +/** + * kernel_shutdown - shut down part of a full-duplex connection (kernel space) + * @sock: socket + * @how: connection part + * + * Returns 0 or an error. + */ + int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how) { return sock->ops->shutdown(sock, how); } EXPORT_SYMBOL(kernel_sock_shutdown); -/* This routine returns the IP overhead imposed by a socket i.e. - * the length of the underlying IP header, depending on whether - * this is an IPv4 or IPv6 socket and the length from IP options turned - * on at the socket. Assumes that the caller has a lock on the socket. +/** + * kernel_sock_ip_overhead - returns the IP overhead imposed by a socket + * @sk: socket + * + * This routine returns the IP overhead imposed by a socket i.e. + * the length of the underlying IP header, depending on whether + * this is an IPv4 or IPv6 socket and the length from IP options turned + * on at the socket. Assumes that the caller has a lock on the socket. */ + u32 kernel_sock_ip_overhead(struct sock *sk) { struct inet_sock *inet; From daa5c4d0167a308306525fd5ab9a5e18e21f4f74 Mon Sep 17 00:00:00 2001 From: Jerome Brunet Date: Thu, 14 Mar 2019 14:49:45 +0100 Subject: [PATCH 073/468] net: phy: meson-gxl: fix interrupt support If an interrupt is already pending when the interrupt is enabled on the GXL phy, no IRQ will ever be triggered. The fix is simply to make sure pending IRQs are cleared before setting up the irq mask. Fixes: cf127ff20af1 ("net: phy: meson-gxl: add interrupt support") Signed-off-by: Jerome Brunet Signed-off-by: David S. Miller --- drivers/net/phy/meson-gxl.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index a238388eb1a5..0eec2913c289 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -201,6 +201,7 @@ static int meson_gxl_ack_interrupt(struct phy_device *phydev) static int meson_gxl_config_intr(struct phy_device *phydev) { u16 val; + int ret; if (phydev->interrupts == PHY_INTERRUPT_ENABLED) { val = INTSRC_ANEG_PR @@ -213,6 +214,11 @@ static int meson_gxl_config_intr(struct phy_device *phydev) val = 0; } + /* Ack any pending IRQ */ + ret = meson_gxl_ack_interrupt(phydev); + if (ret) + return ret; + return phy_write(phydev, INTSRC_MASK, val); } From 4477138fa0ae4e1b699786ef0600863ea6e6c61c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 14 Mar 2019 20:19:47 -0700 Subject: [PATCH 074/468] tun: properly test for IFF_UP Same reasons than the ones explained in commit 4179cb5a4c92 ("vxlan: test dev->flags & IFF_UP before calling netif_rx()") netif_rx_ni() or napi_gro_frags() must be called under a strict contract. At device dismantle phase, core networking clears IFF_UP and flush_all_backlogs() is called after rcu grace period to make sure no incoming packet might be in a cpu backlog and still referencing the device. A similar protocol is used for gro layer. Most drivers call netif_rx() from their interrupt handler, and since the interrupts are disabled at device dismantle, netif_rx() does not have to check dev->flags & IFF_UP Virtual drivers do not have this guarantee, and must therefore make the check themselves. Fixes: 1bd4978a88ac ("tun: honor IFF_UP in tun_get_user()") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- drivers/net/tun.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 1d68921723dc..0d343359f647 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1763,9 +1763,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, int skb_xdp = 1; bool frags = tun_napi_frags_enabled(tfile); - if (!(tun->dev->flags & IFF_UP)) - return -EIO; - if (!(tun->flags & IFF_NO_PI)) { if (len < sizeof(pi)) return -EINVAL; @@ -1867,6 +1864,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, err = skb_copy_datagram_from_iter(skb, 0, from, len); if (err) { + err = -EFAULT; +drop: this_cpu_inc(tun->pcpu_stats->rx_dropped); kfree_skb(skb); if (frags) { @@ -1874,7 +1873,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, mutex_unlock(&tfile->napi_mutex); } - return -EFAULT; + return err; } } @@ -1958,6 +1957,12 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, !tfile->detached) rxhash = __skb_get_hash_symmetric(skb); + rcu_read_lock(); + if (unlikely(!(tun->dev->flags & IFF_UP))) { + err = -EIO; + goto drop; + } + if (frags) { /* Exercise flow dissector code path. */ u32 headlen = eth_get_headlen(skb->data, skb_headlen(skb)); @@ -1965,6 +1970,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, if (unlikely(headlen > skb_headlen(skb))) { this_cpu_inc(tun->pcpu_stats->rx_dropped); napi_free_frags(&tfile->napi); + rcu_read_unlock(); mutex_unlock(&tfile->napi_mutex); WARN_ON(1); return -ENOMEM; @@ -1992,6 +1998,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, } else { netif_rx_ni(skb); } + rcu_read_unlock(); stats = get_cpu_ptr(tun->pcpu_stats); u64_stats_update_begin(&stats->syncp); From 044175a06706d516aa42874bb44dbbfc3c4d20eb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Wed, 13 Mar 2019 15:15:49 +0100 Subject: [PATCH 075/468] xsk: fix umem memory leak on cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When the umem is cleaned up, the task that created it might already be gone. If the task was gone, the xdp_umem_release function did not free the pages member of struct xdp_umem. It turned out that the task lookup was not needed at all; The code was a left-over when we moved from task accounting to user accounting [1]. This patch fixes the memory leak by removing the task lookup logic completely. [1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/ Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/ Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: Jiri Slaby Signed-off-by: Björn Töpel Signed-off-by: Daniel Borkmann --- include/net/xdp_sock.h | 1 - net/xdp/xdp_umem.c | 19 +------------------ 2 files changed, 1 insertion(+), 19 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 61cf7dbb6782..d074b6d60f8a 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -36,7 +36,6 @@ struct xdp_umem { u32 headroom; u32 chunk_size_nohr; struct user_struct *user; - struct pid *pid; unsigned long address; refcount_t users; struct work_struct work; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 77520eacee8f..989e52386c35 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -193,9 +193,6 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem) static void xdp_umem_release(struct xdp_umem *umem) { - struct task_struct *task; - struct mm_struct *mm; - xdp_umem_clear_dev(umem); ida_simple_remove(&umem_ida, umem->id); @@ -214,21 +211,10 @@ static void xdp_umem_release(struct xdp_umem *umem) xdp_umem_unpin_pages(umem); - task = get_pid_task(umem->pid, PIDTYPE_PID); - put_pid(umem->pid); - if (!task) - goto out; - mm = get_task_mm(task); - put_task_struct(task); - if (!mm) - goto out; - - mmput(mm); kfree(umem->pages); umem->pages = NULL; xdp_umem_unaccount_pages(umem); -out: kfree(umem); } @@ -357,7 +343,6 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (size_chk < 0) return -EINVAL; - umem->pid = get_task_pid(current, PIDTYPE_PID); umem->address = (unsigned long)addr; umem->chunk_mask = ~((u64)chunk_size - 1); umem->size = size; @@ -373,7 +358,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) err = xdp_umem_account_pages(umem); if (err) - goto out; + return err; err = xdp_umem_pin_pages(umem); if (err) @@ -392,8 +377,6 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) out_account: xdp_umem_unaccount_pages(umem); -out: - put_pid(umem->pid); return err; } From 86be36f6502c52ddb4b85938145324fd07332da1 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Fri, 15 Mar 2019 20:21:19 +0530 Subject: [PATCH 076/468] powerpc: bpf: Fix generation of load/store DW instructions Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test was failing on powerpc64 BE, and rightfully indicated that the PPC_LD() macro is not masking away the last two bits of the offset per the ISA, resulting in the generation of 'lwa' instruction instead of the intended 'ld' instruction. Segher also pointed out that we can't simply mask away the last two bits as that will result in loading/storing from/to a memory location that was not intended. This patch addresses this by using ldx/stdx if the offset is not word-aligned. We load the offset into a temporary register (TMP_REG_2) and use that as the index register in a subsequent ldx/stdx. We fix PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL() and PPC_BPF_STL() to factor in the offset value and generate the proper instruction sequence. We also convert all existing users of PPC_LD() and PPC_STD() to use these macros. All existing uses of these macros have been audited to ensure that TMP_REG_2 can be clobbered. Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Yauheni Kaliuta Signed-off-by: Naveen N. Rao Signed-off-by: Daniel Borkmann --- arch/powerpc/include/asm/ppc-opcode.h | 2 ++ arch/powerpc/net/bpf_jit.h | 17 +++++------------ arch/powerpc/net/bpf_jit32.h | 4 ++++ arch/powerpc/net/bpf_jit64.h | 20 ++++++++++++++++++++ arch/powerpc/net/bpf_jit_comp64.c | 12 ++++++------ 5 files changed, 37 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index f9513ad38fa6..e6c9ad088f45 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -302,6 +302,7 @@ /* Misc instructions for BPF compiler */ #define PPC_INST_LBZ 0x88000000 #define PPC_INST_LD 0xe8000000 +#define PPC_INST_LDX 0x7c00002a #define PPC_INST_LHZ 0xa0000000 #define PPC_INST_LWZ 0x80000000 #define PPC_INST_LHBRX 0x7c00062c @@ -309,6 +310,7 @@ #define PPC_INST_STB 0x98000000 #define PPC_INST_STH 0xb0000000 #define PPC_INST_STD 0xf8000000 +#define PPC_INST_STDX 0x7c00012a #define PPC_INST_STDU 0xf8000001 #define PPC_INST_STW 0x90000000 #define PPC_INST_STWU 0x94000000 diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 549e9490ff2a..dcac37745b05 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -51,6 +51,8 @@ #define PPC_LIS(r, i) PPC_ADDIS(r, 0, i) #define PPC_STD(r, base, i) EMIT(PPC_INST_STD | ___PPC_RS(r) | \ ___PPC_RA(base) | ((i) & 0xfffc)) +#define PPC_STDX(r, base, b) EMIT(PPC_INST_STDX | ___PPC_RS(r) | \ + ___PPC_RA(base) | ___PPC_RB(b)) #define PPC_STDU(r, base, i) EMIT(PPC_INST_STDU | ___PPC_RS(r) | \ ___PPC_RA(base) | ((i) & 0xfffc)) #define PPC_STW(r, base, i) EMIT(PPC_INST_STW | ___PPC_RS(r) | \ @@ -65,7 +67,9 @@ #define PPC_LBZ(r, base, i) EMIT(PPC_INST_LBZ | ___PPC_RT(r) | \ ___PPC_RA(base) | IMM_L(i)) #define PPC_LD(r, base, i) EMIT(PPC_INST_LD | ___PPC_RT(r) | \ - ___PPC_RA(base) | IMM_L(i)) + ___PPC_RA(base) | ((i) & 0xfffc)) +#define PPC_LDX(r, base, b) EMIT(PPC_INST_LDX | ___PPC_RT(r) | \ + ___PPC_RA(base) | ___PPC_RB(b)) #define PPC_LWZ(r, base, i) EMIT(PPC_INST_LWZ | ___PPC_RT(r) | \ ___PPC_RA(base) | IMM_L(i)) #define PPC_LHZ(r, base, i) EMIT(PPC_INST_LHZ | ___PPC_RT(r) | \ @@ -85,17 +89,6 @@ ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_BPF_STDCX(s, a, b) EMIT(PPC_INST_STDCX | ___PPC_RS(s) | \ ___PPC_RA(a) | ___PPC_RB(b)) - -#ifdef CONFIG_PPC64 -#define PPC_BPF_LL(r, base, i) do { PPC_LD(r, base, i); } while(0) -#define PPC_BPF_STL(r, base, i) do { PPC_STD(r, base, i); } while(0) -#define PPC_BPF_STLU(r, base, i) do { PPC_STDU(r, base, i); } while(0) -#else -#define PPC_BPF_LL(r, base, i) do { PPC_LWZ(r, base, i); } while(0) -#define PPC_BPF_STL(r, base, i) do { PPC_STW(r, base, i); } while(0) -#define PPC_BPF_STLU(r, base, i) do { PPC_STWU(r, base, i); } while(0) -#endif - #define PPC_CMPWI(a, i) EMIT(PPC_INST_CMPWI | ___PPC_RA(a) | IMM_L(i)) #define PPC_CMPDI(a, i) EMIT(PPC_INST_CMPDI | ___PPC_RA(a) | IMM_L(i)) #define PPC_CMPW(a, b) EMIT(PPC_INST_CMPW | ___PPC_RA(a) | \ diff --git a/arch/powerpc/net/bpf_jit32.h b/arch/powerpc/net/bpf_jit32.h index 6f4daacad296..ade04547703f 100644 --- a/arch/powerpc/net/bpf_jit32.h +++ b/arch/powerpc/net/bpf_jit32.h @@ -123,6 +123,10 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh); #define PPC_NTOHS_OFFS(r, base, i) PPC_LHZ_OFFS(r, base, i) #endif +#define PPC_BPF_LL(r, base, i) do { PPC_LWZ(r, base, i); } while(0) +#define PPC_BPF_STL(r, base, i) do { PPC_STW(r, base, i); } while(0) +#define PPC_BPF_STLU(r, base, i) do { PPC_STWU(r, base, i); } while(0) + #define SEEN_DATAREF 0x10000 /* might call external helpers */ #define SEEN_XREG 0x20000 /* X reg is used */ #define SEEN_MEM 0x40000 /* SEEN_MEM+(1< MAX_TAIL_CALL_CNT) * goto out; */ - PPC_LD(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx)); + PPC_BPF_LL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx)); PPC_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT); PPC_BCC(COND_GT, out); @@ -265,7 +265,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 /* prog = array->ptrs[index]; */ PPC_MULI(b2p[TMP_REG_1], b2p_index, 8); PPC_ADD(b2p[TMP_REG_1], b2p[TMP_REG_1], b2p_bpf_array); - PPC_LD(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_array, ptrs)); + PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_array, ptrs)); /* * if (prog == NULL) @@ -275,7 +275,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 PPC_BCC(COND_EQ, out); /* goto *(prog->bpf_func + prologue_size); */ - PPC_LD(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_prog, bpf_func)); + PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_prog, bpf_func)); #ifdef PPC64_ELF_ABI_v1 /* skip past the function descriptor */ PPC_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1], @@ -606,7 +606,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, * the instructions generated will remain the * same across all passes */ - PPC_STD(dst_reg, 1, bpf_jit_stack_local(ctx)); + PPC_BPF_STL(dst_reg, 1, bpf_jit_stack_local(ctx)); PPC_ADDI(b2p[TMP_REG_1], 1, bpf_jit_stack_local(ctx)); PPC_LDBRX(dst_reg, 0, b2p[TMP_REG_1]); break; @@ -662,7 +662,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, PPC_LI32(b2p[TMP_REG_1], imm); src_reg = b2p[TMP_REG_1]; } - PPC_STD(src_reg, dst_reg, off); + PPC_BPF_STL(src_reg, dst_reg, off); break; /* @@ -709,7 +709,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, break; /* dst = *(u64 *)(ul) (src + off) */ case BPF_LDX | BPF_MEM | BPF_DW: - PPC_LD(dst_reg, src_reg, off); + PPC_BPF_LL(dst_reg, src_reg, off); break; /* From 6f19893b644a9454d85e593b5e90914e7a72b7dd Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Thu, 14 Mar 2019 23:20:16 -0500 Subject: [PATCH 077/468] net: openvswitch: fix a NULL pointer dereference upcall is dereferenced even when genlmsg_put fails. The fix goto out to avoid the NULL pointer dereference in this case. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/openvswitch/datapath.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 6679e96ab1dc..45d1469308b0 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -448,6 +448,10 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, upcall = genlmsg_put(user_skb, 0, 0, &dp_packet_genl_family, 0, upcall_info->cmd); + if (!upcall) { + err = -EINVAL; + goto out; + } upcall->dp_ifindex = dp_ifindex; err = ovs_nla_put_key(key, key, OVS_PACKET_ATTR_KEY, false, user_skb); From 0fff9bd47e1341b5c4db862cc39fc68ce45f165d Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Fri, 15 Mar 2019 01:11:22 -0500 Subject: [PATCH 078/468] net: openvswitch: fix missing checks for nla_nest_start nla_nest_start may fail and thus deserves a check. The fix returns -EMSGSIZE when it fails. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/openvswitch/datapath.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 45d1469308b0..9dd158ab51b3 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -464,6 +464,10 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, if (upcall_info->egress_tun_info) { nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_EGRESS_TUN_KEY); + if (!nla) { + err = -EMSGSIZE; + goto out; + } err = ovs_nla_put_tunnel_info(user_skb, upcall_info->egress_tun_info); BUG_ON(err); @@ -472,6 +476,10 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, if (upcall_info->actions_len) { nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS); + if (!nla) { + err = -EMSGSIZE; + goto out; + } err = ovs_nla_put_actions(upcall_info->actions, upcall_info->actions_len, user_skb); From 07660ca679da3007d3231938e2dfb415d3440716 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Fri, 15 Mar 2019 01:14:33 -0500 Subject: [PATCH 079/468] net: ncsi: fix a missing check for nla_nest_start nla_nest_start may fail and thus deserves a check. The fix returns -EMSGSIZE in case it fails. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/ncsi/ncsi-netlink.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c index 5d782445d2fc..bad17bba8ba7 100644 --- a/net/ncsi/ncsi-netlink.c +++ b/net/ncsi/ncsi-netlink.c @@ -251,6 +251,10 @@ static int ncsi_pkg_info_all_nl(struct sk_buff *skb, } attr = nla_nest_start(skb, NCSI_ATTR_PACKAGE_LIST); + if (!attr) { + rc = -EMSGSIZE; + goto err; + } rc = ncsi_write_package_info(skb, ndp, package->id); if (rc) { nla_nest_cancel(skb, attr); From 4589e28db46ee4961edfd794c5bb43887d38c8e5 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Fri, 15 Mar 2019 12:11:59 -0500 Subject: [PATCH 080/468] net: tipc: fix a missing check of nla_nest_start nla_nest_start could fail and requires a check. The fix returns -EMSGSIZE if it fails. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/tipc/group.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/tipc/group.c b/net/tipc/group.c index 06fee142f09f..63f39201e41e 100644 --- a/net/tipc/group.c +++ b/net/tipc/group.c @@ -919,6 +919,9 @@ int tipc_group_fill_sock_diag(struct tipc_group *grp, struct sk_buff *skb) { struct nlattr *group = nla_nest_start(skb, TIPC_NLA_SOCK_GROUP); + if (!group) + return -EMSGSIZE; + if (nla_put_u32(skb, TIPC_NLA_SOCK_GROUP_ID, grp->type) || nla_put_u32(skb, TIPC_NLA_SOCK_GROUP_INSTANCE, From 9180bb4f046064dfa4541488102703b402bb04e1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 16 Mar 2019 13:09:53 -0700 Subject: [PATCH 081/468] tun: add a missing rcu_read_unlock() in error path In my latest patch I missed one rcu_read_unlock(), in case device is down. Fixes: 4477138fa0ae ("tun: properly test for IFF_UP") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- drivers/net/tun.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 0d343359f647..e9ca1c088d0b 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1960,6 +1960,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, rcu_read_lock(); if (unlikely(!(tun->dev->flags & IFF_UP))) { err = -EIO; + rcu_read_unlock(); goto drop; } From 517ccc2aa50dbd7767a9eb8e1d9987a3ed7ced3e Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Sat, 16 Mar 2019 16:46:05 -0500 Subject: [PATCH 082/468] net: tipc: fix a missing check for nla_nest_start nla_nest_start may fail. The fix check its status and returns -EMSGSIZE in case it fails. Signed-off-by: Kangjie Lu Signed-off-by: David S. Miller --- net/tipc/socket.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 3274ef625dba..d6b26862b34e 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -3255,6 +3255,8 @@ static int __tipc_nl_add_sk_con(struct sk_buff *skb, struct tipc_sock *tsk) peer_port = tsk_peer_port(tsk); nest = nla_nest_start(skb, TIPC_NLA_SOCK_CON); + if (!nest) + return -EMSGSIZE; if (nla_put_u32(skb, TIPC_NLA_CON_NODE, peer_node)) goto msg_full; From 6958d11f77d45db80f7e22a21a74d4d5f44dc667 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Sun, 17 Mar 2019 15:21:49 -0700 Subject: [PATCH 083/468] xfs: don't trip over uninitialized buffer on extent read of corrupted inode We've had rather rare reports of bmap btree block corruption where the bmap root block has a level count of zero. The root cause of the corruption is so far unknown. We do have verifier checks to detect this form of on-disk corruption, but this doesn't cover a memory corruption variant of the problem. The latter is a reasonable possibility because the root block is part of the inode fork and can reside in-core for some time before inode extents are read. If this occurs, it leads to a system crash such as the following: BUG: unable to handle kernel paging request at ffffffff00000221 PF error: [normal kernel read fault] ... RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs] ... Call Trace: xfs_iread_extents+0x379/0x540 [xfs] xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs] ? xfs_attr_get+0xd1/0x120 [xfs] ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_iomap_begin+0x4c4/0x6d0 [xfs] ? __vfs_getxattr+0x53/0x70 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_apply+0x63/0x130 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_file_buffered_write+0x62/0x90 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs] __vfs_write+0x150/0x1b0 vfs_write+0xba/0x1c0 ksys_pwrite64+0x64/0xa0 do_syscall_64+0x5a/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The crash occurs because xfs_iread_extents() attempts to release an uninitialized buffer pointer as the level == 0 value prevented the buffer from ever being allocated or read. Change the level > 0 assert to an explicit error check in xfs_iread_extents() to avoid crashing the kernel in the event of localized, in-core inode corruption. Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_bmap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 48502cb9990f..ae4c3b0d84db 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1191,7 +1191,10 @@ xfs_iread_extents( * Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out. */ level = be16_to_cpu(block->bb_level); - ASSERT(level > 0); + if (unlikely(level == 0)) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp); + return -EFSCORRUPTED; + } pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes); bno = be64_to_cpu(*pp); From 65e9a6d25deb752d97b88335068dc0e7accbc9c3 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 17 Mar 2019 17:17:45 -0700 Subject: [PATCH 084/468] networking: fix snmp_counter.rst Doc. Warnings Fix documentation markup warnings in snmp_counter.rst: Documentation/networking/snmp_counter.rst:416: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:684: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:693: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:707: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:712: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:722: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:733: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:736: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:739: WARNING: Bullet list ends without a blank line; unexpected unindent. Fixes: 80cc49507ba48 ("net: Add part of TCP counts explanations in snmp_counters.rst") Fixes: 8e2ea53a83dfb ("add snmp counters document") Fixes: a6c7c7aac2de6 ("net: add document for several snmp counters") Signed-off-by: Randy Dunlap Cc: yupeng --- Documentation/networking/snmp_counter.rst | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index 52b026be028f..38a4edc4522b 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -413,7 +413,7 @@ algorithm. .. _F-RTO: https://tools.ietf.org/html/rfc5682 TCP Fast Path -============ +============= When kernel receives a TCP packet, it has two paths to handler the packet, one is fast path, another is slow path. The comment in kernel code provides a good explanation of them, I pasted them below:: @@ -681,6 +681,7 @@ The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender. * TcpExtTCPDSACKRecv + The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received. @@ -690,7 +691,7 @@ The TCP stack receives a DSACK, which indicate an out of order duplicate packet is received. invalid SACK and DSACK -==================== +====================== When a SACK (or DSACK) block is invalid, a corresponding counter would be updated. The validation method is base on the start/end sequence number of the SACK block. For more details, please refer the comment @@ -704,11 +705,13 @@ explaination: .. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32 * TcpExtTCPSACKDiscard + This counter indicates how many SACK blocks are invalid. If the invalid SACK block is caused by ACK recording, the TCP stack will only ignore it and won't update this counter. * TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo + When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket. If the undo_marker is not set, the TCP stack isn't @@ -719,7 +722,7 @@ will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld will be updated. As implied in its name, it might be an old packet. SACK shift -========= +========== The linux networking stack stores data in sk_buff struct (skb for short). If a SACK block acrosses multiple skb, the TCP stack will try to re-arrange data in these skb. E.g. if a SACK block acknowledges seq @@ -730,12 +733,15 @@ seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be discard, this operation is 'merge'. * TcpExtTCPSackShifted + A skb is shifted * TcpExtTCPSackMerged + A skb is merged * TcpExtTCPSackShiftFallback + A skb should be shifted or merged, but the TCP stack doesn't do it for some reasons. From ea239314fe42ace880bdd834256834679346c80e Mon Sep 17 00:00:00 2001 From: Erik Hugne Date: Sun, 17 Mar 2019 18:46:42 +0100 Subject: [PATCH 085/468] tipc: allow service ranges to be connect()'ed on RDM/DGRAM We move the check that prevents connecting service ranges to after the RDM/DGRAM check, and move address sanity control to a separate function that also validates the service range. Fixes: 23998835be98 ("tipc: improve address sanity check in tipc_connect()") Signed-off-by: Erik Hugne Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/socket.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index d6b26862b34e..b542f14ed444 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2349,6 +2349,16 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return 0; } +static bool tipc_sockaddr_is_sane(struct sockaddr_tipc *addr) +{ + if (addr->family != AF_TIPC) + return false; + if (addr->addrtype == TIPC_SERVICE_RANGE) + return (addr->addr.nameseq.lower <= addr->addr.nameseq.upper); + return (addr->addrtype == TIPC_SERVICE_ADDR || + addr->addrtype == TIPC_SOCKET_ADDR); +} + /** * tipc_connect - establish a connection to another TIPC port * @sock: socket structure @@ -2384,18 +2394,18 @@ static int tipc_connect(struct socket *sock, struct sockaddr *dest, if (!tipc_sk_type_connectionless(sk)) res = -EINVAL; goto exit; - } else if (dst->family != AF_TIPC) { - res = -EINVAL; } - if (dst->addrtype != TIPC_ADDR_ID && dst->addrtype != TIPC_ADDR_NAME) + if (!tipc_sockaddr_is_sane(dst)) { res = -EINVAL; - if (res) goto exit; - + } /* DGRAM/RDM connect(), just save the destaddr */ if (tipc_sk_type_connectionless(sk)) { memcpy(&tsk->peer, dest, destlen); goto exit; + } else if (dst->addrtype == TIPC_SERVICE_RANGE) { + res = -EINVAL; + goto exit; } previous = sk->sk_state; From cd1b772d4881d1cd15b90ec17aab9ac7950e8850 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 29 Oct 2018 16:32:31 +0100 Subject: [PATCH 086/468] driver core: remove BUS_ATTR() There are now no in-kernel users of BUS_ATTR() so drop it from device.h Everyone should use BUS_ATTR_RO/RW/WO() from now on. Cc: "Rafael J. Wysocki" Signed-off-by: Greg Kroah-Hartman --- include/linux/device.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/device.h b/include/linux/device.h index b425a7ee04ce..4e6987e11f68 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -49,8 +49,6 @@ struct bus_attribute { ssize_t (*store)(struct bus_type *bus, const char *buf, size_t count); }; -#define BUS_ATTR(_name, _mode, _show, _store) \ - struct bus_attribute bus_attr_##_name = __ATTR(_name, _mode, _show, _store) #define BUS_ATTR_RW(_name) \ struct bus_attribute bus_attr_##_name = __ATTR_RW(_name) #define BUS_ATTR_RO(_name) \ From bd31342f0046077e92062a6c09eae6c8f1676916 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 12 Mar 2019 10:09:37 +1100 Subject: [PATCH 087/468] staging: remove mt7621-eth driver/net/ethernet/mediatek/ now supports this hardware, so we don't need a separate driver. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman --- drivers/staging/Kconfig | 2 - drivers/staging/Makefile | 1 - .../bindings/net/mediatek-net-gsw.txt | 48 - drivers/staging/mt7621-eth/Kconfig | 39 - drivers/staging/mt7621-eth/Makefile | 14 - drivers/staging/mt7621-eth/TODO | 13 - drivers/staging/mt7621-eth/ethtool.c | 250 -- drivers/staging/mt7621-eth/ethtool.h | 15 - drivers/staging/mt7621-eth/gsw_mt7620.h | 277 --- drivers/staging/mt7621-eth/gsw_mt7621.c | 297 --- drivers/staging/mt7621-eth/mdio.c | 275 --- drivers/staging/mt7621-eth/mdio.h | 27 - drivers/staging/mt7621-eth/mdio_mt7620.c | 173 -- drivers/staging/mt7621-eth/mtk_eth_soc.c | 2176 ----------------- drivers/staging/mt7621-eth/mtk_eth_soc.h | 716 ------ drivers/staging/mt7621-eth/soc_mt7621.c | 161 -- 16 files changed, 4484 deletions(-) delete mode 100644 drivers/staging/mt7621-eth/Documentation/devicetree/bindings/net/mediatek-net-gsw.txt delete mode 100644 drivers/staging/mt7621-eth/Kconfig delete mode 100644 drivers/staging/mt7621-eth/Makefile delete mode 100644 drivers/staging/mt7621-eth/TODO delete mode 100644 drivers/staging/mt7621-eth/ethtool.c delete mode 100644 drivers/staging/mt7621-eth/ethtool.h delete mode 100644 drivers/staging/mt7621-eth/gsw_mt7620.h delete mode 100644 drivers/staging/mt7621-eth/gsw_mt7621.c delete mode 100644 drivers/staging/mt7621-eth/mdio.c delete mode 100644 drivers/staging/mt7621-eth/mdio.h delete mode 100644 drivers/staging/mt7621-eth/mdio_mt7620.c delete mode 100644 drivers/staging/mt7621-eth/mtk_eth_soc.c delete mode 100644 drivers/staging/mt7621-eth/mtk_eth_soc.h delete mode 100644 drivers/staging/mt7621-eth/soc_mt7621.c diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig index c0901b96cfe4..62951e836cbc 100644 --- a/drivers/staging/Kconfig +++ b/drivers/staging/Kconfig @@ -114,8 +114,6 @@ source "drivers/staging/ralink-gdma/Kconfig" source "drivers/staging/mt7621-mmc/Kconfig" -source "drivers/staging/mt7621-eth/Kconfig" - source "drivers/staging/mt7621-dts/Kconfig" source "drivers/staging/gasket/Kconfig" diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile index 57c6bce13ff4..d1b17ddcd354 100644 --- a/drivers/staging/Makefile +++ b/drivers/staging/Makefile @@ -47,7 +47,6 @@ obj-$(CONFIG_SPI_MT7621) += mt7621-spi/ obj-$(CONFIG_SOC_MT7621) += mt7621-dma/ obj-$(CONFIG_DMA_RALINK) += ralink-gdma/ obj-$(CONFIG_MTK_MMC) += mt7621-mmc/ -obj-$(CONFIG_NET_MEDIATEK_SOC_STAGING) += mt7621-eth/ obj-$(CONFIG_SOC_MT7621) += mt7621-dts/ obj-$(CONFIG_STAGING_GASKET_FRAMEWORK) += gasket/ obj-$(CONFIG_XIL_AXIS_FIFO) += axis-fifo/ diff --git a/drivers/staging/mt7621-eth/Documentation/devicetree/bindings/net/mediatek-net-gsw.txt b/drivers/staging/mt7621-eth/Documentation/devicetree/bindings/net/mediatek-net-gsw.txt deleted file mode 100644 index 596b38552697..000000000000 --- a/drivers/staging/mt7621-eth/Documentation/devicetree/bindings/net/mediatek-net-gsw.txt +++ /dev/null @@ -1,48 +0,0 @@ -Mediatek Gigabit Switch -======================= - -The mediatek gigabit switch can be found on Mediatek SoCs. - -Required properties: -- compatible: Should be "mediatek,mt7620-gsw", "mediatek,mt7621-gsw", - "mediatek,mt7623-gsw" -- reg: Address and length of the register set for the device -- interrupts: Should contain the gigabit switches interrupt - - -Additional required properties for ARM based SoCs: -- mediatek,reset-pin: phandle describing the reset GPIO -- clocks: the clocks used by the switch -- clock-names: the names of the clocks listed in the clocks property - these should be "trgpll", "esw", "gp2", "gp1" -- mt7530-supply: the phandle of the regulator used to power the switch -- mediatek,pctl-regmap: phandle to the port control regmap. this is used to - setup the drive current - - -Optional properties: -- interrupt-parent: Should be the phandle for the interrupt controller - that services interrupts for this device - -Example: - -gsw: switch@1b100000 { - compatible = "mediatek,mt7623-gsw"; - reg = <0 0x1b110000 0 0x300000>; - - interrupt-parent = <&pio>; - interrupts = <168 IRQ_TYPE_EDGE_RISING>; - - clocks = <&apmixedsys CLK_APMIXED_TRGPLL>, - <ðsys CLK_ETHSYS_ESW>, - <ðsys CLK_ETHSYS_GP2>, - <ðsys CLK_ETHSYS_GP1>; - clock-names = "trgpll", "esw", "gp2", "gp1"; - - mt7530-supply = <&mt6323_vpa_reg>; - - mediatek,pctl-regmap = <&syscfg_pctl_a>; - mediatek,reset-pin = <&pio 15 0>; - - status = "okay"; -}; diff --git a/drivers/staging/mt7621-eth/Kconfig b/drivers/staging/mt7621-eth/Kconfig deleted file mode 100644 index 44ea86c7a96c..000000000000 --- a/drivers/staging/mt7621-eth/Kconfig +++ /dev/null @@ -1,39 +0,0 @@ -config NET_VENDOR_MEDIATEK_STAGING - bool "MediaTek ethernet driver - staging version" - depends on RALINK - ---help--- - If you have an MT7621 Mediatek SoC with ethernet, say Y. - -if NET_VENDOR_MEDIATEK_STAGING -choice - prompt "MAC type" - -config NET_MEDIATEK_MT7621 - bool "MT7621" - depends on MIPS && SOC_MT7621 - -endchoice - -config NET_MEDIATEK_SOC_STAGING - tristate "MediaTek SoC Gigabit Ethernet support" - depends on NET_VENDOR_MEDIATEK_STAGING - select PHYLIB - ---help--- - This driver supports the gigabit ethernet MACs in the - MediaTek SoC family. - -config NET_MEDIATEK_MDIO - def_bool NET_MEDIATEK_SOC_STAGING - depends on NET_MEDIATEK_MT7621 - select PHYLIB - -config NET_MEDIATEK_MDIO_MT7620 - def_bool NET_MEDIATEK_SOC_STAGING - depends on NET_MEDIATEK_MT7621 - select NET_MEDIATEK_MDIO - -config NET_MEDIATEK_GSW_MT7621 - def_tristate NET_MEDIATEK_SOC_STAGING - depends on NET_MEDIATEK_MT7621 - -endif #NET_VENDOR_MEDIATEK_STAGING diff --git a/drivers/staging/mt7621-eth/Makefile b/drivers/staging/mt7621-eth/Makefile deleted file mode 100644 index 018bcc3596b3..000000000000 --- a/drivers/staging/mt7621-eth/Makefile +++ /dev/null @@ -1,14 +0,0 @@ -# -# Makefile for the Ralink SoCs built-in ethernet macs -# - -mtk-eth-soc-y += mtk_eth_soc.o ethtool.o - -mtk-eth-soc-$(CONFIG_NET_MEDIATEK_MDIO) += mdio.o -mtk-eth-soc-$(CONFIG_NET_MEDIATEK_MDIO_MT7620) += mdio_mt7620.o - -mtk-eth-soc-$(CONFIG_NET_MEDIATEK_MT7621) += soc_mt7621.o - -obj-$(CONFIG_NET_MEDIATEK_GSW_MT7621) += gsw_mt7621.o - -obj-$(CONFIG_NET_MEDIATEK_SOC_STAGING) += mtk-eth-soc.o diff --git a/drivers/staging/mt7621-eth/TODO b/drivers/staging/mt7621-eth/TODO deleted file mode 100644 index f9e47d4b4cd4..000000000000 --- a/drivers/staging/mt7621-eth/TODO +++ /dev/null @@ -1,13 +0,0 @@ - -- verify devicetree documentation is consistent with code -- fix ethtool - currently doesn't return valid data. -- general code review and clean up -- add support for second MAC on mt7621 -- convert gsw code to use switchdev interfaces -- md7620_mmi_write etc should probably be wrapped - in a regmap abstraction. -- Get soc_mt7621 to work with QDMA TX if possible. -- Ensure phys are correctly configured when a cable - is plugged in. - -Cc: NeilBrown diff --git a/drivers/staging/mt7621-eth/ethtool.c b/drivers/staging/mt7621-eth/ethtool.c deleted file mode 100644 index 8c4228e2c987..000000000000 --- a/drivers/staging/mt7621-eth/ethtool.c +++ /dev/null @@ -1,250 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include "mtk_eth_soc.h" -#include "ethtool.h" - -struct mtk_stat { - char name[ETH_GSTRING_LEN]; - unsigned int idx; -}; - -#define MTK_HW_STAT(stat) { \ - .name = #stat, \ - .idx = offsetof(struct mtk_hw_stats, stat) / sizeof(u64) \ -} - -static const struct mtk_stat mtk_ethtool_hw_stats[] = { - MTK_HW_STAT(tx_bytes), - MTK_HW_STAT(tx_packets), - MTK_HW_STAT(tx_skip), - MTK_HW_STAT(tx_collisions), - MTK_HW_STAT(rx_bytes), - MTK_HW_STAT(rx_packets), - MTK_HW_STAT(rx_overflow), - MTK_HW_STAT(rx_fcs_errors), - MTK_HW_STAT(rx_short_errors), - MTK_HW_STAT(rx_long_errors), - MTK_HW_STAT(rx_checksum_errors), - MTK_HW_STAT(rx_flow_control_packets), -}; - -#define MTK_HW_STATS_LEN ARRAY_SIZE(mtk_ethtool_hw_stats) - -static int mtk_get_link_ksettings(struct net_device *dev, - struct ethtool_link_ksettings *cmd) -{ - struct mtk_mac *mac = netdev_priv(dev); - int err; - - if (!mac->phy_dev) - return -ENODEV; - - if (mac->phy_flags == MTK_PHY_FLAG_ATTACH) { - err = phy_read_status(mac->phy_dev); - if (err) - return -ENODEV; - } - - phy_ethtool_ksettings_get(mac->phy_dev, cmd); - return 0; -} - -static int mtk_set_link_ksettings(struct net_device *dev, - const struct ethtool_link_ksettings *cmd) -{ - struct mtk_mac *mac = netdev_priv(dev); - - if (!mac->phy_dev) - return -ENODEV; - - if (cmd->base.phy_address != mac->phy_dev->mdio.addr) { - if (mac->hw->phy->phy_node[cmd->base.phy_address]) { - mac->phy_dev = mac->hw->phy->phy[cmd->base.phy_address]; - mac->phy_flags = MTK_PHY_FLAG_PORT; - } else if (mac->hw->mii_bus) { - mac->phy_dev = mdiobus_get_phy(mac->hw->mii_bus, - cmd->base.phy_address); - if (!mac->phy_dev) - return -ENODEV; - mac->phy_flags = MTK_PHY_FLAG_ATTACH; - } else { - return -ENODEV; - } - } - - return phy_ethtool_ksettings_set(mac->phy_dev, cmd); -} - -static void mtk_get_drvinfo(struct net_device *dev, - struct ethtool_drvinfo *info) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_soc_data *soc = mac->hw->soc; - - strlcpy(info->driver, mac->hw->dev->driver->name, sizeof(info->driver)); - strlcpy(info->bus_info, dev_name(mac->hw->dev), sizeof(info->bus_info)); - - if (soc->reg_table[MTK_REG_MTK_COUNTER_BASE]) - info->n_stats = MTK_HW_STATS_LEN; -} - -static u32 mtk_get_msglevel(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - - return mac->hw->msg_enable; -} - -static void mtk_set_msglevel(struct net_device *dev, u32 value) -{ - struct mtk_mac *mac = netdev_priv(dev); - - mac->hw->msg_enable = value; -} - -static int mtk_nway_reset(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - - if (!mac->phy_dev) - return -EOPNOTSUPP; - - return genphy_restart_aneg(mac->phy_dev); -} - -static u32 mtk_get_link(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - int err; - - if (!mac->phy_dev) - goto out_get_link; - - if (mac->phy_flags == MTK_PHY_FLAG_ATTACH) { - err = genphy_update_link(mac->phy_dev); - if (err) - goto out_get_link; - } - - return mac->phy_dev->link; - -out_get_link: - return ethtool_op_get_link(dev); -} - -static int mtk_set_ringparam(struct net_device *dev, - struct ethtool_ringparam *ring) -{ - struct mtk_mac *mac = netdev_priv(dev); - - if ((ring->tx_pending < 2) || - (ring->rx_pending < 2) || - (ring->rx_pending > mac->hw->soc->dma_ring_size) || - (ring->tx_pending > mac->hw->soc->dma_ring_size)) - return -EINVAL; - - dev->netdev_ops->ndo_stop(dev); - - mac->hw->tx_ring.tx_ring_size = BIT(fls(ring->tx_pending) - 1); - mac->hw->rx_ring[0].rx_ring_size = BIT(fls(ring->rx_pending) - 1); - - return dev->netdev_ops->ndo_open(dev); -} - -static void mtk_get_ringparam(struct net_device *dev, - struct ethtool_ringparam *ring) -{ - struct mtk_mac *mac = netdev_priv(dev); - - ring->rx_max_pending = mac->hw->soc->dma_ring_size; - ring->tx_max_pending = mac->hw->soc->dma_ring_size; - ring->rx_pending = mac->hw->rx_ring[0].rx_ring_size; - ring->tx_pending = mac->hw->tx_ring.tx_ring_size; -} - -static void mtk_get_strings(struct net_device *dev, u32 stringset, u8 *data) -{ - int i; - - switch (stringset) { - case ETH_SS_STATS: - for (i = 0; i < MTK_HW_STATS_LEN; i++) { - memcpy(data, mtk_ethtool_hw_stats[i].name, - ETH_GSTRING_LEN); - data += ETH_GSTRING_LEN; - } - break; - } -} - -static int mtk_get_sset_count(struct net_device *dev, int sset) -{ - switch (sset) { - case ETH_SS_STATS: - return MTK_HW_STATS_LEN; - default: - return -EOPNOTSUPP; - } -} - -static void mtk_get_ethtool_stats(struct net_device *dev, - struct ethtool_stats *stats, u64 *data) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_hw_stats *hwstats = mac->hw_stats; - unsigned int start; - int i; - - if (netif_running(dev) && netif_device_present(dev)) { - if (spin_trylock(&hwstats->stats_lock)) { - mtk_stats_update_mac(mac); - spin_unlock(&hwstats->stats_lock); - } - } - - do { - start = u64_stats_fetch_begin_irq(&hwstats->syncp); - for (i = 0; i < MTK_HW_STATS_LEN; i++) - data[i] = ((u64 *)hwstats)[mtk_ethtool_hw_stats[i].idx]; - - } while (u64_stats_fetch_retry_irq(&hwstats->syncp, start)); -} - -static struct ethtool_ops mtk_ethtool_ops = { - .get_link_ksettings = mtk_get_link_ksettings, - .set_link_ksettings = mtk_set_link_ksettings, - .get_drvinfo = mtk_get_drvinfo, - .get_msglevel = mtk_get_msglevel, - .set_msglevel = mtk_set_msglevel, - .nway_reset = mtk_nway_reset, - .get_link = mtk_get_link, - .set_ringparam = mtk_set_ringparam, - .get_ringparam = mtk_get_ringparam, -}; - -void mtk_set_ethtool_ops(struct net_device *netdev) -{ - struct mtk_mac *mac = netdev_priv(netdev); - struct mtk_soc_data *soc = mac->hw->soc; - - if (soc->reg_table[MTK_REG_MTK_COUNTER_BASE]) { - mtk_ethtool_ops.get_strings = mtk_get_strings; - mtk_ethtool_ops.get_sset_count = mtk_get_sset_count; - mtk_ethtool_ops.get_ethtool_stats = mtk_get_ethtool_stats; - } - - netdev->ethtool_ops = &mtk_ethtool_ops; -} diff --git a/drivers/staging/mt7621-eth/ethtool.h b/drivers/staging/mt7621-eth/ethtool.h deleted file mode 100644 index 0071469aea6c..000000000000 --- a/drivers/staging/mt7621-eth/ethtool.h +++ /dev/null @@ -1,15 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#ifndef MTK_ETHTOOL_H -#define MTK_ETHTOOL_H - -#include - -void mtk_set_ethtool_ops(struct net_device *netdev); - -#endif /* MTK_ETHTOOL_H */ diff --git a/drivers/staging/mt7621-eth/gsw_mt7620.h b/drivers/staging/mt7621-eth/gsw_mt7620.h deleted file mode 100644 index 70f7e5481952..000000000000 --- a/drivers/staging/mt7621-eth/gsw_mt7620.h +++ /dev/null @@ -1,277 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#ifndef _RALINK_GSW_MT7620_H__ -#define _RALINK_GSW_MT7620_H__ - -#define GSW_REG_PHY_TIMEOUT (5 * HZ) - -#define MT7620_GSW_REG_PIAC 0x0004 - -#define GSW_NUM_VLANS 16 -#define GSW_NUM_VIDS 4096 -#define GSW_NUM_PORTS 7 -#define GSW_PORT6 6 - -#define GSW_MDIO_ACCESS BIT(31) -#define GSW_MDIO_READ BIT(19) -#define GSW_MDIO_WRITE BIT(18) -#define GSW_MDIO_START BIT(16) -#define GSW_MDIO_ADDR_SHIFT 20 -#define GSW_MDIO_REG_SHIFT 25 - -#define GSW_REG_PORT_PMCR(x) (0x3000 + (x * 0x100)) -#define GSW_REG_PORT_STATUS(x) (0x3008 + (x * 0x100)) -#define GSW_REG_SMACCR0 0x3fE4 -#define GSW_REG_SMACCR1 0x3fE8 -#define GSW_REG_CKGCR 0x3ff0 - -#define GSW_REG_IMR 0x7008 -#define GSW_REG_ISR 0x700c -#define GSW_REG_GPC1 0x7014 - -#define SYSC_REG_CHIP_REV_ID 0x0c -#define SYSC_REG_CFG 0x10 -#define SYSC_REG_CFG1 0x14 -#define RST_CTRL_MCM BIT(2) -#define SYSC_PAD_RGMII2_MDIO 0x58 -#define SYSC_GPIO_MODE 0x60 - -#define PORT_IRQ_ST_CHG 0x7f - -#define MT7621_ESW_PHY_POLLING 0x0000 -#define MT7620_ESW_PHY_POLLING 0x7000 - -#define PMCR_IPG BIT(18) -#define PMCR_MAC_MODE BIT(16) -#define PMCR_FORCE BIT(15) -#define PMCR_TX_EN BIT(14) -#define PMCR_RX_EN BIT(13) -#define PMCR_BACKOFF BIT(9) -#define PMCR_BACKPRES BIT(8) -#define PMCR_RX_FC BIT(5) -#define PMCR_TX_FC BIT(4) -#define PMCR_SPEED(_x) (_x << 2) -#define PMCR_DUPLEX BIT(1) -#define PMCR_LINK BIT(0) - -#define PHY_AN_EN BIT(31) -#define PHY_PRE_EN BIT(30) -#define PMY_MDC_CONF(_x) ((_x & 0x3f) << 24) - -/* ethernet subsystem config register */ -#define ETHSYS_SYSCFG0 0x14 -/* ethernet subsystem clock register */ -#define ETHSYS_CLKCFG0 0x2c -#define ETHSYS_TRGMII_CLK_SEL362_5 BIT(11) - -/* p5 RGMII wrapper TX clock control register */ -#define MT7530_P5RGMIITXCR 0x7b04 -/* p5 RGMII wrapper RX clock control register */ -#define MT7530_P5RGMIIRXCR 0x7b00 -/* TRGMII TDX ODT registers */ -#define MT7530_TRGMII_TD0_ODT 0x7a54 -#define MT7530_TRGMII_TD1_ODT 0x7a5c -#define MT7530_TRGMII_TD2_ODT 0x7a64 -#define MT7530_TRGMII_TD3_ODT 0x7a6c -#define MT7530_TRGMII_TD4_ODT 0x7a74 -#define MT7530_TRGMII_TD5_ODT 0x7a7c -/* TRGMII TCK ctrl register */ -#define MT7530_TRGMII_TCK_CTRL 0x7a78 -/* TRGMII Tx ctrl register */ -#define MT7530_TRGMII_TXCTRL 0x7a40 -/* port 6 extended control register */ -#define MT7530_P6ECR 0x7830 -/* IO driver control register */ -#define MT7530_IO_DRV_CR 0x7810 -/* top signal control register */ -#define MT7530_TOP_SIG_CTRL 0x7808 -/* modified hwtrap register */ -#define MT7530_MHWTRAP 0x7804 -/* hwtrap status register */ -#define MT7530_HWTRAP 0x7800 -/* status interrupt register */ -#define MT7530_SYS_INT_STS 0x700c -/* system nterrupt register */ -#define MT7530_SYS_INT_EN 0x7008 -/* system control register */ -#define MT7530_SYS_CTRL 0x7000 -/* port MAC status register */ -#define MT7530_PMSR_P(x) (0x3008 + (x * 0x100)) -/* port MAC control register */ -#define MT7530_PMCR_P(x) (0x3000 + (x * 0x100)) - -#define MT7621_XTAL_SHIFT 6 -#define MT7621_XTAL_MASK 0x7 -#define MT7621_XTAL_25 6 -#define MT7621_XTAL_40 3 -#define MT7621_MDIO_DRV_MASK (3 << 4) -#define MT7621_GE1_MODE_MASK (3 << 12) - -#define TRGMII_TXCTRL_TXC_INV BIT(30) -#define P6ECR_INTF_MODE_RGMII BIT(1) -#define P5RGMIIRXCR_C_ALIGN BIT(8) -#define P5RGMIIRXCR_DELAY_2 BIT(1) -#define P5RGMIITXCR_DELAY_2 (BIT(8) | BIT(2)) - -/* TOP_SIG_CTRL bits */ -#define TOP_SIG_CTRL_NORMAL (BIT(17) | BIT(16)) - -/* MHWTRAP bits */ -#define MHWTRAP_MANUAL BIT(16) -#define MHWTRAP_P5_MAC_SEL BIT(13) -#define MHWTRAP_P6_DIS BIT(8) -#define MHWTRAP_P5_RGMII_MODE BIT(7) -#define MHWTRAP_P5_DIS BIT(6) -#define MHWTRAP_PHY_ACCESS BIT(5) - -/* HWTRAP bits */ -#define HWTRAP_XTAL_SHIFT 9 -#define HWTRAP_XTAL_MASK 0x3 - -/* SYS_CTRL bits */ -#define SYS_CTRL_SW_RST BIT(1) -#define SYS_CTRL_REG_RST BIT(0) - -/* PMCR bits */ -#define PMCR_IFG_XMIT_96 BIT(18) -#define PMCR_MAC_MODE BIT(16) -#define PMCR_FORCE_MODE BIT(15) -#define PMCR_TX_EN BIT(14) -#define PMCR_RX_EN BIT(13) -#define PMCR_BACK_PRES_EN BIT(9) -#define PMCR_BACKOFF_EN BIT(8) -#define PMCR_TX_FC_EN BIT(5) -#define PMCR_RX_FC_EN BIT(4) -#define PMCR_FORCE_SPEED_1000 BIT(3) -#define PMCR_FORCE_FDX BIT(1) -#define PMCR_FORCE_LNK BIT(0) -#define PMCR_FIXED_LINK (PMCR_IFG_XMIT_96 | PMCR_MAC_MODE | \ - PMCR_FORCE_MODE | PMCR_TX_EN | PMCR_RX_EN | \ - PMCR_BACK_PRES_EN | PMCR_BACKOFF_EN | \ - PMCR_FORCE_SPEED_1000 | PMCR_FORCE_FDX | \ - PMCR_FORCE_LNK) - -#define PMCR_FIXED_LINK_FC (PMCR_FIXED_LINK | \ - PMCR_TX_FC_EN | PMCR_RX_FC_EN) - -/* TRGMII control registers */ -#define GSW_INTF_MODE 0x390 -#define GSW_TRGMII_TD0_ODT 0x354 -#define GSW_TRGMII_TD1_ODT 0x35c -#define GSW_TRGMII_TD2_ODT 0x364 -#define GSW_TRGMII_TD3_ODT 0x36c -#define GSW_TRGMII_TXCTL_ODT 0x374 -#define GSW_TRGMII_TCK_ODT 0x37c -#define GSW_TRGMII_RCK_CTRL 0x300 - -#define INTF_MODE_TRGMII BIT(1) -#define TRGMII_RCK_CTRL_RX_RST BIT(31) - -/* Mac control registers */ -#define MTK_MAC_P2_MCR 0x200 -#define MTK_MAC_P1_MCR 0x100 - -#define MAC_MCR_MAX_RX_2K BIT(29) -#define MAC_MCR_IPG_CFG (BIT(18) | BIT(16)) -#define MAC_MCR_FORCE_MODE BIT(15) -#define MAC_MCR_TX_EN BIT(14) -#define MAC_MCR_RX_EN BIT(13) -#define MAC_MCR_BACKOFF_EN BIT(9) -#define MAC_MCR_BACKPR_EN BIT(8) -#define MAC_MCR_FORCE_RX_FC BIT(5) -#define MAC_MCR_FORCE_TX_FC BIT(4) -#define MAC_MCR_SPEED_1000 BIT(3) -#define MAC_MCR_FORCE_DPX BIT(1) -#define MAC_MCR_FORCE_LINK BIT(0) -#define MAC_MCR_FIXED_LINK (MAC_MCR_MAX_RX_2K | MAC_MCR_IPG_CFG | \ - MAC_MCR_FORCE_MODE | MAC_MCR_TX_EN | \ - MAC_MCR_RX_EN | MAC_MCR_BACKOFF_EN | \ - MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_RX_FC | \ - MAC_MCR_FORCE_TX_FC | MAC_MCR_SPEED_1000 | \ - MAC_MCR_FORCE_DPX | MAC_MCR_FORCE_LINK) -#define MAC_MCR_FIXED_LINK_FC (MAC_MCR_MAX_RX_2K | MAC_MCR_IPG_CFG | \ - MAC_MCR_FIXED_LINK) - -/* possible XTAL speed */ -#define MT7623_XTAL_40 0 -#define MT7623_XTAL_20 1 -#define MT7623_XTAL_25 3 - -/* GPIO port control registers */ -#define GPIO_OD33_CTRL8 0x4c0 -#define GPIO_BIAS_CTRL 0xed0 -#define GPIO_DRV_SEL10 0xf00 - -/* on MT7620 the functio of port 4 can be software configured */ -enum { - PORT4_EPHY = 0, - PORT4_EXT, -}; - -/* struct mt7620_gsw - the structure that holds the SoC specific data - * @dev: The Device struct - * @base: The base address - * @piac_offset: The PIAC base may change depending on SoC - * @irq: The IRQ we are using - * @port4: The port4 mode on MT7620 - * @autopoll: Is MDIO autopolling enabled - * @ethsys: The ethsys register map - * @pctl: The pin control register map - * @clk_gsw: The switch clock - * @clk_gp1: The gmac1 clock - * @clk_gp2: The gmac2 clock - * @clk_trgpll: The trgmii pll clock - */ -struct mt7620_gsw { - struct device *dev; - void __iomem *base; - u32 piac_offset; - int irq; - int port4; - unsigned long int autopoll; - - struct regmap *ethsys; - struct regmap *pctl; - - struct clk *clk_gsw; - struct clk *clk_gp1; - struct clk *clk_gp2; - struct clk *clk_trgpll; -}; - -/* switch register I/O wrappers */ -void mtk_switch_w32(struct mt7620_gsw *gsw, u32 val, unsigned int reg); -u32 mtk_switch_r32(struct mt7620_gsw *gsw, unsigned int reg); - -/* the callback used by the driver core to bringup the switch */ -int mtk_gsw_init(struct mtk_eth *eth); - -/* MDIO access wrappers */ -int mt7620_mdio_write(struct mii_bus *bus, int phy_addr, int phy_reg, u16 val); -int mt7620_mdio_read(struct mii_bus *bus, int phy_addr, int phy_reg); -void mt7620_mdio_link_adjust(struct mtk_eth *eth, int port); -int mt7620_has_carrier(struct mtk_eth *eth); -void mt7620_print_link_state(struct mtk_eth *eth, int port, int link, - int speed, int duplex); -void mt7530_mdio_w32(struct mt7620_gsw *gsw, u32 reg, u32 val); -u32 mt7530_mdio_r32(struct mt7620_gsw *gsw, u32 reg); -void mt7530_mdio_m32(struct mt7620_gsw *gsw, u32 mask, u32 set, u32 reg); - -u32 _mt7620_mii_write(struct mt7620_gsw *gsw, u32 phy_addr, - u32 phy_register, u32 write_data); -u32 _mt7620_mii_read(struct mt7620_gsw *gsw, int phy_addr, int phy_reg); -void mt7620_handle_carrier(struct mtk_eth *eth); - -#endif diff --git a/drivers/staging/mt7621-eth/gsw_mt7621.c b/drivers/staging/mt7621-eth/gsw_mt7621.c deleted file mode 100644 index 53767b17bad9..000000000000 --- a/drivers/staging/mt7621-eth/gsw_mt7621.c +++ /dev/null @@ -1,297 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include -#include -#include -#include -#include -#include - -#include - -#include "mtk_eth_soc.h" -#include "gsw_mt7620.h" - -void mtk_switch_w32(struct mt7620_gsw *gsw, u32 val, unsigned int reg) -{ - iowrite32(val, gsw->base + reg); -} -EXPORT_SYMBOL_GPL(mtk_switch_w32); - -u32 mtk_switch_r32(struct mt7620_gsw *gsw, unsigned int reg) -{ - return ioread32(gsw->base + reg); -} -EXPORT_SYMBOL_GPL(mtk_switch_r32); - -static irqreturn_t gsw_interrupt_mt7621(int irq, void *_eth) -{ - struct mtk_eth *eth = (struct mtk_eth *)_eth; - struct mt7620_gsw *gsw = (struct mt7620_gsw *)eth->sw_priv; - u32 reg, i; - - reg = mt7530_mdio_r32(gsw, MT7530_SYS_INT_STS); - - for (i = 0; i < 5; i++) { - unsigned int link; - - if ((reg & BIT(i)) == 0) - continue; - - link = mt7530_mdio_r32(gsw, MT7530_PMSR_P(i)) & 0x1; - - if (link == eth->link[i]) - continue; - - eth->link[i] = link; - if (link) - netdev_info(*eth->netdev, - "port %d link up\n", i); - else - netdev_info(*eth->netdev, - "port %d link down\n", i); - } - - mt7530_mdio_w32(gsw, MT7530_SYS_INT_STS, 0x1f); - - return IRQ_HANDLED; -} - -static void mt7621_hw_init(struct mtk_eth *eth, struct mt7620_gsw *gsw, - struct device_node *np) -{ - u32 i; - u32 val; - - /* hardware reset the switch */ - mtk_reset(eth, RST_CTRL_MCM); - mdelay(10); - - /* reduce RGMII2 PAD driving strength */ - rt_sysc_m32(MT7621_MDIO_DRV_MASK, 0, SYSC_PAD_RGMII2_MDIO); - - /* gpio mux - RGMII1=Normal mode */ - rt_sysc_m32(BIT(14), 0, SYSC_GPIO_MODE); - - /* set GMAC1 RGMII mode */ - rt_sysc_m32(MT7621_GE1_MODE_MASK, 0, SYSC_REG_CFG1); - - /* enable MDIO to control MT7530 */ - rt_sysc_m32(3 << 12, 0, SYSC_GPIO_MODE); - - /* turn off all PHYs */ - for (i = 0; i <= 4; i++) { - val = _mt7620_mii_read(gsw, i, 0x0); - val |= BIT(11); - _mt7620_mii_write(gsw, i, 0x0, val); - } - - /* reset the switch */ - mt7530_mdio_w32(gsw, MT7530_SYS_CTRL, - SYS_CTRL_SW_RST | SYS_CTRL_REG_RST); - usleep_range(10, 20); - - if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) { - /* GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536 */ - mtk_switch_w32(gsw, MAC_MCR_FIXED_LINK, MTK_MAC_P2_MCR); - mt7530_mdio_w32(gsw, MT7530_PMCR_P(6), PMCR_FIXED_LINK); - } else { - /* GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536 */ - mtk_switch_w32(gsw, MAC_MCR_FIXED_LINK_FC, MTK_MAC_P1_MCR); - mt7530_mdio_w32(gsw, MT7530_PMCR_P(6), PMCR_FIXED_LINK_FC); - } - - /* GE2, Link down */ - mtk_switch_w32(gsw, MAC_MCR_FORCE_MODE, MTK_MAC_P2_MCR); - - /* Enable Port 6, P5 as GMAC5, P5 disable */ - val = mt7530_mdio_r32(gsw, MT7530_MHWTRAP); - /* Enable Port 6 */ - val &= ~MHWTRAP_P6_DIS; - /* Disable Port 5 */ - val |= MHWTRAP_P5_DIS; - /* manual override of HW-Trap */ - val |= MHWTRAP_MANUAL; - mt7530_mdio_w32(gsw, MT7530_MHWTRAP, val); - - val = rt_sysc_r32(SYSC_REG_CFG); - val = (val >> MT7621_XTAL_SHIFT) & MT7621_XTAL_MASK; - if (val < MT7621_XTAL_25 && val >= MT7621_XTAL_40) { - /* 40Mhz */ - - /* disable MT7530 core clock */ - _mt7620_mii_write(gsw, 0, 13, 0x1f); - _mt7620_mii_write(gsw, 0, 14, 0x410); - _mt7620_mii_write(gsw, 0, 13, 0x401f); - _mt7620_mii_write(gsw, 0, 14, 0x0); - - /* disable MT7530 PLL */ - _mt7620_mii_write(gsw, 0, 13, 0x1f); - _mt7620_mii_write(gsw, 0, 14, 0x40d); - _mt7620_mii_write(gsw, 0, 13, 0x401f); - _mt7620_mii_write(gsw, 0, 14, 0x2020); - - /* for MT7530 core clock = 500Mhz */ - _mt7620_mii_write(gsw, 0, 13, 0x1f); - _mt7620_mii_write(gsw, 0, 14, 0x40e); - _mt7620_mii_write(gsw, 0, 13, 0x401f); - _mt7620_mii_write(gsw, 0, 14, 0x119); - - /* enable MT7530 PLL */ - _mt7620_mii_write(gsw, 0, 13, 0x1f); - _mt7620_mii_write(gsw, 0, 14, 0x40d); - _mt7620_mii_write(gsw, 0, 13, 0x401f); - _mt7620_mii_write(gsw, 0, 14, 0x2820); - - usleep_range(20, 40); - - /* enable MT7530 core clock */ - _mt7620_mii_write(gsw, 0, 13, 0x1f); - _mt7620_mii_write(gsw, 0, 14, 0x410); - _mt7620_mii_write(gsw, 0, 13, 0x401f); - } - - /* RGMII */ - _mt7620_mii_write(gsw, 0, 14, 0x1); - - /* set MT7530 central align */ - mt7530_mdio_m32(gsw, BIT(0), P6ECR_INTF_MODE_RGMII, MT7530_P6ECR); - mt7530_mdio_m32(gsw, TRGMII_TXCTRL_TXC_INV, 0, - MT7530_TRGMII_TXCTRL); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TCK_CTRL, 0x855); - - /* delay setting for 10/1000M */ - mt7530_mdio_w32(gsw, MT7530_P5RGMIIRXCR, - P5RGMIIRXCR_C_ALIGN | P5RGMIIRXCR_DELAY_2); - mt7530_mdio_w32(gsw, MT7530_P5RGMIITXCR, 0x14); - - /* lower Tx Driving*/ - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD0_ODT, 0x44); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD1_ODT, 0x44); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD2_ODT, 0x44); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD3_ODT, 0x44); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD4_ODT, 0x44); - mt7530_mdio_w32(gsw, MT7530_TRGMII_TD5_ODT, 0x44); - - /* turn on all PHYs */ - for (i = 0; i <= 4; i++) { - val = _mt7620_mii_read(gsw, i, 0); - val &= ~BIT(11); - _mt7620_mii_write(gsw, i, 0, val); - } - -#define MT7530_NUM_PORTS 8 -#define REG_ESW_PORT_PCR(x) (0x2004 | ((x) << 8)) -#define REG_ESW_PORT_PVC(x) (0x2010 | ((x) << 8)) -#define REG_ESW_PORT_PPBV1(x) (0x2014 | ((x) << 8)) -#define MT7530_CPU_PORT 6 - - /* This is copied from mt7530_apply_config in libreCMC driver */ - { - int i; - - for (i = 0; i < MT7530_NUM_PORTS; i++) - mt7530_mdio_w32(gsw, REG_ESW_PORT_PCR(i), 0x00400000); - - mt7530_mdio_w32(gsw, REG_ESW_PORT_PCR(MT7530_CPU_PORT), - 0x00ff0000); - - for (i = 0; i < MT7530_NUM_PORTS; i++) - mt7530_mdio_w32(gsw, REG_ESW_PORT_PVC(i), 0x810000c0); - } - - /* enable irq */ - mt7530_mdio_m32(gsw, 0, 3 << 16, MT7530_TOP_SIG_CTRL); - mt7530_mdio_w32(gsw, MT7530_SYS_INT_EN, 0x1f); -} - -static const struct of_device_id mediatek_gsw_match[] = { - { .compatible = "mediatek,mt7621-gsw" }, - {}, -}; -MODULE_DEVICE_TABLE(of, mediatek_gsw_match); - -int mtk_gsw_init(struct mtk_eth *eth) -{ - struct device_node *np = eth->switch_np; - struct platform_device *pdev = of_find_device_by_node(np); - struct mt7620_gsw *gsw; - - if (!pdev) - return -ENODEV; - - if (!of_device_is_compatible(np, mediatek_gsw_match->compatible)) - return -EINVAL; - - gsw = platform_get_drvdata(pdev); - eth->sw_priv = gsw; - - if (!gsw->irq) - return -EINVAL; - - request_irq(gsw->irq, gsw_interrupt_mt7621, 0, - "gsw", eth); - disable_irq(gsw->irq); - - mt7621_hw_init(eth, gsw, np); - - enable_irq(gsw->irq); - - return 0; -} -EXPORT_SYMBOL_GPL(mtk_gsw_init); - -static int mt7621_gsw_probe(struct platform_device *pdev) -{ - struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - struct mt7620_gsw *gsw; - - gsw = devm_kzalloc(&pdev->dev, sizeof(struct mt7620_gsw), GFP_KERNEL); - if (!gsw) - return -ENOMEM; - - gsw->base = devm_ioremap_resource(&pdev->dev, res); - if (IS_ERR(gsw->base)) - return PTR_ERR(gsw->base); - - gsw->dev = &pdev->dev; - gsw->irq = irq_of_parse_and_map(pdev->dev.of_node, 0); - - platform_set_drvdata(pdev, gsw); - - return 0; -} - -static int mt7621_gsw_remove(struct platform_device *pdev) -{ - platform_set_drvdata(pdev, NULL); - - return 0; -} - -static struct platform_driver gsw_driver = { - .probe = mt7621_gsw_probe, - .remove = mt7621_gsw_remove, - .driver = { - .name = "mt7621-gsw", - .of_match_table = mediatek_gsw_match, - }, -}; - -module_platform_driver(gsw_driver); - -MODULE_LICENSE("GPL"); -MODULE_AUTHOR("John Crispin "); -MODULE_DESCRIPTION("GBit switch driver for Mediatek MT7621 SoC"); diff --git a/drivers/staging/mt7621-eth/mdio.c b/drivers/staging/mt7621-eth/mdio.c deleted file mode 100644 index 5fea6a447eed..000000000000 --- a/drivers/staging/mt7621-eth/mdio.c +++ /dev/null @@ -1,275 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include -#include -#include -#include -#include - -#include "mtk_eth_soc.h" -#include "mdio.h" - -static int mtk_mdio_reset(struct mii_bus *bus) -{ - /* TODO */ - return 0; -} - -static void mtk_phy_link_adjust(struct net_device *dev) -{ - struct mtk_eth *eth = netdev_priv(dev); - unsigned long flags; - int i; - - spin_lock_irqsave(ð->phy->lock, flags); - for (i = 0; i < 8; i++) { - if (eth->phy->phy_node[i]) { - struct phy_device *phydev = eth->phy->phy[i]; - int status_change = 0; - - if (phydev->link) - if (eth->phy->duplex[i] != phydev->duplex || - eth->phy->speed[i] != phydev->speed) - status_change = 1; - - if (phydev->link != eth->link[i]) - status_change = 1; - - switch (phydev->speed) { - case SPEED_1000: - case SPEED_100: - case SPEED_10: - eth->link[i] = phydev->link; - eth->phy->duplex[i] = phydev->duplex; - eth->phy->speed[i] = phydev->speed; - - if (status_change && - eth->soc->mdio_adjust_link) - eth->soc->mdio_adjust_link(eth, i); - break; - } - } - } - spin_unlock_irqrestore(ð->phy->lock, flags); -} - -int mtk_connect_phy_node(struct mtk_eth *eth, struct mtk_mac *mac, - struct device_node *phy_node) -{ - const __be32 *_port = NULL; - struct phy_device *phydev; - int phy_mode, port; - - _port = of_get_property(phy_node, "reg", NULL); - - if (!_port || (be32_to_cpu(*_port) >= 0x20)) { - pr_err("%pOFn: invalid port id\n", phy_node); - return -EINVAL; - } - port = be32_to_cpu(*_port); - phy_mode = of_get_phy_mode(phy_node); - if (phy_mode < 0) { - dev_err(eth->dev, "incorrect phy-mode %d\n", phy_mode); - eth->phy->phy_node[port] = NULL; - return -EINVAL; - } - - phydev = of_phy_connect(eth->netdev[mac->id], phy_node, - mtk_phy_link_adjust, 0, phy_mode); - if (!phydev) { - dev_err(eth->dev, "could not connect to PHY\n"); - eth->phy->phy_node[port] = NULL; - return -ENODEV; - } - - phydev->supported &= PHY_1000BT_FEATURES; - phydev->advertising = phydev->supported; - - dev_info(eth->dev, - "connected port %d to PHY at %s [uid=%08x, driver=%s]\n", - port, phydev_name(phydev), phydev->phy_id, - phydev->drv->name); - - eth->phy->phy[port] = phydev; - eth->link[port] = 0; - - return 0; -} - -static void phy_init(struct mtk_eth *eth, struct mtk_mac *mac, - struct phy_device *phy) -{ - phy_attach(eth->netdev[mac->id], phydev_name(phy), - PHY_INTERFACE_MODE_MII); - - phy->autoneg = AUTONEG_ENABLE; - phy->speed = 0; - phy->duplex = 0; - phy_set_max_speed(phy, SPEED_100); - phy->advertising = phy->supported | ADVERTISED_Autoneg; - - phy_start_aneg(phy); -} - -static int mtk_phy_connect(struct mtk_mac *mac) -{ - struct mtk_eth *eth = mac->hw; - int i; - - for (i = 0; i < 8; i++) { - if (eth->phy->phy_node[i]) { - if (!mac->phy_dev) { - mac->phy_dev = eth->phy->phy[i]; - mac->phy_flags = MTK_PHY_FLAG_PORT; - } - } else if (eth->mii_bus) { - struct phy_device *phy; - - phy = mdiobus_get_phy(eth->mii_bus, i); - if (phy) { - phy_init(eth, mac, phy); - if (!mac->phy_dev) { - mac->phy_dev = phy; - mac->phy_flags = MTK_PHY_FLAG_ATTACH; - } - } - } - } - - return 0; -} - -static void mtk_phy_disconnect(struct mtk_mac *mac) -{ - struct mtk_eth *eth = mac->hw; - unsigned long flags; - int i; - - for (i = 0; i < 8; i++) - if (eth->phy->phy_fixed[i]) { - spin_lock_irqsave(ð->phy->lock, flags); - eth->link[i] = 0; - if (eth->soc->mdio_adjust_link) - eth->soc->mdio_adjust_link(eth, i); - spin_unlock_irqrestore(ð->phy->lock, flags); - } else if (eth->phy->phy[i]) { - phy_disconnect(eth->phy->phy[i]); - } else if (eth->mii_bus) { - struct phy_device *phy = - mdiobus_get_phy(eth->mii_bus, i); - - if (phy) - phy_detach(phy); - } -} - -static void mtk_phy_start(struct mtk_mac *mac) -{ - struct mtk_eth *eth = mac->hw; - unsigned long flags; - int i; - - for (i = 0; i < 8; i++) { - if (eth->phy->phy_fixed[i]) { - spin_lock_irqsave(ð->phy->lock, flags); - eth->link[i] = 1; - if (eth->soc->mdio_adjust_link) - eth->soc->mdio_adjust_link(eth, i); - spin_unlock_irqrestore(ð->phy->lock, flags); - } else if (eth->phy->phy[i]) { - phy_start(eth->phy->phy[i]); - } - } -} - -static void mtk_phy_stop(struct mtk_mac *mac) -{ - struct mtk_eth *eth = mac->hw; - unsigned long flags; - int i; - - for (i = 0; i < 8; i++) - if (eth->phy->phy_fixed[i]) { - spin_lock_irqsave(ð->phy->lock, flags); - eth->link[i] = 0; - if (eth->soc->mdio_adjust_link) - eth->soc->mdio_adjust_link(eth, i); - spin_unlock_irqrestore(ð->phy->lock, flags); - } else if (eth->phy->phy[i]) { - phy_stop(eth->phy->phy[i]); - } -} - -static struct mtk_phy phy_ralink = { - .connect = mtk_phy_connect, - .disconnect = mtk_phy_disconnect, - .start = mtk_phy_start, - .stop = mtk_phy_stop, -}; - -int mtk_mdio_init(struct mtk_eth *eth) -{ - struct device_node *mii_np; - int err; - - if (!eth->soc->mdio_read || !eth->soc->mdio_write) - return 0; - - spin_lock_init(&phy_ralink.lock); - eth->phy = &phy_ralink; - - mii_np = of_get_child_by_name(eth->dev->of_node, "mdio-bus"); - if (!mii_np) { - dev_err(eth->dev, "no %s child node found", "mdio-bus"); - return -ENODEV; - } - - if (!of_device_is_available(mii_np)) { - err = 0; - goto err_put_node; - } - - eth->mii_bus = mdiobus_alloc(); - if (!eth->mii_bus) { - err = -ENOMEM; - goto err_put_node; - } - - eth->mii_bus->name = "mdio"; - eth->mii_bus->read = eth->soc->mdio_read; - eth->mii_bus->write = eth->soc->mdio_write; - eth->mii_bus->reset = mtk_mdio_reset; - eth->mii_bus->priv = eth; - eth->mii_bus->parent = eth->dev; - - snprintf(eth->mii_bus->id, MII_BUS_ID_SIZE, "%pOFn", mii_np); - err = of_mdiobus_register(eth->mii_bus, mii_np); - if (err) - goto err_free_bus; - - return 0; - -err_free_bus: - kfree(eth->mii_bus); -err_put_node: - of_node_put(mii_np); - eth->mii_bus = NULL; - return err; -} - -void mtk_mdio_cleanup(struct mtk_eth *eth) -{ - if (!eth->mii_bus) - return; - - mdiobus_unregister(eth->mii_bus); - of_node_put(eth->mii_bus->dev.of_node); - kfree(eth->mii_bus); -} diff --git a/drivers/staging/mt7621-eth/mdio.h b/drivers/staging/mt7621-eth/mdio.h deleted file mode 100644 index b14e23842a01..000000000000 --- a/drivers/staging/mt7621-eth/mdio.h +++ /dev/null @@ -1,27 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#ifndef _RALINK_MDIO_H__ -#define _RALINK_MDIO_H__ - -#ifdef CONFIG_NET_MEDIATEK_MDIO -int mtk_mdio_init(struct mtk_eth *eth); -void mtk_mdio_cleanup(struct mtk_eth *eth); -int mtk_connect_phy_node(struct mtk_eth *eth, struct mtk_mac *mac, - struct device_node *phy_node); -#else -static inline int mtk_mdio_init(struct mtk_eth *eth) { return 0; } -static inline void mtk_mdio_cleanup(struct mtk_eth *eth) {} -#endif -#endif diff --git a/drivers/staging/mt7621-eth/mdio_mt7620.c b/drivers/staging/mt7621-eth/mdio_mt7620.c deleted file mode 100644 index ced605c2914e..000000000000 --- a/drivers/staging/mt7621-eth/mdio_mt7620.c +++ /dev/null @@ -1,173 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include -#include -#include - -#include "mtk_eth_soc.h" -#include "gsw_mt7620.h" -#include "mdio.h" - -static int mt7620_mii_busy_wait(struct mt7620_gsw *gsw) -{ - unsigned long t_start = jiffies; - - while (1) { - if (!(mtk_switch_r32(gsw, - gsw->piac_offset + MT7620_GSW_REG_PIAC) & - GSW_MDIO_ACCESS)) - return 0; - if (time_after(jiffies, t_start + GSW_REG_PHY_TIMEOUT)) - break; - } - - dev_err(gsw->dev, "mdio: MDIO timeout\n"); - return -1; -} - -u32 _mt7620_mii_write(struct mt7620_gsw *gsw, u32 phy_addr, - u32 phy_register, u32 write_data) -{ - if (mt7620_mii_busy_wait(gsw)) - return -1; - - write_data &= 0xffff; - - mtk_switch_w32(gsw, GSW_MDIO_ACCESS | GSW_MDIO_START | GSW_MDIO_WRITE | - (phy_register << GSW_MDIO_REG_SHIFT) | - (phy_addr << GSW_MDIO_ADDR_SHIFT) | write_data, - MT7620_GSW_REG_PIAC); - - if (mt7620_mii_busy_wait(gsw)) - return -1; - - return 0; -} -EXPORT_SYMBOL_GPL(_mt7620_mii_write); - -u32 _mt7620_mii_read(struct mt7620_gsw *gsw, int phy_addr, int phy_reg) -{ - u32 d; - - if (mt7620_mii_busy_wait(gsw)) - return 0xffff; - - mtk_switch_w32(gsw, GSW_MDIO_ACCESS | GSW_MDIO_START | GSW_MDIO_READ | - (phy_reg << GSW_MDIO_REG_SHIFT) | - (phy_addr << GSW_MDIO_ADDR_SHIFT), - MT7620_GSW_REG_PIAC); - - if (mt7620_mii_busy_wait(gsw)) - return 0xffff; - - d = mtk_switch_r32(gsw, MT7620_GSW_REG_PIAC) & 0xffff; - - return d; -} -EXPORT_SYMBOL_GPL(_mt7620_mii_read); - -int mt7620_mdio_write(struct mii_bus *bus, int phy_addr, int phy_reg, u16 val) -{ - struct mtk_eth *eth = bus->priv; - struct mt7620_gsw *gsw = (struct mt7620_gsw *)eth->sw_priv; - - return _mt7620_mii_write(gsw, phy_addr, phy_reg, val); -} - -int mt7620_mdio_read(struct mii_bus *bus, int phy_addr, int phy_reg) -{ - struct mtk_eth *eth = bus->priv; - struct mt7620_gsw *gsw = (struct mt7620_gsw *)eth->sw_priv; - - return _mt7620_mii_read(gsw, phy_addr, phy_reg); -} - -void mt7530_mdio_w32(struct mt7620_gsw *gsw, u32 reg, u32 val) -{ - _mt7620_mii_write(gsw, 0x1f, 0x1f, (reg >> 6) & 0x3ff); - _mt7620_mii_write(gsw, 0x1f, (reg >> 2) & 0xf, val & 0xffff); - _mt7620_mii_write(gsw, 0x1f, 0x10, val >> 16); -} -EXPORT_SYMBOL_GPL(mt7530_mdio_w32); - -u32 mt7530_mdio_r32(struct mt7620_gsw *gsw, u32 reg) -{ - u16 high, low; - - _mt7620_mii_write(gsw, 0x1f, 0x1f, (reg >> 6) & 0x3ff); - low = _mt7620_mii_read(gsw, 0x1f, (reg >> 2) & 0xf); - high = _mt7620_mii_read(gsw, 0x1f, 0x10); - - return (high << 16) | (low & 0xffff); -} -EXPORT_SYMBOL_GPL(mt7530_mdio_r32); - -void mt7530_mdio_m32(struct mt7620_gsw *gsw, u32 mask, u32 set, u32 reg) -{ - u32 val = mt7530_mdio_r32(gsw, reg); - - val &= ~mask; - val |= set; - mt7530_mdio_w32(gsw, reg, val); -} -EXPORT_SYMBOL_GPL(mt7530_mdio_m32); - -static unsigned char *mtk_speed_str(int speed) -{ - switch (speed) { - case 2: - case SPEED_1000: - return "1000"; - case 1: - case SPEED_100: - return "100"; - case 0: - case SPEED_10: - return "10"; - } - - return "? "; -} - -int mt7620_has_carrier(struct mtk_eth *eth) -{ - struct mt7620_gsw *gsw = (struct mt7620_gsw *)eth->sw_priv; - int i; - - for (i = 0; i < GSW_PORT6; i++) - if (mt7530_mdio_r32(gsw, GSW_REG_PORT_STATUS(i)) & 0x1) - return 1; - return 0; -} - -void mt7620_print_link_state(struct mtk_eth *eth, int port, int link, - int speed, int duplex) -{ - struct mt7620_gsw *gsw = eth->sw_priv; - - if (link) - dev_info(gsw->dev, "port %d link up (%sMbps/%s duplex)\n", - port, mtk_speed_str(speed), - (duplex) ? "Full" : "Half"); - else - dev_info(gsw->dev, "port %d link down\n", port); -} - -void mt7620_mdio_link_adjust(struct mtk_eth *eth, int port) -{ - mt7620_print_link_state(eth, port, eth->link[port], - eth->phy->speed[port], - (eth->phy->duplex[port] == DUPLEX_FULL)); -} diff --git a/drivers/staging/mt7621-eth/mtk_eth_soc.c b/drivers/staging/mt7621-eth/mtk_eth_soc.c deleted file mode 100644 index 6027b19f7bc2..000000000000 --- a/drivers/staging/mt7621-eth/mtk_eth_soc.c +++ /dev/null @@ -1,2176 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "mtk_eth_soc.h" -#include "mdio.h" -#include "ethtool.h" - -#define MAX_RX_LENGTH 1536 -#define MTK_RX_ETH_HLEN (VLAN_ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN) -#define MTK_RX_HLEN (NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN) -#define DMA_DUMMY_DESC 0xffffffff -#define MTK_DEFAULT_MSG_ENABLE \ - (NETIF_MSG_DRV | \ - NETIF_MSG_PROBE | \ - NETIF_MSG_LINK | \ - NETIF_MSG_TIMER | \ - NETIF_MSG_IFDOWN | \ - NETIF_MSG_IFUP | \ - NETIF_MSG_RX_ERR | \ - NETIF_MSG_TX_ERR) - -#define TX_DMA_DESP2_DEF (TX_DMA_LS0 | TX_DMA_DONE) -#define NEXT_TX_DESP_IDX(X) (((X) + 1) & (ring->tx_ring_size - 1)) -#define NEXT_RX_DESP_IDX(X) (((X) + 1) & (ring->rx_ring_size - 1)) - -#define SYSC_REG_RSTCTRL 0x34 - -static int mtk_msg_level = -1; -module_param_named(msg_level, mtk_msg_level, int, 0); -MODULE_PARM_DESC(msg_level, "Message level (-1=defaults,0=none,...,16=all)"); - -static const u16 mtk_reg_table_default[MTK_REG_COUNT] = { - [MTK_REG_PDMA_GLO_CFG] = MTK_PDMA_GLO_CFG, - [MTK_REG_PDMA_RST_CFG] = MTK_PDMA_RST_CFG, - [MTK_REG_DLY_INT_CFG] = MTK_DLY_INT_CFG, - [MTK_REG_TX_BASE_PTR0] = MTK_TX_BASE_PTR0, - [MTK_REG_TX_MAX_CNT0] = MTK_TX_MAX_CNT0, - [MTK_REG_TX_CTX_IDX0] = MTK_TX_CTX_IDX0, - [MTK_REG_TX_DTX_IDX0] = MTK_TX_DTX_IDX0, - [MTK_REG_RX_BASE_PTR0] = MTK_RX_BASE_PTR0, - [MTK_REG_RX_MAX_CNT0] = MTK_RX_MAX_CNT0, - [MTK_REG_RX_CALC_IDX0] = MTK_RX_CALC_IDX0, - [MTK_REG_RX_DRX_IDX0] = MTK_RX_DRX_IDX0, - [MTK_REG_MTK_INT_ENABLE] = MTK_INT_ENABLE, - [MTK_REG_MTK_INT_STATUS] = MTK_INT_STATUS, - [MTK_REG_MTK_DMA_VID_BASE] = MTK_DMA_VID0, - [MTK_REG_MTK_COUNTER_BASE] = MTK_GDMA1_TX_GBCNT, - [MTK_REG_MTK_RST_GL] = MTK_RST_GL, -}; - -static const u16 *mtk_reg_table = mtk_reg_table_default; - -void mtk_w32(struct mtk_eth *eth, u32 val, unsigned int reg) -{ - __raw_writel(val, eth->base + reg); -} - -u32 mtk_r32(struct mtk_eth *eth, unsigned int reg) -{ - return __raw_readl(eth->base + reg); -} - -static void mtk_reg_w32(struct mtk_eth *eth, u32 val, enum mtk_reg reg) -{ - mtk_w32(eth, val, mtk_reg_table[reg]); -} - -static u32 mtk_reg_r32(struct mtk_eth *eth, enum mtk_reg reg) -{ - return mtk_r32(eth, mtk_reg_table[reg]); -} - -/* these bits are also exposed via the reset-controller API. however the switch - * and FE need to be brought out of reset in the exakt same moemtn and the - * reset-controller api does not provide this feature yet. Do the reset manually - * until we fixed the reset-controller api to be able to do this - */ -void mtk_reset(struct mtk_eth *eth, u32 reset_bits) -{ - u32 val; - - regmap_read(eth->ethsys, SYSC_REG_RSTCTRL, &val); - val |= reset_bits; - regmap_write(eth->ethsys, SYSC_REG_RSTCTRL, val); - usleep_range(10, 20); - val &= ~reset_bits; - regmap_write(eth->ethsys, SYSC_REG_RSTCTRL, val); - usleep_range(10, 20); -} -EXPORT_SYMBOL(mtk_reset); - -static inline void mtk_irq_ack(struct mtk_eth *eth, u32 mask) -{ - if (eth->soc->dma_type & MTK_PDMA) - mtk_reg_w32(eth, mask, MTK_REG_MTK_INT_STATUS); - if (eth->soc->dma_type & MTK_QDMA) - mtk_w32(eth, mask, MTK_QMTK_INT_STATUS); -} - -static inline u32 mtk_irq_pending(struct mtk_eth *eth) -{ - u32 status = 0; - - if (eth->soc->dma_type & MTK_PDMA) - status |= mtk_reg_r32(eth, MTK_REG_MTK_INT_STATUS); - if (eth->soc->dma_type & MTK_QDMA) - status |= mtk_r32(eth, MTK_QMTK_INT_STATUS); - - return status; -} - -static void mtk_irq_ack_status(struct mtk_eth *eth, u32 mask) -{ - u32 status_reg = MTK_REG_MTK_INT_STATUS; - - if (mtk_reg_table[MTK_REG_MTK_INT_STATUS2]) - status_reg = MTK_REG_MTK_INT_STATUS2; - - mtk_reg_w32(eth, mask, status_reg); -} - -static u32 mtk_irq_pending_status(struct mtk_eth *eth) -{ - u32 status_reg = MTK_REG_MTK_INT_STATUS; - - if (mtk_reg_table[MTK_REG_MTK_INT_STATUS2]) - status_reg = MTK_REG_MTK_INT_STATUS2; - - return mtk_reg_r32(eth, status_reg); -} - -static inline void mtk_irq_disable(struct mtk_eth *eth, u32 mask) -{ - u32 val; - - if (eth->soc->dma_type & MTK_PDMA) { - val = mtk_reg_r32(eth, MTK_REG_MTK_INT_ENABLE); - mtk_reg_w32(eth, val & ~mask, MTK_REG_MTK_INT_ENABLE); - /* flush write */ - mtk_reg_r32(eth, MTK_REG_MTK_INT_ENABLE); - } - if (eth->soc->dma_type & MTK_QDMA) { - val = mtk_r32(eth, MTK_QMTK_INT_ENABLE); - mtk_w32(eth, val & ~mask, MTK_QMTK_INT_ENABLE); - /* flush write */ - mtk_r32(eth, MTK_QMTK_INT_ENABLE); - } -} - -static inline void mtk_irq_enable(struct mtk_eth *eth, u32 mask) -{ - u32 val; - - if (eth->soc->dma_type & MTK_PDMA) { - val = mtk_reg_r32(eth, MTK_REG_MTK_INT_ENABLE); - mtk_reg_w32(eth, val | mask, MTK_REG_MTK_INT_ENABLE); - /* flush write */ - mtk_reg_r32(eth, MTK_REG_MTK_INT_ENABLE); - } - if (eth->soc->dma_type & MTK_QDMA) { - val = mtk_r32(eth, MTK_QMTK_INT_ENABLE); - mtk_w32(eth, val | mask, MTK_QMTK_INT_ENABLE); - /* flush write */ - mtk_r32(eth, MTK_QMTK_INT_ENABLE); - } -} - -static inline u32 mtk_irq_enabled(struct mtk_eth *eth) -{ - u32 enabled = 0; - - if (eth->soc->dma_type & MTK_PDMA) - enabled |= mtk_reg_r32(eth, MTK_REG_MTK_INT_ENABLE); - if (eth->soc->dma_type & MTK_QDMA) - enabled |= mtk_r32(eth, MTK_QMTK_INT_ENABLE); - - return enabled; -} - -static inline void mtk_hw_set_macaddr(struct mtk_mac *mac, - unsigned char *macaddr) -{ - unsigned long flags; - - spin_lock_irqsave(&mac->hw->page_lock, flags); - mtk_w32(mac->hw, (macaddr[0] << 8) | macaddr[1], MTK_GDMA1_MAC_ADRH); - mtk_w32(mac->hw, (macaddr[2] << 24) | (macaddr[3] << 16) | - (macaddr[4] << 8) | macaddr[5], - MTK_GDMA1_MAC_ADRL); - spin_unlock_irqrestore(&mac->hw->page_lock, flags); -} - -static int mtk_set_mac_address(struct net_device *dev, void *p) -{ - int ret = eth_mac_addr(dev, p); - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - - if (ret) - return ret; - - if (eth->soc->set_mac) - eth->soc->set_mac(mac, dev->dev_addr); - else - mtk_hw_set_macaddr(mac, p); - - return 0; -} - -static inline int mtk_max_frag_size(int mtu) -{ - /* make sure buf_size will be at least MAX_RX_LENGTH */ - if (mtu + MTK_RX_ETH_HLEN < MAX_RX_LENGTH) - mtu = MAX_RX_LENGTH - MTK_RX_ETH_HLEN; - - return SKB_DATA_ALIGN(MTK_RX_HLEN + mtu) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); -} - -static inline int mtk_max_buf_size(int frag_size) -{ - int buf_size = frag_size - NET_SKB_PAD - NET_IP_ALIGN - - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - - WARN_ON(buf_size < MAX_RX_LENGTH); - - return buf_size; -} - -static inline void mtk_get_rxd(struct mtk_rx_dma *rxd, - struct mtk_rx_dma *dma_rxd) -{ - rxd->rxd1 = READ_ONCE(dma_rxd->rxd1); - rxd->rxd2 = READ_ONCE(dma_rxd->rxd2); - rxd->rxd3 = READ_ONCE(dma_rxd->rxd3); - rxd->rxd4 = READ_ONCE(dma_rxd->rxd4); -} - -static inline void mtk_set_txd_pdma(struct mtk_tx_dma *txd, - struct mtk_tx_dma *dma_txd) -{ - WRITE_ONCE(dma_txd->txd1, txd->txd1); - WRITE_ONCE(dma_txd->txd3, txd->txd3); - WRITE_ONCE(dma_txd->txd4, txd->txd4); - /* clean dma done flag last */ - WRITE_ONCE(dma_txd->txd2, txd->txd2); -} - -static void mtk_clean_rx(struct mtk_eth *eth, struct mtk_rx_ring *ring) -{ - int i; - - if (ring->rx_data && ring->rx_dma) { - for (i = 0; i < ring->rx_ring_size; i++) { - if (!ring->rx_data[i]) - continue; - if (!ring->rx_dma[i].rxd1) - continue; - dma_unmap_single(eth->dev, - ring->rx_dma[i].rxd1, - ring->rx_buf_size, - DMA_FROM_DEVICE); - skb_free_frag(ring->rx_data[i]); - } - kfree(ring->rx_data); - ring->rx_data = NULL; - } - - if (ring->rx_dma) { - dma_free_coherent(eth->dev, - ring->rx_ring_size * sizeof(*ring->rx_dma), - ring->rx_dma, - ring->rx_phys); - ring->rx_dma = NULL; - } -} - -static int mtk_dma_rx_alloc(struct mtk_eth *eth, struct mtk_rx_ring *ring) -{ - int i, pad = 0; - - ring->frag_size = mtk_max_frag_size(ETH_DATA_LEN); - ring->rx_buf_size = mtk_max_buf_size(ring->frag_size); - ring->rx_ring_size = eth->soc->dma_ring_size; - ring->rx_data = kcalloc(ring->rx_ring_size, sizeof(*ring->rx_data), - GFP_KERNEL); - if (!ring->rx_data) - goto no_rx_mem; - - for (i = 0; i < ring->rx_ring_size; i++) { - ring->rx_data[i] = netdev_alloc_frag(ring->frag_size); - if (!ring->rx_data[i]) - goto no_rx_mem; - } - - ring->rx_dma = - dma_alloc_coherent(eth->dev, - ring->rx_ring_size * sizeof(*ring->rx_dma), - &ring->rx_phys, GFP_ATOMIC | __GFP_ZERO); - if (!ring->rx_dma) - goto no_rx_mem; - - if (!eth->soc->rx_2b_offset) - pad = NET_IP_ALIGN; - - for (i = 0; i < ring->rx_ring_size; i++) { - dma_addr_t dma_addr = dma_map_single(eth->dev, - ring->rx_data[i] + NET_SKB_PAD + pad, - ring->rx_buf_size, - DMA_FROM_DEVICE); - if (unlikely(dma_mapping_error(eth->dev, dma_addr))) - goto no_rx_mem; - ring->rx_dma[i].rxd1 = (unsigned int)dma_addr; - - if (eth->soc->rx_sg_dma) - ring->rx_dma[i].rxd2 = RX_DMA_PLEN0(ring->rx_buf_size); - else - ring->rx_dma[i].rxd2 = RX_DMA_LSO; - } - ring->rx_calc_idx = ring->rx_ring_size - 1; - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - - return 0; - -no_rx_mem: - return -ENOMEM; -} - -static void mtk_txd_unmap(struct device *dev, struct mtk_tx_buf *tx_buf) -{ - if (tx_buf->flags & MTK_TX_FLAGS_SINGLE0) { - dma_unmap_single(dev, - dma_unmap_addr(tx_buf, dma_addr0), - dma_unmap_len(tx_buf, dma_len0), - DMA_TO_DEVICE); - } else if (tx_buf->flags & MTK_TX_FLAGS_PAGE0) { - dma_unmap_page(dev, - dma_unmap_addr(tx_buf, dma_addr0), - dma_unmap_len(tx_buf, dma_len0), - DMA_TO_DEVICE); - } - if (tx_buf->flags & MTK_TX_FLAGS_PAGE1) - dma_unmap_page(dev, - dma_unmap_addr(tx_buf, dma_addr1), - dma_unmap_len(tx_buf, dma_len1), - DMA_TO_DEVICE); - - tx_buf->flags = 0; - if (tx_buf->skb && (tx_buf->skb != (struct sk_buff *)DMA_DUMMY_DESC)) - dev_kfree_skb_any(tx_buf->skb); - tx_buf->skb = NULL; -} - -static void mtk_pdma_tx_clean(struct mtk_eth *eth) -{ - struct mtk_tx_ring *ring = ð->tx_ring; - int i; - - if (ring->tx_buf) { - for (i = 0; i < ring->tx_ring_size; i++) - mtk_txd_unmap(eth->dev, &ring->tx_buf[i]); - kfree(ring->tx_buf); - ring->tx_buf = NULL; - } - - if (ring->tx_dma) { - dma_free_coherent(eth->dev, - ring->tx_ring_size * sizeof(*ring->tx_dma), - ring->tx_dma, - ring->tx_phys); - ring->tx_dma = NULL; - } -} - -static void mtk_qdma_tx_clean(struct mtk_eth *eth) -{ - struct mtk_tx_ring *ring = ð->tx_ring; - int i; - - if (ring->tx_buf) { - for (i = 0; i < ring->tx_ring_size; i++) - mtk_txd_unmap(eth->dev, &ring->tx_buf[i]); - kfree(ring->tx_buf); - ring->tx_buf = NULL; - } - - if (ring->tx_dma) { - dma_free_coherent(eth->dev, - ring->tx_ring_size * sizeof(*ring->tx_dma), - ring->tx_dma, - ring->tx_phys); - ring->tx_dma = NULL; - } -} - -void mtk_stats_update_mac(struct mtk_mac *mac) -{ - struct mtk_hw_stats *hw_stats = mac->hw_stats; - unsigned int base = mtk_reg_table[MTK_REG_MTK_COUNTER_BASE]; - u64 stats; - - base += hw_stats->reg_offset; - - u64_stats_update_begin(&hw_stats->syncp); - - if (mac->hw->soc->new_stats) { - hw_stats->rx_bytes += mtk_r32(mac->hw, base); - stats = mtk_r32(mac->hw, base + 0x04); - if (stats) - hw_stats->rx_bytes += (stats << 32); - hw_stats->rx_packets += mtk_r32(mac->hw, base + 0x08); - hw_stats->rx_overflow += mtk_r32(mac->hw, base + 0x10); - hw_stats->rx_fcs_errors += mtk_r32(mac->hw, base + 0x14); - hw_stats->rx_short_errors += mtk_r32(mac->hw, base + 0x18); - hw_stats->rx_long_errors += mtk_r32(mac->hw, base + 0x1c); - hw_stats->rx_checksum_errors += mtk_r32(mac->hw, base + 0x20); - hw_stats->rx_flow_control_packets += - mtk_r32(mac->hw, base + 0x24); - hw_stats->tx_skip += mtk_r32(mac->hw, base + 0x28); - hw_stats->tx_collisions += mtk_r32(mac->hw, base + 0x2c); - hw_stats->tx_bytes += mtk_r32(mac->hw, base + 0x30); - stats = mtk_r32(mac->hw, base + 0x34); - if (stats) - hw_stats->tx_bytes += (stats << 32); - hw_stats->tx_packets += mtk_r32(mac->hw, base + 0x38); - } else { - hw_stats->tx_bytes += mtk_r32(mac->hw, base); - hw_stats->tx_packets += mtk_r32(mac->hw, base + 0x04); - hw_stats->tx_skip += mtk_r32(mac->hw, base + 0x08); - hw_stats->tx_collisions += mtk_r32(mac->hw, base + 0x0c); - hw_stats->rx_bytes += mtk_r32(mac->hw, base + 0x20); - hw_stats->rx_packets += mtk_r32(mac->hw, base + 0x24); - hw_stats->rx_overflow += mtk_r32(mac->hw, base + 0x28); - hw_stats->rx_fcs_errors += mtk_r32(mac->hw, base + 0x2c); - hw_stats->rx_short_errors += mtk_r32(mac->hw, base + 0x30); - hw_stats->rx_long_errors += mtk_r32(mac->hw, base + 0x34); - hw_stats->rx_checksum_errors += mtk_r32(mac->hw, base + 0x38); - hw_stats->rx_flow_control_packets += - mtk_r32(mac->hw, base + 0x3c); - } - - u64_stats_update_end(&hw_stats->syncp); -} - -static void mtk_get_stats64(struct net_device *dev, - struct rtnl_link_stats64 *storage) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_hw_stats *hw_stats = mac->hw_stats; - unsigned int base = mtk_reg_table[MTK_REG_MTK_COUNTER_BASE]; - unsigned int start; - - if (!base) { - netdev_stats_to_stats64(storage, &dev->stats); - return; - } - - if (netif_running(dev) && netif_device_present(dev)) { - if (spin_trylock(&hw_stats->stats_lock)) { - mtk_stats_update_mac(mac); - spin_unlock(&hw_stats->stats_lock); - } - } - - do { - start = u64_stats_fetch_begin_irq(&hw_stats->syncp); - storage->rx_packets = hw_stats->rx_packets; - storage->tx_packets = hw_stats->tx_packets; - storage->rx_bytes = hw_stats->rx_bytes; - storage->tx_bytes = hw_stats->tx_bytes; - storage->collisions = hw_stats->tx_collisions; - storage->rx_length_errors = hw_stats->rx_short_errors + - hw_stats->rx_long_errors; - storage->rx_over_errors = hw_stats->rx_overflow; - storage->rx_crc_errors = hw_stats->rx_fcs_errors; - storage->rx_errors = hw_stats->rx_checksum_errors; - storage->tx_aborted_errors = hw_stats->tx_skip; - } while (u64_stats_fetch_retry_irq(&hw_stats->syncp, start)); - - storage->tx_errors = dev->stats.tx_errors; - storage->rx_dropped = dev->stats.rx_dropped; - storage->tx_dropped = dev->stats.tx_dropped; -} - -static int mtk_vlan_rx_add_vid(struct net_device *dev, - __be16 proto, u16 vid) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - u32 idx = (vid & 0xf); - u32 vlan_cfg; - - if (!((mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE]) && - (dev->features & NETIF_F_HW_VLAN_CTAG_TX))) - return 0; - - if (test_bit(idx, ð->vlan_map)) { - netdev_warn(dev, "disable tx vlan offload\n"); - dev->wanted_features &= ~NETIF_F_HW_VLAN_CTAG_TX; - netdev_update_features(dev); - } else { - vlan_cfg = mtk_r32(eth, - mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE] + - ((idx >> 1) << 2)); - if (idx & 0x1) { - vlan_cfg &= 0xffff; - vlan_cfg |= (vid << 16); - } else { - vlan_cfg &= 0xffff0000; - vlan_cfg |= vid; - } - mtk_w32(eth, - vlan_cfg, mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE] + - ((idx >> 1) << 2)); - set_bit(idx, ð->vlan_map); - } - - return 0; -} - -static int mtk_vlan_rx_kill_vid(struct net_device *dev, - __be16 proto, u16 vid) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - u32 idx = (vid & 0xf); - - if (!((mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE]) && - (dev->features & NETIF_F_HW_VLAN_CTAG_TX))) - return 0; - - clear_bit(idx, ð->vlan_map); - - return 0; -} - -static inline u32 mtk_pdma_empty_txd(struct mtk_tx_ring *ring) -{ - barrier(); - return (u32)(ring->tx_ring_size - - ((ring->tx_next_idx - ring->tx_free_idx) & - (ring->tx_ring_size - 1))); -} - -static int mtk_skb_padto(struct sk_buff *skb, struct mtk_eth *eth) -{ - unsigned int len; - int ret; - - if (unlikely(skb->len >= VLAN_ETH_ZLEN)) - return 0; - - if (eth->soc->padding_64b && !eth->soc->padding_bug) - return 0; - - if (skb_vlan_tag_present(skb)) - len = ETH_ZLEN; - else if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) - len = VLAN_ETH_ZLEN; - else if (!eth->soc->padding_64b) - len = ETH_ZLEN; - else - return 0; - - if (skb->len >= len) - return 0; - - ret = skb_pad(skb, len - skb->len); - if (ret < 0) - return ret; - skb->len = len; - skb_set_tail_pointer(skb, len); - - return ret; -} - -static int mtk_pdma_tx_map(struct sk_buff *skb, struct net_device *dev, - int tx_num, struct mtk_tx_ring *ring, bool gso) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - struct skb_frag_struct *frag; - struct mtk_tx_dma txd, *ptxd; - struct mtk_tx_buf *tx_buf; - int i, j, k, frag_size, frag_map_size, offset; - dma_addr_t mapped_addr; - unsigned int nr_frags; - u32 def_txd4; - - if (mtk_skb_padto(skb, eth)) { - netif_warn(eth, tx_err, dev, "tx padding failed!\n"); - return -1; - } - - tx_buf = &ring->tx_buf[ring->tx_next_idx]; - memset(tx_buf, 0, sizeof(*tx_buf)); - memset(&txd, 0, sizeof(txd)); - nr_frags = skb_shinfo(skb)->nr_frags; - - /* init tx descriptor */ - def_txd4 = eth->soc->txd4; - txd.txd4 = def_txd4; - - if (eth->soc->mac_count > 1) - txd.txd4 |= (mac->id + 1) << TX_DMA_FPORT_SHIFT; - - if (gso) - txd.txd4 |= TX_DMA_TSO; - - /* TX Checksum offload */ - if (skb->ip_summed == CHECKSUM_PARTIAL) - txd.txd4 |= TX_DMA_CHKSUM; - - /* VLAN header offload */ - if (skb_vlan_tag_present(skb)) { - u16 tag = skb_vlan_tag_get(skb); - - txd.txd4 |= TX_DMA_INS_VLAN | - ((tag >> VLAN_PRIO_SHIFT) << 4) | - (tag & 0xF); - } - - mapped_addr = dma_map_single(&dev->dev, skb->data, - skb_headlen(skb), DMA_TO_DEVICE); - if (unlikely(dma_mapping_error(&dev->dev, mapped_addr))) - return -1; - - txd.txd1 = mapped_addr; - txd.txd2 = TX_DMA_PLEN0(skb_headlen(skb)); - - tx_buf->flags |= MTK_TX_FLAGS_SINGLE0; - dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr); - dma_unmap_len_set(tx_buf, dma_len0, skb_headlen(skb)); - - /* TX SG offload */ - j = ring->tx_next_idx; - k = 0; - for (i = 0; i < nr_frags; i++) { - offset = 0; - frag = &skb_shinfo(skb)->frags[i]; - frag_size = skb_frag_size(frag); - - while (frag_size > 0) { - frag_map_size = min(frag_size, TX_DMA_BUF_LEN); - mapped_addr = skb_frag_dma_map(&dev->dev, frag, offset, - frag_map_size, - DMA_TO_DEVICE); - if (unlikely(dma_mapping_error(&dev->dev, mapped_addr))) - goto err_dma; - - if (k & 0x1) { - j = NEXT_TX_DESP_IDX(j); - txd.txd1 = mapped_addr; - txd.txd2 = TX_DMA_PLEN0(frag_map_size); - txd.txd4 = def_txd4; - - tx_buf = &ring->tx_buf[j]; - memset(tx_buf, 0, sizeof(*tx_buf)); - - tx_buf->flags |= MTK_TX_FLAGS_PAGE0; - dma_unmap_addr_set(tx_buf, dma_addr0, - mapped_addr); - dma_unmap_len_set(tx_buf, dma_len0, - frag_map_size); - } else { - txd.txd3 = mapped_addr; - txd.txd2 |= TX_DMA_PLEN1(frag_map_size); - - tx_buf->skb = (struct sk_buff *)DMA_DUMMY_DESC; - tx_buf->flags |= MTK_TX_FLAGS_PAGE1; - dma_unmap_addr_set(tx_buf, dma_addr1, - mapped_addr); - dma_unmap_len_set(tx_buf, dma_len1, - frag_map_size); - - if (!((i == (nr_frags - 1)) && - (frag_map_size == frag_size))) { - mtk_set_txd_pdma(&txd, - &ring->tx_dma[j]); - memset(&txd, 0, sizeof(txd)); - } - } - frag_size -= frag_map_size; - offset += frag_map_size; - k++; - } - } - - /* set last segment */ - if (k & 0x1) - txd.txd2 |= TX_DMA_LS1; - else - txd.txd2 |= TX_DMA_LS0; - mtk_set_txd_pdma(&txd, &ring->tx_dma[j]); - - /* store skb to cleanup */ - tx_buf->skb = skb; - - netdev_sent_queue(dev, skb->len); - skb_tx_timestamp(skb); - - ring->tx_next_idx = NEXT_TX_DESP_IDX(j); - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - atomic_set(&ring->tx_free_count, mtk_pdma_empty_txd(ring)); - - if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) || !skb->xmit_more) - mtk_reg_w32(eth, ring->tx_next_idx, MTK_REG_TX_CTX_IDX0); - - return 0; - -err_dma: - j = ring->tx_next_idx; - for (i = 0; i < tx_num; i++) { - ptxd = &ring->tx_dma[j]; - tx_buf = &ring->tx_buf[j]; - - /* unmap dma */ - mtk_txd_unmap(&dev->dev, tx_buf); - - ptxd->txd2 = TX_DMA_DESP2_DEF; - j = NEXT_TX_DESP_IDX(j); - } - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - return -1; -} - -/* the qdma core needs scratch memory to be setup */ -static int mtk_init_fq_dma(struct mtk_eth *eth) -{ - dma_addr_t dma_addr, phy_ring_head, phy_ring_tail; - int cnt = eth->soc->dma_ring_size; - int i; - - eth->scratch_ring = dma_alloc_coherent(eth->dev, - cnt * sizeof(struct mtk_tx_dma), - &phy_ring_head, - GFP_ATOMIC | __GFP_ZERO); - if (unlikely(!eth->scratch_ring)) - return -ENOMEM; - - eth->scratch_head = kcalloc(cnt, QDMA_PAGE_SIZE, - GFP_KERNEL); - dma_addr = dma_map_single(eth->dev, - eth->scratch_head, cnt * QDMA_PAGE_SIZE, - DMA_FROM_DEVICE); - if (unlikely(dma_mapping_error(eth->dev, dma_addr))) - return -ENOMEM; - - memset(eth->scratch_ring, 0x0, sizeof(struct mtk_tx_dma) * cnt); - phy_ring_tail = phy_ring_head + (sizeof(struct mtk_tx_dma) * (cnt - 1)); - - for (i = 0; i < cnt; i++) { - eth->scratch_ring[i].txd1 = (dma_addr + (i * QDMA_PAGE_SIZE)); - if (i < cnt - 1) - eth->scratch_ring[i].txd2 = (phy_ring_head + - ((i + 1) * sizeof(struct mtk_tx_dma))); - eth->scratch_ring[i].txd3 = TX_QDMA_SDL(QDMA_PAGE_SIZE); - } - - mtk_w32(eth, phy_ring_head, MTK_QDMA_FQ_HEAD); - mtk_w32(eth, phy_ring_tail, MTK_QDMA_FQ_TAIL); - mtk_w32(eth, (cnt << 16) | cnt, MTK_QDMA_FQ_CNT); - mtk_w32(eth, QDMA_PAGE_SIZE << 16, MTK_QDMA_FQ_BLEN); - - return 0; -} - -static void *mtk_qdma_phys_to_virt(struct mtk_tx_ring *ring, u32 desc) -{ - void *ret = ring->tx_dma; - - return ret + (desc - ring->tx_phys); -} - -static struct mtk_tx_dma *mtk_tx_next_qdma(struct mtk_tx_ring *ring, - struct mtk_tx_dma *txd) -{ - return mtk_qdma_phys_to_virt(ring, txd->txd2); -} - -static struct mtk_tx_buf *mtk_desc_to_tx_buf(struct mtk_tx_ring *ring, - struct mtk_tx_dma *txd) -{ - int idx = txd - ring->tx_dma; - - return &ring->tx_buf[idx]; -} - -static int mtk_qdma_tx_map(struct sk_buff *skb, struct net_device *dev, - int tx_num, struct mtk_tx_ring *ring, bool gso) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - struct mtk_tx_dma *itxd, *txd; - struct mtk_tx_buf *tx_buf; - dma_addr_t mapped_addr; - unsigned int nr_frags; - int i, n_desc = 1; - u32 txd4 = eth->soc->txd4; - - itxd = ring->tx_next_free; - if (itxd == ring->tx_last_free) - return -ENOMEM; - - if (eth->soc->mac_count > 1) - txd4 |= (mac->id + 1) << TX_DMA_FPORT_SHIFT; - - tx_buf = mtk_desc_to_tx_buf(ring, itxd); - memset(tx_buf, 0, sizeof(*tx_buf)); - - if (gso) - txd4 |= TX_DMA_TSO; - - /* TX Checksum offload */ - if (skb->ip_summed == CHECKSUM_PARTIAL) - txd4 |= TX_DMA_CHKSUM; - - /* VLAN header offload */ - if (skb_vlan_tag_present(skb)) - txd4 |= TX_DMA_INS_VLAN_MT7621 | skb_vlan_tag_get(skb); - - mapped_addr = dma_map_single(&dev->dev, skb->data, - skb_headlen(skb), DMA_TO_DEVICE); - if (unlikely(dma_mapping_error(&dev->dev, mapped_addr))) - return -ENOMEM; - - WRITE_ONCE(itxd->txd1, mapped_addr); - tx_buf->flags |= MTK_TX_FLAGS_SINGLE0; - dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr); - dma_unmap_len_set(tx_buf, dma_len0, skb_headlen(skb)); - - /* TX SG offload */ - txd = itxd; - nr_frags = skb_shinfo(skb)->nr_frags; - for (i = 0; i < nr_frags; i++) { - struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i]; - unsigned int offset = 0; - int frag_size = skb_frag_size(frag); - - while (frag_size) { - bool last_frag = false; - unsigned int frag_map_size; - - txd = mtk_tx_next_qdma(ring, txd); - if (txd == ring->tx_last_free) - goto err_dma; - - n_desc++; - frag_map_size = min(frag_size, TX_DMA_BUF_LEN); - mapped_addr = skb_frag_dma_map(&dev->dev, frag, offset, - frag_map_size, - DMA_TO_DEVICE); - if (unlikely(dma_mapping_error(&dev->dev, mapped_addr))) - goto err_dma; - - if (i == nr_frags - 1 && - (frag_size - frag_map_size) == 0) - last_frag = true; - - WRITE_ONCE(txd->txd1, mapped_addr); - WRITE_ONCE(txd->txd3, (QDMA_TX_SWC | - TX_DMA_PLEN0(frag_map_size) | - last_frag * TX_DMA_LS0) | - mac->id); - WRITE_ONCE(txd->txd4, 0); - - tx_buf->skb = (struct sk_buff *)DMA_DUMMY_DESC; - tx_buf = mtk_desc_to_tx_buf(ring, txd); - memset(tx_buf, 0, sizeof(*tx_buf)); - - tx_buf->flags |= MTK_TX_FLAGS_PAGE0; - dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr); - dma_unmap_len_set(tx_buf, dma_len0, frag_map_size); - frag_size -= frag_map_size; - offset += frag_map_size; - } - } - - /* store skb to cleanup */ - tx_buf->skb = skb; - - WRITE_ONCE(itxd->txd4, txd4); - WRITE_ONCE(itxd->txd3, (QDMA_TX_SWC | TX_DMA_PLEN0(skb_headlen(skb)) | - (!nr_frags * TX_DMA_LS0))); - - netdev_sent_queue(dev, skb->len); - skb_tx_timestamp(skb); - - ring->tx_next_free = mtk_tx_next_qdma(ring, txd); - atomic_sub(n_desc, &ring->tx_free_count); - - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - - if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) || !skb->xmit_more) - mtk_w32(eth, txd->txd2, MTK_QTX_CTX_PTR); - - return 0; - -err_dma: - do { - tx_buf = mtk_desc_to_tx_buf(ring, txd); - - /* unmap dma */ - mtk_txd_unmap(&dev->dev, tx_buf); - - itxd->txd3 = TX_DMA_DESP2_DEF; - itxd = mtk_tx_next_qdma(ring, itxd); - } while (itxd != txd); - - return -ENOMEM; -} - -static inline int mtk_cal_txd_req(struct sk_buff *skb) -{ - int i, nfrags; - struct skb_frag_struct *frag; - - nfrags = 1; - if (skb_is_gso(skb)) { - for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { - frag = &skb_shinfo(skb)->frags[i]; - nfrags += DIV_ROUND_UP(frag->size, TX_DMA_BUF_LEN); - } - } else { - nfrags += skb_shinfo(skb)->nr_frags; - } - - return DIV_ROUND_UP(nfrags, 2); -} - -static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - struct mtk_tx_ring *ring = ð->tx_ring; - struct net_device_stats *stats = &dev->stats; - int tx_num; - int len = skb->len; - bool gso = false; - - tx_num = mtk_cal_txd_req(skb); - if (unlikely(atomic_read(&ring->tx_free_count) <= tx_num)) { - netif_stop_queue(dev); - netif_err(eth, tx_queued, dev, - "Tx Ring full when queue awake!\n"); - return NETDEV_TX_BUSY; - } - - /* TSO: fill MSS info in tcp checksum field */ - if (skb_is_gso(skb)) { - if (skb_cow_head(skb, 0)) { - netif_warn(eth, tx_err, dev, - "GSO expand head fail.\n"); - goto drop; - } - - if (skb_shinfo(skb)->gso_type & - (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) { - gso = true; - tcp_hdr(skb)->check = htons(skb_shinfo(skb)->gso_size); - } - } - - if (ring->tx_map(skb, dev, tx_num, ring, gso) < 0) - goto drop; - - stats->tx_packets++; - stats->tx_bytes += len; - - if (unlikely(atomic_read(&ring->tx_free_count) <= ring->tx_thresh)) { - netif_stop_queue(dev); - smp_mb(); - if (unlikely(atomic_read(&ring->tx_free_count) > - ring->tx_thresh)) - netif_wake_queue(dev); - } - - return NETDEV_TX_OK; - -drop: - stats->tx_dropped++; - dev_kfree_skb(skb); - return NETDEV_TX_OK; -} - -static int mtk_poll_rx(struct napi_struct *napi, int budget, - struct mtk_eth *eth, u32 rx_intr) -{ - struct mtk_soc_data *soc = eth->soc; - struct mtk_rx_ring *ring = ð->rx_ring[0]; - int idx = ring->rx_calc_idx; - u32 checksum_bit; - struct sk_buff *skb; - u8 *data, *new_data; - struct mtk_rx_dma *rxd, trxd; - int done = 0, pad; - - if (eth->soc->hw_features & NETIF_F_RXCSUM) - checksum_bit = soc->checksum_bit; - else - checksum_bit = 0; - - if (eth->soc->rx_2b_offset) - pad = 0; - else - pad = NET_IP_ALIGN; - - while (done < budget) { - struct net_device *netdev; - unsigned int pktlen; - dma_addr_t dma_addr; - int mac = 0; - - idx = NEXT_RX_DESP_IDX(idx); - rxd = &ring->rx_dma[idx]; - data = ring->rx_data[idx]; - - mtk_get_rxd(&trxd, rxd); - if (!(trxd.rxd2 & RX_DMA_DONE)) - break; - - /* find out which mac the packet come from. values start at 1 */ - if (eth->soc->mac_count > 1) { - mac = (trxd.rxd4 >> RX_DMA_FPORT_SHIFT) & - RX_DMA_FPORT_MASK; - mac--; - if (mac < 0 || mac >= eth->soc->mac_count) - goto release_desc; - } - - netdev = eth->netdev[mac]; - - /* alloc new buffer */ - new_data = napi_alloc_frag(ring->frag_size); - if (unlikely(!new_data || !netdev)) { - netdev->stats.rx_dropped++; - goto release_desc; - } - dma_addr = dma_map_single(&netdev->dev, - new_data + NET_SKB_PAD + pad, - ring->rx_buf_size, - DMA_FROM_DEVICE); - if (unlikely(dma_mapping_error(&netdev->dev, dma_addr))) { - skb_free_frag(new_data); - goto release_desc; - } - - /* receive data */ - skb = build_skb(data, ring->frag_size); - if (unlikely(!skb)) { - put_page(virt_to_head_page(new_data)); - goto release_desc; - } - skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); - - dma_unmap_single(&netdev->dev, trxd.rxd1, - ring->rx_buf_size, DMA_FROM_DEVICE); - pktlen = RX_DMA_GET_PLEN0(trxd.rxd2); - skb->dev = netdev; - skb_put(skb, pktlen); - if (trxd.rxd4 & checksum_bit) - skb->ip_summed = CHECKSUM_UNNECESSARY; - else - skb_checksum_none_assert(skb); - skb->protocol = eth_type_trans(skb, netdev); - - netdev->stats.rx_packets++; - netdev->stats.rx_bytes += pktlen; - - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX && - RX_DMA_VID(trxd.rxd3)) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), - RX_DMA_VID(trxd.rxd3)); - napi_gro_receive(napi, skb); - - ring->rx_data[idx] = new_data; - rxd->rxd1 = (unsigned int)dma_addr; - -release_desc: - if (eth->soc->rx_sg_dma) - rxd->rxd2 = RX_DMA_PLEN0(ring->rx_buf_size); - else - rxd->rxd2 = RX_DMA_LSO; - - ring->rx_calc_idx = idx; - /* make sure that all changes to the dma ring are flushed before - * we continue - */ - wmb(); - if (eth->soc->dma_type == MTK_QDMA) - mtk_w32(eth, ring->rx_calc_idx, MTK_QRX_CRX_IDX0); - else - mtk_reg_w32(eth, ring->rx_calc_idx, - MTK_REG_RX_CALC_IDX0); - done++; - } - - if (done < budget) - mtk_irq_ack(eth, rx_intr); - - return done; -} - -static int mtk_pdma_tx_poll(struct mtk_eth *eth, int budget, bool *tx_again) -{ - struct sk_buff *skb; - struct mtk_tx_buf *tx_buf; - int done = 0; - u32 idx, hwidx; - struct mtk_tx_ring *ring = ð->tx_ring; - unsigned int bytes = 0; - - idx = ring->tx_free_idx; - hwidx = mtk_reg_r32(eth, MTK_REG_TX_DTX_IDX0); - - while ((idx != hwidx) && budget) { - tx_buf = &ring->tx_buf[idx]; - skb = tx_buf->skb; - - if (!skb) - break; - - if (skb != (struct sk_buff *)DMA_DUMMY_DESC) { - bytes += skb->len; - done++; - budget--; - } - mtk_txd_unmap(eth->dev, tx_buf); - idx = NEXT_TX_DESP_IDX(idx); - } - ring->tx_free_idx = idx; - atomic_set(&ring->tx_free_count, mtk_pdma_empty_txd(ring)); - - /* read hw index again make sure no new tx packet */ - if (idx != hwidx || idx != mtk_reg_r32(eth, MTK_REG_TX_DTX_IDX0)) - *tx_again = 1; - - if (done) - netdev_completed_queue(*eth->netdev, done, bytes); - - return done; -} - -static int mtk_qdma_tx_poll(struct mtk_eth *eth, int budget, bool *tx_again) -{ - struct mtk_tx_ring *ring = ð->tx_ring; - struct mtk_tx_dma *desc; - struct sk_buff *skb; - struct mtk_tx_buf *tx_buf; - int total = 0, done[MTK_MAX_DEVS]; - unsigned int bytes[MTK_MAX_DEVS]; - u32 cpu, dma; - int i; - - memset(done, 0, sizeof(done)); - memset(bytes, 0, sizeof(bytes)); - - cpu = mtk_r32(eth, MTK_QTX_CRX_PTR); - dma = mtk_r32(eth, MTK_QTX_DRX_PTR); - - desc = mtk_qdma_phys_to_virt(ring, cpu); - - while ((cpu != dma) && budget) { - u32 next_cpu = desc->txd2; - int mac; - - desc = mtk_tx_next_qdma(ring, desc); - if ((desc->txd3 & QDMA_TX_OWNER_CPU) == 0) - break; - - mac = (desc->txd4 >> TX_DMA_FPORT_SHIFT) & - TX_DMA_FPORT_MASK; - mac--; - - tx_buf = mtk_desc_to_tx_buf(ring, desc); - skb = tx_buf->skb; - if (!skb) - break; - - if (skb != (struct sk_buff *)DMA_DUMMY_DESC) { - bytes[mac] += skb->len; - done[mac]++; - budget--; - } - mtk_txd_unmap(eth->dev, tx_buf); - - ring->tx_last_free->txd2 = next_cpu; - ring->tx_last_free = desc; - atomic_inc(&ring->tx_free_count); - - cpu = next_cpu; - } - - mtk_w32(eth, cpu, MTK_QTX_CRX_PTR); - - /* read hw index again make sure no new tx packet */ - if (cpu != dma || cpu != mtk_r32(eth, MTK_QTX_DRX_PTR)) - *tx_again = true; - - for (i = 0; i < eth->soc->mac_count; i++) { - if (!done[i]) - continue; - netdev_completed_queue(eth->netdev[i], done[i], bytes[i]); - total += done[i]; - } - - return total; -} - -static int mtk_poll_tx(struct mtk_eth *eth, int budget, u32 tx_intr, - bool *tx_again) -{ - struct mtk_tx_ring *ring = ð->tx_ring; - struct net_device *netdev = eth->netdev[0]; - int done; - - done = eth->tx_ring.tx_poll(eth, budget, tx_again); - if (!*tx_again) - mtk_irq_ack(eth, tx_intr); - - if (!done) - return 0; - - smp_mb(); - if (unlikely(!netif_queue_stopped(netdev))) - return done; - - if (atomic_read(&ring->tx_free_count) > ring->tx_thresh) - netif_wake_queue(netdev); - - return done; -} - -static void mtk_stats_update(struct mtk_eth *eth) -{ - int i; - - for (i = 0; i < eth->soc->mac_count; i++) { - if (!eth->mac[i] || !eth->mac[i]->hw_stats) - continue; - if (spin_trylock(ð->mac[i]->hw_stats->stats_lock)) { - mtk_stats_update_mac(eth->mac[i]); - spin_unlock(ð->mac[i]->hw_stats->stats_lock); - } - } -} - -static int mtk_poll(struct napi_struct *napi, int budget) -{ - struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi); - u32 status, mtk_status, mask, tx_intr, rx_intr, status_intr; - int tx_done, rx_done; - bool tx_again = false; - - status = mtk_irq_pending(eth); - mtk_status = mtk_irq_pending_status(eth); - tx_intr = eth->soc->tx_int; - rx_intr = eth->soc->rx_int; - status_intr = eth->soc->status_int; - tx_done = 0; - rx_done = 0; - tx_again = 0; - - if (status & tx_intr) - tx_done = mtk_poll_tx(eth, budget, tx_intr, &tx_again); - - if (status & rx_intr) - rx_done = mtk_poll_rx(napi, budget, eth, rx_intr); - - if (unlikely(mtk_status & status_intr)) { - mtk_stats_update(eth); - mtk_irq_ack_status(eth, status_intr); - } - - if (unlikely(netif_msg_intr(eth))) { - mask = mtk_irq_enabled(eth); - netdev_info(eth->netdev[0], - "done tx %d, rx %d, intr 0x%08x/0x%x\n", - tx_done, rx_done, status, mask); - } - - if (tx_again || rx_done == budget) - return budget; - - status = mtk_irq_pending(eth); - if (status & (tx_intr | rx_intr)) - return budget; - - napi_complete(napi); - mtk_irq_enable(eth, tx_intr | rx_intr); - - return rx_done; -} - -static int mtk_pdma_tx_alloc(struct mtk_eth *eth) -{ - int i; - struct mtk_tx_ring *ring = ð->tx_ring; - - ring->tx_ring_size = eth->soc->dma_ring_size; - ring->tx_free_idx = 0; - ring->tx_next_idx = 0; - ring->tx_thresh = max((unsigned long)ring->tx_ring_size >> 2, - MAX_SKB_FRAGS); - - ring->tx_buf = kcalloc(ring->tx_ring_size, sizeof(*ring->tx_buf), - GFP_KERNEL); - if (!ring->tx_buf) - goto no_tx_mem; - - ring->tx_dma = - dma_alloc_coherent(eth->dev, - ring->tx_ring_size * sizeof(*ring->tx_dma), - &ring->tx_phys, GFP_ATOMIC | __GFP_ZERO); - if (!ring->tx_dma) - goto no_tx_mem; - - for (i = 0; i < ring->tx_ring_size; i++) { - ring->tx_dma[i].txd2 = TX_DMA_DESP2_DEF; - ring->tx_dma[i].txd4 = eth->soc->txd4; - } - - atomic_set(&ring->tx_free_count, mtk_pdma_empty_txd(ring)); - ring->tx_map = mtk_pdma_tx_map; - ring->tx_poll = mtk_pdma_tx_poll; - ring->tx_clean = mtk_pdma_tx_clean; - - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - - mtk_reg_w32(eth, ring->tx_phys, MTK_REG_TX_BASE_PTR0); - mtk_reg_w32(eth, ring->tx_ring_size, MTK_REG_TX_MAX_CNT0); - mtk_reg_w32(eth, 0, MTK_REG_TX_CTX_IDX0); - mtk_reg_w32(eth, MTK_PST_DTX_IDX0, MTK_REG_PDMA_RST_CFG); - - return 0; - -no_tx_mem: - return -ENOMEM; -} - -static int mtk_qdma_tx_alloc_tx(struct mtk_eth *eth) -{ - struct mtk_tx_ring *ring = ð->tx_ring; - int i, sz = sizeof(*ring->tx_dma); - - ring->tx_ring_size = eth->soc->dma_ring_size; - ring->tx_buf = kcalloc(ring->tx_ring_size, sizeof(*ring->tx_buf), - GFP_KERNEL); - if (!ring->tx_buf) - goto no_tx_mem; - - ring->tx_dma = dma_alloc_coherent(eth->dev, ring->tx_ring_size * sz, - &ring->tx_phys, - GFP_ATOMIC | __GFP_ZERO); - if (!ring->tx_dma) - goto no_tx_mem; - - for (i = 0; i < ring->tx_ring_size; i++) { - int next = (i + 1) % ring->tx_ring_size; - u32 next_ptr = ring->tx_phys + next * sz; - - ring->tx_dma[i].txd2 = next_ptr; - ring->tx_dma[i].txd3 = TX_DMA_DESP2_DEF; - } - - atomic_set(&ring->tx_free_count, ring->tx_ring_size - 2); - ring->tx_next_free = &ring->tx_dma[0]; - ring->tx_last_free = &ring->tx_dma[ring->tx_ring_size - 2]; - ring->tx_thresh = max((unsigned long)ring->tx_ring_size >> 2, - MAX_SKB_FRAGS); - - ring->tx_map = mtk_qdma_tx_map; - ring->tx_poll = mtk_qdma_tx_poll; - ring->tx_clean = mtk_qdma_tx_clean; - - /* make sure that all changes to the dma ring are flushed before we - * continue - */ - wmb(); - - mtk_w32(eth, ring->tx_phys, MTK_QTX_CTX_PTR); - mtk_w32(eth, ring->tx_phys, MTK_QTX_DTX_PTR); - mtk_w32(eth, - ring->tx_phys + ((ring->tx_ring_size - 1) * sz), - MTK_QTX_CRX_PTR); - mtk_w32(eth, - ring->tx_phys + ((ring->tx_ring_size - 1) * sz), - MTK_QTX_DRX_PTR); - - return 0; - -no_tx_mem: - return -ENOMEM; -} - -static int mtk_qdma_init(struct mtk_eth *eth, int ring) -{ - int err; - - err = mtk_init_fq_dma(eth); - if (err) - return err; - - err = mtk_qdma_tx_alloc_tx(eth); - if (err) - return err; - - err = mtk_dma_rx_alloc(eth, ð->rx_ring[ring]); - if (err) - return err; - - mtk_w32(eth, eth->rx_ring[ring].rx_phys, MTK_QRX_BASE_PTR0); - mtk_w32(eth, eth->rx_ring[ring].rx_ring_size, MTK_QRX_MAX_CNT0); - mtk_w32(eth, eth->rx_ring[ring].rx_calc_idx, MTK_QRX_CRX_IDX0); - mtk_w32(eth, MTK_PST_DRX_IDX0, MTK_QDMA_RST_IDX); - mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES, MTK_QTX_CFG(0)); - - /* Enable random early drop and set drop threshold automatically */ - mtk_w32(eth, 0x174444, MTK_QDMA_FC_THRES); - mtk_w32(eth, 0x0, MTK_QDMA_HRED2); - - return 0; -} - -static int mtk_pdma_qdma_init(struct mtk_eth *eth) -{ - int err = mtk_qdma_init(eth, 1); - - if (err) - return err; - - err = mtk_dma_rx_alloc(eth, ð->rx_ring[0]); - if (err) - return err; - - mtk_reg_w32(eth, eth->rx_ring[0].rx_phys, MTK_REG_RX_BASE_PTR0); - mtk_reg_w32(eth, eth->rx_ring[0].rx_ring_size, MTK_REG_RX_MAX_CNT0); - mtk_reg_w32(eth, eth->rx_ring[0].rx_calc_idx, MTK_REG_RX_CALC_IDX0); - mtk_reg_w32(eth, MTK_PST_DRX_IDX0, MTK_REG_PDMA_RST_CFG); - - return 0; -} - -static int mtk_pdma_init(struct mtk_eth *eth) -{ - struct mtk_rx_ring *ring = ð->rx_ring[0]; - int err; - - err = mtk_pdma_tx_alloc(eth); - if (err) - return err; - - err = mtk_dma_rx_alloc(eth, ring); - if (err) - return err; - - mtk_reg_w32(eth, ring->rx_phys, MTK_REG_RX_BASE_PTR0); - mtk_reg_w32(eth, ring->rx_ring_size, MTK_REG_RX_MAX_CNT0); - mtk_reg_w32(eth, ring->rx_calc_idx, MTK_REG_RX_CALC_IDX0); - mtk_reg_w32(eth, MTK_PST_DRX_IDX0, MTK_REG_PDMA_RST_CFG); - - return 0; -} - -static void mtk_dma_free(struct mtk_eth *eth) -{ - int i; - - for (i = 0; i < eth->soc->mac_count; i++) - if (eth->netdev[i]) - netdev_reset_queue(eth->netdev[i]); - eth->tx_ring.tx_clean(eth); - mtk_clean_rx(eth, ð->rx_ring[0]); - mtk_clean_rx(eth, ð->rx_ring[1]); - kfree(eth->scratch_head); -} - -static void mtk_tx_timeout(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - struct mtk_tx_ring *ring = ð->tx_ring; - - eth->netdev[mac->id]->stats.tx_errors++; - netif_err(eth, tx_err, dev, - "transmit timed out\n"); - if (eth->soc->dma_type & MTK_PDMA) { - netif_info(eth, drv, dev, "pdma_cfg:%08x\n", - mtk_reg_r32(eth, MTK_REG_PDMA_GLO_CFG)); - netif_info(eth, drv, dev, - "tx_ring=%d, base=%08x, max=%u, ctx=%u, dtx=%u, fdx=%hu, next=%hu\n", - 0, mtk_reg_r32(eth, MTK_REG_TX_BASE_PTR0), - mtk_reg_r32(eth, MTK_REG_TX_MAX_CNT0), - mtk_reg_r32(eth, MTK_REG_TX_CTX_IDX0), - mtk_reg_r32(eth, MTK_REG_TX_DTX_IDX0), - ring->tx_free_idx, - ring->tx_next_idx); - } - if (eth->soc->dma_type & MTK_QDMA) { - netif_info(eth, drv, dev, "qdma_cfg:%08x\n", - mtk_r32(eth, MTK_QDMA_GLO_CFG)); - netif_info(eth, drv, dev, - "tx_ring=%d, ctx=%08x, dtx=%08x, crx=%08x, drx=%08x, free=%hu\n", - 0, mtk_r32(eth, MTK_QTX_CTX_PTR), - mtk_r32(eth, MTK_QTX_DTX_PTR), - mtk_r32(eth, MTK_QTX_CRX_PTR), - mtk_r32(eth, MTK_QTX_DRX_PTR), - atomic_read(&ring->tx_free_count)); - } - netif_info(eth, drv, dev, - "rx_ring=%d, base=%08x, max=%u, calc=%u, drx=%u\n", - 0, mtk_reg_r32(eth, MTK_REG_RX_BASE_PTR0), - mtk_reg_r32(eth, MTK_REG_RX_MAX_CNT0), - mtk_reg_r32(eth, MTK_REG_RX_CALC_IDX0), - mtk_reg_r32(eth, MTK_REG_RX_DRX_IDX0)); - - schedule_work(&mac->pending_work); -} - -static irqreturn_t mtk_handle_irq(int irq, void *_eth) -{ - struct mtk_eth *eth = _eth; - u32 status, int_mask; - - status = mtk_irq_pending(eth); - if (unlikely(!status)) - return IRQ_NONE; - - int_mask = (eth->soc->rx_int | eth->soc->tx_int); - if (likely(status & int_mask)) { - if (likely(napi_schedule_prep(ð->rx_napi))) - __napi_schedule(ð->rx_napi); - } else { - mtk_irq_ack(eth, status); - } - mtk_irq_disable(eth, int_mask); - - return IRQ_HANDLED; -} - -#ifdef CONFIG_NET_POLL_CONTROLLER -static void mtk_poll_controller(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - u32 int_mask = eth->soc->tx_int | eth->soc->rx_int; - - mtk_irq_disable(eth, int_mask); - mtk_handle_irq(dev->irq, dev); - mtk_irq_enable(eth, int_mask); -} -#endif - -int mtk_set_clock_cycle(struct mtk_eth *eth) -{ - unsigned long sysclk = eth->sysclk; - - sysclk /= MTK_US_CYC_CNT_DIVISOR; - sysclk <<= MTK_US_CYC_CNT_SHIFT; - - mtk_w32(eth, (mtk_r32(eth, MTK_GLO_CFG) & - ~(MTK_US_CYC_CNT_MASK << MTK_US_CYC_CNT_SHIFT)) | - sysclk, - MTK_GLO_CFG); - return 0; -} - -void mtk_fwd_config(struct mtk_eth *eth) -{ - u32 fwd_cfg; - - fwd_cfg = mtk_r32(eth, MTK_GDMA1_FWD_CFG); - - /* disable jumbo frame */ - if (eth->soc->jumbo_frame) - fwd_cfg &= ~MTK_GDM1_JMB_EN; - - /* set unicast/multicast/broadcast frame to cpu */ - fwd_cfg &= ~0xffff; - - mtk_w32(eth, fwd_cfg, MTK_GDMA1_FWD_CFG); -} - -void mtk_csum_config(struct mtk_eth *eth) -{ - if (eth->soc->hw_features & NETIF_F_RXCSUM) - mtk_w32(eth, mtk_r32(eth, MTK_GDMA1_FWD_CFG) | - (MTK_GDM1_ICS_EN | MTK_GDM1_TCS_EN | MTK_GDM1_UCS_EN), - MTK_GDMA1_FWD_CFG); - else - mtk_w32(eth, mtk_r32(eth, MTK_GDMA1_FWD_CFG) & - ~(MTK_GDM1_ICS_EN | MTK_GDM1_TCS_EN | MTK_GDM1_UCS_EN), - MTK_GDMA1_FWD_CFG); - if (eth->soc->hw_features & NETIF_F_IP_CSUM) - mtk_w32(eth, mtk_r32(eth, MTK_CDMA_CSG_CFG) | - (MTK_ICS_GEN_EN | MTK_TCS_GEN_EN | MTK_UCS_GEN_EN), - MTK_CDMA_CSG_CFG); - else - mtk_w32(eth, mtk_r32(eth, MTK_CDMA_CSG_CFG) & - ~(MTK_ICS_GEN_EN | MTK_TCS_GEN_EN | MTK_UCS_GEN_EN), - MTK_CDMA_CSG_CFG); -} - -static int mtk_start_dma(struct mtk_eth *eth) -{ - unsigned long flags; - u32 val; - int err; - - if (eth->soc->dma_type == MTK_PDMA) - err = mtk_pdma_init(eth); - else if (eth->soc->dma_type == MTK_QDMA) - err = mtk_qdma_init(eth, 0); - else - err = mtk_pdma_qdma_init(eth); - if (err) { - mtk_dma_free(eth); - return err; - } - - spin_lock_irqsave(ð->page_lock, flags); - - val = MTK_TX_WB_DDONE | MTK_RX_DMA_EN | MTK_TX_DMA_EN; - if (eth->soc->rx_2b_offset) - val |= MTK_RX_2B_OFFSET; - val |= eth->soc->pdma_glo_cfg; - - if (eth->soc->dma_type & MTK_PDMA) - mtk_reg_w32(eth, val, MTK_REG_PDMA_GLO_CFG); - - if (eth->soc->dma_type & MTK_QDMA) - mtk_w32(eth, val, MTK_QDMA_GLO_CFG); - - spin_unlock_irqrestore(ð->page_lock, flags); - - return 0; -} - -static int mtk_open(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - - dma_coerce_mask_and_coherent(&dev->dev, DMA_BIT_MASK(32)); - - if (!atomic_read(ð->dma_refcnt)) { - int err = mtk_start_dma(eth); - - if (err) - return err; - - napi_enable(ð->rx_napi); - mtk_irq_enable(eth, eth->soc->tx_int | eth->soc->rx_int); - } - atomic_inc(ð->dma_refcnt); - - if (eth->phy) - eth->phy->start(mac); - - if (eth->soc->has_carrier && eth->soc->has_carrier(eth)) - netif_carrier_on(dev); - - netif_start_queue(dev); - eth->soc->fwd_config(eth); - - return 0; -} - -static void mtk_stop_dma(struct mtk_eth *eth, u32 glo_cfg) -{ - unsigned long flags; - u32 val; - int i; - - /* stop the dma enfine */ - spin_lock_irqsave(ð->page_lock, flags); - val = mtk_r32(eth, glo_cfg); - mtk_w32(eth, val & ~(MTK_TX_WB_DDONE | MTK_RX_DMA_EN | MTK_TX_DMA_EN), - glo_cfg); - spin_unlock_irqrestore(ð->page_lock, flags); - - /* wait for dma stop */ - for (i = 0; i < 10; i++) { - val = mtk_r32(eth, glo_cfg); - if (val & (MTK_TX_DMA_BUSY | MTK_RX_DMA_BUSY)) { - msleep(20); - continue; - } - break; - } -} - -static int mtk_stop(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - - netif_tx_disable(dev); - if (eth->phy) - eth->phy->stop(mac); - - if (!atomic_dec_and_test(ð->dma_refcnt)) - return 0; - - mtk_irq_disable(eth, eth->soc->tx_int | eth->soc->rx_int); - napi_disable(ð->rx_napi); - - if (eth->soc->dma_type & MTK_PDMA) - mtk_stop_dma(eth, mtk_reg_table[MTK_REG_PDMA_GLO_CFG]); - - if (eth->soc->dma_type & MTK_QDMA) - mtk_stop_dma(eth, MTK_QDMA_GLO_CFG); - - mtk_dma_free(eth); - - return 0; -} - -static int __init mtk_init_hw(struct mtk_eth *eth) -{ - int i, err; - - eth->soc->reset_fe(eth); - - if (eth->soc->switch_init) - if (eth->soc->switch_init(eth)) { - dev_err(eth->dev, "failed to initialize switch core\n"); - return -ENODEV; - } - - err = devm_request_irq(eth->dev, eth->irq, mtk_handle_irq, 0, - dev_name(eth->dev), eth); - if (err) - return err; - - err = mtk_mdio_init(eth); - if (err) - return err; - - /* disable delay and normal interrupt */ - mtk_reg_w32(eth, 0, MTK_REG_DLY_INT_CFG); - if (eth->soc->dma_type & MTK_QDMA) - mtk_w32(eth, 0, MTK_QDMA_DELAY_INT); - mtk_irq_disable(eth, eth->soc->tx_int | eth->soc->rx_int); - - /* frame engine will push VLAN tag regarding to VIDX field in Tx desc */ - if (mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE]) - for (i = 0; i < 16; i += 2) - mtk_w32(eth, ((i + 1) << 16) + i, - mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE] + - (i * 2)); - - if (eth->soc->fwd_config(eth)) - dev_err(eth->dev, "unable to get clock\n"); - - if (mtk_reg_table[MTK_REG_MTK_RST_GL]) { - mtk_reg_w32(eth, 1, MTK_REG_MTK_RST_GL); - mtk_reg_w32(eth, 0, MTK_REG_MTK_RST_GL); - } - - return 0; -} - -static int __init mtk_init(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - struct device_node *port; - const char *mac_addr; - int err; - - mac_addr = of_get_mac_address(mac->of_node); - if (mac_addr) - ether_addr_copy(dev->dev_addr, mac_addr); - - /* If the mac address is invalid, use random mac address */ - if (!is_valid_ether_addr(dev->dev_addr)) { - eth_hw_addr_random(dev); - dev_err(eth->dev, "generated random MAC address %pM\n", - dev->dev_addr); - } - mac->hw->soc->set_mac(mac, dev->dev_addr); - - if (eth->soc->port_init) - for_each_child_of_node(mac->of_node, port) - if (of_device_is_compatible(port, - "mediatek,eth-port") && - of_device_is_available(port)) - eth->soc->port_init(eth, mac, port); - - if (eth->phy) { - err = eth->phy->connect(mac); - if (err) - return err; - } - - return 0; -} - -static void mtk_uninit(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - - if (eth->phy) - eth->phy->disconnect(mac); - mtk_mdio_cleanup(eth); - - mtk_irq_disable(eth, ~0); - free_irq(dev->irq, dev); -} - -static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) -{ - struct mtk_mac *mac = netdev_priv(dev); - - if (!mac->phy_dev) - return -ENODEV; - - switch (cmd) { - case SIOCGMIIPHY: - case SIOCGMIIREG: - case SIOCSMIIREG: - return phy_mii_ioctl(mac->phy_dev, ifr, cmd); - default: - break; - } - - return -EOPNOTSUPP; -} - -static int mtk_change_mtu(struct net_device *dev, int new_mtu) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - int frag_size, old_mtu; - u32 fwd_cfg; - - if (!eth->soc->jumbo_frame) - return eth_change_mtu(dev, new_mtu); - - frag_size = mtk_max_frag_size(new_mtu); - if (new_mtu < 68 || frag_size > PAGE_SIZE) - return -EINVAL; - - old_mtu = dev->mtu; - dev->mtu = new_mtu; - - /* return early if the buffer sizes will not change */ - if (old_mtu <= ETH_DATA_LEN && new_mtu <= ETH_DATA_LEN) - return 0; - if (old_mtu > ETH_DATA_LEN && new_mtu > ETH_DATA_LEN) - return 0; - - if (new_mtu <= ETH_DATA_LEN) - eth->rx_ring[0].frag_size = mtk_max_frag_size(ETH_DATA_LEN); - else - eth->rx_ring[0].frag_size = PAGE_SIZE; - eth->rx_ring[0].rx_buf_size = - mtk_max_buf_size(eth->rx_ring[0].frag_size); - - if (!netif_running(dev)) - return 0; - - mtk_stop(dev); - fwd_cfg = mtk_r32(eth, MTK_GDMA1_FWD_CFG); - if (new_mtu <= ETH_DATA_LEN) { - fwd_cfg &= ~MTK_GDM1_JMB_EN; - } else { - fwd_cfg &= ~(MTK_GDM1_JMB_LEN_MASK << MTK_GDM1_JMB_LEN_SHIFT); - fwd_cfg |= (DIV_ROUND_UP(frag_size, 1024) << - MTK_GDM1_JMB_LEN_SHIFT) | MTK_GDM1_JMB_EN; - } - mtk_w32(eth, fwd_cfg, MTK_GDMA1_FWD_CFG); - - return mtk_open(dev); -} - -static void mtk_pending_work(struct work_struct *work) -{ - struct mtk_mac *mac = container_of(work, struct mtk_mac, pending_work); - struct mtk_eth *eth = mac->hw; - struct net_device *dev = eth->netdev[mac->id]; - int err; - - rtnl_lock(); - mtk_stop(dev); - - err = mtk_open(dev); - if (err) { - netif_alert(eth, ifup, dev, - "Driver up/down cycle failed, closing device.\n"); - dev_close(dev); - } - rtnl_unlock(); -} - -static int mtk_cleanup(struct mtk_eth *eth) -{ - int i; - - for (i = 0; i < eth->soc->mac_count; i++) { - struct mtk_mac *mac = netdev_priv(eth->netdev[i]); - - if (!eth->netdev[i]) - continue; - - unregister_netdev(eth->netdev[i]); - free_netdev(eth->netdev[i]); - cancel_work_sync(&mac->pending_work); - } - - return 0; -} - -static const struct net_device_ops mtk_netdev_ops = { - .ndo_init = mtk_init, - .ndo_uninit = mtk_uninit, - .ndo_open = mtk_open, - .ndo_stop = mtk_stop, - .ndo_start_xmit = mtk_start_xmit, - .ndo_set_mac_address = mtk_set_mac_address, - .ndo_validate_addr = eth_validate_addr, - .ndo_do_ioctl = mtk_do_ioctl, - .ndo_change_mtu = mtk_change_mtu, - .ndo_tx_timeout = mtk_tx_timeout, - .ndo_get_stats64 = mtk_get_stats64, - .ndo_vlan_rx_add_vid = mtk_vlan_rx_add_vid, - .ndo_vlan_rx_kill_vid = mtk_vlan_rx_kill_vid, -#ifdef CONFIG_NET_POLL_CONTROLLER - .ndo_poll_controller = mtk_poll_controller, -#endif -}; - -static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) -{ - struct mtk_mac *mac; - const __be32 *_id = of_get_property(np, "reg", NULL); - int id, err; - - if (!_id) { - dev_err(eth->dev, "missing mac id\n"); - return -EINVAL; - } - id = be32_to_cpup(_id); - if (id >= eth->soc->mac_count || eth->netdev[id]) { - dev_err(eth->dev, "%d is not a valid mac id\n", id); - return -EINVAL; - } - - eth->netdev[id] = alloc_etherdev(sizeof(*mac)); - if (!eth->netdev[id]) { - dev_err(eth->dev, "alloc_etherdev failed\n"); - return -ENOMEM; - } - mac = netdev_priv(eth->netdev[id]); - eth->mac[id] = mac; - mac->id = id; - mac->hw = eth; - mac->of_node = np; - INIT_WORK(&mac->pending_work, mtk_pending_work); - - if (mtk_reg_table[MTK_REG_MTK_COUNTER_BASE]) { - mac->hw_stats = devm_kzalloc(eth->dev, - sizeof(*mac->hw_stats), - GFP_KERNEL); - if (!mac->hw_stats) { - err = -ENOMEM; - goto free_netdev; - } - spin_lock_init(&mac->hw_stats->stats_lock); - mac->hw_stats->reg_offset = id * MTK_STAT_OFFSET; - } - - SET_NETDEV_DEV(eth->netdev[id], eth->dev); - eth->netdev[id]->netdev_ops = &mtk_netdev_ops; - eth->netdev[id]->base_addr = (unsigned long)eth->base; - - if (eth->soc->init_data) - eth->soc->init_data(eth->soc, eth->netdev[id]); - - eth->netdev[id]->vlan_features = eth->soc->hw_features & - ~(NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX); - eth->netdev[id]->features |= eth->soc->hw_features; - - if (mtk_reg_table[MTK_REG_MTK_DMA_VID_BASE]) - eth->netdev[id]->features |= NETIF_F_HW_VLAN_CTAG_FILTER; - - mtk_set_ethtool_ops(eth->netdev[id]); - - err = register_netdev(eth->netdev[id]); - if (err) { - dev_err(eth->dev, "error bringing up device\n"); - err = -ENOMEM; - goto free_netdev; - } - eth->netdev[id]->irq = eth->irq; - netif_info(eth, probe, eth->netdev[id], - "mediatek frame engine at 0x%08lx, irq %d\n", - eth->netdev[id]->base_addr, eth->netdev[id]->irq); - - return 0; - -free_netdev: - free_netdev(eth->netdev[id]); - return err; -} - -static int mtk_probe(struct platform_device *pdev) -{ - struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - const struct of_device_id *match; - struct device_node *mac_np; - struct mtk_soc_data *soc; - struct mtk_eth *eth; - struct clk *sysclk; - int err; - - device_reset(&pdev->dev); - - match = of_match_device(of_mtk_match, &pdev->dev); - soc = (struct mtk_soc_data *)match->data; - - if (soc->reg_table) - mtk_reg_table = soc->reg_table; - - eth = devm_kzalloc(&pdev->dev, sizeof(*eth), GFP_KERNEL); - if (!eth) - return -ENOMEM; - - eth->base = devm_ioremap_resource(&pdev->dev, res); - if (IS_ERR(eth->base)) - return PTR_ERR(eth->base); - - spin_lock_init(ð->page_lock); - - eth->ethsys = syscon_regmap_lookup_by_phandle(pdev->dev.of_node, - "mediatek,ethsys"); - if (IS_ERR(eth->ethsys)) - return PTR_ERR(eth->ethsys); - - eth->irq = platform_get_irq(pdev, 0); - if (eth->irq < 0) { - dev_err(&pdev->dev, "no IRQ resource found\n"); - return -ENXIO; - } - - sysclk = devm_clk_get(&pdev->dev, NULL); - if (IS_ERR(sysclk)) { - dev_err(&pdev->dev, - "the clock is not defined in the devicetree\n"); - return -ENXIO; - } - eth->sysclk = clk_get_rate(sysclk); - - eth->switch_np = of_parse_phandle(pdev->dev.of_node, - "mediatek,switch", 0); - if (soc->has_switch && !eth->switch_np) { - dev_err(&pdev->dev, "failed to read switch phandle\n"); - return -ENODEV; - } - - eth->dev = &pdev->dev; - eth->soc = soc; - eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE); - - err = mtk_init_hw(eth); - if (err) - return err; - - if (eth->soc->mac_count > 1) { - for_each_child_of_node(pdev->dev.of_node, mac_np) { - if (!of_device_is_compatible(mac_np, - "mediatek,eth-mac")) - continue; - - if (!of_device_is_available(mac_np)) - continue; - - err = mtk_add_mac(eth, mac_np); - if (err) - goto err_free_dev; - } - - init_dummy_netdev(ð->dummy_dev); - netif_napi_add(ð->dummy_dev, ð->rx_napi, mtk_poll, - soc->napi_weight); - } else { - err = mtk_add_mac(eth, pdev->dev.of_node); - if (err) - goto err_free_dev; - netif_napi_add(eth->netdev[0], ð->rx_napi, mtk_poll, - soc->napi_weight); - } - - platform_set_drvdata(pdev, eth); - - return 0; - -err_free_dev: - mtk_cleanup(eth); - return err; -} - -static int mtk_remove(struct platform_device *pdev) -{ - struct mtk_eth *eth = platform_get_drvdata(pdev); - - netif_napi_del(ð->rx_napi); - mtk_cleanup(eth); - platform_set_drvdata(pdev, NULL); - - return 0; -} - -static struct platform_driver mtk_driver = { - .probe = mtk_probe, - .remove = mtk_remove, - .driver = { - .name = "mtk_soc_eth", - .of_match_table = of_mtk_match, - }, -}; - -module_platform_driver(mtk_driver); - -MODULE_LICENSE("GPL"); -MODULE_AUTHOR("John Crispin "); -MODULE_DESCRIPTION("Ethernet driver for MediaTek SoC"); diff --git a/drivers/staging/mt7621-eth/mtk_eth_soc.h b/drivers/staging/mt7621-eth/mtk_eth_soc.h deleted file mode 100644 index e6ed80433f49..000000000000 --- a/drivers/staging/mt7621-eth/mtk_eth_soc.h +++ /dev/null @@ -1,716 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#ifndef MTK_ETH_H -#define MTK_ETH_H - -#include -#include -#include -#include -#include -#include -#include -#include - -/* these registers have different offsets depending on the SoC. we use a lookup - * table for these - */ -enum mtk_reg { - MTK_REG_PDMA_GLO_CFG = 0, - MTK_REG_PDMA_RST_CFG, - MTK_REG_DLY_INT_CFG, - MTK_REG_TX_BASE_PTR0, - MTK_REG_TX_MAX_CNT0, - MTK_REG_TX_CTX_IDX0, - MTK_REG_TX_DTX_IDX0, - MTK_REG_RX_BASE_PTR0, - MTK_REG_RX_MAX_CNT0, - MTK_REG_RX_CALC_IDX0, - MTK_REG_RX_DRX_IDX0, - MTK_REG_MTK_INT_ENABLE, - MTK_REG_MTK_INT_STATUS, - MTK_REG_MTK_DMA_VID_BASE, - MTK_REG_MTK_COUNTER_BASE, - MTK_REG_MTK_RST_GL, - MTK_REG_MTK_INT_STATUS2, - MTK_REG_COUNT -}; - -/* delayed interrupt bits */ -#define MTK_DELAY_EN_INT 0x80 -#define MTK_DELAY_MAX_INT 0x04 -#define MTK_DELAY_MAX_TOUT 0x04 -#define MTK_DELAY_TIME 20 -#define MTK_DELAY_CHAN (((MTK_DELAY_EN_INT | MTK_DELAY_MAX_INT) << 8) \ - | MTK_DELAY_MAX_TOUT) -#define MTK_DELAY_INIT ((MTK_DELAY_CHAN << 16) | MTK_DELAY_CHAN) -#define MTK_PSE_FQFC_CFG_INIT 0x80504000 -#define MTK_PSE_FQFC_CFG_256Q 0xff908000 - -/* interrupt bits */ -#define MTK_CNT_PPE_AF BIT(31) -#define MTK_CNT_GDM_AF BIT(29) -#define MTK_PSE_P2_FC BIT(26) -#define MTK_PSE_BUF_DROP BIT(24) -#define MTK_GDM_OTHER_DROP BIT(23) -#define MTK_PSE_P1_FC BIT(22) -#define MTK_PSE_P0_FC BIT(21) -#define MTK_PSE_FQ_EMPTY BIT(20) -#define MTK_GE1_STA_CHG BIT(18) -#define MTK_TX_COHERENT BIT(17) -#define MTK_RX_COHERENT BIT(16) -#define MTK_TX_DONE_INT3 BIT(11) -#define MTK_TX_DONE_INT2 BIT(10) -#define MTK_TX_DONE_INT1 BIT(9) -#define MTK_TX_DONE_INT0 BIT(8) -#define MTK_RX_DONE_INT0 BIT(2) -#define MTK_TX_DLY_INT BIT(1) -#define MTK_RX_DLY_INT BIT(0) - -#define MTK_RX_DONE_INT MTK_RX_DONE_INT0 -#define MTK_TX_DONE_INT (MTK_TX_DONE_INT0 | MTK_TX_DONE_INT1 | \ - MTK_TX_DONE_INT2 | MTK_TX_DONE_INT3) - -#define RT5350_RX_DLY_INT BIT(30) -#define RT5350_TX_DLY_INT BIT(28) -#define RT5350_RX_DONE_INT1 BIT(17) -#define RT5350_RX_DONE_INT0 BIT(16) -#define RT5350_TX_DONE_INT3 BIT(3) -#define RT5350_TX_DONE_INT2 BIT(2) -#define RT5350_TX_DONE_INT1 BIT(1) -#define RT5350_TX_DONE_INT0 BIT(0) - -#define RT5350_RX_DONE_INT (RT5350_RX_DONE_INT0 | RT5350_RX_DONE_INT1) -#define RT5350_TX_DONE_INT (RT5350_TX_DONE_INT0 | RT5350_TX_DONE_INT1 | \ - RT5350_TX_DONE_INT2 | RT5350_TX_DONE_INT3) - -/* registers */ -#define MTK_GDMA_OFFSET 0x0020 -#define MTK_PSE_OFFSET 0x0040 -#define MTK_GDMA2_OFFSET 0x0060 -#define MTK_CDMA_OFFSET 0x0080 -#define MTK_DMA_VID0 0x00a8 -#define MTK_PDMA_OFFSET 0x0100 -#define MTK_PPE_OFFSET 0x0200 -#define MTK_CMTABLE_OFFSET 0x0400 -#define MTK_POLICYTABLE_OFFSET 0x1000 - -#define MT7621_GDMA_OFFSET 0x0500 -#define MT7620_GDMA_OFFSET 0x0600 - -#define RT5350_PDMA_OFFSET 0x0800 -#define RT5350_SDM_OFFSET 0x0c00 - -#define MTK_MDIO_ACCESS 0x00 -#define MTK_MDIO_CFG 0x04 -#define MTK_GLO_CFG 0x08 -#define MTK_RST_GL 0x0C -#define MTK_INT_STATUS 0x10 -#define MTK_INT_ENABLE 0x14 -#define MTK_MDIO_CFG2 0x18 -#define MTK_FOC_TS_T 0x1C - -#define MTK_GDMA1_FWD_CFG (MTK_GDMA_OFFSET + 0x00) -#define MTK_GDMA1_SCH_CFG (MTK_GDMA_OFFSET + 0x04) -#define MTK_GDMA1_SHPR_CFG (MTK_GDMA_OFFSET + 0x08) -#define MTK_GDMA1_MAC_ADRL (MTK_GDMA_OFFSET + 0x0C) -#define MTK_GDMA1_MAC_ADRH (MTK_GDMA_OFFSET + 0x10) - -#define MTK_GDMA2_FWD_CFG (MTK_GDMA2_OFFSET + 0x00) -#define MTK_GDMA2_SCH_CFG (MTK_GDMA2_OFFSET + 0x04) -#define MTK_GDMA2_SHPR_CFG (MTK_GDMA2_OFFSET + 0x08) -#define MTK_GDMA2_MAC_ADRL (MTK_GDMA2_OFFSET + 0x0C) -#define MTK_GDMA2_MAC_ADRH (MTK_GDMA2_OFFSET + 0x10) - -#define MTK_PSE_FQ_CFG (MTK_PSE_OFFSET + 0x00) -#define MTK_CDMA_FC_CFG (MTK_PSE_OFFSET + 0x04) -#define MTK_GDMA1_FC_CFG (MTK_PSE_OFFSET + 0x08) -#define MTK_GDMA2_FC_CFG (MTK_PSE_OFFSET + 0x0C) - -#define MTK_CDMA_CSG_CFG (MTK_CDMA_OFFSET + 0x00) -#define MTK_CDMA_SCH_CFG (MTK_CDMA_OFFSET + 0x04) - -#define MT7621_GDMA_FWD_CFG(x) (MT7621_GDMA_OFFSET + (x * 0x1000)) - -/* FIXME this might be different for different SOCs */ -#define MT7620_GDMA1_FWD_CFG (MT7621_GDMA_OFFSET + 0x00) - -#define RT5350_TX_BASE_PTR0 (RT5350_PDMA_OFFSET + 0x00) -#define RT5350_TX_MAX_CNT0 (RT5350_PDMA_OFFSET + 0x04) -#define RT5350_TX_CTX_IDX0 (RT5350_PDMA_OFFSET + 0x08) -#define RT5350_TX_DTX_IDX0 (RT5350_PDMA_OFFSET + 0x0C) -#define RT5350_TX_BASE_PTR1 (RT5350_PDMA_OFFSET + 0x10) -#define RT5350_TX_MAX_CNT1 (RT5350_PDMA_OFFSET + 0x14) -#define RT5350_TX_CTX_IDX1 (RT5350_PDMA_OFFSET + 0x18) -#define RT5350_TX_DTX_IDX1 (RT5350_PDMA_OFFSET + 0x1C) -#define RT5350_TX_BASE_PTR2 (RT5350_PDMA_OFFSET + 0x20) -#define RT5350_TX_MAX_CNT2 (RT5350_PDMA_OFFSET + 0x24) -#define RT5350_TX_CTX_IDX2 (RT5350_PDMA_OFFSET + 0x28) -#define RT5350_TX_DTX_IDX2 (RT5350_PDMA_OFFSET + 0x2C) -#define RT5350_TX_BASE_PTR3 (RT5350_PDMA_OFFSET + 0x30) -#define RT5350_TX_MAX_CNT3 (RT5350_PDMA_OFFSET + 0x34) -#define RT5350_TX_CTX_IDX3 (RT5350_PDMA_OFFSET + 0x38) -#define RT5350_TX_DTX_IDX3 (RT5350_PDMA_OFFSET + 0x3C) -#define RT5350_RX_BASE_PTR0 (RT5350_PDMA_OFFSET + 0x100) -#define RT5350_RX_MAX_CNT0 (RT5350_PDMA_OFFSET + 0x104) -#define RT5350_RX_CALC_IDX0 (RT5350_PDMA_OFFSET + 0x108) -#define RT5350_RX_DRX_IDX0 (RT5350_PDMA_OFFSET + 0x10C) -#define RT5350_RX_BASE_PTR1 (RT5350_PDMA_OFFSET + 0x110) -#define RT5350_RX_MAX_CNT1 (RT5350_PDMA_OFFSET + 0x114) -#define RT5350_RX_CALC_IDX1 (RT5350_PDMA_OFFSET + 0x118) -#define RT5350_RX_DRX_IDX1 (RT5350_PDMA_OFFSET + 0x11C) -#define RT5350_PDMA_GLO_CFG (RT5350_PDMA_OFFSET + 0x204) -#define RT5350_PDMA_RST_CFG (RT5350_PDMA_OFFSET + 0x208) -#define RT5350_DLY_INT_CFG (RT5350_PDMA_OFFSET + 0x20c) -#define RT5350_MTK_INT_STATUS (RT5350_PDMA_OFFSET + 0x220) -#define RT5350_MTK_INT_ENABLE (RT5350_PDMA_OFFSET + 0x228) -#define RT5350_PDMA_SCH_CFG (RT5350_PDMA_OFFSET + 0x280) - -#define MTK_PDMA_GLO_CFG (MTK_PDMA_OFFSET + 0x00) -#define MTK_PDMA_RST_CFG (MTK_PDMA_OFFSET + 0x04) -#define MTK_PDMA_SCH_CFG (MTK_PDMA_OFFSET + 0x08) -#define MTK_DLY_INT_CFG (MTK_PDMA_OFFSET + 0x0C) -#define MTK_TX_BASE_PTR0 (MTK_PDMA_OFFSET + 0x10) -#define MTK_TX_MAX_CNT0 (MTK_PDMA_OFFSET + 0x14) -#define MTK_TX_CTX_IDX0 (MTK_PDMA_OFFSET + 0x18) -#define MTK_TX_DTX_IDX0 (MTK_PDMA_OFFSET + 0x1C) -#define MTK_TX_BASE_PTR1 (MTK_PDMA_OFFSET + 0x20) -#define MTK_TX_MAX_CNT1 (MTK_PDMA_OFFSET + 0x24) -#define MTK_TX_CTX_IDX1 (MTK_PDMA_OFFSET + 0x28) -#define MTK_TX_DTX_IDX1 (MTK_PDMA_OFFSET + 0x2C) -#define MTK_RX_BASE_PTR0 (MTK_PDMA_OFFSET + 0x30) -#define MTK_RX_MAX_CNT0 (MTK_PDMA_OFFSET + 0x34) -#define MTK_RX_CALC_IDX0 (MTK_PDMA_OFFSET + 0x38) -#define MTK_RX_DRX_IDX0 (MTK_PDMA_OFFSET + 0x3C) -#define MTK_TX_BASE_PTR2 (MTK_PDMA_OFFSET + 0x40) -#define MTK_TX_MAX_CNT2 (MTK_PDMA_OFFSET + 0x44) -#define MTK_TX_CTX_IDX2 (MTK_PDMA_OFFSET + 0x48) -#define MTK_TX_DTX_IDX2 (MTK_PDMA_OFFSET + 0x4C) -#define MTK_TX_BASE_PTR3 (MTK_PDMA_OFFSET + 0x50) -#define MTK_TX_MAX_CNT3 (MTK_PDMA_OFFSET + 0x54) -#define MTK_TX_CTX_IDX3 (MTK_PDMA_OFFSET + 0x58) -#define MTK_TX_DTX_IDX3 (MTK_PDMA_OFFSET + 0x5C) -#define MTK_RX_BASE_PTR1 (MTK_PDMA_OFFSET + 0x60) -#define MTK_RX_MAX_CNT1 (MTK_PDMA_OFFSET + 0x64) -#define MTK_RX_CALC_IDX1 (MTK_PDMA_OFFSET + 0x68) -#define MTK_RX_DRX_IDX1 (MTK_PDMA_OFFSET + 0x6C) - -/* Switch DMA configuration */ -#define RT5350_SDM_CFG (RT5350_SDM_OFFSET + 0x00) -#define RT5350_SDM_RRING (RT5350_SDM_OFFSET + 0x04) -#define RT5350_SDM_TRING (RT5350_SDM_OFFSET + 0x08) -#define RT5350_SDM_MAC_ADRL (RT5350_SDM_OFFSET + 0x0C) -#define RT5350_SDM_MAC_ADRH (RT5350_SDM_OFFSET + 0x10) -#define RT5350_SDM_TPCNT (RT5350_SDM_OFFSET + 0x100) -#define RT5350_SDM_TBCNT (RT5350_SDM_OFFSET + 0x104) -#define RT5350_SDM_RPCNT (RT5350_SDM_OFFSET + 0x108) -#define RT5350_SDM_RBCNT (RT5350_SDM_OFFSET + 0x10C) -#define RT5350_SDM_CS_ERR (RT5350_SDM_OFFSET + 0x110) - -#define RT5350_SDM_ICS_EN BIT(16) -#define RT5350_SDM_TCS_EN BIT(17) -#define RT5350_SDM_UCS_EN BIT(18) - -/* QDMA registers */ -#define MTK_QTX_CFG(x) (0x1800 + (x * 0x10)) -#define MTK_QTX_SCH(x) (0x1804 + (x * 0x10)) -#define MTK_QRX_BASE_PTR0 0x1900 -#define MTK_QRX_MAX_CNT0 0x1904 -#define MTK_QRX_CRX_IDX0 0x1908 -#define MTK_QRX_DRX_IDX0 0x190C -#define MTK_QDMA_GLO_CFG 0x1A04 -#define MTK_QDMA_RST_IDX 0x1A08 -#define MTK_QDMA_DELAY_INT 0x1A0C -#define MTK_QDMA_FC_THRES 0x1A10 -#define MTK_QMTK_INT_STATUS 0x1A18 -#define MTK_QMTK_INT_ENABLE 0x1A1C -#define MTK_QDMA_HRED2 0x1A44 - -#define MTK_QTX_CTX_PTR 0x1B00 -#define MTK_QTX_DTX_PTR 0x1B04 - -#define MTK_QTX_CRX_PTR 0x1B10 -#define MTK_QTX_DRX_PTR 0x1B14 - -#define MTK_QDMA_FQ_HEAD 0x1B20 -#define MTK_QDMA_FQ_TAIL 0x1B24 -#define MTK_QDMA_FQ_CNT 0x1B28 -#define MTK_QDMA_FQ_BLEN 0x1B2C - -#define QDMA_PAGE_SIZE 2048 -#define QDMA_TX_OWNER_CPU BIT(31) -#define QDMA_TX_SWC BIT(14) -#define TX_QDMA_SDL(_x) (((_x) & 0x3fff) << 16) -#define QDMA_RES_THRES 4 - -/* MDIO_CFG register bits */ -#define MTK_MDIO_CFG_AUTO_POLL_EN BIT(29) -#define MTK_MDIO_CFG_GP1_BP_EN BIT(16) -#define MTK_MDIO_CFG_GP1_FRC_EN BIT(15) -#define MTK_MDIO_CFG_GP1_SPEED_10 (0 << 13) -#define MTK_MDIO_CFG_GP1_SPEED_100 (1 << 13) -#define MTK_MDIO_CFG_GP1_SPEED_1000 (2 << 13) -#define MTK_MDIO_CFG_GP1_DUPLEX BIT(12) -#define MTK_MDIO_CFG_GP1_FC_TX BIT(11) -#define MTK_MDIO_CFG_GP1_FC_RX BIT(10) -#define MTK_MDIO_CFG_GP1_LNK_DWN BIT(9) -#define MTK_MDIO_CFG_GP1_AN_FAIL BIT(8) -#define MTK_MDIO_CFG_MDC_CLK_DIV_1 (0 << 6) -#define MTK_MDIO_CFG_MDC_CLK_DIV_2 (1 << 6) -#define MTK_MDIO_CFG_MDC_CLK_DIV_4 (2 << 6) -#define MTK_MDIO_CFG_MDC_CLK_DIV_8 (3 << 6) -#define MTK_MDIO_CFG_TURBO_MII_FREQ BIT(5) -#define MTK_MDIO_CFG_TURBO_MII_MODE BIT(4) -#define MTK_MDIO_CFG_RX_CLK_SKEW_0 (0 << 2) -#define MTK_MDIO_CFG_RX_CLK_SKEW_200 (1 << 2) -#define MTK_MDIO_CFG_RX_CLK_SKEW_400 (2 << 2) -#define MTK_MDIO_CFG_RX_CLK_SKEW_INV (3 << 2) -#define MTK_MDIO_CFG_TX_CLK_SKEW_0 0 -#define MTK_MDIO_CFG_TX_CLK_SKEW_200 1 -#define MTK_MDIO_CFG_TX_CLK_SKEW_400 2 -#define MTK_MDIO_CFG_TX_CLK_SKEW_INV 3 - -/* uni-cast port */ -#define MTK_GDM1_JMB_LEN_MASK 0xf -#define MTK_GDM1_JMB_LEN_SHIFT 28 -#define MTK_GDM1_ICS_EN BIT(22) -#define MTK_GDM1_TCS_EN BIT(21) -#define MTK_GDM1_UCS_EN BIT(20) -#define MTK_GDM1_JMB_EN BIT(19) -#define MTK_GDM1_STRPCRC BIT(16) -#define MTK_GDM1_UFRC_P_CPU (0 << 12) -#define MTK_GDM1_UFRC_P_GDMA1 (1 << 12) -#define MTK_GDM1_UFRC_P_PPE (6 << 12) - -/* checksums */ -#define MTK_ICS_GEN_EN BIT(2) -#define MTK_UCS_GEN_EN BIT(1) -#define MTK_TCS_GEN_EN BIT(0) - -/* dma mode */ -#define MTK_PDMA BIT(0) -#define MTK_QDMA BIT(1) -#define MTK_PDMA_RX_QDMA_TX (MTK_PDMA | MTK_QDMA) - -/* dma ring */ -#define MTK_PST_DRX_IDX0 BIT(16) -#define MTK_PST_DTX_IDX3 BIT(3) -#define MTK_PST_DTX_IDX2 BIT(2) -#define MTK_PST_DTX_IDX1 BIT(1) -#define MTK_PST_DTX_IDX0 BIT(0) - -#define MTK_RX_2B_OFFSET BIT(31) -#define MTK_TX_WB_DDONE BIT(6) -#define MTK_RX_DMA_BUSY BIT(3) -#define MTK_TX_DMA_BUSY BIT(1) -#define MTK_RX_DMA_EN BIT(2) -#define MTK_TX_DMA_EN BIT(0) - -#define MTK_PDMA_SIZE_4DWORDS (0 << 4) -#define MTK_PDMA_SIZE_8DWORDS (1 << 4) -#define MTK_PDMA_SIZE_16DWORDS (2 << 4) - -#define MTK_US_CYC_CNT_MASK 0xff -#define MTK_US_CYC_CNT_SHIFT 0x8 -#define MTK_US_CYC_CNT_DIVISOR 1000000 - -/* PDMA descriptor rxd2 */ -#define RX_DMA_DONE BIT(31) -#define RX_DMA_LSO BIT(30) -#define RX_DMA_PLEN0(_x) (((_x) & 0x3fff) << 16) -#define RX_DMA_GET_PLEN0(_x) (((_x) >> 16) & 0x3fff) -#define RX_DMA_TAG BIT(15) - -/* PDMA descriptor rxd3 */ -#define RX_DMA_TPID(_x) (((_x) >> 16) & 0xffff) -#define RX_DMA_VID(_x) ((_x) & 0xfff) - -/* PDMA descriptor rxd4 */ -#define RX_DMA_L4VALID BIT(30) -#define RX_DMA_FPORT_SHIFT 19 -#define RX_DMA_FPORT_MASK 0x7 - -struct mtk_rx_dma { - unsigned int rxd1; - unsigned int rxd2; - unsigned int rxd3; - unsigned int rxd4; -} __packed __aligned(4); - -/* PDMA tx descriptor bits */ -#define TX_DMA_BUF_LEN 0x3fff -#define TX_DMA_PLEN0_MASK (TX_DMA_BUF_LEN << 16) -#define TX_DMA_PLEN0(_x) (((_x) & TX_DMA_BUF_LEN) << 16) -#define TX_DMA_PLEN1(_x) ((_x) & TX_DMA_BUF_LEN) -#define TX_DMA_GET_PLEN0(_x) (((_x) >> 16) & TX_DMA_BUF_LEN) -#define TX_DMA_GET_PLEN1(_x) ((_x) & TX_DMA_BUF_LEN) -#define TX_DMA_LS1 BIT(14) -#define TX_DMA_LS0 BIT(30) -#define TX_DMA_DONE BIT(31) -#define TX_DMA_FPORT_SHIFT 25 -#define TX_DMA_FPORT_MASK 0x7 -#define TX_DMA_INS_VLAN_MT7621 BIT(16) -#define TX_DMA_INS_VLAN BIT(7) -#define TX_DMA_INS_PPPOE BIT(12) -#define TX_DMA_TAG BIT(15) -#define TX_DMA_TAG_MASK BIT(15) -#define TX_DMA_QN(_x) ((_x) << 16) -#define TX_DMA_PN(_x) ((_x) << 24) -#define TX_DMA_QN_MASK TX_DMA_QN(0x7) -#define TX_DMA_PN_MASK TX_DMA_PN(0x7) -#define TX_DMA_UDF BIT(20) -#define TX_DMA_CHKSUM (0x7 << 29) -#define TX_DMA_TSO BIT(28) -#define TX_DMA_DESP4_DEF (TX_DMA_QN(3) | TX_DMA_PN(1)) - -/* frame engine counters */ -#define MTK_PPE_AC_BCNT0 (MTK_CMTABLE_OFFSET + 0x00) -#define MTK_GDMA1_TX_GBCNT (MTK_CMTABLE_OFFSET + 0x300) -#define MTK_GDMA2_TX_GBCNT (MTK_GDMA1_TX_GBCNT + 0x40) - -/* phy device flags */ -#define MTK_PHY_FLAG_PORT BIT(0) -#define MTK_PHY_FLAG_ATTACH BIT(1) - -struct mtk_tx_dma { - unsigned int txd1; - unsigned int txd2; - unsigned int txd3; - unsigned int txd4; -} __packed __aligned(4); - -struct mtk_eth; -struct mtk_mac; - -/* manage the attached phys */ -struct mtk_phy { - spinlock_t lock; - - struct phy_device *phy[8]; - struct device_node *phy_node[8]; - const __be32 *phy_fixed[8]; - int duplex[8]; - int speed[8]; - int tx_fc[8]; - int rx_fc[8]; - int (*connect)(struct mtk_mac *mac); - void (*disconnect)(struct mtk_mac *mac); - void (*start)(struct mtk_mac *mac); - void (*stop)(struct mtk_mac *mac); -}; - -/* struct mtk_soc_data - the structure that holds the SoC specific data - * @reg_table: Some of the legacy registers changed their location - * over time. Their offsets are stored in this table - * - * @init_data: Some features depend on the silicon revision. This - * callback allows runtime modification of the content of - * this struct - * @reset_fe: This callback is used to trigger the reset of the frame - * engine - * @set_mac: This callback is used to set the unicast mac address - * filter - * @fwd_config: This callback is used to setup the forward config - * register of the MAC - * @switch_init: This callback is used to bring up the switch core - * @port_init: Some SoCs have ports that can be router to a switch port - * or an external PHY. This callback is used to setup these - * ports. - * @has_carrier: This callback allows driver to check if there is a cable - * attached. - * @mdio_init: This callbck is used to setup the MDIO bus if one is - * present - * @mdio_cleanup: This callback is used to cleanup the MDIO state. - * @mdio_write: This callback is used to write data to the MDIO bus. - * @mdio_read: This callback is used to write data to the MDIO bus. - * @mdio_adjust_link: This callback is used to apply the PHY settings. - * @piac_offset: the PIAC register has a different different base offset - * @hw_features: feature set depends on the SoC type - * @dma_ring_size: allow GBit SoCs to set bigger rings than FE SoCs - * @napi_weight: allow GBit SoCs to set bigger napi weight than FE SoCs - * @dma_type: SoCs is PDMA, QDMA or a mix of the 2 - * @pdma_glo_cfg: the default DMA configuration - * @rx_int: the TX interrupt bits used by the SoC - * @tx_int: the TX interrupt bits used by the SoC - * @status_int: the Status interrupt bits used by the SoC - * @checksum_bit: the bits used to turn on HW checksumming - * @txd4: default value of the TXD4 descriptor - * @mac_count: the number of MACs that the SoC has - * @new_stats: there is a old and new way to read hardware stats - * registers - * @jumbo_frame: does the SoC support jumbo frames ? - * @rx_2b_offset: tell the rx dma to offset the data by 2 bytes - * @rx_sg_dma: scatter gather support - * @padding_64b enable 64 bit padding - * @padding_bug: rt2880 has a padding bug - * @has_switch: does the SoC have a built-in switch - * - * Although all of the supported SoCs share the same basic functionality, there - * are several SoC specific functions and features that we need to support. This - * struct holds the SoC specific data so that the common core can figure out - * how to setup and use these differences. - */ -struct mtk_soc_data { - const u16 *reg_table; - - void (*init_data)(struct mtk_soc_data *data, struct net_device *netdev); - void (*reset_fe)(struct mtk_eth *eth); - void (*set_mac)(struct mtk_mac *mac, unsigned char *macaddr); - int (*fwd_config)(struct mtk_eth *eth); - int (*switch_init)(struct mtk_eth *eth); - void (*port_init)(struct mtk_eth *eth, struct mtk_mac *mac, - struct device_node *port); - int (*has_carrier)(struct mtk_eth *eth); - int (*mdio_init)(struct mtk_eth *eth); - void (*mdio_cleanup)(struct mtk_eth *eth); - int (*mdio_write)(struct mii_bus *bus, int phy_addr, int phy_reg, - u16 val); - int (*mdio_read)(struct mii_bus *bus, int phy_addr, int phy_reg); - void (*mdio_adjust_link)(struct mtk_eth *eth, int port); - u32 piac_offset; - netdev_features_t hw_features; - u32 dma_ring_size; - u32 napi_weight; - u32 dma_type; - u32 pdma_glo_cfg; - u32 rx_int; - u32 tx_int; - u32 status_int; - u32 checksum_bit; - u32 txd4; - u32 mac_count; - - u32 new_stats:1; - u32 jumbo_frame:1; - u32 rx_2b_offset:1; - u32 rx_sg_dma:1; - u32 padding_64b:1; - u32 padding_bug:1; - u32 has_switch:1; -}; - -#define MTK_STAT_OFFSET 0x40 - -/* struct mtk_hw_stats - the structure that holds the traffic statistics. - * @stats_lock: make sure that stats operations are atomic - * @reg_offset: the status register offset of the SoC - * @syncp: the refcount - * - * All of the supported SoCs have hardware counters for traffic statstics. - * Whenever the status IRQ triggers we can read the latest stats from these - * counters and store them in this struct. - */ -struct mtk_hw_stats { - spinlock_t stats_lock; - u32 reg_offset; - struct u64_stats_sync syncp; - - u64 tx_bytes; - u64 tx_packets; - u64 tx_skip; - u64 tx_collisions; - u64 rx_bytes; - u64 rx_packets; - u64 rx_overflow; - u64 rx_fcs_errors; - u64 rx_short_errors; - u64 rx_long_errors; - u64 rx_checksum_errors; - u64 rx_flow_control_packets; -}; - -/* PDMA descriptor can point at 1-2 segments. This enum allows us to track how - * memory was allocated so that it can be freed properly - */ -enum mtk_tx_flags { - MTK_TX_FLAGS_SINGLE0 = 0x01, - MTK_TX_FLAGS_PAGE0 = 0x02, - MTK_TX_FLAGS_PAGE1 = 0x04, -}; - -/* struct mtk_tx_buf - This struct holds the pointers to the memory pointed at - * by the TX descriptor s - * @skb: The SKB pointer of the packet being sent - * @dma_addr0: The base addr of the first segment - * @dma_len0: The length of the first segment - * @dma_addr1: The base addr of the second segment - * @dma_len1: The length of the second segment - */ -struct mtk_tx_buf { - struct sk_buff *skb; - u32 flags; - DEFINE_DMA_UNMAP_ADDR(dma_addr0); - DEFINE_DMA_UNMAP_LEN(dma_len0); - DEFINE_DMA_UNMAP_ADDR(dma_addr1); - DEFINE_DMA_UNMAP_LEN(dma_len1); -}; - -/* struct mtk_tx_ring - This struct holds info describing a TX ring - * @tx_dma: The descriptor ring - * @tx_buf: The memory pointed at by the ring - * @tx_phys: The physical addr of tx_buf - * @tx_next_free: Pointer to the next free descriptor - * @tx_last_free: Pointer to the last free descriptor - * @tx_thresh: The threshold of minimum amount of free descriptors - * @tx_map: Callback to map a new packet into the ring - * @tx_poll: Callback for the housekeeping function - * @tx_clean: Callback for the cleanup function - * @tx_ring_size: How many descriptors are in the ring - * @tx_free_idx: The index of th next free descriptor - * @tx_next_idx: QDMA uses a linked list. This element points to the next - * free descriptor in the list - * @tx_free_count: QDMA uses a linked list. Track how many free descriptors - * are present - */ -struct mtk_tx_ring { - struct mtk_tx_dma *tx_dma; - struct mtk_tx_buf *tx_buf; - dma_addr_t tx_phys; - struct mtk_tx_dma *tx_next_free; - struct mtk_tx_dma *tx_last_free; - u16 tx_thresh; - int (*tx_map)(struct sk_buff *skb, struct net_device *dev, int tx_num, - struct mtk_tx_ring *ring, bool gso); - int (*tx_poll)(struct mtk_eth *eth, int budget, bool *tx_again); - void (*tx_clean)(struct mtk_eth *eth); - - /* PDMA only */ - u16 tx_ring_size; - u16 tx_free_idx; - - /* QDMA only */ - u16 tx_next_idx; - atomic_t tx_free_count; -}; - -/* struct mtk_rx_ring - This struct holds info describing a RX ring - * @rx_dma: The descriptor ring - * @rx_data: The memory pointed at by the ring - * @trx_phys: The physical addr of rx_buf - * @rx_ring_size: How many descriptors are in the ring - * @rx_buf_size: The size of each packet buffer - * @rx_calc_idx: The current head of ring - */ -struct mtk_rx_ring { - struct mtk_rx_dma *rx_dma; - u8 **rx_data; - dma_addr_t rx_phys; - u16 rx_ring_size; - u16 frag_size; - u16 rx_buf_size; - u16 rx_calc_idx; -}; - -/* currently no SoC has more than 2 macs */ -#define MTK_MAX_DEVS 2 - -/* struct mtk_eth - This is the main datasructure for holding the state - * of the driver - * @dev: The device pointer - * @base: The mapped register i/o base - * @page_lock: Make sure that register operations are atomic - * @soc: pointer to our SoC specific data - * @dummy_dev: we run 2 netdevs on 1 physical DMA ring and need a - * dummy for NAPI to work - * @netdev: The netdev instances - * @mac: Each netdev is linked to a physical MAC - * @switch_np: The phandle for the switch - * @irq: The IRQ that we are using - * @msg_enable: Ethtool msg level - * @ysclk: The sysclk rate - neeed for calibration - * @ethsys: The register map pointing at the range used to setup - * MII modes - * @dma_refcnt: track how many netdevs are using the DMA engine - * @tx_ring: Pointer to the memore holding info about the TX ring - * @rx_ring: Pointer to the memore holding info about the RX ring - * @rx_napi: The NAPI struct - * @scratch_ring: Newer SoCs need memory for a second HW managed TX ring - * @scratch_head: The scratch memory that scratch_ring points to. - * @phy: Info about the attached PHYs - * @mii_bus: If there is a bus we need to create an instance for it - * @link: Track if the ports have a physical link - * @sw_priv: Pointer to the switches private data - * @vlan_map: RX VID tracking - */ - -struct mtk_eth { - struct device *dev; - void __iomem *base; - spinlock_t page_lock; - struct mtk_soc_data *soc; - struct net_device dummy_dev; - struct net_device *netdev[MTK_MAX_DEVS]; - struct mtk_mac *mac[MTK_MAX_DEVS]; - struct device_node *switch_np; - int irq; - u32 msg_enable; - unsigned long sysclk; - struct regmap *ethsys; - atomic_t dma_refcnt; - struct mtk_tx_ring tx_ring; - struct mtk_rx_ring rx_ring[2]; - struct napi_struct rx_napi; - struct mtk_tx_dma *scratch_ring; - void *scratch_head; - struct mtk_phy *phy; - struct mii_bus *mii_bus; - int link[8]; - void *sw_priv; - unsigned long vlan_map; -}; - -/* struct mtk_mac - the structure that holds the info about the MACs of the - * SoC - * @id: The number of the MAC - * @of_node: Our devicetree node - * @hw: Backpointer to our main datastruture - * @hw_stats: Packet statistics counter - * @phy_dev: The attached PHY if available - * @phy_flags: The PHYs flags - * @pending_work: The workqueue used to reset the dma ring - */ -struct mtk_mac { - int id; - struct device_node *of_node; - struct mtk_eth *hw; - struct mtk_hw_stats *hw_stats; - struct phy_device *phy_dev; - u32 phy_flags; - struct work_struct pending_work; -}; - -/* the struct describing the SoC. these are declared in the soc_xyz.c files */ -extern const struct of_device_id of_mtk_match[]; - -/* read the hardware status register */ -void mtk_stats_update_mac(struct mtk_mac *mac); - -/* default checksum setup handler */ -void mtk_reset(struct mtk_eth *eth, u32 reset_bits); - -/* register i/o wrappers */ -void mtk_w32(struct mtk_eth *eth, u32 val, unsigned int reg); -u32 mtk_r32(struct mtk_eth *eth, unsigned int reg); - -/* default clock calibration handler */ -int mtk_set_clock_cycle(struct mtk_eth *eth); - -/* default checksum setup handler */ -void mtk_csum_config(struct mtk_eth *eth); - -/* default forward config handler */ -void mtk_fwd_config(struct mtk_eth *eth); - -#endif /* MTK_ETH_H */ diff --git a/drivers/staging/mt7621-eth/soc_mt7621.c b/drivers/staging/mt7621-eth/soc_mt7621.c deleted file mode 100644 index 5d63b5d96f6b..000000000000 --- a/drivers/staging/mt7621-eth/soc_mt7621.c +++ /dev/null @@ -1,161 +0,0 @@ -/* This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * Copyright (C) 2009-2016 John Crispin - * Copyright (C) 2009-2016 Felix Fietkau - * Copyright (C) 2013-2016 Michael Lee - */ - -#include -#include -#include -#include - -#include - -#include "mtk_eth_soc.h" -#include "gsw_mt7620.h" -#include "mdio.h" - -#define MT7620_CDMA_CSG_CFG 0x400 -#define MT7621_CDMP_IG_CTRL (MT7620_CDMA_CSG_CFG + 0x00) -#define MT7621_CDMP_EG_CTRL (MT7620_CDMA_CSG_CFG + 0x04) -#define MT7621_RESET_FE BIT(6) -#define MT7621_L4_VALID BIT(24) - -#define MT7621_TX_DMA_UDF BIT(19) - -#define CDMA_ICS_EN BIT(2) -#define CDMA_UCS_EN BIT(1) -#define CDMA_TCS_EN BIT(0) - -#define GDMA_ICS_EN BIT(22) -#define GDMA_TCS_EN BIT(21) -#define GDMA_UCS_EN BIT(20) - -/* frame engine counters */ -#define MT7621_REG_MIB_OFFSET 0x2000 -#define MT7621_PPE_AC_BCNT0 (MT7621_REG_MIB_OFFSET + 0x00) -#define MT7621_GDM1_TX_GBCNT (MT7621_REG_MIB_OFFSET + 0x400) -#define MT7621_GDM2_TX_GBCNT (MT7621_GDM1_TX_GBCNT + 0x40) - -#define GSW_REG_GDMA1_MAC_ADRL 0x508 -#define GSW_REG_GDMA1_MAC_ADRH 0x50C -#define GSW_REG_GDMA2_MAC_ADRL 0x1508 -#define GSW_REG_GDMA2_MAC_ADRH 0x150C - -#define MT7621_MTK_RST_GL 0x04 -#define MT7620_MTK_INT_STATUS2 0x08 - -/* MTK_INT_STATUS reg on mt7620 define CNT_GDM1_AF at BIT(29) - * but after test it should be BIT(13). - */ -#define MT7621_MTK_GDM1_AF BIT(28) -#define MT7621_MTK_GDM2_AF BIT(29) - -static const u16 mt7621_reg_table[MTK_REG_COUNT] = { - [MTK_REG_PDMA_GLO_CFG] = RT5350_PDMA_GLO_CFG, - [MTK_REG_PDMA_RST_CFG] = RT5350_PDMA_RST_CFG, - [MTK_REG_DLY_INT_CFG] = RT5350_DLY_INT_CFG, - [MTK_REG_TX_BASE_PTR0] = RT5350_TX_BASE_PTR0, - [MTK_REG_TX_MAX_CNT0] = RT5350_TX_MAX_CNT0, - [MTK_REG_TX_CTX_IDX0] = RT5350_TX_CTX_IDX0, - [MTK_REG_TX_DTX_IDX0] = RT5350_TX_DTX_IDX0, - [MTK_REG_RX_BASE_PTR0] = RT5350_RX_BASE_PTR0, - [MTK_REG_RX_MAX_CNT0] = RT5350_RX_MAX_CNT0, - [MTK_REG_RX_CALC_IDX0] = RT5350_RX_CALC_IDX0, - [MTK_REG_RX_DRX_IDX0] = RT5350_RX_DRX_IDX0, - [MTK_REG_MTK_INT_ENABLE] = RT5350_MTK_INT_ENABLE, - [MTK_REG_MTK_INT_STATUS] = RT5350_MTK_INT_STATUS, - [MTK_REG_MTK_DMA_VID_BASE] = 0, - [MTK_REG_MTK_COUNTER_BASE] = MT7621_GDM1_TX_GBCNT, - [MTK_REG_MTK_RST_GL] = MT7621_MTK_RST_GL, - [MTK_REG_MTK_INT_STATUS2] = MT7620_MTK_INT_STATUS2, -}; - -static void mt7621_mtk_reset(struct mtk_eth *eth) -{ - mtk_reset(eth, MT7621_RESET_FE); -} - -static int mt7621_fwd_config(struct mtk_eth *eth) -{ - /* Setup GMAC1 only, there is no support for GMAC2 yet */ - mtk_w32(eth, mtk_r32(eth, MT7620_GDMA1_FWD_CFG) & ~0xffff, - MT7620_GDMA1_FWD_CFG); - - /* Enable RX checksum */ - mtk_w32(eth, mtk_r32(eth, MT7620_GDMA1_FWD_CFG) | (GDMA_ICS_EN | - GDMA_TCS_EN | GDMA_UCS_EN), - MT7620_GDMA1_FWD_CFG); - - /* Enable RX VLan Offloading */ - mtk_w32(eth, 0, MT7621_CDMP_EG_CTRL); - - return 0; -} - -static void mt7621_set_mac(struct mtk_mac *mac, unsigned char *hwaddr) -{ - unsigned long flags; - - spin_lock_irqsave(&mac->hw->page_lock, flags); - if (mac->id == 0) { - mtk_w32(mac->hw, (hwaddr[0] << 8) | hwaddr[1], - GSW_REG_GDMA1_MAC_ADRH); - mtk_w32(mac->hw, (hwaddr[2] << 24) | (hwaddr[3] << 16) | - (hwaddr[4] << 8) | hwaddr[5], - GSW_REG_GDMA1_MAC_ADRL); - } - if (mac->id == 1) { - mtk_w32(mac->hw, (hwaddr[0] << 8) | hwaddr[1], - GSW_REG_GDMA2_MAC_ADRH); - mtk_w32(mac->hw, (hwaddr[2] << 24) | (hwaddr[3] << 16) | - (hwaddr[4] << 8) | hwaddr[5], - GSW_REG_GDMA2_MAC_ADRL); - } - spin_unlock_irqrestore(&mac->hw->page_lock, flags); -} - -static struct mtk_soc_data mt7621_data = { - .hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM | - NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | - NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 | - NETIF_F_IPV6_CSUM, - .dma_type = MTK_PDMA, - .dma_ring_size = 256, - .napi_weight = 64, - .new_stats = 1, - .padding_64b = 1, - .rx_2b_offset = 1, - .rx_sg_dma = 1, - .has_switch = 1, - .mac_count = 2, - .reset_fe = mt7621_mtk_reset, - .set_mac = mt7621_set_mac, - .fwd_config = mt7621_fwd_config, - .switch_init = mtk_gsw_init, - .reg_table = mt7621_reg_table, - .pdma_glo_cfg = MTK_PDMA_SIZE_16DWORDS, - .rx_int = RT5350_RX_DONE_INT, - .tx_int = RT5350_TX_DONE_INT, - .status_int = MT7621_MTK_GDM1_AF | MT7621_MTK_GDM2_AF, - .checksum_bit = MT7621_L4_VALID, - .has_carrier = mt7620_has_carrier, - .mdio_read = mt7620_mdio_read, - .mdio_write = mt7620_mdio_write, - .mdio_adjust_link = mt7620_mdio_link_adjust, -}; - -const struct of_device_id of_mtk_match[] = { - { .compatible = "mediatek,mt7621-eth", .data = &mt7621_data }, - {}, -}; - -MODULE_DEVICE_TABLE(of, of_mtk_match); From 4420a5611ea5d42c16628d01784dda7a8260d738 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 12 Mar 2019 10:09:37 +1100 Subject: [PATCH 088/468] staging: mt7621-dts: update ethernet settings. The ethernet in mt7621 is now supported by drivers/net/ethernet/mediatek/ which provides support for the integrated switch through DSA. This requires some devicetree changes, and particularly allows a board dts to identify which switch ports are present. The second CPU interface - gmac1 - doesn't work yet, so the device tree information may not be correct. The phy (which is present on the gnubee-pc2) can negotiate and report connection speed etc, but no traffic flows. The gnubee-pc1 has two network ports which are 'black' and 'blue'. There are connected to switch ports 0 and 4 respectively. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman --- drivers/staging/mt7621-dts/gbpc1.dts | 29 +++++----- drivers/staging/mt7621-dts/mt7621.dtsi | 73 ++++++++++++++++++++++++-- 2 files changed, 83 insertions(+), 19 deletions(-) diff --git a/drivers/staging/mt7621-dts/gbpc1.dts b/drivers/staging/mt7621-dts/gbpc1.dts index b73385540216..250c15ace2a7 100644 --- a/drivers/staging/mt7621-dts/gbpc1.dts +++ b/drivers/staging/mt7621-dts/gbpc1.dts @@ -117,22 +117,6 @@ &pcie { status = "okay"; }; -ðernet { - //mtd-mac-address = <&factory 0xe000>; - gmac1: mac@0 { - compatible = "mediatek,eth-mac"; - reg = <0>; - phy-handle = <&phy1>; - }; - - mdio-bus { - phy1: ethernet-phy@1 { - reg = <1>; - phy-mode = "rgmii"; - }; - }; -}; - &pinctrl { state_default: pinctrl0 { gpio { @@ -141,3 +125,16 @@ gpio { }; }; }; + +&switch0 { + ports { + port@0 { + label = "ethblack"; + status = "ok"; + }; + port@4 { + label = "ethblue"; + status = "ok"; + }; + }; +}; diff --git a/drivers/staging/mt7621-dts/mt7621.dtsi b/drivers/staging/mt7621-dts/mt7621.dtsi index 6aff3680ce4b..17020e24abd2 100644 --- a/drivers/staging/mt7621-dts/mt7621.dtsi +++ b/drivers/staging/mt7621-dts/mt7621.dtsi @@ -372,16 +372,83 @@ ethernet: ethernet@1e100000 { mediatek,ethsys = <ðsys>; - mediatek,switch = <&gsw>; + gmac0: mac@0 { + compatible = "mediatek,eth-mac"; + reg = <0>; + phy-mode = "rgmii"; + fixed-link { + speed = <1000>; + full-duplex; + pause; + }; + }; + gmac1: mac@1 { + compatible = "mediatek,eth-mac"; + reg = <1>; + status = "off"; + phy-mode = "rgmii"; + phy-handle = <&phy5>; + }; mdio-bus { #address-cells = <1>; #size-cells = <0>; - phy1f: ethernet-phy@1f { - reg = <0x1f>; + phy5: ethernet-phy@5 { + reg = <5>; phy-mode = "rgmii"; }; + + switch0: switch0@0 { + compatible = "mediatek,mt7621"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + mediatek,mcm; + resets = <&rstctrl 2>; + reset-names = "mcm"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + port@0 { + status = "off"; + reg = <0>; + label = "lan0"; + }; + port@1 { + status = "off"; + reg = <1>; + label = "lan1"; + }; + port@2 { + status = "off"; + reg = <2>; + label = "lan2"; + }; + port@3 { + status = "off"; + reg = <3>; + label = "lan3"; + }; + port@4 { + status = "off"; + reg = <4>; + label = "lan4"; + }; + port@6 { + reg = <6>; + label = "cpu"; + ethernet = <&gmac0>; + phy-mode = "trgmii"; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + }; }; }; From 8bce6dcede65139a087ff240127e3f3c01363eed Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Mon, 11 Mar 2019 23:10:10 +0800 Subject: [PATCH 089/468] staging: erofs: fix to handle error path of erofs_vmap() erofs_vmap() wrapped vmap() and vm_map_ram() to return virtual continuous memory, but both of them can failed due to a lot of reason, previously, erofs_vmap()'s callers didn't handle them, which can potentially cause NULL pointer access, fix it. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Fixes: 0d40d6e399c1 ("staging: erofs: add a generic z_erofs VLE decompressor") Cc: # 4.19+ Signed-off-by: Gao Xiang Signed-off-by: Chao Yu Signed-off-by: Greg Kroah-Hartman --- drivers/staging/erofs/unzip_vle.c | 4 ++++ drivers/staging/erofs/unzip_vle_lz4.c | 7 +++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c index 8715bc50e09c..c7b3b21123c1 100644 --- a/drivers/staging/erofs/unzip_vle.c +++ b/drivers/staging/erofs/unzip_vle.c @@ -1029,6 +1029,10 @@ static int z_erofs_vle_unzip(struct super_block *sb, skip_allocpage: vout = erofs_vmap(pages, nr_pages); + if (!vout) { + err = -ENOMEM; + goto out; + } err = z_erofs_vle_unzip_vmap(compressed_pages, clusterpages, vout, llen, work->pageofs, overlapped); diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c index 48b263a2731a..0daac9b984a8 100644 --- a/drivers/staging/erofs/unzip_vle_lz4.c +++ b/drivers/staging/erofs/unzip_vle_lz4.c @@ -136,10 +136,13 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages, nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE); - if (clusterpages == 1) + if (clusterpages == 1) { vin = kmap_atomic(compressed_pages[0]); - else + } else { vin = erofs_vmap(compressed_pages, clusterpages); + if (!vin) + return -ENOMEM; + } preempt_disable(); vout = erofs_pcpubuf[smp_processor_id()].data; From bafd9c64056cd034a1174dcadb65cd3b294ff8f6 Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Mon, 4 Mar 2019 14:33:54 +0000 Subject: [PATCH 090/468] staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest `ni_cdio_cmdtest()` validates Comedi asynchronous commands for the DIO subdevice (subdevice 2) of supported National Instruments M-series cards. It is called when handling the `COMEDI_CMD` and `COMEDI_CMDTEST` ioctls for this subdevice. There are two causes for a possible divide-by-zero error when validating that the `stop_arg` member of the passed-in command is not too large. The first cause for the divide-by-zero is that calls to `comedi_bytes_per_scan()` are only valid once the command has been copied to `s->async->cmd`, but that copy is only done for the `COMEDI_CMD` ioctl. For the `COMEDI_CMDTEST` ioctl, it will use whatever was left there by the previous `COMEDI_CMD` ioctl, if any. (This is very likely, as it is usual for the application to use `COMEDI_CMDTEST` before `COMEDI_CMD`.) If there has been no previous, valid `COMEDI_CMD` for this subdevice, then `comedi_bytes_per_scan()` will return 0, so the subsequent division in `ni_cdio_cmdtest()` of `s->async->prealloc_bufsz / comedi_bytes_per_scan(s)` will be a divide-by-zero error. To fix this error, call a new function `comedi_bytes_per_scan_cmd(s, cmd)`, based on the existing `comedi_bytes_per_scan(s)` but using a specified `struct comedi_cmd` for its calculations. (Also refactor `comedi_bytes_per_scan()` to call the new function.) Once the first cause for the divide-by-zero has been fixed, the second cause is that `comedi_bytes_per_scan_cmd()` can legitimately return 0 if the `scan_end_arg` member of the `struct comedi_cmd` being tested is 0. Fix it by only performing the division (and validating that `stop_arg` is no more than the maximum value) if `comedi_bytes_per_scan_cmd()` returns a non-zero value. The problem was reported on the COMEDI mailing list here: https://groups.google.com/forum/#!topic/comedi_list/4t9WlHzMhKM Reported-by: Ivan Vasilyev Tested-by: Ivan Vasilyev Fixes: f164cbf98fa8 ("staging: comedi: ni_mio_common: add finite regeneration to dio output") Cc: # 4.6+ Cc: Spencer E. Olson Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/comedidev.h | 2 ++ drivers/staging/comedi/drivers.c | 33 ++++++++++++++++--- .../staging/comedi/drivers/ni_mio_common.c | 10 ++++-- 3 files changed, 38 insertions(+), 7 deletions(-) diff --git a/drivers/staging/comedi/comedidev.h b/drivers/staging/comedi/comedidev.h index a7d569cfca5d..0dff1ac057cd 100644 --- a/drivers/staging/comedi/comedidev.h +++ b/drivers/staging/comedi/comedidev.h @@ -1001,6 +1001,8 @@ int comedi_dio_insn_config(struct comedi_device *dev, unsigned int mask); unsigned int comedi_dio_update_state(struct comedi_subdevice *s, unsigned int *data); +unsigned int comedi_bytes_per_scan_cmd(struct comedi_subdevice *s, + struct comedi_cmd *cmd); unsigned int comedi_bytes_per_scan(struct comedi_subdevice *s); unsigned int comedi_nscans_left(struct comedi_subdevice *s, unsigned int nscans); diff --git a/drivers/staging/comedi/drivers.c b/drivers/staging/comedi/drivers.c index eefa62f42c0f..5a32b8fc000e 100644 --- a/drivers/staging/comedi/drivers.c +++ b/drivers/staging/comedi/drivers.c @@ -394,11 +394,13 @@ unsigned int comedi_dio_update_state(struct comedi_subdevice *s, EXPORT_SYMBOL_GPL(comedi_dio_update_state); /** - * comedi_bytes_per_scan() - Get length of asynchronous command "scan" in bytes + * comedi_bytes_per_scan_cmd() - Get length of asynchronous command "scan" in + * bytes * @s: COMEDI subdevice. + * @cmd: COMEDI command. * * Determines the overall scan length according to the subdevice type and the - * number of channels in the scan. + * number of channels in the scan for the specified command. * * For digital input, output or input/output subdevices, samples for * multiple channels are assumed to be packed into one or more unsigned @@ -408,9 +410,9 @@ EXPORT_SYMBOL_GPL(comedi_dio_update_state); * * Returns the overall scan length in bytes. */ -unsigned int comedi_bytes_per_scan(struct comedi_subdevice *s) +unsigned int comedi_bytes_per_scan_cmd(struct comedi_subdevice *s, + struct comedi_cmd *cmd) { - struct comedi_cmd *cmd = &s->async->cmd; unsigned int num_samples; unsigned int bits_per_sample; @@ -427,6 +429,29 @@ unsigned int comedi_bytes_per_scan(struct comedi_subdevice *s) } return comedi_samples_to_bytes(s, num_samples); } +EXPORT_SYMBOL_GPL(comedi_bytes_per_scan_cmd); + +/** + * comedi_bytes_per_scan() - Get length of asynchronous command "scan" in bytes + * @s: COMEDI subdevice. + * + * Determines the overall scan length according to the subdevice type and the + * number of channels in the scan for the current command. + * + * For digital input, output or input/output subdevices, samples for + * multiple channels are assumed to be packed into one or more unsigned + * short or unsigned int values according to the subdevice's %SDF_LSAMPL + * flag. For other types of subdevice, samples are assumed to occupy a + * whole unsigned short or unsigned int according to the %SDF_LSAMPL flag. + * + * Returns the overall scan length in bytes. + */ +unsigned int comedi_bytes_per_scan(struct comedi_subdevice *s) +{ + struct comedi_cmd *cmd = &s->async->cmd; + + return comedi_bytes_per_scan_cmd(s, cmd); +} EXPORT_SYMBOL_GPL(comedi_bytes_per_scan); static unsigned int __comedi_nscans_left(struct comedi_subdevice *s, diff --git a/drivers/staging/comedi/drivers/ni_mio_common.c b/drivers/staging/comedi/drivers/ni_mio_common.c index 5edf59ac6706..b04dad8c7092 100644 --- a/drivers/staging/comedi/drivers/ni_mio_common.c +++ b/drivers/staging/comedi/drivers/ni_mio_common.c @@ -3545,6 +3545,7 @@ static int ni_cdio_cmdtest(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_cmd *cmd) { struct ni_private *devpriv = dev->private; + unsigned int bytes_per_scan; int err = 0; /* Step 1 : check if triggers are trivially valid */ @@ -3579,9 +3580,12 @@ static int ni_cdio_cmdtest(struct comedi_device *dev, err |= comedi_check_trigger_arg_is(&cmd->convert_arg, 0); err |= comedi_check_trigger_arg_is(&cmd->scan_end_arg, cmd->chanlist_len); - err |= comedi_check_trigger_arg_max(&cmd->stop_arg, - s->async->prealloc_bufsz / - comedi_bytes_per_scan(s)); + bytes_per_scan = comedi_bytes_per_scan_cmd(s, cmd); + if (bytes_per_scan) { + err |= comedi_check_trigger_arg_max(&cmd->stop_arg, + s->async->prealloc_bufsz / + bytes_per_scan); + } if (err) return 3; From ae0a6d2017f733781dcc938a471ccc2d05f9bee6 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 4 Mar 2019 20:42:33 +0100 Subject: [PATCH 091/468] staging: olpc_dcon_xo_1: add missing 'const' qualifier gcc noticed a mismatch between the type qualifiers after a recent cleanup: drivers/staging/olpc_dcon/olpc_dcon_xo_1.c: In function 'dcon_init_xo_1': drivers/staging/olpc_dcon/olpc_dcon_xo_1.c:48:26: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] Add the 'const' keyword that should have been there all along. Fixes: 2159fb372929 ("staging: olpc_dcon: olpc_dcon_xo_1.c: Switch to the gpio descriptor interface") Signed-off-by: Arnd Bergmann Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/olpc_dcon/olpc_dcon_xo_1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/olpc_dcon/olpc_dcon_xo_1.c b/drivers/staging/olpc_dcon/olpc_dcon_xo_1.c index 80b8d4153414..a54286498a47 100644 --- a/drivers/staging/olpc_dcon/olpc_dcon_xo_1.c +++ b/drivers/staging/olpc_dcon/olpc_dcon_xo_1.c @@ -45,7 +45,7 @@ static int dcon_init_xo_1(struct dcon_priv *dcon) { unsigned char lob; int ret, i; - struct dcon_gpio *pin = &gpios_asis[0]; + const struct dcon_gpio *pin = &gpios_asis[0]; for (i = 0; i < ARRAY_SIZE(gpios_asis); i++) { gpios[i] = devm_gpiod_get(&dcon->client->dev, pin[i].name, From 1beea6204e2304dd11600791d8dad8e7350af6ad Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 4 Mar 2019 20:43:00 +0100 Subject: [PATCH 092/468] staging: axis-fifo: add CONFIG_OF dependency When building without CONFIG_OF, the compiler loses track of the flow control in axis_fifo_probe(), and thinks that many variables are used without an initialization even though we actually leave the function before the first use: drivers/staging/axis-fifo/axis-fifo.c: In function 'axis_fifo_probe': drivers/staging/axis-fifo/axis-fifo.c:900:5: error: 'rxd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (rxd_tdata_width != 32) { ^ drivers/staging/axis-fifo/axis-fifo.c:907:5: error: 'txd_tdata_width' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (txd_tdata_width != 32) { ^ drivers/staging/axis-fifo/axis-fifo.c:914:5: error: 'has_tdest' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (has_tdest) { ^ drivers/staging/axis-fifo/axis-fifo.c:919:5: error: 'has_tid' may be used uninitialized in this function [-Werror=maybe-uninitialized] When CONFIG_OF is set, this does not happen, and since the driver cannot work without it, just add that option as a Kconfig dependency. Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- drivers/staging/axis-fifo/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/axis-fifo/Kconfig b/drivers/staging/axis-fifo/Kconfig index 687537203d9c..d9725888af6f 100644 --- a/drivers/staging/axis-fifo/Kconfig +++ b/drivers/staging/axis-fifo/Kconfig @@ -3,6 +3,7 @@ # config XIL_AXIS_FIFO tristate "Xilinx AXI-Stream FIFO IP core driver" + depends on OF default n help This adds support for the Xilinx AXI-Stream From 45ac7b31bc6c4af885cc5b5d6c534c15bcbe7643 Mon Sep 17 00:00:00 2001 From: Samuel Thibault Date: Thu, 7 Mar 2019 23:06:57 +0100 Subject: [PATCH 093/468] staging: speakup_soft: Fix alternate speech with other synths When switching from speakup_soft to another synth, speakup_soft would keep calling synth_buffer_getc() from softsynthx_read. Let's thus make synth.c export the knowledge of the current synth, so that speakup_soft can determine whether it should be running. speakup_soft also needs to set itself alive, otherwise the switch would let it remain silent. Signed-off-by: Samuel Thibault Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/speakup/speakup_soft.c | 16 +++++++++++----- drivers/staging/speakup/spk_priv.h | 1 + drivers/staging/speakup/synth.c | 6 ++++++ 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/staging/speakup/speakup_soft.c b/drivers/staging/speakup/speakup_soft.c index edff6ce85655..9d85a3a1af4c 100644 --- a/drivers/staging/speakup/speakup_soft.c +++ b/drivers/staging/speakup/speakup_soft.c @@ -210,12 +210,15 @@ static ssize_t softsynthx_read(struct file *fp, char __user *buf, size_t count, return -EINVAL; spin_lock_irqsave(&speakup_info.spinlock, flags); + synth_soft.alive = 1; while (1) { prepare_to_wait(&speakup_event, &wait, TASK_INTERRUPTIBLE); - if (!unicode) - synth_buffer_skip_nonlatin1(); - if (!synth_buffer_empty() || speakup_info.flushing) - break; + if (synth_current() == &synth_soft) { + if (!unicode) + synth_buffer_skip_nonlatin1(); + if (!synth_buffer_empty() || speakup_info.flushing) + break; + } spin_unlock_irqrestore(&speakup_info.spinlock, flags); if (fp->f_flags & O_NONBLOCK) { finish_wait(&speakup_event, &wait); @@ -235,6 +238,8 @@ static ssize_t softsynthx_read(struct file *fp, char __user *buf, size_t count, /* Keep 3 bytes available for a 16bit UTF-8-encoded character */ while (chars_sent <= count - bytes_per_ch) { + if (synth_current() != &synth_soft) + break; if (speakup_info.flushing) { speakup_info.flushing = 0; ch = '\x18'; @@ -331,7 +336,8 @@ static __poll_t softsynth_poll(struct file *fp, struct poll_table_struct *wait) poll_wait(fp, &speakup_event, wait); spin_lock_irqsave(&speakup_info.spinlock, flags); - if (!synth_buffer_empty() || speakup_info.flushing) + if (synth_current() == &synth_soft && + (!synth_buffer_empty() || speakup_info.flushing)) ret = EPOLLIN | EPOLLRDNORM; spin_unlock_irqrestore(&speakup_info.spinlock, flags); return ret; diff --git a/drivers/staging/speakup/spk_priv.h b/drivers/staging/speakup/spk_priv.h index c8e688878fc7..ac6a74883af4 100644 --- a/drivers/staging/speakup/spk_priv.h +++ b/drivers/staging/speakup/spk_priv.h @@ -74,6 +74,7 @@ int synth_request_region(unsigned long start, unsigned long n); int synth_release_region(unsigned long start, unsigned long n); int synth_add(struct spk_synth *in_synth); void synth_remove(struct spk_synth *in_synth); +struct spk_synth *synth_current(void); extern struct speakup_info_t speakup_info; diff --git a/drivers/staging/speakup/synth.c b/drivers/staging/speakup/synth.c index 25f259ee4ffc..3568bfb89912 100644 --- a/drivers/staging/speakup/synth.c +++ b/drivers/staging/speakup/synth.c @@ -481,4 +481,10 @@ void synth_remove(struct spk_synth *in_synth) } EXPORT_SYMBOL_GPL(synth_remove); +struct spk_synth *synth_current(void) +{ + return synth; +} +EXPORT_SYMBOL_GPL(synth_current); + short spk_punc_masks[] = { 0, SOME, MOST, PUNC, PUNC | B_SYM }; From 90cd9bed5adb3e3bd4d3ac4cbcecbc4a8028bbaf Mon Sep 17 00:00:00 2001 From: Maxim Zhukov Date: Sat, 9 Mar 2019 12:54:00 +0300 Subject: [PATCH 094/468] staging, mt7621-pci: fix build without pci support Add depends on PCI for PCI_MT7621 Signed-off-by: Maxim Zhukov Signed-off-by: Greg Kroah-Hartman --- drivers/staging/mt7621-pci/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/mt7621-pci/Kconfig b/drivers/staging/mt7621-pci/Kconfig index d33533872a16..c8fa17cfa807 100644 --- a/drivers/staging/mt7621-pci/Kconfig +++ b/drivers/staging/mt7621-pci/Kconfig @@ -1,6 +1,7 @@ config PCI_MT7621 tristate "MediaTek MT7621 PCI Controller" depends on RALINK + depends on PCI select PCI_DRIVERS_GENERIC help This selects a driver for the MediaTek MT7621 PCI Controller. From 21d2b122732318b48c10b7262e15595ce54511d3 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 26 Feb 2019 13:44:51 -0800 Subject: [PATCH 095/468] drm/vgem: fix use-after-free when drm_gem_handle_create() fails If drm_gem_handle_create() fails in vgem_gem_create(), then the drm_vgem_gem_object is freed twice: once when the reference is dropped by drm_gem_object_put_unlocked(), and again by __vgem_gem_destroy(). This was hit by syzkaller using fault injection. Fix it by skipping the second free. Reported-by: syzbot+e73f2fb5ed5a5df36d33@syzkaller.appspotmail.com Fixes: af33a9190d02 ("drm/vgem: Enable dmabuf import interfaces") Reviewed-by: Chris Wilson Cc: Laura Abbott Cc: Daniel Vetter Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Acked-by: Laura Abbott Signed-off-by: Rodrigo Siqueira Link: https://patchwork.freedesktop.org/patch/msgid/20190226214451.195123-1-ebiggers@kernel.org Signed-off-by: Maxime Ripard --- drivers/gpu/drm/vgem/vgem_drv.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c index 5930facd6d2d..11a8f99ba18c 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.c +++ b/drivers/gpu/drm/vgem/vgem_drv.c @@ -191,13 +191,9 @@ static struct drm_gem_object *vgem_gem_create(struct drm_device *dev, ret = drm_gem_handle_create(file, &obj->base, handle); drm_gem_object_put_unlocked(&obj->base); if (ret) - goto err; + return ERR_PTR(ret); return &obj->base; - -err: - __vgem_gem_destroy(obj); - return ERR_PTR(ret); } static int vgem_gem_dumb_create(struct drm_file *file, struct drm_device *dev, From 36b6c9ed45afe89045973e8dee1b004dd5372d40 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 26 Feb 2019 14:08:58 -0800 Subject: [PATCH 096/468] drm/vkms: fix use-after-free when drm_gem_handle_create() fails If drm_gem_handle_create() fails in vkms_gem_create(), then the vkms_gem_object is freed twice: once when the reference is dropped by drm_gem_object_put_unlocked(), and again by the extra calls to drm_gem_object_release() and kfree(). Fix it by skipping the second release and free. This bug was originally found in the vgem driver by syzkaller using fault injection, but I noticed it's also present in the vkms driver. Fixes: 559e50fd34d1 ("drm/vkms: Add dumb operations") Cc: Rodrigo Siqueira Cc: Haneen Mohammed Cc: Daniel Vetter Cc: Chris Wilson Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Reviewed-by: Chris Wilson Reviewed-by: Rodrigo Siqueira Signed-off-by: Rodrigo Siqueira Link: https://patchwork.freedesktop.org/patch/msgid/20190226220858.214438-1-ebiggers@kernel.org Signed-off-by: Maxime Ripard --- drivers/gpu/drm/vkms/vkms_gem.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_gem.c b/drivers/gpu/drm/vkms/vkms_gem.c index 138b0bb325cf..69048e73377d 100644 --- a/drivers/gpu/drm/vkms/vkms_gem.c +++ b/drivers/gpu/drm/vkms/vkms_gem.c @@ -111,11 +111,8 @@ struct drm_gem_object *vkms_gem_create(struct drm_device *dev, ret = drm_gem_handle_create(file, &obj->gem, handle); drm_gem_object_put_unlocked(&obj->gem); - if (ret) { - drm_gem_object_release(&obj->gem); - kfree(obj); + if (ret) return ERR_PTR(ret); - } return &obj->gem; } From 29b0b5d56589d66bd5793f1e09211ce7d7d3cd36 Mon Sep 17 00:00:00 2001 From: Alin Nastac Date: Mon, 11 Mar 2019 17:18:42 +0100 Subject: [PATCH 097/468] netfilter: nf_conntrack_sip: remove direct dependency on IPv6 Previous implementation was not usable with CONFIG_IPV6=m. Fixes: a3419ce3356c ("netfilter: nf_conntrack_sip: add sip_external_media logic") Signed-off-by: Alin Nastac Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_sip.c | 37 ++++++++++++++------------------ 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c index f067c6b50857..39fcc1ed18f3 100644 --- a/net/netfilter/nf_conntrack_sip.c +++ b/net/netfilter/nf_conntrack_sip.c @@ -20,9 +20,9 @@ #include #include #include +#include +#include -#include -#include #include #include #include @@ -871,38 +871,33 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff, } else if (sip_external_media) { struct net_device *dev = skb_dst(skb)->dev; struct net *net = dev_net(dev); - struct rtable *rt; - struct flowi4 fl4 = {}; -#if IS_ENABLED(CONFIG_IPV6) - struct flowi6 fl6 = {}; -#endif + struct flowi fl; struct dst_entry *dst = NULL; + memset(&fl, 0, sizeof(fl)); + switch (nf_ct_l3num(ct)) { case NFPROTO_IPV4: - fl4.daddr = daddr->ip; - rt = ip_route_output_key(net, &fl4); - if (!IS_ERR(rt)) - dst = &rt->dst; + fl.u.ip4.daddr = daddr->ip; + nf_ip_route(net, &dst, &fl, false); break; -#if IS_ENABLED(CONFIG_IPV6) case NFPROTO_IPV6: - fl6.daddr = daddr->in6; - dst = ip6_route_output(net, NULL, &fl6); - if (dst->error) { - dst_release(dst); - dst = NULL; - } + fl.u.ip6.daddr = daddr->in6; + nf_ip6_route(net, &dst, &fl, false); break; -#endif } /* Don't predict any conntracks when media endpoint is reachable * through the same interface as the signalling peer. */ - if (dst && dst->dev == dev) - return NF_ACCEPT; + if (dst) { + bool external_media = (dst->dev == dev); + + dst_release(dst); + if (external_media) + return NF_ACCEPT; + } } /* We need to check whether the registration exists before attempting From 05b7639da55f5555b9866a1f4b7e8995232a6323 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 12 Mar 2019 12:10:59 +0100 Subject: [PATCH 098/468] netfilter: nft_set_rbtree: check for inactive element after flag mismatch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Otherwise, we hit bogus ENOENT when removing elements. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Václav Zindulka Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_rbtree.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index fa61208371f8..321a0036fdf5 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -308,10 +308,6 @@ static void *nft_rbtree_deactivate(const struct net *net, else if (d > 0) parent = parent->rb_right; else { - if (!nft_set_elem_active(&rbe->ext, genmask)) { - parent = parent->rb_left; - continue; - } if (nft_rbtree_interval_end(rbe) && !nft_rbtree_interval_end(this)) { parent = parent->rb_left; @@ -320,6 +316,9 @@ static void *nft_rbtree_deactivate(const struct net *net, nft_rbtree_interval_end(this)) { parent = parent->rb_right; continue; + } else if (!nft_set_elem_active(&rbe->ext, genmask)) { + parent = parent->rb_left; + continue; } nft_rbtree_flush(net, set, rbe); return rbe; From e166e4fdaced850bee3d5ee12a5740258fb30587 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Wed, 13 Mar 2019 16:33:29 +0800 Subject: [PATCH 099/468] netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING Since Commit 21d1196a35f5 ("ipv4: set transport header earlier"), skb->transport_header has been always set before entering INET netfilter. This patch is to set skb->transport_header for bridge before entering INET netfilter by bridge-nf-call-iptables. It also fixes an issue that sctp_error() couldn't compute a right csum due to unset skb->transport_header. Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang Suggested-by: Pablo Neira Ayuso Signed-off-by: Xin Long Acked-by: Neil Horman Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/bridge/br_netfilter_hooks.c | 1 + net/bridge/br_netfilter_ipv6.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 9d34de68571b..22afa566cbce 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -502,6 +502,7 @@ static unsigned int br_nf_pre_routing(void *priv, nf_bridge->ipv4_daddr = ip_hdr(skb)->daddr; skb->protocol = htons(ETH_P_IP); + skb->transport_header = skb->network_header + ip_hdr(skb)->ihl * 4; NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb, skb->dev, NULL, diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c index 564710f88f93..e88d6641647b 100644 --- a/net/bridge/br_netfilter_ipv6.c +++ b/net/bridge/br_netfilter_ipv6.c @@ -235,6 +235,8 @@ unsigned int br_nf_pre_routing_ipv6(void *priv, nf_bridge->ipv6_daddr = ipv6_hdr(skb)->daddr; skb->protocol = htons(ETH_P_IPV6); + skb->transport_header = skb->network_header + sizeof(struct ipv6hdr); + NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, state->net, state->sk, skb, skb->dev, NULL, br_nf_pre_routing_finish_ipv6); From d1fa381033eb718df5c602f64b6e88676138dfc6 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 13 Mar 2019 22:15:59 +0100 Subject: [PATCH 100/468] netfilter: fix NETFILTER_XT_TARGET_TEE dependencies MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With NETFILTER_XT_TARGET_TEE=y and IP6_NF_IPTABLES=m, we get a link error when referencing the NF_DUP_IPV6 module: net/netfilter/xt_TEE.o: In function `tee_tg6': xt_TEE.c:(.text+0x14): undefined reference to `nf_dup_ipv6' The problem here is the 'select NF_DUP_IPV6 if IP6_NF_IPTABLES' that forces NF_DUP_IPV6 to be =m as well rather than setting it to =y as was intended here. Adding a soft dependency on IP6_NF_IPTABLES avoids that broken configuration. Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies") Cc: Máté Eckl Cc: Taehee Yoo Link: https://patchwork.ozlabs.org/patch/999498/ Link: https://lore.kernel.org/patchwork/patch/960062/ Reported-by: Randy Dunlap Reported-by: Stephen Rothwell Signed-off-by: Arnd Bergmann Signed-off-by: Pablo Neira Ayuso --- net/netfilter/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index d43ffb09939b..6548271209a0 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -1007,6 +1007,7 @@ config NETFILTER_XT_TARGET_TEE depends on NETFILTER_ADVANCED depends on IPV6 || IPV6=n depends on !NF_CONNTRACK || NF_CONNTRACK + depends on IP6_NF_IPTABLES || !IP6_NF_IPTABLES select NF_DUP_IPV4 select NF_DUP_IPV6 if IP6_NF_IPTABLES ---help--- From 6d65561f3d5ec933151939c543d006b79044e7a6 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Thu, 14 Mar 2019 02:58:18 -0500 Subject: [PATCH 101/468] netfilter: ip6t_srh: fix NULL pointer dereferences skb_header_pointer may return NULL. The current code dereference its return values without a NULL check. The fix inserts the checks to avoid NULL pointer dereferences. Fixes: 202a8ff545cc ("netfilter: add IPv6 segment routing header 'srh' match") Signed-off-by: Kangjie Lu Signed-off-by: Pablo Neira Ayuso --- net/ipv6/netfilter/ip6t_srh.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c index 1059894a6f4c..4cb83fb69844 100644 --- a/net/ipv6/netfilter/ip6t_srh.c +++ b/net/ipv6/netfilter/ip6t_srh.c @@ -210,6 +210,8 @@ static bool srh1_mt6(const struct sk_buff *skb, struct xt_action_param *par) psidoff = srhoff + sizeof(struct ipv6_sr_hdr) + ((srh->segments_left + 1) * sizeof(struct in6_addr)); psid = skb_header_pointer(skb, psidoff, sizeof(_psid), &_psid); + if (!psid) + return false; if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_PSID, ipv6_masked_addr_cmp(psid, &srhinfo->psid_msk, &srhinfo->psid_addr))) @@ -223,6 +225,8 @@ static bool srh1_mt6(const struct sk_buff *skb, struct xt_action_param *par) nsidoff = srhoff + sizeof(struct ipv6_sr_hdr) + ((srh->segments_left - 1) * sizeof(struct in6_addr)); nsid = skb_header_pointer(skb, nsidoff, sizeof(_nsid), &_nsid); + if (!nsid) + return false; if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_NSID, ipv6_masked_addr_cmp(nsid, &srhinfo->nsid_msk, &srhinfo->nsid_addr))) @@ -233,6 +237,8 @@ static bool srh1_mt6(const struct sk_buff *skb, struct xt_action_param *par) if (srhinfo->mt_flags & IP6T_SRH_LSID) { lsidoff = srhoff + sizeof(struct ipv6_sr_hdr); lsid = skb_header_pointer(skb, lsidoff, sizeof(_lsid), &_lsid); + if (!lsid) + return false; if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LSID, ipv6_masked_addr_cmp(lsid, &srhinfo->lsid_msk, &srhinfo->lsid_addr))) From 8ffcd32f64633926163cdd07a7d295c500a947d1 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Thu, 14 Mar 2019 10:50:20 +0100 Subject: [PATCH 102/468] netfilter: nf_tables: bogus EBUSY in helper removal from transaction Proper use counter updates when activating and deactivating the object, otherwise, this hits bogus EBUSY error. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Laura Garcia Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_objref.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c index 457a9ceb46af..8dfa798ea683 100644 --- a/net/netfilter/nft_objref.c +++ b/net/netfilter/nft_objref.c @@ -65,21 +65,34 @@ static int nft_objref_dump(struct sk_buff *skb, const struct nft_expr *expr) return -1; } -static void nft_objref_destroy(const struct nft_ctx *ctx, - const struct nft_expr *expr) +static void nft_objref_deactivate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + enum nft_trans_phase phase) { struct nft_object *obj = nft_objref_priv(expr); + if (phase == NFT_TRANS_COMMIT) + return; + obj->use--; } +static void nft_objref_activate(const struct nft_ctx *ctx, + const struct nft_expr *expr) +{ + struct nft_object *obj = nft_objref_priv(expr); + + obj->use++; +} + static struct nft_expr_type nft_objref_type; static const struct nft_expr_ops nft_objref_ops = { .type = &nft_objref_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_object *)), .eval = nft_objref_eval, .init = nft_objref_init, - .destroy = nft_objref_destroy, + .activate = nft_objref_activate, + .deactivate = nft_objref_deactivate, .dump = nft_objref_dump, }; From 74710e05906c37da6b436386dc13c44dbe5d8308 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Fri, 15 Mar 2019 17:21:01 +0100 Subject: [PATCH 103/468] netfilter: nft_redir: fix module autoload with ip4 AF_INET4 does not exist. Fixes: c78efc99c750 ("netfilter: nf_tables: nat: merge nft_redir protocol specific modules)" Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_redir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nft_redir.c b/net/netfilter/nft_redir.c index f8092926f704..a340cd8a751b 100644 --- a/net/netfilter/nft_redir.c +++ b/net/netfilter/nft_redir.c @@ -233,5 +233,5 @@ module_exit(nft_redir_module_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Arturo Borrero Gonzalez "); -MODULE_ALIAS_NFT_AF_EXPR(AF_INET4, "redir"); +MODULE_ALIAS_NFT_AF_EXPR(AF_INET, "redir"); MODULE_ALIAS_NFT_AF_EXPR(AF_INET6, "redir"); From f01a7dbe98ae4265023fa5d3af0f076f0b18a647 Mon Sep 17 00:00:00 2001 From: Martynas Pumputis Date: Mon, 18 Mar 2019 16:10:26 +0100 Subject: [PATCH 104/468] bpf: Try harder when allocating memory for large maps It has been observed that sometimes a higher order memory allocation for BPF maps fails when there is no obvious memory pressure in a system. E.g. the map (BPF_MAP_TYPE_LRU_HASH, key=38, value=56, max_elems=524288) could not be created due to vmalloc unable to allocate 75497472B, when the system's memory consumption (in MB) was the following: Total: 3942 Used: 837 (21.24%) Free: 138 Buffers: 239 Cached: 2727 Later analysis [1] by Michal Hocko showed that the vmalloc was not trying to reclaim memory from the page cache and was failing prematurely due to __GFP_NORETRY. Considering dcda9b0471 ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic") and [1], we can replace __GFP_NORETRY with __GFP_RETRY_MAYFAIL, as it won't invoke OOM killer and will try harder to fulfil allocation requests. Unfortunately, replacing the body of the BPF map memory allocation function with the kvmalloc_node helper function is not an option at this point in time, given 1) kmalloc is non-optional for higher order allocations, and 2) passing __GFP_RETRY_MAYFAIL to the kmalloc would stress the slab allocator too much for large requests. The change has been tested with the workloads mentioned above and by observing oom_kill value from /proc/vmstat. [1]: https://lore.kernel.org/bpf/20190310071318.GW5232@dhcp22.suse.cz/ Signed-off-by: Martynas Pumputis Acked-by: Yonghong Song Cc: Michal Hocko Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20190318153940.GL8924@dhcp22.suse.cz/ --- kernel/bpf/syscall.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 62f6bced3a3c..afca36f53c49 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -136,21 +136,29 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr) void *bpf_map_area_alloc(size_t size, int numa_node) { - /* We definitely need __GFP_NORETRY, so OOM killer doesn't - * trigger under memory pressure as we really just want to - * fail instead. + /* We really just want to fail instead of triggering OOM killer + * under memory pressure, therefore we set __GFP_NORETRY to kmalloc, + * which is used for lower order allocation requests. + * + * It has been observed that higher order allocation requests done by + * vmalloc with __GFP_NORETRY being set might fail due to not trying + * to reclaim memory from the page cache, thus we set + * __GFP_RETRY_MAYFAIL to avoid such situations. */ - const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO; + + const gfp_t flags = __GFP_NOWARN | __GFP_ZERO; void *area; if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { - area = kmalloc_node(size, GFP_USER | flags, numa_node); + area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags, + numa_node); if (area != NULL) return area; } - return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | flags, - __builtin_return_address(0)); + return __vmalloc_node_flags_caller(size, numa_node, + GFP_KERNEL | __GFP_RETRY_MAYFAIL | + flags, __builtin_return_address(0)); } void bpf_map_area_free(void *area) From 25208dd856e74f2b60d053eb98e6dd335816fbc1 Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Mon, 18 Mar 2019 12:08:58 +0100 Subject: [PATCH 105/468] doc: fix link to MSG_ZEROCOPY patchset Use https and link to the patch directly. Signed-off-by: Tobias Klauser Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/msg_zerocopy.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst index 18c1415e7bfa..ace56204dd03 100644 --- a/Documentation/networking/msg_zerocopy.rst +++ b/Documentation/networking/msg_zerocopy.rst @@ -50,7 +50,7 @@ the excellent reporting over at LWN.net or read the original code. patchset [PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY - http://lkml.kernel.org/r/20170803202945.70750-1-willemdebruijn.kernel@gmail.com + https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com Interface From 3028efe03be9c8c4cd7923f0f3c39b2871cc8a8f Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 18 Mar 2019 17:00:28 +0000 Subject: [PATCH 106/468] NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data() Commit 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.") changed the lock copying from memcpy() to the dedicated locks_copy_lock() function. The latter correctly increments the nfs4_lock_state.ls_count via nfs4_fl_copy_lock(), however, this refcount has already been incremented in the nfs4_alloc_{lock,unlock}data(). Kmemleak subsequently reports an unreferenced nfs4_lock_state object as below (arm64 platform): unreferenced object 0xffff8000fce0b000 (size 256): comm "systemd-sysuser", pid 1608, jiffies 4294892825 (age 32.348s) hex dump (first 32 bytes): 20 57 4c fb 00 80 ff ff 20 57 4c fb 00 80 ff ff WL..... WL..... 00 57 4c fb 00 80 ff ff 01 00 00 00 00 00 00 00 .WL............. backtrace: [<000000000d15010d>] kmem_cache_alloc+0x178/0x208 [<00000000d7c1d264>] nfs4_set_lock_state+0x124/0x1f0 [<000000009c867628>] nfs4_proc_lock+0x90/0x478 [<000000001686bd74>] do_setlk+0x64/0xe8 [<00000000e01500d4>] nfs_lock+0xe8/0x1f0 [<000000004f387d8d>] vfs_lock_file+0x18/0x40 [<00000000656ab79b>] do_lock_file_wait+0x68/0xf8 [<00000000f17c4a4b>] fcntl_setlk+0x224/0x280 [<0000000052a242c6>] do_fcntl+0x418/0x730 [<000000004f47291a>] __arm64_sys_fcntl+0x84/0xd0 [<00000000d6856e01>] el0_svc_common+0x80/0xf0 [<000000009c4bd1df>] el0_svc_handler+0x2c/0x80 [<00000000b1a0d479>] el0_svc+0x8/0xc [<0000000056c62a0f>] 0xffffffffffffffff This patch removes the original refcount_inc(&lsp->ls_count) that was paired with the memcpy() lock copying. Fixes: 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.") Cc: # 5.0.x- Cc: NeilBrown Signed-off-by: Catalin Marinas Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 4dbb0ee23432..6d2812a39287 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6301,7 +6301,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl, p->arg.seqid = seqid; p->res.seqid = seqid; p->lsp = lsp; - refcount_inc(&lsp->ls_count); /* Ensure we don't close file until we're done freeing locks! */ p->ctx = get_nfs_open_context(ctx); p->l_ctx = nfs_get_lock_context(ctx); @@ -6526,7 +6525,6 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct file_lock *fl, p->res.lock_seqid = p->arg.lock_seqid; p->lsp = lsp; p->server = server; - refcount_inc(&lsp->ls_count); p->ctx = get_nfs_open_context(ctx); locks_init_lock(&p->fl); locks_copy_lock(&p->fl, fl); From e9abc611a941d4051cde1d94b2ab7473fdb50102 Mon Sep 17 00:00:00 2001 From: Jonas Karlman Date: Wed, 20 Feb 2019 22:40:06 +0000 Subject: [PATCH 107/468] drm/rockchip: vop: reset scale mode when win is disabled NV12 framebuffers produced by the VPU shows distorted on RK3288 after win has been disabled when scaling is active. This issue can be reproduced using a 1080p modeset by: - Scale a 1280x720 NV12 framebuffer to 1920x1080 on win0 - Disable win0 - Display a 1920x1080 NV12 framebuffer without scaling on win0 - Output will now show the framebuffer distorted And by: - Scale a 1280x720 NV12 framebuffer to 1920x1080 - Change to a 720p modeset (win gets disabled and scaling reset to none) - Output will now show the framebuffer distorted Fix this by setting scale mode to none when win is disabled. Fixes: 4c156c21c794 ("drm/rockchip: vop: support plane scale") Cc: stable@vger.kernel.org Signed-off-by: Jonas Karlman Signed-off-by: Heiko Stuebner Link: https://patchwork.freedesktop.org/patch/msgid/AM3PR03MB0966DE3E19BACE07328CD637AC7D0@AM3PR03MB0966.eurprd03.prod.outlook.com --- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c index c7d4c6073ea5..0d4ade9d4722 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -541,6 +541,18 @@ static void vop_core_clks_disable(struct vop *vop) clk_disable(vop->hclk); } +static void vop_win_disable(struct vop *vop, const struct vop_win_data *win) +{ + if (win->phy->scl && win->phy->scl->ext) { + VOP_SCL_SET_EXT(vop, win, yrgb_hor_scl_mode, SCALE_NONE); + VOP_SCL_SET_EXT(vop, win, yrgb_ver_scl_mode, SCALE_NONE); + VOP_SCL_SET_EXT(vop, win, cbcr_hor_scl_mode, SCALE_NONE); + VOP_SCL_SET_EXT(vop, win, cbcr_ver_scl_mode, SCALE_NONE); + } + + VOP_WIN_SET(vop, win, enable, 0); +} + static int vop_enable(struct drm_crtc *crtc) { struct vop *vop = to_vop(crtc); @@ -586,7 +598,7 @@ static int vop_enable(struct drm_crtc *crtc) struct vop_win *vop_win = &vop->win[i]; const struct vop_win_data *win = vop_win->data; - VOP_WIN_SET(vop, win, enable, 0); + vop_win_disable(vop, win); } spin_unlock(&vop->reg_lock); @@ -735,7 +747,7 @@ static void vop_plane_atomic_disable(struct drm_plane *plane, spin_lock(&vop->reg_lock); - VOP_WIN_SET(vop, win, enable, 0); + vop_win_disable(vop, win); spin_unlock(&vop->reg_lock); } @@ -1622,7 +1634,7 @@ static int vop_initial(struct vop *vop) int channel = i * 2 + 1; VOP_WIN_SET(vop, win, channel, (channel + 1) << 4 | channel); - VOP_WIN_SET(vop, win, enable, 0); + vop_win_disable(vop, win); VOP_WIN_SET(vop, win, gate, 1); } From 3897b6f0a859288c22fb793fad11ec2327e60fcd Mon Sep 17 00:00:00 2001 From: Andrea Righi Date: Thu, 14 Mar 2019 08:56:28 +0100 Subject: [PATCH 108/468] btrfs: raid56: properly unmap parity page in finish_parity_scrub() Parity page is incorrectly unmapped in finish_parity_scrub(), triggering a reference counter bug on i386, i.e.: [ 157.662401] kernel BUG at mm/highmem.c:349! [ 157.666725] invalid opcode: 0000 [#1] SMP PTI The reason is that kunmap(p_page) was completely left out, so we never did an unmap for the p_page and the loop unmapping the rbio page was iterating over the wrong number of stripes: unmapping should be done with nr_data instead of rbio->real_stripes. Test case to reproduce the bug: - create a raid5 btrfs filesystem: # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde - mount it: # mount /dev/sdb /mnt - run btrfs scrub in a loop: # while :; do btrfs scrub start -BR /mnt; done BugLink: https://bugs.launchpad.net/bugs/1812845 Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn Signed-off-by: Andrea Righi Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/raid56.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index e74455eb42f9..6976e2280771 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -2429,8 +2429,9 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio, bitmap_clear(rbio->dbitmap, pagenr, 1); kunmap(p); - for (stripe = 0; stripe < rbio->real_stripes; stripe++) + for (stripe = 0; stripe < nr_data; stripe++) kunmap(page_in_rbio(rbio, stripe, pagenr, 0)); + kunmap(p_page); } __free_page(p_page); From e5dcc0c3223c45c94100f05f28d8ef814db3d82c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Mar 2019 10:41:14 -0700 Subject: [PATCH 109/468] net: rose: fix a possible stack overflow rose_write_internal() uses a temp buffer of 100 bytes, but a manual inspection showed that given arbitrary input, rose_create_facilities() can fill up to 110 bytes. Lets use a tailroom of 256 bytes for peace of mind, and remove the bounce buffer : we can simply allocate a big enough skb and adjust its length as needed. syzbot report : BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:352 [inline] BUG: KASAN: stack-out-of-bounds in rose_create_facilities net/rose/rose_subr.c:521 [inline] BUG: KASAN: stack-out-of-bounds in rose_write_internal+0x597/0x15d0 net/rose/rose_subr.c:116 Write of size 7 at addr ffff88808b1ffbef by task syz-executor.0/24854 CPU: 0 PID: 24854 Comm: syz-executor.0 Not tainted 5.0.0+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x123/0x190 mm/kasan/generic.c:191 memcpy+0x38/0x50 mm/kasan/common.c:131 memcpy include/linux/string.h:352 [inline] rose_create_facilities net/rose/rose_subr.c:521 [inline] rose_write_internal+0x597/0x15d0 net/rose/rose_subr.c:116 rose_connect+0x7cb/0x1510 net/rose/af_rose.c:826 __sys_connect+0x266/0x330 net/socket.c:1685 __do_sys_connect net/socket.c:1696 [inline] __se_sys_connect net/socket.c:1693 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1693 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458079 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f47b8d9dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458079 RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000004 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f47b8d9e6d4 R13: 00000000004be4a4 R14: 00000000004ceca8 R15: 00000000ffffffff The buggy address belongs to the page: page:ffffea00022c7fc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1fffc0000000000() raw: 01fffc0000000000 0000000000000000 ffffffff022c0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88808b1ffa80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88808b1ffb00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 03 >ffff88808b1ffb80: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 04 f3 ^ ffff88808b1ffc00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 ffff88808b1ffc80: 00 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 01 f2 01 Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- net/rose/rose_subr.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/net/rose/rose_subr.c b/net/rose/rose_subr.c index 7ca57741b2fb..7849f286bb93 100644 --- a/net/rose/rose_subr.c +++ b/net/rose/rose_subr.c @@ -105,16 +105,17 @@ void rose_write_internal(struct sock *sk, int frametype) struct sk_buff *skb; unsigned char *dptr; unsigned char lci1, lci2; - char buffer[100]; - int len, faclen = 0; + int maxfaclen = 0; + int len, faclen; + int reserve; - len = AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + ROSE_MIN_LEN + 1; + reserve = AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + 1; + len = ROSE_MIN_LEN; switch (frametype) { case ROSE_CALL_REQUEST: len += 1 + ROSE_ADDR_LEN + ROSE_ADDR_LEN; - faclen = rose_create_facilities(buffer, rose); - len += faclen; + maxfaclen = 256; break; case ROSE_CALL_ACCEPTED: case ROSE_CLEAR_REQUEST: @@ -123,15 +124,16 @@ void rose_write_internal(struct sock *sk, int frametype) break; } - if ((skb = alloc_skb(len, GFP_ATOMIC)) == NULL) + skb = alloc_skb(reserve + len + maxfaclen, GFP_ATOMIC); + if (!skb) return; /* * Space for AX.25 header and PID. */ - skb_reserve(skb, AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + 1); + skb_reserve(skb, reserve); - dptr = skb_put(skb, skb_tailroom(skb)); + dptr = skb_put(skb, len); lci1 = (rose->lci >> 8) & 0x0F; lci2 = (rose->lci >> 0) & 0xFF; @@ -146,7 +148,8 @@ void rose_write_internal(struct sock *sk, int frametype) dptr += ROSE_ADDR_LEN; memcpy(dptr, &rose->source_addr, ROSE_ADDR_LEN); dptr += ROSE_ADDR_LEN; - memcpy(dptr, buffer, faclen); + faclen = rose_create_facilities(dptr, rose); + skb_put(skb, faclen); dptr += faclen; break; From c22da36688d6298f2e546dcc43fdc1ad35036467 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Sat, 16 Mar 2019 01:00:50 +0100 Subject: [PATCH 110/468] gtp: change NET_UDP_TUNNEL dependency to select Similarly to commit a7603ac1fc8c ("geneve: change NET_UDP_TUNNEL dependency to select"), GTP has a dependency on NET_UDP_TUNNEL which makes impossible to compile it if no other protocol depending on NET_UDP_TUNNEL is selected. Fix this by changing the depends to a select, and drop NET_IP_TUNNEL from the select list, as it already depends on NET_UDP_TUNNEL. Signed-off-by: Matteo Croce Signed-off-by: David S. Miller --- drivers/net/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 5e4ca082cfcd..7a96d168efc4 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -216,8 +216,8 @@ config GENEVE config GTP tristate "GPRS Tunneling Protocol datapath (GTP-U)" - depends on INET && NET_UDP_TUNNEL - select NET_IP_TUNNEL + depends on INET + select NET_UDP_TUNNEL ---help--- This allows one to create gtp virtual interfaces that provide the GPRS Tunneling Protocol datapath (GTP-U). This tunneling protocol From bb9e5c5bcd76f4474eac3baf643d7a39f7bac7bb Mon Sep 17 00:00:00 2001 From: Finn Thain Date: Sat, 16 Mar 2019 14:21:19 +1100 Subject: [PATCH 111/468] mac8390: Fix mmio access size probe The bug that Stan reported is as follows. After a restart, a 16-bit NIC may be incorrectly identified as a 32-bit NIC and stop working. mac8390 slot.E: Memory length resource not found, probing mac8390 slot.E: Farallon EtherMac II-C (type farallon) mac8390 slot.E: MAC 00:00:c5:30:c2:99, IRQ 61, 32 KB shared memory at 0xfeed0000, 32-bit access. The bug never arises after a cold start and only intermittently after a warm start. (I didn't investigate why the bug is intermittent.) It turns out that memcpy_toio() is deprecated and memcmp_withio() also has issues. Replacing these calls with mmio accessors fixes the problem. Reported-and-tested-by: Stan Johnson Fixes: 2964db0f5904 ("m68k: Mac DP8390 update") Signed-off-by: Finn Thain Signed-off-by: David S. Miller --- drivers/net/ethernet/8390/mac8390.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/8390/mac8390.c b/drivers/net/ethernet/8390/mac8390.c index 342ae08ec3c2..d60a86aa8aa8 100644 --- a/drivers/net/ethernet/8390/mac8390.c +++ b/drivers/net/ethernet/8390/mac8390.c @@ -153,8 +153,6 @@ static void dayna_block_input(struct net_device *dev, int count, static void dayna_block_output(struct net_device *dev, int count, const unsigned char *buf, int start_page); -#define memcmp_withio(a, b, c) memcmp((a), (void *)(b), (c)) - /* Slow Sane (16-bit chunk memory read/write) Cabletron uses this */ static void slow_sane_get_8390_hdr(struct net_device *dev, struct e8390_pkt_hdr *hdr, int ring_page); @@ -233,19 +231,26 @@ static enum mac8390_type mac8390_ident(struct nubus_rsrc *fres) static enum mac8390_access mac8390_testio(unsigned long membase) { - unsigned long outdata = 0xA5A0B5B0; - unsigned long indata = 0x00000000; + u32 outdata = 0xA5A0B5B0; + u32 indata = 0; + /* Try writing 32 bits */ - memcpy_toio((void __iomem *)membase, &outdata, 4); - /* Now compare them */ - if (memcmp_withio(&outdata, membase, 4) == 0) + nubus_writel(outdata, membase); + /* Now read it back */ + indata = nubus_readl(membase); + if (outdata == indata) return ACCESS_32; + + outdata = 0xC5C0D5D0; + indata = 0; + /* Write 16 bit output */ word_memcpy_tocard(membase, &outdata, 4); /* Now read it back */ word_memcpy_fromcard(&indata, membase, 4); if (outdata == indata) return ACCESS_16; + return ACCESS_UNKNOWN; } From a7faaa0c5dc7d091cc9f72b870d7edcdd6f43f12 Mon Sep 17 00:00:00 2001 From: Dmitry Bogdanov Date: Sat, 16 Mar 2019 08:28:18 +0000 Subject: [PATCH 112/468] net: aquantia: fix rx checksum offload for UDP/TCP over IPv6 TCP/UDP checksum validity was propagated to skb only if IP checksum is valid. But for IPv6 there is no validity as there is no checksum in IPv6. This patch propagates TCP/UDP checksum validity regardless of IP checksum. Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Igor Russkikh Signed-off-by: Nikita Danilov Signed-off-by: Dmitry Bogdanov Signed-off-by: David S. Miller --- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index 74550ccc7a20..e2ffb159cbe2 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -186,11 +186,12 @@ static void aq_rx_checksum(struct aq_ring_s *self, } if (buff->is_ip_cso) { __skb_incr_checksum_unnecessary(skb); - if (buff->is_udp_cso || buff->is_tcp_cso) - __skb_incr_checksum_unnecessary(skb); } else { skb->ip_summed = CHECKSUM_NONE; } + + if (buff->is_udp_cso || buff->is_tcp_cso) + __skb_incr_checksum_unnecessary(skb); } #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) From cc4807bb609230d8959fd732b0bf3bd4c2de8eac Mon Sep 17 00:00:00 2001 From: Zhiqiang Liu Date: Sat, 16 Mar 2019 17:02:54 +0800 Subject: [PATCH 113/468] vxlan: Don't call gro_cells_destroy() before device is unregistered Commit ad6c9986bcb62 ("vxlan: Fix GRO cells race condition between receive and link delete") fixed a race condition for the typical case a vxlan device is dismantled from the current netns. But if a netns is dismantled, vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue() of all the vxlan tunnels that are related to this netns. In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before unregister_netdevice_queue(). This means that the gro_cells_destroy() call is done too soon, for the same reasons explained in above commit. So we need to fully respect the RCU rules, and thus must remove the gro_cells_destroy() call or risk use after-free. Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer") Signed-off-by: Suanming.Mou Suggested-by: Eric Dumazet Reviewed-by: Stefano Brivio Reviewed-by: Zhiqiang Liu Signed-off-by: David S. Miller --- drivers/net/vxlan.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 077f1b9f2761..d76dfed8d9bb 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -4335,10 +4335,8 @@ static void vxlan_destroy_tunnels(struct net *net, struct list_head *head) /* If vxlan->dev is in the same netns, it has already been added * to the list by the previous loop. */ - if (!net_eq(dev_net(vxlan->dev), net)) { - gro_cells_destroy(&vxlan->gro_cells); + if (!net_eq(dev_net(vxlan->dev), net)) unregister_netdevice_queue(vxlan->dev, head); - } } for (h = 0; h < PORT_HASH_SIZE; ++h) From a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Sat, 16 Mar 2019 14:41:30 +0100 Subject: [PATCH 114/468] packets: Always register packet sk in the same order When using fanouts with AF_PACKET, the demux functions such as fanout_demux_cpu will return an index in the fanout socket array, which corresponds to the selected socket. The ordering of this array depends on the order the sockets were added to a given fanout group, so for FANOUT_CPU this means sockets are bound to cpus in the order they are configured, which is OK. However, when stopping then restarting the interface these sockets are bound to, the sockets are reassigned to the fanout group in the reverse order, due to the fact that they were inserted at the head of the interface's AF_PACKET socket list. This means that traffic that was directed to the first socket in the fanout group is now directed to the last one after an interface restart. In the case of FANOUT_CPU, traffic from CPU0 will be directed to the socket that used to receive traffic from the last CPU after an interface restart. This commit introduces a helper to add a socket at the tail of a list, then uses it to register AF_PACKET sockets. Note that this changes the order in which sockets are listed in /proc and with sock_diag. Fixes: dc99f600698d ("packet: Add fanout support") Signed-off-by: Maxime Chevallier Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/net/sock.h | 6 ++++++ net/packet/af_packet.c | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index 328cb7cb7b0b..8de5ee258b93 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -710,6 +710,12 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list) hlist_add_head_rcu(&sk->sk_node, list); } +static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list) +{ + sock_hold(sk); + hlist_add_tail_rcu(&sk->sk_node, list); +} + static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8376bc1c1508..8754d7c93b84 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3243,7 +3243,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, } mutex_lock(&net->packet.sklist_lock); - sk_add_node_rcu(sk, &net->packet.sklist); + sk_add_node_tail_rcu(sk, &net->packet.sklist); mutex_unlock(&net->packet.sklist_lock); preempt_disable(); From 18bed89107a400af0d672ec85a270f1545db2569 Mon Sep 17 00:00:00 2001 From: Yoshiki Komachi Date: Mon, 18 Mar 2019 14:39:52 +0900 Subject: [PATCH 115/468] af_packet: fix the tx skb protocol in raw sockets with ETH_P_ALL I am using "protocol ip" filters in TC to manipulate TC flower classifiers, which are only available with "protocol ip". However, I faced an issue that packets sent via raw sockets with ETH_P_ALL did not match the ip filters even if they did satisfy the condition (e.g., DHCP offer from dhcpd). I have determined that the behavior was caused by an unexpected value stored in skb->protocol, namely, ETH_P_ALL instead of ETH_P_IP, when packets were sent via raw sockets with ETH_P_ALL set. IMHO, storing ETH_P_ALL in skb->protocol is not appropriate for packets sent via raw sockets because ETH_P_ALL is not a real ether type used on wire, but a virtual one. This patch fixes the tx protocol selection in cases of transmission via raw sockets created with ETH_P_ALL so that it asks the driver to extract protocol from the Ethernet header. Fixes: 75c65772c3 ("net/packet: Ask driver for protocol if not provided by user") Signed-off-by: Yoshiki Komachi Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- net/packet/af_packet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8754d7c93b84..323655a25674 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1852,7 +1852,8 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) { - if (!skb->protocol && sock->type == SOCK_RAW) { + if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) && + sock->type == SOCK_RAW) { skb_reset_mac_header(skb); skb->protocol = dev_parse_header_protocol(skb); } From 273160ffc6b993c7c91627f5a84799c66dfe4dee Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 19:47:00 +0800 Subject: [PATCH 116/468] sctp: get sctphdr by offset in sctp_compute_cksum sctp_hdr(skb) only works when skb->transport_header is set properly. But in Netfilter, skb->transport_header for ipv6 is not guaranteed to be right value for sctphdr. It would cause to fail to check the checksum for sctp packets. So fix it by using offset, which is always right in all places. v1->v2: - Fix the changelog. Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang Signed-off-by: Xin Long Signed-off-by: David S. Miller --- include/net/sctp/checksum.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/sctp/checksum.h b/include/net/sctp/checksum.h index 32ee65a30aff..1c6e6c0766ca 100644 --- a/include/net/sctp/checksum.h +++ b/include/net/sctp/checksum.h @@ -61,7 +61,7 @@ static inline __wsum sctp_csum_combine(__wsum csum, __wsum csum2, static inline __le32 sctp_compute_cksum(const struct sk_buff *skb, unsigned int offset) { - struct sctphdr *sh = sctp_hdr(skb); + struct sctphdr *sh = (struct sctphdr *)(skb->data + offset); const struct skb_checksum_ops ops = { .update = sctp_csum_update, .combine = sctp_csum_combine, From 636d25d557d1073281013c43e4ff4737692da2d4 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 19:58:29 +0800 Subject: [PATCH 117/468] sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant Now sctp_copy_descendant() copies pd_lobby from old sctp scok to new sctp sock. If sctp_sock_migrate() returns error, it will panic when releasing new sock and trying to purge pd_lobby due to the incorrect pointers in pd_lobby. [ 120.485116] kasan: CONFIG_KASAN_INLINE enabled [ 120.486270] kasan: GPF could be caused by NULL-ptr deref or user [ 120.509901] Call Trace: [ 120.510443] sctp_ulpevent_free+0x1e8/0x490 [sctp] [ 120.511438] sctp_queue_purge_ulpevents+0x97/0xe0 [sctp] [ 120.512535] sctp_close+0x13a/0x700 [sctp] [ 120.517483] inet_release+0xdc/0x1c0 [ 120.518215] __sock_release+0x1d2/0x2a0 [ 120.519025] sctp_do_peeloff+0x30f/0x3c0 [sctp] We fix it by not copying sctp_sock pd_lobby in sctp_copy_descendan(), and skb_queue_head_init() can also be removed in sctp_sock_migrate(). Reported-by: syzbot+85e0b422ff140b03672a@syzkaller.appspotmail.com Fixes: 89664c623617 ("sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() fails") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 6140471efd4b..65b538604c5b 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -9169,7 +9169,7 @@ static inline void sctp_copy_descendant(struct sock *sk_to, { int ancestor_size = sizeof(struct inet_sock) + sizeof(struct sctp_sock) - - offsetof(struct sctp_sock, auto_asconf_list); + offsetof(struct sctp_sock, pd_lobby); if (sk_from->sk_family == PF_INET6) ancestor_size += sizeof(struct ipv6_pinfo); @@ -9253,7 +9253,6 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk, * 2) Peeling off partial delivery; keep pd_lobby in new pd_lobby. * 3) Peeling off non-partial delivery; move pd_lobby to receive_queue. */ - skb_queue_head_init(&newsp->pd_lobby); atomic_set(&sctp_sk(newsk)->pd_mode, assoc->ulpq.pd_mode); if (atomic_read(&sctp_sk(oldsk)->pd_mode)) { From 1354e72fabf4d8763817564648984351755f0ccb Mon Sep 17 00:00:00 2001 From: Marcelo Ricardo Leitner Date: Mon, 18 Mar 2019 20:05:59 +0800 Subject: [PATCH 118/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt Currently if the user pass an invalid asoc_id to SCTP_DEFAULT_SEND_PARAM on a TCP-style socket, it will silently ignore the new parameters. That's because after not finding an asoc, it is checking asoc_id against the known values of CURRENT/FUTURE/ALL values and that fails to match. IOW, if the user supplies an invalid asoc id or not, it should either match the current asoc or the socket itself so that it will inherit these later. Fixes it by forcing asoc_id to SCTP_FUTURE_ASSOC in case it is a TCP-style socket without an asoc, so that the values get set on the socket. Fixes: 707e45b3dc5a ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt") Signed-off-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 65b538604c5b..d0e5c627a266 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3024,6 +3024,9 @@ static int sctp_setsockopt_default_send_param(struct sock *sk, return 0; } + if (sctp_style(sk, TCP)) + info.sinfo_assoc_id = SCTP_FUTURE_ASSOC; + if (info.sinfo_assoc_id == SCTP_FUTURE_ASSOC || info.sinfo_assoc_id == SCTP_ALL_ASSOC) { sp->default_stream = info.sinfo_stream; From 8e2614fc1c2a525d3244df65da486fc914a2bf78 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:00 +0800 Subject: [PATCH 119/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DELAYED_SACK sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DELAYED_SACK sockopt. Fixes: 9c5829e1c49e ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index d0e5c627a266..4c5821befaf3 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2920,6 +2920,9 @@ static int sctp_setsockopt_delayed_ack(struct sock *sk, return 0; } + if (sctp_style(sk, TCP)) + params.sack_assoc_id = SCTP_FUTURE_ASSOC; + if (params.sack_assoc_id == SCTP_FUTURE_ASSOC || params.sack_assoc_id == SCTP_ALL_ASSOC) { if (params.sack_delay) { From a842e65b25a418363bf8196e2343123a984ee69b Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:01 +0800 Subject: [PATCH 120/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SNDINFO sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_SNDINFO sockopt. Fixes: 92fc3bd928c9 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 4c5821befaf3..3bac03931b67 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3087,6 +3087,9 @@ static int sctp_setsockopt_default_sndinfo(struct sock *sk, return 0; } + if (sctp_style(sk, TCP)) + info.snd_assoc_id = SCTP_FUTURE_ASSOC; + if (info.snd_assoc_id == SCTP_FUTURE_ASSOC || info.snd_assoc_id == SCTP_ALL_ASSOC) { sp->default_stream = info.snd_sid; From cface2cb585e392995cc11a4a814b433e6099ec7 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:02 +0800 Subject: [PATCH 121/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_CONTEXT sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_CONTEXT sockopt. Fixes: 49b037acca8c ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 3bac03931b67..b024a625f78a 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3540,6 +3540,9 @@ static int sctp_setsockopt_context(struct sock *sk, char __user *optval, return 0; } + if (sctp_style(sk, TCP)) + params.assoc_id = SCTP_FUTURE_ASSOC; + if (params.assoc_id == SCTP_FUTURE_ASSOC || params.assoc_id == SCTP_ALL_ASSOC) sp->default_rcv_context = params.assoc_value; From 746bc215a6b223caf7eb6b33400458c15e742920 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:03 +0800 Subject: [PATCH 122/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_MAX_BURST sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_MAX_BURST sockopt. Fixes: e0651a0dc877 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index b024a625f78a..df879b163414 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3682,6 +3682,9 @@ static int sctp_setsockopt_maxburst(struct sock *sk, return 0; } + if (sctp_style(sk, TCP)) + params.assoc_id = SCTP_FUTURE_ASSOC; + if (params.assoc_id == SCTP_FUTURE_ASSOC || params.assoc_id == SCTP_ALL_ASSOC) sp->max_burst = params.assoc_value; From 0685d6b72207a6de7ea6853e48b009e71d64fe1b Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:04 +0800 Subject: [PATCH 123/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_KEY sockopt. Fixes: 7fb3be13a236 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index df879b163414..2ac221c795c2 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3813,6 +3813,9 @@ static int sctp_setsockopt_auth_key(struct sock *sk, goto out; } + if (sctp_style(sk, TCP)) + authkey->sca_assoc_id = SCTP_FUTURE_ASSOC; + if (authkey->sca_assoc_id == SCTP_FUTURE_ASSOC || authkey->sca_assoc_id == SCTP_ALL_ASSOC) { ret = sctp_auth_set_key(ep, asoc, authkey); From 06b39e8506f6dd4e11e1d8fc4d314d72d237ad10 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:05 +0800 Subject: [PATCH 124/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_ACTIVE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_ACTIVE_KEY sockopt. Fixes: bf9fb6ad4f29 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 2ac221c795c2..1d098f0ccbb5 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3871,6 +3871,9 @@ static int sctp_setsockopt_active_key(struct sock *sk, if (asoc) return sctp_auth_set_active_key(ep, asoc, val.scact_keynumber); + if (sctp_style(sk, TCP)) + val.scact_assoc_id = SCTP_FUTURE_ASSOC; + if (val.scact_assoc_id == SCTP_FUTURE_ASSOC || val.scact_assoc_id == SCTP_ALL_ASSOC) { ret = sctp_auth_set_active_key(ep, asoc, val.scact_keynumber); From 220675eb2e485519afbe7f3b457f7ce883086482 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:06 +0800 Subject: [PATCH 125/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_DELETE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DELETE_KEY sockopt. Fixes: 3adcc300603e ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 1d098f0ccbb5..087ca0bb86bd 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3925,6 +3925,9 @@ static int sctp_setsockopt_del_key(struct sock *sk, if (asoc) return sctp_auth_del_key_id(ep, asoc, val.scact_keynumber); + if (sctp_style(sk, TCP)) + val.scact_assoc_id = SCTP_FUTURE_ASSOC; + if (val.scact_assoc_id == SCTP_FUTURE_ASSOC || val.scact_assoc_id == SCTP_ALL_ASSOC) { ret = sctp_auth_del_key_id(ep, asoc, val.scact_keynumber); From 200f3a3bcb293d8d55b860632b9d5c9b5e763273 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:07 +0800 Subject: [PATCH 126/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_DEACTIVATE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DEACTIVATE_KEY sockopt. Fixes: 2af66ff3edc7 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 087ca0bb86bd..0187444567e0 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3978,6 +3978,9 @@ static int sctp_setsockopt_deactivate_key(struct sock *sk, char __user *optval, if (asoc) return sctp_auth_deact_key_id(ep, asoc, val.scact_keynumber); + if (sctp_style(sk, TCP)) + val.scact_assoc_id = SCTP_FUTURE_ASSOC; + if (val.scact_assoc_id == SCTP_FUTURE_ASSOC || val.scact_assoc_id == SCTP_ALL_ASSOC) { ret = sctp_auth_deact_key_id(ep, asoc, val.scact_keynumber); From cbb45c6cd5e64c344798892d6e200b0b253d0b59 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:08 +0800 Subject: [PATCH 127/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_PRINFO sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_PRINFO sockopt. Fixes: 3a583059d187 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 0187444567e0..238e5c804cc9 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4196,6 +4196,9 @@ static int sctp_setsockopt_default_prinfo(struct sock *sk, goto out; } + if (sctp_style(sk, TCP)) + info.pr_assoc_id = SCTP_FUTURE_ASSOC; + if (info.pr_assoc_id == SCTP_FUTURE_ASSOC || info.pr_assoc_id == SCTP_ALL_ASSOC) { SCTP_PR_SET_POLICY(sp->default_flags, info.pr_policy); From 9430ff992644e9f9c3ba6283cc56d40b421522e9 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:09 +0800 Subject: [PATCH 128/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_ENABLE_STREAM_RESET sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_ENABLE_STREAM_RESET sockopt. Fixes: 99a62135e127 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 238e5c804cc9..6b0ee8946bdb 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4281,6 +4281,9 @@ static int sctp_setsockopt_enable_strreset(struct sock *sk, goto out; } + if (sctp_style(sk, TCP)) + params.assoc_id = SCTP_FUTURE_ASSOC; + if (params.assoc_id == SCTP_FUTURE_ASSOC || params.assoc_id == SCTP_ALL_ASSOC) ep->strreset_enable = params.assoc_value; From 995186193fd7a21c8fc6a2f2a96d33e26447eb01 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:10 +0800 Subject: [PATCH 129/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_EVENT sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_EVENT sockopt. Fixes: d251f05e3ba2 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 6b0ee8946bdb..10df48aa4e2c 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4574,6 +4574,9 @@ static int sctp_setsockopt_event(struct sock *sk, char __user *optval, if (asoc) return sctp_assoc_ulpevent_type_set(¶m, asoc); + if (sctp_style(sk, TCP)) + param.se_assoc_id = SCTP_FUTURE_ASSOC; + if (param.se_assoc_id == SCTP_FUTURE_ASSOC || param.se_assoc_id == SCTP_ALL_ASSOC) sctp_ulpevent_type_set(&sp->subscribe, From b59c19d9d901a8eb04896ec027787a55acb71fc6 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 20:06:11 +0800 Subject: [PATCH 130/468] sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_STREAM_SCHEDULER sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_STREAM_SCHEDULER sockopt. Fixes: 7efba10d6bd2 ("sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt") Signed-off-by: Xin Long Signed-off-by: David S. Miller --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 10df48aa4e2c..011c349d877a 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4409,6 +4409,9 @@ static int sctp_setsockopt_scheduler(struct sock *sk, if (asoc) return sctp_sched_set_sched(asoc, params.assoc_value); + if (sctp_style(sk, TCP)) + params.assoc_id = SCTP_FUTURE_ASSOC; + if (params.assoc_id == SCTP_FUTURE_ASSOC || params.assoc_id == SCTP_ALL_ASSOC) sp->default_ss = params.assoc_value; From fae846e2b7124d4b076ef17791c73addf3b26350 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 18 Mar 2019 08:51:06 -0500 Subject: [PATCH 131/468] mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S The device ID alone does not uniquely identify a device. Test both the vendor and device ID to make sure we don't mistakenly think some other vendor's 0xB410 device is a Digium HFC4S. Also, instead of the bare hex ID, use the same constant (PCI_DEVICE_ID_DIGIUM_HFC4S) used in the device ID table. No functional change intended. Signed-off-by: Bjorn Helgaas Signed-off-by: David S. Miller --- drivers/isdn/hardware/mISDN/hfcmulti.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/isdn/hardware/mISDN/hfcmulti.c b/drivers/isdn/hardware/mISDN/hfcmulti.c index 4d85645c87f7..0928fd1f0e0c 100644 --- a/drivers/isdn/hardware/mISDN/hfcmulti.c +++ b/drivers/isdn/hardware/mISDN/hfcmulti.c @@ -4365,7 +4365,8 @@ setup_pci(struct hfc_multi *hc, struct pci_dev *pdev, if (m->clock2) test_and_set_bit(HFC_CHIP_CLOCK2, &hc->chip); - if (ent->device == 0xB410) { + if (ent->vendor == PCI_VENDOR_ID_DIGIUM && + ent->device == PCI_DEVICE_ID_DIGIUM_HFC4S) { test_and_set_bit(HFC_CHIP_B410P, &hc->chip); test_and_set_bit(HFC_CHIP_PCM_MASTER, &hc->chip); test_and_clear_bit(HFC_CHIP_PCM_SLAVE, &hc->chip); From 12b409dd32dffad6d800774e2d250adaaaa1fdcd Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Mon, 18 Mar 2019 16:40:54 +0100 Subject: [PATCH 132/468] s390/qeth: don't erase configuration while probing The HW trap and VNICC configuration is exposed via sysfs, and may have already been modified when qeth_l?_probe_device() attempts to initialize them. So (1) initialize the VNICC values a little earlier, and (2) don't bother about the HW trap mode, it was already initialized before. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller --- drivers/s390/net/qeth_l2_main.c | 4 ++-- drivers/s390/net/qeth_l3_main.c | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 8efb2e8ff8f4..3bfdd8545776 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -645,6 +645,8 @@ static int qeth_l2_probe_device(struct ccwgroup_device *gdev) struct qeth_card *card = dev_get_drvdata(&gdev->dev); int rc; + qeth_l2_vnicc_set_defaults(card); + if (gdev->dev.type == &qeth_generic_devtype) { rc = qeth_l2_create_device_attributes(&gdev->dev); if (rc) @@ -652,8 +654,6 @@ static int qeth_l2_probe_device(struct ccwgroup_device *gdev) } hash_init(card->mac_htable); - card->info.hwtrap = 0; - qeth_l2_vnicc_set_defaults(card); return 0; } diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 7e68d9d16859..2f002843b16e 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2260,7 +2260,6 @@ static int qeth_l3_probe_device(struct ccwgroup_device *gdev) } hash_init(card->ip_htable); hash_init(card->ip_mc_htable); - card->info.hwtrap = 0; return 0; } From 7221b727f0079a32aca91f657141e1de564d4b97 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Mon, 18 Mar 2019 16:40:55 +0100 Subject: [PATCH 133/468] s390/qeth: fix race when initializing the IP address table The ucast IP table is utilized by some of the L3-specific sysfs attributes that qeth_l3_create_device_attributes() provides. So initialize the table _before_ registering the attributes. Fixes: ebccc7397e4a ("s390/qeth: add missing hash table initializations") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller --- drivers/s390/net/qeth_l3_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 2f002843b16e..d805e72edb58 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2253,12 +2253,14 @@ static int qeth_l3_probe_device(struct ccwgroup_device *gdev) struct qeth_card *card = dev_get_drvdata(&gdev->dev); int rc; + hash_init(card->ip_htable); + if (gdev->dev.type == &qeth_generic_devtype) { rc = qeth_l3_create_device_attributes(&gdev->dev); if (rc) return rc; } - hash_init(card->ip_htable); + hash_init(card->ip_mc_htable); return 0; } From 104b48592b5441c722dcd95c38ab9300f2d94856 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Mon, 18 Mar 2019 16:40:56 +0100 Subject: [PATCH 134/468] s390/qeth: be drop monitor friendly As part of the TX completion path, qeth_release_skbs() frees the completed skbs with __skb_queue_purge(). This ends in kfree_skb(), reporting every completed skb as dropped. On the other hand when dropping an skb in .ndo_start_xmit, we end up calling consume_skb()... where we should be using kfree_skb() so that drop monitors get notified. Switch the drop/consume logic around, and also don't accumulate dropped packets in the tx_errors statistics. Fixes: dc149e3764d8 ("s390/qeth: replace open-coded skb_queue_walk()") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller --- drivers/s390/net/qeth_core_main.c | 5 ++++- drivers/s390/net/qeth_l2_main.c | 3 +-- drivers/s390/net/qeth_l3_main.c | 3 +-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 197b0f5b63e7..44bd6f04c145 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -1150,13 +1150,16 @@ static void qeth_notify_skbs(struct qeth_qdio_out_q *q, static void qeth_release_skbs(struct qeth_qdio_out_buffer *buf) { + struct sk_buff *skb; + /* release may never happen from within CQ tasklet scope */ WARN_ON_ONCE(atomic_read(&buf->state) == QETH_QDIO_BUF_IN_CQ); if (atomic_read(&buf->state) == QETH_QDIO_BUF_PENDING) qeth_notify_skbs(buf->q, buf, TX_NOTIFY_GENERALERROR); - __skb_queue_purge(&buf->skb_list); + while ((skb = __skb_dequeue(&buf->skb_list)) != NULL) + consume_skb(skb); } static void qeth_clear_output_buffer(struct qeth_qdio_out_q *queue, diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 3bfdd8545776..c3067fd3bd9e 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -629,8 +629,7 @@ static netdev_tx_t qeth_l2_hard_start_xmit(struct sk_buff *skb, } /* else fall through */ QETH_TXQ_STAT_INC(queue, tx_dropped); - QETH_TXQ_STAT_INC(queue, tx_errors); - dev_kfree_skb_any(skb); + kfree_skb(skb); netif_wake_queue(dev); return NETDEV_TX_OK; } diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index d805e72edb58..53712cf26406 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2096,8 +2096,7 @@ static netdev_tx_t qeth_l3_hard_start_xmit(struct sk_buff *skb, tx_drop: QETH_TXQ_STAT_INC(queue, tx_dropped); - QETH_TXQ_STAT_INC(queue, tx_errors); - dev_kfree_skb_any(skb); + kfree_skb(skb); netif_wake_queue(dev); return NETDEV_TX_OK; } From 4a9be28c45bf02fa0436808bb6c0baeba30e120e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 19 Mar 2019 11:33:24 +1100 Subject: [PATCH 135/468] NFS: fix mount/umount race in nlmclnt. If the last NFSv3 unmount from a given host races with a mount from the same host, we can destroy an nlm_host that is still in use. Specifically nlmclnt_lookup_host() can increment h_count on an nlm_host that nlmclnt_release_host() has just successfully called refcount_dec_and_test() on. Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock() will be called to destroy the nlmclnt which is now in use again. The cause of the problem is that the dec_and_test happens outside the locked region. This is easily fixed by using refcount_dec_and_mutex_lock(). Fixes: 8ea6ecc8b075 ("lockd: Create client-side nlm_host cache") Cc: stable@vger.kernel.org (v2.6.38+) Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- fs/lockd/host.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lockd/host.c b/fs/lockd/host.c index 93fb7cf0b92b..f0b5c987d6ae 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -290,12 +290,11 @@ void nlmclnt_release_host(struct nlm_host *host) WARN_ON_ONCE(host->h_server); - if (refcount_dec_and_test(&host->h_count)) { + if (refcount_dec_and_mutex_lock(&host->h_count, &nlm_host_mutex)) { WARN_ON_ONCE(!list_empty(&host->h_lockowners)); WARN_ON_ONCE(!list_empty(&host->h_granted)); WARN_ON_ONCE(!list_empty(&host->h_reclaim)); - mutex_lock(&nlm_host_mutex); nlm_destroy_host_locked(host); mutex_unlock(&nlm_host_mutex); } From ffa91253739ca89fc997195d8bbd1f7ba3e29fbe Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 18 Mar 2019 11:07:33 -0700 Subject: [PATCH 136/468] Documentation: networking: Update netdev-FAQ regarding patches Provide an explanation of what is expected with respect to sending new versions of specific patches within a patch series, as well as what happens if an earlier patch series accidentally gets merged). Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/networking/netdev-FAQ.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/Documentation/networking/netdev-FAQ.rst b/Documentation/networking/netdev-FAQ.rst index 0ac5fa77f501..8c7a713cf657 100644 --- a/Documentation/networking/netdev-FAQ.rst +++ b/Documentation/networking/netdev-FAQ.rst @@ -131,6 +131,19 @@ it to the maintainer to figure out what is the most recent and current version that should be applied. If there is any doubt, the maintainer will reply and ask what should be done. +Q: I made changes to only a few patches in a patch series should I resend only those changed? +-------------------------------------------------------------------------------------------- +A: No, please resend the entire patch series and make sure you do number your +patches such that it is clear this is the latest and greatest set of patches +that can be applied. + +Q: I submitted multiple versions of a patch series and it looks like a version other than the last one has been accepted, what should I do? +------------------------------------------------------------------------------------------------------------------------------------------- +A: There is no revert possible, once it is pushed out, it stays like that. +Please send incremental versions on top of what has been merged in order to fix +the patches the way they would look like if your latest patch series was to be +merged. + Q: How can I tell what patches are queued up for backporting to the various stable releases? -------------------------------------------------------------------------------------------- A: Normally Greg Kroah-Hartman collects stable commits himself, but for From 1a7ee0efb26d6e25433c6d4428028ac614f55ff1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Fri, 1 Mar 2019 08:26:42 +0100 Subject: [PATCH 137/468] ARM: dts: imx6dl-yapp4: Use rgmii-id phy mode on the cpu port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use rgmii-id phy mode for the CPU port (MAC0) of the QCA8334 switch to add delays to both Tx and Rx clock. It worked with the rgmii mode before because the qca8k driver (incorrectly) enabled delays in that mode and rgmii-id was not implemented at all. Commit 5ecdd77c61c8 ("net: dsa: qca8k: disable delay for RGMII mode") removed the delays from the RGMII mode and hence broke the networking. To fix the problem, commit a968b5e9d587 ("net: dsa: qca8k: Enable delay for RGMII_ID mode") was introduced. Now the correct phy mode is available so use it. Signed-off-by: Michal Vokáč Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi index b715ab0fa1ff..091d829f6b05 100644 --- a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi +++ b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi @@ -125,7 +125,7 @@ switch_ports: ports { ethphy0: port@0 { reg = <0>; label = "cpu"; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; ethernet = <&fec>; fixed-link { From 0c17e83fe423467e3ccf0a02f99bd050a73bbeb4 Mon Sep 17 00:00:00 2001 From: Wen Yang Date: Fri, 1 Mar 2019 16:56:46 +0800 Subject: [PATCH 138/468] ARM: imx51: fix a leaked reference by adding missing of_node_put The call to of_get_next_child returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./arch/arm/mach-imx/mach-imx51.c:64:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 57, but without a corresponding object release within this function. Signed-off-by: Wen Yang Cc: Russell King Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Lucas Stach Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Shawn Guo --- arch/arm/mach-imx/mach-imx51.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/mach-imx/mach-imx51.c b/arch/arm/mach-imx/mach-imx51.c index c7169c2f94c4..08c7892866c2 100644 --- a/arch/arm/mach-imx/mach-imx51.c +++ b/arch/arm/mach-imx/mach-imx51.c @@ -59,6 +59,7 @@ static void __init imx51_m4if_setup(void) return; m4if_base = of_iomap(np, 0); + of_node_put(np); if (!m4if_base) { pr_err("Unable to map M4IF registers\n"); return; From 91740fc8242b4f260cfa4d4536d8551804777fae Mon Sep 17 00:00:00 2001 From: Kohji Okuno Date: Tue, 26 Feb 2019 11:34:13 +0900 Subject: [PATCH 139/468] ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time In the current cpuidle implementation for i.MX6q, the CPU that sets 'WAIT_UNCLOCKED' and the CPU that returns to 'WAIT_CLOCKED' are always the same. While the CPU that sets 'WAIT_UNCLOCKED' is in IDLE state of "WAIT", if the other CPU wakes up and enters IDLE state of "WFI" istead of "WAIT", this CPU can not wake up at expired time. Because, in the case of "WFI", the CPU must be waked up by the local timer interrupt. But, while 'WAIT_UNCLOCKED' is set, the local timer is stopped, when all CPUs execute "wfi" instruction. As a result, the local timer interrupt is not fired. In this situation, this CPU will wake up by IRQ different from local timer. (e.g. broacast timer) So, this fix changes CPU to return to 'WAIT_CLOCKED'. Signed-off-by: Kohji Okuno Fixes: e5f9dec8ff5f ("ARM: imx6q: support WAIT mode using cpuidle") Cc: Signed-off-by: Shawn Guo --- arch/arm/mach-imx/cpuidle-imx6q.c | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c index bfeb25aaf9a2..326e870d7123 100644 --- a/arch/arm/mach-imx/cpuidle-imx6q.c +++ b/arch/arm/mach-imx/cpuidle-imx6q.c @@ -16,30 +16,23 @@ #include "cpuidle.h" #include "hardware.h" -static atomic_t master = ATOMIC_INIT(0); -static DEFINE_SPINLOCK(master_lock); +static int num_idle_cpus = 0; +static DEFINE_SPINLOCK(cpuidle_lock); static int imx6q_enter_wait(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - if (atomic_inc_return(&master) == num_online_cpus()) { - /* - * With this lock, we prevent other cpu to exit and enter - * this function again and become the master. - */ - if (!spin_trylock(&master_lock)) - goto idle; + spin_lock(&cpuidle_lock); + if (++num_idle_cpus == num_online_cpus()) imx6_set_lpm(WAIT_UNCLOCKED); - cpu_do_idle(); - imx6_set_lpm(WAIT_CLOCKED); - spin_unlock(&master_lock); - goto done; - } + spin_unlock(&cpuidle_lock); -idle: cpu_do_idle(); -done: - atomic_dec(&master); + + spin_lock(&cpuidle_lock); + if (num_idle_cpus-- == num_online_cpus()) + imx6_set_lpm(WAIT_CLOCKED); + spin_unlock(&cpuidle_lock); return index; } From d1252f0237238b912c3e7a51bf237acf34c97983 Mon Sep 17 00:00:00 2001 From: Kristian Evensen Date: Sat, 2 Mar 2019 13:35:53 +0100 Subject: [PATCH 140/468] USB: serial: option: add support for Quectel EM12 The Quectel EM12 is a Cat. 12 LTE modem. It behaves in the exactly the same way as the EP06 (including the dynamic configuration behavior), so the same checks on reserved interfaces, etc. are needed. Signed-off-by: Kristian Evensen Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 11b21d9410f3..561d0b641120 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -246,6 +246,7 @@ static void option_instat_callback(struct urb *urb); #define QUECTEL_PRODUCT_EC25 0x0125 #define QUECTEL_PRODUCT_BG96 0x0296 #define QUECTEL_PRODUCT_EP06 0x0306 +#define QUECTEL_PRODUCT_EM12 0x0512 #define CMOTECH_VENDOR_ID 0x16d8 #define CMOTECH_PRODUCT_6001 0x6001 @@ -1087,6 +1088,9 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0xff, 0xff), .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) | NUMEP2 }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0, 0) }, + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM12, 0xff, 0xff, 0xff), + .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) | NUMEP2 }, + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM12, 0xff, 0, 0) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6003), From 422c2537ba9d42320f8ab6573940269f87095320 Mon Sep 17 00:00:00 2001 From: George McCollister Date: Tue, 5 Mar 2019 16:05:03 -0600 Subject: [PATCH 141/468] USB: serial: ftdi_sio: add additional NovaTech products Add PIDs for the NovaTech OrionLX+ and Orion I/O so they can be automatically detected. Signed-off-by: George McCollister Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/ftdi_sio.c | 2 ++ drivers/usb/serial/ftdi_sio_ids.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c index 8f5b17471759..1d8461ae2c34 100644 --- a/drivers/usb/serial/ftdi_sio.c +++ b/drivers/usb/serial/ftdi_sio.c @@ -609,6 +609,8 @@ static const struct usb_device_id id_table_combined[] = { .driver_info = (kernel_ulong_t)&ftdi_jtag_quirk }, { USB_DEVICE(FTDI_VID, FTDI_NT_ORIONLXM_PID), .driver_info = (kernel_ulong_t)&ftdi_jtag_quirk }, + { USB_DEVICE(FTDI_VID, FTDI_NT_ORIONLX_PLUS_PID) }, + { USB_DEVICE(FTDI_VID, FTDI_NT_ORION_IO_PID) }, { USB_DEVICE(FTDI_VID, FTDI_SYNAPSE_SS200_PID) }, { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX_PID) }, { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX2_PID) }, diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h index b863bedb55a1..5755f0df0025 100644 --- a/drivers/usb/serial/ftdi_sio_ids.h +++ b/drivers/usb/serial/ftdi_sio_ids.h @@ -567,7 +567,9 @@ /* * NovaTech product ids (FTDI_VID) */ -#define FTDI_NT_ORIONLXM_PID 0x7c90 /* OrionLXm Substation Automation Platform */ +#define FTDI_NT_ORIONLXM_PID 0x7c90 /* OrionLXm Substation Automation Platform */ +#define FTDI_NT_ORIONLX_PLUS_PID 0x7c91 /* OrionLX+ Substation Automation Platform */ +#define FTDI_NT_ORION_IO_PID 0x7c92 /* Orion I/O */ /* * Synapse Wireless product ids (FTDI_VID) From f8df5c2c3e2df5ffaf9fb5503da93d477a8c7db4 Mon Sep 17 00:00:00 2001 From: Mans Rullgard Date: Tue, 26 Feb 2019 17:07:10 +0000 Subject: [PATCH 142/468] USB: serial: option: set driver_info for SIM5218 and compatibles The SIMCom SIM5218 and compatible devices have 5 USB interfaces, only 4 of which are serial ports. The fifth is a network interface supported by the qmi-wwan driver. Furthermore, the serial ports do not support modem control signals. Add driver_info flags to reflect this. Signed-off-by: Mans Rullgard Fixes: ec0cd94d881c ("usb: option: add SIMCom SIM5218") Cc: stable # 3.2 Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 561d0b641120..db5ece767d59 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1067,7 +1067,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(3) }, { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x6613)}, /* Onda H600/ZTE MF330 */ { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x0023)}, /* ONYX 3G device */ - { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x9000)}, /* SIMCom SIM5218 */ + { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x9000), /* SIMCom SIM5218 */ + .driver_info = NCTRL(0) | NCTRL(1) | NCTRL(2) | NCTRL(3) | RSVD(4) }, /* Quectel products using Qualcomm vendor ID */ { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC15)}, { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC20), From 7ff2c2a1a71e83f74574b8001ea88deb3c166ad7 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Mon, 18 Mar 2019 17:45:19 +0200 Subject: [PATCH 143/468] btrfs: Fix bound checking in qgroup_trace_new_subtree_blocks If 'cur_level' is 7 then the bound checking at the top of the function will actually pass. Later on, it's possible to dereference ds_path->nodes[cur_level+1] which will be an out of bounds. The correct check will be cur_level >= BTRFS_MAX_LEVEL - 1 . Fixes-coverty-id: 1440918 Fixes-coverty-id: 1440911 Fixes: ea49f3e73c4b ("btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree") CC: stable@vger.kernel.org # 4.20+ Reviewed-by: Qu Wenruo Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/qgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index eb680b715dd6..e659d9d61107 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1922,8 +1922,8 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans, int i; /* Level sanity check */ - if (cur_level < 0 || cur_level >= BTRFS_MAX_LEVEL || - root_level < 0 || root_level >= BTRFS_MAX_LEVEL || + if (cur_level < 0 || cur_level >= BTRFS_MAX_LEVEL - 1 || + root_level < 0 || root_level >= BTRFS_MAX_LEVEL - 1 || root_level < cur_level) { btrfs_err_rl(fs_info, "%s: bad levels, cur_level=%d root_level=%d", From 139a56170de67101791d6e6c8e940c6328393fe9 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Mon, 18 Mar 2019 17:45:20 +0200 Subject: [PATCH 144/468] btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size qgroup_rsv_size is calculated as the product of outstanding_extent * fs_info->nodesize. The product is calculated with 32 bit precision since both variables are defined as u32. Yet qgroup_rsv_size expects a 64 bit result. Avoid possible multiplication overflow by casting outstanding_extent to u64. Such overflow would in the worst case (64K nodesize) require more than 65536 extents, which is quite large and i'ts not likely that it would happen in practice. Fixes-coverity-id: 1435101 Fixes: ff6bc37eb7f6 ("btrfs: qgroup: Use independent and accurate per inode qgroup rsv") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1d49694e6ae3..c5880329ae37 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6174,7 +6174,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info, * * This is overestimating in most cases. */ - qgroup_rsv_size = outstanding_extents * fs_info->nodesize; + qgroup_rsv_size = (u64)outstanding_extents * fs_info->nodesize; spin_lock(&block_rsv->lock); block_rsv->size = reserve_size; From e82adc1074a7356f1158233551df9e86b7ebfb82 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 18 Mar 2019 16:18:30 -0500 Subject: [PATCH 145/468] usb: typec: Fix unchecked return value Currently there is no check on platform_get_irq() return value in case it fails, hence never actually reporting any errors and causing unexpected behavior when using such value as argument for function regmap_irq_get_virq(). Fix this by adding a proper check, a message error and return *irq* in case platform_get_irq() fails. Addresses-Coverity-ID: 1443899 ("Improper use of negative value") Fixes: d2061f9cc32d ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Reviewed-by: Guenter Roeck Acked-by: Heikki Krogerus Signed-off-by: Greg Kroah-Hartman --- drivers/usb/typec/tcpm/wcove.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/usb/typec/tcpm/wcove.c b/drivers/usb/typec/tcpm/wcove.c index 423208e19383..6770afd40765 100644 --- a/drivers/usb/typec/tcpm/wcove.c +++ b/drivers/usb/typec/tcpm/wcove.c @@ -615,8 +615,13 @@ static int wcove_typec_probe(struct platform_device *pdev) wcove->dev = &pdev->dev; wcove->regmap = pmic->regmap; - irq = regmap_irq_get_virq(pmic->irq_chip_data_chgr, - platform_get_irq(pdev, 0)); + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_err(&pdev->dev, "Failed to get IRQ: %d\n", irq); + return irq; + } + + irq = regmap_irq_get_virq(pmic->irq_chip_data_chgr, irq); if (irq < 0) return irq; From 40fc165304f0faaae78b761f8ee30b5d216b1850 Mon Sep 17 00:00:00 2001 From: Yasushi Asano Date: Mon, 18 Feb 2019 11:26:34 +0100 Subject: [PATCH 146/468] usb: host: xhci-rcar: Add XHCI_TRUST_TX_LENGTH quirk When plugging BUFFALO LUA4-U3-AGT USB3.0 to Gigabit Ethernet LAN Adapter, warning messages filled up dmesg. [ 101.098287] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk? [ 101.117463] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk? [ 101.136513] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk? Adding the XHCI_TRUST_TX_LENGTH quirk resolves the issue. Signed-off-by: Yasushi Asano Signed-off-by: Spyridon Papageorgiou Acked-by: Yoshihiro Shimoda Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-rcar.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/host/xhci-rcar.c b/drivers/usb/host/xhci-rcar.c index a6e463715779..671bce18782c 100644 --- a/drivers/usb/host/xhci-rcar.c +++ b/drivers/usb/host/xhci-rcar.c @@ -246,6 +246,7 @@ int xhci_rcar_init_quirk(struct usb_hcd *hcd) if (!xhci_rcar_wait_for_pll_active(hcd)) return -ETIMEDOUT; + xhci->quirks |= XHCI_TRUST_TX_LENGTH; return xhci_rcar_download_firmware(hcd); } From 976daf9d1199932df80e7b04546d1a1bd4ed5ece Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sat, 16 Mar 2019 16:57:12 +0100 Subject: [PATCH 147/468] usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps PD 2.0 sinks are supposed to accept src-capabilities with a 3.0 header and simply ignore any src PDOs which the sink does not understand such as PPS but some 2.0 sinks instead ignore the entire PD_DATA_SOURCE_CAP message, causing contract negotiation to fail. This commit fixes such sinks not working by re-trying the contract negotiation with PD-2.0 source-caps messages if we don't have a contract after PD_N_HARD_RESET_COUNT hard-reset attempts. The problem fixed by this commit was noticed with a Type-C to VGA dongle. Signed-off-by: Hans de Goede Reviewed-by: Guenter Roeck Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/typec/tcpm/tcpm.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c index 0f62db091d8d..a2233d72ae7c 100644 --- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -37,6 +37,7 @@ S(SRC_ATTACHED), \ S(SRC_STARTUP), \ S(SRC_SEND_CAPABILITIES), \ + S(SRC_SEND_CAPABILITIES_TIMEOUT), \ S(SRC_NEGOTIATE_CAPABILITIES), \ S(SRC_TRANSITION_SUPPLY), \ S(SRC_READY), \ @@ -2966,10 +2967,34 @@ static void run_state_machine(struct tcpm_port *port) /* port->hard_reset_count = 0; */ port->caps_count = 0; port->pd_capable = true; - tcpm_set_state_cond(port, hard_reset_state(port), + tcpm_set_state_cond(port, SRC_SEND_CAPABILITIES_TIMEOUT, PD_T_SEND_SOURCE_CAP); } break; + case SRC_SEND_CAPABILITIES_TIMEOUT: + /* + * Error recovery for a PD_DATA_SOURCE_CAP reply timeout. + * + * PD 2.0 sinks are supposed to accept src-capabilities with a + * 3.0 header and simply ignore any src PDOs which the sink does + * not understand such as PPS but some 2.0 sinks instead ignore + * the entire PD_DATA_SOURCE_CAP message, causing contract + * negotiation to fail. + * + * After PD_N_HARD_RESET_COUNT hard-reset attempts, we try + * sending src-capabilities with a lower PD revision to + * make these broken sinks work. + */ + if (port->hard_reset_count < PD_N_HARD_RESET_COUNT) { + tcpm_set_state(port, HARD_RESET_SEND, 0); + } else if (port->negotiated_rev > PD_REV20) { + port->negotiated_rev--; + port->hard_reset_count = 0; + tcpm_set_state(port, SRC_SEND_CAPABILITIES, 0); + } else { + tcpm_set_state(port, hard_reset_state(port), 0); + } + break; case SRC_NEGOTIATE_CAPABILITIES: ret = tcpm_pd_check_request(port); if (ret < 0) { From 238e0268c82789e4c107a37045d529a6dbce51a9 Mon Sep 17 00:00:00 2001 From: Fabrizio Castro Date: Fri, 1 Mar 2019 11:05:45 +0000 Subject: [PATCH 148/468] usb: common: Consider only available nodes for dr_mode There are cases where multiple device tree nodes point to the same phy node by means of the "phys" property, but we should only consider those nodes that are marked as available rather than just any node. Fixes: 98bfb3946695 ("usb: of: add an api to get dr_mode by the phy node") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Fabrizio Castro Signed-off-by: Greg Kroah-Hartman --- drivers/usb/common/common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/common/common.c b/drivers/usb/common/common.c index 48277bbc15e4..73c8e6591746 100644 --- a/drivers/usb/common/common.c +++ b/drivers/usb/common/common.c @@ -145,6 +145,8 @@ enum usb_dr_mode of_usb_get_dr_mode_by_phy(struct device_node *np, int arg0) do { controller = of_find_node_with_property(controller, "phys"); + if (!of_device_is_available(controller)) + continue; index = 0; do { if (arg0 == -1) { From 7c9abe12b359d988970c5e38f0e190249e5ae00f Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 19 Mar 2019 13:51:22 +0100 Subject: [PATCH 149/468] netfilter: nf_flowtable: remove duplicated transition in diagram No direct transition from prerouting to forward hook, routing lookup needs to happen first. Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation") Signed-off-by: Pablo Neira Ayuso --- Documentation/networking/nf_flowtable.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.txt index 54128c50d508..ca2136c76042 100644 --- a/Documentation/networking/nf_flowtable.txt +++ b/Documentation/networking/nf_flowtable.txt @@ -44,10 +44,10 @@ including the Netfilter hooks and the flowtable fastpath bypass. / \ / \ |Routing | / \ --> ingress ---> prerouting ---> |decision| | postrouting |--> neigh_xmit \_________/ \__________/ ---------- \____________/ ^ - | ^ | | ^ | - flowtable | | ____\/___ | | - | | | / \ | | - __\/___ | --------->| forward |------------ | + | ^ | ^ | + flowtable | ____\/___ | | + | | / \ | | + __\/___ | | forward |------------ | |-----| | \_________/ | |-----| | 'flow offload' rule | |-----| | adds entry to | From 22feda47b574c2854cc1a8447a2ae18598752375 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 18 Mar 2019 09:50:24 -0500 Subject: [PATCH 150/468] usb: usb251xb: Remove unnecessary comparison of unsigned integer with >= 0 There is no need to compare *port* with >= 0 because such comparison of an unsigned value is always true. Fix this by removing such comparison. Addresses-Coverity-ID: 1443949 ("Unsigned compared against 0") Fixes: 02a50b875046 ("usb: usb251xb: add usb data lane port swap feature") Signed-off-by: Gustavo A. R. Silva Reviewed-by: Richard Leitner Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/usb251xb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/misc/usb251xb.c b/drivers/usb/misc/usb251xb.c index 4d72b7d1d383..2c8e2cad7e10 100644 --- a/drivers/usb/misc/usb251xb.c +++ b/drivers/usb/misc/usb251xb.c @@ -547,7 +547,7 @@ static int usb251xb_get_ofdata(struct usb251xb *hub, */ hub->port_swap = USB251XB_DEF_PORT_SWAP; of_property_for_each_u32(np, "swap-dx-lanes", prop, p, port) { - if ((port >= 0) && (port <= data->port_cnt)) + if (port <= data->port_cnt) hub->port_swap |= BIT(port); } From 32f47179833b63de72427131169809065db6745e Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Mon, 18 Mar 2019 18:50:56 -0500 Subject: [PATCH 151/468] serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference of_match_device on failure to find a matching device can return a NULL pointer. The patch checks for such a scenrio and passes the error upstream. Signed-off-by: Aditya Pakki Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/mvebu-uart.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c index 231f751d1ef4..7e7b1559fa36 100644 --- a/drivers/tty/serial/mvebu-uart.c +++ b/drivers/tty/serial/mvebu-uart.c @@ -810,6 +810,9 @@ static int mvebu_uart_probe(struct platform_device *pdev) return -EINVAL; } + if (!match) + return -ENODEV; + /* Assume that all UART ports have a DT alias or none has */ id = of_alias_get_id(pdev->dev.of_node, "serial"); if (!pdev->dev.of_node || id < 0) From 3a10e3dd52e80b9a97a3346020024d17b2c272d6 Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Mon, 18 Mar 2019 18:44:14 -0500 Subject: [PATCH 152/468] serial: max310x: Fix to avoid potential NULL pointer dereference of_match_device can return a NULL pointer when matching device is not found. This patch avoids a scenario causing NULL pointer derefernce. Signed-off-by: Aditya Pakki Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/max310x.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c index f5bdde405627..450ba6d7996c 100644 --- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -1415,6 +1415,8 @@ static int max310x_spi_probe(struct spi_device *spi) if (spi->dev.of_node) { const struct of_device_id *of_id = of_match_device(max310x_dt_ids, &spi->dev); + if (!of_id) + return -ENODEV; devtype = (struct max310x_devtype *)of_id->data; } else { From c85be041065c0be8bc48eda4c45e0319caf1d0e5 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Fri, 15 Mar 2019 12:16:06 -0500 Subject: [PATCH 153/468] tty: atmel_serial: fix a potential NULL pointer dereference In case dmaengine_prep_dma_cyclic fails, the fix returns a proper error code to avoid NULL pointer dereference. Signed-off-by: Kangjie Lu Fixes: 34df42f59a60 ("serial: at91: add rx dma support") Acked-by: Richard Genoud Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index 05147fe24343..41b728d223d1 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -1288,6 +1288,10 @@ static int atmel_prepare_rx_dma(struct uart_port *port) sg_dma_len(&atmel_port->sg_rx)/2, DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT); + if (!desc) { + dev_err(port->dev, "Preparing DMA cyclic failed\n"); + goto chan_err; + } desc->callback = atmel_complete_rx_dma; desc->callback_param = port; atmel_port->desc_rx = desc; From 6734330654dac550f12e932996b868c6d0dcb421 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Thu, 14 Mar 2019 02:21:51 -0500 Subject: [PATCH 154/468] tty: mxs-auart: fix a potential NULL pointer dereference In case ioremap fails, the fix returns -ENOMEM to avoid NULL pointer dereferences. Multiple places use port.membase. Signed-off-by: Kangjie Lu Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/mxs-auart.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c index 27235a526cce..4c188f4079b3 100644 --- a/drivers/tty/serial/mxs-auart.c +++ b/drivers/tty/serial/mxs-auart.c @@ -1686,6 +1686,10 @@ static int mxs_auart_probe(struct platform_device *pdev) s->port.mapbase = r->start; s->port.membase = ioremap(r->start, resource_size(r)); + if (!s->port.membase) { + ret = -ENOMEM; + goto out_disable_clks; + } s->port.ops = &mxs_auart_ops; s->port.iotype = UPIO_MEM; s->port.fifosize = MXS_AUART_FIFO_SIZE; From ac0cdb3d990108df795b676cd0d0e65ac34b2273 Mon Sep 17 00:00:00 2001 From: Mao Wenan Date: Fri, 8 Mar 2019 22:08:31 +0800 Subject: [PATCH 155/468] sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init() Add the missing uart_unregister_driver() and i2c_del_driver() before return from sc16is7xx_init() in the error handling case. Signed-off-by: Mao Wenan Reviewed-by: Vladimir Zapolskiy Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sc16is7xx.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 635178cf3eed..09a183dfc526 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -1507,7 +1507,7 @@ static int __init sc16is7xx_init(void) ret = i2c_add_driver(&sc16is7xx_i2c_uart_driver); if (ret < 0) { pr_err("failed to init sc16is7xx i2c --> %d\n", ret); - return ret; + goto err_i2c; } #endif @@ -1515,10 +1515,18 @@ static int __init sc16is7xx_init(void) ret = spi_register_driver(&sc16is7xx_spi_uart_driver); if (ret < 0) { pr_err("failed to init sc16is7xx spi --> %d\n", ret); - return ret; + goto err_spi; } #endif return ret; + +err_spi: +#ifdef CONFIG_SERIAL_SC16IS7XX_I2C + i2c_del_driver(&sc16is7xx_i2c_uart_driver); +#endif +err_i2c: + uart_unregister_driver(&sc16is7xx_uart); + return ret; } module_init(sc16is7xx_init); From c5cbc78acf693f5605d4a85b1327fa7933daf092 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Fri, 8 Mar 2019 11:37:44 -0700 Subject: [PATCH 156/468] tty: serial: qcom_geni_serial: Initialize baud in qcom_geni_console_setup When building with -Wsometimes-uninitialized, Clang warns: drivers/tty/serial/qcom_geni_serial.c:1079:6: warning: variable 'baud' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] It's not wrong; when options is NULL, baud has no default value. Use 9600 as that is a sane default. Link: https://github.com/ClangBuiltLinux/linux/issues/395 Suggested-by: Greg Kroah-Hartman Signed-off-by: Nathan Chancellor Reviewed-by: Nick Desaulniers Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/qcom_geni_serial.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 3bcec1c20219..35e5f9c5d5be 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -1050,7 +1050,7 @@ static int __init qcom_geni_console_setup(struct console *co, char *options) { struct uart_port *uport; struct qcom_geni_serial_port *port; - int baud; + int baud = 9600; int bits = 8; int parity = 'n'; int flow = 'n'; From 72ff51d8dd262d1fef25baedc2ac35116435be47 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Petr=20=C5=A0tetiar?= Date: Wed, 6 Mar 2019 17:54:03 +0100 Subject: [PATCH 157/468] serial: ar933x_uart: Fix build failure with disabled console MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Andrey has reported on OpenWrt's bug tracking system[1], that he currently can't use ar93xx_uart as pure serial UART without console (CONFIG_SERIAL_8250_CONSOLE and CONFIG_SERIAL_AR933X_CONSOLE undefined), because compilation ends with following error: ar933x_uart.c: In function 'ar933x_uart_console_write': ar933x_uart.c:550:14: error: 'struct uart_port' has no member named 'sysrq' So this patch moves all the code related to console handling behind series of CONFIG_SERIAL_AR933X_CONSOLE ifdefs. 1. https://bugs.openwrt.org/index.php?do=details&task_id=2152 Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Andrey Batyiev Reported-by: Andrey Batyiev Tested-by: Andrey Batyiev Signed-off-by: Petr Štetiar Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/ar933x_uart.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/drivers/tty/serial/ar933x_uart.c b/drivers/tty/serial/ar933x_uart.c index db5df3d54818..3bdd56a1021b 100644 --- a/drivers/tty/serial/ar933x_uart.c +++ b/drivers/tty/serial/ar933x_uart.c @@ -49,11 +49,6 @@ struct ar933x_uart_port { struct clk *clk; }; -static inline bool ar933x_uart_console_enabled(void) -{ - return IS_ENABLED(CONFIG_SERIAL_AR933X_CONSOLE); -} - static inline unsigned int ar933x_uart_read(struct ar933x_uart_port *up, int offset) { @@ -508,6 +503,7 @@ static const struct uart_ops ar933x_uart_ops = { .verify_port = ar933x_uart_verify_port, }; +#ifdef CONFIG_SERIAL_AR933X_CONSOLE static struct ar933x_uart_port * ar933x_console_ports[CONFIG_SERIAL_AR933X_NR_UARTS]; @@ -604,14 +600,7 @@ static struct console ar933x_uart_console = { .index = -1, .data = &ar933x_uart_driver, }; - -static void ar933x_uart_add_console_port(struct ar933x_uart_port *up) -{ - if (!ar933x_uart_console_enabled()) - return; - - ar933x_console_ports[up->port.line] = up; -} +#endif /* CONFIG_SERIAL_AR933X_CONSOLE */ static struct uart_driver ar933x_uart_driver = { .owner = THIS_MODULE, @@ -700,7 +689,9 @@ static int ar933x_uart_probe(struct platform_device *pdev) baud = ar933x_uart_get_baud(port->uartclk, 0, AR933X_UART_MAX_STEP); up->max_baud = min_t(unsigned int, baud, AR933X_UART_MAX_BAUD); - ar933x_uart_add_console_port(up); +#ifdef CONFIG_SERIAL_AR933X_CONSOLE + ar933x_console_ports[up->port.line] = up; +#endif ret = uart_add_one_port(&ar933x_uart_driver, &up->port); if (ret) @@ -749,8 +740,9 @@ static int __init ar933x_uart_init(void) { int ret; - if (ar933x_uart_console_enabled()) - ar933x_uart_driver.cons = &ar933x_uart_console; +#ifdef CONFIG_SERIAL_AR933X_CONSOLE + ar933x_uart_driver.cons = &ar933x_uart_console; +#endif ret = uart_register_driver(&ar933x_uart_driver); if (ret) From 93bcefd4c6bad4c69dbc4edcd3fbf774b24d930d Mon Sep 17 00:00:00 2001 From: Hoan Nguyen An Date: Mon, 18 Mar 2019 18:26:32 +0900 Subject: [PATCH 158/468] serial: sh-sci: Fix setting SCSCR_TIE while transferring data We disable transmission interrupt (clear SCSCR_TIE) after all data has been transmitted (if uart_circ_empty(xmit)). While transmitting, if the data is still in the tty buffer, re-enable the SCSCR_TIE bit, which was done at sci_start_tx(). This is unnecessary processing, wasting CPU operation if the data transmission length is large. And further, transmit end, FIFO empty bits disabling have also been performed in the step above. Signed-off-by: Hoan Nguyen An Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index 060fcd42b6d5..2d1c626312cd 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -838,19 +838,9 @@ static void sci_transmit_chars(struct uart_port *port) if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) uart_write_wakeup(port); - if (uart_circ_empty(xmit)) { + if (uart_circ_empty(xmit)) sci_stop_tx(port); - } else { - ctrl = serial_port_in(port, SCSCR); - if (port->type != PORT_SCI) { - serial_port_in(port, SCxSR); /* Dummy read */ - sci_clear_SCxSR(port, SCxSR_TDxE_CLEAR(port)); - } - - ctrl |= SCSCR_TIE; - serial_port_out(port, SCSCR, ctrl); - } } /* On SH3, SCIF may read end-of-break as a space->mark char */ From 228de124f290e6b981b2c61fbd78215e11264044 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Tue, 19 Mar 2019 08:16:21 -0700 Subject: [PATCH 159/468] xfs: dabtree scrub needs to range-check level Make sure scrub's dabtree iterator function checks that we're not going deeper in the stack than our cursor permits. Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/scrub/dabtree.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c index f1260b4bfdee..90527b094878 100644 --- a/fs/xfs/scrub/dabtree.c +++ b/fs/xfs/scrub/dabtree.c @@ -574,6 +574,11 @@ xchk_da_btree( /* Drill another level deeper. */ blkno = be32_to_cpu(key->before); level++; + if (level >= XFS_DA_NODE_MAXDEPTH) { + /* Too deep! */ + xchk_da_set_corrupt(&ds, level - 1); + break; + } ds.tree_level--; error = xchk_da_btree_block(&ds, level, blkno); if (error) From a72e9d8d69e7ca848ddd4c4db72d3ab280c1775d Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Tue, 19 Mar 2019 08:16:22 -0700 Subject: [PATCH 160/468] xfs: fix btree scrub checking with regards to root-in-inode In xchk_btree_check_owner, we can be passed a null buffer pointer. This should only happen for the root of a root-in-inode btree type, but we should program defensively in case the btree cursor state ever gets screwed up and we get a null buffer anyway. Coverity-id: 1438713 Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/scrub/btree.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c index 6f94d1f7322d..117910db51b8 100644 --- a/fs/xfs/scrub/btree.c +++ b/fs/xfs/scrub/btree.c @@ -415,8 +415,17 @@ xchk_btree_check_owner( struct xfs_btree_cur *cur = bs->cur; struct check_owner *co; - if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && bp == NULL) + /* + * In theory, xfs_btree_get_block should only give us a null buffer + * pointer for the root of a root-in-inode btree type, but we need + * to check defensively here in case the cursor state is also screwed + * up. + */ + if (bp == NULL) { + if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) + xchk_btree_set_corrupt(bs->sc, bs->cur, level); return 0; + } /* * We want to cross-reference each btree block with the bnobt From 4b0bce30f39b7733420bb8b28e340aa91c219bc1 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Tue, 19 Mar 2019 08:16:22 -0700 Subject: [PATCH 161/468] xfs: always init bma in xfs_bmapi_write Always init the tp/ip fields of bma in xfs_bmapi_write so that the bmapi_finish at the bottom never trips over null transaction or inode pointers. Coverity-id: 1443964 Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/libxfs/xfs_bmap.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index ae4c3b0d84db..4637ae1ae91c 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4252,9 +4252,13 @@ xfs_bmapi_write( struct xfs_bmbt_irec *mval, /* output: map values */ int *nmap) /* i/o: mval size/count */ { + struct xfs_bmalloca bma = { + .tp = tp, + .ip = ip, + .total = total, + }; struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp; - struct xfs_bmalloca bma = { NULL }; /* args for xfs_bmap_alloc */ xfs_fileoff_t end; /* end of mapped file region */ bool eof = false; /* after the end of extents */ int error; /* error return */ @@ -4322,10 +4326,6 @@ xfs_bmapi_write( eof = true; if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev)) bma.prev.br_startoff = NULLFILEOFF; - bma.tp = tp; - bma.ip = ip; - bma.total = total; - bma.datatype = 0; bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork); n = 0; From c0316470683af0507427c3e0660246feacdf8363 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Thu, 7 Mar 2019 13:22:07 +0100 Subject: [PATCH 162/468] mt7601u: check chip version on probe Since some USB device IDs are duplicated between mt7601u and mt76x0u devices, check chip version on probe and return error if not match 0x7601. Don't think this is serious issue, probe most likely will fail at some other point for wrong device, but we do not have to configure it if we know is not mt7601u device. Reported-by: Xose Vazquez Perez Signed-off-by: Stanislaw Gruszka Acked-by: Jakub Kicinski Signed-off-by: Kalle Valo --- drivers/net/wireless/mediatek/mt7601u/usb.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt7601u/usb.c b/drivers/net/wireless/mediatek/mt7601u/usb.c index d8b7863f7926..6ae7f14dc9bf 100644 --- a/drivers/net/wireless/mediatek/mt7601u/usb.c +++ b/drivers/net/wireless/mediatek/mt7601u/usb.c @@ -303,6 +303,10 @@ static int mt7601u_probe(struct usb_interface *usb_intf, mac_rev = mt7601u_rr(dev, MT_MAC_CSR0); dev_info(dev->dev, "ASIC revision: %08x MAC revision: %08x\n", asic_rev, mac_rev); + if ((asic_rev >> 16) != 0x7601) { + ret = -ENODEV; + goto err; + } /* Note: vendor driver skips this check for MT7601U */ if (!(mt7601u_rr(dev, MT_EFUSE_CTRL) & MT_EFUSE_CTRL_SEL)) From 40b941611bd4826b7e3e449f738af98ad22e1ce6 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Thu, 7 Mar 2019 13:22:21 +0100 Subject: [PATCH 163/468] mt76x02u: check chip version on probe Since some USB device IDs are duplicated between mt76x0u, mt7601u and mt76x2u device, check chip version on probe and return error if not match the driver. Don't think this is serious issue, probe most likely will fail at some other point for wrong device, but we do not have to configure it if we know is not our device. Reported-by: Xose Vazquez Perez Signed-off-by: Stanislaw Gruszka Signed-off-by: Kalle Valo --- drivers/net/wireless/mediatek/mt76/mt76x0/usb.c | 10 +++++++--- drivers/net/wireless/mediatek/mt76/mt76x02.h | 7 +++++++ drivers/net/wireless/mediatek/mt76/mt76x2/usb.c | 4 ++++ 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c index 91718647da02..e5a06f74a6f7 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c @@ -229,7 +229,7 @@ static int mt76x0u_probe(struct usb_interface *usb_intf, struct usb_device *usb_dev = interface_to_usbdev(usb_intf); struct mt76x02_dev *dev; struct mt76_dev *mdev; - u32 asic_rev, mac_rev; + u32 mac_rev; int ret; mdev = mt76_alloc_device(&usb_intf->dev, sizeof(*dev), &mt76x0u_ops, @@ -262,10 +262,14 @@ static int mt76x0u_probe(struct usb_interface *usb_intf, goto err; } - asic_rev = mt76_rr(dev, MT_ASIC_VERSION); + mdev->rev = mt76_rr(dev, MT_ASIC_VERSION); mac_rev = mt76_rr(dev, MT_MAC_CSR0); dev_info(mdev->dev, "ASIC revision: %08x MAC revision: %08x\n", - asic_rev, mac_rev); + mdev->rev, mac_rev); + if (!is_mt76x0(dev)) { + ret = -ENODEV; + goto err; + } /* Note: vendor driver skips this check for MT76X0U */ if (!(mt76_rr(dev, MT_EFUSE_CTRL) & MT_EFUSE_CTRL_SEL)) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02.h b/drivers/net/wireless/mediatek/mt76/mt76x02.h index b5a36d9fb2bd..07061eb4d1e1 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x02.h @@ -192,6 +192,13 @@ void mt76x02_mac_start(struct mt76x02_dev *dev); void mt76x02_init_debugfs(struct mt76x02_dev *dev); +static inline bool is_mt76x0(struct mt76x02_dev *dev) +{ + return mt76_chip(&dev->mt76) == 0x7610 || + mt76_chip(&dev->mt76) == 0x7630 || + mt76_chip(&dev->mt76) == 0x7650; +} + static inline bool is_mt76x2(struct mt76x02_dev *dev) { return mt76_chip(&dev->mt76) == 0x7612 || diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c index 7a5d539873ca..ac0f13d46299 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c @@ -65,6 +65,10 @@ static int mt76x2u_probe(struct usb_interface *intf, mdev->rev = mt76_rr(dev, MT_ASIC_VERSION); dev_info(mdev->dev, "ASIC revision: %08x\n", mdev->rev); + if (!is_mt76x2(dev)) { + err = -ENODEV; + goto err; + } err = mt76x2u_register_device(dev); if (err < 0) From f2a00a821aacfa77985e4dbe83ed064c48a21bd5 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 11 Mar 2019 14:09:53 +0100 Subject: [PATCH 164/468] mt76: mt7603: use the correct hweight8() function __sw_hweight8() is only defined if CONFIG_GENERIC_HWEIGHT is enabled. The function that works on all architectures is hweight8(). Signed-off-by: Felix Fietkau Signed-off-by: Kalle Valo --- drivers/net/wireless/mediatek/mt76/mt7603/beacon.c | 3 +-- drivers/net/wireless/mediatek/mt76/mt7603/init.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7603/mcu.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/beacon.c b/drivers/net/wireless/mediatek/mt76/mt7603/beacon.c index afcd86f735b4..4dcb465095d1 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/beacon.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/beacon.c @@ -135,8 +135,7 @@ void mt7603_pre_tbtt_tasklet(unsigned long arg) out: mt76_queue_tx_cleanup(dev, MT_TXQ_BEACON, false); - if (dev->mt76.q_tx[MT_TXQ_BEACON].queued > - __sw_hweight8(dev->beacon_mask)) + if (dev->mt76.q_tx[MT_TXQ_BEACON].queued > hweight8(dev->beacon_mask)) dev->beacon_check++; } diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/init.c b/drivers/net/wireless/mediatek/mt76/mt7603/init.c index 15cc8f33b34d..d54dda67d036 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/init.c @@ -112,7 +112,7 @@ static void mt7603_phy_init(struct mt7603_dev *dev) { int rx_chains = dev->mt76.antenna_mask; - int tx_chains = __sw_hweight8(rx_chains) - 1; + int tx_chains = hweight8(rx_chains) - 1; mt76_rmw(dev, MT_WF_RMAC_RMCR, (MT_WF_RMAC_RMCR_SMPS_MODE | diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7603/mcu.c index 4b0713f1fd5e..d06905ea8cc6 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/mcu.c @@ -433,7 +433,7 @@ int mt7603_mcu_set_channel(struct mt7603_dev *dev) { struct cfg80211_chan_def *chandef = &dev->mt76.chandef; struct ieee80211_hw *hw = mt76_hw(dev); - int n_chains = __sw_hweight8(dev->mt76.antenna_mask); + int n_chains = hweight8(dev->mt76.antenna_mask); struct { u8 control_chan; u8 center_chan; From 13f61dfc5235cfa82b1efbcdf4b4f14d9be233da Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Mar 2019 14:24:35 +0100 Subject: [PATCH 165/468] mt76: fix schedule while atomic in mt76x02_reset_state Fix following schedule while atomic in mt76x02_reset_state since synchronize_rcu is run inside a RCU section [44036.944222] mt76x2e 0000:06:00.0: MCU message 31 (seq 3) timed out [44036.944281] BUG: sleeping function called from invalid context at kernel/rcu/tree_exp.h:818 [44036.944284] in_atomic(): 1, irqs_disabled(): 0, pid: 28066, name: kworker/u4:1 [44036.944287] INFO: lockdep is turned off. [44036.944292] CPU: 1 PID: 28066 Comm: kworker/u4:1 Tainted: G W 5.0.0-rc7-wdn-t1+ #7 [44036.944294] Hardware name: Dell Inc. Studio XPS 1340/0K183D, BIOS A11 09/08/2009 [44036.944305] Workqueue: phy1 mt76x02_wdt_work [mt76x02_lib] [44036.944308] Call Trace: [44036.944317] dump_stack+0x67/0x90 [44036.944322] ___might_sleep.cold.88+0x9f/0xaf [44036.944327] rcu_blocking_is_gp+0x13/0x50 [44036.944330] synchronize_rcu+0x17/0x80 [44036.944337] mt76_sta_state+0x138/0x1d0 [mt76] [44036.944349] mt76x02_wdt_work+0x1c9/0x610 [mt76x02_lib] [44036.944355] process_one_work+0x2a5/0x620 [44036.944361] worker_thread+0x35/0x3e0 [44036.944368] kthread+0x11c/0x140 [44036.944376] ret_from_fork+0x3a/0x50 [44036.944384] BUG: scheduling while atomic: kworker/u4:1/28066/0x00000002 [44036.944387] INFO: lockdep is turned off. [44036.944389] Modules linked in: cmac ctr ccm af_packet snd_hda_codec_hdmi Introduce __mt76_sta_remove in order to run sta_remove without holding dev->mutex. Move __mt76_sta_remove outside of RCU section in mt76x02_reset_state Fixes: e4ebb8b403d1 ("mt76: mt76x2: implement full device restart on watchdog reset") Signed-off-by: Lorenzo Bianconi Signed-off-by: Kalle Valo --- drivers/net/wireless/mediatek/mt76/mac80211.c | 18 +++++++++++------- drivers/net/wireless/mediatek/mt76/mt76.h | 2 ++ .../net/wireless/mediatek/mt76/mt76x02_mmio.c | 19 ++++++++++--------- 3 files changed, 23 insertions(+), 16 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c index a033745adb2f..316167404729 100644 --- a/drivers/net/wireless/mediatek/mt76/mac80211.c +++ b/drivers/net/wireless/mediatek/mt76/mac80211.c @@ -679,19 +679,15 @@ mt76_sta_add(struct mt76_dev *dev, struct ieee80211_vif *vif, return ret; } -static void -mt76_sta_remove(struct mt76_dev *dev, struct ieee80211_vif *vif, - struct ieee80211_sta *sta) +void __mt76_sta_remove(struct mt76_dev *dev, struct ieee80211_vif *vif, + struct ieee80211_sta *sta) { struct mt76_wcid *wcid = (struct mt76_wcid *)sta->drv_priv; - int idx = wcid->idx; - int i; + int i, idx = wcid->idx; rcu_assign_pointer(dev->wcid[idx], NULL); synchronize_rcu(); - mutex_lock(&dev->mutex); - if (dev->drv->sta_remove) dev->drv->sta_remove(dev, vif, sta); @@ -699,7 +695,15 @@ mt76_sta_remove(struct mt76_dev *dev, struct ieee80211_vif *vif, for (i = 0; i < ARRAY_SIZE(sta->txq); i++) mt76_txq_remove(dev, sta->txq[i]); mt76_wcid_free(dev->wcid_mask, idx); +} +EXPORT_SYMBOL_GPL(__mt76_sta_remove); +static void +mt76_sta_remove(struct mt76_dev *dev, struct ieee80211_vif *vif, + struct ieee80211_sta *sta) +{ + mutex_lock(&dev->mutex); + __mt76_sta_remove(dev, vif, sta); mutex_unlock(&dev->mutex); } diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h index cb50e96e736b..bcbfd3c4a44b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76.h +++ b/drivers/net/wireless/mediatek/mt76/mt76.h @@ -695,6 +695,8 @@ int mt76_sta_state(struct ieee80211_hw *hw, struct ieee80211_vif *vif, struct ieee80211_sta *sta, enum ieee80211_sta_state old_state, enum ieee80211_sta_state new_state); +void __mt76_sta_remove(struct mt76_dev *dev, struct ieee80211_vif *vif, + struct ieee80211_sta *sta); struct ieee80211_sta *mt76_rx_convert(struct sk_buff *skb); diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c index 9de015bcd4f9..daaed1220147 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c @@ -441,19 +441,23 @@ static void mt76x02_reset_state(struct mt76x02_dev *dev) { int i; + lockdep_assert_held(&dev->mt76.mutex); + clear_bit(MT76_STATE_RUNNING, &dev->mt76.state); rcu_read_lock(); - ieee80211_iter_keys_rcu(dev->mt76.hw, NULL, mt76x02_key_sync, NULL); + rcu_read_unlock(); for (i = 0; i < ARRAY_SIZE(dev->mt76.wcid); i++) { - struct mt76_wcid *wcid = rcu_dereference(dev->mt76.wcid[i]); - struct mt76x02_sta *msta; struct ieee80211_sta *sta; struct ieee80211_vif *vif; + struct mt76x02_sta *msta; + struct mt76_wcid *wcid; void *priv; + wcid = rcu_dereference_protected(dev->mt76.wcid[i], + lockdep_is_held(&dev->mt76.mutex)); if (!wcid) continue; @@ -463,13 +467,10 @@ static void mt76x02_reset_state(struct mt76x02_dev *dev) priv = msta->vif; vif = container_of(priv, struct ieee80211_vif, drv_priv); - mt76_sta_state(dev->mt76.hw, vif, sta, - IEEE80211_STA_NONE, IEEE80211_STA_NOTEXIST); + __mt76_sta_remove(&dev->mt76, vif, sta); memset(msta, 0, sizeof(*msta)); } - rcu_read_unlock(); - dev->vif_mask = 0; dev->beacon_mask = 0; } @@ -489,11 +490,11 @@ static void mt76x02_watchdog_reset(struct mt76x02_dev *dev) for (i = 0; i < ARRAY_SIZE(dev->mt76.napi); i++) napi_disable(&dev->mt76.napi[i]); + mutex_lock(&dev->mt76.mutex); + if (restart) mt76x02_reset_state(dev); - mutex_lock(&dev->mt76.mutex); - if (dev->beacon_mask) mt76_clear(dev, MT_BEACON_TIME_CFG, MT_BEACON_TIME_CFG_BEACON_TX | From 7dfc45e6282a7662279d168cc1219929456f8750 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Tue, 12 Mar 2019 13:32:07 +0100 Subject: [PATCH 166/468] mt76x02: do not enable RTS/CTS by default My commit 26a7b5473191 ("mt76x02: set protection according to ht operation element") enabled by default RTS/CTS protection for OFDM and CCK traffic, because MT_TX_RTS_CFG_THRESH is configured to non 0xffff by initvals and .set_rts_threshold callback is not called by mac80211 on initialization, only on user request or during ieee80211_reconfig() (suspend/resuem or restart_hw). Enabling RTS/CTS cause some problems when sending probe request frames by hcxdumptool penetration tool, but I expect it can cause other issues on different scenarios. Restore previous setting of RTS/CTS being disabled by default for OFDM/CCK by changing MT_TX_RTS_CFG_THRESH initvals to 0xffff. Fixes: 26a7b5473191 ("mt76x02: set protection according to ht operation element") Signed-off-by: Stanislaw Gruszka Signed-off-by: Kalle Valo --- drivers/net/wireless/mediatek/mt76/mt76x0/initvals.h | 2 +- drivers/net/wireless/mediatek/mt76/mt76x2/init.c | 2 +- drivers/net/wireless/mediatek/mt76/mt76x2/usb_mac.c | 1 - 3 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/initvals.h b/drivers/net/wireless/mediatek/mt76/mt76x0/initvals.h index 0290ba5869a5..736f81752b5b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x0/initvals.h +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/initvals.h @@ -46,7 +46,7 @@ static const struct mt76_reg_pair common_mac_reg_table[] = { { MT_MM20_PROT_CFG, 0x01742004 }, { MT_MM40_PROT_CFG, 0x03f42084 }, { MT_TXOP_CTRL_CFG, 0x0000583f }, - { MT_TX_RTS_CFG, 0x00092b20 }, + { MT_TX_RTS_CFG, 0x00ffff20 }, { MT_EXP_ACK_TIME, 0x002400ca }, { MT_TXOP_HLDR_ET, 0x00000002 }, { MT_XIFS_TIME_CFG, 0x33a41010 }, diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/init.c b/drivers/net/wireless/mediatek/mt76/mt76x2/init.c index f8534362e2c8..a30ef2c5a9db 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/init.c @@ -106,7 +106,7 @@ void mt76_write_mac_initvals(struct mt76x02_dev *dev) { MT_TX_SW_CFG1, 0x00010000 }, { MT_TX_SW_CFG2, 0x00000000 }, { MT_TXOP_CTRL_CFG, 0x0400583f }, - { MT_TX_RTS_CFG, 0x00100020 }, + { MT_TX_RTS_CFG, 0x00ffff20 }, { MT_TX_TIMEOUT_CFG, 0x000a2290 }, { MT_TX_RETRY_CFG, 0x47f01f0f }, { MT_EXP_ACK_TIME, 0x002c00dc }, diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb_mac.c index 5e84b4535cb1..3b82345756ea 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb_mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb_mac.c @@ -93,7 +93,6 @@ int mt76x2u_mac_reset(struct mt76x02_dev *dev) mt76_wr(dev, MT_TX_LINK_CFG, 0x1020); mt76_wr(dev, MT_AUTO_RSP_CFG, 0x13); mt76_wr(dev, MT_MAX_LEN_CFG, 0x2f00); - mt76_wr(dev, MT_TX_RTS_CFG, 0x92b20); mt76_wr(dev, MT_WMM_AIFSN, 0x2273); mt76_wr(dev, MT_WMM_CWMIN, 0x2344); From 0cb98abb5bd13b9a636bde603d952d722688b428 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Tue, 19 Mar 2019 12:12:13 -0400 Subject: [PATCH 167/468] NFSv4.1 don't free interrupted slot on open Allow the async rpc task for finish and update the open state if needed, then free the slot. Otherwise, the async rpc unable to decode the reply. Signed-off-by: Olga Kornievskaia Fixes: ae55e59da0e4 ("pnfs: Don't release the sequence slot...") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 6d2812a39287..741ff8c9c6ed 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2933,7 +2933,8 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, } out: - nfs4_sequence_free_slot(&opendata->o_res.seq_res); + if (!opendata->cancelled) + nfs4_sequence_free_slot(&opendata->o_res.seq_res); return ret; } From ebff0b0e3d3c862c16c487959db5e0d879632559 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 4 Mar 2019 17:37:44 +0000 Subject: [PATCH 168/468] KVM: arm64: Reset the PMU in preemptible context We've become very cautious to now always reset the vcpu when nothing is loaded on the physical CPU. To do so, we now disable preemption and do a kvm_arch_vcpu_put() to make sure we have all the state in memory (and that it won't be loaded behind out back). This now causes issues with resetting the PMU, which calls into perf. Perf itself uses mutexes, which clashes with the lack of preemption. It is worth realizing that the PMU is fully emulated, and that no PMU state is ever loaded on the physical CPU. This means we can perfectly reset the PMU outside of the non-preemptible section. Fixes: e761a927bc9a ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded") Reported-by: Julien Grall Tested-by: Julien Grall Signed-off-by: Marc Zyngier --- arch/arm64/kvm/reset.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index f16a5f8ff2b4..e2a0500cd7a2 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -123,6 +123,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) int ret = -EINVAL; bool loaded; + /* Reset PMU outside of the non-preemptible section */ + kvm_pmu_vcpu_reset(vcpu); + preempt_disable(); loaded = (vcpu->cpu != -1); if (loaded) @@ -170,9 +173,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) vcpu->arch.reset_state.reset = false; } - /* Reset PMU */ - kvm_pmu_vcpu_reset(vcpu); - /* Default workaround setup is enabled (if supported) */ if (kvm_arm_have_ssbd() == KVM_SSBD_KERNEL) vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG; From ca71228b42a96908eca7658861eafacd227856c9 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 13 Mar 2019 18:07:50 +0000 Subject: [PATCH 169/468] arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is enabled The normal interrupt flow is not to enable the vgic when no virtual interrupt is to be injected (i.e. the LRs are empty). But when a guest is likely to use GICv4 for LPIs, we absolutely need to switch it on at all times. Otherwise, VLPIs only get delivered when there is something in the LRs, which doesn't happen very often. Reported-by: Nianyao Tang Tested-by: Shameerali Kolothum Thodi Signed-off-by: Marc Zyngier --- virt/kvm/arm/hyp/vgic-v3-sr.c | 4 ++-- virt/kvm/arm/vgic/vgic.c | 14 ++++++++++---- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c index 264d92da3240..370bd6c5e6cb 100644 --- a/virt/kvm/arm/hyp/vgic-v3-sr.c +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c @@ -222,7 +222,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) } } - if (used_lrs) { + if (used_lrs || cpu_if->its_vpe.its_vm) { int i; u32 elrsr; @@ -247,7 +247,7 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu) u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs; int i; - if (used_lrs) { + if (used_lrs || cpu_if->its_vpe.its_vm) { write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2); for (i = 0; i < used_lrs; i++) diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index abd9c7352677..3af69f2a3866 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -867,15 +867,21 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu) * either observe the new interrupt before or after doing this check, * and introducing additional synchronization mechanism doesn't change * this. + * + * Note that we still need to go through the whole thing if anything + * can be directly injected (GICv4). */ - if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) + if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head) && + !vgic_supports_direct_msis(vcpu->kvm)) return; DEBUG_SPINLOCK_BUG_ON(!irqs_disabled()); - raw_spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); - vgic_flush_lr_state(vcpu); - raw_spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock); + if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) { + raw_spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); + vgic_flush_lr_state(vcpu); + raw_spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock); + } if (can_access_vgic_from_kernel()) vgic_restore_state(vcpu); From a6ecfb11bf37743c1ac49b266595582b107b61d4 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 19 Mar 2019 12:47:11 +0000 Subject: [PATCH 170/468] KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory When halting a guest, QEMU flushes the virtual ITS caches, which amounts to writing to the various tables that the guest has allocated. When doing this, we fail to take the srcu lock, and the kernel shouts loudly if running a lockdep kernel: [ 69.680416] ============================= [ 69.680819] WARNING: suspicious RCU usage [ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted [ 69.682096] ----------------------------- [ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [ 69.683225] [ 69.683225] other info that might help us debug this: [ 69.683225] [ 69.683975] [ 69.683975] rcu_scheduler_active = 2, debug_locks = 1 [ 69.684598] 6 locks held by qemu-system-aar/4097: [ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.690729] [ 69.690729] stack backtrace: [ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18 [ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [ 69.692831] Call trace: [ 69.694072] lockdep_rcu_suspicious+0xcc/0x110 [ 69.694490] gfn_to_memslot+0x174/0x190 [ 69.694853] kvm_write_guest+0x50/0xb0 [ 69.695209] vgic_its_save_tables_v0+0x248/0x330 [ 69.695639] vgic_its_set_attr+0x298/0x3a0 [ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8 [ 69.696424] kvm_device_ioctl+0x8c/0xf8 [ 69.696788] do_vfs_ioctl+0xc8/0x960 [ 69.697128] ksys_ioctl+0x8c/0xa0 [ 69.697445] __arm64_sys_ioctl+0x28/0x38 [ 69.697817] el0_svc_common+0xd8/0x138 [ 69.698173] el0_svc_handler+0x38/0x78 [ 69.698528] el0_svc+0x8/0xc The fix is to obviously take the srcu lock, just like we do on the read side of things since bf308242ab98. One wonders why this wasn't fixed at the same time, but hey... Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Signed-off-by: Marc Zyngier --- arch/arm/include/asm/kvm_mmu.h | 11 +++++++++++ arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++++ virt/kvm/arm/vgic/vgic-its.c | 8 ++++---- virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- 4 files changed, 28 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 2de96a180166..31de4ab93005 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -381,6 +381,17 @@ static inline int kvm_read_guest_lock(struct kvm *kvm, return ret; } +static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa, + const void *data, unsigned long len) +{ + int srcu_idx = srcu_read_lock(&kvm->srcu); + int ret = kvm_write_guest(kvm, gpa, data, len); + + srcu_read_unlock(&kvm->srcu, srcu_idx); + + return ret; +} + static inline void *kvm_get_hyp_vector(void) { switch(read_cpuid_part()) { diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b0742a16c6c9..ebeefcf835e8 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -445,6 +445,17 @@ static inline int kvm_read_guest_lock(struct kvm *kvm, return ret; } +static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa, + const void *data, unsigned long len) +{ + int srcu_idx = srcu_read_lock(&kvm->srcu); + int ret = kvm_write_guest(kvm, gpa, data, len); + + srcu_read_unlock(&kvm->srcu, srcu_idx); + + return ret; +} + #ifdef CONFIG_KVM_INDIRECT_VECTORS /* * EL2 vectors can be mapped and rerouted in a number of ways, diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index ab3f47745d9c..c41e11fd841c 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1919,7 +1919,7 @@ static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev, ((u64)ite->irq->intid << KVM_ITS_ITE_PINTID_SHIFT) | ite->collection->collection_id; val = cpu_to_le64(val); - return kvm_write_guest(kvm, gpa, &val, ite_esz); + return kvm_write_guest_lock(kvm, gpa, &val, ite_esz); } /** @@ -2066,7 +2066,7 @@ static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev, (itt_addr_field << KVM_ITS_DTE_ITTADDR_SHIFT) | (dev->num_eventid_bits - 1)); val = cpu_to_le64(val); - return kvm_write_guest(kvm, ptr, &val, dte_esz); + return kvm_write_guest_lock(kvm, ptr, &val, dte_esz); } /** @@ -2246,7 +2246,7 @@ static int vgic_its_save_cte(struct vgic_its *its, ((u64)collection->target_addr << KVM_ITS_CTE_RDBASE_SHIFT) | collection->collection_id); val = cpu_to_le64(val); - return kvm_write_guest(its->dev->kvm, gpa, &val, esz); + return kvm_write_guest_lock(its->dev->kvm, gpa, &val, esz); } static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz) @@ -2317,7 +2317,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its) */ val = 0; BUG_ON(cte_esz > sizeof(val)); - ret = kvm_write_guest(its->dev->kvm, gpa, &val, cte_esz); + ret = kvm_write_guest_lock(its->dev->kvm, gpa, &val, cte_esz); return ret; } diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c index 408a78eb6a97..9f87e58dbd4a 100644 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -358,7 +358,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq) if (status) { /* clear consumed data */ val &= ~(1 << bit_nr); - ret = kvm_write_guest(kvm, ptr, &val, 1); + ret = kvm_write_guest_lock(kvm, ptr, &val, 1); if (ret) return ret; } @@ -409,7 +409,7 @@ int vgic_v3_save_pending_tables(struct kvm *kvm) else val &= ~(1 << bit_nr); - ret = kvm_write_guest(kvm, ptr, &val, 1); + ret = kvm_write_guest_lock(kvm, ptr, &val, 1); if (ret) return ret; } From 7494cec6cb3ba7385a6a223b81906384f15aae34 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 19 Mar 2019 12:56:23 +0000 Subject: [PATCH 171/468] KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots Calling kvm_is_visible_gfn() implies that we're parsing the memslots, and doing this without the srcu lock is frown upon: [12704.164532] ============================= [12704.164544] WARNING: suspicious RCU usage [12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G W [12704.164573] ----------------------------- [12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [12704.164602] other info that might help us debug this: [12704.164616] rcu_scheduler_active = 2, debug_locks = 1 [12704.164631] 6 locks held by qemu-system-aar/13968: [12704.164644] #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [12704.164691] #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [12704.164726] #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164761] #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164794] #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164827] #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164861] stack backtrace: [12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G W 5.1.0-rc1-00008-g600025238f51-dirty #16 [12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [12704.164896] Call trace: [12704.164910] dump_backtrace+0x0/0x138 [12704.164920] show_stack+0x24/0x30 [12704.164934] dump_stack+0xbc/0x104 [12704.164946] lockdep_rcu_suspicious+0xcc/0x110 [12704.164958] gfn_to_memslot+0x174/0x190 [12704.164969] kvm_is_visible_gfn+0x28/0x70 [12704.164980] vgic_its_check_id.isra.0+0xec/0x1e8 [12704.164991] vgic_its_save_tables_v0+0x1ac/0x330 [12704.165001] vgic_its_set_attr+0x298/0x3a0 [12704.165012] kvm_device_ioctl_attr+0x9c/0xd8 [12704.165022] kvm_device_ioctl+0x8c/0xf8 [12704.165035] do_vfs_ioctl+0xc8/0x960 [12704.165045] ksys_ioctl+0x8c/0xa0 [12704.165055] __arm64_sys_ioctl+0x28/0x38 [12704.165067] el0_svc_common+0xd8/0x138 [12704.165078] el0_svc_handler+0x38/0x78 [12704.165089] el0_svc+0x8/0xc Make sure the lock is taken when doing this. Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Reviewed-by: Eric Auger Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic/vgic-its.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index c41e11fd841c..fcb2fceaa4a5 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -754,8 +754,9 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, u64 indirect_ptr, type = GITS_BASER_TYPE(baser); phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser); int esz = GITS_BASER_ENTRY_SIZE(baser); - int index; + int index, idx; gfn_t gfn; + bool ret; switch (type) { case GITS_BASER_TYPE_DEVICE: @@ -782,7 +783,8 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (eaddr) *eaddr = addr; - return kvm_is_visible_gfn(its->dev->kvm, gfn); + + goto out; } /* calculate and check the index into the 1st level */ @@ -812,7 +814,12 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (eaddr) *eaddr = indirect_ptr; - return kvm_is_visible_gfn(its->dev->kvm, gfn); + +out: + idx = srcu_read_lock(&its->dev->kvm->srcu); + ret = kvm_is_visible_gfn(its->dev->kvm, gfn); + srcu_read_unlock(&its->dev->kvm->srcu, idx); + return ret; } static int vgic_its_alloc_collection(struct vgic_its *its, From a80868f398554842b14d07060012c06efb57c456 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Tue, 12 Mar 2019 09:52:51 +0000 Subject: [PATCH 172/468] KVM: arm/arm64: Enforce PTE mappings at stage2 when needed commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings, stricter. However it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE. Also, the checks do not cover the PUD huge pages, that was under review during the same period. This patch fixes both the issues. Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") Reported-by: Zenghui Yu Cc: Zenghui Yu Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier --- virt/kvm/arm/mmu.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ffd7acdceac7..bcdf978c0d1d 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1594,8 +1594,9 @@ static void kvm_send_hwpoison_signal(unsigned long address, send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current); } -static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, - unsigned long hva) +static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, + unsigned long hva, + unsigned long map_size) { gpa_t gpa_start; hva_t uaddr_start, uaddr_end; @@ -1610,34 +1611,34 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, /* * Pages belonging to memslots that don't have the same alignment - * within a PMD for userspace and IPA cannot be mapped with stage-2 - * PMD entries, because we'll end up mapping the wrong pages. + * within a PMD/PUD for userspace and IPA cannot be mapped with stage-2 + * PMD/PUD entries, because we'll end up mapping the wrong pages. * * Consider a layout like the following: * * memslot->userspace_addr: * +-----+--------------------+--------------------+---+ - * |abcde|fgh Stage-1 PMD | Stage-1 PMD tv|xyz| + * |abcde|fgh Stage-1 block | Stage-1 block tv|xyz| * +-----+--------------------+--------------------+---+ * * memslot->base_gfn << PAGE_SIZE: * +---+--------------------+--------------------+-----+ - * |abc|def Stage-2 PMD | Stage-2 PMD |tvxyz| + * |abc|def Stage-2 block | Stage-2 block |tvxyz| * +---+--------------------+--------------------+-----+ * - * If we create those stage-2 PMDs, we'll end up with this incorrect + * If we create those stage-2 blocks, we'll end up with this incorrect * mapping: * d -> f * e -> g * f -> h */ - if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK)) + if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1))) return false; /* * Next, let's make sure we're not trying to map anything not covered - * by the memslot. This means we have to prohibit PMD size mappings - * for the beginning and end of a non-PMD aligned and non-PMD sized + * by the memslot. This means we have to prohibit block size mappings + * for the beginning and end of a non-block aligned and non-block sized * memory slot (illustrated by the head and tail parts of the * userspace view above containing pages 'abcde' and 'xyz', * respectively). @@ -1646,8 +1647,8 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, * userspace_addr or the base_gfn, as both are equally aligned (per * the check above) and equally sized. */ - return (hva & S2_PMD_MASK) >= uaddr_start && - (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end; + return (hva & ~(map_size - 1)) >= uaddr_start && + (hva & ~(map_size - 1)) + map_size <= uaddr_end; } static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, @@ -1676,12 +1677,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (!fault_supports_stage2_pmd_mappings(memslot, hva)) - force_pte = true; - - if (logging_active) - force_pte = true; - /* Let's check if we will get back a huge page backed by hugetlbfs */ down_read(¤t->mm->mmap_sem); vma = find_vma_intersection(current->mm, hva, hva + 1); @@ -1692,6 +1687,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } vma_pagesize = vma_kernel_pagesize(vma); + if (logging_active || + !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) { + force_pte = true; + vma_pagesize = PAGE_SIZE; + } + /* * The stage2 has a minimum of 2 level table (For arm64 see * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can @@ -1699,11 +1700,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * As for PUD huge maps, we must make sure that we have at least * 3 levels, i.e, PMD is not folded. */ - if ((vma_pagesize == PMD_SIZE || - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && - !force_pte) { + if (vma_pagesize == PMD_SIZE || + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - } up_read(¤t->mm->mmap_sem); /* We need minimum second+third level pages */ From 7442c483b963dbee7d1b655cbad99c727c047828 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 18 Mar 2019 17:35:11 +0100 Subject: [PATCH 173/468] mlxsw: core: mlxsw: core: avoid -Wint-in-bool-context warning A recently added function in mlxsw triggers a harmless compiler warning: In file included from drivers/net/ethernet/mellanox/mlxsw/core.h:17, from drivers/net/ethernet/mellanox/mlxsw/core_env.c:7: drivers/net/ethernet/mellanox/mlxsw/core_env.c: In function 'mlxsw_env_module_temp_thresholds_get': drivers/net/ethernet/mellanox/mlxsw/reg.h:8015:45: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] #define MLXSW_REG_MTMP_TEMP_TO_MC(val) (val * 125) ~~~~~^~~~~~ drivers/net/ethernet/mellanox/mlxsw/core_env.c:116:8: note: in expansion of macro 'MLXSW_REG_MTMP_TEMP_TO_MC' if (!MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)) { ^~~~~~~~~~~~~~~~~~~~~~~~~ The warning is normally disabled, but it would be nice to enable it to find real bugs, and there are no other known instances at the moment. Replace the negation with a zero-comparison, which also matches the comment above it. Fixes: d93c19a1d95c ("mlxsw: core: Add API for QSFP module temperature thresholds reading") Signed-off-by: Arnd Bergmann Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlxsw/core_env.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c index 7a15e932ed2f..c1c1965d7acc 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c @@ -113,7 +113,7 @@ int mlxsw_env_module_temp_thresholds_get(struct mlxsw_core *core, int module, return 0; default: /* Do not consider thresholds for zero temperature. */ - if (!MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)) { + if (MLXSW_REG_MTMP_TEMP_TO_MC(module_temp) == 0) { *temp = 0; return 0; } From 223a960c01227e4dbcb6f9fa06b47d73bda21274 Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Mon, 18 Mar 2019 23:36:08 +0200 Subject: [PATCH 174/468] net: stmmac: fix memory corruption with large MTUs When using 16K DMA buffers and ring mode, the DES3 refill is not working correctly as the function is using a bogus pointer for checking the private data. As a result stale pointers will remain in the RX descriptor ring, so DMA will now likely overwrite/corrupt some already freed memory. As simple reproducer, just receive some UDP traffic: # ifconfig eth0 down; ifconfig eth0 mtu 9000; ifconfig eth0 up # iperf3 -c 192.168.253.40 -u -b 0 -R If you didn't crash by now check the RX descriptors to find non-contiguous RX buffers: cat /sys/kernel/debug/stmmaceth/eth0/descriptors_status [...] 1 [0x2be5020]: 0xa3220321 0x9ffc1ffc 0x72d70082 0x130e207e ^^^^^^^^^^^^^^^^^^^^^ 2 [0x2be5040]: 0xa3220321 0x9ffc1ffc 0x72998082 0x1311a07e ^^^^^^^^^^^^^^^^^^^^^ A simple ping test will now report bad data: # ping -s 8200 192.168.253.40 PING 192.168.253.40 (192.168.253.40) 8200(8228) bytes of data. 8208 bytes from 192.168.253.40: icmp_seq=1 ttl=64 time=1.00 ms wrong data byte #8144 should be 0xd0 but was 0x88 Fix the wrong pointer. Also we must refill DES3 only if the DMA buffer size is 16K. Fixes: 54139cf3bb33 ("net: stmmac: adding multiple buffers for rx") Signed-off-by: Aaro Koskinen Acked-by: Jose Abreu Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c index f936166d8910..4d9bcb4d0378 100644 --- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c +++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c @@ -113,10 +113,11 @@ static unsigned int is_jumbo_frm(int len, int enh_desc) static void refill_desc3(void *priv_ptr, struct dma_desc *p) { - struct stmmac_priv *priv = (struct stmmac_priv *)priv_ptr; + struct stmmac_rx_queue *rx_q = priv_ptr; + struct stmmac_priv *priv = rx_q->priv_data; /* Fill DES3 in case of RING mode */ - if (priv->dma_buf_sz >= BUF_SIZE_8KiB) + if (priv->dma_buf_sz == BUF_SIZE_16KiB) p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB); } From a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 19 Mar 2019 10:16:53 +0800 Subject: [PATCH 175/468] net-sysfs: call dev_hold if kobject_init_and_add success In netdev_queue_add_kobject and rx_queue_add_kobject, if sysfs_create_group failed, kobject_put will call netdev_queue_release to decrease dev refcont, however dev_hold has not be called. So we will see this while unregistering dev: unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1 Reported-by: Hulk Robot Fixes: d0d668371679 ("net: don't decrement kobj reference count on init failure") Signed-off-by: YueHaibing Signed-off-by: David S. Miller --- net/core/net-sysfs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 4ff661f6f989..8f8b7b6c2945 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -928,6 +928,8 @@ static int rx_queue_add_kobject(struct net_device *dev, int index) if (error) return error; + dev_hold(queue->dev); + if (dev->sysfs_rx_queue_group) { error = sysfs_create_group(kobj, dev->sysfs_rx_queue_group); if (error) { @@ -937,7 +939,6 @@ static int rx_queue_add_kobject(struct net_device *dev, int index) } kobject_uevent(kobj, KOBJ_ADD); - dev_hold(queue->dev); return error; } @@ -1464,6 +1465,8 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index) if (error) return error; + dev_hold(queue->dev); + #ifdef CONFIG_BQL error = sysfs_create_group(kobj, &dql_group); if (error) { @@ -1473,7 +1476,6 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index) #endif kobject_uevent(kobj, KOBJ_ADD); - dev_hold(queue->dev); return 0; } From d7737d4257459ca8921ff911c88937be1a11ea9d Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Mon, 18 Mar 2019 22:19:44 -0500 Subject: [PATCH 176/468] nfc: Fix to check for kmemdup failure In case of kmemdup failure while setting the service name the patch returns -ENOMEM upstream for processing. Signed-off-by: Aditya Pakki Signed-off-by: David S. Miller --- net/nfc/llcp_sock.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c index ae296273ce3d..17dcd0b5eb32 100644 --- a/net/nfc/llcp_sock.c +++ b/net/nfc/llcp_sock.c @@ -726,6 +726,10 @@ static int llcp_sock_connect(struct socket *sock, struct sockaddr *_addr, llcp_sock->service_name = kmemdup(addr->service_name, llcp_sock->service_name_len, GFP_KERNEL); + if (!llcp_sock->service_name) { + ret = -ENOMEM; + goto sock_llcp_release; + } nfc_llcp_sock_link(&local->connecting_sockets, sk); @@ -745,10 +749,11 @@ static int llcp_sock_connect(struct socket *sock, struct sockaddr *_addr, return ret; sock_unlink: - nfc_llcp_put_ssap(local, llcp_sock->ssap); - nfc_llcp_sock_unlink(&local->connecting_sockets, sk); +sock_llcp_release: + nfc_llcp_put_ssap(local, llcp_sock->ssap); + put_dev: nfc_put_device(dev); From 89e4130939a20304f4059ab72179da81f5347528 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 19 Mar 2019 05:45:35 -0700 Subject: [PATCH 177/468] tcp: do not use ipv6 header for ipv4 flow When a dual stack tcp listener accepts an ipv4 flow, it should not attempt to use an ipv6 header or tcp_v6_iif() helper. Fixes: 1397ed35f22d ("ipv6: add flowinfo for tcp6 pkt_options for all cases") Fixes: df3687ffc665 ("ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/tcp_ipv6.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 57ef69a10889..44d431849d39 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1110,11 +1110,11 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * newnp->ipv6_fl_list = NULL; newnp->pktoptions = NULL; newnp->opt = NULL; - newnp->mcast_oif = tcp_v6_iif(skb); - newnp->mcast_hops = ipv6_hdr(skb)->hop_limit; - newnp->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(skb)); + newnp->mcast_oif = inet_iif(skb); + newnp->mcast_hops = ip_hdr(skb)->ttl; + newnp->rcv_flowinfo = 0; if (np->repflow) - newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb)); + newnp->flow_label = 0; /* * No need to charge this sock to the relevant IPv6 refcnt debug socks count From e0aa67709f89d08c8d8e5bdd9e0b649df61d0090 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 19 Mar 2019 05:46:18 -0700 Subject: [PATCH 178/468] dccp: do not use ipv6 header for ipv4 flow When a dual stack dccp listener accepts an ipv4 flow, it should not attempt to use an ipv6 header or inet6_iif() helper. Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/dccp/ipv6.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index d5740bad5b18..57d84e9b7b6f 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -436,8 +436,8 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk, newnp->ipv6_mc_list = NULL; newnp->ipv6_ac_list = NULL; newnp->ipv6_fl_list = NULL; - newnp->mcast_oif = inet6_iif(skb); - newnp->mcast_hops = ipv6_hdr(skb)->hop_limit; + newnp->mcast_oif = inet_iif(skb); + newnp->mcast_hops = ip_hdr(skb)->ttl; /* * No need to charge this sock to the relevant IPv6 refcnt debug socks count From fb6fafbc7de4a813bb5364358bbe27f71e62b24a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 19 Mar 2019 22:15:58 +0100 Subject: [PATCH 179/468] 3c515: fix integer overflow warning clang points out a harmless signed integer overflow: drivers/net/ethernet/3com/3c515.c:1530:66: error: implicit conversion from 'int' to 'short' changes value from 32783 to -32753 [-Werror,-Wconstant-conversion] new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast | RxProm; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ drivers/net/ethernet/3com/3c515.c:1532:52: error: implicit conversion from 'int' to 'short' changes value from 32775 to -32761 [-Werror,-Wconstant-conversion] new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ drivers/net/ethernet/3com/3c515.c:1534:38: error: implicit conversion from 'int' to 'short' changes value from 32773 to -32763 [-Werror,-Wconstant-conversion] new_mode = SetRxFilter | RxStation | RxBroadcast; ~ ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ Make the variable unsigned to avoid the overflow. Fixes: Linux-2.1.128pre1 Signed-off-by: Arnd Bergmann Signed-off-by: David S. Miller --- drivers/net/ethernet/3com/3c515.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/3com/3c515.c b/drivers/net/ethernet/3com/3c515.c index 808abb6b3671..b15752267c8d 100644 --- a/drivers/net/ethernet/3com/3c515.c +++ b/drivers/net/ethernet/3com/3c515.c @@ -1521,7 +1521,7 @@ static void update_stats(int ioaddr, struct net_device *dev) static void set_rx_mode(struct net_device *dev) { int ioaddr = dev->base_addr; - short new_mode; + unsigned short new_mode; if (dev->flags & IFF_PROMISC) { if (corkscrew_debug > 3) From f84532ce5887dac2ef67498b897a8713793eebde Mon Sep 17 00:00:00 2001 From: Vinay K Nallamothu Date: Tue, 19 Mar 2019 22:41:18 +0000 Subject: [PATCH 180/468] mpls: Fix 6PE forwarding This patch adds support for 6PE (RFC 4798) which uses IPv4-mapped IPv6 nexthop to connect IPv6 islands over IPv4 only MPLS network core. Prior to this fix, to find the link-layer destination mac address, 6PE enabled host/router was sending IPv6 ND requests for IPv4-mapped IPv6 nexthop address over the interface facing the IPv4 only core which wouldn't success as the core is IPv6 free. This fix changes that behavior on 6PE host to treat the nexthop as IPv4 address and send ARP requests whenever the next-hop address is an IPv4-mapped IPv6 address. Below topology illustrates the issue and how the patch addresses it. abcd::1.1.1.1 (lo) abcd::2.2.2.2 (lo) R0 (PE/host)------------------------R1--------------------------------R2 (PE/host) <--- IPv4 MPLS core ---> <------ IPv4 MPLS core --------> eth1 eth2 eth3 eth4 172.18.0.10 172.18.0.11 172.19.0.11 172.19.0.12 ffff::172.18.0.10 ffff::172.19.0.12 <------------------IPv6 MPLS tunnel ----------------------> R0 and R2 act as 6PE routers of IPv6 islands. R1 is IPv4 only with MPLS tunnels between R0,R1 and R1,R2. docker exec r0 ip -f inet6 route add abcd::2.2.2.2/128 nexthop encap mpls 100 via ::ffff:172.18.0.11 dev eth1 docker exec r2 ip -f inet6 route add abcd::1.1.1.1/128 nexthop encap mpls 200 via ::ffff:172.19.0.11 dev eth4 docker exec r1 ip -f mpls route add 100 via inet 172.19.0.12 dev eth3 docker exec r1 ip -f mpls route add 200 via inet 172.18.0.10 dev eth2 With the change, when R0 sends an IPv6 packet over MPLS tunnel to abcd::2.2.2.2, using ::ffff:172.18.0.11 as the nexthop, it does neighbor discovery for 172.18.18.0.11. Signed-off-by: Vinay K Nallamothu Tested-by: Avinash Lingala Tested-by: Aravind Srinivas Srinivasa Prabhakar Signed-off-by: David S. Miller --- net/mpls/mpls_iptunnel.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c index dda8930f20e7..f3a8557494d6 100644 --- a/net/mpls/mpls_iptunnel.c +++ b/net/mpls/mpls_iptunnel.c @@ -140,9 +140,15 @@ static int mpls_xmit(struct sk_buff *skb) if (rt) err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway, skb); - else if (rt6) - err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt6->rt6i_gateway, - skb); + else if (rt6) { + if (ipv6_addr_v4mapped(&rt6->rt6i_gateway)) { + /* 6PE (RFC 4798) */ + err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt6->rt6i_gateway.s6_addr32[3], + skb); + } else + err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt6->rt6i_gateway, + skb); + } if (err) net_dbg_ratelimited("%s: packet transmission failed: %d\n", __func__, err); From 7ae622c978db6b2e28b4fced6ecd2a174492059d Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Thu, 31 Jan 2019 11:04:19 +0200 Subject: [PATCH 181/468] usb: dwc3: pci: add support for Comet Lake PCH ID This patch simply adds a new PCI Device ID Signed-off-by: Felipe Balbi --- drivers/usb/dwc3/dwc3-pci.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index fdc6e4e403e8..8cced3609e24 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -29,6 +29,7 @@ #define PCI_DEVICE_ID_INTEL_BXT_M 0x1aaa #define PCI_DEVICE_ID_INTEL_APL 0x5aaa #define PCI_DEVICE_ID_INTEL_KBP 0xa2b0 +#define PCI_DEVICE_ID_INTEL_CMLH 0x02ee #define PCI_DEVICE_ID_INTEL_GLK 0x31aa #define PCI_DEVICE_ID_INTEL_CNPLP 0x9dee #define PCI_DEVICE_ID_INTEL_CNPH 0xa36e @@ -305,6 +306,9 @@ static const struct pci_device_id dwc3_pci_id_table[] = { { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_MRFLD), (kernel_ulong_t) &dwc3_pci_mrfld_properties, }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_CMLH), + (kernel_ulong_t) &dwc3_pci_intel_properties, }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_SPTLP), (kernel_ulong_t) &dwc3_pci_intel_properties, }, From 9d6a54c1430647355a5e23434881b2ca3d192b48 Mon Sep 17 00:00:00 2001 From: Guido Kiener Date: Tue, 19 Mar 2019 19:12:03 +0100 Subject: [PATCH 182/468] usb: gadget: net2280: Fix overrun of OUT messages The OUT endpoint normally blocks (NAK) subsequent packets when a short packet was received and returns an incomplete queue entry to the gadget driver. Thereby the gadget driver can detect a short packet when reading queue entries with a length that is not equal to a multiple of packet size. The start_queue() function enables receiving OUT packets regardless of the content of the OUT FIFO. This results in a race: With the current code, it's possible that the "!ep->is_in && (readl(&ep->regs->ep_stat) & BIT(NAK_OUT_PACKETS))" test in start_dma() will fail, then a short packet will be received, and then start_queue() will call stop_out_naking(). That's what we don't want (OUT naking gets turned off while there is data in the FIFO) because then the next driver request might receive a mixture of old and new packets. With the patch, this race can't occur because the FIFO's state is tested after we know that OUT naking is already turned on, and OUT naking is stopped only when both of the conditions are met. This ensures that all received data is delivered to the gadget driver, which can detect a short packet now before new packets are appended to the last short packet. Acked-by: Alan Stern Signed-off-by: Guido Kiener Signed-off-by: Felipe Balbi --- drivers/usb/gadget/udc/net2280.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index f63f82450bf4..e0b413e9e532 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -866,9 +866,6 @@ static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) (void) readl(&ep->dev->pci->pcimstctl); writel(BIT(DMA_START), &dma->dmastat); - - if (!ep->is_in) - stop_out_naking(ep); } static void start_dma(struct net2280_ep *ep, struct net2280_request *req) @@ -907,6 +904,7 @@ static void start_dma(struct net2280_ep *ep, struct net2280_request *req) writel(BIT(DMA_START), &dma->dmastat); return; } + stop_out_naking(ep); } tmp = dmactl_default; From f1d3fba17cd4eeea20397f1324b7b9c69a6a935c Mon Sep 17 00:00:00 2001 From: Guido Kiener Date: Mon, 18 Mar 2019 09:18:33 +0100 Subject: [PATCH 183/468] usb: gadget: net2280: Fix net2280_dequeue() When a request must be dequeued with net2280_dequeue() e.g. due to a device clear action and the same request is finished by the function scan_dma_completions() then the function net2280_dequeue() does not find the request in the following search loop and returns the error -EINVAL without restoring the status ep->stopped. Thus the endpoint keeps blocked and does not receive any data anymore. This fix restores the status and does not issue an error message. Acked-by: Alan Stern Signed-off-by: Guido Kiener Signed-off-by: Felipe Balbi --- drivers/usb/gadget/udc/net2280.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index e0b413e9e532..898339e5df10 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -1273,9 +1273,9 @@ static int net2280_dequeue(struct usb_ep *_ep, struct usb_request *_req) break; } if (&req->req != _req) { + ep->stopped = stopped; spin_unlock_irqrestore(&ep->dev->lock, flags); - dev_err(&ep->dev->pdev->dev, "%s: Request mismatch\n", - __func__); + ep_dbg(ep->dev, "%s: Request mismatch\n", __func__); return -EINVAL; } From 091dacc3cc10979ab0422f0a9f7fcc27eee97e69 Mon Sep 17 00:00:00 2001 From: Guido Kiener Date: Mon, 18 Mar 2019 09:18:34 +0100 Subject: [PATCH 184/468] usb: gadget: net2272: Fix net2272_dequeue() Restore the status of ep->stopped in function net2272_dequeue(). When the given request is not found in the endpoint queue the function returns -EINVAL without restoring the state of ep->stopped. Thus the endpoint keeps blocked and does not transfer any data anymore. This fix is only compile-tested, since we do not have a corresponding hardware. An analogous fix was tested in the sibling driver. See "usb: gadget: net2280: Fix net2280_dequeue()" Acked-by: Alan Stern Signed-off-by: Guido Kiener Signed-off-by: Felipe Balbi --- drivers/usb/gadget/udc/net2272.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/gadget/udc/net2272.c b/drivers/usb/gadget/udc/net2272.c index b77f3126580e..c2011cd7df8c 100644 --- a/drivers/usb/gadget/udc/net2272.c +++ b/drivers/usb/gadget/udc/net2272.c @@ -945,6 +945,7 @@ net2272_dequeue(struct usb_ep *_ep, struct usb_request *_req) break; } if (&req->req != _req) { + ep->stopped = stopped; spin_unlock_irqrestore(&ep->dev->lock, flags); return -EINVAL; } From b25a31bf0ca091aa8bdb9ab329b0226257568bbe Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Tue, 19 Mar 2019 13:22:41 +0900 Subject: [PATCH 185/468] netfilter: nf_tables: add missing ->release_ops() in error path of newrule() ->release_ops() callback releases resources and this is used in error path. If nf_tables_newrule() fails after ->select_ops(), it should release resources. but it can not call ->destroy() because that should be called after ->init(). At this point, ->release_ops() should be used for releasing resources. Test commands: modprobe -rv xt_tcpudp iptables-nft -I INPUT -m tcp <-- error command lsmod Result: Module Size Used by xt_tcpudp 20480 2 <-- it should be 0 Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Signed-off-by: Taehee Yoo Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 513f93118604..ef7772e976cc 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2806,8 +2806,11 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, nf_tables_rule_release(&ctx, rule); err1: for (i = 0; i < n; i++) { - if (info[i].ops != NULL) + if (info[i].ops) { module_put(info[i].ops->type->owner); + if (info[i].ops->type->release_ops) + info[i].ops->type->release_ops(info[i].ops); + } } kvfree(info); return err; From 072684e8c58d17e853f8e8b9f6d9ce2e58d2b036 Mon Sep 17 00:00:00 2001 From: Radoslav Gerganov Date: Tue, 5 Mar 2019 10:10:34 +0000 Subject: [PATCH 186/468] USB: gadget: f_hid: fix deadlock in f_hidg_write() In f_hidg_write() the write_spinlock is acquired before calling usb_ep_queue() which causes a deadlock when dummy_hcd is being used. This is because dummy_queue() callbacks into f_hidg_req_complete() which tries to acquire the same spinlock. This is (part of) the backtrace when the deadlock occurs: 0xffffffffc06b1410 in f_hidg_req_complete 0xffffffffc06a590a in usb_gadget_giveback_request 0xffffffffc06cfff2 in dummy_queue 0xffffffffc06a4b96 in usb_ep_queue 0xffffffffc06b1eb6 in f_hidg_write 0xffffffff8127730b in __vfs_write 0xffffffff812774d1 in vfs_write 0xffffffff81277725 in SYSC_write Fix this by releasing the write_spinlock before calling usb_ep_queue() Reviewed-by: James Bottomley Tested-by: James Bottomley Cc: stable@vger.kernel.org # 4.11+ Fixes: 749494b6bdbb ("usb: gadget: f_hid: fix: Move IN request allocation to set_alt()") Signed-off-by: Radoslav Gerganov Signed-off-by: Felipe Balbi --- drivers/usb/gadget/function/f_hid.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/gadget/function/f_hid.c b/drivers/usb/gadget/function/f_hid.c index 75b113a5b25c..f3816a5c861e 100644 --- a/drivers/usb/gadget/function/f_hid.c +++ b/drivers/usb/gadget/function/f_hid.c @@ -391,20 +391,20 @@ static ssize_t f_hidg_write(struct file *file, const char __user *buffer, req->complete = f_hidg_req_complete; req->context = hidg; + spin_unlock_irqrestore(&hidg->write_spinlock, flags); + status = usb_ep_queue(hidg->in_ep, req, GFP_ATOMIC); if (status < 0) { ERROR(hidg->func.config->cdev, "usb_ep_queue error on int endpoint %zd\n", status); - goto release_write_pending_unlocked; + goto release_write_pending; } else { status = count; } - spin_unlock_irqrestore(&hidg->write_spinlock, flags); return status; release_write_pending: spin_lock_irqsave(&hidg->write_spinlock, flags); -release_write_pending_unlocked: hidg->write_pending = 0; spin_unlock_irqrestore(&hidg->write_spinlock, flags); From 032f85c9360fb1a08385c584c2c4ed114b33c260 Mon Sep 17 00:00:00 2001 From: Marco Felsch Date: Mon, 4 Mar 2019 11:49:40 +0100 Subject: [PATCH 187/468] ARM: dts: pfla02: increase phy reset duration Increase the reset duration to ensure correct phy functionality. The reset duration is taken from barebox commit 52fdd510de ("ARM: dts: pfla02: use long enough reset for ethernet phy"): Use a longer reset time for ethernet phy Micrel KSZ9031RNX. Otherwise a small percentage of modules have 'transmission timeouts' errors like barebox@Phytec phyFLEX-i.MX6 Quad Carrier-Board:/ ifup eth0 warning: No MAC address set. Using random address 7e:94:4d:02:f8:f3 eth0: 1000Mbps full duplex link detected eth0: transmission timeout T eth0: transmission timeout T eth0: transmission timeout T eth0: transmission timeout T eth0: transmission timeout Cc: Stefan Christ Cc: Christian Hemp Signed-off-by: Marco Felsch Fixes: 3180f956668e ("ARM: dts: Phytec imx6q pfla02 and pbab01 support") Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi b/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi index 433bf09a1954..027df06c5dc7 100644 --- a/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi +++ b/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi @@ -91,6 +91,7 @@ &fec { pinctrl-0 = <&pinctrl_enet>; phy-handle = <ðphy>; phy-mode = "rgmii"; + phy-reset-duration = <10>; /* in msecs */ phy-reset-gpios = <&gpio3 23 GPIO_ACTIVE_LOW>; phy-supply = <&vdd_eth_io_reg>; status = "disabled"; From 2908b076f5198d231de62713cb2b633a3a4b95ac Mon Sep 17 00:00:00 2001 From: Lin Yi Date: Wed, 20 Mar 2019 19:04:56 +0800 Subject: [PATCH 188/468] USB: serial: mos7720: fix mos_parport refcount imbalance on error path The write_parport_reg_nonblock() helper takes a reference to the struct mos_parport, but failed to release it in a couple of error paths after allocation failures, leading to a memory leak. Johan said that move the kref_get() and mos_parport assignment to the end of urbtrack initialisation is a better way, so move it. and mos_parport do not used until urbtrack initialisation. Signed-off-by: Lin Yi Fixes: b69578df7e98 ("USB: usbserial: mos7720: add support for parallel port on moschip 7715") Cc: stable # 2.6.35 Signed-off-by: Johan Hovold --- drivers/usb/serial/mos7720.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/mos7720.c b/drivers/usb/serial/mos7720.c index fc52ac75fbf6..18110225d506 100644 --- a/drivers/usb/serial/mos7720.c +++ b/drivers/usb/serial/mos7720.c @@ -366,8 +366,6 @@ static int write_parport_reg_nonblock(struct mos7715_parport *mos_parport, if (!urbtrack) return -ENOMEM; - kref_get(&mos_parport->ref_count); - urbtrack->mos_parport = mos_parport; urbtrack->urb = usb_alloc_urb(0, GFP_ATOMIC); if (!urbtrack->urb) { kfree(urbtrack); @@ -388,6 +386,8 @@ static int write_parport_reg_nonblock(struct mos7715_parport *mos_parport, usb_sndctrlpipe(usbdev, 0), (unsigned char *)urbtrack->setup, NULL, 0, async_complete, urbtrack); + kref_get(&mos_parport->ref_count); + urbtrack->mos_parport = mos_parport; kref_init(&urbtrack->ref_count); INIT_LIST_HEAD(&urbtrack->urblist_entry); From a75bb4eb9e565b9f5115e2e8c07377ce32cbe69a Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Mon, 18 Mar 2019 17:10:05 -0400 Subject: [PATCH 189/468] Revert "kbuild: use -Oz instead of -Os when using clang" The clang option -Oz enables *aggressive* optimization for size, which doesn't necessarily result in smaller images, but can have negative impact on performance. Switch back to the less aggressive -Os. This reverts commit 6748cb3c299de1ffbe56733647b01dbcc398c419. Suggested-by: Peter Zijlstra Signed-off-by: Matthias Kaehlcke Reviewed-by: Nick Desaulniers Signed-off-by: Masahiro Yamada --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 99c0530489ef..8c6acfdfe660 100644 --- a/Makefile +++ b/Makefile @@ -677,7 +677,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow) KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context) ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE -KBUILD_CFLAGS += $(call cc-option,-Oz,-Os) +KBUILD_CFLAGS += -Os else KBUILD_CFLAGS += -O2 endif From 5cd1c56c42beb6d228cc8d4373fdc5f5ec78a5ad Mon Sep 17 00:00:00 2001 From: Jarkko Nikula Date: Fri, 15 Mar 2019 12:56:49 +0200 Subject: [PATCH 190/468] i2c: i801: Add support for Intel Comet Lake Add PCI ID for Intel Comet Lake PCH. Signed-off-by: Jarkko Nikula Reviewed-by: Jean Delvare Signed-off-by: Wolfram Sang --- Documentation/i2c/busses/i2c-i801 | 1 + drivers/i2c/busses/Kconfig | 1 + drivers/i2c/busses/i2c-i801.c | 4 ++++ 3 files changed, 6 insertions(+) diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index d1ee484a787d..ee9984f35868 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -36,6 +36,7 @@ Supported adapters: * Intel Cannon Lake (PCH) * Intel Cedar Fork (PCH) * Intel Ice Lake (PCH) + * Intel Comet Lake (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index f2c681971201..f8979abb9a19 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -131,6 +131,7 @@ config I2C_I801 Cannon Lake (PCH) Cedar Fork (PCH) Ice Lake (PCH) + Comet Lake (PCH) This driver can also be built as a module. If so, the module will be called i2c-i801. diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c index c91e145ef5a5..679c6c41f64b 100644 --- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -71,6 +71,7 @@ * Cannon Lake-LP (PCH) 0x9da3 32 hard yes yes yes * Cedar Fork (PCH) 0x18df 32 hard yes yes yes * Ice Lake-LP (PCH) 0x34a3 32 hard yes yes yes + * Comet Lake (PCH) 0x02a3 32 hard yes yes yes * * Features supported by this driver: * Software PEC no @@ -240,6 +241,7 @@ #define PCI_DEVICE_ID_INTEL_LEWISBURG_SSKU_SMBUS 0xa223 #define PCI_DEVICE_ID_INTEL_KABYLAKE_PCH_H_SMBUS 0xa2a3 #define PCI_DEVICE_ID_INTEL_CANNONLAKE_H_SMBUS 0xa323 +#define PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS 0x02a3 struct i801_mux_config { char *gpio_chip; @@ -1038,6 +1040,7 @@ static const struct pci_device_id i801_ids[] = { { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CANNONLAKE_H_SMBUS) }, { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CANNONLAKE_LP_SMBUS) }, { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICELAKE_LP_SMBUS) }, + { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS) }, { 0, } }; @@ -1534,6 +1537,7 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id) case PCI_DEVICE_ID_INTEL_DNV_SMBUS: case PCI_DEVICE_ID_INTEL_KABYLAKE_PCH_H_SMBUS: case PCI_DEVICE_ID_INTEL_ICELAKE_LP_SMBUS: + case PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS: priv->features |= FEATURE_I2C_BLOCK_READ; priv->features |= FEATURE_IRQ; priv->features |= FEATURE_SMBUS_PEC; From 3c3736cd32bf5197aed1410ae826d2d254a5b277 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Wed, 20 Mar 2019 14:57:19 +0000 Subject: [PATCH 191/468] KVM: arm/arm64: Fix handling of stage2 huge mappings We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zenghui Yu Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier --- arch/arm/include/asm/stage2_pgtable.h | 2 + virt/kvm/arm/mmu.c | 59 +++++++++++++++++++-------- 2 files changed, 45 insertions(+), 16 deletions(-) diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de2089501b8b..9e11dce55e06 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -75,6 +75,8 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm) #define S2_PMD_MASK PMD_MASK #define S2_PMD_SIZE PMD_SIZE +#define S2_PUD_MASK PUD_MASK +#define S2_PUD_SIZE PUD_SIZE static inline bool kvm_stage2_has_pmd(struct kvm *kvm) { diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index bcdf978c0d1d..f9da2fad9bd6 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1067,25 +1067,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache { pmd_t *pmd, old_pmd; +retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); old_pmd = *pmd; + /* + * Multiple vcpus faulting on the same PMD entry, can + * lead to them sequentially updating the PMD with the + * same value. Following the break-before-make + * (pmd_clear() followed by tlb_flush()) process can + * hinder forward progress due to refaults generated + * on missing translations. + * + * Skip updating the page table if the entry is + * unchanged. + */ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + if (pmd_present(old_pmd)) { /* - * Multiple vcpus faulting on the same PMD entry, can - * lead to them sequentially updating the PMD with the - * same value. Following the break-before-make - * (pmd_clear() followed by tlb_flush()) process can - * hinder forward progress due to refaults generated - * on missing translations. + * If we already have PTE level mapping for this block, + * we must unmap it to avoid inconsistent TLB state and + * leaking the table page. We could end up in this situation + * if the memory slot was marked for dirty logging and was + * reverted, leaving PTE level mappings for the pages accessed + * during the period. So, unmap the PTE level mapping for this + * block and retry, as we could have released the upper level + * table in the process. * - * Skip updating the page table if the entry is - * unchanged. + * Normal THP split/merge follows mmu_notifier callbacks and do + * get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + } /* * Mapping in huge pages should only happen through a * fault. If a page is merged into a transparent huge @@ -1097,8 +1115,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * should become splitting first, unmapped, merged, * and mapped back in on-demand. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); - + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -1114,6 +1131,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac { pud_t *pudp, old_pud; +retry: pudp = stage2_get_pud(kvm, cache, addr); VM_BUG_ON(!pudp); @@ -1121,14 +1139,23 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac /* * A large number of vcpus faulting on the same stage 2 entry, - * can lead to a refault due to the - * stage2_pud_clear()/tlb_flush(). Skip updating the page - * tables if there is no change. + * can lead to a refault due to the stage2_pud_clear()/tlb_flush(). + * Skip updating the page tables if there is no change. */ if (pud_val(old_pud) == pud_val(*new_pudp)) return 0; if (stage2_pud_present(kvm, old_pud)) { + /* + * If we already have table level mapping for this block, unmap + * the range for this block and retry. + */ + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); + goto retry; + } + + WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp)); stage2_pud_clear(kvm, pudp); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { From d9ea27a3304812500de3674981a9c3a2086d517b Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 20 Mar 2019 22:18:13 +0800 Subject: [PATCH 192/468] KVM: arm/arm64: vgic-its: Make attribute accessors static Fix sparse warnings: arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning: symbol 'vgic_its_has_attr_regs' was not declared. Should it be static? arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning: symbol 'vgic_its_attr_regs_access' was not declared. Should it be static? Signed-off-by: YueHaibing [maz: fixed subject] Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic/vgic-its.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index fcb2fceaa4a5..44ceaccb18cf 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1736,8 +1736,8 @@ static void vgic_its_destroy(struct kvm_device *kvm_dev) kfree(its); } -int vgic_its_has_attr_regs(struct kvm_device *dev, - struct kvm_device_attr *attr) +static int vgic_its_has_attr_regs(struct kvm_device *dev, + struct kvm_device_attr *attr) { const struct vgic_register_region *region; gpa_t offset = attr->attr; @@ -1757,9 +1757,9 @@ int vgic_its_has_attr_regs(struct kvm_device *dev, return 0; } -int vgic_its_attr_regs_access(struct kvm_device *dev, - struct kvm_device_attr *attr, - u64 *reg, bool is_write) +static int vgic_its_attr_regs_access(struct kvm_device *dev, + struct kvm_device_attr *attr, + u64 *reg, bool is_write) { const struct vgic_register_region *region; struct vgic_its *its; From 398f0132c14754fcd03c1c4f8e7176d001ce8ea1 Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Mon, 18 Mar 2019 23:14:52 -0700 Subject: [PATCH 193/468] net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec Since commit fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check") one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller found that that triggers a warning: [ 21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0 [ 21.101490] Modules linked in: [ 21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146 [ 21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 [ 21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630 [ 21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3 [ 21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246 [ 21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000 [ 21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000 [ 21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67 [ 21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d [ 21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d [ 21.112552] FS: 00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000 [ 21.113612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0 [ 21.115367] Call Trace: [ 21.115705] ? __alloc_pages_slowpath+0x21c0/0x21c0 [ 21.116362] alloc_pages_current+0xac/0x1e0 [ 21.116923] kmalloc_order+0x18/0x70 [ 21.117393] kmalloc_order_trace+0x18/0x110 [ 21.117949] packet_set_ring+0x9d5/0x1770 [ 21.118524] ? packet_rcv_spkt+0x440/0x440 [ 21.119094] ? lock_downgrade+0x620/0x620 [ 21.119646] ? __might_fault+0x177/0x1b0 [ 21.120177] packet_setsockopt+0x981/0x2940 [ 21.120753] ? __fget+0x2fb/0x4b0 [ 21.121209] ? packet_release+0xab0/0xab0 [ 21.121740] ? sock_has_perm+0x1cd/0x260 [ 21.122297] ? selinux_secmark_relabel_packet+0xd0/0xd0 [ 21.123013] ? __fget+0x324/0x4b0 [ 21.123451] ? selinux_netlbl_socket_setsockopt+0x101/0x320 [ 21.124186] ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0 [ 21.124908] ? __lock_acquire+0x529/0x3200 [ 21.125453] ? selinux_socket_setsockopt+0x5d/0x70 [ 21.126075] ? __sys_setsockopt+0x131/0x210 [ 21.126533] ? packet_release+0xab0/0xab0 [ 21.127004] __sys_setsockopt+0x131/0x210 [ 21.127449] ? kernel_accept+0x2f0/0x2f0 [ 21.127911] ? ret_from_fork+0x8/0x50 [ 21.128313] ? do_raw_spin_lock+0x11b/0x280 [ 21.128800] __x64_sys_setsockopt+0xba/0x150 [ 21.129271] ? lockdep_hardirqs_on+0x37f/0x560 [ 21.129769] do_syscall_64+0x9f/0x450 [ 21.130182] entry_SYSCALL_64_after_hwframe+0x49/0xbe We should allocate with __GFP_NOWARN to handle this. Cc: Kal Conley Cc: Andrey Konovalov Fixes: fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check") Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- net/packet/af_packet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 323655a25674..9419c5cf4de5 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4210,7 +4210,7 @@ static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order) struct pgv *pg_vec; int i; - pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL); + pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL | __GFP_NOWARN); if (unlikely(!pg_vec)) goto out; From 1c87e79a002f6a159396138cd3f3ab554a2a8887 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Wed, 20 Mar 2019 14:45:48 +0800 Subject: [PATCH 194/468] ipv6: make ip6_create_rt_rcu return ip6_null_entry instead of NULL Jianlin reported a crash: [ 381.484332] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 [ 381.619802] RIP: 0010:fib6_rule_lookup+0xa3/0x160 [ 382.009615] Call Trace: [ 382.020762] [ 382.030174] ip6_route_redirect.isra.52+0xc9/0xf0 [ 382.050984] ip6_redirect+0xb6/0xf0 [ 382.066731] icmpv6_notify+0xca/0x190 [ 382.083185] ndisc_redirect_rcv+0x10f/0x160 [ 382.102569] ndisc_rcv+0xfb/0x100 [ 382.117725] icmpv6_rcv+0x3f2/0x520 [ 382.133637] ip6_input_finish+0xbf/0x460 [ 382.151634] ip6_input+0x3b/0xb0 [ 382.166097] ipv6_rcv+0x378/0x4e0 It was caused by the lookup function __ip6_route_redirect() returns NULL in fib6_rule_lookup() when ip6_create_rt_rcu() returns NULL. So we fix it by simply making ip6_create_rt_rcu() return ip6_null_entry instead of NULL. v1->v2: - move down 'fallback:' to make it more readable. Fixes: e873e4b9cc7e ("ipv6: use fib6_info_hold_safe() when necessary") Reported-by: Jianlin Shi Suggested-by: Paolo Abeni Signed-off-by: Xin Long Reviewed-by: David Ahern Acked-by: Wei Wang Signed-off-by: David S. Miller --- net/ipv6/route.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 4ef4bbdb49d4..0302e0eb07af 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1040,14 +1040,20 @@ static struct rt6_info *ip6_create_rt_rcu(struct fib6_info *rt) struct rt6_info *nrt; if (!fib6_info_hold_safe(rt)) - return NULL; + goto fallback; nrt = ip6_dst_alloc(dev_net(dev), dev, flags); - if (nrt) - ip6_rt_copy_init(nrt, rt); - else + if (!nrt) { fib6_info_release(rt); + goto fallback; + } + ip6_rt_copy_init(nrt, rt); + return nrt; + +fallback: + nrt = dev_net(dev)->ipv6.ip6_null_entry; + dst_hold(&nrt->dst); return nrt; } @@ -1096,10 +1102,6 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net, dst_hold(&rt->dst); } else { rt = ip6_create_rt_rcu(f6i); - if (!rt) { - rt = net->ipv6.ip6_null_entry; - dst_hold(&rt->dst); - } } rcu_read_unlock(); From ef82bcfa671b9a635bab5fa669005663d8b177c5 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Wed, 20 Mar 2019 14:49:38 +0800 Subject: [PATCH 195/468] sctp: use memdup_user instead of vmemdup_user In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates memory with addrs_size which is passed from userspace. We used flag GFP_USER to put some more restrictions on it in Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc"). However, since Commit c981f254cc82 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()"), vmemdup_user() has been used, which doesn't check GFP_USER flag when goes to vmalloc_*(). So when addrs_size is a huge value, it could exhaust memory and even trigger oom killer. This patch is to use memdup_user() instead, in which GFP_USER would work to limit the memory allocation with a huge addrs_size. Note we can't fix it by limiting 'addrs_size', as there's no demand for it from RFC. Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com Fixes: c981f254cc82 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()") Signed-off-by: Xin Long Acked-by: Neil Horman Signed-off-by: David S. Miller --- net/sctp/socket.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 011c349d877a..9874e60c9b0d 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -999,7 +999,7 @@ static int sctp_setsockopt_bindx(struct sock *sk, if (unlikely(addrs_size <= 0)) return -EINVAL; - kaddrs = vmemdup_user(addrs, addrs_size); + kaddrs = memdup_user(addrs, addrs_size); if (unlikely(IS_ERR(kaddrs))) return PTR_ERR(kaddrs); @@ -1007,7 +1007,7 @@ static int sctp_setsockopt_bindx(struct sock *sk, addr_buf = kaddrs; while (walk_size < addrs_size) { if (walk_size + sizeof(sa_family_t) > addrs_size) { - kvfree(kaddrs); + kfree(kaddrs); return -EINVAL; } @@ -1018,7 +1018,7 @@ static int sctp_setsockopt_bindx(struct sock *sk, * causes the address buffer to overflow return EINVAL. */ if (!af || (walk_size + af->sockaddr_len) > addrs_size) { - kvfree(kaddrs); + kfree(kaddrs); return -EINVAL; } addrcnt++; @@ -1054,7 +1054,7 @@ static int sctp_setsockopt_bindx(struct sock *sk, } out: - kvfree(kaddrs); + kfree(kaddrs); return err; } @@ -1329,7 +1329,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk, if (unlikely(addrs_size <= 0)) return -EINVAL; - kaddrs = vmemdup_user(addrs, addrs_size); + kaddrs = memdup_user(addrs, addrs_size); if (unlikely(IS_ERR(kaddrs))) return PTR_ERR(kaddrs); @@ -1349,7 +1349,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk, err = __sctp_connect(sk, kaddrs, addrs_size, flags, assoc_id); out_free: - kvfree(kaddrs); + kfree(kaddrs); return err; } From 0ccc3876e4b2a1559a4dbe3126dda4459d38a83b Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Tue, 19 Mar 2019 17:18:13 +0000 Subject: [PATCH 196/468] Btrfs: fix assertion failure on fsync with NO_HOLES enabled Back in commit a89ca6f24ffe4 ("Btrfs: fix fsync after truncate when no_holes feature is enabled") I added an assertion that is triggered when an inline extent is found to assert that the length of the (uncompressed) data the extent represents is the same as the i_size of the inode, since that is true most of the time I couldn't find or didn't remembered about any exception at that time. Later on the assertion was expanded twice to deal with a case of a compressed inline extent representing a range that matches the sector size followed by an expanding truncate, and another case where fallocate can update the i_size of the inode without adding or updating existing extents (if the fallocate range falls entirely within the first block of the file). These two expansion/fixes of the assertion were done by commit 7ed586d0a8241 ("Btrfs: fix assertion on fsync of regular file when using no-holes feature") and commit 6399fb5a0b69a ("Btrfs: fix assertion failure during fsync in no-holes mode"). These however missed the case where an falloc expands the i_size of an inode to exactly the sector size and inline extent exists, for example: $ mkfs.btrfs -f -O no-holes /dev/sdc $ mount /dev/sdc /mnt $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar wrote 1096/1096 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec) $ xfs_io -c "falloc 1096 3000" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar Segmentation fault $ dmesg [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727 [701253.602962] ------------[ cut here ]------------ [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533! [701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G W 5.0.0-rc8-btrfs-next-45 #1 [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs] (...) [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286 [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000 [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868 [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000 [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18 [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80 [701253.607769] FS: 00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000 [701253.608163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0 [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [701253.609608] Call Trace: [701253.609994] btrfs_log_inode+0xdfb/0xe40 [btrfs] [701253.610383] btrfs_log_inode_parent+0x2be/0xa60 [btrfs] [701253.610770] ? do_raw_spin_unlock+0x49/0xc0 [701253.611150] btrfs_log_dentry_safe+0x4a/0x70 [btrfs] [701253.611537] btrfs_sync_file+0x3b2/0x440 [btrfs] [701253.612010] ? do_sysinfo+0xb0/0xf0 [701253.612552] do_fsync+0x38/0x60 [701253.612988] __x64_sys_fsync+0x10/0x20 [701253.613360] do_syscall_64+0x60/0x1b0 [701253.613733] entry_SYSCALL_64_after_hwframe+0x49/0xbe [701253.614103] RIP: 0033:0x7f14da4e66d0 (...) [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0 [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003 [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010 [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260 [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10 (...) [701253.619941] ---[ end trace e088d74f132b6da5 ]--- Updating the assertion again to allow for this particular case would result in a meaningless assertion, plus there is currently no risk of logging content that would result in any corruption after a log replay if the size of the data encoded in an inline extent is greater than the inode's i_size (which is not currently possibe either with or without compression), therefore just remove the assertion. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/tree-log.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 960ea49a7a0f..561884f60d35 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4725,15 +4725,8 @@ static int btrfs_log_trailing_hole(struct btrfs_trans_handle *trans, struct btrfs_file_extent_item); if (btrfs_file_extent_type(leaf, extent) == - BTRFS_FILE_EXTENT_INLINE) { - len = btrfs_file_extent_ram_bytes(leaf, extent); - ASSERT(len == i_size || - (len == fs_info->sectorsize && - btrfs_file_extent_compression(leaf, extent) != - BTRFS_COMPRESS_NONE) || - (len < i_size && i_size < fs_info->sectorsize)); + BTRFS_FILE_EXTENT_INLINE) return 0; - } len = btrfs_file_extent_num_bytes(leaf, extent); /* Last extent goes beyond i_size, no need to log a hole. */ From 536d3680fd2dab5c39857d62a3e084198fc74ff9 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 197/468] net: ks8851: Dequeue RX packets explicitly The ks8851 driver lets the chip auto-dequeue received packets once they have been read in full. It achieves that by setting the ADRFE flag in the RXQCR register ("Auto-Dequeue RXQ Frame Enable"). However if allocation of a packet's socket buffer or retrieval of the packet over the SPI bus fails, the packet will not have been read in full and is not auto-dequeued. Such partial retrieval of a packet confuses the chip's RX queue management: On the next RX interrupt, the first packet read from the queue will be the one left there previously and this one can be retrieved without issues. But for any newly received packets, the frame header status and byte count registers (RXFHSR and RXFHBCR) contain bogus values, preventing their retrieval. The chip allows explicitly dequeueing a packet from the RX queue by setting the RRXEF flag in the RXQCR register ("Release RX Error Frame"). This could be used to dequeue the packet in case of an error, but if that error is a failed SPI transfer, it is unknown if the packet was transferred in full and was auto-dequeued or if it was only transferred in part and requires an explicit dequeue. The safest approach is thus to always dequeue packets explicitly and forgo auto-dequeueing. Without this change, I've witnessed packet retrieval break completely when an SPI DMA transfer fails, requiring a chip reset. Explicit dequeueing magically fixes this and makes packet retrieval absolutely robust for me. The chip's documentation suggests auto-dequeuing and uses the RRXEF flag only to dequeue error frames which the driver doesn't want to retrieve. But that seems to be a fair-weather approach. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Ben Dooks Cc: Tristram Ha Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index bd6e9014bc74..a93f8e842c07 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -535,9 +535,8 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) /* set dma read address */ ks8851_wrreg16(ks, KS_RXFDPR, RXFDPR_RXFPAI | 0x00); - /* start the packet dma process, and set auto-dequeue rx */ - ks8851_wrreg16(ks, KS_RXQCR, - ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE); + /* start DMA access */ + ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_SDA); if (rxlen > 4) { unsigned int rxalign; @@ -568,7 +567,8 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) } } - ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr); + /* end DMA access and dequeue packet */ + ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_RRXEF); } } From 761cfa979a0c177d6c2d93ef5585cd79ae49a7d5 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 198/468] net: ks8851: Reassert reset pin if chip ID check fails Commit 73fdeb82e963 ("net: ks8851: Add optional vdd_io regulator and reset gpio") amended the ks8851 driver to briefly assert the chip's reset pin on probe. It also amended the probe routine's error path to reassert the reset pin if a subsequent initialization step fails. However the commit misplaced reassertion of the reset pin in the error path such that it is not performed if the check of the Chip ID and Enable Register (CIDER) fails. The error path is therefore slightly asymmetrical to the probe routine's body. Fix it. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Stephen Boyd Cc: Nishanth Menon Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index a93f8e842c07..1633fa5c709c 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -1554,9 +1554,9 @@ static int ks8851_probe(struct spi_device *spi) free_irq(ndev->irq, ks); err_irq: +err_id: if (gpio_is_valid(gpio)) gpio_set_value(gpio, 0); -err_id: regulator_disable(ks->vdd_reg); err_reg: regulator_disable(ks->vdd_io); From d268f31552794abf5b6aa5af31021643411f25f5 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 199/468] net: ks8851: Delay requesting IRQ until opened The ks8851 driver currently requests the IRQ before registering the net_device. Because the net_device name is used as IRQ name and is still "eth%d" when the IRQ is requested, it's impossibe to tell IRQs apart if multiple ks8851 chips are present. Most other drivers delay requesting the IRQ until the net_device is opened. Do the same. The driver doesn't enable interrupts on the chip before opening the net_device and disables them when closing it, so there doesn't seem to be a need to request the IRQ already on probe. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Ben Dooks Cc: Tristram Ha Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index 1633fa5c709c..c9faec4c5b25 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -785,6 +785,15 @@ static void ks8851_tx_work(struct work_struct *work) static int ks8851_net_open(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); + int ret; + + ret = request_threaded_irq(dev->irq, NULL, ks8851_irq, + IRQF_TRIGGER_LOW | IRQF_ONESHOT, + dev->name, ks); + if (ret < 0) { + netdev_err(dev, "failed to get irq\n"); + return ret; + } /* lock the card, even if we may not actually be doing anything * else at the moment */ @@ -899,6 +908,8 @@ static int ks8851_net_stop(struct net_device *dev) dev_kfree_skb(txb); } + free_irq(dev->irq, ks); + return 0; } @@ -1529,14 +1540,6 @@ static int ks8851_probe(struct spi_device *spi) ks8851_read_selftest(ks); ks8851_init_mac(ks); - ret = request_threaded_irq(spi->irq, NULL, ks8851_irq, - IRQF_TRIGGER_LOW | IRQF_ONESHOT, - ndev->name, ks); - if (ret < 0) { - dev_err(&spi->dev, "failed to get irq\n"); - goto err_irq; - } - ret = register_netdev(ndev); if (ret) { dev_err(&spi->dev, "failed to register network device\n"); @@ -1549,11 +1552,7 @@ static int ks8851_probe(struct spi_device *spi) return 0; - err_netdev: - free_irq(ndev->irq, ks); - -err_irq: err_id: if (gpio_is_valid(gpio)) gpio_set_value(gpio, 0); @@ -1574,7 +1573,6 @@ static int ks8851_remove(struct spi_device *spi) dev_info(&spi->dev, "remove\n"); unregister_netdev(priv->netdev); - free_irq(spi->irq, priv); if (gpio_is_valid(priv->gpio)) gpio_set_value(priv->gpio, 0); regulator_disable(priv->vdd_reg); From 9624bafa5f6418b9ca5b3f66d1f6a6a2e8bf6d4c Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 200/468] net: ks8851: Set initial carrier state to down The ks8851 chip's initial carrier state is down. A Link Change Interrupt is signaled once interrupts are enabled if the carrier is up. The ks8851 driver has it backwards by assuming that the initial carrier state is up. The state is therefore misrepresented if the interface is opened with no cable attached. Fix it. The Link Change interrupt is sometimes not signaled unless the P1MBSR register (which contains the Link Status bit) is read on ->ndo_open(). This might be a hardware erratum. Read the register by calling mii_check_link(), which has the desirable side effect of setting the carrier state to down if the cable was detached while the interface was closed. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Ben Dooks Cc: Tristram Ha Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index c9faec4c5b25..b83b070a9eec 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -858,6 +858,7 @@ static int ks8851_net_open(struct net_device *dev) netif_dbg(ks, ifup, ks->netdev, "network device up\n"); mutex_unlock(&ks->lock); + mii_check_link(&ks->mii); return 0; } @@ -1519,6 +1520,7 @@ static int ks8851_probe(struct spi_device *spi) spi_set_drvdata(spi, ks); + netif_carrier_off(ks->netdev); ndev->if_port = IF_PORT_100BASET; ndev->netdev_ops = &ks8851_netdev_ops; ndev->irq = spi->irq; From cbda74a12c4b738feb90752fbca3648d24646079 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 201/468] net: ks8851: Fix register macro misnomers In the header file accompanying the ks8851 driver, the P1SCLMD register macros are misnamed, they actually pertain to the P1CR register. The P1CR macros in turn pertain to the P1SR register, see pages 65 to 68 of the spec: http://www.hqchip.com/uploads/pdf/201703/47c98946d6c97a4766e14db3f24955f2.pdf The misnomers have no negative consequences so far because the macros aren't used by ks8851.c, but that's about to change. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Ben Dooks Cc: Tristram Ha Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.h | 52 +++++++++++++++------------- 1 file changed, 27 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h index 852256ef1f22..1050fa48b73c 100644 --- a/drivers/net/ethernet/micrel/ks8851.h +++ b/drivers/net/ethernet/micrel/ks8851.h @@ -257,33 +257,35 @@ #define KS_P1ANLPR 0xEE #define KS_P1SCLMD 0xF4 -#define P1SCLMD_LEDOFF (1 << 15) -#define P1SCLMD_TXIDS (1 << 14) -#define P1SCLMD_RESTARTAN (1 << 13) -#define P1SCLMD_DISAUTOMDIX (1 << 10) -#define P1SCLMD_FORCEMDIX (1 << 9) -#define P1SCLMD_AUTONEGEN (1 << 7) -#define P1SCLMD_FORCE100 (1 << 6) -#define P1SCLMD_FORCEFDX (1 << 5) -#define P1SCLMD_ADV_FLOW (1 << 4) -#define P1SCLMD_ADV_100BT_FDX (1 << 3) -#define P1SCLMD_ADV_100BT_HDX (1 << 2) -#define P1SCLMD_ADV_10BT_FDX (1 << 1) -#define P1SCLMD_ADV_10BT_HDX (1 << 0) #define KS_P1CR 0xF6 -#define P1CR_HP_MDIX (1 << 15) -#define P1CR_REV_POL (1 << 13) -#define P1CR_OP_100M (1 << 10) -#define P1CR_OP_FDX (1 << 9) -#define P1CR_OP_MDI (1 << 7) -#define P1CR_AN_DONE (1 << 6) -#define P1CR_LINK_GOOD (1 << 5) -#define P1CR_PNTR_FLOW (1 << 4) -#define P1CR_PNTR_100BT_FDX (1 << 3) -#define P1CR_PNTR_100BT_HDX (1 << 2) -#define P1CR_PNTR_10BT_FDX (1 << 1) -#define P1CR_PNTR_10BT_HDX (1 << 0) +#define P1CR_LEDOFF (1 << 15) +#define P1CR_TXIDS (1 << 14) +#define P1CR_RESTARTAN (1 << 13) +#define P1CR_DISAUTOMDIX (1 << 10) +#define P1CR_FORCEMDIX (1 << 9) +#define P1CR_AUTONEGEN (1 << 7) +#define P1CR_FORCE100 (1 << 6) +#define P1CR_FORCEFDX (1 << 5) +#define P1CR_ADV_FLOW (1 << 4) +#define P1CR_ADV_100BT_FDX (1 << 3) +#define P1CR_ADV_100BT_HDX (1 << 2) +#define P1CR_ADV_10BT_FDX (1 << 1) +#define P1CR_ADV_10BT_HDX (1 << 0) + +#define KS_P1SR 0xF8 +#define P1SR_HP_MDIX (1 << 15) +#define P1SR_REV_POL (1 << 13) +#define P1SR_OP_100M (1 << 10) +#define P1SR_OP_FDX (1 << 9) +#define P1SR_OP_MDI (1 << 7) +#define P1SR_AN_DONE (1 << 6) +#define P1SR_LINK_GOOD (1 << 5) +#define P1SR_PNTR_FLOW (1 << 4) +#define P1SR_PNTR_100BT_FDX (1 << 3) +#define P1SR_PNTR_100BT_HDX (1 << 2) +#define P1SR_PNTR_10BT_FDX (1 << 1) +#define P1SR_PNTR_10BT_HDX (1 << 0) /* TX Frame control */ From aae079aa76d0cc3e679db31370364cb87a405651 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 15:02:00 +0100 Subject: [PATCH 202/468] net: ks8851: Deduplicate register macros The ks8851 chip is sold either with an SPI interface (KSZ8851SNL) or with a so-called non-PCI interface (KSZ8851-16MLL). When the driver for the latter was introduced with commit a55c0a0ed415 ("drivers/net: ks8851_mll ethernet network driver"), it duplicated the register macros introduced by the driver for the former with commit 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver"). The chips are almost identical, so the duplication seems unwarranted. There are a handful of bits which are in use on the KSZ8851-16MLL but reserved on the KSZ8851SNL, and vice-versa, but there are no actual collisions. Thus, remove the duplicate definitions from the KSZ8851-16MLL driver. Mark all bits which differ between the two chips. Move the SPI frame opcodes, which are specific to KSZ8851SNL, to its driver. The KSZ8851-16MLL driver added a RXFCTR_THRESHOLD_MASK macro which is a duplication of the RXFCTR_RXFCT_MASK macro, rename it where it's used. Same for P1MBCR_FORCE_FDX, which duplicates the BMCR_FULLDPLX macro and OBCR_ODS_16MA, which duplicates OBCR_ODS_16mA. Signed-off-by: Lukas Wunner Cc: Frank Pavlic Cc: Ben Dooks Cc: Tristram Ha Signed-off-by: David S. Miller --- drivers/net/ethernet/micrel/ks8851.c | 6 + drivers/net/ethernet/micrel/ks8851.h | 41 +-- drivers/net/ethernet/micrel/ks8851_mll.c | 317 +---------------------- 3 files changed, 34 insertions(+), 330 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index b83b070a9eec..7849119d407a 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -142,6 +142,12 @@ struct ks8851_net { static int msg_enable; +/* SPI frame opcodes */ +#define KS_SPIOP_RD (0x00) +#define KS_SPIOP_WR (0x40) +#define KS_SPIOP_RXFIFO (0x80) +#define KS_SPIOP_TXFIFO (0xC0) + /* shift for byte-enable data */ #define BYTE_EN(_x) ((_x) << 2) diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h index 1050fa48b73c..23da1e3ee429 100644 --- a/drivers/net/ethernet/micrel/ks8851.h +++ b/drivers/net/ethernet/micrel/ks8851.h @@ -11,9 +11,15 @@ */ #define KS_CCR 0x08 +#define CCR_LE (1 << 10) /* KSZ8851-16MLL */ #define CCR_EEPROM (1 << 9) -#define CCR_SPI (1 << 8) -#define CCR_32PIN (1 << 0) +#define CCR_SPI (1 << 8) /* KSZ8851SNL */ +#define CCR_8BIT (1 << 7) /* KSZ8851-16MLL */ +#define CCR_16BIT (1 << 6) /* KSZ8851-16MLL */ +#define CCR_32BIT (1 << 5) /* KSZ8851-16MLL */ +#define CCR_SHARED (1 << 4) /* KSZ8851-16MLL */ +#define CCR_48PIN (1 << 1) /* KSZ8851-16MLL */ +#define CCR_32PIN (1 << 0) /* KSZ8851SNL */ /* MAC address registers */ #define KS_MAR(_m) (0x15 - (_m)) @@ -112,13 +118,13 @@ #define RXCR1_RXE (1 << 0) #define KS_RXCR2 0x76 -#define RXCR2_SRDBL_MASK (0x7 << 5) -#define RXCR2_SRDBL_SHIFT (5) -#define RXCR2_SRDBL_4B (0x0 << 5) -#define RXCR2_SRDBL_8B (0x1 << 5) -#define RXCR2_SRDBL_16B (0x2 << 5) -#define RXCR2_SRDBL_32B (0x3 << 5) -#define RXCR2_SRDBL_FRAME (0x4 << 5) +#define RXCR2_SRDBL_MASK (0x7 << 5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_SHIFT (5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_4B (0x0 << 5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_8B (0x1 << 5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_16B (0x2 << 5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_32B (0x3 << 5) /* KSZ8851SNL */ +#define RXCR2_SRDBL_FRAME (0x4 << 5) /* KSZ8851SNL */ #define RXCR2_IUFFP (1 << 4) #define RXCR2_RXIUFCEZ (1 << 3) #define RXCR2_UDPLFE (1 << 2) @@ -143,8 +149,10 @@ #define RXFSHR_RXCE (1 << 0) #define KS_RXFHBCR 0x7E +#define RXFHBCR_CNT_MASK (0xfff << 0) + #define KS_TXQCR 0x80 -#define TXQCR_AETFE (1 << 2) +#define TXQCR_AETFE (1 << 2) /* KSZ8851SNL */ #define TXQCR_TXQMAM (1 << 1) #define TXQCR_METFE (1 << 0) @@ -167,6 +175,10 @@ #define KS_RXFDPR 0x86 #define RXFDPR_RXFPAI (1 << 14) +#define RXFDPR_WST (1 << 12) /* KSZ8851-16MLL */ +#define RXFDPR_EMS (1 << 11) /* KSZ8851-16MLL */ +#define RXFDPR_RXFP_MASK (0x7ff << 0) +#define RXFDPR_RXFP_SHIFT (0) #define KS_RXDTTR 0x8C #define KS_RXDBCTR 0x8E @@ -184,7 +196,7 @@ #define IRQ_RXMPDI (1 << 4) #define IRQ_LDI (1 << 3) #define IRQ_EDI (1 << 2) -#define IRQ_SPIBEI (1 << 1) +#define IRQ_SPIBEI (1 << 1) /* KSZ8851SNL */ #define IRQ_DEDI (1 << 0) #define KS_RXFCTR 0x9C @@ -288,13 +300,6 @@ #define P1SR_PNTR_10BT_HDX (1 << 0) /* TX Frame control */ - #define TXFR_TXIC (1 << 15) #define TXFR_TXFID_MASK (0x3f << 0) #define TXFR_TXFID_SHIFT (0) - -/* SPI frame opcodes */ -#define KS_SPIOP_RD (0x00) -#define KS_SPIOP_WR (0x40) -#define KS_SPIOP_RXFIFO (0x80) -#define KS_SPIOP_TXFIFO (0xC0) diff --git a/drivers/net/ethernet/micrel/ks8851_mll.c b/drivers/net/ethernet/micrel/ks8851_mll.c index 35f8c9ef204d..c946841c0a06 100644 --- a/drivers/net/ethernet/micrel/ks8851_mll.c +++ b/drivers/net/ethernet/micrel/ks8851_mll.c @@ -40,6 +40,8 @@ #include #include +#include "ks8851.h" + #define DRV_NAME "ks8851_mll" static u8 KS_DEFAULT_MAC_ADDRESS[] = { 0x00, 0x10, 0xA1, 0x86, 0x95, 0x11 }; @@ -48,319 +50,10 @@ static u8 KS_DEFAULT_MAC_ADDRESS[] = { 0x00, 0x10, 0xA1, 0x86, 0x95, 0x11 }; #define TX_BUF_SIZE 2000 #define RX_BUF_SIZE 2000 -#define KS_CCR 0x08 -#define CCR_EEPROM (1 << 9) -#define CCR_SPI (1 << 8) -#define CCR_8BIT (1 << 7) -#define CCR_16BIT (1 << 6) -#define CCR_32BIT (1 << 5) -#define CCR_SHARED (1 << 4) -#define CCR_32PIN (1 << 0) - -/* MAC address registers */ -#define KS_MARL 0x10 -#define KS_MARM 0x12 -#define KS_MARH 0x14 - -#define KS_OBCR 0x20 -#define OBCR_ODS_16MA (1 << 6) - -#define KS_EEPCR 0x22 -#define EEPCR_EESA (1 << 4) -#define EEPCR_EESB (1 << 3) -#define EEPCR_EEDO (1 << 2) -#define EEPCR_EESCK (1 << 1) -#define EEPCR_EECS (1 << 0) - -#define KS_MBIR 0x24 -#define MBIR_TXMBF (1 << 12) -#define MBIR_TXMBFA (1 << 11) -#define MBIR_RXMBF (1 << 4) -#define MBIR_RXMBFA (1 << 3) - -#define KS_GRR 0x26 -#define GRR_QMU (1 << 1) -#define GRR_GSR (1 << 0) - -#define KS_WFCR 0x2A -#define WFCR_MPRXE (1 << 7) -#define WFCR_WF3E (1 << 3) -#define WFCR_WF2E (1 << 2) -#define WFCR_WF1E (1 << 1) -#define WFCR_WF0E (1 << 0) - -#define KS_WF0CRC0 0x30 -#define KS_WF0CRC1 0x32 -#define KS_WF0BM0 0x34 -#define KS_WF0BM1 0x36 -#define KS_WF0BM2 0x38 -#define KS_WF0BM3 0x3A - -#define KS_WF1CRC0 0x40 -#define KS_WF1CRC1 0x42 -#define KS_WF1BM0 0x44 -#define KS_WF1BM1 0x46 -#define KS_WF1BM2 0x48 -#define KS_WF1BM3 0x4A - -#define KS_WF2CRC0 0x50 -#define KS_WF2CRC1 0x52 -#define KS_WF2BM0 0x54 -#define KS_WF2BM1 0x56 -#define KS_WF2BM2 0x58 -#define KS_WF2BM3 0x5A - -#define KS_WF3CRC0 0x60 -#define KS_WF3CRC1 0x62 -#define KS_WF3BM0 0x64 -#define KS_WF3BM1 0x66 -#define KS_WF3BM2 0x68 -#define KS_WF3BM3 0x6A - -#define KS_TXCR 0x70 -#define TXCR_TCGICMP (1 << 8) -#define TXCR_TCGUDP (1 << 7) -#define TXCR_TCGTCP (1 << 6) -#define TXCR_TCGIP (1 << 5) -#define TXCR_FTXQ (1 << 4) -#define TXCR_TXFCE (1 << 3) -#define TXCR_TXPE (1 << 2) -#define TXCR_TXCRC (1 << 1) -#define TXCR_TXE (1 << 0) - -#define KS_TXSR 0x72 -#define TXSR_TXLC (1 << 13) -#define TXSR_TXMC (1 << 12) -#define TXSR_TXFID_MASK (0x3f << 0) -#define TXSR_TXFID_SHIFT (0) -#define TXSR_TXFID_GET(_v) (((_v) >> 0) & 0x3f) - - -#define KS_RXCR1 0x74 -#define RXCR1_FRXQ (1 << 15) -#define RXCR1_RXUDPFCC (1 << 14) -#define RXCR1_RXTCPFCC (1 << 13) -#define RXCR1_RXIPFCC (1 << 12) -#define RXCR1_RXPAFMA (1 << 11) -#define RXCR1_RXFCE (1 << 10) -#define RXCR1_RXEFE (1 << 9) -#define RXCR1_RXMAFMA (1 << 8) -#define RXCR1_RXBE (1 << 7) -#define RXCR1_RXME (1 << 6) -#define RXCR1_RXUE (1 << 5) -#define RXCR1_RXAE (1 << 4) -#define RXCR1_RXINVF (1 << 1) -#define RXCR1_RXE (1 << 0) #define RXCR1_FILTER_MASK (RXCR1_RXINVF | RXCR1_RXAE | \ RXCR1_RXMAFMA | RXCR1_RXPAFMA) - -#define KS_RXCR2 0x76 -#define RXCR2_SRDBL_MASK (0x7 << 5) -#define RXCR2_SRDBL_SHIFT (5) -#define RXCR2_SRDBL_4B (0x0 << 5) -#define RXCR2_SRDBL_8B (0x1 << 5) -#define RXCR2_SRDBL_16B (0x2 << 5) -#define RXCR2_SRDBL_32B (0x3 << 5) -/* #define RXCR2_SRDBL_FRAME (0x4 << 5) */ -#define RXCR2_IUFFP (1 << 4) -#define RXCR2_RXIUFCEZ (1 << 3) -#define RXCR2_UDPLFE (1 << 2) -#define RXCR2_RXICMPFCC (1 << 1) -#define RXCR2_RXSAF (1 << 0) - -#define KS_TXMIR 0x78 - -#define KS_RXFHSR 0x7C -#define RXFSHR_RXFV (1 << 15) -#define RXFSHR_RXICMPFCS (1 << 13) -#define RXFSHR_RXIPFCS (1 << 12) -#define RXFSHR_RXTCPFCS (1 << 11) -#define RXFSHR_RXUDPFCS (1 << 10) -#define RXFSHR_RXBF (1 << 7) -#define RXFSHR_RXMF (1 << 6) -#define RXFSHR_RXUF (1 << 5) -#define RXFSHR_RXMR (1 << 4) -#define RXFSHR_RXFT (1 << 3) -#define RXFSHR_RXFTL (1 << 2) -#define RXFSHR_RXRF (1 << 1) -#define RXFSHR_RXCE (1 << 0) -#define RXFSHR_ERR (RXFSHR_RXCE | RXFSHR_RXRF |\ - RXFSHR_RXFTL | RXFSHR_RXMR |\ - RXFSHR_RXICMPFCS | RXFSHR_RXIPFCS |\ - RXFSHR_RXTCPFCS) -#define KS_RXFHBCR 0x7E -#define RXFHBCR_CNT_MASK 0x0FFF - -#define KS_TXQCR 0x80 -#define TXQCR_AETFE (1 << 2) -#define TXQCR_TXQMAM (1 << 1) -#define TXQCR_METFE (1 << 0) - -#define KS_RXQCR 0x82 -#define RXQCR_RXDTTS (1 << 12) -#define RXQCR_RXDBCTS (1 << 11) -#define RXQCR_RXFCTS (1 << 10) -#define RXQCR_RXIPHTOE (1 << 9) -#define RXQCR_RXDTTE (1 << 7) -#define RXQCR_RXDBCTE (1 << 6) -#define RXQCR_RXFCTE (1 << 5) -#define RXQCR_ADRFE (1 << 4) -#define RXQCR_SDA (1 << 3) -#define RXQCR_RRXEF (1 << 0) #define RXQCR_CMD_CNTL (RXQCR_RXFCTE|RXQCR_ADRFE) -#define KS_TXFDPR 0x84 -#define TXFDPR_TXFPAI (1 << 14) -#define TXFDPR_TXFP_MASK (0x7ff << 0) -#define TXFDPR_TXFP_SHIFT (0) - -#define KS_RXFDPR 0x86 -#define RXFDPR_RXFPAI (1 << 14) - -#define KS_RXDTTR 0x8C -#define KS_RXDBCTR 0x8E - -#define KS_IER 0x90 -#define KS_ISR 0x92 -#define IRQ_LCI (1 << 15) -#define IRQ_TXI (1 << 14) -#define IRQ_RXI (1 << 13) -#define IRQ_RXOI (1 << 11) -#define IRQ_TXPSI (1 << 9) -#define IRQ_RXPSI (1 << 8) -#define IRQ_TXSAI (1 << 6) -#define IRQ_RXWFDI (1 << 5) -#define IRQ_RXMPDI (1 << 4) -#define IRQ_LDI (1 << 3) -#define IRQ_EDI (1 << 2) -#define IRQ_SPIBEI (1 << 1) -#define IRQ_DEDI (1 << 0) - -#define KS_RXFCTR 0x9C -#define RXFCTR_THRESHOLD_MASK 0x00FF - -#define KS_RXFC 0x9D -#define RXFCTR_RXFC_MASK (0xff << 8) -#define RXFCTR_RXFC_SHIFT (8) -#define RXFCTR_RXFC_GET(_v) (((_v) >> 8) & 0xff) -#define RXFCTR_RXFCT_MASK (0xff << 0) -#define RXFCTR_RXFCT_SHIFT (0) - -#define KS_TXNTFSR 0x9E - -#define KS_MAHTR0 0xA0 -#define KS_MAHTR1 0xA2 -#define KS_MAHTR2 0xA4 -#define KS_MAHTR3 0xA6 - -#define KS_FCLWR 0xB0 -#define KS_FCHWR 0xB2 -#define KS_FCOWR 0xB4 - -#define KS_CIDER 0xC0 -#define CIDER_ID 0x8870 -#define CIDER_REV_MASK (0x7 << 1) -#define CIDER_REV_SHIFT (1) -#define CIDER_REV_GET(_v) (((_v) >> 1) & 0x7) - -#define KS_CGCR 0xC6 -#define KS_IACR 0xC8 -#define IACR_RDEN (1 << 12) -#define IACR_TSEL_MASK (0x3 << 10) -#define IACR_TSEL_SHIFT (10) -#define IACR_TSEL_MIB (0x3 << 10) -#define IACR_ADDR_MASK (0x1f << 0) -#define IACR_ADDR_SHIFT (0) - -#define KS_IADLR 0xD0 -#define KS_IAHDR 0xD2 - -#define KS_PMECR 0xD4 -#define PMECR_PME_DELAY (1 << 14) -#define PMECR_PME_POL (1 << 12) -#define PMECR_WOL_WAKEUP (1 << 11) -#define PMECR_WOL_MAGICPKT (1 << 10) -#define PMECR_WOL_LINKUP (1 << 9) -#define PMECR_WOL_ENERGY (1 << 8) -#define PMECR_AUTO_WAKE_EN (1 << 7) -#define PMECR_WAKEUP_NORMAL (1 << 6) -#define PMECR_WKEVT_MASK (0xf << 2) -#define PMECR_WKEVT_SHIFT (2) -#define PMECR_WKEVT_GET(_v) (((_v) >> 2) & 0xf) -#define PMECR_WKEVT_ENERGY (0x1 << 2) -#define PMECR_WKEVT_LINK (0x2 << 2) -#define PMECR_WKEVT_MAGICPKT (0x4 << 2) -#define PMECR_WKEVT_FRAME (0x8 << 2) -#define PMECR_PM_MASK (0x3 << 0) -#define PMECR_PM_SHIFT (0) -#define PMECR_PM_NORMAL (0x0 << 0) -#define PMECR_PM_ENERGY (0x1 << 0) -#define PMECR_PM_SOFTDOWN (0x2 << 0) -#define PMECR_PM_POWERSAVE (0x3 << 0) - -/* Standard MII PHY data */ -#define KS_P1MBCR 0xE4 -#define P1MBCR_FORCE_FDX (1 << 8) - -#define KS_P1MBSR 0xE6 -#define P1MBSR_AN_COMPLETE (1 << 5) -#define P1MBSR_AN_CAPABLE (1 << 3) -#define P1MBSR_LINK_UP (1 << 2) - -#define KS_PHY1ILR 0xE8 -#define KS_PHY1IHR 0xEA -#define KS_P1ANAR 0xEC -#define KS_P1ANLPR 0xEE - -#define KS_P1SCLMD 0xF4 -#define P1SCLMD_LEDOFF (1 << 15) -#define P1SCLMD_TXIDS (1 << 14) -#define P1SCLMD_RESTARTAN (1 << 13) -#define P1SCLMD_DISAUTOMDIX (1 << 10) -#define P1SCLMD_FORCEMDIX (1 << 9) -#define P1SCLMD_AUTONEGEN (1 << 7) -#define P1SCLMD_FORCE100 (1 << 6) -#define P1SCLMD_FORCEFDX (1 << 5) -#define P1SCLMD_ADV_FLOW (1 << 4) -#define P1SCLMD_ADV_100BT_FDX (1 << 3) -#define P1SCLMD_ADV_100BT_HDX (1 << 2) -#define P1SCLMD_ADV_10BT_FDX (1 << 1) -#define P1SCLMD_ADV_10BT_HDX (1 << 0) - -#define KS_P1CR 0xF6 -#define P1CR_HP_MDIX (1 << 15) -#define P1CR_REV_POL (1 << 13) -#define P1CR_OP_100M (1 << 10) -#define P1CR_OP_FDX (1 << 9) -#define P1CR_OP_MDI (1 << 7) -#define P1CR_AN_DONE (1 << 6) -#define P1CR_LINK_GOOD (1 << 5) -#define P1CR_PNTR_FLOW (1 << 4) -#define P1CR_PNTR_100BT_FDX (1 << 3) -#define P1CR_PNTR_100BT_HDX (1 << 2) -#define P1CR_PNTR_10BT_FDX (1 << 1) -#define P1CR_PNTR_10BT_HDX (1 << 0) - -/* TX Frame control */ - -#define TXFR_TXIC (1 << 15) -#define TXFR_TXFID_MASK (0x3f << 0) -#define TXFR_TXFID_SHIFT (0) - -#define KS_P1SR 0xF8 -#define P1SR_HP_MDIX (1 << 15) -#define P1SR_REV_POL (1 << 13) -#define P1SR_OP_100M (1 << 10) -#define P1SR_OP_FDX (1 << 9) -#define P1SR_OP_MDI (1 << 7) -#define P1SR_AN_DONE (1 << 6) -#define P1SR_LINK_GOOD (1 << 5) -#define P1SR_PNTR_FLOW (1 << 4) -#define P1SR_PNTR_100BT_FDX (1 << 3) -#define P1SR_PNTR_100BT_HDX (1 << 2) -#define P1SR_PNTR_10BT_FDX (1 << 1) -#define P1SR_PNTR_10BT_HDX (1 << 0) - #define ENUM_BUS_NONE 0 #define ENUM_BUS_8BIT 1 #define ENUM_BUS_16BIT 2 @@ -1475,7 +1168,7 @@ static void ks_setup(struct ks_net *ks) ks_wrreg16(ks, KS_RXFDPR, RXFDPR_RXFPAI); /* Setup Receive Frame Threshold - 1 frame (RXFCTFC) */ - ks_wrreg16(ks, KS_RXFCTR, 1 & RXFCTR_THRESHOLD_MASK); + ks_wrreg16(ks, KS_RXFCTR, 1 & RXFCTR_RXFCT_MASK); /* Setup RxQ Command Control (RXQCR) */ ks->rc_rxqcr = RXQCR_CMD_CNTL; @@ -1488,7 +1181,7 @@ static void ks_setup(struct ks_net *ks) */ w = ks_rdreg16(ks, KS_P1MBCR); - w &= ~P1MBCR_FORCE_FDX; + w &= ~BMCR_FULLDPLX; ks_wrreg16(ks, KS_P1MBCR, w); w = TXCR_TXFCE | TXCR_TXPE | TXCR_TXCRC | TXCR_TCGIP; @@ -1629,7 +1322,7 @@ static int ks8851_probe(struct platform_device *pdev) ks_setup_int(ks); data = ks_rdreg16(ks, KS_OBCR); - ks_wrreg16(ks, KS_OBCR, data | OBCR_ODS_16MA); + ks_wrreg16(ks, KS_OBCR, data | OBCR_ODS_16mA); /* overwriting the default MAC address */ if (pdev->dev.of_node) { From 64447506f152cf0f88a0fc23140ca1c5f7ff34a8 Mon Sep 17 00:00:00 2001 From: Ioana Ciocoi Radulescu Date: Wed, 20 Mar 2019 14:11:04 +0000 Subject: [PATCH 203/468] dpaa2-eth: Fix possible access beyond end of array Make sure we don't try to enqueue XDP_REDIRECT frames to an inexistent FQ. While it is guaranteed not to have more than one queue per core, having fewer queues than CPUs on an interface is a valid configuration. Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support") Reported-by: Jesper Dangaard Brouer Signed-off-by: Ioana Radulescu Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 2ba49e959c3f..1a68052abb94 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -1817,7 +1817,7 @@ static int dpaa2_eth_xdp_xmit_frame(struct net_device *net_dev, dpaa2_fd_set_format(&fd, dpaa2_fd_single); dpaa2_fd_set_ctrl(&fd, FD_CTRL_PTA); - fq = &priv->fq[smp_processor_id()]; + fq = &priv->fq[smp_processor_id() % dpaa2_eth_queue_count(priv)]; for (i = 0; i < DPAA2_ETH_ENQUEUE_RETRIES; i++) { err = priv->enqueue(priv, fq, &fd, 0); if (err != -EBUSY) From 688931a5ad4e55ba0c215248ba510cd67bc3afb4 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 19 Mar 2019 13:02:36 +0900 Subject: [PATCH 204/468] kbuild: skip sub-make for in-tree build with GNU Make 4.x Commit 2b50f7ab6368 ("kbuild: add workaround for Debian make-kpkg") annoyed people who want to wrap the top Makefile with GNUmakefile to customize it for their use. On second thought, we do not need to run the sub-make for in-tree build with Make 4.x because the 'MAKEFLAGS += -rR' issue only happens on GNU Make 3.x. With this commit, people will get back their workflow, and the Debian make-kpkg will still work. Fixes: 2b50f7ab6368 ("kbuild: add workaround for Debian make-kpkg") Reported-by: Andreas Schwab Reported-by: David Howells Signed-off-by: Masahiro Yamada Tested-by: Andreas Schwab Tested-by: David Howells --- Makefile | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/Makefile b/Makefile index 8c6acfdfe660..6c0635f69982 100644 --- a/Makefile +++ b/Makefile @@ -31,26 +31,12 @@ _all: # descending is started. They are now explicitly listed as the # prepare rule. -# Ugly workaround for Debian make-kpkg: -# make-kpkg directly includes the top Makefile of Linux kernel. In such a case, -# skip sub-make to support debian_* targets in ruleset/kernel_version.mk, but -# displays warning to discourage such abusage. -ifneq ($(word 2, $(MAKEFILE_LIST)),) -$(warning Do not include top Makefile of Linux Kernel) -sub-make-done := 1 -MAKEFLAGS += -rR -endif - ifneq ($(sub-make-done),1) # Do not use make's built-in rules and variables # (this increases performance and avoids hard-to-debug behaviour) MAKEFLAGS += -rR -# 'MAKEFLAGS += -rR' does not become immediately effective for old -# GNU Make versions. Cancel implicit rules for this Makefile. -$(lastword $(MAKEFILE_LIST)): ; - # Avoid funny character set dependencies unexport LC_ALL LC_COLLATE=C @@ -153,6 +139,7 @@ $(if $(KBUILD_OUTPUT),, \ # 'sub-make' below. MAKEFLAGS += --include-dir=$(CURDIR) +need-sub-make := 1 else # Do not print "Entering directory ..." at all for in-tree build. @@ -160,6 +147,16 @@ MAKEFLAGS += --no-print-directory endif # ifneq ($(KBUILD_OUTPUT),) +ifneq ($(filter 3.%,$(MAKE_VERSION)),) +# 'MAKEFLAGS += -rR' does not immediately become effective for GNU Make 3.x +# We need to invoke sub-make to avoid implicit rules in the top Makefile. +need-sub-make := 1 +# Cancel implicit rules for this Makefile. +$(lastword $(MAKEFILE_LIST)): ; +endif + +ifeq ($(need-sub-make),1) + PHONY += $(MAKECMDGOALS) sub-make $(filter-out _all sub-make $(CURDIR)/Makefile, $(MAKECMDGOALS)) _all: sub-make @@ -171,8 +168,11 @@ sub-make: $(if $(KBUILD_OUTPUT),-C $(KBUILD_OUTPUT) KBUILD_SRC=$(CURDIR)) \ -f $(CURDIR)/Makefile $(filter-out _all sub-make,$(MAKECMDGOALS)) -else # sub-make-done +endif # need-sub-make +endif # sub-make-done + # We process the rest of the Makefile if this is the final invocation of make +ifeq ($(need-sub-make),) # Do not print "Entering directory ...", # but we want to display it when entering to the output directory @@ -1757,7 +1757,7 @@ existing-targets := $(wildcard $(sort $(targets))) endif # ifeq ($(config-targets),1) endif # ifeq ($(mixed-targets),1) -endif # sub-make-done +endif # need-sub-make PHONY += FORCE FORCE: From cba368c1f01c27ed62fca7a853531845d263bb01 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Mon, 18 Mar 2019 10:37:13 -0700 Subject: [PATCH 205/468] bpf: Only print ref_obj_id for refcounted reg Naresh reported that test_align fails because of the mismatch at the verbose printout of the register states. The reason is due to the newly added ref_obj_id. ref_obj_id is only useful for refcounted reg. Thus, this patch fixes it by only printing ref_obj_id for refcounted reg. While at it, it also uses comma instead of space to separate between "id" and "ref_obj_id". Fixes: 1b986589680a ("bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_release") Reported-by: Naresh Kamboju Signed-off-by: Martin KaFai Lau Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 86f9cd5d1c4e..5aa810882583 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -352,6 +352,14 @@ static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) map_value_has_spin_lock(reg->map_ptr); } +static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type) +{ + return type == PTR_TO_SOCKET || + type == PTR_TO_SOCKET_OR_NULL || + type == PTR_TO_TCP_SOCK || + type == PTR_TO_TCP_SOCK_OR_NULL; +} + static bool arg_type_may_be_refcounted(enum bpf_arg_type type) { return type == ARG_PTR_TO_SOCK_COMMON; @@ -451,8 +459,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, if (t == PTR_TO_STACK) verbose(env, ",call_%d", func(env, reg)->callsite); } else { - verbose(env, "(id=%d ref_obj_id=%d", reg->id, - reg->ref_obj_id); + verbose(env, "(id=%d", reg->id); + if (reg_type_may_be_refcounted_or_null(t)) + verbose(env, ",ref_obj_id=%d", reg->ref_obj_id); if (t != SCALAR_VALUE) verbose(env, ",off=%d", reg->off); if (type_is_pkt_pointer(t)) From 3123be11683ed8c2f26f787df81966b538ca9f72 Mon Sep 17 00:00:00 2001 From: Nishad Kamdar Date: Mon, 11 Mar 2019 19:57:04 +0530 Subject: [PATCH 206/468] ARM: dts: imx6ull: Use the correct style for SPDX License Identifier This patch corrects the SPDX License Identifier style in imx6ull-pinfunc-snvs.h. Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 and making some manual changes. Suggested-by: Joe Perches Signed-off-by: Nishad Kamdar Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6ull-pinfunc-snvs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/imx6ull-pinfunc-snvs.h b/arch/arm/boot/dts/imx6ull-pinfunc-snvs.h index f6fb6783c193..54cfe72295aa 100644 --- a/arch/arm/boot/dts/imx6ull-pinfunc-snvs.h +++ b/arch/arm/boot/dts/imx6ull-pinfunc-snvs.h @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: GPL-2.0 +/* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2016 Freescale Semiconductor, Inc. * Copyright (C) 2017 NXP From 5997da82145bb7c9a56d834894cb81f81f219344 Mon Sep 17 00:00:00 2001 From: Todd Kjos Date: Wed, 20 Mar 2019 15:35:45 -0700 Subject: [PATCH 207/468] binder: fix BUG_ON found by selinux-testsuite The selinux-testsuite found an issue resulting in a BUG_ON() where a conditional relied on a size_t going negative when checking the validity of a buffer offset. Fixes: 7a67a39320df ("binder: add function to copy binder object from buffer") Reported-by: Paul Moore Tested-by: Paul Moore Signed-off-by: Todd Kjos Signed-off-by: Greg Kroah-Hartman --- drivers/android/binder.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 8685882da64c..4b9c7ca492e6 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -2057,7 +2057,8 @@ static size_t binder_get_object(struct binder_proc *proc, size_t object_size = 0; read_size = min_t(size_t, sizeof(*object), buffer->data_size - offset); - if (read_size < sizeof(*hdr) || !IS_ALIGNED(offset, sizeof(u32))) + if (offset > buffer->data_size || read_size < sizeof(*hdr) || + !IS_ALIGNED(offset, sizeof(u32))) return 0; binder_alloc_copy_from_buffer(&proc->alloc, object, buffer, offset, read_size); From 5cec2d2e5839f9c0fec319c523a911e0a7fd299f Mon Sep 17 00:00:00 2001 From: Todd Kjos Date: Fri, 1 Mar 2019 15:06:06 -0800 Subject: [PATCH 208/468] binder: fix race between munmap() and direct reclaim An munmap() on a binder device causes binder_vma_close() to be called which clears the alloc->vma pointer. If direct reclaim causes binder_alloc_free_page() to be called, there is a race where alloc->vma is read into a local vma pointer and then used later after the mm->mmap_sem is acquired. This can result in calling zap_page_range() with an invalid vma which manifests as a use-after-free in zap_page_range(). The fix is to check alloc->vma after acquiring the mmap_sem (which we were acquiring anyway) and skip zap_page_range() if it has changed to NULL. Signed-off-by: Todd Kjos Reviewed-by: Joel Fernandes (Google) Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/android/binder_alloc.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 6389467670a0..195f120c4e8c 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -927,14 +927,13 @@ enum lru_status binder_alloc_free_page(struct list_head *item, index = page - alloc->pages; page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE; + + mm = alloc->vma_vm_mm; + if (!mmget_not_zero(mm)) + goto err_mmget; + if (!down_write_trylock(&mm->mmap_sem)) + goto err_down_write_mmap_sem_failed; vma = binder_alloc_get_vma(alloc); - if (vma) { - if (!mmget_not_zero(alloc->vma_vm_mm)) - goto err_mmget; - mm = alloc->vma_vm_mm; - if (!down_read_trylock(&mm->mmap_sem)) - goto err_down_write_mmap_sem_failed; - } list_lru_isolate(lru, item); spin_unlock(lock); @@ -945,10 +944,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item, zap_page_range(vma, page_addr, PAGE_SIZE); trace_binder_unmap_user_end(alloc, index); - - up_read(&mm->mmap_sem); - mmput(mm); } + up_write(&mm->mmap_sem); + mmput(mm); trace_binder_unmap_kernel_start(alloc, index); From 7671ce0d92933762f469266daf43bd34d422d58c Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Wed, 20 Mar 2019 12:21:35 -0500 Subject: [PATCH 209/468] staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc hwxmits is allocated via kcalloc and not checked for failure before its dereference. The patch fixes this problem by returning error upstream in rtl8723bs, rtl8188eu. Signed-off-by: Aditya Pakki Acked-by: Mukesh Ojha Reviewed-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_xmit.c | 9 +++++++-- drivers/staging/rtl8188eu/include/rtw_xmit.h | 2 +- drivers/staging/rtl8723bs/core/rtw_xmit.c | 14 +++++++------- drivers/staging/rtl8723bs/include/rtw_xmit.h | 2 +- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c b/drivers/staging/rtl8188eu/core/rtw_xmit.c index 1723a47a96b4..952f2ab51347 100644 --- a/drivers/staging/rtl8188eu/core/rtw_xmit.c +++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c @@ -174,7 +174,9 @@ s32 _rtw_init_xmit_priv(struct xmit_priv *pxmitpriv, struct adapter *padapter) pxmitpriv->free_xmit_extbuf_cnt = num_xmit_extbuf; - rtw_alloc_hwxmits(padapter); + res = rtw_alloc_hwxmits(padapter); + if (res == _FAIL) + goto exit; rtw_init_hwxmits(pxmitpriv->hwxmits, pxmitpriv->hwxmit_entry); for (i = 0; i < 4; i++) @@ -1503,7 +1505,7 @@ s32 rtw_xmit_classifier(struct adapter *padapter, struct xmit_frame *pxmitframe) return res; } -void rtw_alloc_hwxmits(struct adapter *padapter) +s32 rtw_alloc_hwxmits(struct adapter *padapter) { struct hw_xmit *hwxmits; struct xmit_priv *pxmitpriv = &padapter->xmitpriv; @@ -1512,6 +1514,8 @@ void rtw_alloc_hwxmits(struct adapter *padapter) pxmitpriv->hwxmits = kcalloc(pxmitpriv->hwxmit_entry, sizeof(struct hw_xmit), GFP_KERNEL); + if (!pxmitpriv->hwxmits) + return _FAIL; hwxmits = pxmitpriv->hwxmits; @@ -1519,6 +1523,7 @@ void rtw_alloc_hwxmits(struct adapter *padapter) hwxmits[1] .sta_queue = &pxmitpriv->vi_pending; hwxmits[2] .sta_queue = &pxmitpriv->be_pending; hwxmits[3] .sta_queue = &pxmitpriv->bk_pending; + return _SUCCESS; } void rtw_free_hwxmits(struct adapter *padapter) diff --git a/drivers/staging/rtl8188eu/include/rtw_xmit.h b/drivers/staging/rtl8188eu/include/rtw_xmit.h index 788f59c74ea1..ba7e15fbde72 100644 --- a/drivers/staging/rtl8188eu/include/rtw_xmit.h +++ b/drivers/staging/rtl8188eu/include/rtw_xmit.h @@ -336,7 +336,7 @@ s32 rtw_txframes_sta_ac_pending(struct adapter *padapter, void rtw_init_hwxmits(struct hw_xmit *phwxmit, int entry); s32 _rtw_init_xmit_priv(struct xmit_priv *pxmitpriv, struct adapter *padapter); void _rtw_free_xmit_priv(struct xmit_priv *pxmitpriv); -void rtw_alloc_hwxmits(struct adapter *padapter); +s32 rtw_alloc_hwxmits(struct adapter *padapter); void rtw_free_hwxmits(struct adapter *padapter); s32 rtw_xmit(struct adapter *padapter, struct sk_buff **pkt); diff --git a/drivers/staging/rtl8723bs/core/rtw_xmit.c b/drivers/staging/rtl8723bs/core/rtw_xmit.c index 094d61bcb469..b87f13a0b563 100644 --- a/drivers/staging/rtl8723bs/core/rtw_xmit.c +++ b/drivers/staging/rtl8723bs/core/rtw_xmit.c @@ -260,7 +260,9 @@ s32 _rtw_init_xmit_priv(struct xmit_priv *pxmitpriv, struct adapter *padapter) } } - rtw_alloc_hwxmits(padapter); + res = rtw_alloc_hwxmits(padapter); + if (res == _FAIL) + goto exit; rtw_init_hwxmits(pxmitpriv->hwxmits, pxmitpriv->hwxmit_entry); for (i = 0; i < 4; i++) { @@ -2144,7 +2146,7 @@ s32 rtw_xmit_classifier(struct adapter *padapter, struct xmit_frame *pxmitframe) return res; } -void rtw_alloc_hwxmits(struct adapter *padapter) +s32 rtw_alloc_hwxmits(struct adapter *padapter) { struct hw_xmit *hwxmits; struct xmit_priv *pxmitpriv = &padapter->xmitpriv; @@ -2155,10 +2157,8 @@ void rtw_alloc_hwxmits(struct adapter *padapter) pxmitpriv->hwxmits = rtw_zmalloc(sizeof(struct hw_xmit) * pxmitpriv->hwxmit_entry); - if (pxmitpriv->hwxmits == NULL) { - DBG_871X("alloc hwxmits fail!...\n"); - return; - } + if (!pxmitpriv->hwxmits) + return _FAIL; hwxmits = pxmitpriv->hwxmits; @@ -2204,7 +2204,7 @@ void rtw_alloc_hwxmits(struct adapter *padapter) } - + return _SUCCESS; } void rtw_free_hwxmits(struct adapter *padapter) diff --git a/drivers/staging/rtl8723bs/include/rtw_xmit.h b/drivers/staging/rtl8723bs/include/rtw_xmit.h index 1b38b9182b31..37f42b2f22f1 100644 --- a/drivers/staging/rtl8723bs/include/rtw_xmit.h +++ b/drivers/staging/rtl8723bs/include/rtw_xmit.h @@ -487,7 +487,7 @@ s32 _rtw_init_xmit_priv(struct xmit_priv *pxmitpriv, struct adapter *padapter); void _rtw_free_xmit_priv (struct xmit_priv *pxmitpriv); -void rtw_alloc_hwxmits(struct adapter *padapter); +s32 rtw_alloc_hwxmits(struct adapter *padapter); void rtw_free_hwxmits(struct adapter *padapter); From d70d70aec9632679dd00dcc1b1e8b2517e2c7da0 Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Wed, 20 Mar 2019 12:02:49 -0500 Subject: [PATCH 210/468] staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference skb allocated via dev_alloc_skb can fail and return a NULL pointer. This patch avoids such a scenario and returns, consistent with other invocations. Signed-off-by: Aditya Pakki Reviewed-by: Mukesh Ojha Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtlwifi/rtl8822be/fw.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/staging/rtlwifi/rtl8822be/fw.c b/drivers/staging/rtlwifi/rtl8822be/fw.c index f061dd1382aa..cf6b7a80b753 100644 --- a/drivers/staging/rtlwifi/rtl8822be/fw.c +++ b/drivers/staging/rtlwifi/rtl8822be/fw.c @@ -743,6 +743,8 @@ void rtl8822be_set_fw_rsvdpagepkt(struct ieee80211_hw *hw, bool b_dl_finished) u1_rsvd_page_loc, 3); skb = dev_alloc_skb(totalpacketlen); + if (!skb) + return; memcpy((u8 *)skb_put(skb, totalpacketlen), &reserved_page_packet, totalpacketlen); From 22c971db7dd4b0ad8dd88e99c407f7a1f4231a2e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 21 Mar 2019 09:26:38 +0300 Subject: [PATCH 211/468] staging: rtl8712: uninitialized memory in read_bbreg_hdl() Colin King reported a bug in read_bbreg_hdl(): memcpy(pcmd->rsp, (u8 *)&val, pcmd->rspsz); The problem is that "val" is uninitialized. This code is obviously not useful, but so far as I can tell "pcmd->cmdcode" is never GEN_CMD_CODE(_Read_BBREG) so it's not harmful either. For now the easiest fix is to just call r8712_free_cmd_obj() and return. Fixes: 2865d42c78a9 ("staging: r8712u: Add the new driver to the mainline kernel") Reported-by: Colin Ian King Signed-off-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8712/rtl8712_cmd.c | 10 +--------- drivers/staging/rtl8712/rtl8712_cmd.h | 2 +- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/staging/rtl8712/rtl8712_cmd.c b/drivers/staging/rtl8712/rtl8712_cmd.c index 1920d02f7c9f..8c36acedf507 100644 --- a/drivers/staging/rtl8712/rtl8712_cmd.c +++ b/drivers/staging/rtl8712/rtl8712_cmd.c @@ -147,17 +147,9 @@ static u8 write_macreg_hdl(struct _adapter *padapter, u8 *pbuf) static u8 read_bbreg_hdl(struct _adapter *padapter, u8 *pbuf) { - u32 val; - void (*pcmd_callback)(struct _adapter *dev, struct cmd_obj *pcmd); struct cmd_obj *pcmd = (struct cmd_obj *)pbuf; - if (pcmd->rsp && pcmd->rspsz > 0) - memcpy(pcmd->rsp, (u8 *)&val, pcmd->rspsz); - pcmd_callback = cmd_callback[pcmd->cmdcode].callback; - if (!pcmd_callback) - r8712_free_cmd_obj(pcmd); - else - pcmd_callback(padapter, pcmd); + r8712_free_cmd_obj(pcmd); return H2C_SUCCESS; } diff --git a/drivers/staging/rtl8712/rtl8712_cmd.h b/drivers/staging/rtl8712/rtl8712_cmd.h index 92fb77666d44..1ef86b8c592f 100644 --- a/drivers/staging/rtl8712/rtl8712_cmd.h +++ b/drivers/staging/rtl8712/rtl8712_cmd.h @@ -140,7 +140,7 @@ enum rtl8712_h2c_cmd { static struct _cmd_callback cmd_callback[] = { {GEN_CMD_CODE(_Read_MACREG), NULL}, /*0*/ {GEN_CMD_CODE(_Write_MACREG), NULL}, - {GEN_CMD_CODE(_Read_BBREG), &r8712_getbbrfreg_cmdrsp_callback}, + {GEN_CMD_CODE(_Read_BBREG), NULL}, {GEN_CMD_CODE(_Write_BBREG), NULL}, {GEN_CMD_CODE(_Read_RFREG), &r8712_getbbrfreg_cmdrsp_callback}, {GEN_CMD_CODE(_Write_RFREG), NULL}, /*5*/ From 6a8ca24590a2136921439b376c926c11a6effc0e Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Wed, 20 Mar 2019 10:42:32 -0500 Subject: [PATCH 212/468] staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc phydm.internal is allocated using kzalloc which is used multiple times without a check for NULL pointer. This patch avoids such a scenario by returning 0, consistent with the failure case. Signed-off-by: Aditya Pakki Reviewed-by: Mukesh Ojha Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtlwifi/phydm/rtl_phydm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/staging/rtlwifi/phydm/rtl_phydm.c b/drivers/staging/rtlwifi/phydm/rtl_phydm.c index 9930ed954abb..4cc77b2016e1 100644 --- a/drivers/staging/rtlwifi/phydm/rtl_phydm.c +++ b/drivers/staging/rtlwifi/phydm/rtl_phydm.c @@ -180,6 +180,8 @@ static int rtl_phydm_init_priv(struct rtl_priv *rtlpriv, rtlpriv->phydm.internal = kzalloc(sizeof(struct phy_dm_struct), GFP_KERNEL); + if (!rtlpriv->phydm.internal) + return 0; _rtl_phydm_init_com_info(rtlpriv, ic, params); From 0803278b0b4d8eeb2b461fb698785df65a725d9e Mon Sep 17 00:00:00 2001 From: Xu Yu Date: Thu, 21 Mar 2019 18:00:35 +0800 Subject: [PATCH 213/468] bpf: do not restore dst_reg when cur_state is freed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug. Call trace: dump_stack+0xbf/0x12e print_address_description+0x6a/0x280 kasan_report+0x237/0x360 sanitize_ptr_alu+0x85a/0x8d0 adjust_ptr_min_max_vals+0x8f2/0x1ca0 adjust_reg_min_max_vals+0x8ed/0x22e0 do_check+0x1ca6/0x5d00 bpf_check+0x9ca/0x2570 bpf_prog_load+0xc91/0x1030 __se_sys_bpf+0x61e/0x1f00 do_syscall_64+0xc8/0x550 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fault injection trace:  kfree+0xea/0x290  free_func_state+0x4a/0x60  free_verifier_state+0x61/0xe0  push_stack+0x216/0x2f0 <- inject failslab  sanitize_ptr_alu+0x2b1/0x8d0  adjust_ptr_min_max_vals+0x8f2/0x1ca0  adjust_reg_min_max_vals+0x8ed/0x22e0  do_check+0x1ca6/0x5d00  bpf_check+0x9ca/0x2570  bpf_prog_load+0xc91/0x1030  __se_sys_bpf+0x61e/0x1f00  do_syscall_64+0xc8/0x550  entry_SYSCALL_64_after_hwframe+0x49/0xbe When kzalloc() fails in push_stack(), free_verifier_state() will free current verifier state. As push_stack() returns, dst_reg was restored if ptr_is_dst_reg is false. However, as member of the cur_state, dst_reg is also freed, and error occurs when dereferencing dst_reg. Simply fix it by testing ret of push_stack() before restoring dst_reg. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Xu Yu Signed-off-by: Daniel Borkmann --- kernel/bpf/verifier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5aa810882583..f19d5e04c69d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3381,7 +3381,7 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env, *dst_reg = *ptr_reg; } ret = push_stack(env, env->insn_idx + 1, env->insn_idx, true); - if (!ptr_is_dst_reg) + if (!ptr_is_dst_reg && ret) *dst_reg = tmp; return !ret ? -EFAULT : 0; } From 2b1d9c8f87235f593826b9cf46ec10247741fff9 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Wed, 20 Mar 2019 16:15:24 -0500 Subject: [PATCH 214/468] ALSA: rawmidi: Fix potential Spectre v1 vulnerability info->stream is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: sound/core/rawmidi.c:604 __snd_rawmidi_info_select() warn: potential spectre issue 'rmidi->streams' [r] (local cap) Fix this by sanitizing info->stream before using it to index rmidi->streams. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/ Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: Takashi Iwai --- sound/core/rawmidi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c index ee601d7f0926..c0690d1ecd55 100644 --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -601,6 +602,7 @@ static int __snd_rawmidi_info_select(struct snd_card *card, return -ENXIO; if (info->stream < 0 || info->stream > 1) return -EINVAL; + info->stream = array_index_nospec(info->stream, 2); pstr = &rmidi->streams[info->stream]; if (pstr->substream_count == 0) return -ENOENT; From c709f14f0616482b67f9fbcb965e1493a03ff30b Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Wed, 20 Mar 2019 18:42:01 -0500 Subject: [PATCH 215/468] ALSA: seq: oss: Fix Spectre v1 vulnerability dev is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: sound/core/seq/oss/seq_oss_synth.c:626 snd_seq_oss_synth_make_info() warn: potential spectre issue 'dp->synths' [w] (local cap) Fix this by sanitizing dev before using it to index dp->synths. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/ Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: Takashi Iwai --- sound/core/seq/oss/seq_oss_synth.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sound/core/seq/oss/seq_oss_synth.c b/sound/core/seq/oss/seq_oss_synth.c index 278ebb993122..c93945917235 100644 --- a/sound/core/seq/oss/seq_oss_synth.c +++ b/sound/core/seq/oss/seq_oss_synth.c @@ -617,13 +617,14 @@ int snd_seq_oss_synth_make_info(struct seq_oss_devinfo *dp, int dev, struct synth_info *inf) { struct seq_oss_synth *rec; + struct seq_oss_synthinfo *info = get_synthinfo_nospec(dp, dev); - if (dev < 0 || dev >= dp->max_synthdev) + if (!info) return -ENXIO; - if (dp->synths[dev].is_midi) { + if (info->is_midi) { struct midi_info minf; - snd_seq_oss_midi_make_info(dp, dp->synths[dev].midi_mapped, &minf); + snd_seq_oss_midi_make_info(dp, info->midi_mapped, &minf); inf->synth_type = SYNTH_TYPE_MIDI; inf->synth_subtype = 0; inf->nr_voices = 16; From 2733ccebf4a937a0858e7d05a4a003b89715033f Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Thu, 21 Mar 2019 16:39:04 +0800 Subject: [PATCH 216/468] ALSA: hda/realtek: Enable headset MIC of Acer Aspire Z24-890 with ALC286 The Acer Aspire Z24-890 cannot detect the headset MIC until ALC286_FIXUP_ACER_AIO_HEADSET_MIC quirk applied. Signed-off-by: Jian-Hong Pan Signed-off-by: Daniel Drake Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 191830d4fa40..916bb5a3f324 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6715,6 +6715,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1025, 0x128f, "Acer Veriton Z6860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), + SND_PCI_QUIRK(0x1025, 0x1308, "Acer Aspire Z24-890", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1330, "Acer TravelMate X514-51T", ALC255_FIXUP_ACER_HEADSET_MIC), SND_PCI_QUIRK(0x1028, 0x0470, "Dell M101z", ALC269_FIXUP_DELL_M101Z), SND_PCI_QUIRK(0x1028, 0x054b, "Dell XPS one 2710", ALC275_FIXUP_DELL_XPS), From c7531e31c8a440b5fe6bd62664def5bcb6262f96 Mon Sep 17 00:00:00 2001 From: Chris Chiu Date: Thu, 21 Mar 2019 17:17:31 +0800 Subject: [PATCH 217/468] ALSA: hda/realtek - Add support for Acer Aspire E5-523G/ES1-432 headset mic The Acer laptop Aspire E5-523G and ES1-432 with ALC255 can't detect the headset microphone until ALC255_FIXUP_ACER_MIC_NO_PRESENCE quirk applied. Signed-off-by: Chris Chiu Signed-off-by: Daniel Drake Signed-off-by: Jian-Hong Pan Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 916bb5a3f324..6042ddf2b2ae 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6712,6 +6712,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1025, 0x079b, "Acer Aspire V5-573G", ALC282_FIXUP_ASPIRE_V5_PINS), SND_PCI_QUIRK(0x1025, 0x102b, "Acer Aspire C24-860", ALC286_FIXUP_ACER_AIO_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x106d, "Acer Cloudbook 14", ALC283_FIXUP_CHROME_BOOK), + SND_PCI_QUIRK(0x1025, 0x1099, "Acer Aspire E5-523G", ALC255_FIXUP_ACER_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1025, 0x110e, "Acer Aspire ES1-432", ALC255_FIXUP_ACER_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x128f, "Acer Veriton Z6860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), From 33872d79f5d1cbedaaab79669cc38f16097a9450 Mon Sep 17 00:00:00 2001 From: Erik Hugne Date: Thu, 21 Mar 2019 09:11:59 +0100 Subject: [PATCH 218/468] tipc: fix cancellation of topology subscriptions When cancelling a subscription, we have to clear the cancel bit in the request before iterating over any established subscriptions with memcmp. Otherwise no subscription will ever be found, and it will not be possible to explicitly unsubscribe individual subscriptions. Fixes: 8985ecc7c1e0 ("tipc: simplify endianness handling in topology subscriber") Signed-off-by: Erik Hugne Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/topsrv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index 4a708a4e8583..b45932d78004 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -363,6 +363,7 @@ static int tipc_conn_rcv_sub(struct tipc_topsrv *srv, struct tipc_subscription *sub; if (tipc_sub_read(s, filter) & TIPC_SUB_CANCEL) { + s->filter &= __constant_ntohl(~TIPC_SUB_CANCEL); tipc_conn_delete_sub(con, s); return 0; } From ceabee6c59943bdd5e1da1a6a20dc7ee5f8113a2 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 21 Mar 2019 15:02:50 +0800 Subject: [PATCH 219/468] genetlink: Fix a memory leak on error path In genl_register_family(), when idr_alloc() fails, we forget to free the memory we possibly allocate for family->attrbuf. Reported-by: Hulk Robot Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: YueHaibing Reviewed-by: Kirill Tkhai Signed-off-by: David S. Miller --- net/netlink/genetlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 25eeb6d2a75a..f0ec068e1d02 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -366,7 +366,7 @@ int genl_register_family(struct genl_family *family) start, end + 1, GFP_KERNEL); if (family->id < 0) { err = family->id; - goto errout_locked; + goto errout_free; } err = genl_validate_assign_mc_groups(family); @@ -385,6 +385,7 @@ int genl_register_family(struct genl_family *family) errout_remove: idr_remove(&genl_fam_idr, family->id); +errout_free: kfree(family->attrbuf); errout_locked: genl_unlock_all(); From 0ab925d3690614aa44cd29fb84cdcef03eab97dc Mon Sep 17 00:00:00 2001 From: Nicholas Kazlauskas Date: Thu, 21 Mar 2019 11:53:45 -0400 Subject: [PATCH 220/468] drm/amd/display: Only allow VRR when vrefresh is within supported range [Why] Black screens or artifacting can occur when enabling FreeSync outside of the supported range of the monitor. This can happen since the supported range isn't always the min/max vrefresh range available for the monitor. [How] There was previously a fix that prevented this from happening in the low range but it didn't cover the upper range. Expand the condition to include both. Cc: Sun peng Li Cc: Harry Wentland Signed-off-by: Nicholas Kazlauskas Acked-by: Alex Deucher Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index fb27783d7a54..81127f7d6ed1 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5429,9 +5429,11 @@ static void get_freesync_config_for_crtc( struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(new_con_state->base.connector); struct drm_display_mode *mode = &new_crtc_state->base.mode; + int vrefresh = drm_mode_vrefresh(mode); new_crtc_state->vrr_supported = new_con_state->freesync_capable && - aconnector->min_vfreq <= drm_mode_vrefresh(mode); + vrefresh >= aconnector->min_vfreq && + vrefresh <= aconnector->max_vfreq; if (new_crtc_state->vrr_supported) { new_crtc_state->stream->ignore_msa_timing_param = true; From 69903dfae0310afe8a15f5cd4e376ebb7c6da1d2 Mon Sep 17 00:00:00 2001 From: Manasi Navare Date: Tue, 19 Mar 2019 15:18:47 -0700 Subject: [PATCH 221/468] drm/i915/icl: Fix the TRANS_DDI_FUNC_CTL2 bitfield macro This patch fixes the PORT_SYNC_MODE_MASTER_SELECT macro to correctly do the left shifting to set the port sync master select correctly. I have tested this fix on ICL. Fixes: 49edbd49786e ("drm/i915/icl: Define TRANS_DDI_FUNC_CTL DSI registers") Cc: Madhav Chauhan Cc: Jani Nikula Cc: # v5.0+ Signed-off-by: Manasi Navare Reviewed-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20190319221847.21311-1-manasi.d.navare@intel.com (cherry picked from commit 7264aebb81d15aa6bbed650c816bba90f026bc35) Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/i915_reg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 638a586469f9..684589acc368 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -9243,7 +9243,7 @@ enum skl_power_gate { #define TRANS_DDI_FUNC_CTL2(tran) _MMIO_TRANS2(tran, \ _TRANS_DDI_FUNC_CTL2_A) #define PORT_SYNC_MODE_ENABLE (1 << 4) -#define PORT_SYNC_MODE_MASTER_SELECT(x) ((x) < 0) +#define PORT_SYNC_MODE_MASTER_SELECT(x) ((x) << 0) #define PORT_SYNC_MODE_MASTER_SELECT_MASK (0x7 << 0) #define PORT_SYNC_MODE_MASTER_SELECT_SHIFT 0 From 06acc17a96215a11134114aee26532b12dc8fde1 Mon Sep 17 00:00:00 2001 From: Dan Murphy Date: Wed, 20 Mar 2019 07:36:55 -0500 Subject: [PATCH 222/468] net: phy: Add DP83825I to the DP83822 driver Add the DP83825I ethernet PHY to the DP83822 driver. These devices share the same WoL register bits and addresses. The phy_driver init was made into a macro as there may be future devices appended to this driver that will share the register space. http://www.ti.com/lit/gpn/dp83825i Reviewed-by: Florian Fainelli Signed-off-by: Dan Murphy Signed-off-by: David S. Miller --- drivers/net/phy/dp83822.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c index bbd8c22067f3..97d45bd5b38e 100644 --- a/drivers/net/phy/dp83822.c +++ b/drivers/net/phy/dp83822.c @@ -15,6 +15,8 @@ #include #define DP83822_PHY_ID 0x2000a240 +#define DP83825I_PHY_ID 0x2000a150 + #define DP83822_DEVADDR 0x1f #define MII_DP83822_PHYSCR 0x11 @@ -304,26 +306,30 @@ static int dp83822_resume(struct phy_device *phydev) return 0; } +#define DP83822_PHY_DRIVER(_id, _name) \ + { \ + PHY_ID_MATCH_MODEL(_id), \ + .name = (_name), \ + .features = PHY_BASIC_FEATURES, \ + .soft_reset = dp83822_phy_reset, \ + .config_init = dp83822_config_init, \ + .get_wol = dp83822_get_wol, \ + .set_wol = dp83822_set_wol, \ + .ack_interrupt = dp83822_ack_interrupt, \ + .config_intr = dp83822_config_intr, \ + .suspend = dp83822_suspend, \ + .resume = dp83822_resume, \ + } + static struct phy_driver dp83822_driver[] = { - { - .phy_id = DP83822_PHY_ID, - .phy_id_mask = 0xfffffff0, - .name = "TI DP83822", - .features = PHY_BASIC_FEATURES, - .config_init = dp83822_config_init, - .soft_reset = dp83822_phy_reset, - .get_wol = dp83822_get_wol, - .set_wol = dp83822_set_wol, - .ack_interrupt = dp83822_ack_interrupt, - .config_intr = dp83822_config_intr, - .suspend = dp83822_suspend, - .resume = dp83822_resume, - }, + DP83822_PHY_DRIVER(DP83822_PHY_ID, "TI DP83822"), + DP83822_PHY_DRIVER(DP83825I_PHY_ID, "TI DP83825I"), }; module_phy_driver(dp83822_driver); static struct mdio_device_id __maybe_unused dp83822_tbl[] = { { DP83822_PHY_ID, 0xfffffff0 }, + { DP83825I_PHY_ID, 0xfffffff0 }, { }, }; MODULE_DEVICE_TABLE(mdio, dp83822_tbl); From cd5afa91f078c0787be0a62b5ef90301c00b0271 Mon Sep 17 00:00:00 2001 From: Harini Katakam Date: Wed, 20 Mar 2019 19:12:22 +0530 Subject: [PATCH 223/468] net: macb: Add null check for PCLK and HCLK Both PCLK and HCLK are "required" clocks according to macb devicetree documentation. There is a chance that devm_clk_get doesn't return a negative error but just a NULL clock structure instead. In such a case the driver proceeds as usual and uses pclk value 0 to calculate MDC divisor which is incorrect. Hence fix the same in clock initialization. Signed-off-by: Harini Katakam Signed-off-by: David S. Miller --- drivers/net/ethernet/cadence/macb_main.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index ad099fd01b45..1522aee81884 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3370,14 +3370,20 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk, *hclk = devm_clk_get(&pdev->dev, "hclk"); } - if (IS_ERR(*pclk)) { + if (IS_ERR_OR_NULL(*pclk)) { err = PTR_ERR(*pclk); + if (!err) + err = -ENODEV; + dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err); return err; } - if (IS_ERR(*hclk)) { + if (IS_ERR_OR_NULL(*hclk)) { err = PTR_ERR(*hclk); + if (!err) + err = -ENODEV; + dev_err(&pdev->dev, "failed to get hclk (%u)\n", err); return err; } From 85d0966fa57e0ef2d30d913c98ca93674f7a03c9 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 14:59:59 +0100 Subject: [PATCH 224/468] net/sched: prepare TC actions to properly validate the control action - pass a pointer to struct tcf_proto in each actions's init() handler, to allow validating the control action, checking whether the chain exists and (eventually) refcounting it. - remove code that validates the control action after a successful call to the action's init() handler, and replace it with a test that forbids addition of actions having 'goto_chain' and NULL goto_chain pointer at the same time. - add tcf_action_check_ctrlact(), that will validate the control action and eventually allocate the action 'goto_chain' within the init() handler. - add tcf_action_set_ctrlact(), that will assign the control action and swap the current 'goto_chain' pointer with the new given one. This disallows 'goto_chain' on actions that don't initialize it properly in their init() handler, i.e. calling tcf_action_check_ctrlact() after successful IDR reservation and then calling tcf_action_set_ctrlact() to assign 'goto_chain' and 'tcf_action' consistently. By doing this, the kernel does not leak anymore refcounts when a valid 'goto chain' handle is replaced in TC actions, causing kmemleak splats like the following one: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto tcp action drop # tc chain add dev dd0 chain 43 ingress protocol ip flower \ > ip_proto udp action drop # tc filter add dev dd0 ingress matchall \ > action gact goto chain 42 index 66 # tc filter replace dev dd0 ingress matchall \ > action gact goto chain 43 index 66 # echo scan >/sys/kernel/debug/kmemleak <...> unreferenced object 0xffff93c0ee09f000 (size 1024): comm "tc", pid 2565, jiffies 4295339808 (age 65.426s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0 [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0 [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110 [<000000006126a348>] netlink_unicast+0x1a0/0x250 [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0 [<00000000a25a2171>] sock_sendmsg+0x36/0x40 [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0 [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0 [<000000007a6c61f9>] do_syscall_64+0x5b/0x180 [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<0000000013eaa334>] 0xffffffffffffffff Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- include/net/act_api.h | 7 ++- net/sched/act_api.c | 97 ++++++++++++++++++++++---------------- net/sched/act_bpf.c | 2 +- net/sched/act_connmark.c | 1 + net/sched/act_csum.c | 2 +- net/sched/act_gact.c | 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 11 +++-- net/sched/act_mirred.c | 1 + net/sched/act_nat.c | 3 +- net/sched/act_pedit.c | 2 +- net/sched/act_police.c | 1 + net/sched/act_sample.c | 2 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 1 + net/sched/act_skbmod.c | 1 + net/sched/act_tunnel_key.c | 1 + net/sched/act_vlan.c | 2 +- 18 files changed, 84 insertions(+), 56 deletions(-) diff --git a/include/net/act_api.h b/include/net/act_api.h index c745e9ccfab2..54fbb49bd08a 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -90,7 +90,7 @@ struct tc_action_ops { int (*lookup)(struct net *net, struct tc_action **a, u32 index); int (*init)(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **act, int ovr, - int bind, bool rtnl_held, + int bind, bool rtnl_held, struct tcf_proto *tp, struct netlink_ext_ack *extack); int (*walk)(struct net *, struct sk_buff *, struct netlink_callback *, int, @@ -181,6 +181,11 @@ int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int); int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int); int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int); +int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, + struct tcf_chain **handle, + struct netlink_ext_ack *newchain); +struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, + struct tcf_chain *newchain); #endif /* CONFIG_NET_CLS_ACT */ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes, diff --git a/net/sched/act_api.c b/net/sched/act_api.c index aecf1bf233c8..fe67b98ac641 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -28,23 +28,6 @@ #include #include -static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp) -{ - u32 chain_index = a->tcfa_action & TC_ACT_EXT_VAL_MASK; - - if (!tp) - return -EINVAL; - a->goto_chain = tcf_chain_get_by_act(tp->chain->block, chain_index); - if (!a->goto_chain) - return -ENOMEM; - return 0; -} - -static void tcf_action_goto_chain_fini(struct tc_action *a) -{ - tcf_chain_put_by_act(a->goto_chain); -} - static void tcf_action_goto_chain_exec(const struct tc_action *a, struct tcf_result *res) { @@ -71,6 +54,53 @@ static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie, call_rcu(&old->rcu, tcf_free_cookie_rcu); } +int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, + struct tcf_chain **newchain, + struct netlink_ext_ack *extack) +{ + int opcode = TC_ACT_EXT_OPCODE(action), ret = -EINVAL; + u32 chain_index; + + if (!opcode) + ret = action > TC_ACT_VALUE_MAX ? -EINVAL : 0; + else if (opcode <= TC_ACT_EXT_OPCODE_MAX || action == TC_ACT_UNSPEC) + ret = 0; + if (ret) { + NL_SET_ERR_MSG(extack, "invalid control action"); + goto end; + } + + if (TC_ACT_EXT_CMP(action, TC_ACT_GOTO_CHAIN)) { + chain_index = action & TC_ACT_EXT_VAL_MASK; + if (!tp || !newchain) { + ret = -EINVAL; + NL_SET_ERR_MSG(extack, + "can't goto NULL proto/chain"); + goto end; + } + *newchain = tcf_chain_get_by_act(tp->chain->block, chain_index); + if (!*newchain) { + ret = -ENOMEM; + NL_SET_ERR_MSG(extack, + "can't allocate goto_chain"); + } + } +end: + return ret; +} +EXPORT_SYMBOL(tcf_action_check_ctrlact); + +struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, + struct tcf_chain *newchain) +{ + struct tcf_chain *oldchain = a->goto_chain; + + a->tcfa_action = action; + a->goto_chain = newchain; + return oldchain; +} +EXPORT_SYMBOL(tcf_action_set_ctrlact); + /* XXX: For standalone actions, we don't need a RCU grace period either, because * actions are always connected to filters and filters are already destroyed in * RCU callbacks, so after a RCU grace period actions are already disconnected @@ -78,13 +108,15 @@ static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie, */ static void free_tcf(struct tc_action *p) { + struct tcf_chain *chain = p->goto_chain; + free_percpu(p->cpu_bstats); free_percpu(p->cpu_bstats_hw); free_percpu(p->cpu_qstats); tcf_set_action_cookie(&p->act_cookie, NULL); - if (p->goto_chain) - tcf_action_goto_chain_fini(p); + if (chain) + tcf_chain_put_by_act(chain); kfree(p); } @@ -800,15 +832,6 @@ static struct tc_cookie *nla_memdup_cookie(struct nlattr **tb) return c; } -static bool tcf_action_valid(int action) -{ - int opcode = TC_ACT_EXT_OPCODE(action); - - if (!opcode) - return action <= TC_ACT_VALUE_MAX; - return opcode <= TC_ACT_EXT_OPCODE_MAX || action == TC_ACT_UNSPEC; -} - struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, @@ -890,10 +913,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, /* backward compatibility for policer */ if (name == NULL) err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind, - rtnl_held, extack); + rtnl_held, tp, extack); else err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held, - extack); + tp, extack); if (err < 0) goto err_mod; @@ -907,18 +930,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, if (err != ACT_P_CREATED) module_put(a_o->owner); - if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) { - err = tcf_action_goto_chain_init(a, tp); - if (err) { - tcf_action_destroy_1(a, bind); - NL_SET_ERR_MSG(extack, "Failed to init TC action chain"); - return ERR_PTR(err); - } - } - - if (!tcf_action_valid(a->tcfa_action)) { + if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) && + !a->goto_chain) { tcf_action_destroy_1(a, bind); - NL_SET_ERR_MSG(extack, "Invalid control action value"); + NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain"); return ERR_PTR(-EINVAL); } diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index aa5c38d11a30..3c0468f2aae6 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -278,7 +278,7 @@ static void tcf_bpf_prog_fill_cfg(const struct tcf_bpf *prog, static int tcf_bpf_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **act, int replace, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, bpf_net_id); struct nlattr *tb[TCA_ACT_BPF_MAX + 1]; diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c index 5d24993cccfe..44aa046a92ea 100644 --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -97,6 +97,7 @@ static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = { static int tcf_connmark_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, connmark_net_id); diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c index c79aca29505e..9ba0f61a1e82 100644 --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -46,7 +46,7 @@ static struct tc_action_ops act_csum_ops; static int tcf_csum_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, - int bind, bool rtnl_held, + int bind, bool rtnl_held, struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, csum_net_id); diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c index 93da0004e9f4..b8ad311bd8cc 100644 --- a/net/sched/act_gact.c +++ b/net/sched/act_gact.c @@ -57,7 +57,7 @@ static const struct nla_policy gact_policy[TCA_GACT_MAX + 1] = { static int tcf_gact_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, gact_net_id); struct nlattr *tb[TCA_GACT_MAX + 1]; diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c index 9b1f2b3990ee..c1ba74d5c1e3 100644 --- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -469,7 +469,7 @@ static int populate_metalist(struct tcf_ife_info *ife, struct nlattr **tb, static int tcf_ife_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, ife_net_id); struct nlattr *tb[TCA_IFE_MAX + 1]; diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c index 98f5b6ea77b4..04a0b5c61194 100644 --- a/net/sched/act_ipt.c +++ b/net/sched/act_ipt.c @@ -97,7 +97,8 @@ static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = { static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla, struct nlattr *est, struct tc_action **a, - const struct tc_action_ops *ops, int ovr, int bind) + const struct tc_action_ops *ops, int ovr, int bind, + struct tcf_proto *tp) { struct tc_action_net *tn = net_generic(net, id); struct nlattr *tb[TCA_IPT_MAX + 1]; @@ -205,20 +206,20 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla, static int tcf_ipt_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, - int bind, bool rtnl_held, + int bind, bool rtnl_held, struct tcf_proto *tp, struct netlink_ext_ack *extack) { return __tcf_ipt_init(net, ipt_net_id, nla, est, a, &act_ipt_ops, ovr, - bind); + bind, tp); } static int tcf_xt_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, - int bind, bool unlocked, + int bind, bool unlocked, struct tcf_proto *tp, struct netlink_ext_ack *extack) { return __tcf_ipt_init(net, xt_net_id, nla, est, a, &act_xt_ops, ovr, - bind); + bind, tp); } static int tcf_ipt_act(struct sk_buff *skb, const struct tc_action *a, diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 6692fd054617..383f4024452c 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -94,6 +94,7 @@ static struct tc_action_ops act_mirred_ops; static int tcf_mirred_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, mirred_net_id); diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c index 543eab9193f1..de4b493e26d2 100644 --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -38,7 +38,8 @@ static const struct nla_policy nat_policy[TCA_NAT_MAX + 1] = { static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, - bool rtnl_held, struct netlink_ext_ack *extack) + bool rtnl_held, struct tcf_proto *tp, + struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, nat_net_id); struct nlattr *tb[TCA_NAT_MAX + 1]; diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index a80373878df7..8ca82aefa11a 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -138,7 +138,7 @@ static int tcf_pedit_key_ex_dump(struct sk_buff *skb, static int tcf_pedit_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, pedit_net_id); struct nlattr *tb[TCA_PEDIT_MAX + 1]; diff --git a/net/sched/act_police.c b/net/sched/act_police.c index 8271a6263824..229eba7925e5 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -83,6 +83,7 @@ static const struct nla_policy police_policy[TCA_POLICE_MAX + 1] = { static int tcf_police_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { int ret = 0, tcfp_result = TC_ACT_OK, err, size; diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index 203e399e5c85..36b8adbe935d 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -37,7 +37,7 @@ static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = { static int tcf_sample_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, - int bind, bool rtnl_held, + int bind, bool rtnl_held, struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, sample_net_id); diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c index d54cb608dbaf..4916dc3e3668 100644 --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -78,7 +78,7 @@ static const struct nla_policy simple_policy[TCA_DEF_MAX + 1] = { static int tcf_simp_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, simp_net_id); struct nlattr *tb[TCA_DEF_MAX + 1]; diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index 65879500b688..4566eff3d027 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -96,6 +96,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = { static int tcf_skbedit_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, skbedit_net_id); diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c index 7bac1d78e7a3..b9ab2c8f07f1 100644 --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -82,6 +82,7 @@ static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = { static int tcf_skbmod_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, skbmod_net_id); diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 7c6591b991d5..fc295d91559a 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -210,6 +210,7 @@ static void tunnel_key_release_params(struct tcf_tunnel_key_params *p) static int tunnel_key_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, tunnel_key_net_id); diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c index ac0061599225..4651ee15e35d 100644 --- a/net/sched/act_vlan.c +++ b/net/sched/act_vlan.c @@ -105,7 +105,7 @@ static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = { static int tcf_vlan_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, - struct netlink_ext_ack *extack) + struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, vlan_net_id); struct nlattr *tb[TCA_VLAN_MAX + 1]; From 4e1810049c267c203c19130301203d0591174535 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:00 +0100 Subject: [PATCH 225/468] net/sched: act_bpf: validate the control action inside init() the following script: # tc filter add dev crash0 egress matchall \ > action bpf bytecode '1,6 0 0 4294967295' pass index 90 # tc actions replace action bpf \ > bytecode '1,6 0 0 4294967295' goto chain 42 index 90 cookie c1a0c1a0 # tc action show action bpf had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: bpf bytecode '1,6 0 0 4294967295' default-action goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb3a0803dfa90 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff942b347ada00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffb3a08034d038 RDI: ffff942b347ada00 RBP: ffffb3a0803dfb30 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffb3a0803dfb0c R12: ffff942b3b682b00 R13: ffff942b3b682b08 R14: 0000000000000001 R15: ffff942b3b682f00 FS: 00007f6160a72740(0000) GS:ffff942b3da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000795a4002 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip_finish_output2+0x16f/0x430 ip_finish_output2+0x16f/0x430 ? ip_output+0x69/0xe0 ip_output+0x69/0xe0 ? ip_forward_options+0x1a0/0x1a0 ip_send_skb+0x15/0x40 raw_sendmsg+0x8e1/0xbd0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xa0 ? try_to_wake_up+0x54/0x480 ? ldsem_down_read+0x3f/0x280 ? _cond_resched+0x15/0x40 ? down_read+0xe/0x30 ? copy_termios+0x1e/0x70 ? tty_mode_ioctl+0x1b6/0x4c0 ? sock_sendmsg+0x36/0x40 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 ? do_vfs_ioctl+0xa4/0x640 ? handle_mm_fault+0xdc/0x210 ? syscall_trace_enter+0x1df/0x2e0 ? __audit_syscall_exit+0x216/0x260 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f615f7e3c03 Code: 48 8b 0d 90 62 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 9d c3 2c 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 4b cc 00 00 48 89 04 24 RSP: 002b:00007ffee5d8cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055a4f28f1700 RCX: 00007f615f7e3c03 RDX: 0000000000000040 RSI: 000055a4f28f1700 RDI: 0000000000000003 RBP: 00007ffee5d8e340 R08: 000055a4f28ee510 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 000055a4f28f16c0 R14: 000055a4f28ef69c R15: 0000000000000080 Modules linked in: act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache crct10dif_pclmul jbd2 crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper pcspkr joydev virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_blk virtio_net virtio_console net_failover failover syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ata_piix serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_bpf_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_bpf.c | 23 +++++++++++++---- .../tc-testing/tc-tests/actions/bpf.json | 25 +++++++++++++++++++ 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 3c0468f2aae6..3841156aa09f 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -17,6 +17,7 @@ #include #include +#include #include #include @@ -282,6 +283,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, bpf_net_id); struct nlattr *tb[TCA_ACT_BPF_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tcf_bpf_cfg cfg, old; struct tc_act_bpf *parm; struct tcf_bpf *prog; @@ -323,12 +325,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, return ret; } + ret = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (ret < 0) + goto release_idr; + is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS]; is_ebpf = tb[TCA_ACT_BPF_FD]; if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf)) { ret = -EINVAL; - goto out; + goto put_chain; } memset(&cfg, 0, sizeof(cfg)); @@ -336,7 +342,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, ret = is_bpf ? tcf_bpf_init_from_ops(tb, &cfg) : tcf_bpf_init_from_efd(tb, &cfg); if (ret < 0) - goto out; + goto put_chain; prog = to_bpf(*act); @@ -350,10 +356,13 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, if (cfg.bpf_num_ops) prog->bpf_num_ops = cfg.bpf_num_ops; - prog->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*act, parm->action, goto_ch); rcu_assign_pointer(prog->filter, cfg.filter); spin_unlock_bh(&prog->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + if (res == ACT_P_CREATED) { tcf_idr_insert(tn, *act); } else { @@ -363,9 +372,13 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, } return res; -out: - tcf_idr_release(*act, bind); +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + +release_idr: + tcf_idr_release(*act, bind); return ret; } diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json b/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json index 5970cee6d05f..b074ea9b6fe8 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json @@ -286,5 +286,30 @@ "teardown": [ "$TC action flush action bpf" ] + }, + { + "id": "b8a1", + "name": "Replace bpf action with invalid goto_chain control", + "category": [ + "actions", + "bpf" + ], + "setup": [ + [ + "$TC actions flush action bpf", + 0, + 1, + 255 + ], + "$TC action add action bpf bytecode '1,6 0 0 4294967295' pass index 90" + ], + "cmdUnderTest": "$TC action replace action bpf bytecode '1,6 0 0 4294967295' goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC action list action bpf", + "matchPattern": "action order [0-9]*: bpf.* default-action pass.*index 90", + "matchCount": "1", + "teardown": [ + "$TC action flush action bpf" + ] } ] From f5c29d83866d75585f9c89754de0b86b45ceab89 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:01 +0100 Subject: [PATCH 226/468] net/sched: act_csum: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall action csum icmp pass index 90 # tc actions replace action csum icmp goto chain 42 index 90 \ > cookie c1a0c1a0 # tc actions show action csum had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: csum (icmp) action goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 8000000074692067 P4D 8000000074692067 PUD 2e210067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff93153da03be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9314ee40f700 RCX: 0000000000003a00 RDX: 0000000000000000 RSI: ffff931537c87828 RDI: ffff931537c87818 RBP: ffff93153da03c80 R08: 00000000527cffff R09: 0000000000000003 R10: 000000000000003f R11: 0000000000000028 R12: ffff9314edf68400 R13: ffff9314edf68408 R14: 0000000000000001 R15: ffff9314ed67b600 FS: 0000000000000000(0000) GS:ffff93153da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000073e32003 CR4: 00000000001606f0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 66 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffffff9a803e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff99e184f0 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000000 RBP: 0000000000000000 R08: 000eb5c4572376b3 R09: 0000000000000000 R10: ffffa53e806a3ca0 R11: 00000000000f4240 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_kernel+0x49e/0x4be secondary_startup_64+0xa4/0xb0 Modules linked in: act_csum veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel mbcache snd_hda_codec jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt net_failover fb_sys_fops virtio_console virtio_blk ttm failover drm ata_piix crc32c_intel floppy virtio_pci serio_raw libata virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_csum_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_csum.c | 20 ++++++++++++--- .../tc-testing/tc-tests/actions/csum.json | 25 +++++++++++++++++++ 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c index 9ba0f61a1e82..0c77e7bdf6d5 100644 --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -33,6 +33,7 @@ #include #include +#include #include #include @@ -52,6 +53,7 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, csum_net_id); struct tcf_csum_params *params_new; struct nlattr *tb[TCA_CSUM_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_csum *parm; struct tcf_csum *p; int ret = 0, err; @@ -87,21 +89,27 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla, return err; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; + p = to_tcf_csum(*a); params_new = kzalloc(sizeof(*params_new), GFP_KERNEL); if (unlikely(!params_new)) { - tcf_idr_release(*a, bind); - return -ENOMEM; + err = -ENOMEM; + goto put_chain; } params_new->update_flags = parm->update_flags; spin_lock_bh(&p->tcf_lock); - p->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(p->params, params_new, lockdep_is_held(&p->tcf_lock)); spin_unlock_bh(&p->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (params_new) kfree_rcu(params_new, rcu); @@ -109,6 +117,12 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla, tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } /** diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json index a022792d392a..ddabb2fbb7c7 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json @@ -500,5 +500,30 @@ "matchPattern": "^[ \t]+index [0-9]+ ref", "matchCount": "0", "teardown": [] + }, + { + "id": "d128", + "name": "Replace csum action with invalid goto chain control", + "category": [ + "actions", + "csum" + ], + "setup": [ + [ + "$TC actions flush action csum", + 0, + 1, + 255 + ], + "$TC actions add action csum iph index 90" + ], + "cmdUnderTest": "$TC actions replace action csum iph goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action csum index 90", + "matchPattern": "action order [0-9]*: csum \\(iph\\) action pass.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action csum" + ] } ] From 0da2dbd6029c2be4191651bafa57c3c006eff63c Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:02 +0100 Subject: [PATCH 227/468] net/sched: act_gact: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action gact pass index 90 # tc actions replace action gact \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action gact had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: gact action goto chain 42 random type none pass val 0 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff8c2434703be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8c23ed6d7e00 RCX: 000000000000005a RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c23ed6d7e00 RBP: ffff8c2434703c80 R08: ffff8c243b639ac8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2429e68b00 R13: ffff8c2429e68b08 R14: 0000000000000001 R15: ffff8c2429c5a480 FS: 0000000000000000(0000) GS:ffff8c2434700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002dc0e005 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 74 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffff9c8640387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff8b2184f0 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002 RBP: 0000000000000002 R08: 000eb57882b36cc3 R09: 0000000000000020 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_gact act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper virtio_balloon joydev pcspkr snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea virtio_net sysfillrect net_failover virtio_blk sysimgblt fb_sys_fops virtio_console ttm failover drm crc32c_intel serio_raw ata_piix libata floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf] CR2: 0000000000000000 Validating the control action within tcf_gact_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_gact.c | 13 +++++++++- .../tc-testing/tc-tests/actions/gact.json | 25 +++++++++++++++++++ 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c index b8ad311bd8cc..e540e31069d7 100644 --- a/net/sched/act_gact.c +++ b/net/sched/act_gact.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -61,6 +62,7 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, gact_net_id); struct nlattr *tb[TCA_GACT_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_gact *parm; struct tcf_gact *gact; int ret = 0; @@ -116,10 +118,13 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla, return err; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; gact = to_gact(*a); spin_lock_bh(&gact->tcf_lock); - gact->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); #ifdef CONFIG_GACT_PROB if (p_parm) { gact->tcfg_paction = p_parm->paction; @@ -133,9 +138,15 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla, #endif spin_unlock_bh(&gact->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +release_idr: + tcf_idr_release(*a, bind); + return err; } static int tcf_gact_act(struct sk_buff *skb, const struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json index 89189a03ce3d..814b7a8a478b 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json @@ -560,5 +560,30 @@ "teardown": [ "$TC actions flush action gact" ] + }, + { + "id": "ca89", + "name": "Replace gact action with invalid goto chain control", + "category": [ + "actions", + "gact" + ], + "setup": [ + [ + "$TC actions flush action gact", + 0, + 1, + 255 + ], + "$TC actions add action pass random determ drop 2 index 90" + ], + "cmdUnderTest": "$TC actions replace action goto chain 42 random determ drop 5 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions list action gact", + "matchPattern": "action order [0-9]*: gact action pass.*random type determ drop val 2.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action gact" + ] } ] From 11a94d7fd80f92325e7b8653290ad3d2cd67f119 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:03 +0100 Subject: [PATCH 228/468] net/sched: act_ife: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action ife encode allow mark pass index 90 # tc actions replace action ife \ > encode allow mark goto chain 42 index 90 cookie c1a0c1a0 # tc action show action ife had the following output: IFE type 0xED3E IFE type 0xED3E Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: ife encode action goto chain 42 type 0XED3E allow mark index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007b4e7067 P4D 800000007b4e7067 PUD 7b4e6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 164 Comm: kworker/2:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffa6a7c0553ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9796ee1bbd00 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa6a7c0553b70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9797385bb038 R12: ffff9796ead9d700 R13: ffff9796ead9d708 R14: 0000000000000001 R15: ffff9796ead9d800 FS: 0000000000000000(0000) GS:ffff97973db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007c41e006 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_gact act_meta_mark act_ife dummy veth ip6table_filter ip6_tables iptable_filter binfmt_misc snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hwdep snd_hda_core ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd snd cryptd glue_helper virtio_balloon joydev pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl virtio_net drm_kms_helper virtio_blk net_failover syscopyarea failover sysfillrect virtio_console sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw ata_piix virtio_pci virtio_ring libata virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_ife] CR2: 0000000000000000 Validating the control action within tcf_ife_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_ife.c | 33 +++++++++++-------- .../tc-testing/tc-tests/actions/ife.json | 25 ++++++++++++++ 2 files changed, 45 insertions(+), 13 deletions(-) diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c index c1ba74d5c1e3..31c6ffb6abe7 100644 --- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -474,6 +475,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, ife_net_id); struct nlattr *tb[TCA_IFE_MAX + 1]; struct nlattr *tb2[IFE_META_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tcf_ife_params *p; struct tcf_ife_info *ife; u16 ife_type = ETH_P_IFE; @@ -531,6 +533,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla, } ife = to_ife(*a); + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; + p->flags = parm->flags; if (parm->flags & IFE_ENCODE) { @@ -563,13 +569,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla, if (tb[TCA_IFE_METALST]) { err = nla_parse_nested(tb2, IFE_META_MAX, tb[TCA_IFE_METALST], NULL, NULL); - if (err) { -metadata_parse_err: - tcf_idr_release(*a, bind); - kfree(p); - return err; - } - + if (err) + goto metadata_parse_err; err = populate_metalist(ife, tb2, exists, rtnl_held); if (err) goto metadata_parse_err; @@ -581,21 +582,20 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla, * going to bail out */ err = use_all_metadata(ife, exists); - if (err) { - tcf_idr_release(*a, bind); - kfree(p); - return err; - } + if (err) + goto metadata_parse_err; } if (exists) spin_lock_bh(&ife->tcf_lock); - ife->tcf_action = parm->action; /* protected by tcf_lock when modifying existing action */ + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(ife->params, p, 1); if (exists) spin_unlock_bh(&ife->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (p) kfree_rcu(p, rcu); @@ -603,6 +603,13 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla, tcf_idr_insert(tn, *a); return ret; +metadata_parse_err: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + kfree(p); + tcf_idr_release(*a, bind); + return err; } static int tcf_ife_dump(struct sk_buff *skb, struct tc_action *a, int bind, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json index 0da3545cabdb..c13a68b98fc7 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json @@ -1060,5 +1060,30 @@ "matchPattern": "action order [0-9]*: ife encode action pipe.*allow prio.*index 4", "matchCount": "0", "teardown": [] + }, + { + "id": "a0e2", + "name": "Replace ife encode action with invalid goto chain control", + "category": [ + "actions", + "ife" + ], + "setup": [ + [ + "$TC actions flush action ife", + 0, + 1, + 255 + ], + "$TC actions add action ife encode allow mark pass index 90" + ], + "cmdUnderTest": "$TC actions replace action ife encode allow mark goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action ife index 90", + "matchPattern": "action order [0-9]*: ife encode action pass.*type 0[xX]ED3E .*allow mark.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action ife" + ] } ] From ff9721d32b1aba8bf46a06df20827d0a5d52ec48 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:04 +0100 Subject: [PATCH 229/468] net/sched: act_mirred: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action mirred ingress mirror dev lo pass # tc actions replace action mirred \ > ingress mirror dev lo goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action mirred had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: mirred (Ingress Mirror to device lo) goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: Mirror/redirect action on BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 47 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffa772404b7ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9c5afc3f4300 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9c5afdba9380 RDI: 0000000000029380 RBP: ffffa772404b7b70 R08: ffff9c5af7010028 R09: ffff9c5af7010029 R10: 0000000000000000 R11: ffff9c5af94c6a38 R12: ffff9c5af7953000 R13: ffff9c5af7953008 R14: 0000000000000001 R15: ffff9c5af7953d00 FS: 0000000000000000(0000) GS:ffff9c5afdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007c514004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_mirred veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_codec mbcache ghash_clmulni_intel jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer snd crypto_simd cryptd glue_helper soundcore virtio_balloon joydev pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm virtio_blk net_failover virtio_console failover drm ata_piix crc32c_intel virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_mirred_init() proved to fix the above issue. For the same reason, postpone the assignment of tcfa_action and tcfm_eaction to avoid partial reconfiguration of a mirred rule when it's replaced by another one that mirrors to a device that does not exist. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_mirred.c | 21 +++++++++++++--- .../tc-testing/tc-tests/actions/mirred.json | 25 +++++++++++++++++++ 2 files changed, 42 insertions(+), 4 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 383f4024452c..cd712e4e8998 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -99,6 +99,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, mirred_net_id); struct nlattr *tb[TCA_MIRRED_MAX + 1]; + struct tcf_chain *goto_ch = NULL; bool mac_header_xmit = false; struct tc_mirred *parm; struct tcf_mirred *m; @@ -158,18 +159,20 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; + m = to_mirred(*a); spin_lock_bh(&m->tcf_lock); - m->tcf_action = parm->action; - m->tcfm_eaction = parm->eaction; if (parm->ifindex) { dev = dev_get_by_index(net, parm->ifindex); if (!dev) { spin_unlock_bh(&m->tcf_lock); - tcf_idr_release(*a, bind); - return -ENODEV; + err = -ENODEV; + goto put_chain; } mac_header_xmit = dev_is_mac_header_xmit(dev); rcu_swap_protected(m->tcfm_dev, dev, @@ -178,7 +181,11 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, dev_put(dev); m->tcfm_mac_header_xmit = mac_header_xmit; } + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); + m->tcfm_eaction = parm->eaction; spin_unlock_bh(&m->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) { spin_lock(&mirred_list_lock); @@ -189,6 +196,12 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, } return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json index db49fd0f8445..6e5fb3d25681 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json @@ -434,5 +434,30 @@ "teardown": [ "$TC actions flush action mirred" ] + }, + { + "id": "2a9a", + "name": "Replace mirred action with invalid goto chain control", + "category": [ + "actions", + "mirred" + ], + "setup": [ + [ + "$TC actions flush action mirred", + 0, + 1, + 255 + ], + "$TC actions add action mirred ingress mirror dev lo drop index 90" + ], + "cmdUnderTest": "$TC actions replace action mirred ingress mirror dev lo goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action mirred index 90", + "matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) drop.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action mirred" + ] } ] From c53075ea5d3c44849992523d5d83e2810d05a00e Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:05 +0100 Subject: [PATCH 230/468] net/sched: act_connmark: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action connmark pass index 90 # tc actions replace action connmark \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action connmark had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: connmark zone 0 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0 RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00 R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40 FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_connmark_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_connmark.c | 21 +++++++++++++--- .../tc-testing/tc-tests/actions/connmark.json | 25 +++++++++++++++++++ 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c index 44aa046a92ea..32ae0cd6e31c 100644 --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -102,9 +103,10 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, connmark_net_id); struct nlattr *tb[TCA_CONNMARK_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tcf_connmark_info *ci; struct tc_connmark *parm; - int ret = 0; + int ret = 0, err; if (!nla) return -EINVAL; @@ -129,7 +131,11 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, } ci = to_connmark(*a); - ci->tcf_action = parm->action; + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, + extack); + if (err < 0) + goto release_idr; + tcf_action_set_ctrlact(*a, parm->action, goto_ch); ci->net = net; ci->zone = parm->zone; @@ -143,15 +149,24 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, + extack); + if (err < 0) + goto release_idr; /* replacing action and zone */ spin_lock_bh(&ci->tcf_lock); - ci->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); ci->zone = parm->zone; spin_unlock_bh(&ci->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); ret = 0; } return ret; +release_idr: + tcf_idr_release(*a, bind); + return err; } static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/connmark.json b/tools/testing/selftests/tc-testing/tc-tests/actions/connmark.json index 13147a1f5731..cadde8f41fcd 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/connmark.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/connmark.json @@ -287,5 +287,30 @@ "teardown": [ "$TC actions flush action connmark" ] + }, + { + "id": "c506", + "name": "Replace connmark with invalid goto chain control", + "category": [ + "actions", + "connmark" + ], + "setup": [ + [ + "$TC actions flush action connmark", + 0, + 1, + 255 + ], + "$TC actions add action connmark pass index 90" + ], + "cmdUnderTest": "$TC actions replace action connmark goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action connmark index 90", + "matchPattern": "action order [0-9]+: connmark zone 0 pass.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action connmark" + ] } ] From 1e45d043a8bb2ed8a541384db3cc133e92001f0c Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:06 +0100 Subject: [PATCH 231/468] net/sched: act_nat: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action nat ingress 1.18.1.1 1.18.2.2 pass index 90 # tc actions replace action nat \ > ingress 1.18.1.1 1.18.2.2 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action nat had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: nat ingress 1.18.1.1/32 1.18.2.2 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002d180067 P4D 800000002d180067 PUD 7cb8b067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 164 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffae4500e2fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9fa52e28c800 RCX: 0000000001011201 RDX: 0000000000000000 RSI: 0000000000000056 RDI: ffff9fa52ca12800 RBP: ffffae4500e2fb70 R08: 0000000000000022 R09: 000000000000000e R10: 00000000ffffffff R11: 0000000001011201 R12: ffff9fa52cbc9c00 R13: ffff9fa52cbc9c08 R14: 0000000000000001 R15: ffff9fa52ca12780 FS: 0000000000000000(0000) GS:ffff9fa57db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000073f8c004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_nat veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk net_failover failover virtio_console drm crc32c_intel floppy ata_piix libata virtio_pci virtio_ring virtio serio_raw dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_nat_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_nat.c | 12 ++++++++- .../tc-testing/tc-tests/actions/nat.json | 25 +++++++++++++++++++ 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c index de4b493e26d2..e91bb8eb81ec 100644 --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -43,6 +44,7 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, { struct tc_action_net *tn = net_generic(net, nat_net_id); struct nlattr *tb[TCA_NAT_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_nat *parm; int ret = 0, err; struct tcf_nat *p; @@ -77,6 +79,9 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, } else { return err; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; p = to_tcf_nat(*a); spin_lock_bh(&p->tcf_lock); @@ -85,13 +90,18 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, p->mask = parm->mask; p->flags = parm->flags; - p->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); spin_unlock_bh(&p->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +release_idr: + tcf_idr_release(*a, bind); + return err; } static int tcf_nat_act(struct sk_buff *skb, const struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/nat.json b/tools/testing/selftests/tc-testing/tc-tests/actions/nat.json index 0080dc2fd41c..bc12c1ccad30 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/nat.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/nat.json @@ -589,5 +589,30 @@ "teardown": [ "$TC actions flush action nat" ] + }, + { + "id": "4b12", + "name": "Replace nat action with invalid goto chain control", + "category": [ + "actions", + "nat" + ], + "setup": [ + [ + "$TC actions flush action nat", + 0, + 1, + 255 + ], + "$TC actions add action nat ingress 1.18.1.1 1.18.2.2 drop index 90" + ], + "cmdUnderTest": "$TC actions replace action nat ingress 1.18.1.1 1.18.2.2 goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action nat index 90", + "matchPattern": "action order [0-9]+: nat ingress 1.18.1.1/32 1.18.2.2 drop.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action nat" + ] } ] From 6ac86ca3524b4549d31c45d11487b0626c334f10 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:07 +0100 Subject: [PATCH 232/468] net/sched: act_pedit: validate the control action inside init() the following script: # tc filter add dev crash0 egress matchall \ > action pedit ex munge ip ttl set 10 pass index 90 # tc actions replace action pedit \ > ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action pedit had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: pedit action goto chain 42 keys 1 index 90 ref 2 bind 1 key #0 at ipv4+8: val 0a000000 mask 00ffffff cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff94a73db03be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff94a6ee4c0700 RCX: 000000000000000a RDX: 0000000000000000 RSI: ffff94a6ed22c800 RDI: 0000000000000000 RBP: ffff94a73db03c80 R08: ffff94a7386fa4c8 R09: ffff94a73229ea20 R10: 0000000000000000 R11: 0000000000000000 R12: ffff94a6ed22cb00 R13: ffff94a6ed22cb08 R14: 0000000000000001 R15: ffff94a6ed22c800 FS: 0000000000000000(0000) GS:ffff94a73db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007120e002 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 4e ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffab1740387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffb18184f0 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002 RBP: 0000000000000002 R08: 000f168fa695f9a9 R09: 0000000000000020 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_pedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep aesni_intel snd_hda_core crypto_simd snd_seq cryptd glue_helper snd_seq_device snd_pcm joydev snd_timer pcspkr virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper virtio_net net_failover syscopyarea sysfillrect sysimgblt failover virtio_blk fb_sys_fops virtio_console ttm drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_pedit_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_pedit.c | 16 +++++- .../tc-testing/tc-tests/actions/pedit.json | 51 +++++++++++++++++++ 2 files changed, 65 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/pedit.json diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 8ca82aefa11a..287793abfaf9 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -23,6 +23,7 @@ #include #include #include +#include static unsigned int pedit_net_id; static struct tc_action_ops act_pedit_ops; @@ -142,6 +143,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, pedit_net_id); struct nlattr *tb[TCA_PEDIT_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_pedit_key *keys = NULL; struct tcf_pedit_key_ex *keys_ex; struct tc_pedit *parm; @@ -205,6 +207,11 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, goto out_free; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) { + ret = err; + goto out_release; + } p = to_pedit(*a); spin_lock_bh(&p->tcf_lock); @@ -214,7 +221,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, if (!keys) { spin_unlock_bh(&p->tcf_lock); ret = -ENOMEM; - goto out_release; + goto put_chain; } kfree(p->tcfp_keys); p->tcfp_keys = keys; @@ -223,16 +230,21 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, memcpy(p->tcfp_keys, parm->keys, ksize); p->tcfp_flags = parm->flags; - p->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); kfree(p->tcfp_keys_ex); p->tcfp_keys_ex = keys_ex; spin_unlock_bh(&p->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); out_release: tcf_idr_release(*a, bind); out_free: diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/pedit.json b/tools/testing/selftests/tc-testing/tc-tests/actions/pedit.json new file mode 100644 index 000000000000..b73ceb9e28b1 --- /dev/null +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/pedit.json @@ -0,0 +1,51 @@ +[ + { + "id": "319a", + "name": "Add pedit action that mangles IP TTL", + "category": [ + "actions", + "pedit" + ], + "setup": [ + [ + "$TC actions flush action pedit", + 0, + 1, + 255 + ] + ], + "cmdUnderTest": "$TC actions add action pedit ex munge ip ttl set 10", + "expExitCode": "0", + "verifyCmd": "$TC actions ls action pedit", + "matchPattern": "action order [0-9]+: pedit action pass keys 1.*index 1 ref.*key #0 at ipv4\\+8: val 0a000000 mask 00ffffff", + "matchCount": "1", + "teardown": [ + "$TC actions flush action pedit" + ] + }, + { + "id": "7e67", + "name": "Replace pedit action with invalid goto chain", + "category": [ + "actions", + "pedit" + ], + "setup": [ + [ + "$TC actions flush action pedit", + 0, + 1, + 255 + ], + "$TC actions add action pedit ex munge ip ttl set 10 pass index 90" + ], + "cmdUnderTest": "$TC actions replace action pedit ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions ls action pedit", + "matchPattern": "action order [0-9]+: pedit action pass keys 1.*index 90 ref.*key #0 at ipv4\\+8: val 0a000000 mask 00ffffff", + "matchCount": "1", + "teardown": [ + "$TC actions flush action pedit" + ] + } +] From d6124d6ba697413efc53ff6919b1e0c250f1902a Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:08 +0100 Subject: [PATCH 233/468] net/sched: act_police: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action police rate 3mbit burst 250k pass index 90 # tc actions replace action police \ > rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action police rate 3mbit burst had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: police 0x5a rate 3Mbit burst 250Kb mtu 2Kb action goto chain 42 overhead 0b ref 2 bind 1 cookie c1a0c1a0 Then, when crash0 starts transmitting more than 3Mbit/s, the following kernel crash is observed: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007a779067 P4D 800000007a779067 PUD 2ad96067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 5032 Comm: netperf Not tainted 5.0.0-rc4.gotochain_crash+ #533 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb0e04064fa60 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff93bb3322cce0 RCX: 0000000000000005 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff93bb3322cce0 RBP: ffffb0e04064fb00 R08: 0000000000000022 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff93bb3beed300 R13: ffff93bb3beed308 R14: 0000000000000001 R15: ffff93bb3b64d000 FS: 00007f0bc6be5740(0000) GS:ffff93bb3db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000746a8001 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ipt_do_table+0x31c/0x420 [ip_tables] ? ip_finish_output2+0x16f/0x430 ip_finish_output2+0x16f/0x430 ? ip_output+0x69/0xe0 ip_output+0x69/0xe0 ? ip_forward_options+0x1a0/0x1a0 __tcp_transmit_skb+0x563/0xa40 tcp_write_xmit+0x243/0xfa0 __tcp_push_pending_frames+0x32/0xf0 tcp_sendmsg_locked+0x404/0xd30 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 ? __sys_connect+0x87/0xf0 ? syscall_trace_enter+0x1df/0x2e0 ? __audit_syscall_exit+0x216/0x260 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f0bc5ffbafd Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 ae c4 2c 00 85 c0 75 2d 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 63 63 2c 00 f7 d8 64 89 02 48 RSP: 002b:00007fffef94b7f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f0bc5ffbafd RDX: 0000000000004000 RSI: 00000000017e5420 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004 R13: 00000000017e51d0 R14: 0000000000000010 R15: 0000000000000006 Modules linked in: act_police veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk virtio_net virtio_console net_failover failover crc32c_intel ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_police_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_police.c | 12 ++++++++- .../tc-testing/tc-tests/actions/police.json | 25 +++++++++++++++++++ 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/net/sched/act_police.c b/net/sched/act_police.c index 229eba7925e5..2b8581f6ab51 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -21,6 +21,7 @@ #include #include #include +#include struct tcf_police_params { int tcfp_result; @@ -88,6 +89,7 @@ static int tcf_police_init(struct net *net, struct nlattr *nla, { int ret = 0, tcfp_result = TC_ACT_OK, err, size; struct nlattr *tb[TCA_POLICE_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_police *parm; struct tcf_police *police; struct qdisc_rate_table *R_tab = NULL, *P_tab = NULL; @@ -129,6 +131,9 @@ static int tcf_police_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; police = to_police(*a); if (parm->rate.rate) { @@ -214,12 +219,14 @@ static int tcf_police_init(struct net *net, struct nlattr *nla, if (new->peak_present) police->tcfp_ptoks = new->tcfp_mtu_ptoks; spin_unlock_bh(&police->tcfp_lock); - police->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(police->params, new, lockdep_is_held(&police->tcf_lock)); spin_unlock_bh(&police->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (new) kfree_rcu(new, rcu); @@ -230,6 +237,9 @@ static int tcf_police_init(struct net *net, struct nlattr *nla, failure: qdisc_put_rtab(P_tab); qdisc_put_rtab(R_tab); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: tcf_idr_release(*a, bind); return err; } diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json index 4086a50a670e..b8268da5adaa 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json @@ -739,5 +739,30 @@ "teardown": [ "$TC actions flush action police" ] + }, + { + "id": "689e", + "name": "Replace police action with invalid goto chain control", + "category": [ + "actions", + "police" + ], + "setup": [ + [ + "$TC actions flush action police", + 0, + 1, + 255 + ], + "$TC actions add action police rate 3mbit burst 250k drop index 90" + ], + "cmdUnderTest": "$TC actions replace action police rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action police index 90", + "matchPattern": "action order [0-9]*: police 0x5a rate 3Mbit burst 250Kb mtu 2Kb action drop", + "matchCount": "1", + "teardown": [ + "$TC actions flush action police" + ] } ] From e8c87c643ef3603c6e63bdecb67b03d794648493 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:09 +0100 Subject: [PATCH 234/468] net/sched: act_sample: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action sample rate 1024 group 4 pass index 90 # tc actions replace action sample \ > rate 1024 group 4 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action sample had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: sample rate 1/1024 group 4 goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 8000000079966067 P4D 8000000079966067 PUD 7987b067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffbee60033fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff99d7ae6e3b00 RCX: 00000000e555df9b RDX: 0000000000000000 RSI: 00000000b0352718 RDI: ffff99d7fda1fcf0 RBP: ffffbee60033fb70 R08: 0000000070731ab1 R09: 0000000000000400 R10: 0000000000000000 R11: ffff99d7ac733838 R12: ffff99d7f3c2be00 R13: ffff99d7f3c2be08 R14: 0000000000000001 R15: ffff99d7f3c2b600 FS: 0000000000000000(0000) GS:ffff99d7fda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000797de006 CR4: 00000000001606f0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_sample psample veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd snd_pcm cryptd glue_helper snd_timer joydev snd pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover ttm failover virtio_blk virtio_console drm ata_piix serio_raw crc32c_intel libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_sample_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_sample.c | 19 +++++++++++--- .../tc-testing/tc-tests/actions/sample.json | 25 +++++++++++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index 36b8adbe935d..4060b0955c97 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -22,6 +22,7 @@ #include #include #include +#include #include @@ -43,6 +44,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, sample_net_id); struct nlattr *tb[TCA_SAMPLE_MAX + 1]; struct psample_group *psample_group; + struct tcf_chain *goto_ch = NULL; struct tc_sample *parm; u32 psample_group_num; struct tcf_sample *s; @@ -79,18 +81,21 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]); psample_group = psample_group_get(net, psample_group_num); if (!psample_group) { - tcf_idr_release(*a, bind); - return -ENOMEM; + err = -ENOMEM; + goto put_chain; } s = to_sample(*a); spin_lock_bh(&s->tcf_lock); - s->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); s->psample_group_num = psample_group_num; RCU_INIT_POINTER(s->psample_group, psample_group); @@ -100,10 +105,18 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, s->trunc_size = nla_get_u32(tb[TCA_SAMPLE_TRUNC_SIZE]); } spin_unlock_bh(&s->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static void tcf_sample_cleanup(struct tc_action *a) diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json index 3aca33c00039..27f0acaed880 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json @@ -584,5 +584,30 @@ "teardown": [ "$TC actions flush action sample" ] + }, + { + "id": "0a6e", + "name": "Replace sample action with invalid goto chain control", + "category": [ + "actions", + "sample" + ], + "setup": [ + [ + "$TC actions flush action sample", + 0, + 1, + 255 + ], + "$TC actions add action sample rate 1024 group 4 pass index 90" + ], + "cmdUnderTest": "$TC actions replace action sample rate 1024 group 7 goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions list action sample", + "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 pass.*index 90", + "matchCount": "1", + "teardown": [ + "$TC actions flush action sample" + ] } ] From 4b006b0c139e486773335f4c23b4d82348cfeb04 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:10 +0100 Subject: [PATCH 235/468] net/sched: act_simple: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action simple sdata hello pass index 90 # tc actions replace action simple \ > sdata world goto chain 42 index 90 cookie c1a0c1a0 # tc action show action simple had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: Simple index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000006a6fb067 P4D 800000006a6fb067 PUD 6aed6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 3241 Comm: kworker/2:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffbe6781763ad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9e59bdb80e00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9e59b4716738 RDI: ffff9e59ab12d140 RBP: ffffbe6781763b70 R08: 0000000000000234 R09: 0000000000aaaaaa R10: 0000000000000000 R11: ffff9e59b247cd50 R12: ffff9e59b112f100 R13: ffff9e59b112f108 R14: 0000000000000001 R15: ffff9e59ab12d0c0 FS: 0000000000000000(0000) GS:ffff9e59b4700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000006af92004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_simple veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm net_failover virtio_console virtio_blk failover drm crc32c_intel serio_raw floppy ata_piix libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_simple_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_simple.c | 52 ++++++++++++++----- .../tc-testing/tc-tests/actions/simple.json | 25 +++++++++ 2 files changed, 63 insertions(+), 14 deletions(-) diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c index 4916dc3e3668..23c8ca5615e5 100644 --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -60,14 +61,26 @@ static int alloc_defdata(struct tcf_defact *d, const struct nlattr *defdata) return 0; } -static void reset_policy(struct tcf_defact *d, const struct nlattr *defdata, - struct tc_defact *p) +static int reset_policy(struct tc_action *a, const struct nlattr *defdata, + struct tc_defact *p, struct tcf_proto *tp, + struct netlink_ext_ack *extack) { + struct tcf_chain *goto_ch = NULL; + struct tcf_defact *d; + int err; + + err = tcf_action_check_ctrlact(p->action, tp, &goto_ch, extack); + if (err < 0) + return err; + d = to_defact(a); spin_lock_bh(&d->tcf_lock); - d->tcf_action = p->action; + goto_ch = tcf_action_set_ctrlact(a, p->action, goto_ch); memset(d->tcfd_defdata, 0, SIMP_MAX_DATA); nla_strlcpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA); spin_unlock_bh(&d->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + return 0; } static const struct nla_policy simple_policy[TCA_DEF_MAX + 1] = { @@ -82,6 +95,7 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, simp_net_id); struct nlattr *tb[TCA_DEF_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_defact *parm; struct tcf_defact *d; bool exists = false; @@ -122,27 +136,37 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla, } d = to_defact(*a); - ret = alloc_defdata(d, tb[TCA_DEF_DATA]); - if (ret < 0) { - tcf_idr_release(*a, bind); - return ret; - } - d->tcf_action = parm->action; + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, + extack); + if (err < 0) + goto release_idr; + + err = alloc_defdata(d, tb[TCA_DEF_DATA]); + if (err < 0) + goto put_chain; + + tcf_action_set_ctrlact(*a, parm->action, goto_ch); ret = ACT_P_CREATED; } else { - d = to_defact(*a); - if (!ovr) { - tcf_idr_release(*a, bind); - return -EEXIST; + err = -EEXIST; + goto release_idr; } - reset_policy(d, tb[TCA_DEF_DATA], parm); + err = reset_policy(*a, tb[TCA_DEF_DATA], parm, tp, extack); + if (err) + goto release_idr; } if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static int tcf_simp_dump(struct sk_buff *skb, struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/simple.json b/tools/testing/selftests/tc-testing/tc-tests/actions/simple.json index e89a7aa4012d..8e8c1ae12260 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/simple.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/simple.json @@ -126,5 +126,30 @@ "teardown": [ "" ] + }, + { + "id": "b776", + "name": "Replace simple action with invalid goto chain control", + "category": [ + "actions", + "simple" + ], + "setup": [ + [ + "$TC actions flush action simple", + 0, + 1, + 255 + ], + "$TC actions add action simple sdata \"hello\" pass index 90" + ], + "cmdUnderTest": "$TC actions replace action simple sdata \"world\" goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions list action simple", + "matchPattern": "action order [0-9]*: Simple .*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action simple" + ] } ] From ec7727bb24b01e96b0c46068addf355ee4f794d8 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:11 +0100 Subject: [PATCH 236/468] net/sched: act_skbedit: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action skbedit ptype host pass index 90 # tc actions replace action skbedit \ > ptype host goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action skbedit had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: skbedit ptype host goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 3467 Comm: kworker/3:3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffffb50a81e1fad0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9aa47ba4ea00 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffff9aa469eeb3c0 RDI: ffff9aa47ba4ea00 RBP: ffffb50a81e1fb70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9aa47bce0638 R12: ffff9aa4793b0c00 R13: ffff9aa4793b0c08 R14: 0000000000000001 R15: ffff9aa469eeb3c0 FS: 0000000000000000(0000) GS:ffff9aa474780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007360e005 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ndisc_next_option+0x50/0x50 ? ___neigh_create+0x4d5/0x680 ? ip6_finish_output2+0x1b5/0x590 ip6_finish_output2+0x1b5/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.28+0x79/0xc0 ndisc_send_skb+0x248/0x2e0 ndisc_send_ns+0xf8/0x200 ? addrconf_dad_work+0x389/0x4b0 addrconf_dad_work+0x389/0x4b0 ? __switch_to_asm+0x34/0x70 ? process_one_work+0x195/0x380 ? addrconf_dad_completed+0x370/0x370 process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: act_skbedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev soundcore pcspkr virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net net_failover drm failover virtio_blk virtio_console ata_piix virtio_pci crc32c_intel serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_skbedit_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_skbedit.c | 19 +++++++++++--- .../tc-testing/tc-tests/actions/skbedit.json | 25 +++++++++++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index 4566eff3d027..7e1d261a31d2 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -102,6 +103,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, skbedit_net_id); struct tcf_skbedit_params *params_new; struct nlattr *tb[TCA_SKBEDIT_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tc_skbedit *parm; struct tcf_skbedit *d; u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL; @@ -187,11 +189,14 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, return -EEXIST; } } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; params_new = kzalloc(sizeof(*params_new), GFP_KERNEL); if (unlikely(!params_new)) { - tcf_idr_release(*a, bind); - return -ENOMEM; + err = -ENOMEM; + goto put_chain; } params_new->flags = flags; @@ -209,16 +214,24 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, params_new->mask = *mask; spin_lock_bh(&d->tcf_lock); - d->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(d->params, params_new, lockdep_is_held(&d->tcf_lock)); spin_unlock_bh(&d->tcf_lock); if (params_new) kfree_rcu(params_new, rcu); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a, diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json index 5aaf593b914a..ecd96eda7f6a 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json @@ -484,5 +484,30 @@ "teardown": [ "$TC actions flush action skbedit" ] + }, + { + "id": "1b2b", + "name": "Replace skbedit action with invalid goto_chain control", + "category": [ + "actions", + "skbedit" + ], + "setup": [ + [ + "$TC actions flush action skbedit", + 0, + 1, + 255 + ], + "$TC actions add action skbedit ptype host pass index 90" + ], + "cmdUnderTest": "$TC actions replace action skbedit ptype host goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions list action skbedit", + "matchPattern": "action order [0-9]*: skbedit ptype host pass.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action skbedit" + ] } ] From 7c3d825d12c5e6056ea73c0a202cbdef9d9ab9e6 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:12 +0100 Subject: [PATCH 237/468] net/sched: act_skbmod: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90 # tc actions replace action skbmod \ > set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action skbmod had the following output: src MAC address <00:c1:a0:c1:a0:00> src MAC address <00:c1:a0:c1:a0:00> Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002d5c7067 P4D 800000002d5c7067 PUD 77e16067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff8987ffd83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff8987aeb68800 RCX: ffff8987fa263640 RDX: 0000000000000000 RSI: ffff8987f51c8802 RDI: 00000000000000a0 RBP: ffff8987ffd83c80 R08: ffff8987f939bac8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8987f5c77d00 R13: ffff8987f5c77d08 R14: 0000000000000001 R15: ffff8987f0c29f00 FS: 0000000000000000(0000) GS:ffff8987ffd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007832c004 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 56 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa2a1c038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffa94184f0 RBX: 0000000000000003 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003 RBP: 0000000000000003 R08: 001123cfc2ba71ac R09: 0000000000000000 R10: 0000000000000000 R11: 00000000000f4240 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_skbmod veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd cryptd glue_helper snd_pcm joydev pcspkr virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover virtio_console ttm virtio_blk failover drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_skbmod_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_skbmod.c | 19 +++++++++++--- .../tc-testing/tc-tests/actions/skbmod.json | 25 +++++++++++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c index b9ab2c8f07f1..1d4c324d0a42 100644 --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -88,6 +89,7 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, skbmod_net_id); struct nlattr *tb[TCA_SKBMOD_MAX + 1]; struct tcf_skbmod_params *p, *p_old; + struct tcf_chain *goto_ch = NULL; struct tc_skbmod *parm; struct tcf_skbmod *d; bool exists = false; @@ -154,21 +156,24 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; d = to_skbmod(*a); p = kzalloc(sizeof(struct tcf_skbmod_params), GFP_KERNEL); if (unlikely(!p)) { - tcf_idr_release(*a, bind); - return -ENOMEM; + err = -ENOMEM; + goto put_chain; } p->flags = lflags; - d->tcf_action = parm->action; if (ovr) spin_lock_bh(&d->tcf_lock); /* Protected by tcf_lock if overwriting existing action. */ + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); p_old = rcu_dereference_protected(d->skbmod_p, 1); if (lflags & SKBMOD_F_DMAC) @@ -184,10 +189,18 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla, if (p_old) kfree_rcu(p_old, rcu); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static void tcf_skbmod_cleanup(struct tc_action *a) diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json b/tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json index fe3326e939c1..6eb4c4f97060 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json @@ -392,5 +392,30 @@ "teardown": [ "$TC actions flush action skbmod" ] + }, + { + "id": "b651", + "name": "Replace skbmod action with invalid goto_chain control", + "category": [ + "actions", + "skbmod" + ], + "setup": [ + [ + "$TC actions flush action skbmod", + 0, + 1, + 255 + ], + "$TC actions add action skbmod set etype 0x1111 pass index 90" + ], + "cmdUnderTest": "$TC actions replace action skbmod set etype 0x1111 goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions ls action skbmod", + "matchPattern": "action order [0-9]*: skbmod pass set etype 0x1111\\s+index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action skbmod" + ] } ] From e5fdabacbffc5d321bf9f51410fe0db0834606eb Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:13 +0100 Subject: [PATCH 238/468] net/sched: act_tunnel_key: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \ > nocsum id 1 pass index 90 # tc actions replace action tunnel_key \ > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \ > goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action tunnel_key had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2.0 key_id 1 dst_port 3128 nocsum goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40 FS: 0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? tick_sched_timer+0x37/0x70 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_tunnel_key_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_tunnel_key.c | 18 +++++++++++-- .../tc-tests/actions/tunnel_key.json | 25 +++++++++++++++++++ 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index fc295d91559a..d5aaf90a3971 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -217,6 +218,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1]; struct tcf_tunnel_key_params *params_new; struct metadata_dst *metadata = NULL; + struct tcf_chain *goto_ch = NULL; struct tc_tunnel_key *parm; struct tcf_tunnel_key *t; bool exists = false; @@ -360,6 +362,12 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, goto release_tun_meta; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) { + ret = err; + exists = true; + goto release_tun_meta; + } t = to_tunnel_key(*a); params_new = kzalloc(sizeof(*params_new), GFP_KERNEL); @@ -367,23 +375,29 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, NL_SET_ERR_MSG(extack, "Cannot allocate tunnel key parameters"); ret = -ENOMEM; exists = true; - goto release_tun_meta; + goto put_chain; } params_new->tcft_action = parm->t_action; params_new->tcft_enc_metadata = metadata; spin_lock_bh(&t->tcf_lock); - t->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(t->params, params_new, lockdep_is_held(&t->tcf_lock)); spin_unlock_bh(&t->tcf_lock); tunnel_key_release_params(params_new); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + release_tun_meta: if (metadata) dst_release(&metadata->dst); diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json index e7e15a7336b6..28453a445fdb 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json @@ -884,5 +884,30 @@ "teardown": [ "$TC actions flush action tunnel_key" ] + }, + { + "id": "8242", + "name": "Replace tunnel_key set action with invalid goto chain", + "category": [ + "actions", + "tunnel_key" + ], + "setup": [ + [ + "$TC actions flush action tunnel_key", + 0, + 1, + 255 + ], + "$TC actions add action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.20.2 dst_port 3128 nocsum id 1 pass index 90" + ], + "cmdUnderTest": "$TC actions replace action tunnel_key set src_ip 10.10.10.2 dst_ip 20.20.20.1 dst_port 3129 id 2 csum goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action tunnel_key index 90", + "matchPattern": "action order [0-9]+: tunnel_key.*set.*src_ip 10.10.10.1.*dst_ip 20.20.20.2.*key_id 1.*dst_port 3128.*csum pass.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action tunnel_key" + ] } ] From 7e0c8892df7d0316ec853adbf84db536cd53258c Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:14 +0100 Subject: [PATCH 239/468] net/sched: act_vlan: validate the control action inside init() the following script: # tc qdisc add dev crash0 clsact # tc filter add dev crash0 egress matchall \ > action vlan pop pass index 90 # tc actions replace action vlan \ > pop goto chain 42 index 90 cookie c1a0c1a0 # tc actions show action vlan had the following output: Error: Failed to init TC action chain. We have an error talking to the kernel total acts 1 action order 0: vlan pop goto chain 42 index 90 ref 2 bind 1 cookie c1a0c1a0 Then, the first packet transmitted by crash0 made the kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 800000007974f067 P4D 800000007974f067 PUD 79638067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff982dfdb83be0 EFLAGS: 00010246 RAX: 000000002000002a RBX: ffff982dfc55db00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff982df97099c0 RDI: ffff982dfc55db00 RBP: ffff982dfdb83c80 R08: ffff982df983fec8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff982df5aacd00 R13: ffff982df5aacd08 R14: 0000000000000001 R15: ffff982df97099c0 FS: 0000000000000000(0000) GS:ffff982dfdb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000796d0005 CR4: 00000000001606e0 Call Trace: tcf_classify+0x58/0x120 __dev_queue_xmit+0x40a/0x890 ? ip6_finish_output2+0x369/0x590 ip6_finish_output2+0x369/0x590 ? ip6_output+0x68/0x110 ip6_output+0x68/0x110 ? nf_hook.constprop.35+0x79/0xc0 mld_sendpack+0x16f/0x220 mld_ifc_timer_expire+0x195/0x2c0 ? igmp6_timer_handler+0x70/0x70 call_timer_fn+0x2b/0x130 run_timer_softirq+0x3e8/0x440 ? enqueue_hrtimer+0x39/0x90 __do_softirq+0xe3/0x2f5 irq_exit+0xf0/0x100 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 RIP: 0010:native_safe_halt+0x2/0x10 Code: 7b ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffffa4714038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff840184f0 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000001e57d3f387 RBP: 0000000000000003 R08: 001125d9ca39e1eb R09: 0000000000000000 R10: 000000000000027d R11: 000000000009f400 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x1/0x1 default_idle+0x1c/0x140 do_idle+0x1c4/0x280 cpu_startup_entry+0x19/0x20 start_secondary+0x1a7/0x200 secondary_startup_64+0xa4/0xb0 Modules linked in: act_vlan veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon snd pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk ttm net_failover virtio_console failover ata_piix drm libata crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 Validating the control action within tcf_vlan_init() proved to fix the above issue. A TDC selftest is added to verify the correct behavior. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- net/sched/act_vlan.c | 20 ++++++++++++--- .../tc-testing/tc-tests/actions/vlan.json | 25 +++++++++++++++++++ 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c index 4651ee15e35d..0f40d0a74423 100644 --- a/net/sched/act_vlan.c +++ b/net/sched/act_vlan.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -109,6 +110,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, vlan_net_id); struct nlattr *tb[TCA_VLAN_MAX + 1]; + struct tcf_chain *goto_ch = NULL; struct tcf_vlan_params *p; struct tc_vlan *parm; struct tcf_vlan *v; @@ -200,12 +202,16 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla, return -EEXIST; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; + v = to_vlan(*a); p = kzalloc(sizeof(*p), GFP_KERNEL); if (!p) { - tcf_idr_release(*a, bind); - return -ENOMEM; + err = -ENOMEM; + goto put_chain; } p->tcfv_action = action; @@ -214,16 +220,24 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla, p->tcfv_push_proto = push_proto; spin_lock_bh(&v->tcf_lock); - v->tcf_action = parm->action; + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); rcu_swap_protected(v->vlan_p, p, lockdep_is_held(&v->tcf_lock)); spin_unlock_bh(&v->tcf_lock); + if (goto_ch) + tcf_chain_put_by_act(goto_ch); if (p) kfree_rcu(p, rcu); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; +put_chain: + if (goto_ch) + tcf_chain_put_by_act(goto_ch); +release_idr: + tcf_idr_release(*a, bind); + return err; } static void tcf_vlan_cleanup(struct tc_action *a) diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json index 69ea09eefffc..cc7c7d758008 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json @@ -688,5 +688,30 @@ "teardown": [ "$TC actions flush action vlan" ] + }, + { + "id": "e394", + "name": "Replace vlan push action with invalid goto chain control", + "category": [ + "actions", + "vlan" + ], + "setup": [ + [ + "$TC actions flush action vlan", + 0, + 1, + 255 + ], + "$TC actions add action vlan push id 500 pass index 90" + ], + "cmdUnderTest": "$TC actions replace action vlan push id 500 goto chain 42 index 90 cookie c1a0c1a0", + "expExitCode": "255", + "verifyCmd": "$TC actions get action vlan index 90", + "matchPattern": "action order [0-9]+: vlan.*push id 500 protocol 802.1Q priority 0 pass.*index 90 ref", + "matchCount": "1", + "teardown": [ + "$TC actions flush action vlan" + ] } ] From fe384e2fa36ca084a456fd30558cccc75b4b3fbd Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:15 +0100 Subject: [PATCH 240/468] net/sched: don't dereference a->goto_chain to read the chain index callers of tcf_gact_goto_chain_index() can potentially read an old value of the chain index, or even dereference a NULL 'goto_chain' pointer, because 'goto_chain' and 'tcfa_action' are read in the traffic path without caring of concurrent write in the control path. The most recent value of chain index can be read also from a->tcfa_action (it's encoded there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to dereference 'goto_chain': just read the chain id from the control action. Fixes: e457d86ada27 ("net: sched: add couple of goto_chain helpers") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- include/net/tc_act/tc_gact.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h index ee8d005f56fc..eb8f01c819e6 100644 --- a/include/net/tc_act/tc_gact.h +++ b/include/net/tc_act/tc_gact.h @@ -56,7 +56,7 @@ static inline bool is_tcf_gact_goto_chain(const struct tc_action *a) static inline u32 tcf_gact_goto_chain_index(const struct tc_action *a) { - return a->goto_chain->index; + return READ_ONCE(a->tcfa_action) & TC_ACT_EXT_VAL_MASK; } #endif /* __NET_TC_GACT_H */ From ee3bbfe806cdb46b02cda63626cb50a7a7b19fc5 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Wed, 20 Mar 2019 15:00:16 +0100 Subject: [PATCH 241/468] net/sched: let actions use RCU to access 'goto_chain' use RCU when accessing the action chain, to avoid use after free in the traffic path when 'goto chain' is replaced on existing TC actions (see script below). Since the control action is read in the traffic path without holding the action spinlock, we need to explicitly ensure that a->goto_chain is not NULL before dereferencing (i.e it's not sufficient to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL dereferences in tcf_action_goto_chain_exec() when the following script: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto udp action pass index 4 # tc filter add dev dd0 ingress protocol ip flower \ > ip_proto udp action csum udp goto chain 42 index 66 # tc chain del dev dd0 chain 42 ingress (start UDP traffic towards dd0) # tc action replace action csum udp pass index 66 was run repeatedly for several hours. Suggested-by: Cong Wang Suggested-by: Vlad Buslov Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- include/net/act_api.h | 2 +- include/net/sch_generic.h | 1 + net/sched/act_api.c | 18 ++++++++++-------- net/sched/cls_api.c | 2 +- 4 files changed, 13 insertions(+), 10 deletions(-) diff --git a/include/net/act_api.h b/include/net/act_api.h index 54fbb49bd08a..c61a1bf4e3de 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -39,7 +39,7 @@ struct tc_action { struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw; struct gnet_stats_queue __percpu *cpu_qstats; struct tc_cookie __rcu *act_cookie; - struct tcf_chain *goto_chain; + struct tcf_chain __rcu *goto_chain; }; #define tcf_index common.tcfa_index #define tcf_refcnt common.tcfa_refcnt diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 31284c078d06..7d1a0483a17b 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -378,6 +378,7 @@ struct tcf_chain { bool flushing; const struct tcf_proto_ops *tmplt_ops; void *tmplt_priv; + struct rcu_head rcu; }; struct tcf_block { diff --git a/net/sched/act_api.c b/net/sched/act_api.c index fe67b98ac641..5a87e271d35a 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -31,7 +31,7 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a, struct tcf_result *res) { - const struct tcf_chain *chain = a->goto_chain; + const struct tcf_chain *chain = rcu_dereference_bh(a->goto_chain); res->goto_tp = rcu_dereference_bh(chain->filter_chain); } @@ -91,13 +91,11 @@ int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, EXPORT_SYMBOL(tcf_action_check_ctrlact); struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, - struct tcf_chain *newchain) + struct tcf_chain *goto_chain) { - struct tcf_chain *oldchain = a->goto_chain; - a->tcfa_action = action; - a->goto_chain = newchain; - return oldchain; + rcu_swap_protected(a->goto_chain, goto_chain, 1); + return goto_chain; } EXPORT_SYMBOL(tcf_action_set_ctrlact); @@ -108,7 +106,7 @@ EXPORT_SYMBOL(tcf_action_set_ctrlact); */ static void free_tcf(struct tc_action *p) { - struct tcf_chain *chain = p->goto_chain; + struct tcf_chain *chain = rcu_dereference_protected(p->goto_chain, 1); free_percpu(p->cpu_bstats); free_percpu(p->cpu_bstats_hw); @@ -686,6 +684,10 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions, return TC_ACT_OK; } } else if (TC_ACT_EXT_CMP(ret, TC_ACT_GOTO_CHAIN)) { + if (unlikely(!rcu_access_pointer(a->goto_chain))) { + net_warn_ratelimited("can't go to NULL chain!\n"); + return TC_ACT_SHOT; + } tcf_action_goto_chain_exec(a, res); } @@ -931,7 +933,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, module_put(a_o->owner); if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) && - !a->goto_chain) { + !rcu_access_pointer(a->goto_chain)) { tcf_action_destroy_1(a, bind); NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain"); return ERR_PTR(-EINVAL); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index dc10525e90e7..99ae30c177c7 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -367,7 +367,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain, bool free_block) struct tcf_block *block = chain->block; mutex_destroy(&chain->filter_chain_lock); - kfree(chain); + kfree_rcu(chain, rcu); if (free_block) tcf_block_destroy(block); } From 6b70fc94afd165342876e53fc4b2f7d085009945 Mon Sep 17 00:00:00 2001 From: Wang Hai Date: Wed, 20 Mar 2019 14:25:05 -0400 Subject: [PATCH 242/468] net-sysfs: Fix memory leak in netdev_register_kobject When registering struct net_device, it will call register_netdevice -> netdev_register_kobject -> device_initialize(dev); dev_set_name(dev, "%s", ndev->name) device_add(dev) register_queue_kobjects(ndev) In netdev_register_kobject(), if device_add(dev) or register_queue_kobjects(ndev) failed. Register_netdevice() will return error, causing netdev_freemem(ndev) to be called to free net_device, however put_device(&dev->dev)->..-> kobject_cleanup() won't be called, resulting in a memory leak. syzkaller report this: BUG: memory leak unreferenced object 0xffff8881f4fad168 (size 8): comm "syz-executor.0", pid 3575, jiffies 4294778002 (age 20.134s) hex dump (first 8 bytes): 77 70 61 6e 30 00 ff ff wpan0... backtrace: [<000000006d2d91d7>] kstrdup_const+0x3d/0x50 mm/util.c:73 [<00000000ba9ff953>] kvasprintf_const+0x112/0x170 lib/kasprintf.c:48 [<000000005555ec09>] kobject_set_name_vargs+0x55/0x130 lib/kobject.c:281 [<0000000098d28ec3>] dev_set_name+0xbb/0xf0 drivers/base/core.c:1915 [<00000000b7553017>] netdev_register_kobject+0xc0/0x410 net/core/net-sysfs.c:1727 [<00000000c826a797>] register_netdevice+0xa51/0xeb0 net/core/dev.c:8711 [<00000000857bfcfd>] cfg802154_update_iface_num.isra.2+0x13/0x90 [ieee802154] [<000000003126e453>] ieee802154_llsec_fill_key_id+0x1d5/0x570 [ieee802154] [<00000000e4b3df51>] 0xffffffffc1500e0e [<00000000b4319776>] platform_drv_probe+0xc6/0x180 drivers/base/platform.c:614 [<0000000037669347>] really_probe+0x491/0x7c0 drivers/base/dd.c:509 [<000000008fed8862>] driver_probe_device+0xdc/0x240 drivers/base/dd.c:671 [<00000000baf52041>] device_driver_attach+0xf2/0x130 drivers/base/dd.c:945 [<00000000c7cc8dec>] __driver_attach+0x10e/0x210 drivers/base/dd.c:1022 [<0000000057a757c2>] bus_for_each_dev+0x154/0x1e0 drivers/base/bus.c:304 [<000000005f5ae04b>] bus_add_driver+0x427/0x5e0 drivers/base/bus.c:645 Reported-by: Hulk Robot Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Wang Hai Reviewed-by: Andy Shevchenko Reviewed-by: Stephen Hemminger Signed-off-by: David S. Miller --- net/core/net-sysfs.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 8f8b7b6c2945..f8f94303a1f5 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1747,16 +1747,20 @@ int netdev_register_kobject(struct net_device *ndev) error = device_add(dev); if (error) - return error; + goto error_put_device; error = register_queue_kobjects(ndev); - if (error) { - device_del(dev); - return error; - } + if (error) + goto error_device_del; pm_runtime_set_memalloc_noio(dev, true); + return 0; + +error_device_del: + device_del(dev); +error_put_device: + put_device(dev); return error; } From 408f13ef358aa5ad56dc6230c2c7deb92cf462b1 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 21 Mar 2019 09:39:52 +0800 Subject: [PATCH 243/468] rhashtable: Still do rehash when we get EEXIST As it stands if a shrink is delayed because of an outstanding rehash, we will go into a rescheduling loop without ever doing the rehash. This patch fixes this by still carrying out the rehash and then rescheduling so that we can shrink after the completion of the rehash should it still be necessary. The return value of EEXIST captures this case and other cases (e.g., another thread expanded/rehashed the table at the same time) where we should still proceed with the rehash. Fixes: da20420f83ea ("rhashtable: Add nested tables") Reported-by: Josh Elsasser Signed-off-by: Herbert Xu Tested-by: Josh Elsasser Signed-off-by: David S. Miller --- lib/rhashtable.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 0a105d4af166..97f59abc3e92 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -416,8 +416,12 @@ static void rht_deferred_worker(struct work_struct *work) else if (tbl->nest) err = rhashtable_rehash_alloc(ht, tbl, tbl->size); - if (!err) - err = rhashtable_rehash_table(ht); + if (!err || err == -EEXIST) { + int nerr; + + nerr = rhashtable_rehash_table(ht); + err = err ?: nerr; + } mutex_unlock(&ht->mutex); From 5f543a54eec08228ab0cc0a49cf5d79dd32c9e5e Mon Sep 17 00:00:00 2001 From: Yunsheng Lin Date: Thu, 21 Mar 2019 11:28:43 +0800 Subject: [PATCH 244/468] net: hns3: fix for not calculating tx bd num correctly When there is only one byte in a frag, the current calculation using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET" will return zero, because HNS3_MAX_BD_SIZE is 65535 and HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when a frag's size is one byte. This patch fixes it by using DIV_ROUND_UP. Fixes: 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path") Signed-off-by: Yunsheng Lin Signed-off-by: David S. Miller --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 13 ++++++------- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 1 - 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 1c1f17ec6be2..162cb9afa0e7 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -22,6 +22,7 @@ #include "hns3_enet.h" #define hns3_set_field(origin, shift, val) ((origin) |= ((val) << (shift))) +#define hns3_tx_bd_count(S) DIV_ROUND_UP(S, HNS3_MAX_BD_SIZE) static void hns3_clear_all_ring(struct hnae3_handle *h); static void hns3_force_clear_all_rx_ring(struct hnae3_handle *h); @@ -1079,7 +1080,7 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv, desc_cb->length = size; - frag_buf_num = (size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET; + frag_buf_num = hns3_tx_bd_count(size); sizeoflast = size & HNS3_TX_LAST_SIZE_M; sizeoflast = sizeoflast ? sizeoflast : HNS3_MAX_BD_SIZE; @@ -1124,14 +1125,13 @@ static int hns3_nic_maybe_stop_tso(struct sk_buff **out_skb, int *bnum, int i; size = skb_headlen(skb); - buf_num = (size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET; + buf_num = hns3_tx_bd_count(size); frag_num = skb_shinfo(skb)->nr_frags; for (i = 0; i < frag_num; i++) { frag = &skb_shinfo(skb)->frags[i]; size = skb_frag_size(frag); - bdnum_for_frag = (size + HNS3_MAX_BD_SIZE - 1) >> - HNS3_MAX_BD_SIZE_OFFSET; + bdnum_for_frag = hns3_tx_bd_count(size); if (unlikely(bdnum_for_frag > HNS3_MAX_BD_PER_FRAG)) return -ENOMEM; @@ -1139,8 +1139,7 @@ static int hns3_nic_maybe_stop_tso(struct sk_buff **out_skb, int *bnum, } if (unlikely(buf_num > HNS3_MAX_BD_PER_FRAG)) { - buf_num = (skb->len + HNS3_MAX_BD_SIZE - 1) >> - HNS3_MAX_BD_SIZE_OFFSET; + buf_num = hns3_tx_bd_count(skb->len); if (ring_space(ring) < buf_num) return -EBUSY; /* manual split the send packet */ @@ -1169,7 +1168,7 @@ static int hns3_nic_maybe_stop_tx(struct sk_buff **out_skb, int *bnum, buf_num = skb_shinfo(skb)->nr_frags + 1; if (unlikely(buf_num > HNS3_MAX_BD_PER_FRAG)) { - buf_num = (skb->len + HNS3_MAX_BD_SIZE - 1) / HNS3_MAX_BD_SIZE; + buf_num = hns3_tx_bd_count(skb->len); if (ring_space(ring) < buf_num) return -EBUSY; /* manual split the send packet */ diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h index 1db0bd41d209..75669cd0c311 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h @@ -193,7 +193,6 @@ enum hns3_nic_state { #define HNS3_VECTOR_INITED 1 #define HNS3_MAX_BD_SIZE 65535 -#define HNS3_MAX_BD_SIZE_OFFSET 16 #define HNS3_MAX_BD_PER_FRAG 8 #define HNS3_MAX_BD_PER_PKT MAX_SKB_FRAGS From 41b37f4c0fa67185691bcbd30201cad566f2f0d1 Mon Sep 17 00:00:00 2001 From: Masanari Iida Date: Tue, 19 Mar 2019 01:30:09 +0900 Subject: [PATCH 245/468] ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi This patch fixes a spelling typo. Signed-off-by: Masanari Iida Fixes: cc42603de320 ("ARM: dts: imx6q-icore-rqs: Add Engicam IMX6 Q7 initial support") Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6qdl-icore-rqs.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/imx6qdl-icore-rqs.dtsi b/arch/arm/boot/dts/imx6qdl-icore-rqs.dtsi index 1d1b4bd0670f..a4217f564a53 100644 --- a/arch/arm/boot/dts/imx6qdl-icore-rqs.dtsi +++ b/arch/arm/boot/dts/imx6qdl-icore-rqs.dtsi @@ -264,7 +264,7 @@ &usdhc3 { pinctrl-2 = <&pinctrl_usdhc3_200mhz>; vmcc-supply = <®_sd3_vmmc>; cd-gpios = <&gpio1 1 GPIO_ACTIVE_LOW>; - bus-witdh = <4>; + bus-width = <4>; no-1-8-v; status = "okay"; }; @@ -275,7 +275,7 @@ &usdhc4 { pinctrl-1 = <&pinctrl_usdhc4_100mhz>; pinctrl-2 = <&pinctrl_usdhc4_200mhz>; vmcc-supply = <®_sd4_vmmc>; - bus-witdh = <8>; + bus-width = <8>; no-1-8-v; non-removable; status = "okay"; From 15b43e497ffd80ca44c00d781869a0e71f223ea5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 20 Mar 2019 12:09:05 +0100 Subject: [PATCH 246/468] ARM: dts: imx6dl-yapp4: Use correct pseudo PHY address for the switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The switch is accessible through pseudo PHY which is located at 0x10. Signed-off-by: Michal Vokáč Fixes: 87489ec3a77f ("ARM: dts: imx: Add Y Soft IOTA Draco, Hydra and Ursa boards") Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi index 091d829f6b05..e8d800fec637 100644 --- a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi +++ b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi @@ -114,9 +114,9 @@ phy_port3: phy@2 { reg = <2>; }; - switch@0 { + switch@10 { compatible = "qca,qca8334"; - reg = <0>; + reg = <10>; switch_ports: ports { #address-cells = <1>; From 728e096dd70889c2e80dd4153feee91afb1daf72 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Thu, 10 Jan 2019 21:19:33 +0100 Subject: [PATCH 247/468] ARM: imx_v6_v7_defconfig: continue compiling the pwm driver MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After the pwm-imx driver was split into two drivers and the Kconfig symbol changed accordingly, use the new name to continue being able to use the PWM hardware. Signed-off-by: Uwe Kleine-König Signed-off-by: Shawn Guo --- arch/arm/configs/imx_v6_v7_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index 5586a5074a96..50fb01d70b10 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -398,7 +398,7 @@ CONFIG_MAG3110=y CONFIG_MPL3115=y CONFIG_PWM=y CONFIG_PWM_FSL_FTM=y -CONFIG_PWM_IMX=y +CONFIG_PWM_IMX27=y CONFIG_NVMEM_IMX_OCOTP=y CONFIG_NVMEM_VF610_OCOTP=y CONFIG_TEE=y From 507aaeeef80d70c46bdf07cda49234b36c2bbdcb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Thu, 10 Jan 2019 21:19:34 +0100 Subject: [PATCH 248/468] ARM: imx_v4_v5_defconfig: enable PWM driver MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While there is no mainline board that makes use of the PWM still enable the driver for it to increase compile test coverage. Signed-off-by: Uwe Kleine-König Signed-off-by: Shawn Guo --- arch/arm/configs/imx_v4_v5_defconfig | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm/configs/imx_v4_v5_defconfig b/arch/arm/configs/imx_v4_v5_defconfig index 8661dd9b064a..b37f8e675e40 100644 --- a/arch/arm/configs/imx_v4_v5_defconfig +++ b/arch/arm/configs/imx_v4_v5_defconfig @@ -170,6 +170,9 @@ CONFIG_IMX_SDMA=y # CONFIG_IOMMU_SUPPORT is not set CONFIG_IIO=y CONFIG_FSL_MX25_ADC=y +CONFIG_PWM=y +CONFIG_PWM_IMX1=y +CONFIG_PWM_IMX27=y CONFIG_EXT4_FS=y # CONFIG_DNOTIFY is not set CONFIG_VFAT_FS=y From 83d163124cf1104cca5b668d5fe6325715a60855 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 21 Mar 2019 14:34:36 -0700 Subject: [PATCH 249/468] bpf: verifier: propagate liveness on all frames Commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences") connected up parentage chains of all frames of the stack. It didn't, however, ensure propagate_liveness() propagates all liveness information along those chains. This means pruning happening in the callee may generate explored states with incomplete liveness for the chains in lower frames of the stack. The included selftest is similar to the prior one from commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences"), where callee would prune regardless of the difference in r8 state. Now we also initialize r9 to 0 or 1 based on a result from get_random(). r9 is never read so the walk with r9 = 0 gets pruned (correctly) after the walk with r9 = 1 completes. The selftest is so arranged that the pruning will happen in the callee. Since callee does not propagate read marks of r8, the explored state at the pruning point prior to the callee will now ignore r8. Propagate liveness on all frames of the stack when pruning. Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Signed-off-by: Jakub Kicinski Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 20 +++++++++------- tools/testing/selftests/bpf/verifier/calls.c | 25 ++++++++++++++++++++ 2 files changed, 36 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f19d5e04c69d..fd502c1f71eb 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6078,15 +6078,17 @@ static int propagate_liveness(struct bpf_verifier_env *env, } /* Propagate read liveness of registers... */ BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG); - /* We don't need to worry about FP liveness because it's read-only */ - for (i = 0; i < BPF_REG_FP; i++) { - if (vparent->frame[vparent->curframe]->regs[i].live & REG_LIVE_READ) - continue; - if (vstate->frame[vstate->curframe]->regs[i].live & REG_LIVE_READ) { - err = mark_reg_read(env, &vstate->frame[vstate->curframe]->regs[i], - &vparent->frame[vstate->curframe]->regs[i]); - if (err) - return err; + for (frame = 0; frame <= vstate->curframe; frame++) { + /* We don't need to worry about FP liveness, it's read-only */ + for (i = frame < vstate->curframe ? BPF_REG_6 : 0; i < BPF_REG_FP; i++) { + if (vparent->frame[frame]->regs[i].live & REG_LIVE_READ) + continue; + if (vstate->frame[frame]->regs[i].live & REG_LIVE_READ) { + err = mark_reg_read(env, &vstate->frame[frame]->regs[i], + &vparent->frame[frame]->regs[i]); + if (err) + return err; + } } } diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c index 4004891afa9c..f2ccae39ee66 100644 --- a/tools/testing/selftests/bpf/verifier/calls.c +++ b/tools/testing/selftests/bpf/verifier/calls.c @@ -1940,3 +1940,28 @@ .errstr = "!read_ok", .result = REJECT, }, +{ + "calls: cross frame pruning - liveness propagation", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_IMM(BPF_REG_8, 0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_MOV64_IMM(BPF_REG_8, 1), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_IMM(BPF_REG_9, 0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_MOV64_IMM(BPF_REG_9, 1), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_8, 1, 1), + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, + .errstr_unpriv = "function calls to other bpf functions are allowed for root only", + .errstr = "!read_ok", + .result = REJECT, +}, From e1037354a0a75acdea2b27043c0a371ed85cf262 Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Fri, 22 Mar 2019 11:37:18 +0800 Subject: [PATCH 250/468] ALSA: hda/realtek: Enable ASUS X441MB and X705FD headset MIC with ALC256 The ASUS laptop X441MB and X705FD with ALC256 cannot detect the headset MIC until ALC256_FIXUP_ASUS_MIC_NO_PRESENCE quirk applied. Signed-off-by: Chris Chiu Signed-off-by: Daniel Drake Signed-off-by: Jian-Hong Pan Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 6042ddf2b2ae..f2539007d09e 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5688,6 +5688,7 @@ enum { ALC225_FIXUP_WYSE_AUTO_MUTE, ALC225_FIXUP_WYSE_DISABLE_MIC_VREF, ALC286_FIXUP_ACER_AIO_HEADSET_MIC, + ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, }; static const struct hda_fixup alc269_fixups[] = { @@ -6696,6 +6697,15 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC286_FIXUP_ACER_AIO_MIC_NO_PRESENCE }, + [ALC256_FIXUP_ASUS_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x19, 0x04a11120 }, /* use as headset mic, without its own jack detect */ + { } + }, + .chained = true, + .chain_id = ALC256_FIXUP_ASUS_HEADSET_MODE + }, }; static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -7334,6 +7344,10 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x14, 0x90170110}, {0x1b, 0x90a70130}, {0x21, 0x03211020}), + SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, + {0x1a, 0x90a70130}, + {0x1b, 0x90170110}, + {0x21, 0x03211020}), SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, {0x12, 0xb7a60130}, {0x13, 0xb8a61140}, From a806ef1cf3bbc0baadc6cdeb11f12b5dd27e91c2 Mon Sep 17 00:00:00 2001 From: Chris Chiu Date: Fri, 22 Mar 2019 11:37:20 +0800 Subject: [PATCH 251/468] ALSA: hda/realtek: Enable headset mic of ASUS P5440FF with ALC256 The ASUS laptop P5440FF with ALC256 can't detect the headset microphone until ALC256_FIXUP_ASUS_MIC_NO_PRESENCE quirk applied. Signed-off-by: Chris Chiu Signed-off-by: Daniel Drake Signed-off-by: Jian-Hong Pan Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index f2539007d09e..8ec0cd95341a 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7344,6 +7344,10 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x14, 0x90170110}, {0x1b, 0x90a70130}, {0x21, 0x03211020}), + SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, + {0x12, 0x90a60130}, + {0x14, 0x90170110}, + {0x21, 0x03211020}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, {0x1a, 0x90a70130}, {0x1b, 0x90170110}, From 6ac371aa1a74240fb910c98aa3484d5ece8473d3 Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Fri, 22 Mar 2019 11:37:22 +0800 Subject: [PATCH 252/468] ALSA: hda/realtek: Enable headset MIC of ASUS X430UN and X512DK with ALC256 The ASUS X430UN and X512DK with ALC256 cannot detect the headset MIC until ALC256_FIXUP_ASUS_MIC_NO_PRESENCE quirk applied. Signed-off-by: Jian-Hong Pan Signed-off-by: Daniel Drake Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 8ec0cd95341a..01c71467600b 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7348,6 +7348,10 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x12, 0x90a60130}, {0x14, 0x90170110}, {0x21, 0x03211020}), + SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, + {0x12, 0x90a60130}, + {0x14, 0x90170110}, + {0x21, 0x04211020}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, {0x1a, 0x90a70130}, {0x1b, 0x90170110}, From 7cf77b273a8fc51e7de622fa6691abd4436a9a6b Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Mon, 11 Feb 2019 11:51:20 +0100 Subject: [PATCH 253/468] drm/tegra: hub: Fix dereference before check Reported-by: Dan Carpenter Signed-off-by: Thierry Reding --- drivers/gpu/drm/tegra/hub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index ba9b3cfb8c3d..b3436c2aed68 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -378,14 +378,16 @@ static int tegra_shared_plane_atomic_check(struct drm_plane *plane, static void tegra_shared_plane_atomic_disable(struct drm_plane *plane, struct drm_plane_state *old_state) { - struct tegra_dc *dc = to_tegra_dc(old_state->crtc); struct tegra_plane *p = to_tegra_plane(plane); + struct tegra_dc *dc; u32 value; /* rien ne va plus */ if (!old_state || !old_state->crtc) return; + dc = to_tegra_dc(old_state->crtc); + /* * XXX Legacy helpers seem to sometimes call ->atomic_disable() even * on planes that are already disabled. Make sure we fallback to the From 509869a2fec36ecb2b841180915995f41d5a0219 Mon Sep 17 00:00:00 2001 From: Anders Roxell Date: Mon, 18 Feb 2019 12:00:50 +0100 Subject: [PATCH 254/468] drm/tegra: vic: Fix implicit function declaration warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When CONFIG_IOMMU_API isn't set the following warnings pops up: drivers/gpu/drm/tegra/vic.c: In function ‘vic_boot’: drivers/gpu/drm/tegra/vic.c:110:31: error: implicit declaration of function ‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration] struct iommu_fwspec *spec = dev_iommu_fwspec_get(vic->dev); ^~~~~~~~~~~~~~~~~~~~ iommu_fwspec_free drivers/gpu/drm/tegra/vic.c:110:31: warning: initialization of ‘struct iommu_fwspec *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] drivers/gpu/drm/tegra/vic.c:117:19: error: ‘struct iommu_fwspec’ has no member named ‘num_ids’ if (spec && spec->num_ids > 0) { ^~ drivers/gpu/drm/tegra/vic.c:118:16: error: ‘struct iommu_fwspec’ has no member named ‘ids’ value = spec->ids[0] & 0xffff; ^~ Rework so that its inside a '#ifdef CONFIG_IOMMU_API' block. Fixes: f3779cb190a5 ("drm/tegra: vic: Support stream ID register programming") Signed-off-by: Anders Roxell Signed-off-by: Thierry Reding --- drivers/gpu/drm/tegra/vic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c index 39bfed9623de..982ce37ecde1 100644 --- a/drivers/gpu/drm/tegra/vic.c +++ b/drivers/gpu/drm/tegra/vic.c @@ -106,6 +106,7 @@ static int vic_boot(struct vic *vic) if (vic->booted) return 0; +#ifdef CONFIG_IOMMU_API if (vic->config->supports_sid) { struct iommu_fwspec *spec = dev_iommu_fwspec_get(vic->dev); u32 value; @@ -121,6 +122,7 @@ static int vic_boot(struct vic *vic) vic_writel(vic, value, VIC_THI_STREAMID1); } } +#endif /* setup clockgating registers */ vic_writel(vic, CG_IDLE_CG_DLY_CNT(4) | From c8248c6c1a3d5db944753dd8e1c143d92c2c74fc Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 21 Mar 2019 21:23:14 +0100 Subject: [PATCH 255/468] r8169: don't read interrupt mask register in interrupt handler After the original patch network starts to crash on heavy load. It's not fully clear why this additional register read has such side effects, but removing it fixes the issue. Thanks also to Alex for his contribution and hints. [0] https://marc.info/?t=155268170400002&r=1&w=2 Fixes: e782410ed237 ("r8169: improve spurious interrupt detection") Reported-by: VDR User Tested-by: VDR User Signed-off-by: Heiner Kallweit Reviewed-by: Alexander Duyck Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index c29dde064078..9dd1cd2c0c68 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -678,6 +678,7 @@ struct rtl8169_private { struct work_struct work; } wk; + unsigned irq_enabled:1; unsigned supports_gmii:1; dma_addr_t counters_phys_addr; struct rtl8169_counters *counters; @@ -1293,6 +1294,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits) static void rtl_irq_disable(struct rtl8169_private *tp) { RTL_W16(tp, IntrMask, 0); + tp->irq_enabled = 0; } #define RTL_EVENT_NAPI_RX (RxOK | RxErr) @@ -1301,6 +1303,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp) static void rtl_irq_enable(struct rtl8169_private *tp) { + tp->irq_enabled = 1; RTL_W16(tp, IntrMask, tp->irq_mask); } @@ -6520,9 +6523,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) { struct rtl8169_private *tp = dev_instance; u16 status = RTL_R16(tp, IntrStatus); - u16 irq_mask = RTL_R16(tp, IntrMask); - if (status == 0xffff || !(status & irq_mask)) + if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask)) return IRQ_NONE; if (unlikely(status & SYSErr)) { From ca0214ee2802dd47239a4e39fb21c5b00ef61b22 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 22 Mar 2019 16:00:54 +0100 Subject: [PATCH 256/468] ALSA: pcm: Fix possible OOB access in PCM oss plugins The PCM OSS emulation converts and transfers the data on the fly via "plugins". The data is converted over the dynamically allocated buffer for each plugin, and recently syzkaller caught OOB in this flow. Although the bisection by syzbot pointed out to the commit 65766ee0bf7f ("ALSA: oss: Use kvzalloc() for local buffer allocations"), this is merely a commit to replace vmalloc() with kvmalloc(), hence it can't be the cause. The further debug action revealed that this happens in the case where a slave PCM doesn't support only the stereo channels while the OSS stream is set up for a mono channel. Below is a brief explanation: At each OSS parameter change, the driver sets up the PCM hw_params again in snd_pcm_oss_change_params_lock(). This is also the place where plugins are created and local buffers are allocated. The problem is that the plugins are created before the final hw_params is determined. Namely, two snd_pcm_hw_param_near() calls for setting the period size and periods may influence on the final result of channels, rates, etc, too, while the current code has already created plugins beforehand with the premature values. So, the plugin believes that channels=1, while the actual I/O is with channels=2, which makes the driver reading/writing over the allocated buffer size. The fix is simply to move the plugin allocation code after the final hw_params call. Reported-by: syzbot+d4503ae45b65c5bc1194@syzkaller.appspotmail.com Cc: Signed-off-by: Takashi Iwai --- sound/core/oss/pcm_oss.c | 43 ++++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c index d5b0d7ba83c4..f6ae68017608 100644 --- a/sound/core/oss/pcm_oss.c +++ b/sound/core/oss/pcm_oss.c @@ -940,6 +940,28 @@ static int snd_pcm_oss_change_params_locked(struct snd_pcm_substream *substream) oss_frame_size = snd_pcm_format_physical_width(params_format(params)) * params_channels(params) / 8; + err = snd_pcm_oss_period_size(substream, params, sparams); + if (err < 0) + goto failure; + + n = snd_pcm_plug_slave_size(substream, runtime->oss.period_bytes / oss_frame_size); + err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, NULL); + if (err < 0) + goto failure; + + err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIODS, + runtime->oss.periods, NULL); + if (err < 0) + goto failure; + + snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_DROP, NULL); + + err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, sparams); + if (err < 0) { + pcm_dbg(substream->pcm, "HW_PARAMS failed: %i\n", err); + goto failure; + } + #ifdef CONFIG_SND_PCM_OSS_PLUGINS snd_pcm_oss_plugin_clear(substream); if (!direct) { @@ -974,27 +996,6 @@ static int snd_pcm_oss_change_params_locked(struct snd_pcm_substream *substream) } #endif - err = snd_pcm_oss_period_size(substream, params, sparams); - if (err < 0) - goto failure; - - n = snd_pcm_plug_slave_size(substream, runtime->oss.period_bytes / oss_frame_size); - err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, NULL); - if (err < 0) - goto failure; - - err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIODS, - runtime->oss.periods, NULL); - if (err < 0) - goto failure; - - snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_DROP, NULL); - - if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, sparams)) < 0) { - pcm_dbg(substream->pcm, "HW_PARAMS failed: %i\n", err); - goto failure; - } - if (runtime->oss.trigger) { sw_params->start_threshold = 1; } else { From 7ecced0934e574b528a1ba6c237731e682216a74 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Fri, 8 Mar 2019 22:07:57 -0600 Subject: [PATCH 257/468] gpio: exar: add a check for the return value of ida_simple_get fails ida_simple_get may fail and return a negative error number. The fix checks its return value; if it fails, go to err_destroy. Cc: Signed-off-by: Kangjie Lu Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpio-exar.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpio-exar.c b/drivers/gpio/gpio-exar.c index 0ecd2369c2ca..a09d2f9ebacc 100644 --- a/drivers/gpio/gpio-exar.c +++ b/drivers/gpio/gpio-exar.c @@ -148,6 +148,8 @@ static int gpio_exar_probe(struct platform_device *pdev) mutex_init(&exar_gpio->lock); index = ida_simple_get(&ida_index, 0, 0, GFP_KERNEL); + if (index < 0) + goto err_destroy; sprintf(exar_gpio->name, "exar_gpio%d", index); exar_gpio->gpio_chip.label = exar_gpio->name; From c5bc6e526d3f217ed2cc3681d256dc4a2af4cc2b Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Mon, 11 Mar 2019 21:29:37 +0800 Subject: [PATCH 258/468] gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input Current code test wrong value so it does not verify if the written data is correctly read back. Fix it. Also make it return -EPERM if read value does not match written bit, just like it done for adnp_gpio_direction_output(). Fixes: 5e969a401a01 ("gpio: Add Avionic Design N-bit GPIO expander support") Cc: Signed-off-by: Axel Lin Reviewed-by: Thierry Reding Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpio-adnp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpio-adnp.c b/drivers/gpio/gpio-adnp.c index 91b90c0cea73..12acdac85820 100644 --- a/drivers/gpio/gpio-adnp.c +++ b/drivers/gpio/gpio-adnp.c @@ -132,8 +132,10 @@ static int adnp_gpio_direction_input(struct gpio_chip *chip, unsigned offset) if (err < 0) goto out; - if (err & BIT(pos)) - err = -EACCES; + if (value & BIT(pos)) { + err = -EPERM; + goto out; + } err = 0; From b45a02e13ee74b6fde56df4d76786058821a3aba Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 19 Mar 2019 15:54:16 +0100 Subject: [PATCH 259/468] gpio: amd-fch: Fix bogus SPDX identifier spdxcheck.py complains: include/linux/platform_data/gpio/gpio-amd-fch.h: 1:28 Invalid License ID: GPL+ which is correct because GPL+ is not a valid identifier. Of course this could have been caught by checkpatch.pl _before_ submitting or merging the patch. WARNING: 'SPDX-License-Identifier: GPL+ */' is not supported in LICENSES/... #271: FILE: include/linux/platform_data/gpio/gpio-amd-fch.h:1: +/* SPDX-License-Identifier: GPL+ */ Fix it under the assumption that the author meant GPL-2.0+, which makes sense as the corresponding C file is using that identifier. Fixes: e09d168f13f0 ("gpio: AMD G-Series PCH gpio driver") Signed-off-by: Thomas Gleixner Signed-off-by: Bartosz Golaszewski --- include/linux/platform_data/gpio/gpio-amd-fch.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/platform_data/gpio/gpio-amd-fch.h b/include/linux/platform_data/gpio/gpio-amd-fch.h index a867637e172d..9e46678edb2a 100644 --- a/include/linux/platform_data/gpio/gpio-amd-fch.h +++ b/include/linux/platform_data/gpio/gpio-amd-fch.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: GPL+ */ +/* SPDX-License-Identifier: GPL-2.0+ */ /* * AMD FCH gpio driver platform-data From 6cbcf596934c8e16d6288c7cc62dfb7ad8eadf15 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 22 Mar 2019 17:50:15 +0200 Subject: [PATCH 260/468] xhci: Fix port resume done detection for SS ports with LPM enabled A suspended SS port in U3 link state will go to U0 when resumed, but can almost immediately after that enter U1 or U2 link power save states before host controller driver reads the port status. Host controller driver only checks for U0 state, and might miss the finished resume, leaving flags unclear and skip notifying usb code of the wake. Add U1 and U2 to the possible link states when checking for finished port resume. Cc: stable Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 40fa25c4d041..9215a28dad40 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1647,10 +1647,13 @@ static void handle_port_status(struct xhci_hcd *xhci, } } - if ((portsc & PORT_PLC) && (portsc & PORT_PLS_MASK) == XDEV_U0 && - DEV_SUPERSPEED_ANY(portsc)) { + if ((portsc & PORT_PLC) && + DEV_SUPERSPEED_ANY(portsc) && + ((portsc & PORT_PLS_MASK) == XDEV_U0 || + (portsc & PORT_PLS_MASK) == XDEV_U1 || + (portsc & PORT_PLS_MASK) == XDEV_U2)) { xhci_dbg(xhci, "resume SS port %d finished\n", port_id); - /* We've just brought the device into U0 through either the + /* We've just brought the device into U0/1/2 through either the * Resume state after a device remote wakeup, or through the * U3Exit state after a host-initiated resume. If it's a device * initiated remote wake, don't pass up the link state change, From 8867ea262196a6945c24a0fb739575af646ec0e9 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 22 Mar 2019 17:50:16 +0200 Subject: [PATCH 261/468] usb: xhci: dbc: Don't free all memory with spinlock held The xhci debug capability (DbC) feature did its memory cleanup with spinlock held. dma_free_coherent() warns if called with interrupts disabled move the memory cleanup outside the spinlock Cc: stable Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-dbgcap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/usb/host/xhci-dbgcap.c b/drivers/usb/host/xhci-dbgcap.c index c78be578abb0..d932cc31711e 100644 --- a/drivers/usb/host/xhci-dbgcap.c +++ b/drivers/usb/host/xhci-dbgcap.c @@ -516,7 +516,6 @@ static int xhci_do_dbc_stop(struct xhci_hcd *xhci) return -1; writel(0, &dbc->regs->control); - xhci_dbc_mem_cleanup(xhci); dbc->state = DS_DISABLED; return 0; @@ -562,8 +561,10 @@ static void xhci_dbc_stop(struct xhci_hcd *xhci) ret = xhci_do_dbc_stop(xhci); spin_unlock_irqrestore(&dbc->lock, flags); - if (!ret) + if (!ret) { + xhci_dbc_mem_cleanup(xhci); pm_runtime_put_sync(xhci_to_hcd(xhci)->self.controller); + } } static void From d92f2c59cc2cbca6bfb2cc54882b58ba76b15fd4 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 22 Mar 2019 17:50:17 +0200 Subject: [PATCH 262/468] xhci: Don't let USB3 ports stuck in polling state prevent suspend Commit 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected") was intended to prevent ports that were still link training from being forced to U3 suspend state mid enumeration. This solved enumeration issues for devices with slow link training. Turns out some devices are stuck in the link training/polling state, and thus that patch will prevent suspend completely for these devices. This is seen with USB3 card readers in some MacBooks. Instead of preventing suspend, give some time to complete the link training. On successful training the port will end up as connected and enabled. If port instead is stuck in link training the bus suspend will continue suspending after 360ms (10 * 36ms) timeout (tPollingLFPSTimeout). Original patch was sent to stable, this one should go there as well Fixes: 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-hub.c | 19 ++++++++++++------- drivers/usb/host/xhci.h | 8 ++++++++ 2 files changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index e2eece693655..96a740543183 100644 --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -1545,20 +1545,25 @@ int xhci_bus_suspend(struct usb_hcd *hcd) port_index = max_ports; while (port_index--) { u32 t1, t2; - + int retries = 10; +retry: t1 = readl(ports[port_index]->addr); t2 = xhci_port_state_to_neutral(t1); portsc_buf[port_index] = 0; - /* Bail out if a USB3 port has a new device in link training */ - if ((hcd->speed >= HCD_USB3) && + /* + * Give a USB3 port in link training time to finish, but don't + * prevent suspend as port might be stuck + */ + if ((hcd->speed >= HCD_USB3) && retries-- && (t1 & PORT_PLS_MASK) == XDEV_POLLING) { - bus_state->bus_suspended = 0; spin_unlock_irqrestore(&xhci->lock, flags); - xhci_dbg(xhci, "Bus suspend bailout, port in polling\n"); - return -EBUSY; + msleep(XHCI_PORT_POLLING_LFPS_TIME); + spin_lock_irqsave(&xhci->lock, flags); + xhci_dbg(xhci, "port %d polling in bus suspend, waiting\n", + port_index); + goto retry; } - /* suspend ports in U0, or bail out for new connect changes */ if ((t1 & PORT_PE) && (t1 & PORT_PLS_MASK) == XDEV_U0) { if ((t1 & PORT_CSC) && wake_enabled) { diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 652dc36e3012..9334cdee382a 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -452,6 +452,14 @@ struct xhci_op_regs { */ #define XHCI_DEFAULT_BESL 4 +/* + * USB3 specification define a 360ms tPollingLFPSTiemout for USB3 ports + * to complete link training. usually link trainig completes much faster + * so check status 10 times with 36ms sleep in places we need to wait for + * polling to complete. + */ +#define XHCI_PORT_POLLING_LFPS_TIME 36 + /** * struct xhci_intr_reg - Interrupt Register Set * @irq_pending: IMAN - Interrupt Management Register. Used to enable From 4fc90fb883fcb72d6bfbf84d554a3e820a05ef62 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 22 Mar 2019 15:51:36 +0100 Subject: [PATCH 263/468] ALSA: hda/ca0132 - Simplify alt firmware loading code ca0132 codec driver loads the firmware selectively depending on the model in addition to the fallback of the default firmware. The code works good, but a minor problem is that the current code seems confusing for Clang where it spews a warning about uninitialized variable. This patch simplifies the code flow for such a false-positive warning. After this refactoring, the ca0132_spec.alt_firmware_present field is no longer used, hence it's eliminated as well. Reported-and-tested-by: Arnd Bergmann Reviewed-by: Nathan Chancellor Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_ca0132.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c index 29882bda7632..e1ebc6d5f382 100644 --- a/sound/pci/hda/patch_ca0132.c +++ b/sound/pci/hda/patch_ca0132.c @@ -1005,7 +1005,6 @@ struct ca0132_spec { unsigned int scp_resp_header; unsigned int scp_resp_data[4]; unsigned int scp_resp_count; - bool alt_firmware_present; bool startup_check_entered; bool dsp_reload; @@ -7518,7 +7517,7 @@ static bool ca0132_download_dsp_images(struct hda_codec *codec) bool dsp_loaded = false; struct ca0132_spec *spec = codec->spec; const struct dsp_image_seg *dsp_os_image; - const struct firmware *fw_entry; + const struct firmware *fw_entry = NULL; /* * Alternate firmwares for different variants. The Recon3Di apparently * can use the default firmware, but I'll leave the option in case @@ -7529,33 +7528,26 @@ static bool ca0132_download_dsp_images(struct hda_codec *codec) case QUIRK_R3D: case QUIRK_AE5: if (request_firmware(&fw_entry, DESKTOP_EFX_FILE, - codec->card->dev) != 0) { + codec->card->dev) != 0) codec_dbg(codec, "Desktop firmware not found."); - spec->alt_firmware_present = false; - } else { + else codec_dbg(codec, "Desktop firmware selected."); - spec->alt_firmware_present = true; - } break; case QUIRK_R3DI: if (request_firmware(&fw_entry, R3DI_EFX_FILE, - codec->card->dev) != 0) { + codec->card->dev) != 0) codec_dbg(codec, "Recon3Di alt firmware not detected."); - spec->alt_firmware_present = false; - } else { + else codec_dbg(codec, "Recon3Di firmware selected."); - spec->alt_firmware_present = true; - } break; default: - spec->alt_firmware_present = false; break; } /* * Use default ctefx.bin if no alt firmware is detected, or if none * exists for your particular codec. */ - if (!spec->alt_firmware_present) { + if (!fw_entry) { codec_dbg(codec, "Default firmware selected."); if (request_firmware(&fw_entry, EFX_FILE, codec->card->dev) != 0) From d84dd3fb82fa7a094de7f08f10610d55a70cf0ca Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 19 Mar 2019 11:24:54 -0400 Subject: [PATCH 264/468] SUNRPC: Don't let RPC_SOFTCONN tasks time out if the transport is connected If the transport is still connected, then we do want to allow RPC_SOFTCONN tasks to retry. They should time out if and only if the connection is broken. Signed-off-by: Trond Myklebust --- net/sunrpc/clnt.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 228970e6e52b..187d10443a15 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2311,6 +2311,15 @@ call_status(struct rpc_task *task) rpc_exit(task, status); } +static bool +rpc_check_connected(const struct rpc_rqst *req) +{ + /* No allocated request or transport? return true */ + if (!req || !req->rq_xprt) + return true; + return xprt_connected(req->rq_xprt); +} + static void rpc_check_timeout(struct rpc_task *task) { @@ -2322,10 +2331,11 @@ rpc_check_timeout(struct rpc_task *task) dprintk("RPC: %5u call_timeout (major)\n", task->tk_pid); task->tk_timeouts++; - if (RPC_IS_SOFTCONN(task)) { + if (RPC_IS_SOFTCONN(task) && !rpc_check_connected(task->tk_rqstp)) { rpc_exit(task, -ETIMEDOUT); return; } + if (RPC_IS_SOFT(task)) { if (clnt->cl_chatty) { printk(KERN_NOTICE "%s: server %s not responding, timed out\n", From 5a698243930c441afccec04e4d5dc8febfd2b775 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 21 Mar 2019 17:57:56 -0400 Subject: [PATCH 265/468] NFS: Fix a typo in nfs_init_timeout_values() Specifying a retrans=0 mount parameter to a NFS/TCP mount, is inadvertently causing the NFS client to rewrite any specified timeout parameter to the default of 60 seconds. Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0") Signed-off-by: Trond Myklebust --- fs/nfs/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index fb1cf1a4bda2..90d71fda65ce 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -453,7 +453,7 @@ void nfs_init_timeout_values(struct rpc_timeout *to, int proto, case XPRT_TRANSPORT_RDMA: if (retrans == NFS_UNSPEC_RETRANS) to->to_retries = NFS_DEF_TCP_RETRANS; - if (timeo == NFS_UNSPEC_TIMEO || to->to_retries == 0) + if (timeo == NFS_UNSPEC_TIMEO || to->to_initval == 0) to->to_initval = NFS_DEF_TCP_TIMEO * HZ / 10; if (to->to_initval > NFS_MAX_TCP_TIMEOUT) to->to_initval = NFS_MAX_TCP_TIMEOUT; From 166bd5b889ac61369c34650887a5c6b899f5e976 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 22 Mar 2019 23:03:56 -0400 Subject: [PATCH 266/468] pNFS/flexfiles: Fix layoutstats handling during read failovers During a read failover, we may end up changing the value of the pgio_mirror_idx, so make sure that we record the layout stats before that update. Signed-off-by: Trond Myklebust --- fs/nfs/flexfilelayout/flexfilelayout.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index f9264e1922a2..6673d4ff5a2a 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1289,6 +1289,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg, static int ff_layout_read_done_cb(struct rpc_task *task, struct nfs_pgio_header *hdr) { + int new_idx = hdr->pgio_mirror_idx; int err; trace_nfs4_pnfs_read(hdr, task->tk_status); @@ -1307,7 +1308,7 @@ static int ff_layout_read_done_cb(struct rpc_task *task, case -NFS4ERR_RESET_TO_PNFS: if (ff_layout_choose_best_ds_for_read(hdr->lseg, hdr->pgio_mirror_idx + 1, - &hdr->pgio_mirror_idx)) + &new_idx)) goto out_layouterror; set_bit(NFS_IOHDR_RESEND_PNFS, &hdr->flags); return task->tk_status; @@ -1320,7 +1321,9 @@ static int ff_layout_read_done_cb(struct rpc_task *task, return 0; out_layouterror: + ff_layout_read_record_layoutstats_done(task, hdr); ff_layout_send_layouterror(hdr->lseg); + hdr->pgio_mirror_idx = new_idx; out_eagain: rpc_restart_call_prepare(task); return -EAGAIN; From a7fb107b7d8982ac76c958a0d2838a151b03e97e Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 21 Mar 2019 16:34:44 -0700 Subject: [PATCH 267/468] net: phy: Re-parent menus for MDIO bus drivers correctly After 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") the various MDIO bus drivers were no longer parented with config PHYLIB but with config MDIO_BUS which is not a menuconfig, fix this by depending on MDIO_DEVICE which is a menuconfig. This is visually nicer and less confusing for users. Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/Kconfig | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index 071869db44cf..520657945b82 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -7,6 +7,8 @@ menuconfig MDIO_DEVICE help MDIO devices and driver infrastructure code. +if MDIO_DEVICE + config MDIO_BUS tristate default m if PHYLIB=m @@ -179,6 +181,7 @@ config MDIO_XGENE APM X-Gene SoC's. endif +endif config PHYLINK tristate From fa3a419d2f674b431d38748cb58fb7da17ee8949 Mon Sep 17 00:00:00 2001 From: Wen Yang Date: Fri, 22 Mar 2019 11:04:07 +0800 Subject: [PATCH 268/468] net: xilinx: fix possible object reference leak The call to of_parse_phandle returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1624:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1569, but without a corresponding object release within this function. Signed-off-by: Wen Yang Cc: Anirudha Sarangi Cc: John Linn Cc: "David S. Miller" Cc: Michal Simek Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index ec7e7ec24ff9..4041c75997ba 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -1575,12 +1575,14 @@ static int axienet_probe(struct platform_device *pdev) ret = of_address_to_resource(np, 0, &dmares); if (ret) { dev_err(&pdev->dev, "unable to get DMA resource\n"); + of_node_put(np); goto free_netdev; } lp->dma_regs = devm_ioremap_resource(&pdev->dev, &dmares); if (IS_ERR(lp->dma_regs)) { dev_err(&pdev->dev, "could not map DMA regs\n"); ret = PTR_ERR(lp->dma_regs); + of_node_put(np); goto free_netdev; } lp->rx_irq = irq_of_parse_and_map(np, 1); From be693df3cf9dd113ff1d2c0d8150199efdba37f6 Mon Sep 17 00:00:00 2001 From: Wen Yang Date: Fri, 22 Mar 2019 11:04:08 +0800 Subject: [PATCH 269/468] net: ibm: fix possible object reference leak The call to ehea_get_eth_dn returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/net/ethernet/ibm/ehea/ehea_main.c:3163:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3154, but without a corresponding object release within this function. Signed-off-by: Wen Yang Cc: Douglas Miller Cc: "David S. Miller" Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller --- drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c index 3baabdc89726..90b62c1412c8 100644 --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c @@ -3160,6 +3160,7 @@ static ssize_t ehea_probe_port(struct device *dev, if (ehea_add_adapter_mr(adapter)) { pr_err("creating MR failed\n"); + of_node_put(eth_dn); return -EIO; } From 75eac7b5f68b0a0671e795ac636457ee27cc11d8 Mon Sep 17 00:00:00 2001 From: Wen Yang Date: Fri, 22 Mar 2019 11:04:09 +0800 Subject: [PATCH 270/468] net: ethernet: ti: fix possible object reference leak The call to of_get_child_by_name returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/net/ethernet/ti/netcp_ethss.c:3661:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function. ./drivers/net/ethernet/ti/netcp_ethss.c:3665:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function. Signed-off-by: Wen Yang Cc: Wingman Kwok Cc: Murali Karicheri Cc: "David S. Miller" Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller --- drivers/net/ethernet/ti/netcp_ethss.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index 5174d318901e..0a920c5936b2 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -3657,12 +3657,16 @@ static int gbe_probe(struct netcp_device *netcp_device, struct device *dev, ret = netcp_txpipe_init(&gbe_dev->tx_pipe, netcp_device, gbe_dev->dma_chan_name, gbe_dev->tx_queue_id); - if (ret) + if (ret) { + of_node_put(interfaces); return ret; + } ret = netcp_txpipe_open(&gbe_dev->tx_pipe); - if (ret) + if (ret) { + of_node_put(interfaces); return ret; + } /* Create network interfaces */ INIT_LIST_HEAD(&gbe_dev->gbe_intf_head); From 23c78343ec36990709b636a9e02bad814f4384ad Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 22 Mar 2019 07:39:35 +0100 Subject: [PATCH 271/468] r8169: fix cable re-plugging issue Bartek reported that after few cable unplug/replug cycles suddenly replug isn't detected any longer. His system uses a RTL8106, I wasn't able to reproduce the issue with RTL8168g. According to his bisect the referenced commit caused the regression. As Realtek doesn't release datasheets or errata it's hard to say what's the actual root cause, but this change was reported to fix the issue. Fixes: 38caff5a445b ("r8169: handle all interrupt events in the hard irq handler") Reported-by: Bartosz Skrzypczak Suggested-by: Bartosz Skrzypczak Tested-by: Bartosz Skrzypczak Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 9dd1cd2c0c68..7562ccbbb39a 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -6542,7 +6542,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) set_bit(RTL_FLAG_TASK_RESET_PENDING, tp->wk.flags); } - if (status & RTL_EVENT_NAPI) { + if (status & (RTL_EVENT_NAPI | LinkChg)) { rtl_irq_disable(tp); napi_schedule_irqoff(&tp->napi); } From 064c5d6881e897077639e04973de26440ee205e6 Mon Sep 17 00:00:00 2001 From: John Hurley Date: Fri, 22 Mar 2019 12:37:35 +0000 Subject: [PATCH 272/468] net: sched: fix cleanup NULL pointer exception in act_mirr A new mirred action is created by the tcf_mirred_init function. This contains a list head struct which is inserted into a global list on successful creation of a new action. However, after a creation, it is still possible to error out and call the tcf_idr_release function. This, in turn, calls the act_mirr cleanup function via __tcf_idr_release and __tcf_action_put. This cleanup function tries to delete the list entry which is as yet uninitialised, leading to a NULL pointer exception. Fix this by initialising the list entry on creation of a new action. Bug report: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186 Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015 RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred] Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00 RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0 RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000 R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001 R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000 FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0 Call Trace: tcf_action_cleanup+0x59/0xca __tcf_action_put+0x54/0x6b __tcf_idr_release.cold.33+0x9/0x12 tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred] tcf_action_init_1+0x3d0/0x4c0 tcf_action_init+0x9c/0x130 tcf_exts_validate+0xab/0xc0 fl_change+0x1ca/0x982 [cls_flower] tc_new_tfilter+0x647/0x8d0 ? load_balance+0x14b/0x9e0 rtnetlink_rcv_msg+0xe3/0x370 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? _cond_resched+0x15/0x30 ? __kmalloc_node_track_caller+0x1d4/0x2b0 ? rtnl_calcit.isra.31+0xf0/0xf0 netlink_rcv_skb+0x49/0x110 netlink_unicast+0x16f/0x210 netlink_sendmsg+0x1df/0x390 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x27b/0x2c0 ? futex_wake+0x80/0x140 ? do_futex+0x2b9/0xac0 ? ep_scan_ready_list.constprop.22+0x1f2/0x210 ? ep_poll+0x7a/0x430 __sys_sendmsg+0x47/0x80 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock") Signed-off-by: John Hurley Reviewed-by: Jakub Kicinski Acked-by: Cong Wang Signed-off-by: David S. Miller --- net/sched/act_mirred.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index cd712e4e8998..17cc6bd4c57c 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -159,12 +159,15 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, tcf_idr_release(*a, bind); return -EEXIST; } + + m = to_mirred(*a); + if (ret == ACT_P_CREATED) + INIT_LIST_HEAD(&m->tcfm_list); + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); if (err < 0) goto release_idr; - m = to_mirred(*a); - spin_lock_bh(&m->tcf_lock); if (parm->ifindex) { From 737889efe9713a0f20a75fd0de952841d9275e6b Mon Sep 17 00:00:00 2001 From: Jon Maloy Date: Fri, 22 Mar 2019 15:03:51 +0100 Subject: [PATCH 273/468] tipc: tipc clang warning When checking the code with clang -Wsometimes-uninitialized we get the following warning: if (!tipc_link_is_establishing(l)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/tipc/node.c:847:46: note: uninitialized use occurs here tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr); net/tipc/node.c:831:2: note: remove the 'if' if its condition is always true if (!tipc_link_is_establishing(l)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/tipc/node.c:821:31: note: initialize the variable 'maddr' to silence this warning struct tipc_media_addr *maddr; We fix this by initializing 'maddr' to NULL. For the matter of clarity, we also test if 'xmitq' is non-empty before we use it and 'maddr' further down in the function. It will never happen that 'xmitq' is non- empty at the same time as 'maddr' is NULL, so this is a sufficient test. Fixes: 598411d70f85 ("tipc: make resetting of links non-atomic") Reported-by: Nathan Chancellor Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/node.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/tipc/node.c b/net/tipc/node.c index 2dc4919ab23c..dd3b6dc17662 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -817,10 +817,10 @@ static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id, static void tipc_node_link_down(struct tipc_node *n, int bearer_id, bool delete) { struct tipc_link_entry *le = &n->links[bearer_id]; + struct tipc_media_addr *maddr = NULL; struct tipc_link *l = le->link; - struct tipc_media_addr *maddr; - struct sk_buff_head xmitq; int old_bearer_id = bearer_id; + struct sk_buff_head xmitq; if (!l) return; @@ -844,7 +844,8 @@ static void tipc_node_link_down(struct tipc_node *n, int bearer_id, bool delete) tipc_node_write_unlock(n); if (delete) tipc_mon_remove_peer(n->net, n->addr, old_bearer_id); - tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr); + if (!skb_queue_empty(&xmitq)) + tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr); tipc_sk_rcv(n->net, &le->inputq); } From 526949e877f44672d408bfe291e39860c13f2e24 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 22 Mar 2019 15:18:43 +0100 Subject: [PATCH 274/468] rxrpc: avoid clang -Wuninitialized warning clang produces a false-positive warning as it fails to notice that "lost = true" implies that "ret" is initialized: net/rxrpc/output.c:402:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (lost) ^~~~ net/rxrpc/output.c:437:6: note: uninitialized use occurs here if (ret >= 0) { ^~~ net/rxrpc/output.c:402:2: note: remove the 'if' if its condition is always false if (lost) ^~~~~~~~~ net/rxrpc/output.c:339:9: note: initialize the variable 'ret' to silence this warning int ret, opt; ^ = 0 Rearrange the code to make that more obvious and avoid the warning. Signed-off-by: Arnd Bergmann Reviewed-by: Nathan Chancellor Signed-off-by: David S. Miller --- net/rxrpc/output.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 736aa9281100..004c762c2e8d 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -335,7 +335,6 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, struct kvec iov[2]; rxrpc_serial_t serial; size_t len; - bool lost = false; int ret, opt; _enter(",{%d}", skb->len); @@ -393,14 +392,14 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, static int lose; if ((lose++ & 7) == 7) { ret = 0; - lost = true; + trace_rxrpc_tx_data(call, sp->hdr.seq, serial, + whdr.flags, retrans, true); + goto done; } } - trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags, - retrans, lost); - if (lost) - goto done; + trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags, retrans, + false); /* send the packet with the don't fragment bit set if we currently * think it's small enough */ From 13f063815265c5397ee92d84436804bc9fb6b58b Mon Sep 17 00:00:00 2001 From: Yufen Yu Date: Sun, 24 Mar 2019 17:57:07 +0800 Subject: [PATCH 275/468] blk-mq: use blk_mq_put_driver_tag() to put tag Expect arguments, blk_mq_put_driver_tag_hctx() and blk_mq_put_driver_tag() is same. We can just use argument 'request' to put tag by blk_mq_put_driver_tag(). Then we can remove the unused blk_mq_put_driver_tag_hctx(). Signed-off-by: Yufen Yu Signed-off-by: Jens Axboe --- block/blk-flush.c | 4 ++-- block/blk-mq.h | 9 --------- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index 6e0f2d97fc6d..d95f94892015 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -220,7 +220,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error) blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq); flush_rq->tag = -1; } else { - blk_mq_put_driver_tag_hctx(hctx, flush_rq); + blk_mq_put_driver_tag(flush_rq); flush_rq->internal_tag = -1; } @@ -324,7 +324,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error) if (q->elevator) { WARN_ON(rq->tag < 0); - blk_mq_put_driver_tag_hctx(hctx, rq); + blk_mq_put_driver_tag(rq); } /* diff --git a/block/blk-mq.h b/block/blk-mq.h index 0ed8e5a8729f..d704fc7766f4 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -224,15 +224,6 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, } } -static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx, - struct request *rq) -{ - if (rq->tag == -1 || rq->internal_tag == -1) - return; - - __blk_mq_put_driver_tag(hctx, rq); -} - static inline void blk_mq_put_driver_tag(struct request *rq) { if (rq->tag == -1 || rq->internal_tag == -1) From 85fae294e1a506b4213668716acb586bd6b4ae1e Mon Sep 17 00:00:00 2001 From: Yufen Yu Date: Sun, 24 Mar 2019 17:57:08 +0800 Subject: [PATCH 276/468] blk-mq: update comment for blk_mq_hctx_has_pending() For now, blk_mq_hctx_has_pending() checks any of ctx, hctx->dispatch or io scheduler have pending work. So, update the comment accordingly. Signed-off-by: Yufen Yu Signed-off-by: Jens Axboe --- block/blk-mq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 70b210a308c4..28080b0235f0 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -59,7 +59,8 @@ static int blk_mq_poll_stats_bkt(const struct request *rq) } /* - * Check if any of the ctx's have pending work in this hardware queue + * Check if any of the ctx, dispatch list or elevator + * have pending work in this hardware queue. */ static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx) { From 7f2daa96759b0700ad28579133aa91bc663632a7 Mon Sep 17 00:00:00 2001 From: Peng Hao Date: Sun, 10 Mar 2019 01:29:44 +0800 Subject: [PATCH 277/468] x86/resctrl: Remove unused variable Variable "struct rdt_resource *r" is set but not used. So remove it. Signed-off-by: Peng Hao Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/1552152584-26087-1-git-send-email-peng.hao2@zte.com.cn --- arch/x86/kernel/cpu/resctrl/monitor.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c index f33f11f69078..1573a0a6b525 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -501,11 +501,8 @@ void cqm_handle_limbo(struct work_struct *work) void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms) { unsigned long delay = msecs_to_jiffies(delay_ms); - struct rdt_resource *r; int cpu; - r = &rdt_resources_all[RDT_RESOURCE_L3]; - cpu = cpumask_any(&dom->cpu_mask); dom->cqm_work_cpu = cpu; From b6a36e5ddf8496ec83a14589f32f58bbaf022046 Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Fri, 15 Mar 2019 11:46:21 +1000 Subject: [PATCH 278/468] drm/fb: avoid setting 0 depth. If the downscaling fails and we end up with a best_depth of 0, then ignore it. This actually works around a cascade of failure, but it the simplest fix for now. The scaling patch broke the udl driver, as the udl driver doesn't expose planes at all, so gets the two default 32-bit formats, but the udl driver then ask for 16bpp fbdev, and the scaling code falls over. This fixes the udl driver since the scaled depth support was added. Fixes: f4bd542bcaee ("drm/fb-helper: Scale back depth to supported maximum") Cc: Daniel Vetter Reviewed-by: Linus Walleij Signed-off-by: Dave Airlie Link: https://patchwork.freedesktop.org/patch/msgid/20190315014621.21816-2-airlied@gmail.com --- drivers/gpu/drm/drm_fb_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 0e9349ff2d16..af2ab640cadb 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1963,7 +1963,7 @@ static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper, best_depth = fmt->depth; } } - if (sizes.surface_depth != best_depth) { + if (sizes.surface_depth != best_depth && best_depth) { DRM_INFO("requested bpp %d, scaled depth down to %d", sizes.surface_bpp, best_depth); sizes.surface_depth = best_depth; From 1d382264d911d91a8be5dbed1f0e053eb3245d81 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Sat, 23 Mar 2019 01:49:10 +0100 Subject: [PATCH 279/468] bpf, libbpf: fix version info and add it to shared object Even though libbpf's versioning script for the linker (libbpf.map) is pointing to 0.0.2, the BPF_EXTRAVERSION in the Makefile has not been updated along with it and is therefore still on 0.0.1. While fixing up, I also noticed that the generated shared object versioning information is missing, typical convention is to have a linker name (libbpf.so), soname (libbpf.so.0) and real name (libbpf.so.0.0.2) for library management. This is based upon the LIBBPF_VERSION as well. The build will then produce the following bpf libraries: # ll libbpf* libbpf.a libbpf.so -> libbpf.so.0.0.2 libbpf.so.0 -> libbpf.so.0.0.2 libbpf.so.0.0.2 # readelf -d libbpf.so.0.0.2 | grep SONAME 0x000000000000000e (SONAME) Library soname: [libbpf.so.0] And install them accordingly: # rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld install Auto-detecting system features: ... libelf: [ on ] ... bpf: [ on ] CC /tmp/bld/libbpf.o CC /tmp/bld/bpf.o CC /tmp/bld/nlattr.o CC /tmp/bld/btf.o CC /tmp/bld/libbpf_errno.o CC /tmp/bld/str_error.o CC /tmp/bld/netlink.o CC /tmp/bld/bpf_prog_linfo.o CC /tmp/bld/libbpf_probes.o CC /tmp/bld/xsk.o LD /tmp/bld/libbpf-in.o LINK /tmp/bld/libbpf.a LINK /tmp/bld/libbpf.so.0.0.2 LINK /tmp/bld/test_libbpf INSTALL /tmp/bld/libbpf.a INSTALL /tmp/bld/libbpf.so.0.0.2 # ll /usr/local/lib64/libbpf.* /usr/local/lib64/libbpf.a /usr/local/lib64/libbpf.so -> libbpf.so.0.0.2 /usr/local/lib64/libbpf.so.0 -> libbpf.so.0.0.2 /usr/local/lib64/libbpf.so.0.0.2 Fixes: 1bf4b05810fe ("tools: bpftool: add probes for eBPF program types") Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check") Reported-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/Makefile | 42 ++++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 61aaacf0cfa1..5bf8e52c41fc 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -3,7 +3,7 @@ BPF_VERSION = 0 BPF_PATCHLEVEL = 0 -BPF_EXTRAVERSION = 1 +BPF_EXTRAVERSION = 2 MAKEFLAGS += --no-print-directory @@ -79,8 +79,6 @@ export prefix libdir src obj libdir_SQ = $(subst ','\'',$(libdir)) libdir_relative_SQ = $(subst ','\'',$(libdir_relative)) -LIB_FILE = libbpf.a libbpf.so - VERSION = $(BPF_VERSION) PATCHLEVEL = $(BPF_PATCHLEVEL) EXTRAVERSION = $(BPF_EXTRAVERSION) @@ -88,7 +86,10 @@ EXTRAVERSION = $(BPF_EXTRAVERSION) OBJ = $@ N = -LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION) +LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION) + +LIB_TARGET = libbpf.a libbpf.so.$(LIBBPF_VERSION) +LIB_FILE = libbpf.a libbpf.so* # Set compile option CFLAGS ifdef EXTRA_CFLAGS @@ -128,16 +129,18 @@ all: export srctree OUTPUT CC LD CFLAGS V include $(srctree)/tools/build/Makefile.include -BPF_IN := $(OUTPUT)libbpf-in.o -LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE)) -VERSION_SCRIPT := libbpf.map +BPF_IN := $(OUTPUT)libbpf-in.o +VERSION_SCRIPT := libbpf.map + +LIB_TARGET := $(addprefix $(OUTPUT),$(LIB_TARGET)) +LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE)) GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \ awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}') VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l) -CMD_TARGETS = $(LIB_FILE) +CMD_TARGETS = $(LIB_TARGET) CXX_TEST_TARGET = $(OUTPUT)test_libbpf @@ -170,9 +173,13 @@ $(BPF_IN): force elfdep bpfdep echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true $(Q)$(MAKE) $(build)=libbpf -$(OUTPUT)libbpf.so: $(BPF_IN) - $(QUIET_LINK)$(CC) --shared -Wl,--version-script=$(VERSION_SCRIPT) \ - $^ -o $@ +$(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION) + +$(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN) + $(QUIET_LINK)$(CC) --shared -Wl,-soname,libbpf.so.$(VERSION) \ + -Wl,--version-script=$(VERSION_SCRIPT) $^ -o $@ + @ln -sf $(@F) $(OUTPUT)libbpf.so + @ln -sf $(@F) $(OUTPUT)libbpf.so.$(VERSION) $(OUTPUT)libbpf.a: $(BPF_IN) $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ @@ -192,6 +199,12 @@ check_abi: $(OUTPUT)libbpf.so exit 1; \ fi +define do_install_mkdir + if [ ! -d '$(DESTDIR_SQ)$1' ]; then \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$1'; \ + fi +endef + define do_install if [ ! -d '$(DESTDIR_SQ)$2' ]; then \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \ @@ -200,8 +213,9 @@ define do_install endef install_lib: all_cmd - $(call QUIET_INSTALL, $(LIB_FILE)) \ - $(call do_install,$(LIB_FILE),$(libdir_SQ)) + $(call QUIET_INSTALL, $(LIB_TARGET)) \ + $(call do_install_mkdir,$(libdir_SQ)); \ + cp -fpR $(LIB_FILE) $(DESTDIR)$(libdir_SQ) install_headers: $(call QUIET_INSTALL, headers) \ @@ -219,7 +233,7 @@ config-clean: clean: $(call QUIET_CLEAN, libbpf) $(RM) $(TARGETS) $(CXX_TEST_TARGET) \ - *.o *~ *.a *.so .*.d .*.cmd LIBBPF-CFLAGS + *.o *~ *.a *.so *.so.$(VERSION) .*.d .*.cmd LIBBPF-CFLAGS $(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP.libbpf From 63197f78bca2d86093126783b0ee6519bd652435 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Sat, 23 Mar 2019 01:49:11 +0100 Subject: [PATCH 280/468] bpf, libbpf: clarify bump in libbpf version info The current documentation suggests that we would need to bump the libbpf version on every change. Lets clarify this a bit more and reflect what we do today in practice, that is, bumping it once per development cycle. Fixes: 76d1b894c515 ("libbpf: Document API and ABI conventions") Reported-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/lib/bpf/README.rst b/tools/lib/bpf/README.rst index 5788479384ca..cef7b77eab69 100644 --- a/tools/lib/bpf/README.rst +++ b/tools/lib/bpf/README.rst @@ -111,6 +111,7 @@ starting from ``0.0.1``. Every time ABI is being changed, e.g. because a new symbol is added or semantic of existing symbol is changed, ABI version should be bumped. +This bump in ABI version is at most once per kernel development cycle. For example, if current state of ``libbpf.map`` is: From 3f04e0a6cfebf48152ac64502346cdc258811f79 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Noralf=20Tr=C3=B8nnes?= Date: Fri, 8 Feb 2019 15:01:02 +0100 Subject: [PATCH 281/468] drm: Fix drm_release() and device unplug MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If userspace has open fd(s) when drm_dev_unplug() is run, it will result in drm_dev_unregister() being called twice. First in drm_dev_unplug() and then later in drm_release() through the call to drm_put_dev(). Since userspace already holds a ref on drm_device through the drm_minor, it's not necessary to add extra ref counting based on no open file handles. Instead just drm_dev_put() unconditionally in drm_dev_unplug(). We now have this: - Userpace holds a ref on drm_device as long as there's open fd(s) - The driver holds a ref on drm_device as long as it's bound to the struct device When both sides are done with drm_device, it is released. Signed-off-by: Noralf Trønnes Reviewed-by: Oleksandr Andrushchenko Reviewed-by: Daniel Vetter Reviewed-by: Sean Paul Signed-off-by: Dave Airlie Link: https://patchwork.freedesktop.org/patch/msgid/20190208140103.28919-2-noralf@tronnes.org --- drivers/gpu/drm/drm_drv.c | 6 +----- drivers/gpu/drm/drm_file.c | 6 ++---- 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 381581b01d48..05bbc2b622fc 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -376,11 +376,7 @@ void drm_dev_unplug(struct drm_device *dev) synchronize_srcu(&drm_unplug_srcu); drm_dev_unregister(dev); - - mutex_lock(&drm_global_mutex); - if (dev->open_count == 0) - drm_dev_put(dev); - mutex_unlock(&drm_global_mutex); + drm_dev_put(dev); } EXPORT_SYMBOL(drm_dev_unplug); diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 83a5bbca6e7e..7caa3c7ed978 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -489,11 +489,9 @@ int drm_release(struct inode *inode, struct file *filp) drm_close_helper(filp); - if (!--dev->open_count) { + if (!--dev->open_count) drm_lastclose(dev); - if (drm_dev_is_unplugged(dev)) - drm_put_dev(dev); - } + mutex_unlock(&drm_global_mutex); drm_minor_release(minor); From a51143001d9e0683ab6f7968a516fc9243527e44 Mon Sep 17 00:00:00 2001 From: Robert Tarasov Date: Thu, 14 Mar 2019 15:53:39 -0700 Subject: [PATCH 282/468] drm/udl: Refactor edid retrieving in UDL driver (v2) Now drm/udl driver uses drm_do_get_edid() function to retrieve and validate all blocks of EDID data. Old approach had insufficient validation routine and had problems with retrieving of extra blocks Signed-off-by: Robert Tarasov Reviewed-by: Jani Nikula Signed-off-by: Dave Airlie [airlied: Fix spelling mistakes] Link: https://patchwork.freedesktop.org/patch/msgid/20190314225339.162386-1-tutankhamen@chromium.org --- drivers/gpu/drm/udl/udl_connector.c | 72 +++++------------------------ 1 file changed, 11 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/udl/udl_connector.c b/drivers/gpu/drm/udl/udl_connector.c index 66885c24590f..c1bd5e3d9e4a 100644 --- a/drivers/gpu/drm/udl/udl_connector.c +++ b/drivers/gpu/drm/udl/udl_connector.c @@ -18,18 +18,19 @@ #include "udl_connector.h" #include "udl_drv.h" -static bool udl_get_edid_block(struct udl_device *udl, int block_idx, - u8 *buff) +static int udl_get_edid_block(void *data, u8 *buf, unsigned int block, + size_t len) { int ret, i; u8 *read_buff; + struct udl_device *udl = data; read_buff = kmalloc(2, GFP_KERNEL); if (!read_buff) - return false; + return -1; - for (i = 0; i < EDID_LENGTH; i++) { - int bval = (i + block_idx * EDID_LENGTH) << 8; + for (i = 0; i < len; i++) { + int bval = (i + block * EDID_LENGTH) << 8; ret = usb_control_msg(udl->udev, usb_rcvctrlpipe(udl->udev, 0), (0x02), (0x80 | (0x02 << 5)), bval, @@ -37,60 +38,13 @@ static bool udl_get_edid_block(struct udl_device *udl, int block_idx, if (ret < 1) { DRM_ERROR("Read EDID byte %d failed err %x\n", i, ret); kfree(read_buff); - return false; + return -1; } - buff[i] = read_buff[1]; + buf[i] = read_buff[1]; } kfree(read_buff); - return true; -} - -static bool udl_get_edid(struct udl_device *udl, u8 **result_buff, - int *result_buff_size) -{ - int i, extensions; - u8 *block_buff = NULL, *buff_ptr; - - block_buff = kmalloc(EDID_LENGTH, GFP_KERNEL); - if (block_buff == NULL) - return false; - - if (udl_get_edid_block(udl, 0, block_buff) && - memchr_inv(block_buff, 0, EDID_LENGTH)) { - extensions = ((struct edid *)block_buff)->extensions; - if (extensions > 0) { - /* we have to read all extensions one by one */ - *result_buff_size = EDID_LENGTH * (extensions + 1); - *result_buff = kmalloc(*result_buff_size, GFP_KERNEL); - buff_ptr = *result_buff; - if (buff_ptr == NULL) { - kfree(block_buff); - return false; - } - memcpy(buff_ptr, block_buff, EDID_LENGTH); - kfree(block_buff); - buff_ptr += EDID_LENGTH; - for (i = 1; i < extensions; ++i) { - if (udl_get_edid_block(udl, i, buff_ptr)) { - buff_ptr += EDID_LENGTH; - } else { - kfree(*result_buff); - *result_buff = NULL; - return false; - } - } - return true; - } - /* we have only base edid block */ - *result_buff = block_buff; - *result_buff_size = EDID_LENGTH; - return true; - } - - kfree(block_buff); - - return false; + return 0; } static int udl_get_modes(struct drm_connector *connector) @@ -122,8 +76,6 @@ static enum drm_mode_status udl_mode_valid(struct drm_connector *connector, static enum drm_connector_status udl_detect(struct drm_connector *connector, bool force) { - u8 *edid_buff = NULL; - int edid_buff_size = 0; struct udl_device *udl = connector->dev->dev_private; struct udl_drm_connector *udl_connector = container_of(connector, @@ -136,12 +88,10 @@ udl_detect(struct drm_connector *connector, bool force) udl_connector->edid = NULL; } - - if (!udl_get_edid(udl, &edid_buff, &edid_buff_size)) + udl_connector->edid = drm_do_get_edid(connector, udl_get_edid_block, udl); + if (!udl_connector->edid) return connector_status_disconnected; - udl_connector->edid = (struct edid *)edid_buff; - return connector_status_connected; } From 6cf4511e9729c00a7306cf94085f9cc3c52ee723 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Sun, 24 Mar 2019 18:10:02 -0500 Subject: [PATCH 283/468] gpio: aspeed: fix a potential NULL pointer dereference In case devm_kzalloc, the patch returns ENOMEM to avoid potential NULL pointer dereference. Signed-off-by: Kangjie Lu Reviewed-by: Andrew Jeffery Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpio-aspeed.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpio-aspeed.c b/drivers/gpio/gpio-aspeed.c index 854bce4fb9e7..217507002dbc 100644 --- a/drivers/gpio/gpio-aspeed.c +++ b/drivers/gpio/gpio-aspeed.c @@ -1224,6 +1224,8 @@ static int __init aspeed_gpio_probe(struct platform_device *pdev) gpio->offset_timer = devm_kzalloc(&pdev->dev, gpio->chip.ngpio, GFP_KERNEL); + if (!gpio->offset_timer) + return -ENOMEM; return aspeed_gpio_setup_irqs(gpio, pdev); } From aa9aaa4d61c0048d3faad056893cd7860bbc084c Mon Sep 17 00:00:00 2001 From: Erik Schmauss Date: Thu, 21 Mar 2019 18:20:21 -0700 Subject: [PATCH 284/468] ACPI: use different default debug value than ACPICA Rather than setting debug output flags during early init, its makes more sense to simply re-define ACPI_DEBUG_DEFAULT specifically for Linux. ACPICA commit 60903715711f4b00ca1831779a8a23279a66497d Link: https://github.com/acpica/acpica/commit/60903715 Fixes: ce5cbf53496b ("ACPI: Set debug output flags independent of ACPICA") Reported-by: Alexandru Gagniuc Tested-by: Alexandru Gagniuc Signed-off-by: Erik Schmauss Signed-off-by: Bob Moore Signed-off-by: Rafael J. Wysocki --- drivers/acpi/bus.c | 3 --- include/acpi/acoutput.h | 3 +++ include/acpi/platform/aclinux.h | 5 +++++ 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 6ecbbabf1233..eec263c9019e 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -1043,9 +1043,6 @@ void __init acpi_early_init(void) acpi_permanent_mmap = true; - /* Initialize debug output. Linux does not use ACPICA defaults */ - acpi_dbg_level = ACPI_LV_INFO | ACPI_LV_REPAIR; - #ifdef CONFIG_X86 /* * If the machine falls into the DMI check table, diff --git a/include/acpi/acoutput.h b/include/acpi/acoutput.h index 30b1ae53689f..c50542dc71e0 100644 --- a/include/acpi/acoutput.h +++ b/include/acpi/acoutput.h @@ -150,7 +150,10 @@ /* Defaults for debug_level, debug and normal */ +#ifndef ACPI_DEBUG_DEFAULT #define ACPI_DEBUG_DEFAULT (ACPI_LV_INIT | ACPI_LV_DEBUG_OBJECT | ACPI_LV_EVALUATION | ACPI_LV_REPAIR) +#endif + #define ACPI_NORMAL_DEFAULT (ACPI_LV_INIT | ACPI_LV_DEBUG_OBJECT | ACPI_LV_REPAIR) #define ACPI_DEBUG_ALL (ACPI_LV_AML_DISASSEMBLE | ACPI_LV_ALL_EXCEPTIONS | ACPI_LV_ALL) diff --git a/include/acpi/platform/aclinux.h b/include/acpi/platform/aclinux.h index 9ff328fd946a..624b90b34085 100644 --- a/include/acpi/platform/aclinux.h +++ b/include/acpi/platform/aclinux.h @@ -82,6 +82,11 @@ #define ACPI_NO_ERROR_MESSAGES #undef ACPI_DEBUG_OUTPUT +/* Use a specific bugging default separate from ACPICA */ + +#undef ACPI_DEBUG_DEFAULT +#define ACPI_DEBUG_DEFAULT (ACPI_LV_INFO | ACPI_LV_REPAIR) + /* External interface for __KERNEL__, stub is needed */ #define ACPI_EXTERNAL_RETURN_STATUS(prototype) \ From 776e78677f514ecddd12dba48b9040958999bd5a Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Fri, 22 Mar 2019 15:26:56 +0000 Subject: [PATCH 285/468] drm/meson: Fix invalid pointer in meson_drv_unbind() meson_drv_bind() registers a meson_drm struct as the device's privdata, but meson_drv_unbind() tries to retrieve a drm_device. This may cause a segfault on shutdown: [ 5194.593429] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000197 ... [ 5194.788850] Call trace: [ 5194.791349] drm_dev_unregister+0x1c/0x118 [drm] [ 5194.795848] meson_drv_unbind+0x50/0x78 [meson_drm] Retrieve the right pointer in meson_drv_unbind(). Fixes: bbbe775ec5b5 ("drm: Add support for Amlogic Meson Graphic Controller") Signed-off-by: Jean-Philippe Brucker Acked-by: Neil Armstrong Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20190322152657.13752-1-jean-philippe.brucker@arm.com --- drivers/gpu/drm/meson/meson_drv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/meson/meson_drv.c b/drivers/gpu/drm/meson/meson_drv.c index 2281ed3eb774..7e85802c5398 100644 --- a/drivers/gpu/drm/meson/meson_drv.c +++ b/drivers/gpu/drm/meson/meson_drv.c @@ -356,8 +356,8 @@ static int meson_drv_bind(struct device *dev) static void meson_drv_unbind(struct device *dev) { - struct drm_device *drm = dev_get_drvdata(dev); - struct meson_drm *priv = drm->dev_private; + struct meson_drm *priv = dev_get_drvdata(dev); + struct drm_device *drm = priv->drm; if (priv->canvas) { meson_canvas_free(priv->canvas, priv->canvas_id_osd1); From 2d8f92897ad816f5dda54b2ed2fd9f2d7cb1abde Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Fri, 22 Mar 2019 15:26:57 +0000 Subject: [PATCH 286/468] drm/meson: Uninstall IRQ handler meson_drv_unbind() doesn't unregister the IRQ handler, which can lead to use-after-free if the IRQ fires after unbind: [ 64.656876] Unable to handle kernel paging request at virtual address ffff000011706dbc ... [ 64.662001] pc : meson_irq+0x18/0x30 [meson_drm] I'm assuming that a similar problem could happen on the error path of bind(), so uninstall the IRQ handler there as well. Fixes: bbbe775ec5b5 ("drm: Add support for Amlogic Meson Graphic Controller") Signed-off-by: Jean-Philippe Brucker Acked-by: Neil Armstrong Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20190322152657.13752-2-jean-philippe.brucker@arm.com --- drivers/gpu/drm/meson/meson_drv.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/meson/meson_drv.c b/drivers/gpu/drm/meson/meson_drv.c index 7e85802c5398..8a4ebcb6405c 100644 --- a/drivers/gpu/drm/meson/meson_drv.c +++ b/drivers/gpu/drm/meson/meson_drv.c @@ -337,12 +337,14 @@ static int meson_drv_bind_master(struct device *dev, bool has_components) ret = drm_dev_register(drm, 0); if (ret) - goto free_drm; + goto uninstall_irq; drm_fbdev_generic_setup(drm, 32); return 0; +uninstall_irq: + drm_irq_uninstall(drm); free_drm: drm_dev_put(drm); @@ -367,6 +369,7 @@ static void meson_drv_unbind(struct device *dev) } drm_dev_unregister(drm); + drm_irq_uninstall(drm); drm_kms_helper_poll_fini(drm); drm_mode_config_cleanup(drm); drm_dev_put(drm); From 3d565a21f2ce1f37479e91914734478c39b5c6fc Mon Sep 17 00:00:00 2001 From: Neil Armstrong Date: Wed, 20 Mar 2019 09:11:10 +0100 Subject: [PATCH 287/468] drm/meson: fix TMDS clock filtering for DMT monitors DMT monitors does not necessarely report a maximum TMDS clock in a VSDB EDID extension. In this case, all modes are wrongly rejected, including the DRM fallback EDID. This patch only rejects modes whith clock > max_tmds_clock if the max_tmds_clock is specified. This will only reject 4:2:0 HDMI2.0 modes, who reports a clock > max_tmds_clock. Reported-by: Maxime Jourdan Fixes: d7d8fb7046b6 ("drm/meson: add HDMI div40 TMDS mode") Signed-off-by: Neil Armstrong Tested-by: Maxime Jourdan Reviewed-by: Maxime Jourdan Link: https://patchwork.freedesktop.org/patch/msgid/20190320081110.1718-1-narmstrong@baylibre.com --- drivers/gpu/drm/meson/meson_dw_hdmi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/meson/meson_dw_hdmi.c b/drivers/gpu/drm/meson/meson_dw_hdmi.c index e28814f4ea6c..563953ec6ad0 100644 --- a/drivers/gpu/drm/meson/meson_dw_hdmi.c +++ b/drivers/gpu/drm/meson/meson_dw_hdmi.c @@ -569,7 +569,8 @@ dw_hdmi_mode_valid(struct drm_connector *connector, DRM_DEBUG_DRIVER("Modeline " DRM_MODE_FMT "\n", DRM_MODE_ARG(mode)); /* If sink max TMDS clock, we reject the mode */ - if (mode->clock > connector->display_info.max_tmds_clock) + if (connector->display_info.max_tmds_clock && + mode->clock > connector->display_info.max_tmds_clock) return MODE_BAD; /* Check against non-VIC supported modes */ From d9470757398a700d9450a43508000bcfd010c7a4 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 22 Mar 2019 23:37:24 +1100 Subject: [PATCH 288/468] powerpc/64: Fix memcmp reading past the end of src/dest Chandan reported that fstests' generic/026 test hit a crash: BUG: Unable to handle kernel data access at 0xc00000062ac40000 Faulting instruction address: 0xc000000000092240 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1 NIP: c000000000092240 LR: c00000000066a55c CTR: 0000000000000000 REGS: c00000062c0c3430 TRAP: 0300 Not tainted (5.0.0-rc2-next-20190115-00001-g6de6dba64dda) MSR: 8000000002009033 CR: 44000842 XER: 20000000 CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0 GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660 GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4 GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0 GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002 GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000 GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000 NIP memcmp+0x120/0x690 LR xfs_attr3_leaf_lookup_int+0x53c/0x5b0 Call Trace: xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable) xfs_da3_node_lookup_int+0x32c/0x5a0 xfs_attr_node_addname+0x170/0x6b0 xfs_attr_set+0x2ac/0x340 __xfs_set_acl+0xf0/0x230 xfs_set_acl+0xd0/0x160 set_posix_acl+0xc0/0x130 posix_acl_xattr_set+0x68/0x110 __vfs_setxattr+0xa4/0x110 __vfs_setxattr_noperm+0xac/0x240 vfs_setxattr+0x128/0x130 setxattr+0x248/0x600 path_setxattr+0x108/0x120 sys_setxattr+0x28/0x40 system_call+0x5c/0x70 Instruction dump: 7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000 4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040 The instruction dump decodes as: subfic r6,r5,8 rlwinm r6,r6,3,0,28 ldbrx r9,0,r3 ldbrx r10,0,r4 <- Which shows us doing an 8 byte load from c00000062ac3fff9, which crosses the page boundary at c00000062ac40000 and faults. It's not OK for memcmp to read past the end of the source or destination buffers if that would cross a page boundary, because we don't know that the next page is mapped. As pointed out by Segher, we can read past the end of the source or destination as long as we don't cross a 4K boundary, because that's our minimum page size on all platforms. The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get there we know that s1 is 8-byte aligned and we have at least 1 byte to read, so a single 8-byte load won't read past the end of s1 and cross a page boundary. But we have to be more careful with s2. So check if it's within 8 bytes of a 4K boundary and if so go to the byte-by-byte loop. Fixes: 2d9ee327adce ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Chandan Rajendra Signed-off-by: Michael Ellerman Reviewed-by: Segher Boessenkool Tested-by: Chandan Rajendra Signed-off-by: Michael Ellerman --- arch/powerpc/lib/memcmp_64.S | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S index 844d8e774492..b7f6f6e0b6e8 100644 --- a/arch/powerpc/lib/memcmp_64.S +++ b/arch/powerpc/lib/memcmp_64.S @@ -215,11 +215,20 @@ _GLOBAL_TOC(memcmp) beq .Lzero .Lcmp_rest_lt8bytes: - /* Here we have only less than 8 bytes to compare with. at least s1 - * Address is aligned with 8 bytes. - * The next double words are load and shift right with appropriate - * bits. + /* + * Here we have less than 8 bytes to compare. At least s1 is aligned to + * 8 bytes, but s2 may not be. We must make sure s2 + 7 doesn't cross a + * page boundary, otherwise we might read past the end of the buffer and + * trigger a page fault. We use 4K as the conservative minimum page + * size. If we detect that case we go to the byte-by-byte loop. + * + * Otherwise the next double word is loaded from s1 and s2, and shifted + * right to compare the appropriate bits. */ + clrldi r6,r4,(64-12) // r6 = r4 & 0xfff + cmpdi r6,0xff8 + bgt .Lshort + subfic r6,r5,8 slwi r6,r6,3 LD rA,0,r3 From 945ab8f6de94430c23a82f3cf2e3f6d6f2945ff7 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 25 Mar 2019 08:15:14 -0400 Subject: [PATCH 289/468] locks: wake any locks blocked on request before deadlock check Andreas reported that he was seeing the tdbtorture test fail in some cases with -EDEADLCK when it wasn't before. Some debugging showed that deadlock detection was sometimes discovering the caller's lock request itself in a dependency chain. While we remove the request from the blocked_lock_hash prior to reattempting to acquire it, any locks that are blocked on that request will still be present in the hash and will still have their fl_blocker pointer set to the current request. This causes posix_locks_deadlock to find a deadlock dependency chain when it shouldn't, as a lock request cannot block itself. We are going to end up waking all of those blocked locks anyway when we go to reinsert the request back into the blocked_lock_hash, so just do it prior to checking for deadlocks. This ensures that any lock blocked on the current request will no longer be part of any blocked request chain. URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975 Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.") Cc: stable@vger.kernel.org Reported-by: Andreas Schneider Signed-off-by: Neil Brown Signed-off-by: Jeff Layton --- fs/locks.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/locks.c b/fs/locks.c index eaa1cfaf73b0..71d0c6c2aac5 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1160,6 +1160,11 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, */ error = -EDEADLK; spin_lock(&blocked_lock_lock); + /* + * Ensure that we don't find any locks blocked on this + * request during deadlock detection. + */ + __locks_wake_up_blocks(request); if (likely(!posix_locks_deadlock(request, fl))) { error = FILE_LOCK_DEFERRED; __locks_insert_block(fl, request, From 8bc32a285660e13fdcf92ddaf5b8653abe112040 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Fri, 22 Mar 2019 16:52:17 +0100 Subject: [PATCH 290/468] iommu: Don't print warning when IOMMU driver only supports unmanaged domains Print the warning about the fall-back to IOMMU_DOMAIN_DMA in iommu_group_get_for_dev() only when such a domain was actually allocated. Otherwise the user will get misleading warnings in the kernel log when the iommu driver used doesn't support IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY. Fixes: fccb4e3b8ab09 ('iommu: Allow default domain type to be set on the kernel command line') Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 33a982e33716..109de67d5d72 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1105,10 +1105,12 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev) dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type); if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) { - dev_warn(dev, - "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA", - iommu_def_domain_type); dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA); + if (dom) { + dev_warn(dev, + "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA", + iommu_def_domain_type); + } } group->default_domain = dom; From ed79dac98c5e9f8471456afe2cc09a3912586b52 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Fri, 22 Mar 2019 18:10:22 -0700 Subject: [PATCH 291/468] xfs: prohibit fstrim in norecovery mode The xfs fstrim implementation uses the free space btrees to find free space that can be discarded. If we haven't recovered the log, the bnobt will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong Reviewed-by: Eric Sandeen --- fs/xfs/xfs_discard.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c index 93f07edafd81..9ee2a7d02e70 100644 --- a/fs/xfs/xfs_discard.c +++ b/fs/xfs/xfs_discard.c @@ -161,6 +161,14 @@ xfs_ioc_trim( return -EPERM; if (!blk_queue_discard(q)) return -EOPNOTSUPP; + + /* + * We haven't recovered the log, so we cannot use our bnobt-guided + * storage zapping commands. + */ + if (mp->m_flags & XFS_MOUNT_NORECOVERY) + return -EROFS; + if (copy_from_user(&range, urange, sizeof(range))) return -EFAULT; From 113ce08109f8e3b091399e7cc32486df1cff48e7 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 25 Mar 2019 10:38:58 +0100 Subject: [PATCH 292/468] ALSA: pcm: Don't suspend stream in unrecoverable PCM state Currently PCM core sets each opened stream forcibly to SUSPENDED state via snd_pcm_suspend_all() call, and the user-space is responsible for re-triggering the resume manually either via snd_pcm_resume() or prepare call. The scheme works fine usually, but there are corner cases where the stream can't be resumed by that call: the streams still in OPEN state before finishing hw_params. When they are suspended, user-space cannot perform resume or prepare because they haven't been set up yet. The only possible recovery is to re-open the device, which isn't nice at all. Similarly, when a stream is in DISCONNECTED state, it makes no sense to change it to SUSPENDED state. Ditto for in SETUP state; which you can re-prepare directly. So, this patch addresses these issues by filtering the PCM streams to be suspended by checking the PCM state. When a stream is in either OPEN, SETUP or DISCONNECTED as well as already SUSPENDED, the suspend action is skipped. To be noted, this problem was originally reported for the PCM runtime PM on HD-audio. And, the runtime PM problem itself was already addressed (although not intended) by the code refactoring commits 3d21ef0b49f8 ("ALSA: pcm: Suspend streams globally via device type PM ops") and 17bc4815de58 ("ALSA: pci: Remove superfluous snd_pcm_suspend*() calls"). These commits eliminated the snd_pcm_suspend*() calls from the runtime PM suspend callback code path, hence the racy OPEN state won't appear while runtime PM. (FWIW, the race window is between snd_pcm_open_substream() and the first power up in azx_pcm_open().) Although the runtime PM issue was already "fixed", the same problem is still present for the system PM, hence this patch is still needed. And for stable trees, this patch alone should suffice for fixing the runtime PM problem, too. Reported-and-tested-by: Jon Hunter Cc: Signed-off-by: Takashi Iwai --- sound/core/pcm_native.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index f731f904e8cc..1d8452912b14 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -1445,8 +1445,15 @@ static int snd_pcm_pause(struct snd_pcm_substream *substream, int push) static int snd_pcm_pre_suspend(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; - if (runtime->status->state == SNDRV_PCM_STATE_SUSPENDED) + switch (runtime->status->state) { + case SNDRV_PCM_STATE_SUSPENDED: return -EBUSY; + /* unresumable PCM state; return -EBUSY for skipping suspend */ + case SNDRV_PCM_STATE_OPEN: + case SNDRV_PCM_STATE_SETUP: + case SNDRV_PCM_STATE_DISCONNECTED: + return -EBUSY; + } runtime->trigger_master = substream; return 0; } From 2dbed152e2d4c3fe2442284918d14797898b1e8a Mon Sep 17 00:00:00 2001 From: Sekhar Nori Date: Wed, 20 Feb 2019 16:36:52 +0530 Subject: [PATCH 293/468] ARM: davinci: fix build failure with allnoconfig allnoconfig build with just ARCH_DAVINCI enabled fails because drivers/clk/davinci/* depends on REGMAP being enabled. Fix it by selecting REGMAP_MMIO when building in DaVinci support. Signed-off-by: Sekhar Nori Reviewed-by: David Lechner Signed-off-by: Arnd Bergmann --- arch/arm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 054ead960f98..850b4805e2d1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -596,6 +596,7 @@ config ARCH_DAVINCI select HAVE_IDE select PM_GENERIC_DOMAINS if PM select PM_GENERIC_DOMAINS_OF if PM && OF + select REGMAP_MMIO select RESET_CONTROLLER select SPARSE_IRQ select USE_OF From fa9463564e77067df81b0b8dec91adbbbc47bfb4 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Mon, 18 Mar 2019 16:31:22 +0100 Subject: [PATCH 294/468] ARM: dts: nomadik: Fix polarity of SPI CS The SPI DT bindings are for historical reasons a pitfall, the ability to flag a GPIO line as active high/low with the second cell flags was introduced later so the SPI subsystem will only accept the bool flag spi-cs-high to indicate that the line is active high. It worked by mistake, but the mistake was corrected in another commit. The comment in the DTS file was also misleading: this CS is indeed active high. Fixes: cffbb02dafa3 ("ARM: dts: nomadik: Augment NHK15 panel setting") Signed-off-by: Linus Walleij Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/ste-nomadik-nhk15.dts | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm/boot/dts/ste-nomadik-nhk15.dts b/arch/arm/boot/dts/ste-nomadik-nhk15.dts index 04066f9cb8a3..f2f6558a00f1 100644 --- a/arch/arm/boot/dts/ste-nomadik-nhk15.dts +++ b/arch/arm/boot/dts/ste-nomadik-nhk15.dts @@ -213,12 +213,13 @@ spi { gpio-sck = <&gpio0 5 GPIO_ACTIVE_HIGH>; gpio-mosi = <&gpio0 4 GPIO_ACTIVE_HIGH>; /* - * It's not actually active high, but the frameworks assume - * the polarity of the passed-in GPIO is "normal" (active - * high) then actively drives the line low to select the - * chip. + * This chipselect is active high. Just setting the flags + * to GPIO_ACTIVE_HIGH is not enough for the SPI DT bindings, + * it will be ignored, only the special "spi-cs-high" flag + * really counts. */ cs-gpios = <&gpio0 6 GPIO_ACTIVE_HIGH>; + spi-cs-high; num-chipselects = <1>; /* From 9e75ad5d8f399a21c86271571aa630dd080223e2 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 25 Mar 2019 15:34:53 +0100 Subject: [PATCH 295/468] io_uring: fix big-endian compat signal mask handling On big-endian architectures, the signal masks are differnet between 32-bit and 64-bit tasks, so we have to use a different function for reading them from user space. io_cqring_wait() initially got this wrong, and always interprets this as a native structure. This is ok on x86 and most arm64, but not on s390, ppc64be, mips64be, sparc64 and parisc. Signed-off-by: Arnd Bergmann Signed-off-by: Jens Axboe --- fs/io_uring.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 6aaa30580a2b..8f48d29abf76 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, return 0; if (sig) { - ret = set_user_sigmask(sig, &ksigmask, &sigsaved, sigsz); +#ifdef CONFIG_COMPAT + if (in_compat_syscall()) + ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig, + &ksigmask, &sigsaved, sigsz); + else +#endif + ret = set_user_sigmask(sig, &ksigmask, + &sigsaved, sigsz); + if (ret) return ret; } From 93958742192e7956d05989836ada9071f9ffe42e Mon Sep 17 00:00:00 2001 From: Jonathan Hunter Date: Mon, 25 Mar 2019 12:28:07 +0100 Subject: [PATCH 296/468] arm64: tegra: Disable CQE Support for SDMMC4 on Tegra186 Enabling CQE support on Tegra186 Jetson TX2 has introduced a regression that is causing accesses to the file-system on the eMMC to fail. Errors such as the following have been observed ... mmc2: running CQE recovery mmc2: mmc_select_hs400 failed, error -110 print_req_error: I/O error, dev mmcblk2, sector 8 flags 80700 mmc2: cqhci: CQE failed to exit halt state For now disable CQE support for Tegra186 until this issue is resolved. Fixes: dfd3cb6feb73 arm64: tegra: Add CQE Support for SDMMC4 Signed-off-by: Jonathan Hunter Signed-off-by: Thierry Reding Signed-off-by: Arnd Bergmann --- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi index bb2045be8814..97aeb946ed5e 100644 --- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi @@ -321,7 +321,6 @@ sdmmc4: sdhci@3460000 { nvidia,default-trim = <0x9>; nvidia,dqs-trim = <63>; mmc-hs400-1_8v; - supports-cqe; status = "disabled"; }; From 9dfec7ca0ba7ef987b3bbb71333d2f2182f5e030 Mon Sep 17 00:00:00 2001 From: Pierre-Yves MORDRET Date: Mon, 25 Mar 2019 17:21:55 +0100 Subject: [PATCH 297/468] dmaengine: stm32-mdma: Revert "dmaengine: stm32-mdma: Add a check on read_u32_array" This reverts commit 906b40b246b0 ("dmaengine: stm32-mdma: Add a check on read_u32_array") As stated by bindings "st,ahb-addr-masks" is optional. The statement inserted by this commit makes this property mandatory and prevents MDMA to be probed in case property not present. Signed-off-by: Pierre-Yves MORDRET Signed-off-by: Vinod Koul --- drivers/dma/stm32-mdma.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c index 4e0eede599a8..ac0301b69593 100644 --- a/drivers/dma/stm32-mdma.c +++ b/drivers/dma/stm32-mdma.c @@ -1578,11 +1578,9 @@ static int stm32_mdma_probe(struct platform_device *pdev) dmadev->nr_channels = nr_channels; dmadev->nr_requests = nr_requests; - ret = device_property_read_u32_array(&pdev->dev, "st,ahb-addr-masks", + device_property_read_u32_array(&pdev->dev, "st,ahb-addr-masks", dmadev->ahb_addr_masks, count); - if (ret) - return ret; dmadev->nr_ahb_addr_masks = count; res = platform_get_resource(pdev, IORESOURCE_MEM, 0); From e861857545567adec8da3bdff728efdf7db12285 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 25 Mar 2019 12:34:10 -0600 Subject: [PATCH 298/468] blk-mq: fix sbitmap ws_active for shared tags We now wrap sbitmap waitqueues in an active counter, so we can avoid iterating wakeups unless we have waiters there. This works as long as everyone that's manipulating the waitqueues use the proper helpers. For the tag wait case for shared tags, however, we add ourselves to the waitqueue without incrementing/decrementing the ->ws_active count. This means that wakeups can take a long time to happen. Fix this by manually doing the inc/dec as needed for the wait queue handling. Reported-by: Michael Leun Tested-by: Michael Leun Cc: stable@vger.kernel.org Reviewed-by: Omar Sandoval Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check") Signed-off-by: Jens Axboe --- block/blk-mq.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 28080b0235f0..3ff3d7b49969 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1072,7 +1072,13 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait); spin_lock(&hctx->dispatch_wait_lock); - list_del_init(&wait->entry); + if (!list_empty(&wait->entry)) { + struct sbitmap_queue *sbq; + + list_del_init(&wait->entry); + sbq = &hctx->tags->bitmap_tags; + atomic_dec(&sbq->ws_active); + } spin_unlock(&hctx->dispatch_wait_lock); blk_mq_run_hw_queue(hctx, true); @@ -1088,6 +1094,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, struct request *rq) { + struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags; struct wait_queue_head *wq; wait_queue_entry_t *wait; bool ret; @@ -1110,7 +1117,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, if (!list_empty_careful(&wait->entry)) return false; - wq = &bt_wait_ptr(&hctx->tags->bitmap_tags, hctx)->wait; + wq = &bt_wait_ptr(sbq, hctx)->wait; spin_lock_irq(&wq->lock); spin_lock(&hctx->dispatch_wait_lock); @@ -1120,6 +1127,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, return false; } + atomic_inc(&sbq->ws_active); wait->flags &= ~WQ_FLAG_EXCLUSIVE; __add_wait_queue(wq, wait); @@ -1140,6 +1148,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, * someone else gets the wakeup. */ list_del_init(&wait->entry); + atomic_dec(&sbq->ws_active); spin_unlock(&hctx->dispatch_wait_lock); spin_unlock_irq(&wq->lock); From e6d1fa584e0dd9bfebaf345e9feea588cf75ead2 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 22 Mar 2019 09:13:51 +0800 Subject: [PATCH 299/468] sbitmap: order READ/WRITE freed instance and setting clear bit Inside sbitmap_queue_clear(), once the clear bit is set, it will be visiable to allocation path immediately. Meantime READ/WRITE on old associated instance(such as request in case of blk-mq) may be out-of-order with the setting clear bit, so race with re-allocation may be triggered. Adds one memory barrier for ordering READ/WRITE of the freed associated instance with setting clear bit for avoiding race with re-allocation. The following kernel oops triggerd by block/006 on aarch64 may be fixed: [ 142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330 [ 142.338794] Mem abort info: [ 142.341554] ESR = 0x96000005 [ 142.344632] Exception class = DABT (current EL), IL = 32 bits [ 142.350500] SET = 0, FnV = 0 [ 142.353544] EA = 0, S1PTW = 0 [ 142.356678] Data abort info: [ 142.359528] ISV = 0, ISS = 0x00000005 [ 142.363343] CM = 0, WnR = 0 [ 142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0 [ 142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000 [ 142.379777] Internal error: Oops: 96000005 [#1] SMP [ 142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk] [ 142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki #2 [ 142.449458] pstate: 00400005 (nzcv daif +PAN -UAO) [ 142.454239] pc : __blk_mq_free_request+0x4c/0xa8 [ 142.458830] lr : blk_mq_free_request+0xec/0x118 [ 142.463344] sp : ffff00003360f6a0 [ 142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000 [ 142.471941] x27: ffff801729a50048 x26: 0000000000010000 [ 142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440 [ 142.482529] x23: 0000000000000000 x22: 00000000ffffffff [ 142.487830] x21: ffff801729810000 x20: 0000000000000000 [ 142.493123] x19: ffff801729a50000 x18: 0000000000000000 [ 142.498413] x17: 0000000000000000 x16: 0000000000000001 [ 142.503709] x15: 00000000000000ff x14: ffff7fe000000000 [ 142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000 [ 142.514308] x11: 0000000000000001 x10: 0000000000000008 [ 142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000 [ 142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be [ 142.530187] x5 : 0000000000000001 x4 : 0000000000000000 [ 142.535478] x3 : 0000000000000000 x2 : 0000000000000000 [ 142.540777] x1 : 0000000000000001 x0 : ffff00001041b194 [ 142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea) [ 142.552500] Call trace: [ 142.554926] __blk_mq_free_request+0x4c/0xa8 [ 142.559181] blk_mq_free_request+0xec/0x118 [ 142.563352] blk_mq_end_request+0xfc/0x120 [ 142.567444] end_cmd+0x3c/0xa8 [null_blk] [ 142.571434] null_complete_rq+0x20/0x30 [null_blk] [ 142.576194] blk_mq_complete_request+0x108/0x148 [ 142.580797] null_handle_cmd+0x1d4/0x718 [null_blk] [ 142.585662] null_queue_rq+0x60/0xa8 [null_blk] [ 142.590171] blk_mq_try_issue_directly+0x148/0x280 [ 142.594949] blk_mq_try_issue_list_directly+0x9c/0x108 [ 142.600064] blk_mq_sched_insert_requests+0xb0/0xd0 [ 142.604926] blk_mq_flush_plug_list+0x16c/0x2a0 [ 142.609441] blk_flush_plug_list+0xec/0x118 [ 142.613608] blk_finish_plug+0x3c/0x4c [ 142.617348] blkdev_direct_IO+0x3b4/0x428 [ 142.621336] generic_file_read_iter+0x84/0x180 [ 142.625761] blkdev_read_iter+0x50/0x78 [ 142.629579] aio_read.isra.6+0xf8/0x190 [ 142.633409] __io_submit_one.isra.8+0x148/0x738 [ 142.637912] io_submit_one.isra.9+0x88/0xb8 [ 142.642078] __arm64_sys_io_submit+0xe0/0x238 [ 142.646428] el0_svc_handler+0xa0/0x128 [ 142.650238] el0_svc+0x8/0xc [ 142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81) [ 142.659202] ---[ end trace 467586bc175eb09d ]--- Fixes: ea86ea2cdced20057da ("sbitmap: ammortize cost of clearing bits") Reported-and-bisected_and_tested-by: Yi Zhang Cc: Yi Zhang Cc: "jianchao.wang" Reviewed-by: Omar Sandoval Signed-off-by: Ming Lei Signed-off-by: Jens Axboe --- lib/sbitmap.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 5b382c1244ed..155fe38756ec 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -591,6 +591,17 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up); void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, unsigned int cpu) { + /* + * Once the clear bit is set, the bit may be allocated out. + * + * Orders READ/WRITE on the asssociated instance(such as request + * of blk_mq) by this bit for avoiding race with re-allocation, + * and its pair is the memory barrier implied in __sbitmap_get_word. + * + * One invariant is that the clear bit has to be zero when the bit + * is in use. + */ + smp_mb__before_atomic(); sbitmap_deferred_clear_bit(&sbq->sb, nr); /* From 9bf7933fc3f306bc4ce74ad734f690a71670178a Mon Sep 17 00:00:00 2001 From: Roman Penyaev Date: Mon, 25 Mar 2019 20:09:24 +0100 Subject: [PATCH 300/468] io_uring: offload write to async worker in case of -EAGAIN In case of direct write -EAGAIN will be returned if page cache was previously populated. To avoid immediate completion of a request with -EAGAIN error write has to be offloaded to the async worker, like io_read() does. Signed-off-by: Roman Penyaev Cc: Jens Axboe Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe --- fs/io_uring.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 8f48d29abf76..bbdbd56cf2ac 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1022,6 +1022,8 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s, ret = rw_verify_area(WRITE, file, &kiocb->ki_pos, iov_count); if (!ret) { + ssize_t ret2; + /* * Open-code file_start_write here to grab freeze protection, * which will be released by another thread in @@ -1036,7 +1038,19 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s, SB_FREEZE_WRITE); } kiocb->ki_flags |= IOCB_WRITE; - io_rw_done(kiocb, call_write_iter(file, kiocb, &iter)); + + ret2 = call_write_iter(file, kiocb, &iter); + if (!force_nonblock || ret2 != -EAGAIN) { + io_rw_done(kiocb, ret2); + } else { + /* + * If ->needs_lock is true, we're already in async + * context. + */ + if (!s->needs_lock) + io_async_list_note(WRITE, req, iov_count); + ret = -EAGAIN; + } } out_free: kfree(iovec); From 3b9c2f2e0e99bb67c96abcb659b3465efe3bee1f Mon Sep 17 00:00:00 2001 From: Malcolm Priestley Date: Sun, 24 Mar 2019 18:53:49 +0000 Subject: [PATCH 301/468] staging: vt6655: Fix interrupt race condition on device start up. It appears on some slower systems that the driver can find its way out of the workqueue while the interrupt is disabled by continuous polling by it. Move MACvIntEnable to vnt_interrupt_work so that it is always enabled on all routes out of vnt_interrupt_process. Move MACvIntDisable so that the device doesn't keep polling the system while the workqueue is being processed. Signed-off-by: Malcolm Priestley CC: stable@vger.kernel.org # v4.2+ Signed-off-by: Greg Kroah-Hartman --- drivers/staging/vt6655/device_main.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c index b370985b58a1..83f1a1cf9182 100644 --- a/drivers/staging/vt6655/device_main.c +++ b/drivers/staging/vt6655/device_main.c @@ -1033,8 +1033,6 @@ static void vnt_interrupt_process(struct vnt_private *priv) return; } - MACvIntDisable(priv->PortOffset); - spin_lock_irqsave(&priv->lock, flags); /* Read low level stats */ @@ -1122,8 +1120,6 @@ static void vnt_interrupt_process(struct vnt_private *priv) } spin_unlock_irqrestore(&priv->lock, flags); - - MACvIntEnable(priv->PortOffset, IMR_MASK_VALUE); } static void vnt_interrupt_work(struct work_struct *work) @@ -1133,6 +1129,8 @@ static void vnt_interrupt_work(struct work_struct *work) if (priv->vif) vnt_interrupt_process(priv); + + MACvIntEnable(priv->PortOffset, IMR_MASK_VALUE); } static irqreturn_t vnt_interrupt(int irq, void *arg) @@ -1142,6 +1140,8 @@ static irqreturn_t vnt_interrupt(int irq, void *arg) if (priv->vif) schedule_work(&priv->interrupt_work); + MACvIntDisable(priv->PortOffset); + return IRQ_HANDLED; } From b6391ac73400eff38377a4a7364bd3df5efb5178 Mon Sep 17 00:00:00 2001 From: Gao Xiang Date: Mon, 25 Mar 2019 11:40:07 +0800 Subject: [PATCH 302/468] staging: erofs: fix error handling when failed to read compresssed data Complete read error handling paths for all three kinds of compressed pages: 1) For cache-managed pages, PG_uptodate will be checked since read_endio will unlock and SetPageUptodate for these pages; 2) For inplaced pages, read_endio cannot SetPageUptodate directly since it should be used to mark the final decompressed data, PG_error will be set with page locked for IO error instead; 3) For staging pages, PG_error is used, which is similar to what we do for inplaced pages. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: # 4.19+ Reviewed-by: Chao Yu Signed-off-by: Gao Xiang Signed-off-by: Greg Kroah-Hartman --- drivers/staging/erofs/unzip_vle.c | 41 +++++++++++++++++++++---------- 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c index c7b3b21123c1..31eef8395774 100644 --- a/drivers/staging/erofs/unzip_vle.c +++ b/drivers/staging/erofs/unzip_vle.c @@ -972,6 +972,7 @@ static int z_erofs_vle_unzip(struct super_block *sb, overlapped = false; compressed_pages = grp->compressed_pages; + err = 0; for (i = 0; i < clusterpages; ++i) { unsigned int pagenr; @@ -981,26 +982,39 @@ static int z_erofs_vle_unzip(struct super_block *sb, DBG_BUGON(!page); DBG_BUGON(!page->mapping); - if (z_erofs_is_stagingpage(page)) - continue; + if (!z_erofs_is_stagingpage(page)) { #ifdef EROFS_FS_HAS_MANAGED_CACHE - if (page->mapping == MNGD_MAPPING(sbi)) { - DBG_BUGON(!PageUptodate(page)); - continue; - } + if (page->mapping == MNGD_MAPPING(sbi)) { + if (unlikely(!PageUptodate(page))) + err = -EIO; + continue; + } #endif - /* only non-head page could be reused as a compressed page */ - pagenr = z_erofs_onlinepage_index(page); + /* + * only if non-head page can be selected + * for inplace decompression + */ + pagenr = z_erofs_onlinepage_index(page); - DBG_BUGON(pagenr >= nr_pages); - DBG_BUGON(pages[pagenr]); - ++sparsemem_pages; - pages[pagenr] = page; + DBG_BUGON(pagenr >= nr_pages); + DBG_BUGON(pages[pagenr]); + ++sparsemem_pages; + pages[pagenr] = page; - overlapped = true; + overlapped = true; + } + + /* PG_error needs checking for inplaced and staging pages */ + if (unlikely(PageError(page))) { + DBG_BUGON(PageUptodate(page)); + err = -EIO; + } } + if (unlikely(err)) + goto out; + llen = (nr_pages << PAGE_SHIFT) - work->pageofs; if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN) { @@ -1198,6 +1212,7 @@ pickup_page_for_submission(struct z_erofs_vle_workgroup *grp, if (page->mapping == mc) { WRITE_ONCE(grp->compressed_pages[nr], page); + ClearPageError(page); if (!PagePrivate(page)) { /* * impossible to be !PagePrivate(page) for From 9b9c87cf51783cbe7140c51472762094033cfeab Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 25 Mar 2019 11:56:59 +0300 Subject: [PATCH 303/468] staging: vc04_services: Fix an error code in vchiq_probe() We need to set "err" on this error path. Fixes: 187ac53e590c ("staging: vchiq_arm: rework probe and init functions") Signed-off-by: Dan Carpenter Acked-by: Stefan Wahren Signed-off-by: Greg Kroah-Hartman --- .../staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 804daf83be35..064d0db4c51e 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -3513,6 +3513,7 @@ static int vchiq_probe(struct platform_device *pdev) struct device_node *fw_node; const struct of_device_id *of_id; struct vchiq_drvdata *drvdata; + struct device *vchiq_dev; int err; of_id = of_match_node(vchiq_of_match, pdev->dev.of_node); @@ -3547,9 +3548,12 @@ static int vchiq_probe(struct platform_device *pdev) goto failed_platform_init; } - if (IS_ERR(device_create(vchiq_class, &pdev->dev, vchiq_devid, - NULL, "vchiq"))) + vchiq_dev = device_create(vchiq_class, &pdev->dev, vchiq_devid, NULL, + "vchiq"); + if (IS_ERR(vchiq_dev)) { + err = PTR_ERR(vchiq_dev); goto failed_device_create; + } vchiq_debugfs_init(); From 9498da46d1cef51ae29f595a9621341acecfa9ab Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Mon, 25 Mar 2019 22:48:01 +0200 Subject: [PATCH 304/468] staging: octeon-ethernet: fix incorrect PHY mode When connecting PHY, we set the mode to PHY_INTERFACE_MODE_GMII which is not always correct. Specifically on boards where RGMII_RXID is needed networking now longer works with at803x after commit 6d4cd041f0af ("net: phy: at803x: disable delay only for RGMII mode"). Fix by passing the correct mode. Tested on EdgeRouter Lite (RGMII_RXID, at803x PHY) and D-Link DSR-500N (RGMII, broadcom PHY). Fixes: 6d4cd041f0af ("net: phy: at803x: disable delay only for RGMII mode") Signed-off-by: Aaro Koskinen Signed-off-by: Greg Kroah-Hartman --- drivers/staging/octeon/ethernet-mdio.c | 2 +- drivers/staging/octeon/ethernet.c | 40 +++++++++++++++++++++--- drivers/staging/octeon/octeon-ethernet.h | 4 ++- 3 files changed, 39 insertions(+), 7 deletions(-) diff --git a/drivers/staging/octeon/ethernet-mdio.c b/drivers/staging/octeon/ethernet-mdio.c index d6248eecf123..2aee64fdaec5 100644 --- a/drivers/staging/octeon/ethernet-mdio.c +++ b/drivers/staging/octeon/ethernet-mdio.c @@ -163,7 +163,7 @@ int cvm_oct_phy_setup_device(struct net_device *dev) goto no_phy; phydev = of_phy_connect(dev, phy_node, cvm_oct_adjust_link, 0, - PHY_INTERFACE_MODE_GMII); + priv->phy_mode); of_node_put(phy_node); if (!phydev) diff --git a/drivers/staging/octeon/ethernet.c b/drivers/staging/octeon/ethernet.c index ce61c5670ef6..986db76705cc 100644 --- a/drivers/staging/octeon/ethernet.c +++ b/drivers/staging/octeon/ethernet.c @@ -653,14 +653,37 @@ static struct device_node *cvm_oct_node_for_port(struct device_node *pip, return np; } -static void cvm_set_rgmii_delay(struct device_node *np, int iface, int port) +static void cvm_set_rgmii_delay(struct octeon_ethernet *priv, int iface, + int port) { + struct device_node *np = priv->of_node; u32 delay_value; + bool rx_delay; + bool tx_delay; - if (!of_property_read_u32(np, "rx-delay", &delay_value)) + /* By default, both RX/TX delay is enabled in + * __cvmx_helper_rgmii_enable(). + */ + rx_delay = true; + tx_delay = true; + + if (!of_property_read_u32(np, "rx-delay", &delay_value)) { cvmx_write_csr(CVMX_ASXX_RX_CLK_SETX(port, iface), delay_value); - if (!of_property_read_u32(np, "tx-delay", &delay_value)) + rx_delay = delay_value > 0; + } + if (!of_property_read_u32(np, "tx-delay", &delay_value)) { cvmx_write_csr(CVMX_ASXX_TX_CLK_SETX(port, iface), delay_value); + tx_delay = delay_value > 0; + } + + if (!rx_delay && !tx_delay) + priv->phy_mode = PHY_INTERFACE_MODE_RGMII_ID; + else if (!rx_delay) + priv->phy_mode = PHY_INTERFACE_MODE_RGMII_RXID; + else if (!tx_delay) + priv->phy_mode = PHY_INTERFACE_MODE_RGMII_TXID; + else + priv->phy_mode = PHY_INTERFACE_MODE_RGMII; } static int cvm_oct_probe(struct platform_device *pdev) @@ -825,6 +848,7 @@ static int cvm_oct_probe(struct platform_device *pdev) priv->port = port; priv->queue = cvmx_pko_get_base_queue(priv->port); priv->fau = fau - cvmx_pko_get_num_queues(port) * 4; + priv->phy_mode = PHY_INTERFACE_MODE_NA; for (qos = 0; qos < 16; qos++) skb_queue_head_init(&priv->tx_free_list[qos]); for (qos = 0; qos < cvmx_pko_get_num_queues(port); @@ -856,6 +880,7 @@ static int cvm_oct_probe(struct platform_device *pdev) break; case CVMX_HELPER_INTERFACE_MODE_SGMII: + priv->phy_mode = PHY_INTERFACE_MODE_SGMII; dev->netdev_ops = &cvm_oct_sgmii_netdev_ops; strcpy(dev->name, "eth%d"); break; @@ -865,11 +890,16 @@ static int cvm_oct_probe(struct platform_device *pdev) strcpy(dev->name, "spi%d"); break; - case CVMX_HELPER_INTERFACE_MODE_RGMII: case CVMX_HELPER_INTERFACE_MODE_GMII: + priv->phy_mode = PHY_INTERFACE_MODE_GMII; dev->netdev_ops = &cvm_oct_rgmii_netdev_ops; strcpy(dev->name, "eth%d"); - cvm_set_rgmii_delay(priv->of_node, interface, + break; + + case CVMX_HELPER_INTERFACE_MODE_RGMII: + dev->netdev_ops = &cvm_oct_rgmii_netdev_ops; + strcpy(dev->name, "eth%d"); + cvm_set_rgmii_delay(priv, interface, port_index); break; } diff --git a/drivers/staging/octeon/octeon-ethernet.h b/drivers/staging/octeon/octeon-ethernet.h index 4a07e7f43d12..be570d33685a 100644 --- a/drivers/staging/octeon/octeon-ethernet.h +++ b/drivers/staging/octeon/octeon-ethernet.h @@ -12,7 +12,7 @@ #define OCTEON_ETHERNET_H #include - +#include #include /** @@ -33,6 +33,8 @@ struct octeon_ethernet { * cvmx_helper_interface_mode_t */ int imode; + /* PHY mode */ + phy_interface_t phy_mode; /* List of outstanding tx buffers per queue */ struct sk_buff_head tx_free_list[16]; unsigned int last_speed; From 187df76325af5d9e12ae9daec1510307797e54f0 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Fri, 22 Mar 2019 22:14:19 +0100 Subject: [PATCH 305/468] libceph: fix breakage caused by multipage bvecs A bvec can now consist of multiple physically contiguous pages. This means that bvec_iter_advance() can move to a different page while staying in the same bvec (i.e. ->bi_bvec_done != 0). The messenger works in terms of segments which can now be defined as the smaller of a bvec and a page. The "more bytes to process in this segment" condition holds only if bvec_iter_advance() leaves us in the same bvec _and_ in the same page. On next bvec (possibly in the same page) and on next page (possibly in the same bvec) we may need to set ->last_piece. Signed-off-by: Ilya Dryomov --- net/ceph/messenger.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 7e71b0df1fbc..3083988ce729 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -840,6 +840,7 @@ static bool ceph_msg_data_bio_advance(struct ceph_msg_data_cursor *cursor, size_t bytes) { struct ceph_bio_iter *it = &cursor->bio_iter; + struct page *page = bio_iter_page(it->bio, it->iter); BUG_ON(bytes > cursor->resid); BUG_ON(bytes > bio_iter_len(it->bio, it->iter)); @@ -851,7 +852,8 @@ static bool ceph_msg_data_bio_advance(struct ceph_msg_data_cursor *cursor, return false; /* no more data */ } - if (!bytes || (it->iter.bi_size && it->iter.bi_bvec_done)) + if (!bytes || (it->iter.bi_size && it->iter.bi_bvec_done && + page == bio_iter_page(it->bio, it->iter))) return false; /* more bytes to process in this segment */ if (!it->iter.bi_size) { @@ -899,6 +901,7 @@ static bool ceph_msg_data_bvecs_advance(struct ceph_msg_data_cursor *cursor, size_t bytes) { struct bio_vec *bvecs = cursor->data->bvec_pos.bvecs; + struct page *page = bvec_iter_page(bvecs, cursor->bvec_iter); BUG_ON(bytes > cursor->resid); BUG_ON(bytes > bvec_iter_len(bvecs, cursor->bvec_iter)); @@ -910,7 +913,8 @@ static bool ceph_msg_data_bvecs_advance(struct ceph_msg_data_cursor *cursor, return false; /* no more data */ } - if (!bytes || cursor->bvec_iter.bi_bvec_done) + if (!bytes || (cursor->bvec_iter.bi_bvec_done && + page == bvec_iter_page(bvecs, cursor->bvec_iter))) return false; /* more bytes to process in this segment */ BUG_ON(cursor->last_piece); From a3ac7917b73070010c05b4485b8582a6c9cd69b6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 25 Mar 2019 14:49:00 -0700 Subject: [PATCH 306/468] Revert "parport: daisy: use new parport device model" This reverts commit 1aec4211204d9463d1fd209eb50453de16254599. Steven Rostedt reports that it causes a hang at bootup and bisected it to this commit. The troigger is apparently a module alias for "parport_lowlevel" that points to "parport_pc", which causes a hang with modprobe -q -- parport_lowlevel blocking forever with a backtrace like this: wait_for_completion_killable+0x1c/0x28 call_usermodehelper_exec+0xa7/0x108 __request_module+0x351/0x3d8 get_lowlevel_driver+0x28/0x41 [parport] __parport_register_driver+0x39/0x1f4 [parport] daisy_drv_init+0x31/0x4f [parport] parport_bus_init+0x5d/0x7b [parport] parport_default_proc_register+0x26/0x1000 [parport] do_one_initcall+0xc2/0x1e0 do_init_module+0x50/0x1d4 load_module+0x1c2e/0x21b3 sys_init_module+0xef/0x117 Supid says: "Due to the new device model daisy driver will now try to find the parallel ports while trying to register its driver so that it can bind with them. Now, since daisy driver is loaded while parport bus is initialising the list of parport is still empty and it tries to load the lowlevel driver, which has an alias set to parport_pc, now causes a deadlock" But I don't think the daisy driver should be loaded by the parport initialization in the first place, so let's revert the whole change. If the daisy driver can just initialize separately on its own (like a driver should), instead of hooking into the parport init sequence directly, this issue probably would go away. Reported-and-bisected-by: Steven Rostedt (VMware) Reported-by: Michal Kubecek Acked-by: Greg Kroah-Hartman Cc: Sudip Mukherjee Signed-off-by: Linus Torvalds --- drivers/parport/daisy.c | 32 +------------------------------- drivers/parport/probe.c | 2 +- drivers/parport/share.c | 10 +--------- include/linux/parport.h | 13 ------------- 4 files changed, 3 insertions(+), 54 deletions(-) diff --git a/drivers/parport/daisy.c b/drivers/parport/daisy.c index 56dd83a45e55..5484a46dafda 100644 --- a/drivers/parport/daisy.c +++ b/drivers/parport/daisy.c @@ -213,12 +213,10 @@ void parport_daisy_fini(struct parport *port) struct pardevice *parport_open(int devnum, const char *name) { struct daisydev *p = topology; - struct pardev_cb par_cb; struct parport *port; struct pardevice *dev; int daisy; - memset(&par_cb, 0, sizeof(par_cb)); spin_lock(&topology_lock); while (p && p->devnum != devnum) p = p->next; @@ -232,7 +230,7 @@ struct pardevice *parport_open(int devnum, const char *name) port = parport_get_port(p->port); spin_unlock(&topology_lock); - dev = parport_register_dev_model(port, name, &par_cb, devnum); + dev = parport_register_device(port, name, NULL, NULL, NULL, 0, NULL); parport_put_port(port); if (!dev) return NULL; @@ -482,31 +480,3 @@ static int assign_addrs(struct parport *port) kfree(deviceid); return detected; } - -static int daisy_drv_probe(struct pardevice *par_dev) -{ - struct device_driver *drv = par_dev->dev.driver; - - if (strcmp(drv->name, "daisy_drv")) - return -ENODEV; - if (strcmp(par_dev->name, daisy_dev_name)) - return -ENODEV; - - return 0; -} - -static struct parport_driver daisy_driver = { - .name = "daisy_drv", - .probe = daisy_drv_probe, - .devmodel = true, -}; - -int daisy_drv_init(void) -{ - return parport_register_driver(&daisy_driver); -} - -void daisy_drv_exit(void) -{ - parport_unregister_driver(&daisy_driver); -} diff --git a/drivers/parport/probe.c b/drivers/parport/probe.c index e5e6a463a941..e035174ba205 100644 --- a/drivers/parport/probe.c +++ b/drivers/parport/probe.c @@ -257,7 +257,7 @@ static ssize_t parport_read_device_id (struct parport *port, char *buffer, ssize_t parport_device_id (int devnum, char *buffer, size_t count) { ssize_t retval = -ENXIO; - struct pardevice *dev = parport_open(devnum, daisy_dev_name); + struct pardevice *dev = parport_open (devnum, "Device ID probe"); if (!dev) return -ENXIO; diff --git a/drivers/parport/share.c b/drivers/parport/share.c index 0171b8dbcdcd..5dc53d420ca8 100644 --- a/drivers/parport/share.c +++ b/drivers/parport/share.c @@ -137,19 +137,11 @@ static struct bus_type parport_bus_type = { int parport_bus_init(void) { - int retval; - - retval = bus_register(&parport_bus_type); - if (retval) - return retval; - daisy_drv_init(); - - return 0; + return bus_register(&parport_bus_type); } void parport_bus_exit(void) { - daisy_drv_exit(); bus_unregister(&parport_bus_type); } diff --git a/include/linux/parport.h b/include/linux/parport.h index f41f1d041e2c..397607a0c0eb 100644 --- a/include/linux/parport.h +++ b/include/linux/parport.h @@ -460,7 +460,6 @@ extern size_t parport_ieee1284_epp_read_addr (struct parport *, void *, size_t, int); /* IEEE1284.3 functions */ -#define daisy_dev_name "Device ID probe" extern int parport_daisy_init (struct parport *port); extern void parport_daisy_fini (struct parport *port); extern struct pardevice *parport_open (int devnum, const char *name); @@ -469,18 +468,6 @@ extern ssize_t parport_device_id (int devnum, char *buffer, size_t len); extern void parport_daisy_deselect_all (struct parport *port); extern int parport_daisy_select (struct parport *port, int daisy, int mode); -#ifdef CONFIG_PARPORT_1284 -extern int daisy_drv_init(void); -extern void daisy_drv_exit(void); -#else -static inline int daisy_drv_init(void) -{ - return 0; -} - -static inline void daisy_drv_exit(void) {} -#endif - /* Lowlevel drivers _can_ call this support function to handle irqs. */ static inline void parport_generic_irq(struct parport *port) { From edef1ef134180149694b86386277076f566d165c Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Mon, 25 Mar 2019 09:04:39 -0700 Subject: [PATCH 307/468] ACPI / CPPC: Fix guaranteed performance handling As per the ACPI specification, "Guaranteed Performance Register" is a "Buffer" field and it cannot be "Integer", so treat the "Integer" type for "Guaranteed Performance Register" field as invalid and ignore its value in that case. Also save one cpc_read() call when "Guaranteed Performance Register" is not present, which means a register defined as: "Register(SystemMemory, 0, 0, 0, 0)". Fixes: 29523f095397 ("ACPI / CPPC: Add support for guaranteed performance") Suggested-by: Rafael J. Wysocki Signed-off-by: Srinivas Pandruvada Cc: 4.20+ # 4.20+ Signed-off-by: Rafael J. Wysocki --- drivers/acpi/cppc_acpi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 1b207fca1420..d4244e7d0e38 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -1150,8 +1150,13 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps) cpc_read(cpunum, nominal_reg, &nom); perf_caps->nominal_perf = nom; - cpc_read(cpunum, guaranteed_reg, &guaranteed); - perf_caps->guaranteed_perf = guaranteed; + if (guaranteed_reg->type != ACPI_TYPE_BUFFER || + IS_NULL_REG(&guaranteed_reg->cpc_entry.reg)) { + perf_caps->guaranteed_perf = 0; + } else { + cpc_read(cpunum, guaranteed_reg, &guaranteed); + perf_caps->guaranteed_perf = guaranteed; + } cpc_read(cpunum, lowest_non_linear_reg, &min_nonlinear); perf_caps->lowest_nonlinear_perf = min_nonlinear; From 92a3e426ec06e72b1c363179c79d30712447ff76 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Mon, 25 Mar 2019 09:04:40 -0700 Subject: [PATCH 308/468] cpufreq: intel_pstate: Also use CPPC nominal_perf for base_frequency The ACPI specification states that if the "Guaranteed Performance Register" is not implemented, the OSPM assumes guaranteed performance to always be equal to nominal performance. So for invalid or unimplemented guaranteed performance register, use nominal performance as guaranteed performance. This change will fall back to nominal_perf when guranteed_perf is invalid. If nominal_perf is also invalid or not present, fall back to the existing implementation, which is to read from HWP Capabilities MSR. Fixes: 86d333a8cc7f ("cpufreq: intel_pstate: Add base_frequency attribute") Suggested-by: Rafael J. Wysocki Signed-off-by: Srinivas Pandruvada Cc: 4.20+ # 4.20+ Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index e22f0dbaebb1..b599c7318aab 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -385,7 +385,10 @@ static int intel_pstate_get_cppc_guranteed(int cpu) if (ret) return ret; - return cppc_perf.guaranteed_perf; + if (cppc_perf.guaranteed_perf) + return cppc_perf.guaranteed_perf; + + return cppc_perf.nominal_perf; } #else /* CONFIG_ACPI_CPPC_LIB */ From 3e82a7f9031f204ac0f8dea494ac3870ad111261 Mon Sep 17 00:00:00 2001 From: Alexandru Gagniuc Date: Fri, 22 Mar 2019 19:36:51 -0500 Subject: [PATCH 309/468] PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked A threaded IRQ with a NULL handler does not work with level-triggered interrupts. request_threaded_irq() will return an error: genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 16 pcie_bw_notification: probe of 0000:00:1b.0:pcie010 failed with error -22 For level interrupts we need to silence the interrupt before exiting the IRQ handler, so just clear the PCI_EXP_LNKSTA_LBMS bit there. Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") Reported-by: Linus Torvalds Reported-by: Borislav Petkov Signed-off-by: Alexandru Gagniuc Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/bw_notification.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pcie/bw_notification.c b/drivers/pci/pcie/bw_notification.c index d2eae3b7cc0f..c48746f1cf3c 100644 --- a/drivers/pci/pcie/bw_notification.c +++ b/drivers/pci/pcie/bw_notification.c @@ -44,11 +44,10 @@ static void pcie_disable_link_bandwidth_notification(struct pci_dev *dev) pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctl); } -static irqreturn_t pcie_bw_notification_handler(int irq, void *context) +static irqreturn_t pcie_bw_notification_irq(int irq, void *context) { struct pcie_device *srv = context; struct pci_dev *port = srv->port; - struct pci_dev *dev; u16 link_status, events; int ret; @@ -58,6 +57,17 @@ static irqreturn_t pcie_bw_notification_handler(int irq, void *context) if (ret != PCIBIOS_SUCCESSFUL || !events) return IRQ_NONE; + pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); + pcie_update_link_speed(port->subordinate, link_status); + return IRQ_WAKE_THREAD; +} + +static irqreturn_t pcie_bw_notification_handler(int irq, void *context) +{ + struct pcie_device *srv = context; + struct pci_dev *port = srv->port; + struct pci_dev *dev; + /* * Print status from downstream devices, not this root port or * downstream switch port. @@ -67,8 +77,6 @@ static irqreturn_t pcie_bw_notification_handler(int irq, void *context) __pcie_print_link_status(dev, false); up_read(&pci_bus_sem); - pcie_update_link_speed(port->subordinate, link_status); - pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); return IRQ_HANDLED; } @@ -80,7 +88,8 @@ static int pcie_bandwidth_notification_probe(struct pcie_device *srv) if (!pcie_link_bandwidth_notification_supported(srv->port)) return -ENODEV; - ret = request_threaded_irq(srv->irq, NULL, pcie_bw_notification_handler, + ret = request_threaded_irq(srv->irq, pcie_bw_notification_irq, + pcie_bw_notification_handler, IRQF_SHARED, "PCIe BW notif", srv); if (ret) return ret; From 55397ce8df48bdabe56abdc684764529e1334766 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 12:05:30 +0100 Subject: [PATCH 310/468] PCI/LINK: Clear bandwidth notification interrupt before enabling it When booting a MacBookPro9,1, duplicate link downtraining messages are logged for the devices directly attached to the two CPU-internal Root Ports of the Core i7 3615QM: Once on device enumeration and once on enablement of the bandwidth notification interrupt on the Root Ports. Duplicate messages do not occur with Root Ports on the PCH and Downstream Ports on the Thunderbolt controller: Only a single message is logged for these, namely on device enumeration. The reason for the duplicate messages is a stale interrupt in the Link Status register of the 3615QM's internal Root Ports. Avoid by clearing the interrupt before enabling it. An alternative approach would be to clear the interrupt already on device enumeration or to report link downtraining only if the speed has changed. That way, link downtraining occurring between device enumeration and enablement of the bandwidth notification interrupt could be caught. However clearing stale interrupts before enabling them is a standard operating procedure for any driver and keeping the two steps in one place makes the code easier to follow. Signed-off-by: Lukas Wunner Signed-off-by: Bjorn Helgaas Reviewed-by: Alexandru Gagniuc --- drivers/pci/pcie/bw_notification.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/pcie/bw_notification.c b/drivers/pci/pcie/bw_notification.c index c48746f1cf3c..3c0f368c879b 100644 --- a/drivers/pci/pcie/bw_notification.c +++ b/drivers/pci/pcie/bw_notification.c @@ -30,6 +30,8 @@ static void pcie_enable_link_bandwidth_notification(struct pci_dev *dev) { u16 lnk_ctl; + pcie_capability_write_word(dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS); + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctl); lnk_ctl |= PCI_EXP_LNKCTL_LBMIE; pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctl); From 0fa635aec9abd718bd18c0bda2261351a0811efc Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Mar 2019 12:05:30 +0100 Subject: [PATCH 311/468] PCI/LINK: Deduplicate bandwidth reports for multi-function devices If a multi-function device's bandwidth is already limited when it is enumerated, a message is logged only for function 0. By contrast, when downtraining occurs after enumeration, a message is logged for all functions. That's because the former uses pcie_report_downtraining(), whereas the latter uses __pcie_print_link_status() (which doesn't filter functions != 0). I am seeing this happen on a MacBookPro9,1 with a GPU (function 0) and an integrated HDA controller (function 1). Avoid this incongruence by calling pcie_report_downtraining() in both cases. Signed-off-by: Lukas Wunner Signed-off-by: Bjorn Helgaas Reviewed-by: Alexandru Gagniuc --- drivers/pci/pci.h | 1 + drivers/pci/pcie/bw_notification.c | 2 +- drivers/pci/probe.c | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 224d88634115..d994839a3e24 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -273,6 +273,7 @@ enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev); u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed, enum pcie_link_width *width); void __pcie_print_link_status(struct pci_dev *dev, bool verbose); +void pcie_report_downtraining(struct pci_dev *dev); /* Single Root I/O Virtualization */ struct pci_sriov { diff --git a/drivers/pci/pcie/bw_notification.c b/drivers/pci/pcie/bw_notification.c index 3c0f368c879b..4fa9e3523ee1 100644 --- a/drivers/pci/pcie/bw_notification.c +++ b/drivers/pci/pcie/bw_notification.c @@ -76,7 +76,7 @@ static irqreturn_t pcie_bw_notification_handler(int irq, void *context) */ down_read(&pci_bus_sem); list_for_each_entry(dev, &port->subordinate->devices, bus_list) - __pcie_print_link_status(dev, false); + pcie_report_downtraining(dev); up_read(&pci_bus_sem); return IRQ_HANDLED; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 2ec0df04e0dc..7e12d0163863 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2388,7 +2388,7 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) return dev; } -static void pcie_report_downtraining(struct pci_dev *dev) +void pcie_report_downtraining(struct pci_dev *dev) { if (!pci_is_pcie(dev)) return; From d29f5aa0bc0c321e1b9e4658a2a7e08e885da52a Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 22 Mar 2019 20:00:20 +0100 Subject: [PATCH 312/468] net: phy: don't clear BMCR in genphy_soft_reset So far we effectively clear the BMCR register. Some PHY's can deal with this (e.g. because they reset BMCR to a default as part of a soft-reset) whilst on others this causes issues because e.g. the autoneg bit is cleared. Marvell is an example, see also thread [0]. So let's be a little bit more gentle and leave all bits we're not interested in as-is. This change is needed for PHY drivers to properly deal with the original patch. [0] https://marc.info/?t=155264050700001&r=1&w=2 Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Tested-by: Phil Reid Tested-by: liweihang Signed-off-by: Heiner Kallweit Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/phy_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 49fdd1ee798e..77068c545de0 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1831,7 +1831,7 @@ int genphy_soft_reset(struct phy_device *phydev) { int ret; - ret = phy_write(phydev, MII_BMCR, BMCR_RESET); + ret = phy_set_bits(phydev, MII_BMCR, BMCR_RESET); if (ret < 0) return ret; From c2fe742ff6e77c5b4fe4ad273191ddf28fdea25e Mon Sep 17 00:00:00 2001 From: Sreekanth Reddy Date: Mon, 4 Mar 2019 07:26:35 -0500 Subject: [PATCH 313/468] scsi: mpt3sas: Fix kernel panic during expander reset During expander reset handling, the driver invokes kernel function scsi_host_find_tag() to obtain outstanding requests associated with the scsi host managed by the driver. Driver loops from tag value zero to hba queue depth to obtain the outstanding scmds. But when blk-mq is enabled, the block layer may return stale entry for one or more requests. This may lead to kernel panic if the returned value is inaccessible or the memory pointed by the returned value is reused. Reference of upstream discussion: https://patchwork.kernel.org/patch/10734933/ Instead of calling scsi_host_find_tag() API for each and every smid (smid is tag +1) from one to shost->can_queue, now driver will call this API (to obtain the outstanding scmd) only for those smid's which are outstanding at the driver level. Driver will determine whether this smid is outstanding at driver level by looking into it's corresponding MPI request frame, if its MPI request frame is empty, then it means that this smid is free and does not need to call scsi_host_find_tag() for it. By doing this, driver will invoke scsi_host_find_tag() for only those tags which are outstanding at the driver level. Driver will check whether particular MPI request frame is empty or not by looking into the "DevHandle" field. If this field is zero then it means that this MPI request is empty. For active MPI request DevHandle must be non-zero. Also driver will memset the MPI request frame once the corresponding scmd is processed (i.e. just before calling scmd->done function). Signed-off-by: Sreekanth Reddy Signed-off-by: Martin K. Petersen --- drivers/scsi/mpt3sas/mpt3sas_base.c | 6 ++++++ drivers/scsi/mpt3sas/mpt3sas_scsih.c | 12 ++++++++++++ 2 files changed, 18 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index e57774472e75..1d8c584ec1e9 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -3281,12 +3281,18 @@ mpt3sas_base_free_smid(struct MPT3SAS_ADAPTER *ioc, u16 smid) if (smid < ioc->hi_priority_smid) { struct scsiio_tracker *st; + void *request; st = _get_st_from_smid(ioc, smid); if (!st) { _base_recovery_check(ioc); return; } + + /* Clear MPI request frame */ + request = mpt3sas_base_get_msg_frame(ioc, smid); + memset(request, 0, ioc->request_sz); + mpt3sas_base_clear_st(ioc, st); _base_recovery_check(ioc); return; diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 8bb5b8f9f4d2..1ccfbc7eebe0 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -1462,11 +1462,23 @@ mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc, u16 smid) { struct scsi_cmnd *scmd = NULL; struct scsiio_tracker *st; + Mpi25SCSIIORequest_t *mpi_request; if (smid > 0 && smid <= ioc->scsiio_depth - INTERNAL_SCSIIO_CMDS_COUNT) { u32 unique_tag = smid - 1; + mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); + + /* + * If SCSI IO request is outstanding at driver level then + * DevHandle filed must be non-zero. If DevHandle is zero + * then it means that this smid is free at driver level, + * so return NULL. + */ + if (!mpi_request->DevHandle) + return scmd; + scmd = scsi_host_find_tag(ioc->shost, unique_tag); if (scmd) { st = scsi_cmd_priv(scmd); From b6554cfe09e1f610aed7d57164ab7760be57acd9 Mon Sep 17 00:00:00 2001 From: Dave Carroll Date: Fri, 22 Mar 2019 12:16:03 -0600 Subject: [PATCH 314/468] scsi: aacraid: Insure we don't access PCIe space during AER/EEH There are a few windows during AER/EEH when we can access PCIe I/O mapped registers. This will harden the access to insure we do not allow PCIe access during errors Signed-off-by: Dave Carroll Reviewed-by: Sagar Biradar Signed-off-by: Martin K. Petersen --- drivers/scsi/aacraid/aacraid.h | 7 ++++++- drivers/scsi/aacraid/commsup.c | 4 ++-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h index 1df5171594b8..11fb68d7e60d 100644 --- a/drivers/scsi/aacraid/aacraid.h +++ b/drivers/scsi/aacraid/aacraid.h @@ -2640,9 +2640,14 @@ static inline unsigned int cap_to_cyls(sector_t capacity, unsigned divisor) return capacity; } +static inline int aac_pci_offline(struct aac_dev *dev) +{ + return pci_channel_offline(dev->pdev) || dev->handle_pci_error; +} + static inline int aac_adapter_check_health(struct aac_dev *dev) { - if (unlikely(pci_channel_offline(dev->pdev))) + if (unlikely(aac_pci_offline(dev))) return -1; return (dev)->a_ops.adapter_check_health(dev); diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c index e67e032936ef..78430a7b294c 100644 --- a/drivers/scsi/aacraid/commsup.c +++ b/drivers/scsi/aacraid/commsup.c @@ -672,7 +672,7 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size, return -ETIMEDOUT; } - if (unlikely(pci_channel_offline(dev->pdev))) + if (unlikely(aac_pci_offline(dev))) return -EFAULT; if ((blink = aac_adapter_check_health(dev)) > 0) { @@ -772,7 +772,7 @@ int aac_hba_send(u8 command, struct fib *fibptr, fib_callback callback, spin_unlock_irqrestore(&fibptr->event_lock, flags); - if (unlikely(pci_channel_offline(dev->pdev))) + if (unlikely(aac_pci_offline(dev))) return -EFAULT; fibptr->flags |= FIB_CONTEXT_FLAG_WAIT; From fba1bdd2a9a93f3e2181ec1936a3c2f6b37e7ed6 Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Thu, 14 Mar 2019 01:30:59 -0500 Subject: [PATCH 315/468] scsi: qla4xxx: fix a potential NULL pointer dereference In case iscsi_lookup_endpoint fails, the fix returns -EINVAL to avoid NULL pointer dereference. Signed-off-by: Kangjie Lu Acked-by: Manish Rangankar Reviewed-by: Mukesh Ojha Signed-off-by: Martin K. Petersen --- drivers/scsi/qla4xxx/ql4_os.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c index 16a18d5d856f..6e4f4931ae17 100644 --- a/drivers/scsi/qla4xxx/ql4_os.c +++ b/drivers/scsi/qla4xxx/ql4_os.c @@ -3203,6 +3203,8 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session, if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) return -EINVAL; ep = iscsi_lookup_endpoint(transport_fd); + if (!ep) + return -EINVAL; conn = cls_conn->dd_data; qla_conn = conn->dd_data; qla_conn->qla_ep = ep->dd_data; From d498bc0ce8acb4a1bb80d6089d1932d919dc2532 Mon Sep 17 00:00:00 2001 From: Vinod Koul Date: Tue, 26 Mar 2019 10:55:47 +0530 Subject: [PATCH 316/468] MAINTAINERS: Fix uniphier-mdmac.c file path Commit 32e74aabebc8 ("dmaengine: uniphier-mdmac: add UniPhier MIO DMAC driver") wrongly put filepath for uniphier-mdmac.c, fix it Fixes: 32e74aabebc8 ("dmaengine: uniphier-mdmac: add UniPhier MIO DMAC driver") Reported-by: Joe Perches Signed-off-by: Vinod Koul --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e17ebf70b548..67e895feddc8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2356,7 +2356,7 @@ F: arch/arm/mm/cache-uniphier.c F: arch/arm64/boot/dts/socionext/uniphier* F: drivers/bus/uniphier-system-bus.c F: drivers/clk/uniphier/ -F: drivers/dmaengine/uniphier-mdmac.c +F: drivers/dma/uniphier-mdmac.c F: drivers/gpio/gpio-uniphier.c F: drivers/i2c/busses/i2c-uniphier* F: drivers/irqchip/irq-uniphier-aidet.c From 1396929e8a903db80425343cacca766a18ad6409 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Fri, 22 Mar 2019 16:51:07 +0800 Subject: [PATCH 317/468] phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs While only the first PHY supports mode switching, the remaining PHYs work in USB host mode. They should support set_mode with mode=USB_HOST instead of failing. This is especially needed now that the USB core does set_mode for all USB ports, which was added in commit b97a31348379 ("usb: core: comply to PHY framework"). Make set_mode with mode=USB_HOST a no-op instead of failing for the non-OTG USB PHYs. Fixes: 6ba43c291961 ("phy-sun4i-usb: Add support for phy_set_mode") Signed-off-by: Chen-Yu Tsai Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/phy/allwinner/phy-sun4i-usb.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/phy/allwinner/phy-sun4i-usb.c b/drivers/phy/allwinner/phy-sun4i-usb.c index 5163097b43df..4bbd9ede38c8 100644 --- a/drivers/phy/allwinner/phy-sun4i-usb.c +++ b/drivers/phy/allwinner/phy-sun4i-usb.c @@ -485,8 +485,11 @@ static int sun4i_usb_phy_set_mode(struct phy *_phy, struct sun4i_usb_phy_data *data = to_sun4i_usb_phy_data(phy); int new_mode; - if (phy->index != 0) + if (phy->index != 0) { + if (mode == PHY_MODE_USB_HOST) + return 0; return -EINVAL; + } switch (mode) { case PHY_MODE_USB_HOST: From e671765e521c571afec3157a7e17502d54f6a43e Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Fri, 22 Mar 2019 16:51:08 +0800 Subject: [PATCH 318/468] usb: core: Try generic PHY_MODE_USB_HOST if usb_phy_roothub_set_mode fails Some PHYs do not support PHY_MODE_USB_HOST_SS, i.e. USB 3.0 or higher. Fall back and try the more generic PHY_MODE_USB_HOST if it fails. Fixes: b97a31348379 ("usb: core: comply to PHY framework") Signed-off-by: Chen-Yu Tsai Tested-by: Neil Armstrong Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hcd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 3189181bb628..975d7c1288e3 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -2741,6 +2741,9 @@ int usb_add_hcd(struct usb_hcd *hcd, retval = usb_phy_roothub_set_mode(hcd->phy_roothub, PHY_MODE_USB_HOST_SS); + if (retval) + retval = usb_phy_roothub_set_mode(hcd->phy_roothub, + PHY_MODE_USB_HOST); if (retval) goto err_usb_phy_roothub_power_on; From 41f00e6e9e55546390031996b773e7f3c1d95928 Mon Sep 17 00:00:00 2001 From: Aditya Pakki Date: Wed, 20 Mar 2019 10:27:11 -0500 Subject: [PATCH 319/468] usb: usb251xb: fix to avoid potential NULL pointer dereference of_match_device in usb251xb_probe can fail and returns a NULL pointer. The patch avoids a potential NULL pointer dereference in this scenario. Signed-off-by: Aditya Pakki Reviewed-by: Richard Leitner Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/usb251xb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/misc/usb251xb.c b/drivers/usb/misc/usb251xb.c index 2c8e2cad7e10..04684849d683 100644 --- a/drivers/usb/misc/usb251xb.c +++ b/drivers/usb/misc/usb251xb.c @@ -612,7 +612,7 @@ static int usb251xb_probe(struct usb251xb *hub) dev); int err; - if (np) { + if (np && of_id) { err = usb251xb_get_ofdata(hub, (struct usb251xb_data *)of_id->data); if (err) { From 3d54d10c6afed34fd45b852bf76f55e8da31d8ef Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 25 Mar 2019 14:54:30 +0100 Subject: [PATCH 320/468] usb: mtu3: fix EXTCON dependency When EXTCON is a loadable module, mtu3 fails to link as built-in: drivers/usb/mtu3/mtu3_plat.o: In function `mtu3_probe': mtu3_plat.c:(.text+0x690): undefined reference to `extcon_get_edev_by_phandle' Add a Kconfig dependency to force mtu3 also to be a loadable module if extconn is, but still allow it to be built without extcon. Fixes: d0ed062a8b75 ("usb: mtu3: dual-role mode support") Signed-off-by: Arnd Bergmann Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/mtu3/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/mtu3/Kconfig b/drivers/usb/mtu3/Kconfig index bcc23486c4ed..928c2cd6fc00 100644 --- a/drivers/usb/mtu3/Kconfig +++ b/drivers/usb/mtu3/Kconfig @@ -6,6 +6,7 @@ config USB_MTU3 tristate "MediaTek USB3 Dual Role controller" depends on USB || USB_GADGET depends on ARCH_MEDIATEK || COMPILE_TEST + depends on EXTCON || !EXTCON select USB_XHCI_MTK if USB_SUPPORT && USB_XHCI_HCD help Say Y or M here if your system runs on MediaTek SoCs with From 4b9a3932e7ba929baa231231e61874c7a56f8959 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Fri, 22 Mar 2019 22:49:44 +0200 Subject: [PATCH 321/468] drm/i915: Mark AML 0x87CA as ULX MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If I'm reading the spec right AML 0x87CA is a Y SKU, so it should be marked as ULX in our old style terminology. Cc: stable@vger.kernel.org Cc: José Roberto de Souza Cc: Rodrigo Vivi Cc: Tvrtko Ursulin Fixes: c0c46ca461f1 ("drm/i915/aml: Add new Amber Lake PCI ID") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20190322204944.23613-1-ville.syrjala@linux.intel.com Reviewed-by: José Roberto de Souza (cherry picked from commit 57b1c4460dc46a00f6ec439f3f11d670736b0209) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/i915_drv.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9adc7bb9e69c..a67a63b5aa84 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2346,7 +2346,8 @@ static inline unsigned int i915_sg_segment_size(void) INTEL_DEVID(dev_priv) == 0x5915 || \ INTEL_DEVID(dev_priv) == 0x591E) #define IS_AML_ULX(dev_priv) (INTEL_DEVID(dev_priv) == 0x591C || \ - INTEL_DEVID(dev_priv) == 0x87C0) + INTEL_DEVID(dev_priv) == 0x87C0 || \ + INTEL_DEVID(dev_priv) == 0x87CA) #define IS_SKL_GT2(dev_priv) (IS_SKYLAKE(dev_priv) && \ INTEL_INFO(dev_priv)->gt == 2) #define IS_SKL_GT3(dev_priv) (IS_SKYLAKE(dev_priv) && \ From ff9d31d0d46672e201fc9ff59c42f1eef5f00c77 Mon Sep 17 00:00:00 2001 From: Tom Zanussi Date: Wed, 20 Mar 2019 12:53:33 -0500 Subject: [PATCH 322/468] tracing: Remove unnecessary var_ref destroy in track_data_destroy() Commit 656fe2ba85e8 (tracing: Use hist trigger's var_ref array to destroy var_refs) centralized the destruction of all the var_refs in one place so that other code didn't have to do it. The track_data_destroy() added later ignored that and also destroyed the track_data var_ref, causing a double-free error flagged by KASAN. ================================================================== BUG: KASAN: use-after-free in destroy_hist_field+0x30/0x70 Read of size 8 at addr ffff888086df2210 by task bash/1694 CPU: 6 PID: 1694 Comm: bash Not tainted 5.1.0-rc1-test+ #15 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: dump_stack+0x71/0xa0 ? destroy_hist_field+0x30/0x70 print_address_description.cold.3+0x9/0x1fb ? destroy_hist_field+0x30/0x70 ? destroy_hist_field+0x30/0x70 kasan_report.cold.4+0x1a/0x33 ? __kasan_slab_free+0x100/0x150 ? destroy_hist_field+0x30/0x70 destroy_hist_field+0x30/0x70 track_data_destroy+0x55/0xe0 destroy_hist_data+0x1f0/0x350 hist_unreg_all+0x203/0x220 event_trigger_open+0xbb/0x130 do_dentry_open+0x296/0x700 ? stacktrace_count_trigger+0x30/0x30 ? generic_permission+0x56/0x200 ? __x64_sys_fchdir+0xd0/0xd0 ? inode_permission+0x55/0x200 ? security_inode_permission+0x18/0x60 path_openat+0x633/0x22b0 ? path_lookupat.isra.50+0x420/0x420 ? __kasan_kmalloc.constprop.12+0xc1/0xd0 ? kmem_cache_alloc+0xe5/0x260 ? getname_flags+0x6c/0x2a0 ? do_sys_open+0x149/0x2b0 ? do_syscall_64+0x73/0x1b0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ? _raw_write_lock_bh+0xe0/0xe0 ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x2f/0x50 ? __list_add_valid+0x2d/0x70 ? deactivate_slab.isra.62+0x1f4/0x5a0 ? getname_flags+0x6c/0x2a0 ? set_track+0x76/0x120 do_filp_open+0x11a/0x1a0 ? may_open_dev+0x50/0x50 ? _raw_spin_lock+0x7a/0xd0 ? _raw_write_lock_bh+0xe0/0xe0 ? __alloc_fd+0x10f/0x200 do_sys_open+0x1db/0x2b0 ? filp_open+0x50/0x50 do_syscall_64+0x73/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa7b24a4ca2 Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 85 7a 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 RSP: 002b:00007fffbafb3af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 000055d3648ade30 RCX: 00007fa7b24a4ca2 RDX: 0000000000000241 RSI: 000055d364a55240 RDI: 00000000ffffff9c RBP: 00007fffbafb3bf0 R08: 0000000000000020 R09: 0000000000000002 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000003 R14: 0000000000000001 R15: 000055d364a55240 ================================================================== So remove the track_data_destroy() destroy_hist_field() call for that var_ref. Link: http://lkml.kernel.org/r/1deffec420f6a16d11dd8647318d34a66d1989a9.camel@linux.intel.com Fixes: 466f4528fbc69 ("tracing: Generalize hist trigger onmax and save action") Reported-by: Steven Rostedt (VMware) Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_hist.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index ca46339f3009..795aa2038377 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -3713,7 +3713,6 @@ static void track_data_destroy(struct hist_trigger_data *hist_data, struct trace_event_file *file = hist_data->event_file; destroy_hist_field(data->track_data.track_var, 0); - destroy_hist_field(data->track_data.var_ref, 0); if (data->action == ACTION_SNAPSHOT) { struct track_data *track_data; From 3dee10da2e9ff220e054a8f158cc296c797fbe81 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 21 Mar 2019 23:58:20 -0700 Subject: [PATCH 323/468] tracing: initialize variable in create_dyn_event() Fix compile warning in create_dyn_event(): 'ret' may be used uninitialized in this function [-Wuninitialized]. Link: http://lkml.kernel.org/r/1553237900-8555-1-git-send-email-frowand.list@gmail.com Cc: Masami Hiramatsu Cc: Ingo Molnar Cc: Tom Zanussi Cc: Ravi Bangoria Cc: stable@vger.kernel.org Fixes: 5448d44c3855 ("tracing: Add unified dynamic event framework") Signed-off-by: Frank Rowand Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_dynevent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c index dd1f43588d70..fa100ed3b4de 100644 --- a/kernel/trace/trace_dynevent.c +++ b/kernel/trace/trace_dynevent.c @@ -74,7 +74,7 @@ int dyn_event_release(int argc, char **argv, struct dyn_event_operations *type) static int create_dyn_event(int argc, char **argv) { struct dyn_event_operations *ops; - int ret; + int ret = -ENODEV; if (argv[0][0] == '-' || argv[0][0] == '!') return dyn_event_release(argc, argv, NULL); From 9efb85c5cfac7e1f0caae4471446d936ff2163fe Mon Sep 17 00:00:00 2001 From: Hariprasad Kelam Date: Sun, 24 Mar 2019 00:05:23 +0530 Subject: [PATCH 324/468] ftrace: Fix warning using plain integer as NULL & spelling corrections Changed 0 --> NULL to avoid sparse warning Corrected spelling mistakes reported by checkpatch.pl Sparse warning below: sudo make C=2 CF=-D__CHECK_ENDIAN__ M=kernel/trace CHECK kernel/trace/ftrace.c kernel/trace/ftrace.c:3007:24: warning: Using plain integer as NULL pointer kernel/trace/ftrace.c:4758:37: warning: Using plain integer as NULL pointer Link: http://lkml.kernel.org/r/20190323183523.GA2244@hari-Inspiron-1545 Signed-off-by: Hariprasad Kelam Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/ftrace.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index fa79323331b2..26c8ca9bd06b 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1992,7 +1992,7 @@ static void print_bug_type(void) * modifying the code. @failed should be one of either: * EFAULT - if the problem happens on reading the @ip address * EINVAL - if what is read at @ip is not what was expected - * EPERM - if the problem happens on writting to the @ip address + * EPERM - if the problem happens on writing to the @ip address */ void ftrace_bug(int failed, struct dyn_ftrace *rec) { @@ -2391,7 +2391,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable) return ftrace_modify_call(rec, ftrace_old_addr, ftrace_addr); } - return -1; /* unknow ftrace bug */ + return -1; /* unknown ftrace bug */ } void __weak ftrace_replace_code(int mod_flags) @@ -3004,7 +3004,7 @@ ftrace_allocate_pages(unsigned long num_to_init) int cnt; if (!num_to_init) - return 0; + return NULL; start_pg = pg = kzalloc(sizeof(*pg), GFP_KERNEL); if (!pg) @@ -4755,7 +4755,7 @@ static int ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove, int reset, int enable) { - return ftrace_set_hash(ops, 0, 0, ip, remove, reset, enable); + return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable); } /** @@ -5463,7 +5463,7 @@ void ftrace_create_filter_files(struct ftrace_ops *ops, /* * The name "destroy_filter_files" is really a misnomer. Although - * in the future, it may actualy delete the files, but this is + * in the future, it may actually delete the files, but this is * really intended to make sure the ops passed in are disabled * and that when this function returns, the caller is free to * free the ops. @@ -5786,7 +5786,7 @@ void ftrace_module_enable(struct module *mod) /* * If the tracing is enabled, go ahead and enable the record. * - * The reason not to enable the record immediatelly is the + * The reason not to enable the record immediately is the * inherent check of ftrace_make_nop/ftrace_make_call for * correct previous instructions. Making first the NOP * conversion puts the module to the correct state, thus From db779ef67ffeadbb44e9e818eb64dbe528e2f48f Mon Sep 17 00:00:00 2001 From: Bhupesh Sharma Date: Tue, 26 Mar 2019 12:20:28 +0530 Subject: [PATCH 325/468] proc/kcore: Remove unused kclist_add_remap() Commit bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline") removed the sole usage of kclist_add_remap() but it did not remove the left-over definition from the include file. Fix the same. Signed-off-by: Bhupesh Sharma Signed-off-by: Borislav Petkov Cc: Adrian Hunter Cc: Andrew Morton Cc: Dave Anderson Cc: Dave Young Cc: "David S. Miller" Cc: Ingo Molnar Cc: James Morse Cc: Kairui Song Cc: kexec@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman Cc: Omar Sandoval Cc: "Peter Zijlstra (Intel)" Cc: Rahul Lakkireddy Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/1553583028-17804-1-git-send-email-bhsharma@redhat.com --- include/linux/kcore.h | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/include/linux/kcore.h b/include/linux/kcore.h index 8c3f8c14eeaa..94b561df3877 100644 --- a/include/linux/kcore.h +++ b/include/linux/kcore.h @@ -38,22 +38,11 @@ struct vmcoredd_node { #ifdef CONFIG_PROC_KCORE void __init kclist_add(struct kcore_list *, void *, size_t, int type); -static inline -void kclist_add_remap(struct kcore_list *m, void *addr, void *vaddr, size_t sz) -{ - m->vaddr = (unsigned long)vaddr; - kclist_add(m, addr, sz, KCORE_REMAP); -} #else static inline void kclist_add(struct kcore_list *new, void *addr, size_t size, int type) { } - -static inline -void kclist_add_remap(struct kcore_list *m, void *addr, void *vaddr, size_t sz) -{ -} #endif #endif /* _LINUX_KCORE_H */ From 2032a8a27b5cc0f578d37fa16fa2494b80a0d00a Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Mon, 25 Mar 2019 17:01:45 -0700 Subject: [PATCH 326/468] xfs: serialize unaligned dio writes against all other dio writes XFS applies more strict serialization constraints to unaligned direct writes to accommodate things like direct I/O layer zeroing, unwritten extent conversion, etc. Unaligned submissions acquire the exclusive iolock and wait for in-flight dio to complete to ensure multiple submissions do not race on the same block and cause data corruption. This generally works in the case of an aligned dio followed by an unaligned dio, but the serialization is lost if I/Os occur in the opposite order. If an unaligned write is submitted first and immediately followed by an overlapping, aligned write, the latter submits without the typical unaligned serialization barriers because there is no indication of an unaligned dio still in-flight. This can lead to unpredictable results. To provide proper unaligned dio serialization, require that such direct writes are always the only dio allowed in-flight at one time for a particular inode. We already acquire the exclusive iolock and drain pending dio before submitting the unaligned dio. Wait once more after the dio submission to hold the iolock across the I/O and prevent further submissions until the unaligned I/O completes. This is heavy handed, but consistent with the current pre-submission serialization for unaligned direct writes. Signed-off-by: Brian Foster Reviewed-by: Allison Henderson Reviewed-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 1f2e2845eb76..a7ceae90110e 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -529,18 +529,17 @@ xfs_file_dio_aio_write( count = iov_iter_count(from); /* - * If we are doing unaligned IO, wait for all other IO to drain, - * otherwise demote the lock if we had to take the exclusive lock - * for other reasons in xfs_file_aio_write_checks. + * If we are doing unaligned IO, we can't allow any other overlapping IO + * in-flight at the same time or we risk data corruption. Wait for all + * other IO to drain before we submit. If the IO is aligned, demote the + * iolock if we had to take the exclusive lock in + * xfs_file_aio_write_checks() for other reasons. */ if (unaligned_io) { - /* If we are going to wait for other DIO to finish, bail */ - if (iocb->ki_flags & IOCB_NOWAIT) { - if (atomic_read(&inode->i_dio_count)) - return -EAGAIN; - } else { - inode_dio_wait(inode); - } + /* unaligned dio always waits, bail */ + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + inode_dio_wait(inode); } else if (iolock == XFS_IOLOCK_EXCL) { xfs_ilock_demote(ip, XFS_IOLOCK_EXCL); iolock = XFS_IOLOCK_SHARED; @@ -548,6 +547,14 @@ xfs_file_dio_aio_write( trace_xfs_file_direct_write(ip, count, iocb->ki_pos); ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io); + + /* + * If unaligned, this is the only IO in-flight. If it has not yet + * completed, wait on it before we release the iolock to prevent + * subsequent overlapping IO. + */ + if (ret == -EIOCBQUEUED && unaligned_io) + inode_dio_wait(inode); out: xfs_iunlock(ip, iolock); From e2a829b3da01b9b32c4d0291d042b8a6e2a98ca3 Mon Sep 17 00:00:00 2001 From: Bernhard Rosenkraenzer Date: Tue, 5 Mar 2019 00:38:19 +0100 Subject: [PATCH 327/468] ALSA: hda/realtek - Fix speakers on Acer Predator Helios 500 Ryzen laptops On an Acer Predator Helios 500 (Ryzen version), the laptop's speakers don't work out of the box. The problem can be worked around with hdajackretask, remapping the "Black Headphone, Right side" pin (0x21) to the Internal speaker. This patch adds a quirk to change this mapping by default. [ corrected ALC299_FIXUP_PREDATOR_SPK definition and adapted for the latest tree by tiwai ] Signed-off-by: Bernhard Rosenkraenzer Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 01c71467600b..a3fb3d4c5730 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5689,6 +5689,7 @@ enum { ALC225_FIXUP_WYSE_DISABLE_MIC_VREF, ALC286_FIXUP_ACER_AIO_HEADSET_MIC, ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, + ALC299_FIXUP_PREDATOR_SPK, }; static const struct hda_fixup alc269_fixups[] = { @@ -6706,6 +6707,13 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC256_FIXUP_ASUS_HEADSET_MODE }, + [ALC299_FIXUP_PREDATOR_SPK] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x21, 0x90170150 }, /* use as headset mic, without its own jack detect */ + { } + } + }, }; static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -6724,6 +6732,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1025, 0x106d, "Acer Cloudbook 14", ALC283_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x1025, 0x1099, "Acer Aspire E5-523G", ALC255_FIXUP_ACER_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x110e, "Acer Aspire ES1-432", ALC255_FIXUP_ACER_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1025, 0x1246, "Acer Predator Helios 500", ALC299_FIXUP_PREDATOR_SPK), SND_PCI_QUIRK(0x1025, 0x128f, "Acer Veriton Z6860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), @@ -7124,6 +7133,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC255_FIXUP_DELL_HEADSET_MIC, .name = "alc255-dell-headset"}, {.id = ALC295_FIXUP_HP_X360, .name = "alc295-hp-x360"}, {.id = ALC295_FIXUP_CHROME_BOOK, .name = "alc-sense-combo"}, + {.id = ALC299_FIXUP_PREDATOR_SPK, .name = "predator-spk"}, {} }; #define ALC225_STANDARD_PINS \ From fb1eb41a3dd4cfff274c98f3c3324ab329641298 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:00 +0100 Subject: [PATCH 328/468] dt-bindings: net: dsa: qca8k: fix example In the example, the phy at phy@0 is clashing with the switch0@0 at the same address. Usually, the switches are accessible through pseudo PHYs which in case of the qca8k are located at 0x10 - 0x18. Reviewed-by: Florian Fainelli Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/dsa/qca8k.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt index bbcb255c3150..5eda99e6c86e 100644 --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -55,12 +55,12 @@ Example: reg = <4>; }; - switch0@0 { + switch@10 { compatible = "qca,qca8337"; #address-cells = <1>; #size-cells = <0>; - reg = <0>; + reg = <0x10>; ports { #address-cells = <1>; From 5e07321f3388e6f2b13c43ae9df3e09efa8418e0 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:01 +0100 Subject: [PATCH 329/468] dt-bindings: net: dsa: qca8k: support internal mdio-bus This patch updates the qca8k's binding to document to the approach for using the internal mdio-bus of the supported qca8k switches. Reviewed-by: Florian Fainelli Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- .../devicetree/bindings/net/dsa/qca8k.txt | 69 +++++++++++++++++-- 1 file changed, 64 insertions(+), 5 deletions(-) diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt index 5eda99e6c86e..93a7469e70d4 100644 --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -12,10 +12,15 @@ Required properties: Subnodes: The integrated switch subnode should be specified according to the binding -described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of -port and PHY id, each subnode describing a port needs to have a valid phandle -referencing the internal PHY connected to it. The CPU port of this switch is -always port 0. +described in dsa/dsa.txt. If the QCA8K switch is connect to a SoC's external +mdio-bus each subnode describing a port needs to have a valid phandle +referencing the internal PHY it is connected to. This is because there's no +N:N mapping of port and PHY id. + +Don't use mixed external and internal mdio-bus configurations, as this is +not supported by the hardware. + +The CPU port of this switch is always port 0. A CPU port node has the following optional node: @@ -31,8 +36,9 @@ For QCA8K the 'fixed-link' sub-node supports only the following properties: - 'full-duplex' (boolean, optional), to indicate that full duplex is used. When absent, half duplex is assumed. -Example: +Examples: +for the external mdio-bus configuration: &mdio0 { phy_port1: phy@0 { @@ -108,3 +114,56 @@ Example: }; }; }; + +for the internal master mdio-bus configuration: + + &mdio0 { + switch@10 { + compatible = "qca,qca8337"; + #address-cells = <1>; + #size-cells = <0>; + + reg = <0x10>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + label = "cpu"; + ethernet = <&gmac1>; + phy-mode = "rgmii"; + fixed-link { + speed = 1000; + full-duplex; + }; + }; + + port@1 { + reg = <1>; + label = "lan1"; + }; + + port@2 { + reg = <2>; + label = "lan2"; + }; + + port@3 { + reg = <3>; + label = "lan3"; + }; + + port@4 { + reg = <4>; + label = "lan4"; + }; + + port@5 { + reg = <5>; + label = "wan"; + }; + }; + }; + }; From 1eec7151ae0e134bd42e3f128066b2ff8da21393 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:02 +0100 Subject: [PATCH 330/468] net: dsa: qca8k: remove leftover phy accessors This belated patch implements Andrew Lunn's request of "remove the phy_read() and phy_write() functions." While seemingly harmless, this causes the switch's user port PHYs to get registered twice. This is because the DSA subsystem will create a slave mdio-bus not knowing that the qca8k_phy_(read|write) accessors operate on the external mdio-bus. So the same "bus" gets effectively duplicated. Cc: stable@vger.kernel.org Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family") Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- drivers/net/dsa/qca8k.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index 576b37d12a63..14ad78225f07 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -624,22 +624,6 @@ qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy) qca8k_port_set_status(priv, port, 1); } -static int -qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum) -{ - struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv; - - return mdiobus_read(priv->bus, phy, regnum); -} - -static int -qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val) -{ - struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv; - - return mdiobus_write(priv->bus, phy, regnum, val); -} - static void qca8k_get_strings(struct dsa_switch *ds, int port, u32 stringset, uint8_t *data) { @@ -879,8 +863,6 @@ static const struct dsa_switch_ops qca8k_switch_ops = { .setup = qca8k_setup, .adjust_link = qca8k_adjust_link, .get_strings = qca8k_get_strings, - .phy_read = qca8k_phy_read, - .phy_write = qca8k_phy_write, .get_ethtool_stats = qca8k_get_ethtool_stats, .get_sset_count = qca8k_get_sset_count, .get_mac_eee = qca8k_get_mac_eee, From db460c54b67fc2cbe6dcef88b7bf3cba8e07f80e Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:03 +0100 Subject: [PATCH 331/468] net: dsa: qca8k: extend slave-bus implementations This patch implements accessors for the QCA8337 MDIO access through the MDIO_MASTER register, which makes it possible to access the PHYs on slave-bus through the switch. In cases where the switch ports are already mapped via external "phy-phandles", the internal mdio-bus is disabled in order to prevent a duplicated discovery and enumeration of the same PHYs. Don't use mixed external and internal mdio-bus configurations, as this is not supported by the hardware. Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- drivers/net/dsa/qca8k.c | 156 +++++++++++++++++++++++++++++++++++++++- drivers/net/dsa/qca8k.h | 13 ++++ 2 files changed, 168 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index 14ad78225f07..c4fa400efdcc 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -481,6 +481,155 @@ qca8k_port_set_status(struct qca8k_priv *priv, int port, int enable) qca8k_reg_clear(priv, QCA8K_REG_PORT_STATUS(port), mask); } +static u32 +qca8k_port_to_phy(int port) +{ + /* From Andrew Lunn: + * Port 0 has no internal phy. + * Port 1 has an internal PHY at MDIO address 0. + * Port 2 has an internal PHY at MDIO address 1. + * ... + * Port 5 has an internal PHY at MDIO address 4. + * Port 6 has no internal PHY. + */ + + return port - 1; +} + +static int +qca8k_mdio_write(struct qca8k_priv *priv, int port, u32 regnum, u16 data) +{ + u32 phy, val; + + if (regnum >= QCA8K_MDIO_MASTER_MAX_REG) + return -EINVAL; + + /* callee is responsible for not passing bad ports, + * but we still would like to make spills impossible. + */ + phy = qca8k_port_to_phy(port) % PHY_MAX_ADDR; + val = QCA8K_MDIO_MASTER_BUSY | QCA8K_MDIO_MASTER_EN | + QCA8K_MDIO_MASTER_WRITE | QCA8K_MDIO_MASTER_PHY_ADDR(phy) | + QCA8K_MDIO_MASTER_REG_ADDR(regnum) | + QCA8K_MDIO_MASTER_DATA(data); + + qca8k_write(priv, QCA8K_MDIO_MASTER_CTRL, val); + + return qca8k_busy_wait(priv, QCA8K_MDIO_MASTER_CTRL, + QCA8K_MDIO_MASTER_BUSY); +} + +static int +qca8k_mdio_read(struct qca8k_priv *priv, int port, u32 regnum) +{ + u32 phy, val; + + if (regnum >= QCA8K_MDIO_MASTER_MAX_REG) + return -EINVAL; + + /* callee is responsible for not passing bad ports, + * but we still would like to make spills impossible. + */ + phy = qca8k_port_to_phy(port) % PHY_MAX_ADDR; + val = QCA8K_MDIO_MASTER_BUSY | QCA8K_MDIO_MASTER_EN | + QCA8K_MDIO_MASTER_READ | QCA8K_MDIO_MASTER_PHY_ADDR(phy) | + QCA8K_MDIO_MASTER_REG_ADDR(regnum); + + qca8k_write(priv, QCA8K_MDIO_MASTER_CTRL, val); + + if (qca8k_busy_wait(priv, QCA8K_MDIO_MASTER_CTRL, + QCA8K_MDIO_MASTER_BUSY)) + return -ETIMEDOUT; + + val = (qca8k_read(priv, QCA8K_MDIO_MASTER_CTRL) & + QCA8K_MDIO_MASTER_DATA_MASK); + + return val; +} + +static int +qca8k_phy_write(struct dsa_switch *ds, int port, int regnum, u16 data) +{ + struct qca8k_priv *priv = ds->priv; + + return qca8k_mdio_write(priv, port, regnum, data); +} + +static int +qca8k_phy_read(struct dsa_switch *ds, int port, int regnum) +{ + struct qca8k_priv *priv = ds->priv; + int ret; + + ret = qca8k_mdio_read(priv, port, regnum); + + if (ret < 0) + return 0xffff; + + return ret; +} + +static int +qca8k_setup_mdio_bus(struct qca8k_priv *priv) +{ + u32 internal_mdio_mask = 0, external_mdio_mask = 0, reg; + struct device_node *ports, *port; + int err; + + ports = of_get_child_by_name(priv->dev->of_node, "ports"); + if (!ports) + return -EINVAL; + + for_each_available_child_of_node(ports, port) { + err = of_property_read_u32(port, "reg", ®); + if (err) + return err; + + if (!dsa_is_user_port(priv->ds, reg)) + continue; + + if (of_property_read_bool(port, "phy-handle")) + external_mdio_mask |= BIT(reg); + else + internal_mdio_mask |= BIT(reg); + } + + if (!external_mdio_mask && !internal_mdio_mask) { + dev_err(priv->dev, "no PHYs are defined.\n"); + return -EINVAL; + } + + /* The QCA8K_MDIO_MASTER_EN Bit, which grants access to PHYs through + * the MDIO_MASTER register also _disconnects_ the external MDC + * passthrough to the internal PHYs. It's not possible to use both + * configurations at the same time! + * + * Because this came up during the review process: + * If the external mdio-bus driver is capable magically disabling + * the QCA8K_MDIO_MASTER_EN and mutex/spin-locking out the qca8k's + * accessors for the time being, it would be possible to pull this + * off. + */ + if (!!external_mdio_mask && !!internal_mdio_mask) { + dev_err(priv->dev, "either internal or external mdio bus configuration is supported.\n"); + return -EINVAL; + } + + if (external_mdio_mask) { + /* Make sure to disable the internal mdio bus in cases + * a dt-overlay and driver reload changed the configuration + */ + + qca8k_reg_clear(priv, QCA8K_MDIO_MASTER_CTRL, + QCA8K_MDIO_MASTER_EN); + return 0; + } + + priv->ops.phy_read = qca8k_phy_read; + priv->ops.phy_write = qca8k_phy_write; + return 0; +} + static int qca8k_setup(struct dsa_switch *ds) { @@ -502,6 +651,10 @@ qca8k_setup(struct dsa_switch *ds) if (IS_ERR(priv->regmap)) pr_warn("regmap initialization failed"); + ret = qca8k_setup_mdio_bus(priv); + if (ret) + return ret; + /* Initialize CPU port pad mode (xMII type, delays...) */ phy_mode = of_get_phy_mode(ds->ports[QCA8K_CPU_PORT].dn); if (phy_mode < 0) { @@ -905,7 +1058,8 @@ qca8k_sw_probe(struct mdio_device *mdiodev) return -ENOMEM; priv->ds->priv = priv; - priv->ds->ops = &qca8k_switch_ops; + priv->ops = qca8k_switch_ops; + priv->ds->ops = &priv->ops; mutex_init(&priv->reg_mutex); dev_set_drvdata(&mdiodev->dev, priv); diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h index d146e54c8a6c..249fd62268e5 100644 --- a/drivers/net/dsa/qca8k.h +++ b/drivers/net/dsa/qca8k.h @@ -49,6 +49,18 @@ #define QCA8K_MIB_FLUSH BIT(24) #define QCA8K_MIB_CPU_KEEP BIT(20) #define QCA8K_MIB_BUSY BIT(17) +#define QCA8K_MDIO_MASTER_CTRL 0x3c +#define QCA8K_MDIO_MASTER_BUSY BIT(31) +#define QCA8K_MDIO_MASTER_EN BIT(30) +#define QCA8K_MDIO_MASTER_READ BIT(27) +#define QCA8K_MDIO_MASTER_WRITE 0 +#define QCA8K_MDIO_MASTER_SUP_PRE BIT(26) +#define QCA8K_MDIO_MASTER_PHY_ADDR(x) ((x) << 21) +#define QCA8K_MDIO_MASTER_REG_ADDR(x) ((x) << 16) +#define QCA8K_MDIO_MASTER_DATA(x) (x) +#define QCA8K_MDIO_MASTER_DATA_MASK GENMASK(15, 0) +#define QCA8K_MDIO_MASTER_MAX_PORTS 5 +#define QCA8K_MDIO_MASTER_MAX_REG 32 #define QCA8K_GOL_MAC_ADDR0 0x60 #define QCA8K_GOL_MAC_ADDR1 0x64 #define QCA8K_REG_PORT_STATUS(_i) (0x07c + (_i) * 4) @@ -169,6 +181,7 @@ struct qca8k_priv { struct dsa_switch *ds; struct mutex reg_mutex; struct device *dev; + struct dsa_switch_ops ops; }; struct qca8k_mib_desc { From 1f8389bf63aec99218c62490869ca38d1a38ce46 Mon Sep 17 00:00:00 2001 From: Leslie Monis Date: Sat, 23 Mar 2019 19:11:33 +0530 Subject: [PATCH 332/468] net: sched: Kconfig: update reference link for PIE RFC 8033 replaces the IETF draft for PIE Signed-off-by: Leslie Monis Signed-off-by: David S. Miller --- net/sched/Kconfig | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 1b9afdee5ba9..5c02ad97ef23 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -358,8 +358,7 @@ config NET_SCH_PIE help Say Y here if you want to use the Proportional Integral controller Enhanced scheduler packet scheduling algorithm. - For more information, please see - http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 + For more information, please see https://tools.ietf.org/html/rfc8033 To compile this driver as a module, choose M here: the module will be called sch_pie. From b7ebee2f95fb0fa2862d5ed2de707f872c311393 Mon Sep 17 00:00:00 2001 From: Dmitry Bezrukov Date: Sat, 23 Mar 2019 13:59:53 +0000 Subject: [PATCH 333/468] net: usb: aqc111: Extend HWID table by QNAP device New device of QNAP based on aqc111u Add this ID to blacklist of cdc_ether driver as well Signed-off-by: Dmitry Bezrukov Signed-off-by: David S. Miller --- drivers/net/usb/aqc111.c | 15 +++++++++++++++ drivers/net/usb/cdc_ether.c | 8 ++++++++ 2 files changed, 23 insertions(+) diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index 820a2fe7d027..aff995be2a31 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -1301,6 +1301,20 @@ static const struct driver_info trendnet_info = { .tx_fixup = aqc111_tx_fixup, }; +static const struct driver_info qnap_info = { + .description = "QNAP QNA-UC5G1T USB to 5GbE Adapter", + .bind = aqc111_bind, + .unbind = aqc111_unbind, + .status = aqc111_status, + .link_reset = aqc111_link_reset, + .reset = aqc111_reset, + .stop = aqc111_stop, + .flags = FLAG_ETHER | FLAG_FRAMING_AX | + FLAG_AVOID_UNLINK_URBS | FLAG_MULTI_PACKET, + .rx_fixup = aqc111_rx_fixup, + .tx_fixup = aqc111_tx_fixup, +}; + static int aqc111_suspend(struct usb_interface *intf, pm_message_t message) { struct usbnet *dev = usb_get_intfdata(intf); @@ -1455,6 +1469,7 @@ static const struct usb_device_id products[] = { {AQC111_USB_ETH_DEV(0x0b95, 0x2790, asix111_info)}, {AQC111_USB_ETH_DEV(0x0b95, 0x2791, asix112_info)}, {AQC111_USB_ETH_DEV(0x20f4, 0xe05a, trendnet_info)}, + {AQC111_USB_ETH_DEV(0x1c04, 0x0015, qnap_info)}, { },/* END */ }; MODULE_DEVICE_TABLE(usb, products); diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index 5512a1038721..3e9b2c319e45 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -851,6 +851,14 @@ static const struct usb_device_id products[] = { .driver_info = 0, }, +/* QNAP QNA-UC5G1T USB to 5GbE Adapter (based on AQC111U) */ +{ + USB_DEVICE_AND_INTERFACE_INFO(0x1c04, 0x0015, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, + USB_CDC_PROTO_NONE), + .driver_info = 0, +}, + /* WHITELIST!!! * * CDC Ether uses two interfaces, not necessarily consecutive. From 9926cb5f8b0f0aea535735185600d74db7608550 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 24 Mar 2019 00:48:22 +0800 Subject: [PATCH 334/468] tipc: change to check tipc_own_id to return in tipc_net_stop When running a syz script, a panic occurred: [ 156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.094315] Call Trace: [ 156.094844] [ 156.095306] dump_stack+0x7c/0xc0 [ 156.097346] print_address_description+0x65/0x22e [ 156.100445] kasan_report.cold.3+0x37/0x7a [ 156.102402] tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.106517] call_timer_fn+0x19a/0x610 [ 156.112749] run_timer_softirq+0xb51/0x1090 It was caused by the netns freed without deleting the discoverer timer, while later on the netns would be accessed in the timer handler. The timer should have been deleted by tipc_net_stop() when cleaning up a netns. However, tipc has been able to enable a bearer and start d->timer without the local node_addr set since Commit 52dfae5c85a4 ("tipc: obtain node identity from interface by default"), which caused the timer not to be deleted in tipc_net_stop() then. So fix it in tipc_net_stop() by changing to check local node_id instead of local node_addr, as Jon suggested. While at it, remove the calling of tipc_nametbl_withdraw() there, since tipc_nametbl_stop() will take of the nametbl's freeing after. Fixes: 52dfae5c85a4 ("tipc: obtain node identity from interface by default") Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com Signed-off-by: Xin Long Acked-by: Ying Xue Acked-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/net.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/tipc/net.c b/net/tipc/net.c index f076edb74338..7ce1e86b024f 100644 --- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -163,12 +163,9 @@ void tipc_sched_net_finalize(struct net *net, u32 addr) void tipc_net_stop(struct net *net) { - u32 self = tipc_own_addr(net); - - if (!self) + if (!tipc_own_id(net)) return; - tipc_nametbl_withdraw(net, TIPC_CFG_SRV, self, self, self); rtnl_lock(); tipc_bearer_stop(net); tipc_node_stop(net); From 450895d04ba13a96886eddfeddb11556ae8624f1 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sun, 24 Mar 2019 00:18:46 +0200 Subject: [PATCH 335/468] net: phy: bcm54xx: Encode link speed and activity into LEDs Previously the green and amber LEDs on this quad PHY were solid, to indicate an encoding of the link speed (10/100/1000). This keeps the LEDs always on just as before, but now they flash on Rx/Tx activity. Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/broadcom.c | 13 +++++++++++++ include/linux/brcmphy.h | 16 ++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c index 9605d4fe540b..cb86a3e90c7d 100644 --- a/drivers/net/phy/broadcom.c +++ b/drivers/net/phy/broadcom.c @@ -323,6 +323,19 @@ static int bcm54xx_config_init(struct phy_device *phydev) bcm54xx_phydsp_config(phydev); + /* Encode link speed into LED1 and LED3 pair (green/amber). + * Also flash these two LEDs on activity. This means configuring + * them for MULTICOLOR and encoding link/activity into them. + */ + val = BCM5482_SHD_LEDS1_LED1(BCM_LED_SRC_MULTICOLOR1) | + BCM5482_SHD_LEDS1_LED3(BCM_LED_SRC_MULTICOLOR1); + bcm_phy_write_shadow(phydev, BCM5482_SHD_LEDS1, val); + + val = BCM_LED_MULTICOLOR_IN_PHASE | + BCM5482_SHD_LEDS1_LED1(BCM_LED_MULTICOLOR_LINK_ACT) | + BCM5482_SHD_LEDS1_LED3(BCM_LED_MULTICOLOR_LINK_ACT); + bcm_phy_write_exp(phydev, BCM_EXP_MULTICOLOR, val); + return 0; } diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 9cd00a37b8d3..6db2d9a6e503 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -148,6 +148,22 @@ #define BCM_LED_SRC_OFF 0xe /* Tied high */ #define BCM_LED_SRC_ON 0xf /* Tied low */ +/* + * Broadcom Multicolor LED configurations (expansion register 4) + */ +#define BCM_EXP_MULTICOLOR (MII_BCM54XX_EXP_SEL_ER + 0x04) +#define BCM_LED_MULTICOLOR_IN_PHASE BIT(8) +#define BCM_LED_MULTICOLOR_LINK_ACT 0x0 +#define BCM_LED_MULTICOLOR_SPEED 0x1 +#define BCM_LED_MULTICOLOR_ACT_FLASH 0x2 +#define BCM_LED_MULTICOLOR_FDX 0x3 +#define BCM_LED_MULTICOLOR_OFF 0x4 +#define BCM_LED_MULTICOLOR_ON 0x5 +#define BCM_LED_MULTICOLOR_ALT 0x6 +#define BCM_LED_MULTICOLOR_FLASH 0x7 +#define BCM_LED_MULTICOLOR_LINK 0x8 +#define BCM_LED_MULTICOLOR_ACT 0x9 +#define BCM_LED_MULTICOLOR_PROGRAM 0xa /* * BCM5482: Shadow registers From c493b09b2792336f471d2206be180a4b4c1fc9ba Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Sun, 24 Mar 2019 00:21:03 +0100 Subject: [PATCH 336/468] net: devlink: skip info_get op call if it is not defined in dumpit In dumpit, unlike doit, the check for info_get op being defined is missing. Add it and avoid null pointer dereference in case driver does not define this op. Fixes: f9cf22882c60 ("devlink: add device information API") Reported-by: Ido Schimmel Signed-off-by: Jiri Pirko Acked-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/core/devlink.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/devlink.c b/net/core/devlink.c index 78e22cea4cc7..da0a29f30885 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3897,6 +3897,11 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, continue; } + if (!devlink->ops->info_get) { + idx++; + continue; + } + mutex_lock(&devlink->lock); err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, NETLINK_CB(cb->skb).portid, From 047a013f8d0af8299ce2d02af152de6a30165ccc Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 25 Mar 2019 13:49:16 +0100 Subject: [PATCH 337/468] chelsio: use BUG() instead of BUG_ON(1) clang warns about possible bugs in a dead code branch after BUG_ON(1) when CONFIG_PROFILE_ALL_BRANCHES is enabled: drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: error: variable 'buf_size' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] BUG_ON(1); ^~~~~~~~~ include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:48:23: note: expanded from macro 'unlikely' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/sge.c:482:9: note: uninitialized use occurs here return buf_size; ^~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: note: remove the 'if' if its condition is always true BUG_ON(1); ^ include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^ drivers/net/ethernet/chelsio/cxgb4/sge.c:459:14: note: initialize the variable 'buf_size' to silence this warning int buf_size; ^ = 0 Use BUG() here to create simpler code that clang understands correctly. Signed-off-by: Arnd Bergmann Reviewed-by: Nick Desaulniers Signed-off-by: David S. Miller --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 2 +- drivers/net/ethernet/chelsio/cxgb4/sge.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c index 3130b43bba52..02959035ed3f 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -2620,7 +2620,7 @@ static inline struct port_info *ethqset2pinfo(struct adapter *adap, int qset) } /* should never happen! */ - BUG_ON(1); + BUG(); return NULL; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index 88773ca58e6b..b3da81e90132 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -476,7 +476,7 @@ static inline int get_buf_size(struct adapter *adapter, break; default: - BUG_ON(1); + BUG(); } return buf_size; From 8c838f53e149871561a9261ac768a9c7071b43d0 Mon Sep 17 00:00:00 2001 From: Ioana Ciornei Date: Mon, 25 Mar 2019 13:06:22 +0000 Subject: [PATCH 338/468] dpaa2-eth: fix race condition with bql frame accounting It might happen that Tx conf acknowledges a frame before it was subscribed in bql, as subscribing was previously done after the enqueue operation. This patch moves the netdev_tx_sent_queue call before the actual frame enqueue, so that this can never happen. Fixes: 569dac6a5a0d ("dpaa2-eth: bql support") Signed-off-by: Ioana Ciornei Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 1a68052abb94..dc339dc1adb2 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -815,6 +815,14 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, struct net_device *net_dev) */ queue_mapping = skb_get_queue_mapping(skb); fq = &priv->fq[queue_mapping]; + + fd_len = dpaa2_fd_get_len(&fd); + nq = netdev_get_tx_queue(net_dev, queue_mapping); + netdev_tx_sent_queue(nq, fd_len); + + /* Everything that happens after this enqueues might race with + * the Tx confirmation callback for this frame + */ for (i = 0; i < DPAA2_ETH_ENQUEUE_RETRIES; i++) { err = priv->enqueue(priv, fq, &fd, 0); if (err != -EBUSY) @@ -825,13 +833,10 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, struct net_device *net_dev) percpu_stats->tx_errors++; /* Clean up everything, including freeing the skb */ free_tx_fd(priv, fq, &fd, false); + netdev_tx_completed_queue(nq, 1, fd_len); } else { - fd_len = dpaa2_fd_get_len(&fd); percpu_stats->tx_packets++; percpu_stats->tx_bytes += fd_len; - - nq = netdev_get_tx_queue(net_dev, queue_mapping); - netdev_tx_sent_queue(nq, fd_len); } return NETDEV_TX_OK; From 4cb6560514fa19d556954b88128f3846fee66a03 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= Date: Thu, 28 Feb 2019 22:57:33 +0100 Subject: [PATCH 339/468] leds: trigger: netdev: fix refcnt leak on interface rename MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Renaming a netdev-trigger-tracked interface was resulting in an unbalanced dev_hold(). Example: > iw phy phy0 interface add foo type __ap > echo netdev > trigger > echo foo > device_name > ip link set foo name bar > iw dev bar del [ 237.355366] unregister_netdevice: waiting for bar to become free. Usage count = 1 [ 247.435362] unregister_netdevice: waiting for bar to become free. Usage count = 1 [ 257.545366] unregister_netdevice: waiting for bar to become free. Usage count = 1 Above problem was caused by trigger checking a dev->name which obviously changes after renaming an interface. It meant missing all further events including the NETDEV_UNREGISTER which is required for calling dev_put(). This change fixes that by: 1) Comparing device struct *address* for notification-filtering purposes 2) Dropping unneeded NETDEV_CHANGENAME code (no behavior change) Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Signed-off-by: Rafał Miłecki Acked-by: Pavel Machek Signed-off-by: Jacek Anaszewski --- drivers/leds/trigger/ledtrig-netdev.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c index 3dd3ed46d473..167a94c02d05 100644 --- a/drivers/leds/trigger/ledtrig-netdev.c +++ b/drivers/leds/trigger/ledtrig-netdev.c @@ -301,11 +301,11 @@ static int netdev_trig_notify(struct notifier_block *nb, container_of(nb, struct led_netdev_data, notifier); if (evt != NETDEV_UP && evt != NETDEV_DOWN && evt != NETDEV_CHANGE - && evt != NETDEV_REGISTER && evt != NETDEV_UNREGISTER - && evt != NETDEV_CHANGENAME) + && evt != NETDEV_REGISTER && evt != NETDEV_UNREGISTER) return NOTIFY_DONE; - if (strcmp(dev->name, trigger_data->device_name)) + if (!(dev == trigger_data->net_dev || + (evt == NETDEV_REGISTER && !strcmp(dev->name, trigger_data->device_name)))) return NOTIFY_DONE; cancel_delayed_work_sync(&trigger_data->work); @@ -320,12 +320,9 @@ static int netdev_trig_notify(struct notifier_block *nb, dev_hold(dev); trigger_data->net_dev = dev; break; - case NETDEV_CHANGENAME: case NETDEV_UNREGISTER: - if (trigger_data->net_dev) { - dev_put(trigger_data->net_dev); - trigger_data->net_dev = NULL; - } + dev_put(trigger_data->net_dev); + trigger_data->net_dev = NULL; break; case NETDEV_UP: case NETDEV_CHANGE: From 01f2f5b82a2b523ae76af53f2ff43c48dde10a00 Mon Sep 17 00:00:00 2001 From: Alakesh Haloi Date: Tue, 26 Mar 2019 02:00:01 +0000 Subject: [PATCH 340/468] SUNRPC: fix uninitialized variable warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid following compiler warning on uninitialized variable net/sunrpc/xprtsock.c: In function ‘xs_read_stream_request.constprop’: net/sunrpc/xprtsock.c:525:10: warning: ‘read’ may be used uninitialized in this function [-Wmaybe-uninitialized] return read; ^~~~ net/sunrpc/xprtsock.c:529:23: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] return ret < 0 ? ret : read; ~~~~~~~~~~~~~~^~~~~~ Signed-off-by: Alakesh Haloi Signed-off-by: Trond Myklebust --- net/sunrpc/xprtsock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 9359539907ba..732d4b57411a 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -495,8 +495,8 @@ xs_read_stream_request(struct sock_xprt *transport, struct msghdr *msg, int flags, struct rpc_rqst *req) { struct xdr_buf *buf = &req->rq_private_buf; - size_t want, read; - ssize_t ret; + size_t want, uninitialized_var(read); + ssize_t uninitialized_var(ret); xs_read_header(transport, buf); From ce9afe08e71e3f7d64f337a6e932e50849230fc2 Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" Date: Fri, 8 Mar 2019 21:03:24 +0530 Subject: [PATCH 341/468] powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent, we currently use of_read_property() to obtain the pointer to the array corresponding to the property "ibm,drc-indexes". The elements of this array are of type __be32, but are accessed without any conversion to the OS-endianness, which is buggy on a Little Endian OS. Fix this by using of_property_read_u32_index() accessor function to safely read the elements of the array. Fixes: e83636ac3334 ("pseries/drc-info: Search DRC properties for CPU indexes") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Pavithra R. Prakash Signed-off-by: Gautham R. Shenoy Reviewed-by: Vaidyanathan Srinivasan [mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable] Signed-off-by: Michael Ellerman --- .../platforms/pseries/pseries_energy.c | 27 ++++++++++++------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c index 6ed22127391b..921f12182f3e 100644 --- a/arch/powerpc/platforms/pseries/pseries_energy.c +++ b/arch/powerpc/platforms/pseries/pseries_energy.c @@ -77,18 +77,27 @@ static u32 cpu_to_drc_index(int cpu) ret = drc.drc_index_start + (thread_index * drc.sequential_inc); } else { - const __be32 *indexes; - - indexes = of_get_property(dn, "ibm,drc-indexes", NULL); - if (indexes == NULL) - goto err_of_node_put; + u32 nr_drc_indexes, thread_drc_index; /* - * The first element indexes[0] is the number of drc_indexes - * returned in the list. Hence thread_index+1 will get the - * drc_index corresponding to core number thread_index. + * The first element of ibm,drc-indexes array is the + * number of drc_indexes returned in the list. Hence + * thread_index+1 will get the drc_index corresponding + * to core number thread_index. */ - ret = indexes[thread_index + 1]; + rc = of_property_read_u32_index(dn, "ibm,drc-indexes", + 0, &nr_drc_indexes); + if (rc) + goto err_of_node_put; + + WARN_ON_ONCE(thread_index > nr_drc_indexes); + rc = of_property_read_u32_index(dn, "ibm,drc-indexes", + thread_index + 1, + &thread_drc_index); + if (rc) + goto err_of_node_put; + + ret = thread_drc_index; } rc = 0; From 71cd6cb234876e2232635bb2fdcfde5146f7fffd Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 26 Mar 2019 08:08:43 +0300 Subject: [PATCH 342/468] drm/i915/selftests: Fix an IS_ERR() vs NULL check The live_context() function returns error pointers. It never returns NULL. Fixes: 9c1477e83e62 ("drm/i915/selftests: Exercise adding requests to a full GGTT") Signed-off-by: Dan Carpenter Reviewed-by: Mika Kuoppala Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20190326050843.GA20038@kadam (cherry picked from commit 602cbe8efc523ba56e1f41e8f74c7aa835672593) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c index 32dce7176f63..b9b0ea4e2404 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c @@ -455,7 +455,7 @@ static int igt_evict_contexts(void *arg) struct i915_gem_context *ctx; ctx = live_context(i915, file); - if (!ctx) + if (IS_ERR(ctx)) break; /* We will need some GGTT space for the rq's context */ From fa59dd234c9a237e590a5f6db530d7f7ee88e5e8 Mon Sep 17 00:00:00 2001 From: Andrew Jeffery Date: Tue, 26 Mar 2019 15:19:54 +1030 Subject: [PATCH 343/468] Revert "gpio: use new gpio_set_config() helper in more places" gpio-aspeed implements support for PIN_CONFIG_INPUT_DEBOUNCE. As of v5.1-rc1 we're seeing the following when booting a Romulus BMC kernel: > [ 21.373137] ------------[ cut here ]------------ > [ 21.374545] WARNING: CPU: 0 PID: 1 at drivers/gpio/gpio-aspeed.c:834 unregister_allocated_timer+0x38/0x94 > [ 21.376181] No timer allocated to offset 74 > [ 21.377672] CPU: 0 PID: 1 Comm: swapper Not tainted 5.1.0-rc1-dirty #6 > [ 21.378800] Hardware name: Generic DT based system > [ 21.379965] Backtrace: > [ 21.381024] [<80107d44>] (dump_backtrace) from [<80107f78>] (show_stack+0x20/0x24) > [ 21.382713] r7:8038b720 r6:00000009 r5:00000000 r4:87897c64 > [ 21.383815] [<80107f58>] (show_stack) from [<80656398>] (dump_stack+0x20/0x28) > [ 21.385042] [<80656378>] (dump_stack) from [<80115f1c>] (__warn.part.3+0xb4/0xdc) > [ 21.386253] [<80115e68>] (__warn.part.3) from [<80115fb0>] (warn_slowpath_fmt+0x6c/0x90) > [ 21.387471] r6:00000342 r5:807f8758 r4:80a07008 > [ 21.388278] [<80115f48>] (warn_slowpath_fmt) from [<8038b720>] (unregister_allocated_timer+0x38/0x94) > [ 21.389809] r3:0000004a r2:807f8774 > [ 21.390526] r7:00000000 r6:0000000a r5:60000153 r4:0000004a > [ 21.391601] [<8038b6e8>] (unregister_allocated_timer) from [<8038baac>] (aspeed_gpio_set_config+0x330/0x48c) > [ 21.393248] [<8038b77c>] (aspeed_gpio_set_config) from [<803840b0>] (gpiod_set_debounce+0xe8/0x114) > [ 21.394745] r10:82ee2248 r9:00000000 r8:87b63a00 r7:00001388 r6:87947320 r5:80729310 > [ 21.396030] r4:879f64a0 > [ 21.396499] [<80383fc8>] (gpiod_set_debounce) from [<804b4350>] (gpio_keys_probe+0x69c/0x8e0) > [ 21.397715] r7:845d94b8 r6:00000001 r5:00000000 r4:87b63a1c > [ 21.398618] [<804b3cb4>] (gpio_keys_probe) from [<8040eeec>] (platform_dev_probe+0x44/0x80) > [ 21.399834] r10:00000003 r9:80a3a8b0 r8:00000000 r7:00000000 r6:80a7f9dc r5:80a3a8b0 > [ 21.401163] r4:8796bc10 > [ 21.401634] [<8040eea8>] (platform_drv_probe) from [<8040d0d4>] (really_probe+0x208/0x3dc) > [ 21.402786] r5:80a7f8d0 r4:8796bc10 > [ 21.403547] [<8040cecc>] (really_probe) from [<8040d7a4>] (driver_probe_device+0x130/0x170) > [ 21.404744] r10:0000007b r9:8093683c r8:00000000 r7:80a07008 r6:80a3a8b0 r5:8796bc10 > [ 21.405854] r4:80a3a8b0 > [ 21.406324] [<8040d674>] (driver_probe_device) from [<8040da8c>] (device_driver_attach+0x68/0x70) > [ 21.407568] r9:8093683c r8:00000000 r7:80a07008 r6:80a3a8b0 r5:00000000 r4:8796bc10 > [ 21.408877] [<8040da24>] (device_driver_attach) from [<8040db14>] (__driver_attach+0x80/0x150) > [ 21.410327] r7:80a07008 r6:8796bc10 r5:00000001 r4:80a3a8b0 > [ 21.411294] [<8040da94>] (__driver_attach) from [<8040b20c>] (bus_for_each_dev+0x80/0xc4) > [ 21.412641] r7:80a07008 r6:8040da94 r5:80a3a8b0 r4:87966f30 > [ 21.413580] [<8040b18c>] (bus_for_each_dev) from [<8040dc0c>] (driver_attach+0x28/0x30) > [ 21.414943] r7:00000000 r6:87b411e0 r5:80a33fc8 r4:80a3a8b0 > [ 21.415927] [<8040dbe4>] (driver_attach) from [<8040bbf0>] (bus_add_driver+0x14c/0x200) > [ 21.417289] [<8040baa4>] (bus_add_driver) from [<8040e2b4>] (driver_register+0x84/0x118) > [ 21.418652] r7:80a60ae0 r6:809226b8 r5:80a07008 r4:80a3a8b0 > [ 21.419652] [<8040e230>] (driver_register) from [<8040fc28>] (__platform_driver_register+0x3c/0x50) > [ 21.421193] r5:80a07008 r4:809525f8 > [ 21.421990] [<8040fbec>] (__platform_driver_register) from [<809226d8>] (gpio_keys_init+0x20/0x28) > [ 21.423447] [<809226b8>] (gpio_keys_init) from [<8090128c>] (do_one_initcall+0x80/0x180) > [ 21.424886] [<8090120c>] (do_one_initcall) from [<80901538>] (kernel_init_freeable+0x1ac/0x26c) > [ 21.426354] r8:80a60ae0 r7:80a60ae0 r6:8093685c r5:00000008 r4:809525f8 > [ 21.427579] [<8090138c>] (kernel_init_freeable) from [<8066d9a0>] (kernel_init+0x18/0x11c) > [ 21.428819] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8066d988 > [ 21.429947] r4:00000000 > [ 21.430415] [<8066d988>] (kernel_init) from [<801010e8>] (ret_from_fork+0x14/0x2c) > [ 21.431666] Exception stack(0x87897fb0 to 0x87897ff8) > [ 21.432877] 7fa0: 00000000 00000000 00000000 00000000 > [ 21.434446] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 21.436052] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > [ 21.437308] r5:8066d988 r4:00000000 > [ 21.438102] ---[ end trace d7d7ac3a80567d0e ]--- We only hit unregister_allocated_timer() if the argument to aspeed_gpio_set_config() is 0, but we can't be calling through gpiod_set_debounce() from gpio_keys_probe() unless the gpio-keys button has a non-zero debounce interval. Commit 6581eaf0e890 ("gpio: use new gpio_set_config() helper in more places") spreads the use of gpio_set_config() to the debounce and transitory state configuration paths. The implementation of gpio_set_config() is: > static int gpio_set_config(struct gpio_chip *gc, unsigned offset, > enum pin_config_param mode) > { > unsigned long config = { PIN_CONF_PACKED(mode, 0) }; > > return gc->set_config ? gc->set_config(gc, offset, config) : -ENOTSUPP; > } Here it packs its own config value with a fixed argument of 0; this is incorrect behaviour for implementing the debounce and transitory functions, and the debounce and transitory gpio_set_config() call-sites now have an undetected type mismatch as they both already pack their own config parameter (i.e. what gets passed is not an `enum pin_config_param`). Indeed this can be seen in the small diff for 6581eaf0e890: > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > index de595fa31a1a..1f239aac43df 100644 > --- a/drivers/gpio/gpiolib.c > +++ b/drivers/gpio/gpiolib.c > @@ -2725,7 +2725,7 @@ int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce) > } > > config = pinconf_to_config_packed(PIN_CONFIG_INPUT_DEBOUNCE, debounce); > - return chip->set_config(chip, gpio_chip_hwgpio(desc), config); > + return gpio_set_config(chip, gpio_chip_hwgpio(desc), config); > } > EXPORT_SYMBOL_GPL(gpiod_set_debounce); > > @@ -2762,7 +2762,7 @@ int gpiod_set_transitory(struct gpio_desc *desc, bool transitory) > packed = pinconf_to_config_packed(PIN_CONFIG_PERSIST_STATE, > !transitory); > gpio = gpio_chip_hwgpio(desc); > - rc = chip->set_config(chip, gpio, packed); > + rc = gpio_set_config(chip, gpio, packed); > if (rc == -ENOTSUPP) { > dev_dbg(&desc->gdev->dev, "Persistence not supported for GPIO %d\n", > gpio); Revert commit 6581eaf0e890 ("gpio: use new gpio_set_config() helper in more places") to restore correct behaviour for gpiod_set_debounce() and gpiod_set_transitory(). Cc: Thomas Petazzoni Signed-off-by: Andrew Jeffery Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 144af0733581..0495bf1d480a 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2776,7 +2776,7 @@ int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce) } config = pinconf_to_config_packed(PIN_CONFIG_INPUT_DEBOUNCE, debounce); - return gpio_set_config(chip, gpio_chip_hwgpio(desc), config); + return chip->set_config(chip, gpio_chip_hwgpio(desc), config); } EXPORT_SYMBOL_GPL(gpiod_set_debounce); @@ -2813,7 +2813,7 @@ int gpiod_set_transitory(struct gpio_desc *desc, bool transitory) packed = pinconf_to_config_packed(PIN_CONFIG_PERSIST_STATE, !transitory); gpio = gpio_chip_hwgpio(desc); - rc = gpio_set_config(chip, gpio, packed); + rc = chip->set_config(chip, gpio, packed); if (rc == -ENOTSUPP) { dev_dbg(&desc->gdev->dev, "Persistence not supported for GPIO %d\n", gpio); From 2583303debb7acc77295b77901916d08a4c743c2 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Fri, 22 Mar 2019 18:27:12 +0100 Subject: [PATCH 344/468] gpio: mockup: fix debugfs read The debugfs read callback must advance ppos or users using read() on the file descriptor will never get the EOL. This wasn't spotted before as I was using busybox cat for testing which uses sendfile() internally and only noticed it now when switched to cat from coreutils. Fixes: 2a9e27408e12 ("gpio: mockup: rework debugfs interface") Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpio-mockup.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c index 154d959e8993..74ba8b1d71d8 100644 --- a/drivers/gpio/gpio-mockup.c +++ b/drivers/gpio/gpio-mockup.c @@ -204,8 +204,9 @@ static ssize_t gpio_mockup_debugfs_read(struct file *file, struct gpio_mockup_chip *chip; struct seq_file *sfile; struct gpio_chip *gc; + int val, rv, cnt; char buf[3]; - int val, rv; + if (*ppos != 0) return 0; @@ -216,13 +217,14 @@ static ssize_t gpio_mockup_debugfs_read(struct file *file, gc = &chip->gc; val = gpio_mockup_get(gc, priv->offset); - snprintf(buf, sizeof(buf), "%d\n", val); + cnt = snprintf(buf, sizeof(buf), "%d\n", val); - rv = copy_to_user(usr_buf, buf, sizeof(buf)); + rv = copy_to_user(usr_buf, buf, cnt); if (rv) return rv; - return sizeof(buf) - 1; + *ppos += cnt; + return cnt; } static ssize_t gpio_mockup_debugfs_write(struct file *file, From 0f02daed4e089c7a380a0ffdc9d93a5989043cf4 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Mon, 4 Mar 2019 13:55:46 +0800 Subject: [PATCH 345/468] x86/boot: Fix incorrect ifdeffery scope The declarations related to immovable memory handling are out of the BOOT_COMPRESSED_MISC_H #ifdef scope, wrap them inside. Signed-off-by: Baoquan He Signed-off-by: Borislav Petkov Cc: Chao Fan Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Juergen Gross Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190304055546.18566-1-bhe@redhat.com --- arch/x86/boot/compressed/misc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index fd13655e0f9b..d2f184165934 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -120,8 +120,6 @@ static inline void console_init(void) void set_sev_encryption_mask(void); -#endif - /* acpi.c */ #ifdef CONFIG_ACPI acpi_physical_address get_rsdp_addr(void); @@ -135,3 +133,5 @@ int count_immovable_mem_regions(void); #else static inline int count_immovable_mem_regions(void) { return 0; } #endif + +#endif /* BOOT_COMPRESSED_MISC_H */ From c4dcd89d20a8fe4009d25660c69396611328cc5e Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:19 +0100 Subject: [PATCH 346/468] i2c: iop3xx: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/{i2c-xscale.txt => i2c-iop3xx.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/i2c/{i2c-xscale.txt => i2c-iop3xx.txt} (100%) diff --git a/Documentation/devicetree/bindings/i2c/i2c-xscale.txt b/Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt similarity index 100% rename from Documentation/devicetree/bindings/i2c/i2c-xscale.txt rename to Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt From 94c87527f4e1ebf85936f707ec84ff458f3bbb00 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:20 +0100 Subject: [PATCH 347/468] i2c: mt65xx: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/{i2c-mtk.txt => i2c-mt65xx.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/i2c/{i2c-mtk.txt => i2c-mt65xx.txt} (100%) diff --git a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt similarity index 100% rename from Documentation/devicetree/bindings/i2c/i2c-mtk.txt rename to Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt From 0a96f9ffbfe9446b5c4c67461085236d578248e5 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:21 +0100 Subject: [PATCH 348/468] i2c: stu300: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/{i2c-st-ddci2c.txt => i2c-stu300.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/i2c/{i2c-st-ddci2c.txt => i2c-stu300.txt} (100%) diff --git a/Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt b/Documentation/devicetree/bindings/i2c/i2c-stu300.txt similarity index 100% rename from Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt rename to Documentation/devicetree/bindings/i2c/i2c-stu300.txt From 45dfceb0d14a06eeffe22581eb2996f4ed5225ca Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:22 +0100 Subject: [PATCH 349/468] i2c: sun6i-p2wi: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang Acked-by: Maxime Ripard --- .../bindings/i2c/{i2c-sunxi-p2wi.txt => i2c-sun6i-p2wi.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/i2c/{i2c-sunxi-p2wi.txt => i2c-sun6i-p2wi.txt} (100%) diff --git a/Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt b/Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt similarity index 100% rename from Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt rename to Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt From 080a910414659dfd18ffaf60a1e95b96c3f50eab Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:23 +0100 Subject: [PATCH 350/468] i2c: wmt: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/{i2c-vt8500.txt => i2c-wmt.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/i2c/{i2c-vt8500.txt => i2c-wmt.txt} (100%) diff --git a/Documentation/devicetree/bindings/i2c/i2c-vt8500.txt b/Documentation/devicetree/bindings/i2c/i2c-wmt.txt similarity index 100% rename from Documentation/devicetree/bindings/i2c/i2c-vt8500.txt rename to Documentation/devicetree/bindings/i2c/i2c-wmt.txt From b929a500d68479163c48739d809cbf4c1335db6f Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Tue, 26 Mar 2019 21:30:46 +0100 Subject: [PATCH 351/468] x86/realmode: Don't leak the trampoline kernel address Since commit ad67b74d2469 ("printk: hash addresses printed with %p") at boot "____ptrval____" is printed instead of the trampoline addresses: Base memory trampoline at [(____ptrval____)] 99000 size 24576 Remove the print as we don't want to leak kernel addresses and this statement is not needed anymore. Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") Signed-off-by: Matteo Croce Signed-off-by: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190326203046.20787-1-mcroce@redhat.com --- arch/x86/realmode/init.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index d10105825d57..47d097946872 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -20,8 +20,6 @@ void __init set_real_mode_mem(phys_addr_t mem, size_t size) void *base = __va(mem); real_mode_header = (struct real_mode_header *) base; - printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n", - base, (unsigned long long)mem, size); } void __init reserve_real_mode(void) From 93e1c8a638308980309e009cc40b5a57ef87caf1 Mon Sep 17 00:00:00 2001 From: Romain Izard Date: Fri, 22 Mar 2019 16:53:02 +0100 Subject: [PATCH 352/468] usb: cdc-acm: fix race during wakeup blocking TX traffic When the kernel is compiled with preemption enabled, the URB completion handler can run in parallel with the work responsible for waking up the tty layer. If the URB handler sets the EVENT_TTY_WAKEUP bit during the call to tty_port_tty_wakeup() to signal that there is room for additional input, it will be cleared at the end of this call. As a result, TX traffic on the upper layer will be blocked. This can be seen with a kernel configured with CONFIG_PREEMPT, and a fast modem connected with PPP running over a USB CDC-ACM port. Use test_and_clear_bit() instead, which ensures that each wakeup requested by the URB completion code will trigger a call to tty_port_tty_wakeup(). Fixes: 1aba579f3cf5 cdc-acm: handle read pipe errors Signed-off-by: Romain Izard Cc: stable Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 739f8960811a..ec666eb4b7b4 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -558,10 +558,8 @@ static void acm_softint(struct work_struct *work) clear_bit(EVENT_RX_STALL, &acm->flags); } - if (test_bit(EVENT_TTY_WAKEUP, &acm->flags)) { + if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags)) tty_port_tty_wakeup(&acm->port); - clear_bit(EVENT_TTY_WAKEUP, &acm->flags); - } } /* From f276e002793cdb820862e8ea8f76769d56bba575 Mon Sep 17 00:00:00 2001 From: Mukesh Ojha Date: Tue, 26 Mar 2019 13:42:22 +0530 Subject: [PATCH 353/468] usb: u132-hcd: fix resource leak if platform_driver_register fails, cleanup the allocated resource gracefully. Signed-off-by: Mukesh Ojha Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/u132-hcd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/host/u132-hcd.c b/drivers/usb/host/u132-hcd.c index 934584f0a20a..6343fbacd244 100644 --- a/drivers/usb/host/u132-hcd.c +++ b/drivers/usb/host/u132-hcd.c @@ -3204,6 +3204,9 @@ static int __init u132_hcd_init(void) printk(KERN_INFO "driver %s\n", hcd_name); workqueue = create_singlethread_workqueue("u132"); retval = platform_driver_register(&u132_platform_driver); + if (retval) + destroy_workqueue(workqueue); + return retval; } From f3040983132bf3477acd45d2452a906e67c2fec9 Mon Sep 17 00:00:00 2001 From: Razvan Stefanescu Date: Tue, 19 Mar 2019 15:20:34 +0200 Subject: [PATCH 354/468] tty/serial: atmel: Add is_half_duplex helper Use a helper function to check that a port needs to use half duplex communication, replacing several occurrences of multi-line bit checking. Fixes: b389f173aaa1 ("tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done") Cc: stable Signed-off-by: Razvan Stefanescu Acked-by: Richard Genoud Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index 41b728d223d1..55ca3416113f 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -231,6 +231,13 @@ static inline void atmel_uart_write_char(struct uart_port *port, u8 value) __raw_writeb(value, port->membase + ATMEL_US_THR); } +static inline int atmel_uart_is_half_duplex(struct uart_port *port) +{ + return ((port->rs485.flags & SER_RS485_ENABLED) && + !(port->rs485.flags & SER_RS485_RX_DURING_TX)) || + (port->iso7816.flags & SER_ISO7816_ENABLED); +} + #ifdef CONFIG_SERIAL_ATMEL_PDC static bool atmel_use_pdc_rx(struct uart_port *port) { @@ -608,10 +615,9 @@ static void atmel_stop_tx(struct uart_port *port) /* Disable interrupts */ atmel_uart_writel(port, ATMEL_US_IDR, atmel_port->tx_done_mask); - if (((port->rs485.flags & SER_RS485_ENABLED) && - !(port->rs485.flags & SER_RS485_RX_DURING_TX)) || - port->iso7816.flags & SER_ISO7816_ENABLED) + if (atmel_uart_is_half_duplex(port)) atmel_start_rx(port); + } /* @@ -628,9 +634,7 @@ static void atmel_start_tx(struct uart_port *port) return; if (atmel_use_pdc_tx(port) || atmel_use_dma_tx(port)) - if (((port->rs485.flags & SER_RS485_ENABLED) && - !(port->rs485.flags & SER_RS485_RX_DURING_TX)) || - port->iso7816.flags & SER_ISO7816_ENABLED) + if (atmel_uart_is_half_duplex(port)) atmel_stop_rx(port); if (atmel_use_pdc_tx(port)) @@ -928,9 +932,7 @@ static void atmel_complete_tx_dma(void *arg) */ if (!uart_circ_empty(xmit)) atmel_tasklet_schedule(atmel_port, &atmel_port->tasklet_tx); - else if (((port->rs485.flags & SER_RS485_ENABLED) && - !(port->rs485.flags & SER_RS485_RX_DURING_TX)) || - port->iso7816.flags & SER_ISO7816_ENABLED) { + else if (atmel_uart_is_half_duplex(port)) { /* DMA done, stop TX, start RX for RS485 */ atmel_start_rx(port); } @@ -1512,9 +1514,7 @@ static void atmel_tx_pdc(struct uart_port *port) atmel_uart_writel(port, ATMEL_US_IER, atmel_port->tx_done_mask); } else { - if (((port->rs485.flags & SER_RS485_ENABLED) && - !(port->rs485.flags & SER_RS485_RX_DURING_TX)) || - port->iso7816.flags & SER_ISO7816_ENABLED) { + if (atmel_uart_is_half_duplex(port)) { /* DMA done, stop TX, start RX for RS485 */ atmel_start_rx(port); } From 69646d7a3689fbe1a65ae90397d22ac3f1b8d40f Mon Sep 17 00:00:00 2001 From: Razvan Stefanescu Date: Tue, 19 Mar 2019 15:20:35 +0200 Subject: [PATCH 355/468] tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped In half-duplex operation, RX should be started after TX completes. If DMA is used, there is a case when the DMA transfer completes but the TX FIFO is not emptied, so the RX cannot be restarted just yet. Use a boolean variable to store this state and rearm TX interrupt mask to be signaled again that the transfer finished. In interrupt transmit handler this variable is used to start RX. A warning message is generated if RX is activated before TX fifo is cleared. Fixes: b389f173aaa1 ("tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done") Signed-off-by: Razvan Stefanescu Acked-by: Richard Genoud Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index 55ca3416113f..0b4f36905321 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -166,6 +166,8 @@ struct atmel_uart_port { unsigned int pending_status; spinlock_t lock_suspended; + bool hd_start_rx; /* can start RX during half-duplex operation */ + /* ISO7816 */ unsigned int fidi_min; unsigned int fidi_max; @@ -933,8 +935,13 @@ static void atmel_complete_tx_dma(void *arg) if (!uart_circ_empty(xmit)) atmel_tasklet_schedule(atmel_port, &atmel_port->tasklet_tx); else if (atmel_uart_is_half_duplex(port)) { - /* DMA done, stop TX, start RX for RS485 */ - atmel_start_rx(port); + /* + * DMA done, re-enable TXEMPTY and signal that we can stop + * TX and start RX for RS485 + */ + atmel_port->hd_start_rx = true; + atmel_uart_writel(port, ATMEL_US_IER, + atmel_port->tx_done_mask); } spin_unlock_irqrestore(&port->lock, flags); @@ -1382,9 +1389,20 @@ atmel_handle_transmit(struct uart_port *port, unsigned int pending) struct atmel_uart_port *atmel_port = to_atmel_uart_port(port); if (pending & atmel_port->tx_done_mask) { - /* Either PDC or interrupt transmission */ atmel_uart_writel(port, ATMEL_US_IDR, atmel_port->tx_done_mask); + + /* Start RX if flag was set and FIFO is empty */ + if (atmel_port->hd_start_rx) { + if (!(atmel_uart_readl(port, ATMEL_US_CSR) + & ATMEL_US_TXEMPTY)) + dev_warn(port->dev, "Should start RX, but TX fifo is not empty\n"); + + atmel_port->hd_start_rx = false; + atmel_start_rx(port); + return; + } + atmel_tasklet_schedule(atmel_port, &atmel_port->tasklet_tx); } } From 898a737c8a436b2fcd6dcb0b57775ada2f846a26 Mon Sep 17 00:00:00 2001 From: Erin Lo Date: Mon, 11 Mar 2019 16:54:31 +0800 Subject: [PATCH 356/468] dt-bindings: serial: Add compatible for Mediatek MT8183 This adds dt-binding documentation of uart for Mediatek MT8183 SoC Platform. Signed-off-by: Erin Lo Acked-by: Rob Herring Acked-by: Matthias Brugger Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/serial/mtk-uart.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/serial/mtk-uart.txt b/Documentation/devicetree/bindings/serial/mtk-uart.txt index 742cb470595b..bcfb13194f16 100644 --- a/Documentation/devicetree/bindings/serial/mtk-uart.txt +++ b/Documentation/devicetree/bindings/serial/mtk-uart.txt @@ -16,6 +16,7 @@ Required properties: * "mediatek,mt8127-uart" for MT8127 compatible UARTS * "mediatek,mt8135-uart" for MT8135 compatible UARTS * "mediatek,mt8173-uart" for MT8173 compatible UARTS + * "mediatek,mt8183-uart", "mediatek,mt6577-uart" for MT8183 compatible UARTS * "mediatek,mt6577-uart" for MT6577 and all of the above - reg: The base address of the UART register bank. From 3ec8002951ea173e24b466df1ea98c56b7920e63 Mon Sep 17 00:00:00 2001 From: Wentao Wang Date: Wed, 20 Mar 2019 15:30:39 +0000 Subject: [PATCH 357/468] Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Echo "" to /sys/module/kgdboc/parameters/kgdboc will fail with "No such device” error. This is caused by function "configure_kgdboc" who init err to ENODEV when the config is empty (legal input) the code go out with ENODEV returned. Fixes: 2dd453168643 ("kgdboc: Fix restrict error") Signed-off-by: Wentao Wang Cc: stable Acked-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/kgdboc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index 6fb312e7af71..bfe5e9e034ec 100644 --- a/drivers/tty/serial/kgdboc.c +++ b/drivers/tty/serial/kgdboc.c @@ -148,8 +148,10 @@ static int configure_kgdboc(void) char *cptr = config; struct console *cons; - if (!strlen(config) || isspace(config[0])) + if (!strlen(config) || isspace(config[0])) { + err = 0; goto noconfig; + } kgdboc_io_ops.is_console = 0; kgdb_tty_driver = NULL; From f4e68d58cf2b20a581759bbc7228052534652673 Mon Sep 17 00:00:00 2001 From: Fabien Dessenne Date: Thu, 21 Mar 2019 16:43:26 +0100 Subject: [PATCH 358/468] tty: fix NULL pointer issue when tty_port ops is not set Unlike 'client_ops' which is initialized to 'default_client_ops', the port operations 'ops' may be left to NULL. Check the 'ops' value before checking the 'ops->x' value. Signed-off-by: Fabien Dessenne Signed-off-by: Greg Kroah-Hartman --- drivers/tty/tty_port.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c index 044c3cbdcfa4..a9e12b3bc31d 100644 --- a/drivers/tty/tty_port.c +++ b/drivers/tty/tty_port.c @@ -325,7 +325,7 @@ static void tty_port_shutdown(struct tty_port *port, struct tty_struct *tty) if (tty && C_HUPCL(tty)) tty_port_lower_dtr_rts(port); - if (port->ops->shutdown) + if (port->ops && port->ops->shutdown) port->ops->shutdown(port); } out: @@ -398,7 +398,7 @@ EXPORT_SYMBOL_GPL(tty_port_tty_wakeup); */ int tty_port_carrier_raised(struct tty_port *port) { - if (port->ops->carrier_raised == NULL) + if (!port->ops || !port->ops->carrier_raised) return 1; return port->ops->carrier_raised(port); } @@ -414,7 +414,7 @@ EXPORT_SYMBOL(tty_port_carrier_raised); */ void tty_port_raise_dtr_rts(struct tty_port *port) { - if (port->ops->dtr_rts) + if (port->ops && port->ops->dtr_rts) port->ops->dtr_rts(port, 1); } EXPORT_SYMBOL(tty_port_raise_dtr_rts); @@ -429,7 +429,7 @@ EXPORT_SYMBOL(tty_port_raise_dtr_rts); */ void tty_port_lower_dtr_rts(struct tty_port *port) { - if (port->ops->dtr_rts) + if (port->ops && port->ops->dtr_rts) port->ops->dtr_rts(port, 0); } EXPORT_SYMBOL(tty_port_lower_dtr_rts); @@ -684,7 +684,7 @@ int tty_port_open(struct tty_port *port, struct tty_struct *tty, if (!tty_port_initialized(port)) { clear_bit(TTY_IO_ERROR, &tty->flags); - if (port->ops->activate) { + if (port->ops && port->ops->activate) { int retval = port->ops->activate(port, tty); if (retval) { mutex_unlock(&port->mutex); From 0532a1b0d045115521a93acf28f1270df89ad806 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Fri, 22 Mar 2019 09:19:34 +0100 Subject: [PATCH 359/468] virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x VirtualBox 6.0.x has a new feature where the guest kernel driver passes info about the origin of the request (e.g. userspace or kernelspace) to the hypervisor. If we do not pass this information then when running the 6.0.x userspace guest-additions tools on a 6.0.x host, some requests will get denied with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and the mounting of shared folders marked to be auto-mounted. This commit implements passing the requestor info to the host, fixing this. Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman --- drivers/virt/vboxguest/vboxguest_core.c | 106 ++++++++++++++------- drivers/virt/vboxguest/vboxguest_core.h | 15 +-- drivers/virt/vboxguest/vboxguest_linux.c | 26 ++++- drivers/virt/vboxguest/vboxguest_utils.c | 32 ++++--- drivers/virt/vboxguest/vboxguest_version.h | 9 +- drivers/virt/vboxguest/vmmdev.h | 8 +- include/linux/vbox_utils.h | 12 ++- include/uapi/linux/vbox_vmmdev_types.h | 60 ++++++++++++ 8 files changed, 197 insertions(+), 71 deletions(-) diff --git a/drivers/virt/vboxguest/vboxguest_core.c b/drivers/virt/vboxguest/vboxguest_core.c index df7d09409efe..8ca333f21292 100644 --- a/drivers/virt/vboxguest/vboxguest_core.c +++ b/drivers/virt/vboxguest/vboxguest_core.c @@ -27,6 +27,10 @@ #define GUEST_MAPPINGS_TRIES 5 +#define VBG_KERNEL_REQUEST \ + (VMMDEV_REQUESTOR_KERNEL | VMMDEV_REQUESTOR_USR_DRV | \ + VMMDEV_REQUESTOR_CON_DONT_KNOW | VMMDEV_REQUESTOR_TRUST_NOT_GIVEN) + /** * Reserves memory in which the VMM can relocate any guest mappings * that are floating around. @@ -48,7 +52,8 @@ static void vbg_guest_mappings_init(struct vbg_dev *gdev) int i, rc; /* Query the required space. */ - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_GET_HYPERVISOR_INFO); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_GET_HYPERVISOR_INFO, + VBG_KERNEL_REQUEST); if (!req) return; @@ -135,7 +140,8 @@ static void vbg_guest_mappings_exit(struct vbg_dev *gdev) * Tell the host that we're going to free the memory we reserved for * it, the free it up. (Leak the memory if anything goes wrong here.) */ - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_HYPERVISOR_INFO); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_HYPERVISOR_INFO, + VBG_KERNEL_REQUEST); if (!req) return; @@ -172,8 +178,10 @@ static int vbg_report_guest_info(struct vbg_dev *gdev) struct vmmdev_guest_info2 *req2 = NULL; int rc, ret = -ENOMEM; - req1 = vbg_req_alloc(sizeof(*req1), VMMDEVREQ_REPORT_GUEST_INFO); - req2 = vbg_req_alloc(sizeof(*req2), VMMDEVREQ_REPORT_GUEST_INFO2); + req1 = vbg_req_alloc(sizeof(*req1), VMMDEVREQ_REPORT_GUEST_INFO, + VBG_KERNEL_REQUEST); + req2 = vbg_req_alloc(sizeof(*req2), VMMDEVREQ_REPORT_GUEST_INFO2, + VBG_KERNEL_REQUEST); if (!req1 || !req2) goto out_free; @@ -187,8 +195,8 @@ static int vbg_report_guest_info(struct vbg_dev *gdev) req2->additions_minor = VBG_VERSION_MINOR; req2->additions_build = VBG_VERSION_BUILD; req2->additions_revision = VBG_SVN_REV; - /* (no features defined yet) */ - req2->additions_features = 0; + req2->additions_features = + VMMDEV_GUEST_INFO2_ADDITIONS_FEATURES_REQUESTOR_INFO; strlcpy(req2->name, VBG_VERSION_STRING, sizeof(req2->name)); @@ -230,7 +238,8 @@ static int vbg_report_driver_status(struct vbg_dev *gdev, bool active) struct vmmdev_guest_status *req; int rc; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_REPORT_GUEST_STATUS); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_REPORT_GUEST_STATUS, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; @@ -423,7 +432,8 @@ static int vbg_heartbeat_host_config(struct vbg_dev *gdev, bool enabled) struct vmmdev_heartbeat *req; int rc; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_HEARTBEAT_CONFIGURE); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_HEARTBEAT_CONFIGURE, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; @@ -457,7 +467,8 @@ static int vbg_heartbeat_init(struct vbg_dev *gdev) gdev->guest_heartbeat_req = vbg_req_alloc( sizeof(*gdev->guest_heartbeat_req), - VMMDEVREQ_GUEST_HEARTBEAT); + VMMDEVREQ_GUEST_HEARTBEAT, + VBG_KERNEL_REQUEST); if (!gdev->guest_heartbeat_req) return -ENOMEM; @@ -528,7 +539,8 @@ static int vbg_reset_host_event_filter(struct vbg_dev *gdev, struct vmmdev_mask *req; int rc; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_CTL_GUEST_FILTER_MASK); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_CTL_GUEST_FILTER_MASK, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; @@ -567,8 +579,14 @@ static int vbg_set_session_event_filter(struct vbg_dev *gdev, u32 changed, previous; int rc, ret = 0; - /* Allocate a request buffer before taking the spinlock */ - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_CTL_GUEST_FILTER_MASK); + /* + * Allocate a request buffer before taking the spinlock, when + * the session is being terminated the requestor is the kernel, + * as we're cleaning up. + */ + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_CTL_GUEST_FILTER_MASK, + session_termination ? VBG_KERNEL_REQUEST : + session->requestor); if (!req) { if (!session_termination) return -ENOMEM; @@ -627,7 +645,8 @@ static int vbg_reset_host_capabilities(struct vbg_dev *gdev) struct vmmdev_mask *req; int rc; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_GUEST_CAPABILITIES); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_GUEST_CAPABILITIES, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; @@ -662,8 +681,14 @@ static int vbg_set_session_capabilities(struct vbg_dev *gdev, u32 changed, previous; int rc, ret = 0; - /* Allocate a request buffer before taking the spinlock */ - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_GUEST_CAPABILITIES); + /* + * Allocate a request buffer before taking the spinlock, when + * the session is being terminated the requestor is the kernel, + * as we're cleaning up. + */ + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_GUEST_CAPABILITIES, + session_termination ? VBG_KERNEL_REQUEST : + session->requestor); if (!req) { if (!session_termination) return -ENOMEM; @@ -722,7 +747,8 @@ static int vbg_query_host_version(struct vbg_dev *gdev) struct vmmdev_host_version *req; int rc, ret; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_GET_HOST_VERSION); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_GET_HOST_VERSION, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; @@ -783,19 +809,24 @@ int vbg_core_init(struct vbg_dev *gdev, u32 fixed_events) gdev->mem_balloon.get_req = vbg_req_alloc(sizeof(*gdev->mem_balloon.get_req), - VMMDEVREQ_GET_MEMBALLOON_CHANGE_REQ); + VMMDEVREQ_GET_MEMBALLOON_CHANGE_REQ, + VBG_KERNEL_REQUEST); gdev->mem_balloon.change_req = vbg_req_alloc(sizeof(*gdev->mem_balloon.change_req), - VMMDEVREQ_CHANGE_MEMBALLOON); + VMMDEVREQ_CHANGE_MEMBALLOON, + VBG_KERNEL_REQUEST); gdev->cancel_req = vbg_req_alloc(sizeof(*(gdev->cancel_req)), - VMMDEVREQ_HGCM_CANCEL2); + VMMDEVREQ_HGCM_CANCEL2, + VBG_KERNEL_REQUEST); gdev->ack_events_req = vbg_req_alloc(sizeof(*gdev->ack_events_req), - VMMDEVREQ_ACKNOWLEDGE_EVENTS); + VMMDEVREQ_ACKNOWLEDGE_EVENTS, + VBG_KERNEL_REQUEST); gdev->mouse_status_req = vbg_req_alloc(sizeof(*gdev->mouse_status_req), - VMMDEVREQ_GET_MOUSE_STATUS); + VMMDEVREQ_GET_MOUSE_STATUS, + VBG_KERNEL_REQUEST); if (!gdev->mem_balloon.get_req || !gdev->mem_balloon.change_req || !gdev->cancel_req || !gdev->ack_events_req || @@ -892,9 +923,9 @@ void vbg_core_exit(struct vbg_dev *gdev) * vboxguest_linux.c calls this when userspace opens the char-device. * Return: A pointer to the new session or an ERR_PTR on error. * @gdev: The Guest extension device. - * @user: Set if this is a session for the vboxuser device. + * @requestor: VMMDEV_REQUESTOR_* flags */ -struct vbg_session *vbg_core_open_session(struct vbg_dev *gdev, bool user) +struct vbg_session *vbg_core_open_session(struct vbg_dev *gdev, u32 requestor) { struct vbg_session *session; @@ -903,7 +934,7 @@ struct vbg_session *vbg_core_open_session(struct vbg_dev *gdev, bool user) return ERR_PTR(-ENOMEM); session->gdev = gdev; - session->user_session = user; + session->requestor = requestor; return session; } @@ -924,7 +955,9 @@ void vbg_core_close_session(struct vbg_session *session) if (!session->hgcm_client_ids[i]) continue; - vbg_hgcm_disconnect(gdev, session->hgcm_client_ids[i], &rc); + /* requestor is kernel here, as we're cleaning up. */ + vbg_hgcm_disconnect(gdev, VBG_KERNEL_REQUEST, + session->hgcm_client_ids[i], &rc); } kfree(session); @@ -1152,7 +1185,8 @@ static int vbg_req_allowed(struct vbg_dev *gdev, struct vbg_session *session, return -EPERM; } - if (trusted_apps_only && session->user_session) { + if (trusted_apps_only && + (session->requestor & VMMDEV_REQUESTOR_USER_DEVICE)) { vbg_err("Denying userspace vmm call type %#08x through vboxuser device node\n", req->request_type); return -EPERM; @@ -1209,8 +1243,8 @@ static int vbg_ioctl_hgcm_connect(struct vbg_dev *gdev, if (i >= ARRAY_SIZE(session->hgcm_client_ids)) return -EMFILE; - ret = vbg_hgcm_connect(gdev, &conn->u.in.loc, &client_id, - &conn->hdr.rc); + ret = vbg_hgcm_connect(gdev, session->requestor, &conn->u.in.loc, + &client_id, &conn->hdr.rc); mutex_lock(&gdev->session_mutex); if (ret == 0 && conn->hdr.rc >= 0) { @@ -1251,7 +1285,8 @@ static int vbg_ioctl_hgcm_disconnect(struct vbg_dev *gdev, if (i >= ARRAY_SIZE(session->hgcm_client_ids)) return -EINVAL; - ret = vbg_hgcm_disconnect(gdev, client_id, &disconn->hdr.rc); + ret = vbg_hgcm_disconnect(gdev, session->requestor, client_id, + &disconn->hdr.rc); mutex_lock(&gdev->session_mutex); if (ret == 0 && disconn->hdr.rc >= 0) @@ -1313,12 +1348,12 @@ static int vbg_ioctl_hgcm_call(struct vbg_dev *gdev, } if (IS_ENABLED(CONFIG_COMPAT) && f32bit) - ret = vbg_hgcm_call32(gdev, client_id, + ret = vbg_hgcm_call32(gdev, session->requestor, client_id, call->function, call->timeout_ms, VBG_IOCTL_HGCM_CALL_PARMS32(call), call->parm_count, &call->hdr.rc); else - ret = vbg_hgcm_call(gdev, client_id, + ret = vbg_hgcm_call(gdev, session->requestor, client_id, call->function, call->timeout_ms, VBG_IOCTL_HGCM_CALL_PARMS(call), call->parm_count, &call->hdr.rc); @@ -1408,6 +1443,7 @@ static int vbg_ioctl_check_balloon(struct vbg_dev *gdev, } static int vbg_ioctl_write_core_dump(struct vbg_dev *gdev, + struct vbg_session *session, struct vbg_ioctl_write_coredump *dump) { struct vmmdev_write_core_dump *req; @@ -1415,7 +1451,8 @@ static int vbg_ioctl_write_core_dump(struct vbg_dev *gdev, if (vbg_ioctl_chk(&dump->hdr, sizeof(dump->u.in), 0)) return -EINVAL; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_WRITE_COREDUMP); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_WRITE_COREDUMP, + session->requestor); if (!req) return -ENOMEM; @@ -1476,7 +1513,7 @@ int vbg_core_ioctl(struct vbg_session *session, unsigned int req, void *data) case VBG_IOCTL_CHECK_BALLOON: return vbg_ioctl_check_balloon(gdev, data); case VBG_IOCTL_WRITE_CORE_DUMP: - return vbg_ioctl_write_core_dump(gdev, data); + return vbg_ioctl_write_core_dump(gdev, session, data); } /* Variable sized requests. */ @@ -1508,7 +1545,8 @@ int vbg_core_set_mouse_status(struct vbg_dev *gdev, u32 features) struct vmmdev_mouse_status *req; int rc; - req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_MOUSE_STATUS); + req = vbg_req_alloc(sizeof(*req), VMMDEVREQ_SET_MOUSE_STATUS, + VBG_KERNEL_REQUEST); if (!req) return -ENOMEM; diff --git a/drivers/virt/vboxguest/vboxguest_core.h b/drivers/virt/vboxguest/vboxguest_core.h index 7ad9ec45bfa9..4188c12b839f 100644 --- a/drivers/virt/vboxguest/vboxguest_core.h +++ b/drivers/virt/vboxguest/vboxguest_core.h @@ -154,15 +154,15 @@ struct vbg_session { * host. Protected by vbg_gdev.session_mutex. */ u32 guest_caps; - /** Does this session belong to a root process or a user one? */ - bool user_session; + /** VMMDEV_REQUESTOR_* flags */ + u32 requestor; /** Set on CANCEL_ALL_WAITEVENTS, protected by vbg_devevent_spinlock. */ bool cancel_waiters; }; int vbg_core_init(struct vbg_dev *gdev, u32 fixed_events); void vbg_core_exit(struct vbg_dev *gdev); -struct vbg_session *vbg_core_open_session(struct vbg_dev *gdev, bool user); +struct vbg_session *vbg_core_open_session(struct vbg_dev *gdev, u32 requestor); void vbg_core_close_session(struct vbg_session *session); int vbg_core_ioctl(struct vbg_session *session, unsigned int req, void *data); int vbg_core_set_mouse_status(struct vbg_dev *gdev, u32 features); @@ -172,12 +172,13 @@ irqreturn_t vbg_core_isr(int irq, void *dev_id); void vbg_linux_mouse_event(struct vbg_dev *gdev); /* Private (non exported) functions form vboxguest_utils.c */ -void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type); +void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type, + u32 requestor); void vbg_req_free(void *req, size_t len); int vbg_req_perform(struct vbg_dev *gdev, void *req); int vbg_hgcm_call32( - struct vbg_dev *gdev, u32 client_id, u32 function, u32 timeout_ms, - struct vmmdev_hgcm_function_parameter32 *parm32, u32 parm_count, - int *vbox_status); + struct vbg_dev *gdev, u32 requestor, u32 client_id, u32 function, + u32 timeout_ms, struct vmmdev_hgcm_function_parameter32 *parm32, + u32 parm_count, int *vbox_status); #endif diff --git a/drivers/virt/vboxguest/vboxguest_linux.c b/drivers/virt/vboxguest/vboxguest_linux.c index 6e2a9619192d..6e8c0f1c1056 100644 --- a/drivers/virt/vboxguest/vboxguest_linux.c +++ b/drivers/virt/vboxguest/vboxguest_linux.c @@ -5,6 +5,7 @@ * Copyright (C) 2006-2016 Oracle Corporation */ +#include #include #include #include @@ -28,6 +29,23 @@ static DEFINE_MUTEX(vbg_gdev_mutex); /** Global vbg_gdev pointer used by vbg_get/put_gdev. */ static struct vbg_dev *vbg_gdev; +static u32 vbg_misc_device_requestor(struct inode *inode) +{ + u32 requestor = VMMDEV_REQUESTOR_USERMODE | + VMMDEV_REQUESTOR_CON_DONT_KNOW | + VMMDEV_REQUESTOR_TRUST_NOT_GIVEN; + + if (from_kuid(current_user_ns(), current->cred->uid) == 0) + requestor |= VMMDEV_REQUESTOR_USR_ROOT; + else + requestor |= VMMDEV_REQUESTOR_USR_USER; + + if (in_egroup_p(inode->i_gid)) + requestor |= VMMDEV_REQUESTOR_GRP_VBOX; + + return requestor; +} + static int vbg_misc_device_open(struct inode *inode, struct file *filp) { struct vbg_session *session; @@ -36,7 +54,7 @@ static int vbg_misc_device_open(struct inode *inode, struct file *filp) /* misc_open sets filp->private_data to our misc device */ gdev = container_of(filp->private_data, struct vbg_dev, misc_device); - session = vbg_core_open_session(gdev, false); + session = vbg_core_open_session(gdev, vbg_misc_device_requestor(inode)); if (IS_ERR(session)) return PTR_ERR(session); @@ -53,7 +71,8 @@ static int vbg_misc_device_user_open(struct inode *inode, struct file *filp) gdev = container_of(filp->private_data, struct vbg_dev, misc_device_user); - session = vbg_core_open_session(gdev, false); + session = vbg_core_open_session(gdev, vbg_misc_device_requestor(inode) | + VMMDEV_REQUESTOR_USER_DEVICE); if (IS_ERR(session)) return PTR_ERR(session); @@ -115,7 +134,8 @@ static long vbg_misc_device_ioctl(struct file *filp, unsigned int req, req == VBG_IOCTL_VMMDEV_REQUEST_BIG; if (is_vmmdev_req) - buf = vbg_req_alloc(size, VBG_IOCTL_HDR_TYPE_DEFAULT); + buf = vbg_req_alloc(size, VBG_IOCTL_HDR_TYPE_DEFAULT, + session->requestor); else buf = kmalloc(size, GFP_KERNEL); if (!buf) diff --git a/drivers/virt/vboxguest/vboxguest_utils.c b/drivers/virt/vboxguest/vboxguest_utils.c index bf4474214b4d..75fd140b02ff 100644 --- a/drivers/virt/vboxguest/vboxguest_utils.c +++ b/drivers/virt/vboxguest/vboxguest_utils.c @@ -62,7 +62,8 @@ VBG_LOG(vbg_err, pr_err); VBG_LOG(vbg_debug, pr_debug); #endif -void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type) +void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type, + u32 requestor) { struct vmmdev_request_header *req; int order = get_order(PAGE_ALIGN(len)); @@ -78,7 +79,7 @@ void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type) req->request_type = req_type; req->rc = VERR_GENERAL_FAILURE; req->reserved1 = 0; - req->reserved2 = 0; + req->requestor = requestor; return req; } @@ -119,7 +120,7 @@ static bool hgcm_req_done(struct vbg_dev *gdev, return done; } -int vbg_hgcm_connect(struct vbg_dev *gdev, +int vbg_hgcm_connect(struct vbg_dev *gdev, u32 requestor, struct vmmdev_hgcm_service_location *loc, u32 *client_id, int *vbox_status) { @@ -127,7 +128,7 @@ int vbg_hgcm_connect(struct vbg_dev *gdev, int rc; hgcm_connect = vbg_req_alloc(sizeof(*hgcm_connect), - VMMDEVREQ_HGCM_CONNECT); + VMMDEVREQ_HGCM_CONNECT, requestor); if (!hgcm_connect) return -ENOMEM; @@ -153,13 +154,15 @@ int vbg_hgcm_connect(struct vbg_dev *gdev, } EXPORT_SYMBOL(vbg_hgcm_connect); -int vbg_hgcm_disconnect(struct vbg_dev *gdev, u32 client_id, int *vbox_status) +int vbg_hgcm_disconnect(struct vbg_dev *gdev, u32 requestor, + u32 client_id, int *vbox_status) { struct vmmdev_hgcm_disconnect *hgcm_disconnect = NULL; int rc; hgcm_disconnect = vbg_req_alloc(sizeof(*hgcm_disconnect), - VMMDEVREQ_HGCM_DISCONNECT); + VMMDEVREQ_HGCM_DISCONNECT, + requestor); if (!hgcm_disconnect) return -ENOMEM; @@ -593,9 +596,10 @@ static int hgcm_call_copy_back_result( return 0; } -int vbg_hgcm_call(struct vbg_dev *gdev, u32 client_id, u32 function, - u32 timeout_ms, struct vmmdev_hgcm_function_parameter *parms, - u32 parm_count, int *vbox_status) +int vbg_hgcm_call(struct vbg_dev *gdev, u32 requestor, u32 client_id, + u32 function, u32 timeout_ms, + struct vmmdev_hgcm_function_parameter *parms, u32 parm_count, + int *vbox_status) { struct vmmdev_hgcm_call *call; void **bounce_bufs = NULL; @@ -615,7 +619,7 @@ int vbg_hgcm_call(struct vbg_dev *gdev, u32 client_id, u32 function, goto free_bounce_bufs; } - call = vbg_req_alloc(size, VMMDEVREQ_HGCM_CALL); + call = vbg_req_alloc(size, VMMDEVREQ_HGCM_CALL, requestor); if (!call) { ret = -ENOMEM; goto free_bounce_bufs; @@ -647,9 +651,9 @@ EXPORT_SYMBOL(vbg_hgcm_call); #ifdef CONFIG_COMPAT int vbg_hgcm_call32( - struct vbg_dev *gdev, u32 client_id, u32 function, u32 timeout_ms, - struct vmmdev_hgcm_function_parameter32 *parm32, u32 parm_count, - int *vbox_status) + struct vbg_dev *gdev, u32 requestor, u32 client_id, u32 function, + u32 timeout_ms, struct vmmdev_hgcm_function_parameter32 *parm32, + u32 parm_count, int *vbox_status) { struct vmmdev_hgcm_function_parameter *parm64 = NULL; u32 i, size; @@ -689,7 +693,7 @@ int vbg_hgcm_call32( goto out_free; } - ret = vbg_hgcm_call(gdev, client_id, function, timeout_ms, + ret = vbg_hgcm_call(gdev, requestor, client_id, function, timeout_ms, parm64, parm_count, vbox_status); if (ret < 0) goto out_free; diff --git a/drivers/virt/vboxguest/vboxguest_version.h b/drivers/virt/vboxguest/vboxguest_version.h index 77f0c8f8a231..84834dad38d5 100644 --- a/drivers/virt/vboxguest/vboxguest_version.h +++ b/drivers/virt/vboxguest/vboxguest_version.h @@ -9,11 +9,10 @@ #ifndef __VBOX_VERSION_H__ #define __VBOX_VERSION_H__ -/* Last synced October 4th 2017 */ -#define VBG_VERSION_MAJOR 5 -#define VBG_VERSION_MINOR 2 +#define VBG_VERSION_MAJOR 6 +#define VBG_VERSION_MINOR 0 #define VBG_VERSION_BUILD 0 -#define VBG_SVN_REV 68940 -#define VBG_VERSION_STRING "5.2.0" +#define VBG_SVN_REV 127566 +#define VBG_VERSION_STRING "6.0.0" #endif diff --git a/drivers/virt/vboxguest/vmmdev.h b/drivers/virt/vboxguest/vmmdev.h index 5e2ae978935d..6337b8d75d96 100644 --- a/drivers/virt/vboxguest/vmmdev.h +++ b/drivers/virt/vboxguest/vmmdev.h @@ -98,8 +98,8 @@ struct vmmdev_request_header { s32 rc; /** Reserved field no.1. MBZ. */ u32 reserved1; - /** Reserved field no.2. MBZ. */ - u32 reserved2; + /** IN: Requestor information (VMMDEV_REQUESTOR_*) */ + u32 requestor; }; VMMDEV_ASSERT_SIZE(vmmdev_request_header, 24); @@ -247,6 +247,8 @@ struct vmmdev_guest_info { }; VMMDEV_ASSERT_SIZE(vmmdev_guest_info, 24 + 8); +#define VMMDEV_GUEST_INFO2_ADDITIONS_FEATURES_REQUESTOR_INFO BIT(0) + /** struct vmmdev_guestinfo2 - Guest information report, version 2. */ struct vmmdev_guest_info2 { /** Header. */ @@ -259,7 +261,7 @@ struct vmmdev_guest_info2 { u32 additions_build; /** SVN revision. */ u32 additions_revision; - /** Feature mask, currently unused. */ + /** Feature mask. */ u32 additions_features; /** * The intentional meaning of this field was: diff --git a/include/linux/vbox_utils.h b/include/linux/vbox_utils.h index a240ed2a0372..ff56c443180c 100644 --- a/include/linux/vbox_utils.h +++ b/include/linux/vbox_utils.h @@ -24,15 +24,17 @@ __printf(1, 2) void vbg_debug(const char *fmt, ...); #define vbg_debug pr_debug #endif -int vbg_hgcm_connect(struct vbg_dev *gdev, +int vbg_hgcm_connect(struct vbg_dev *gdev, u32 requestor, struct vmmdev_hgcm_service_location *loc, u32 *client_id, int *vbox_status); -int vbg_hgcm_disconnect(struct vbg_dev *gdev, u32 client_id, int *vbox_status); +int vbg_hgcm_disconnect(struct vbg_dev *gdev, u32 requestor, + u32 client_id, int *vbox_status); -int vbg_hgcm_call(struct vbg_dev *gdev, u32 client_id, u32 function, - u32 timeout_ms, struct vmmdev_hgcm_function_parameter *parms, - u32 parm_count, int *vbox_status); +int vbg_hgcm_call(struct vbg_dev *gdev, u32 requestor, u32 client_id, + u32 function, u32 timeout_ms, + struct vmmdev_hgcm_function_parameter *parms, u32 parm_count, + int *vbox_status); /** * Convert a VirtualBox status code to a standard Linux kernel return value. diff --git a/include/uapi/linux/vbox_vmmdev_types.h b/include/uapi/linux/vbox_vmmdev_types.h index 0e68024f36c7..26f39816af14 100644 --- a/include/uapi/linux/vbox_vmmdev_types.h +++ b/include/uapi/linux/vbox_vmmdev_types.h @@ -102,6 +102,66 @@ enum vmmdev_request_type { #define VMMDEVREQ_HGCM_CALL VMMDEVREQ_HGCM_CALL32 #endif +/* vmmdev_request_header.requestor defines */ + +/* Requestor user not given. */ +#define VMMDEV_REQUESTOR_USR_NOT_GIVEN 0x00000000 +/* The kernel driver (vboxguest) is the requestor. */ +#define VMMDEV_REQUESTOR_USR_DRV 0x00000001 +/* Some other kernel driver is the requestor. */ +#define VMMDEV_REQUESTOR_USR_DRV_OTHER 0x00000002 +/* The root or a admin user is the requestor. */ +#define VMMDEV_REQUESTOR_USR_ROOT 0x00000003 +/* Regular joe user is making the request. */ +#define VMMDEV_REQUESTOR_USR_USER 0x00000006 +/* User classification mask. */ +#define VMMDEV_REQUESTOR_USR_MASK 0x00000007 + +/* Kernel mode request. Note this is 0, check for !USERMODE instead. */ +#define VMMDEV_REQUESTOR_KERNEL 0x00000000 +/* User mode request. */ +#define VMMDEV_REQUESTOR_USERMODE 0x00000008 +/* User or kernel mode classification mask. */ +#define VMMDEV_REQUESTOR_MODE_MASK 0x00000008 + +/* Don't know the physical console association of the requestor. */ +#define VMMDEV_REQUESTOR_CON_DONT_KNOW 0x00000000 +/* + * The request originates with a process that is NOT associated with the + * physical console. + */ +#define VMMDEV_REQUESTOR_CON_NO 0x00000010 +/* Requestor process is associated with the physical console. */ +#define VMMDEV_REQUESTOR_CON_YES 0x00000020 +/* Console classification mask. */ +#define VMMDEV_REQUESTOR_CON_MASK 0x00000030 + +/* Requestor is member of special VirtualBox user group. */ +#define VMMDEV_REQUESTOR_GRP_VBOX 0x00000080 + +/* Note: trust level is for windows guests only, linux always uses not-given */ +/* Requestor trust level: Unspecified */ +#define VMMDEV_REQUESTOR_TRUST_NOT_GIVEN 0x00000000 +/* Requestor trust level: Untrusted (SID S-1-16-0) */ +#define VMMDEV_REQUESTOR_TRUST_UNTRUSTED 0x00001000 +/* Requestor trust level: Untrusted (SID S-1-16-4096) */ +#define VMMDEV_REQUESTOR_TRUST_LOW 0x00002000 +/* Requestor trust level: Medium (SID S-1-16-8192) */ +#define VMMDEV_REQUESTOR_TRUST_MEDIUM 0x00003000 +/* Requestor trust level: Medium plus (SID S-1-16-8448) */ +#define VMMDEV_REQUESTOR_TRUST_MEDIUM_PLUS 0x00004000 +/* Requestor trust level: High (SID S-1-16-12288) */ +#define VMMDEV_REQUESTOR_TRUST_HIGH 0x00005000 +/* Requestor trust level: System (SID S-1-16-16384) */ +#define VMMDEV_REQUESTOR_TRUST_SYSTEM 0x00006000 +/* Requestor trust level >= Protected (SID S-1-16-20480, S-1-16-28672) */ +#define VMMDEV_REQUESTOR_TRUST_PROTECTED 0x00007000 +/* Requestor trust level mask */ +#define VMMDEV_REQUESTOR_TRUST_MASK 0x00007000 + +/* Requestor is using the less trusted user device node (/dev/vboxuser) */ +#define VMMDEV_REQUESTOR_USER_DEVICE 0x00008000 + /** HGCM service location types. */ enum vmmdev_hgcm_service_location_type { VMMDEV_HGCM_LOC_INVALID = 0, From daf5cc27eed99afdea8d96e71b89ba41f5406ef6 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 26 Mar 2019 01:38:58 +0000 Subject: [PATCH 360/468] ceph: fix use-after-free on symlink traversal free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory. Signed-off-by: Al Viro Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov --- fs/ceph/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index e3346628efe2..2d61ddda9bf5 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -524,6 +524,7 @@ static void ceph_i_callback(struct rcu_head *head) struct inode *inode = container_of(head, struct inode, i_rcu); struct ceph_inode_info *ci = ceph_inode(inode); + kfree(ci->i_symlink); kmem_cache_free(ceph_inode_cachep, ci); } @@ -566,7 +567,6 @@ void ceph_destroy_inode(struct inode *inode) } } - kfree(ci->i_symlink); while ((n = rb_first(&ci->i_fragtree)) != NULL) { frag = rb_entry(n, struct ceph_inode_frag, node); rb_erase(n, &ci->i_fragtree); From 9e0a17db517d83d568bb7fa983b54759d4e34f1f Mon Sep 17 00:00:00 2001 From: Chen Zhou Date: Wed, 27 Mar 2019 21:51:16 +0800 Subject: [PATCH 361/468] arm64: replace memblock_alloc_low with memblock_alloc If we use "crashkernel=Y[@X]" and the start address is above 4G, the arm64 kdump capture kernel may call memblock_alloc_low() failure in request_standard_resources(). Replacing memblock_alloc_low() with memblock_alloc(). [ 0.000000] MEMBLOCK configuration: [ 0.000000] memory size = 0x0000000040650000 reserved size = 0x0000000004db7f39 [ 0.000000] memory.cnt = 0x6 [ 0.000000] memory[0x0] [0x00000000395f0000-0x000000003968ffff], 0x00000000000a0000 bytes on node 0 flags: 0x4 [ 0.000000] memory[0x1] [0x0000000039730000-0x000000003973ffff], 0x0000000000010000 bytes on node 0 flags: 0x4 [ 0.000000] memory[0x2] [0x0000000039780000-0x000000003986ffff], 0x00000000000f0000 bytes on node 0 flags: 0x4 [ 0.000000] memory[0x3] [0x0000000039890000-0x0000000039d0ffff], 0x0000000000480000 bytes on node 0 flags: 0x4 [ 0.000000] memory[0x4] [0x000000003ed00000-0x000000003ed2ffff], 0x0000000000030000 bytes on node 0 flags: 0x4 [ 0.000000] memory[0x5] [0x0000002040000000-0x000000207fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0 [ 0.000000] reserved.cnt = 0x7 [ 0.000000] reserved[0x0] [0x0000002040080000-0x0000002041c4dfff], 0x0000000001bce000 bytes flags: 0x0 [ 0.000000] reserved[0x1] [0x0000002041c53000-0x0000002042c203f8], 0x0000000000fcd3f9 bytes flags: 0x0 [ 0.000000] reserved[0x2] [0x000000207da00000-0x000000207dbfffff], 0x0000000000200000 bytes flags: 0x0 [ 0.000000] reserved[0x3] [0x000000207ddef000-0x000000207fbfffff], 0x0000000001e11000 bytes flags: 0x0 [ 0.000000] reserved[0x4] [0x000000207fdf2b00-0x000000207fdfc03f], 0x0000000000009540 bytes flags: 0x0 [ 0.000000] reserved[0x5] [0x000000207fdfd000-0x000000207ffff3ff], 0x0000000000202400 bytes flags: 0x0 [ 0.000000] reserved[0x6] [0x000000207ffffe00-0x000000207fffffff], 0x0000000000000200 bytes flags: 0x0 [ 0.000000] Kernel panic - not syncing: request_standard_resources: Failed to allocate 384 bytes [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-next-20190321+ #4 [ 0.000000] Call trace: [ 0.000000] dump_backtrace+0x0/0x188 [ 0.000000] show_stack+0x24/0x30 [ 0.000000] dump_stack+0xa8/0xcc [ 0.000000] panic+0x14c/0x31c [ 0.000000] setup_arch+0x2b0/0x5e0 [ 0.000000] start_kernel+0x90/0x52c [ 0.000000] ---[ end Kernel panic - not syncing: request_standard_resources: Failed to allocate 384 bytes ]--- Link: https://www.spinics.net/lists/arm-kernel/msg715293.html Signed-off-by: Chen Zhou Signed-off-by: Catalin Marinas --- arch/arm64/kernel/setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index f8482fe5a190..413d566405d1 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -217,7 +217,7 @@ static void __init request_standard_resources(void) num_standard_resources = memblock.memory.cnt; res_size = num_standard_resources * sizeof(*standard_resources); - standard_resources = memblock_alloc_low(res_size, SMP_CACHE_BYTES); + standard_resources = memblock_alloc(res_size, SMP_CACHE_BYTES); if (!standard_resources) panic("%s: Failed to allocate %zu bytes\n", __func__, res_size); From 70fc085c5015c54a7b8742a45fc9ab05d6da90da Mon Sep 17 00:00:00 2001 From: zhengbin Date: Fri, 22 Mar 2019 10:56:46 +0800 Subject: [PATCH 362/468] scsi: core: Run queue when state is set to running after being blocked Use dd to test a SCSI device: 1. echo "blocked" >/sys/block/sda/device/state 2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10 3. echo "running" >/sys/block/sda/device/state dd should finish this work after step 3, but it hangs. After step2, the call chain is this: blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transition it to BLK_STS_DEV_RESOURCE which means that driver can guarantee that IO dispatch will be triggered in future when the resource is available. Need to follow the rule if we set the device state to running. [mkp: tweaked commit description and code comment as suggested by Bart] Signed-off-by: zhengbin Reviewed-by: Ming Lei Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen --- drivers/scsi/scsi_sysfs.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 6a9040faed00..3b119ca0cc0c 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -771,6 +771,12 @@ store_state_field(struct device *dev, struct device_attribute *attr, mutex_lock(&sdev->state_mutex); ret = scsi_device_set_state(sdev, state); + /* + * If the device state changes to SDEV_RUNNING, we need to run + * the queue to avoid I/O hang. + */ + if (ret == 0 && state == SDEV_RUNNING) + blk_mq_run_hw_queues(sdev->request_queue, true); mutex_unlock(&sdev->state_mutex); return ret == 0 ? count : -EINVAL; From c14a57264399efd39514a2329c591a4b954246d8 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 25 Mar 2019 10:01:46 -0700 Subject: [PATCH 363/468] scsi: sd: Fix a race between closing an sd device and sd I/O The scsi_end_request() function calls scsi_cmd_to_driver() indirectly and hence needs the disk->private_data pointer. Avoid that that pointer is cleared before all affected I/O requests have finished. This patch avoids that the following crash occurs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: scsi_mq_uninit_cmd+0x1c/0x30 scsi_end_request+0x7c/0x1b8 scsi_io_completion+0x464/0x668 scsi_finish_command+0xbc/0x160 scsi_eh_flush_done_q+0x10c/0x170 sas_scsi_recover_host+0x84c/0xa98 [libsas] scsi_error_handler+0x140/0x5b0 kthread+0x100/0x12c ret_from_fork+0x10/0x18 Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: Jason Yan Cc: Signed-off-by: Bart Van Assche Reported-by: Jason Yan Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen --- drivers/scsi/sd.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 251db30d0882..3ffa34dc2fd5 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1415,11 +1415,6 @@ static void sd_release(struct gendisk *disk, fmode_t mode) scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW); } - /* - * XXX and what if there are packets in flight and this close() - * XXX is followed by a "rmmod sd_mod"? - */ - scsi_disk_put(sdkp); } @@ -3505,9 +3500,21 @@ static void scsi_disk_release(struct device *dev) { struct scsi_disk *sdkp = to_scsi_disk(dev); struct gendisk *disk = sdkp->disk; - + struct request_queue *q = disk->queue; + ida_free(&sd_index_ida, sdkp->index); + /* + * Wait until all requests that are in progress have completed. + * This is necessary to avoid that e.g. scsi_end_request() crashes + * due to clearing the disk->private_data pointer. Wait from inside + * scsi_disk_release() instead of from sd_release() to avoid that + * freezing and unfreezing the request queue affects user space I/O + * in case multiple processes open a /dev/sd... node concurrently. + */ + blk_mq_freeze_queue(q); + blk_mq_unfreeze_queue(q); + disk->private_data = NULL; put_disk(disk); put_device(&sdkp->device->sdev_gendev); From 1d5de5bd311be7cd54f02f7cd164f0349a75c876 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Wed, 27 Mar 2019 12:11:52 -0400 Subject: [PATCH 364/468] scsi: sd: Quiesce warning if device does not report optimal I/O size Commit a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of physical block size") split one conditional into several separate statements in an effort to provide more accurate warning messages when a device reports a nonsensical value. However, this reorganization accidentally dropped the precondition of the reported value being larger than zero. This lead to a warning getting emitted on devices that do not report an optimal I/O size at all. Remain silent if a device does not report an optimal I/O size. Fixes: a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of physical block size") Cc: Randy Dunlap Cc: Reported-by: Hussam Al-Tayeb Tested-by: Hussam Al-Tayeb Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen --- drivers/scsi/sd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 3ffa34dc2fd5..2b2bc4b49d78 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3071,6 +3071,9 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp, unsigned int opt_xfer_bytes = logical_to_bytes(sdp, sdkp->opt_xfer_blocks); + if (sdkp->opt_xfer_blocks == 0) + return false; + if (sdkp->opt_xfer_blocks > dev_max) { sd_first_printk(KERN_WARNING, sdkp, "Optimal transfer size %u logical blocks " \ From fe67888fc007a76b81e37da23ce5bd8fb95890b0 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Tue, 26 Mar 2019 14:36:58 +0100 Subject: [PATCH 365/468] scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host An already deleted SCSI device can exist on the Scsi_Host and remain there because something still holds a reference. A new SCSI device with the same H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created. When we try to unblock an rport, we still find the deleted SCSI device and return early because the zfcp_scsi_dev of that SCSI device is not ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if the new proper SCSI device would be in good state. Therefore, skip deleted SCSI devices when iterating the sdevs of the shost. [cf. __scsi_device_lookup{_by_target}() or scsi_device_get()] The following abbreviated trace sequence can indicate such problem: Area : REC Tag : ersfs_3 LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x40000000 not ZFCP_STATUS_COMMON_UNBLOCKED Ready count : n not incremented yet Running count : 0x00000000 ERP want : 0x01 ERP need : 0xc1 ZFCP_ERP_ACTION_NONE Area : REC Tag : ersfs_3 LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x41000000 Ready count : n+1 Running count : 0x00000000 ERP want : 0x01 ERP need : 0x01 ... Area : REC Level : 4 only with increased trace level Tag : ertru_l LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x40000000 Request ID : 0x0000000000000000 ERP status : 0x01800000 ERP step : 0x1000 ERP action : 0x01 ERP count : 0x00 NOT followed by a trace record with tag "scpaddy" for WWPN 0x50050763031bd327. Signed-off-by: Steffen Maier Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: #2.6.32+ Reviewed-by: Jens Remus Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen --- drivers/s390/scsi/zfcp_erp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index 744a64680d5b..c0b2348d7ce6 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -1341,6 +1341,9 @@ static void zfcp_erp_try_rport_unblock(struct zfcp_port *port) struct zfcp_scsi_dev *zsdev = sdev_to_zfcp(sdev); int lun_status; + if (sdev->sdev_state == SDEV_DEL || + sdev->sdev_state == SDEV_CANCEL) + continue; if (zsdev->port != port) continue; /* LUN under port of interest */ From 242ec1455151267fe35a0834aa9038e4c4670884 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Tue, 26 Mar 2019 14:36:59 +0100 Subject: [PATCH 366/468] scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices Suppose more than one non-NPIV FCP device is active on the same channel. Send I/O to storage and have some of the pending I/O run into a SCSI command timeout, e.g. due to bit errors on the fibre. Now the error situation stops. However, we saw FCP requests continue to timeout in the channel. The abort will be successful, but the subsequent TUR fails. Scsi_eh starts. The LUN reset fails. The target reset fails. The host reset only did an FCP device recovery. However, for non-NPIV FCP devices, this does not close and reopen ports on the SAN-side if other non-NPIV FCP device(s) share the same open ports. In order to resolve the continuing FCP request timeouts, we need to explicitly close and reopen ports on the SAN-side. This was missing since the beginning of zfcp in v2.6.0 history commit ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter."). Note: The FSF requests for forced port reopen could run into FSF request timeouts due to other reasons. This would trigger an internal FCP device recovery. Pending forced port reopen recoveries would get dismissed. So some ports might not get fully reopened during this host reset handler. However, subsequent I/O would trigger the above described escalation and eventually all ports would be forced reopen to resolve any continuing FCP request timeouts due to earlier bit errors. Signed-off-by: Steffen Maier Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: #3.0+ Reviewed-by: Jens Remus Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen --- drivers/s390/scsi/zfcp_erp.c | 14 ++++++++++++++ drivers/s390/scsi/zfcp_ext.h | 2 ++ drivers/s390/scsi/zfcp_scsi.c | 4 ++++ 3 files changed, 20 insertions(+) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index c0b2348d7ce6..e8fc28dba8df 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -624,6 +624,20 @@ static void zfcp_erp_strategy_memwait(struct zfcp_erp_action *erp_action) add_timer(&erp_action->timer); } +void zfcp_erp_port_forced_reopen_all(struct zfcp_adapter *adapter, + int clear, char *dbftag) +{ + unsigned long flags; + struct zfcp_port *port; + + write_lock_irqsave(&adapter->erp_lock, flags); + read_lock(&adapter->port_list_lock); + list_for_each_entry(port, &adapter->port_list, list) + _zfcp_erp_port_forced_reopen(port, clear, dbftag); + read_unlock(&adapter->port_list_lock); + write_unlock_irqrestore(&adapter->erp_lock, flags); +} + static void _zfcp_erp_port_reopen_all(struct zfcp_adapter *adapter, int clear, char *dbftag) { diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index 3fce47b0b21b..c6acca521ffe 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -70,6 +70,8 @@ extern void zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *dbftag); extern void zfcp_erp_port_shutdown(struct zfcp_port *, int, char *); extern void zfcp_erp_port_forced_reopen(struct zfcp_port *, int, char *); +extern void zfcp_erp_port_forced_reopen_all(struct zfcp_adapter *adapter, + int clear, char *dbftag); extern void zfcp_erp_set_lun_status(struct scsi_device *, u32); extern void zfcp_erp_clear_lun_status(struct scsi_device *, u32); extern void zfcp_erp_lun_reopen(struct scsi_device *, int, char *); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index f4f6a07c5222..221d0dfb8493 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -368,6 +368,10 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt) struct zfcp_adapter *adapter = zfcp_sdev->port->adapter; int ret = SUCCESS, fc_ret; + if (!(adapter->connection_features & FSF_FEATURE_NPIV_MODE)) { + zfcp_erp_port_forced_reopen_all(adapter, 0, "schrh_p"); + zfcp_erp_wait(adapter); + } zfcp_erp_adapter_reopen(adapter, 0, "schrh_1"); zfcp_erp_wait(adapter); fc_ret = fc_block_scsi_eh(scpnt); From c8206579175c34a2546de8a74262456278a7795a Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Tue, 26 Mar 2019 14:37:00 +0100 Subject: [PATCH 367/468] scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN If an incoming ELS of type RSCN contains more than one element, zfcp suboptimally causes repeated erp trigger NOP trace records for each previously failed port. These could be ports that went away. It loops over each RSCN element, and for each of those in an inner loop over all zfcp_ports. The trigger to recover failed ports should be just the reception of some RSCN, no matter how many elements it has. So we can loop over failed ports separately, and only then loop over each RSCN element to handle the non-failed ports. The call chain was: zfcp_fc_incoming_rscn for (i = 1; i < no_entries; i++) _zfcp_fc_incoming_rscn list_for_each_entry(port, &adapter->port_list, list) if (masked port->d_id match) zfcp_fc_test_link if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <=== In order the reduce the "flooding" of the REC trace area in such cases, we factor out handling the failed ports to be outside of the entries loop: zfcp_fc_incoming_rscn if (no_entries > 1) <=== list_for_each_entry(port, &adapter->port_list, list) <=== if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <=== for (i = 1; i < no_entries; i++) _zfcp_fc_incoming_rscn list_for_each_entry(port, &adapter->port_list, list) if (masked port->d_id match) zfcp_fc_test_link Abbreviated example trace records before this code change: Tag : fcrscn1 WWPN : 0x500507630310d327 ERP want : 0x02 ERP need : 0x02 Tag : fcrscn1 WWPN : 0x500507630310d327 ERP want : 0x02 ERP need : 0x00 NOP => superfluous trace record The last trace entry repeats if there are more than 2 RSCN elements. Signed-off-by: Steffen Maier Reviewed-by: Benjamin Block Reviewed-by: Jens Remus Signed-off-by: Martin K. Petersen --- drivers/s390/scsi/zfcp_fc.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c index db00b5e3abbe..33eddb02ee30 100644 --- a/drivers/s390/scsi/zfcp_fc.c +++ b/drivers/s390/scsi/zfcp_fc.c @@ -239,10 +239,6 @@ static void _zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req, u32 range, list_for_each_entry(port, &adapter->port_list, list) { if ((port->d_id & range) == (ntoh24(page->rscn_fid) & range)) zfcp_fc_test_link(port); - if (!port->d_id) - zfcp_erp_port_reopen(port, - ZFCP_STATUS_COMMON_ERP_FAILED, - "fcrscn1"); } read_unlock_irqrestore(&adapter->port_list_lock, flags); } @@ -250,6 +246,7 @@ static void _zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req, u32 range, static void zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req) { struct fsf_status_read_buffer *status_buffer = (void *)fsf_req->data; + struct zfcp_adapter *adapter = fsf_req->adapter; struct fc_els_rscn *head; struct fc_els_rscn_page *page; u16 i; @@ -263,6 +260,22 @@ static void zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req) no_entries = be16_to_cpu(head->rscn_plen) / sizeof(struct fc_els_rscn_page); + if (no_entries > 1) { + /* handle failed ports */ + unsigned long flags; + struct zfcp_port *port; + + read_lock_irqsave(&adapter->port_list_lock, flags); + list_for_each_entry(port, &adapter->port_list, list) { + if (port->d_id) + continue; + zfcp_erp_port_reopen(port, + ZFCP_STATUS_COMMON_ERP_FAILED, + "fcrscn1"); + } + read_unlock_irqrestore(&adapter->port_list_lock, flags); + } + for (i = 1; i < no_entries; i++) { /* skip head and start with 1st element */ page++; From 6dc6a944d58aea3d9de3828318b0fffdb60a7097 Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Wed, 20 Mar 2019 14:56:51 -0500 Subject: [PATCH 368/468] scsi: ibmvfc: Remove "failed" from logged errors The text of messages logged with ibmvfc_log_error() always contain the term "failed". In the case of cancelled commands during EH they are reported back by the VIOS using error codes. This can be confusing to somebody looking at these log messages as to whether a command was successfully cancelled. The following real log message for example it is unclear if the transaction was actaully cancelled. <6>sd 0:0:1:1: Cancelling outstanding commands. <3>sd 0:0:1:1: [sde] Command (28) failed: transaction cancelled (2:6) flags: 0 fcp_rsp: 0, resid=0, scsi_status: 0 Remove prefixing of "failed" to all error logged messages. The ibmvfc_log_error() function translates the returned error/status codes to a human readable message already. Signed-off-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index dbaa4f131433..c3ce27039552 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1494,7 +1494,7 @@ static void ibmvfc_log_error(struct ibmvfc_event *evt) if (rsp->flags & FCP_RSP_LEN_VALID) rsp_code = rsp->data.info.rsp_code; - scmd_printk(KERN_ERR, cmnd, "Command (%02X) failed: %s (%x:%x) " + scmd_printk(KERN_ERR, cmnd, "Command (%02X) : %s (%x:%x) " "flags: %x fcp_rsp: %x, resid=%d, scsi_status: %x\n", cmnd->cmnd[0], err, vfc_cmd->status, vfc_cmd->error, rsp->flags, rsp_code, scsi_get_resid(cmnd), rsp->scsi_status); From 95237c25d8d08ebc451dd2d793f7e765f57b0c9f Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Wed, 20 Mar 2019 14:56:52 -0500 Subject: [PATCH 369/468] scsi: ibmvfc: Add failed PRLI to cmd_status lookup array The VIOS uses the SCSI_ERROR class to report PRLI failures. These errors are indicated with the combination of a IBMVFC_FC_SCSI_ERROR return status and 0x8000 error code. Add these codes to cmd_status[] with appropriate human readable error message. Signed-off-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index c3ce27039552..18ee2a8ec3d5 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -139,6 +139,7 @@ static const struct { { IBMVFC_FC_FAILURE, IBMVFC_VENDOR_SPECIFIC, DID_ERROR, 1, 1, "vendor specific" }, { IBMVFC_FC_SCSI_ERROR, 0, DID_OK, 1, 0, "SCSI error" }, + { IBMVFC_FC_SCSI_ERROR, IBMVFC_COMMAND_FAILED, DID_ERROR, 0, 1, "PRLI to device failed." }, }; static void ibmvfc_npiv_login(struct ibmvfc_host *); From 3e6f7de43f4960fba8322b16531b0d6624a9322d Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Wed, 20 Mar 2019 14:56:53 -0500 Subject: [PATCH 370/468] scsi: ibmvfc: Byte swap status and error codes when logging Status and error codes are returned in big endian from the VIOS. The values are translated into a human readable format when logged, but the values are also logged. This patch byte swaps those values so that they are consistent between BE and LE platforms. Signed-off-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 18ee2a8ec3d5..33dda4d32f65 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1497,7 +1497,7 @@ static void ibmvfc_log_error(struct ibmvfc_event *evt) scmd_printk(KERN_ERR, cmnd, "Command (%02X) : %s (%x:%x) " "flags: %x fcp_rsp: %x, resid=%d, scsi_status: %x\n", - cmnd->cmnd[0], err, vfc_cmd->status, vfc_cmd->error, + cmnd->cmnd[0], err, be16_to_cpu(vfc_cmd->status), be16_to_cpu(vfc_cmd->error), rsp->flags, rsp_code, scsi_get_resid(cmnd), rsp->scsi_status); } @@ -2023,7 +2023,7 @@ static int ibmvfc_reset_device(struct scsi_device *sdev, int type, char *desc) sdev_printk(KERN_ERR, sdev, "%s reset failed: %s (%x:%x) " "flags: %x fcp_rsp: %x, scsi_status: %x\n", desc, ibmvfc_get_cmd_error(be16_to_cpu(rsp_iu.cmd.status), be16_to_cpu(rsp_iu.cmd.error)), - rsp_iu.cmd.status, rsp_iu.cmd.error, fc_rsp->flags, rsp_code, + be16_to_cpu(rsp_iu.cmd.status), be16_to_cpu(rsp_iu.cmd.error), fc_rsp->flags, rsp_code, fc_rsp->scsi_status); rsp_rc = -EIO; } else @@ -2382,7 +2382,7 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev) sdev_printk(KERN_ERR, sdev, "Abort failed: %s (%x:%x) " "flags: %x fcp_rsp: %x, scsi_status: %x\n", ibmvfc_get_cmd_error(be16_to_cpu(rsp_iu.cmd.status), be16_to_cpu(rsp_iu.cmd.error)), - rsp_iu.cmd.status, rsp_iu.cmd.error, fc_rsp->flags, rsp_code, + be16_to_cpu(rsp_iu.cmd.status), be16_to_cpu(rsp_iu.cmd.error), fc_rsp->flags, rsp_code, fc_rsp->scsi_status); rsp_rc = -EIO; } else @@ -3349,7 +3349,7 @@ static void ibmvfc_tgt_prli_done(struct ibmvfc_event *evt) tgt_log(tgt, level, "Process Login failed: %s (%x:%x) rc=0x%02X\n", ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), - rsp->status, rsp->error, status); + be16_to_cpu(rsp->status), be16_to_cpu(rsp->error), status); break; } @@ -3447,9 +3447,10 @@ static void ibmvfc_tgt_plogi_done(struct ibmvfc_event *evt) ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_DEL_RPORT); tgt_log(tgt, level, "Port Login failed: %s (%x:%x) %s (%x) %s (%x) rc=0x%02X\n", - ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), rsp->status, rsp->error, - ibmvfc_get_fc_type(be16_to_cpu(rsp->fc_type)), rsp->fc_type, - ibmvfc_get_ls_explain(be16_to_cpu(rsp->fc_explain)), rsp->fc_explain, status); + ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), + be16_to_cpu(rsp->status), be16_to_cpu(rsp->error), + ibmvfc_get_fc_type(be16_to_cpu(rsp->fc_type)), be16_to_cpu(rsp->fc_type), + ibmvfc_get_ls_explain(be16_to_cpu(rsp->fc_explain)), be16_to_cpu(rsp->fc_explain), status); break; } @@ -3620,7 +3621,7 @@ static void ibmvfc_tgt_adisc_done(struct ibmvfc_event *evt) fc_explain = (be32_to_cpu(mad->fc_iu.response[1]) & 0x0000ff00) >> 8; tgt_info(tgt, "ADISC failed: %s (%x:%x) %s (%x) %s (%x) rc=0x%02X\n", ibmvfc_get_cmd_error(be16_to_cpu(mad->iu.status), be16_to_cpu(mad->iu.error)), - mad->iu.status, mad->iu.error, + be16_to_cpu(mad->iu.status), be16_to_cpu(mad->iu.error), ibmvfc_get_fc_type(fc_reason), fc_reason, ibmvfc_get_ls_explain(fc_explain), fc_explain, status); break; @@ -3832,9 +3833,10 @@ static void ibmvfc_tgt_query_target_done(struct ibmvfc_event *evt) tgt_log(tgt, level, "Query Target failed: %s (%x:%x) %s (%x) %s (%x) rc=0x%02X\n", ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), - rsp->status, rsp->error, ibmvfc_get_fc_type(be16_to_cpu(rsp->fc_type)), - rsp->fc_type, ibmvfc_get_gs_explain(be16_to_cpu(rsp->fc_explain)), - rsp->fc_explain, status); + be16_to_cpu(rsp->status), be16_to_cpu(rsp->error), + ibmvfc_get_fc_type(be16_to_cpu(rsp->fc_type)), be16_to_cpu(rsp->fc_type), + ibmvfc_get_gs_explain(be16_to_cpu(rsp->fc_explain)), be16_to_cpu(rsp->fc_explain), + status); break; } @@ -3960,7 +3962,7 @@ static void ibmvfc_discover_targets_done(struct ibmvfc_event *evt) level += ibmvfc_retry_host_init(vhost); ibmvfc_log(vhost, level, "Discover Targets failed: %s (%x:%x)\n", ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), - rsp->status, rsp->error); + be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)); break; case IBMVFC_MAD_DRIVER_FAILED: break; @@ -4025,7 +4027,7 @@ static void ibmvfc_npiv_login_done(struct ibmvfc_event *evt) ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); ibmvfc_log(vhost, level, "NPIV Login failed: %s (%x:%x)\n", ibmvfc_get_cmd_error(be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)), - rsp->status, rsp->error); + be16_to_cpu(rsp->status), be16_to_cpu(rsp->error)); ibmvfc_free_event(evt); return; case IBMVFC_MAD_CRQ_ERROR: From d6e2635b9cf7982102750c5d9e4ba1474afa0981 Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Wed, 20 Mar 2019 14:56:54 -0500 Subject: [PATCH 371/468] scsi: ibmvfc: Clean up transport events No change to functionality. Simply make transport event messages a little clearer, and rework CRQ format enums such that we have separate enums for INIT messages and XPORT events. [mkp: typo] Signed-off-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 8 +++++--- drivers/scsi/ibmvscsi/ibmvfc.h | 7 ++++++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 33dda4d32f65..3ad997ac3510 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -2756,16 +2756,18 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost) ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_NONE); if (crq->format == IBMVFC_PARTITION_MIGRATED) { /* We need to re-setup the interpartition connection */ - dev_info(vhost->dev, "Re-enabling adapter\n"); + dev_info(vhost->dev, "Partition migrated, Re-enabling adapter\n"); vhost->client_migrated = 1; ibmvfc_purge_requests(vhost, DID_REQUEUE); ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_REENABLE); - } else { - dev_err(vhost->dev, "Virtual adapter failed (rc=%d)\n", crq->format); + } else if (crq->format == IBMVFC_PARTNER_FAILED || crq->format == IBMVFC_PARTNER_DEREGISTER) { + dev_err(vhost->dev, "Host partner adapter deregistered or failed (rc=%d)\n", crq->format); ibmvfc_purge_requests(vhost, DID_ERROR); ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_RESET); + } else { + dev_err(vhost->dev, "Received unknown transport event from partner (rc=%d)\n", crq->format); } return; case IBMVFC_CRQ_CMD_RSP: diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index b81a53c4a9a8..459cc288ba1d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -78,9 +78,14 @@ enum ibmvfc_crq_valid { IBMVFC_CRQ_XPORT_EVENT = 0xFF, }; -enum ibmvfc_crq_format { +enum ibmvfc_crq_init_msg { IBMVFC_CRQ_INIT = 0x01, IBMVFC_CRQ_INIT_COMPLETE = 0x02, +}; + +enum ibmvfc_crq_xport_evts { + IBMVFC_PARTNER_FAILED = 0x01, + IBMVFC_PARTNER_DEREGISTER = 0x02, IBMVFC_PARTITION_MIGRATED = 0x06, }; From a595ecdd5f60b2d93863cebb07eec7f935839b54 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 27 Mar 2019 10:11:14 +0900 Subject: [PATCH 372/468] USB: serial: cp210x: add new device id Lorenz Messtechnik has a device that is controlled by the cp210x driver, so add the device id to the driver. The device id was provided by Silicon-Labs for the devices from this vendor. Reported-by: Uli Signed-off-by: Greg Kroah-Hartman Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index fffe23ab0189..979bef9bfb6b 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -80,6 +80,7 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x804E) }, /* Software Bisque Paramount ME build-in converter */ { USB_DEVICE(0x10C4, 0x8053) }, /* Enfora EDG1228 */ { USB_DEVICE(0x10C4, 0x8054) }, /* Enfora GSM2228 */ + { USB_DEVICE(0x10C4, 0x8056) }, /* Lorenz Messtechnik devices */ { USB_DEVICE(0x10C4, 0x8066) }, /* Argussoft In-System Programmer */ { USB_DEVICE(0x10C4, 0x806F) }, /* IMS USB to RS422 Converter Cable */ { USB_DEVICE(0x10C4, 0x807A) }, /* Crumb128 board */ From 84f3b43f7378b98b7e3096d5499de75183d4347c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Wed, 27 Mar 2019 15:25:32 +0100 Subject: [PATCH 373/468] USB: serial: option: add Olicard 600 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is a Qualcomm based device with a QMI function on interface 4. It is mode switched from 2020:2030 using a standard eject message. T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2020 ProdID=2031 Rev= 2.32 S: Manufacturer=Mobile Connect S: Product=Mobile Connect S: SerialNumber=0123456789ABCDEF C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=89(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none) E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us Cc: stable@vger.kernel.org Signed-off-by: Bjørn Mork [ johan: use tabs to align comments in adjacent lines ] Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index db5ece767d59..83869065b802 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1945,10 +1945,12 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(0x2001, 0x7e35, 0xff), /* D-Link DWM-222 */ .driver_info = RSVD(4) }, - { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e01, 0xff, 0xff, 0xff) }, /* D-Link DWM-152/C1 */ - { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e02, 0xff, 0xff, 0xff) }, /* D-Link DWM-156/C1 */ - { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x7e11, 0xff, 0xff, 0xff) }, /* D-Link DWM-156/A3 */ - { USB_DEVICE_INTERFACE_CLASS(0x2020, 0x4000, 0xff) }, /* OLICARD300 - MT6225 */ + { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e01, 0xff, 0xff, 0xff) }, /* D-Link DWM-152/C1 */ + { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e02, 0xff, 0xff, 0xff) }, /* D-Link DWM-156/C1 */ + { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x7e11, 0xff, 0xff, 0xff) }, /* D-Link DWM-156/A3 */ + { USB_DEVICE_INTERFACE_CLASS(0x2020, 0x2031, 0xff), /* Olicard 600 */ + .driver_info = RSVD(4) }, + { USB_DEVICE_INTERFACE_CLASS(0x2020, 0x4000, 0xff) }, /* OLICARD300 - MT6225 */ { USB_DEVICE(INOVIA_VENDOR_ID, INOVIA_SEW858) }, { USB_DEVICE(VIATELECOM_VENDOR_ID, VIATELECOM_PRODUCT_CDS7) }, { USB_DEVICE_AND_INTERFACE_INFO(WETELECOM_VENDOR_ID, WETELECOM_PRODUCT_WMD200, 0xff, 0xff, 0xff) }, From b6ffdf27f3d4f1e9af56effe6f86989170d71e95 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Mon, 18 Mar 2019 15:50:27 +0100 Subject: [PATCH 374/468] s390/cpumf: Fix warning from check_processor_id Function __hw_perf_event_init() used a CPU variable without ensuring CPU preemption has been disabled. This caused the following warning in the kernel log: [ 7.277085] BUG: using smp_processor_id() in preemptible [00000000] code: cf-csdiag/1892 [ 7.277111] caller is cf_diag_event_init+0x13a/0x338 [ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted 5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug #1 [ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR) [ 7.277139] Call Trace: [ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0) [ 7.277161] [<0000000000b7a71a>] dump_stack+0x92/0xd0 [ 7.277174] [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100 [ 7.277183] [<00000000001228aa>] cf_diag_event_init+0x13a/0x338 [ 7.277195] [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0 [ 7.277204] [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0 [ 7.277214] [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50 [ 7.277224] [<0000000000b9e8f0>] system_call+0xdc/0x2d8 [ 7.277233] 2 locks held by cf-csdiag/1892: [ 7.277241] #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.}, at: __s390x_sys_perf_event_open+0xd2e/0xd50 [ 7.277257] #1: 00000000363b11bd (&pmus_srcu){....}, at: perf_event_alloc+0x52e/0xce0 The variable is now accessed in proper context. Use get_cpu_var()/put_cpu_var() pair to disable preemption during access. As the hardware authorization settings apply to all CPUs, it does not matter which CPU is used to check the authorization setting. Remove the event->count assignment. It is not needed as function perf_event_alloc() allocates memory for the event with kzalloc() and thus count is already set to zero. Fixes: fe5908bccc56 ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace") Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky --- arch/s390/kernel/perf_cpum_cf_diag.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/s390/kernel/perf_cpum_cf_diag.c b/arch/s390/kernel/perf_cpum_cf_diag.c index c6fad208c2fa..b6854812d2ed 100644 --- a/arch/s390/kernel/perf_cpum_cf_diag.c +++ b/arch/s390/kernel/perf_cpum_cf_diag.c @@ -196,23 +196,30 @@ static void cf_diag_perf_event_destroy(struct perf_event *event) */ static int __hw_perf_event_init(struct perf_event *event) { - struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events); struct perf_event_attr *attr = &event->attr; + struct cpu_cf_events *cpuhw; enum cpumf_ctr_set i; int err = 0; - debug_sprintf_event(cf_diag_dbg, 5, - "%s event %p cpu %d authorized %#x\n", __func__, - event, event->cpu, cpuhw->info.auth_ctl); + debug_sprintf_event(cf_diag_dbg, 5, "%s event %p cpu %d\n", __func__, + event, event->cpu); event->hw.config = attr->config; event->hw.config_base = 0; - local64_set(&event->count, 0); - /* Add all authorized counter sets to config_base */ + /* Add all authorized counter sets to config_base. The + * the hardware init function is either called per-cpu or just once + * for all CPUS (event->cpu == -1). This depends on the whether + * counting is started for all CPUs or on a per workload base where + * the perf event moves from one CPU to another CPU. + * Checking the authorization on any CPU is fine as the hardware + * applies the same authorization settings to all CPUs. + */ + cpuhw = &get_cpu_var(cpu_cf_events); for (i = CPUMF_CTR_SET_BASIC; i < CPUMF_CTR_SET_MAX; ++i) if (cpuhw->info.auth_ctl & cpumf_ctr_ctl[i]) event->hw.config_base |= cpumf_ctr_ctl[i]; + put_cpu_var(cpu_cf_events); /* No authorized counter sets, nothing to count/sample */ if (!event->hw.config_base) { From 31d4c528cea4023cf36f6148c03bb960cedefeef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Vincent=20Stehl=C3=A9?= Date: Wed, 27 Mar 2019 23:06:42 +0100 Subject: [PATCH 375/468] cpufreq: scpi: Fix use after free MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Free the priv structure only after we are done using it. Fixes: 1690d8bb91e370ab ("cpufreq: scpi/scmi: Fix freeing of dynamic OPPs") Signed-off-by: Vincent Stehlé Cc: 4.20+ # 4.20+ Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/scpi-cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c index 3f49427766b8..2b51e0718c9f 100644 --- a/drivers/cpufreq/scpi-cpufreq.c +++ b/drivers/cpufreq/scpi-cpufreq.c @@ -189,8 +189,8 @@ static int scpi_cpufreq_exit(struct cpufreq_policy *policy) clk_put(priv->clk); dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); - kfree(priv); dev_pm_opp_remove_all_dynamic(priv->cpu_dev); + kfree(priv); return 0; } From 056d28d135bca0b1d0908990338e00e9dadaf057 Mon Sep 17 00:00:00 2001 From: Rolf Eike Beer Date: Tue, 26 Mar 2019 12:48:39 -0500 Subject: [PATCH 376/468] objtool: Query pkg-config for libelf location If it is not in the default location, compilation fails at several points. Signed-off-by: Rolf Eike Beer Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/91a25e992566a7968fedc89ec80e7f4c83ad0548.1553622500.git.jpoimboe@redhat.com --- Makefile | 4 +++- tools/objtool/Makefile | 7 +++++-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index c0a34064c574..eef3d3f87c3b 100644 --- a/Makefile +++ b/Makefile @@ -950,9 +950,11 @@ mod_sign_cmd = true endif export mod_sign_cmd +HOST_LIBELF_LIBS = $(shell pkg-config libelf --libs 2>/dev/null || echo -lelf) + ifdef CONFIG_STACK_VALIDATION has_libelf := $(call try-run,\ - echo "int main() {}" | $(HOSTCC) -xc -o /dev/null -lelf -,1,0) + echo "int main() {}" | $(HOSTCC) -xc -o /dev/null $(HOST_LIBELF_LIBS) -,1,0) ifeq ($(has_libelf),1) objtool_target := tools/objtool FORCE else diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index c9d038f91af6..53f8be0f4a1f 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -25,14 +25,17 @@ LIBSUBCMD = $(LIBSUBCMD_OUTPUT)libsubcmd.a OBJTOOL := $(OUTPUT)objtool OBJTOOL_IN := $(OBJTOOL)-in.o +LIBELF_FLAGS := $(shell pkg-config libelf --cflags 2>/dev/null) +LIBELF_LIBS := $(shell pkg-config libelf --libs 2>/dev/null || echo -lelf) + all: $(OBJTOOL) INCLUDES := -I$(srctree)/tools/include \ -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi \ -I$(srctree)/tools/objtool/arch/$(ARCH)/include WARNINGS := $(EXTRA_WARNINGS) -Wno-switch-default -Wno-switch-enum -Wno-packed -CFLAGS += -Werror $(WARNINGS) $(KBUILD_HOSTCFLAGS) -g $(INCLUDES) -LDFLAGS += -lelf $(LIBSUBCMD) $(KBUILD_HOSTLDFLAGS) +CFLAGS += -Werror $(WARNINGS) $(KBUILD_HOSTCFLAGS) -g $(INCLUDES) $(LIBELF_FLAGS) +LDFLAGS += $(LIBELF_LIBS) $(LIBSUBCMD) $(KBUILD_HOSTLDFLAGS) # Allow old libelf to be used: elfshdr := $(shell echo '$(pound)include ' | $(CC) $(CFLAGS) -x c -E - | grep elf_getshdr) From 7dd47617114921fdd8c095509e5e7b4373cc44a1 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 26 Mar 2019 22:51:02 +0100 Subject: [PATCH 377/468] watchdog: Respect watchdog cpumask on CPU hotplug The rework of the watchdog core to use cpu_stop_work broke the watchdog cpumask on CPU hotplug. The watchdog_enable/disable() functions are now called unconditionally from the hotplug callback, i.e. even on CPUs which are not in the watchdog cpumask. As a consequence the watchdog can become unstoppable. Only invoke them when the plugged CPU is in the watchdog cpumask. Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work") Reported-by: Maxime Coquelin Signed-off-by: Thomas Gleixner Tested-by: Maxime Coquelin Cc: Peter Zijlstra Cc: Oleg Nesterov Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Don Zickus Cc: Ricardo Neri Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903262245490.1789@nanos.tec.linutronix.de --- kernel/watchdog.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 403c9bd90413..6a5787233113 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -554,13 +554,15 @@ static void softlockup_start_all(void) int lockup_detector_online_cpu(unsigned int cpu) { - watchdog_enable(cpu); + if (cpumask_test_cpu(cpu, &watchdog_allowed_mask)) + watchdog_enable(cpu); return 0; } int lockup_detector_offline_cpu(unsigned int cpu) { - watchdog_disable(cpu); + if (cpumask_test_cpu(cpu, &watchdog_allowed_mask)) + watchdog_disable(cpu); return 0; } From 206b92353c839c0b27a0b9bec24195f93fd6cf7a Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 26 Mar 2019 17:36:05 +0100 Subject: [PATCH 378/468] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Tianyu reported a crash in a CPU hotplug teardown callback when booting a kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot parameter. It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken forever in case that a bringup callback fails. Unfortunately this issue was not recognized when the CPU hotplug code was reworked, so the shortcoming just stayed in place. When a bringup callback fails, the CPU hotplug code rolls back the operation and takes the CPU offline. The 'nosmt' command line argument uses a bringup failure to abort the bringup of SMT sibling CPUs. This partial bringup is required due to the MCE misdesign on Intel CPUs. With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level teardown of a CPU including the synchronizations in various facilities like RCU, NOHZ and others. As a consequence the teardown callbacks which must be executed on the outgoing CPU within stop machine with interrupts disabled are executed on the control CPU in interrupt enabled and preemptible context causing the kernel to crash and burn. The pre state machine code has a different failure mode which is more subtle and resulting in a less obvious use after free crash because the control side frees resources which are still in use by the undead CPU. But this is not a x86 only problem. Any architecture which supports the SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less likely to be triggered because in 99.99999% of the cases all bringup callbacks succeed. The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on all architectures as the following architectures have either no hotplug support at all or not all subarchitectures support it: alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial). Crashing the kernel in such a situation is not an acceptable state either. Implement a minimal rollback variant by limiting the teardown to the point where all regular teardown callbacks have been invoked and leave the CPU in the 'dead' idle state. This has the following consequences: - the CPU is brought down to the point where the stop_machine takedown would happen. - the CPU stays there forever and is idle - The CPU is cleared in the CPU active mask, but not in the CPU online mask which is a legit state. - Interrupts are not forced away from the CPU - All facilities which only look at online mask would still see it, but that is the case during normal hotplug/unplug operations as well. It's just a (way) longer time frame. This will expose issues, which haven't been exposed before or only seldom, because now the normally transient state of being non active but online is a permanent state. In testing this exposed already an issue vs. work queues where the vmstat code schedules work on the almost dead CPU which ends up in an unbound workqueue and triggers 'preemtible context' warnings. This is not a problem of this change, it merily exposes an already existing issue. Still this is better than crashing fully without a chance to debug it. This is mainly thought as workaround for those architectures which do not support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP. Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions") Reported-by: Tianyu Lan Signed-off-by: Thomas Gleixner Tested-by: Tianyu Lan Acked-by: Greg Kroah-Hartman Cc: Konrad Wilk Cc: Josh Poimboeuf Cc: Mukesh Ojha Cc: Peter Zijlstra Cc: Jiri Kosina Cc: Rik van Riel Cc: Andy Lutomirski Cc: Micheal Kelley Cc: "K. Y. Srinivasan" Cc: Linus Torvalds Cc: Borislav Petkov Cc: K. Y. Srinivasan Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de --- kernel/cpu.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 025f419d16f6..6754f3ecfd94 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st) cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL); } +static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st) +{ + if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) + return true; + /* + * When CPU hotplug is disabled, then taking the CPU down is not + * possible because takedown_cpu() and the architecture and + * subsystem specific mechanisms are not available. So the CPU + * which would be completely unplugged again needs to stay around + * in the current state. + */ + return st->state <= CPUHP_BRINGUP_CPU; +} + static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st, enum cpuhp_state target) { @@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st, st->state++; ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL); if (ret) { - st->target = prev_state; - undo_cpu_up(cpu, st); + if (can_rollback_cpu(st)) { + st->target = prev_state; + undo_cpu_up(cpu, st); + } break; } } From bebd024e4815b1a170fcd21ead9c2222b23ce9e6 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 26 Mar 2019 17:36:06 +0100 Subject: [PATCH 379/468] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y The SMT disable 'nosmt' command line argument is not working properly when CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are required to be brought up due to the MCE issues, cannot work. The CPUs are then kept in a half dead state. As the 'nosmt' functionality has become popular due to the speculative hardware vulnerabilities, the half torn down state is not a proper solution to the problem. Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is possible. Reported-by: Tianyu Lan Signed-off-by: Thomas Gleixner Acked-by: Greg Kroah-Hartman Cc: Konrad Wilk Cc: Josh Poimboeuf Cc: Mukesh Ojha Cc: Peter Zijlstra Cc: Jiri Kosina Cc: Rik van Riel Cc: Andy Lutomirski Cc: Micheal Kelley Cc: "K. Y. Srinivasan" Cc: Linus Torvalds Cc: Borislav Petkov Cc: K. Y. Srinivasan Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de --- arch/x86/Kconfig | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c1f9b3cf437c..5ad92419be19 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2217,14 +2217,8 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING If unsure, leave at the default value. config HOTPLUG_CPU - bool "Support for hot-pluggable CPUs" + def_bool y depends on SMP - ---help--- - Say Y here to allow turning CPUs off and on. CPUs can be - controlled through /sys/devices/system/cpu. - ( Note: power management support will enable this option - automatically on SMP systems. ) - Say N if you want to disable CPU hotplug. config BOOTPARAM_HOTPLUG_CPU0 bool "Set default setting of cpu0_hotpluggable" From a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 25 Mar 2019 14:56:20 +0100 Subject: [PATCH 380/468] x86/retpolines: Disable switch jump tables when retpolines are enabled MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect calls from switch-case") raised the limit under retpolines to 20 switch cases where gcc would only then start to emit jump tables, and therefore effectively disabling the emission of slow indirect calls in this area. After this has been brought to attention to gcc folks [0], Martin Liska has then fixed gcc to align with clang by avoiding to generate switch jump tables entirely under retpolines. This is taking effect in gcc starting from stable version 8.4.0. Given kernel supports compilation with older versions of gcc where the fix is not being available or backported anymore, we need to keep the extra KBUILD_CFLAGS around for some time and generally set the -fno-jump-tables to align with what more recent gcc is doing automatically today. More than 20 switch cases are not expected to be fast-path critical, but it would still be good to align with gcc behavior for versions < 8.4.0 in order to have consistency across supported gcc versions. vmlinux size is slightly growing by 0.27% for older gcc. This flag is only set to work around affected gcc, no change for clang. [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952 Suggested-by: Martin Liska Signed-off-by: Daniel Borkmann Signed-off-by: Thomas Gleixner Cc: David Woodhouse Cc: Linus Torvalds Cc: Jesper Dangaard Brouer Cc: Björn Töpel Cc: Magnus Karlsson Cc: Alexei Starovoitov Cc: H.J. Lu Cc: Alexei Starovoitov Cc: David S. Miller Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net --- arch/x86/Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 2d8b9d8ca4f8..a587805c6687 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -219,8 +219,12 @@ ifdef CONFIG_RETPOLINE # Additionally, avoid generating expensive indirect jumps which # are subject to retpolines for small number of switch cases. # clang turns off jump table generation by default when under - # retpoline builds, however, gcc does not for x86. - KBUILD_CFLAGS += $(call cc-option,--param=case-values-threshold=20) + # retpoline builds, however, gcc does not for x86. This has + # only been fixed starting from gcc stable version 8.4.0 and + # onwards, but not for older ones. See gcc bug #86952. + ifndef CONFIG_CC_IS_CLANG + KBUILD_CFLAGS += $(call cc-option,-fno-jump-tables) + endif endif archscripts: scripts_basic From 92c77f7c4d5dfaaf45b2ce19360e69977c264766 Mon Sep 17 00:00:00 2001 From: Ralph Campbell Date: Mon, 25 Mar 2019 17:18:17 -0700 Subject: [PATCH 381/468] x86/mm: Don't exceed the valid physical address space valid_phys_addr_range() is used to sanity check the physical address range of an operation, e.g., access to /dev/mem. It uses __pa(high_memory) internally. If memory is populated at the end of the physical address space, then __pa(high_memory) is outside of the physical address space because: high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1; For the comparison in valid_phys_addr_range() this is not an issue, but if CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which verifies that the resulting physical address is within the valid physical address space of the CPU. So in the case that memory is populated at the end of the physical address space, this is not true and triggers a VIRTUAL_BUG_ON(). Use __pa(high_memory - 1) to prevent the conversion from going beyond the end of valid physical addresses. Fixes: be62a3204406 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses") Signed-off-by: Ralph Campbell Signed-off-by: Thomas Gleixner Cc: Craig Bergstrom Cc: Linus Torvalds Cc: Boris Ostrovsky Cc: Fengguang Wu Cc: Greg Kroah-Hartman Cc: Hans Verkuil Cc: Mauro Carvalho Chehab Cc: Peter Zijlstra Cc: Sander Eikelenboom Cc: Sean Young Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.com --- arch/x86/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index db3165714521..dc726e07d8ba 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -230,7 +230,7 @@ bool mmap_address_hint_valid(unsigned long addr, unsigned long len) /* Can we access it for direct reading/writing? Must be RAM: */ int valid_phys_addr_range(phys_addr_t addr, size_t count) { - return addr + count <= __pa(high_memory); + return addr + count - 1 <= __pa(high_memory - 1); } /* Can we access it through mmap? Must be a valid physical address: */ From 8324c3d518cfd69f2a17866b52c13bf56d3042d8 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Mon, 25 Mar 2019 08:02:05 +0000 Subject: [PATCH 382/468] KVM: arm/arm64: Comments cleanup in mmu.c Some comments in virt/kvm/arm/mmu.c are outdated. Update them to reflect the current state of the code. Signed-off-by: Zenghui Yu Reviewed-by: Suzuki K Poulose [maz: commit message tidy-up] Signed-off-by: Marc Zyngier --- virt/kvm/arm/mmu.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index f9da2fad9bd6..27c958306449 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -102,8 +102,7 @@ static bool kvm_is_device_pfn(unsigned long pfn) * @addr: IPA * @pmd: pmd pointer for IPA * - * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. Marks all - * pages in the range dirty. + * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. */ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) { @@ -121,8 +120,7 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) * @addr: IPA * @pud: pud pointer for IPA * - * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks all - * pages in the range dirty. + * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. */ static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp) { @@ -899,9 +897,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size, * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation. * @kvm: The KVM struct pointer for the VM. * - * Allocates only the stage-2 HW PGD level table(s) (can support either full - * 40-bit input addresses or limited to 32-bit input addresses). Clears the - * allocated pages. + * Allocates only the stage-2 HW PGD level table(s) of size defined by + * stage2_pgd_size(kvm). * * Note we don't need locking here as this is only called when the VM is * created, which can only be done once. @@ -1478,13 +1475,11 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud, } /** - * stage2_wp_puds - write protect PGD range - * @pgd: pointer to pgd entry - * @addr: range start address - * @end: range end address - * - * Process PUD entries, for a huge PUD we cause a panic. - */ + * stage2_wp_puds - write protect PGD range + * @pgd: pointer to pgd entry + * @addr: range start address + * @end: range end address + */ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr, phys_addr_t end) { From 93a64ee71d10abc5787de7d4a00027e2c6469c3b Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 28 Mar 2019 14:29:27 +0100 Subject: [PATCH 383/468] MAINTAINERS: Remove deleted file from futex file pattern kernel/futex_compat.c was recently removed, but it's still in the MAINTAINERS file. Remove it there as well. Fixes: 04e7712f4460 ("y2038: futex: Move compat implementation into futex.c") Reported-by: Joe Perches Signed-off-by: Thomas Gleixner Cc: Arnd Bergmann --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3e5a5d263f29..f015ff786109 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6408,7 +6408,6 @@ L: linux-kernel@vger.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core S: Maintained F: kernel/futex.c -F: kernel/futex_compat.c F: include/asm-generic/futex.h F: include/linux/futex.h F: include/uapi/linux/futex.h From 26cdaac4793c49357d2c731f2190632cefb7efb1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Tue, 26 Mar 2019 16:02:23 -0700 Subject: [PATCH 384/468] drm/i915/icl: Fix VEBOX mismatch BUG_ON() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GT VEBOX DISABLE is only 4 bits wide but it was using a 8 bits wide mask, the remaning reserved bits is set to 0 causing 4 more nonexistent VEBOX engines being detected as enabled, triggering the BUG_ON() because of mismatch between vebox_mask and newly added VEBOX_MASK(). [ 64.081621] [drm:intel_device_info_init_mmio [i915]] vdbox enable: 0005, instances: 0005 [ 64.081763] [drm:intel_device_info_init_mmio [i915]] vebox enable: 00f1, instances: 0001 [ 64.081825] intel_device_info_init_mmio:925 GEM_BUG_ON(vebox_mask != ({ unsigned int first__ = (VECS0); unsigned int count__ = (2); ((&(dev_priv)->__info)->engine_mask & (((~0UL) - (1UL << (first__)) + 1) & (~0UL >> (64 - 1 - (first__ + count__ - 1))))) >> first__; })) [ 64.082047] ------------[ cut here ]------------ [ 64.082054] kernel BUG at drivers/gpu/drm/i915/intel_device_info.c:925! BSpec: 20680 Fixes: 26376a7e74d2 ("drm/i915/icl: Check for fused-off VDBOX and VEBOX instances") Cc: Chris Wilson Cc: Tvrtko Ursulin Cc: Oscar Mateo Signed-off-by: José Roberto de Souza Reviewed-by: Tvrtko Ursulin Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20190326230223.26336-1-jose.souza@intel.com (cherry picked from commit 547fcf9b1c608cf5c43c156a8773a94c6a38dc44) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/i915_reg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 684589acc368..047855dd8c6b 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2863,7 +2863,7 @@ enum i915_power_well_id { #define GEN11_GT_VEBOX_VDBOX_DISABLE _MMIO(0x9140) #define GEN11_GT_VDBOX_DISABLE_MASK 0xff #define GEN11_GT_VEBOX_DISABLE_SHIFT 16 -#define GEN11_GT_VEBOX_DISABLE_MASK (0xff << GEN11_GT_VEBOX_DISABLE_SHIFT) +#define GEN11_GT_VEBOX_DISABLE_MASK (0x0f << GEN11_GT_VEBOX_DISABLE_SHIFT) #define GEN11_EU_DISABLE _MMIO(0x9134) #define GEN11_EU_DIS_MASK 0xFF From dd08a8d9a66de4b54575c294a92630299f7e0fe7 Mon Sep 17 00:00:00 2001 From: raymond pang Date: Thu, 28 Mar 2019 12:19:25 +0000 Subject: [PATCH 385/468] libata: fix using DMA buffers on stack When CONFIG_VMAP_STACK=y, __pa() returns incorrect physical address for a stack virtual address. Stack DMA buffers must be avoided. Signed-off-by: raymond pang Signed-off-by: Jens Axboe --- drivers/ata/libata-zpodd.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c index b3ed8f9953a8..173e6f2dd9af 100644 --- a/drivers/ata/libata-zpodd.c +++ b/drivers/ata/libata-zpodd.c @@ -52,38 +52,52 @@ static int eject_tray(struct ata_device *dev) /* Per the spec, only slot type and drawer type ODD can be supported */ static enum odd_mech_type zpodd_get_mech_type(struct ata_device *dev) { - char buf[16]; + char *buf; unsigned int ret; - struct rm_feature_desc *desc = (void *)(buf + 8); + struct rm_feature_desc *desc; struct ata_taskfile tf; static const char cdb[] = { GPCMD_GET_CONFIGURATION, 2, /* only 1 feature descriptor requested */ 0, 3, /* 3, removable medium feature */ 0, 0, 0,/* reserved */ - 0, sizeof(buf), + 0, 16, 0, 0, 0, }; + buf = kzalloc(16, GFP_KERNEL); + if (!buf) + return ODD_MECH_TYPE_UNSUPPORTED; + desc = (void *)(buf + 8); + ata_tf_init(dev, &tf); tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; tf.command = ATA_CMD_PACKET; tf.protocol = ATAPI_PROT_PIO; - tf.lbam = sizeof(buf); + tf.lbam = 16; ret = ata_exec_internal(dev, &tf, cdb, DMA_FROM_DEVICE, - buf, sizeof(buf), 0); - if (ret) + buf, 16, 0); + if (ret) { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + } - if (be16_to_cpu(desc->feature_code) != 3) + if (be16_to_cpu(desc->feature_code) != 3) { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + } - if (desc->mech_type == 0 && desc->load == 0 && desc->eject == 1) + if (desc->mech_type == 0 && desc->load == 0 && desc->eject == 1) { + kfree(buf); return ODD_MECH_TYPE_SLOT; - else if (desc->mech_type == 1 && desc->load == 0 && desc->eject == 1) + } else if (desc->mech_type == 1 && desc->load == 0 && + desc->eject == 1) { + kfree(buf); return ODD_MECH_TYPE_DRAWER; - else + } else { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + } } /* Test if ODD is zero power ready by sense code */ From 7265f5b72640f43e558af80347c62e32d568371f Mon Sep 17 00:00:00 2001 From: Wen Yang Date: Sat, 23 Mar 2019 14:14:31 +0800 Subject: [PATCH 386/468] coccinelle: put_device: reduce false positives Don't complain about a return when this function returns "&pdev->dev". Fixes: da9cfb87a44d ("coccinelle: semantic code search for missing put_device()") Reported-by: Julia Lawall Signed-off-by: Wen Yang Acked-by: Julia Lawall Signed-off-by: Masahiro Yamada --- scripts/coccinelle/free/put_device.cocci | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/coccinelle/free/put_device.cocci b/scripts/coccinelle/free/put_device.cocci index 7395697e7f19..c9f071b0a0ab 100644 --- a/scripts/coccinelle/free/put_device.cocci +++ b/scripts/coccinelle/free/put_device.cocci @@ -32,6 +32,7 @@ if (id == NULL || ...) { ... return ...; } ( id | (T2)dev_get_drvdata(&id->dev) | (T3)platform_get_drvdata(id) +| &id->dev ); | return@p2 ...; ) From 221cc2d27ddc49b3e06d4637db02bf78e70c573c Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 26 Mar 2019 13:02:19 +0900 Subject: [PATCH 387/468] kbuild: skip parsing pre sub-make code for recursion When Make recurses to the top Makefile with sub-make-done unset, the code block surrounded by 'ifneq ($(sub-make-done),1) ... endif' is parsed multiple times. This happens for in-tree building of include/config/auto.conf, *-pkg, etc. with GNU Make 4.x. This is a slight regression by commit 688931a5ad4e ("kbuild: skip sub-make for in-tree build with GNU Make 4.x") in terms of performance since that code block contains one $(shell ...) invocation. Fix it by exporting the variable irrespective of sub-make being run. I renamed it because GNU Make cannot properly export variables containing hyphens. This is probably a bug of GNU Make, and the issue in Kbuild had already been reported by commit 2bfbe7881ee0 ("kbuild: Do not use hyphen in exported variable name"). Signed-off-by: Masahiro Yamada --- Makefile | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index 6c0635f69982..3c788a65d652 100644 --- a/Makefile +++ b/Makefile @@ -31,7 +31,7 @@ _all: # descending is started. They are now explicitly listed as the # prepare rule. -ifneq ($(sub-make-done),1) +ifneq ($(sub_make_done),1) # Do not use make's built-in rules and variables # (this increases performance and avoids hard-to-debug behaviour) @@ -155,6 +155,8 @@ need-sub-make := 1 $(lastword $(MAKEFILE_LIST)): ; endif +export sub_make_done := 1 + ifeq ($(need-sub-make),1) PHONY += $(MAKECMDGOALS) sub-make @@ -164,12 +166,12 @@ $(filter-out _all sub-make $(CURDIR)/Makefile, $(MAKECMDGOALS)) _all: sub-make # Invoke a second make in the output directory, passing relevant variables sub-make: - $(Q)$(MAKE) sub-make-done=1 \ + $(Q)$(MAKE) \ $(if $(KBUILD_OUTPUT),-C $(KBUILD_OUTPUT) KBUILD_SRC=$(CURDIR)) \ -f $(CURDIR)/Makefile $(filter-out _all sub-make,$(MAKECMDGOALS)) endif # need-sub-make -endif # sub-make-done +endif # sub_make_done # We process the rest of the Makefile if this is the final invocation of make ifeq ($(need-sub-make),) From 156e7cbb3ef50d6d58b7975d79802e4ffb024dc2 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 26 Mar 2019 13:26:58 +0900 Subject: [PATCH 388/468] kbuild: do not overwrite .gitignore in output directory Commit 3a51ff344204 ("kbuild: gitignore output directory") seemed to bother people who version-control output directories. Andre Przywara says: "Unfortunately this breaks my setup, because I keep a totally separate git repository in my build directories to track (various versions of) .config. So .gitignore there is carefully crafted to ignore most build artefacts, but not .config, for instance." Link: https://lkml.org/lkml/2019/3/22/1819 Reported-by: Andre Przywara Signed-off-by: Masahiro Yamada Tested-by: Andre Przywara Reviewed-by: Andre Przywara --- Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 3c788a65d652..2810425541f4 100644 --- a/Makefile +++ b/Makefile @@ -499,7 +499,8 @@ outputmakefile: ifneq ($(KBUILD_SRC),) $(Q)ln -fsn $(srctree) source $(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree) - $(Q){ echo "# this is build directory, ignore it"; echo "*"; } > .gitignore + $(Q)test -e .gitignore || \ + { echo "# this is build directory, ignore it"; echo "*"; } > .gitignore endif ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),) From 1a49b2fd8f58dd397043a17de9b3c421ccf8eda7 Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Tue, 26 Mar 2019 10:50:28 -0400 Subject: [PATCH 389/468] kbuild: strip whitespace in cmd_record_mcount findstring CC_FLAGS_FTRACE may contain trailing whitespace that interferes with findstring. For example, commit 6977f95e63b9 ("powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer") introduced a change such that on my ppc64le box, CC_FLAGS_FTRACE="-pg -mprofile-kernel ". (Note the trailing space.) When cmd_record_mcount is now invoked, findstring fails as the ftrace flags were found at very end of _c_flags, without the trailing space. _c_flags=" ... -pg -mprofile-kernel" CC_FLAGS_FTRACE="-pg -mprofile-kernel " ^ findstring is looking for this extra space Remove the redundant whitespaces from CC_FLAGS_FTRACE in cmd_record_mcount to avoid this problem. [masahiro.yamada: This issue only happens in the released versions of GNU Make. CC_FLAGS_FTRACE will not contain the trailing space if you use the latest GNU Make, which contains commit b90fabc8d6f3 ("* NEWS: Do not insert a space during '+=' if the value is empty.") ] Suggested-by: Masahiro Yamada (refactoring) Fixes: 6977f95e63b9 ("powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer"). Signed-off-by: Joe Lawrence Acked-by: Steven Rostedt (VMware) Signed-off-by: Masahiro Yamada --- scripts/Makefile.build | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 2554a15ecf2b..76ca30cc4791 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -199,11 +199,8 @@ sub_cmd_record_mcount = perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \ "$(if $(part-of-module),1,0)" "$(@)"; recordmcount_source := $(srctree)/scripts/recordmcount.pl endif # BUILD_C_RECORDMCOUNT -cmd_record_mcount = \ - if [ "$(findstring $(CC_FLAGS_FTRACE),$(_c_flags))" = \ - "$(CC_FLAGS_FTRACE)" ]; then \ - $(sub_cmd_record_mcount) \ - fi +cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), \ + $(sub_cmd_record_mcount)) endif # CC_USING_RECORD_MCOUNT endif # CONFIG_FTRACE_MCOUNT_RECORD From 7fcddf7c004140052d6f72273a0455239ad49112 Mon Sep 17 00:00:00 2001 From: Michael Stefaniuc Date: Tue, 26 Mar 2019 22:22:00 +0100 Subject: [PATCH 390/468] scripts: coccinelle: Fix description of badty.cocci Summary was copy and pasted from array_size.cocci. Signed-off-by: Michael Stefaniuc Acked-by: Julia Lawall Signed-off-by: Masahiro Yamada --- scripts/coccinelle/misc/badty.cocci | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/coccinelle/misc/badty.cocci b/scripts/coccinelle/misc/badty.cocci index 481cf301ccfc..08470362199c 100644 --- a/scripts/coccinelle/misc/badty.cocci +++ b/scripts/coccinelle/misc/badty.cocci @@ -1,4 +1,4 @@ -/// Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element +/// Correct the size argument to alloc functions /// //# This makes an effort to find cases where the argument to sizeof is wrong //# in memory allocation functions by checking the type of the allocated memory From 54a7151b1496cddbb7a83546b7998103e98edc88 Mon Sep 17 00:00:00 2001 From: Fredrik Noring Date: Wed, 27 Mar 2019 19:12:50 +0100 Subject: [PATCH 391/468] kbuild: modversions: Fix relative CRC byte order interpretation Fix commit 56067812d5b0 ("kbuild: modversions: add infrastructure for emitting relative CRCs") where CRCs are interpreted in host byte order rather than proper kernel byte order. The bug is conditional on CONFIG_MODULE_REL_CRCS. For example, when loading a BE module into a BE kernel compiled with a LE system, the error "disagrees about version of symbol module_layout" is produced. A message such as "Found checksum D7FA6856 vs module 5668FAD7" will be given with debug enabled, which indicates an obvious endian problem within __kcrctab within the kernel image. The general solution is to use the macro TO_NATIVE, as is done in similar cases throughout modpost.c. With this correction it has been verified that a BE kernel compiled with a LE system accepts BE modules. This change has also been verified with a LE kernel compiled with a LE system, in which case TO_NATIVE returns its value unmodified since the byte orders match. This is by far the common case. Fixes: 56067812d5b0 ("kbuild: modversions: add infrastructure for emitting relative CRCs") Signed-off-by: Fredrik Noring Cc: stable@vger.kernel.org Signed-off-by: Masahiro Yamada --- scripts/mod/modpost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 0b0d1080b1c5..f277e116e0eb 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -639,7 +639,7 @@ static void handle_modversions(struct module *mod, struct elf_info *info, info->sechdrs[sym->st_shndx].sh_offset - (info->hdr->e_type != ET_REL ? info->sechdrs[sym->st_shndx].sh_addr : 0); - crc = *crcp; + crc = TO_NATIVE(*crcp); } sym_update_crc(symname + strlen("__crc_"), mod, crc, export); From 7d6ab823d6461e60d211d4c8d89a13dce08b730d Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 27 Mar 2019 22:53:31 +0000 Subject: [PATCH 392/468] vfs: Update mount API docs Update the mount API docs to reflect recent changes to the code. Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- Documentation/filesystems/mount_api.txt | 365 +++++++++++++----------- 1 file changed, 194 insertions(+), 171 deletions(-) diff --git a/Documentation/filesystems/mount_api.txt b/Documentation/filesystems/mount_api.txt index 944d1965e917..00ff0cfccfa7 100644 --- a/Documentation/filesystems/mount_api.txt +++ b/Documentation/filesystems/mount_api.txt @@ -12,11 +12,13 @@ CONTENTS (4) Filesystem context security. - (5) VFS filesystem context operations. + (5) VFS filesystem context API. - (6) Parameter description. + (6) Superblock creation helpers. - (7) Parameter helper functions. + (7) Parameter description. + + (8) Parameter helper functions. ======== @@ -41,12 +43,15 @@ The creation of new mounts is now to be done in a multistep process: (7) Destroy the context. -To support this, the file_system_type struct gains a new field: +To support this, the file_system_type struct gains two new fields: int (*init_fs_context)(struct fs_context *fc); + const struct fs_parameter_description *parameters; -which is invoked to set up the filesystem-specific parts of a filesystem -context, including the additional space. +The first is invoked to set up the filesystem-specific parts of a filesystem +context, including the additional space, and the second points to the +parameter description for validation at registration time and querying by a +future system call. Note that security initialisation is done *after* the filesystem is called so that the namespaces may be adjusted first. @@ -73,9 +78,9 @@ context. This is represented by the fs_context structure: void *s_fs_info; unsigned int sb_flags; unsigned int sb_flags_mask; + unsigned int s_iflags; + unsigned int lsm_flags; enum fs_context_purpose purpose:8; - bool sloppy:1; - bool silent:1; ... }; @@ -141,6 +146,10 @@ The fs_context fields are as follows: Which bits SB_* flags are to be set/cleared in super_block::s_flags. + (*) unsigned int s_iflags + + These will be bitwise-OR'd with s->s_iflags when a superblock is created. + (*) enum fs_context_purpose This indicates the purpose for which the context is intended. The @@ -150,17 +159,6 @@ The fs_context fields are as follows: FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount - (*) bool sloppy - (*) bool silent - - These are set if the sloppy or silent mount options are given. - - [NOTE] sloppy is probably unnecessary when userspace passes over one - option at a time since the error can just be ignored if userspace deems it - to be unimportant. - - [NOTE] silent is probably redundant with sb_flags & SB_SILENT. - The mount context is created by calling vfs_new_fs_context() or vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the structure is not refcounted. @@ -342,28 +340,47 @@ number of operations used by the new mount code for this purpose: It should return 0 on success or a negative error code on failure. -================================= -VFS FILESYSTEM CONTEXT OPERATIONS -================================= +========================== +VFS FILESYSTEM CONTEXT API +========================== -There are four operations for creating a filesystem context and -one for destroying a context: +There are four operations for creating a filesystem context and one for +destroying a context: - (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type, - struct dentry *reference, - unsigned int sb_flags, - unsigned int sb_flags_mask, - enum fs_context_purpose purpose); + (*) struct fs_context *fs_context_for_mount( + struct file_system_type *fs_type, + unsigned int sb_flags); - Create a filesystem context for a given filesystem type and purpose. This - allocates the filesystem context, sets the superblock flags, initialises - the security and calls fs_type->init_fs_context() to initialise the - filesystem private data. + Allocate a filesystem context for the purpose of setting up a new mount, + whether that be with a new superblock or sharing an existing one. This + sets the superblock flags, initialises the security and calls + fs_type->init_fs_context() to initialise the filesystem private data. - reference can be NULL or it may indicate the root dentry of a superblock - that is going to be reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or - the automount point that triggered a submount (FS_CONTEXT_FOR_SUBMOUNT). - This is provided as a source of namespace information. + fs_type specifies the filesystem type that will manage the context and + sb_flags presets the superblock flags stored therein. + + (*) struct fs_context *fs_context_for_reconfigure( + struct dentry *dentry, + unsigned int sb_flags, + unsigned int sb_flags_mask); + + Allocate a filesystem context for the purpose of reconfiguring an + existing superblock. dentry provides a reference to the superblock to be + configured. sb_flags and sb_flags_mask indicate which superblock flags + need changing and to what. + + (*) struct fs_context *fs_context_for_submount( + struct file_system_type *fs_type, + struct dentry *reference); + + Allocate a filesystem context for the purpose of creating a new mount for + an automount point or other derived superblock. fs_type specifies the + filesystem type that will manage the context and the reference dentry + supplies the parameters. Namespaces are propagated from the reference + dentry's superblock also. + + Note that it's not a requirement that the reference dentry be of the same + filesystem type as fs_type. (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc); @@ -390,20 +407,6 @@ context pointer or a negative error code. For the remaining operations, if an error occurs, a negative error code will be returned. - (*) int vfs_get_tree(struct fs_context *fc); - - Get or create the mountable root and superblock, using the parameters in - the filesystem context to select/configure the superblock. This invokes - the ->validate() op and then the ->get_tree() op. - - [NOTE] ->validate() could perhaps be rolled into ->get_tree() and - ->reconfigure(). - - (*) struct vfsmount *vfs_create_mount(struct fs_context *fc); - - Create a mount given the parameters in the specified filesystem context. - Note that this does not attach the mount to anything. - (*) int vfs_parse_fs_param(struct fs_context *fc, struct fs_parameter *param); @@ -432,17 +435,80 @@ returned. clear the pointer, but then becomes responsible for disposing of the object. - (*) int vfs_parse_fs_string(struct fs_context *fc, char *key, + (*) int vfs_parse_fs_string(struct fs_context *fc, const char *key, const char *value, size_t v_size); - A wrapper around vfs_parse_fs_param() that just passes a constant string. + A wrapper around vfs_parse_fs_param() that copies the value string it is + passed. (*) int generic_parse_monolithic(struct fs_context *fc, void *data); Parse a sys_mount() data page, assuming the form to be a text list consisting of key[=val] options separated by commas. Each item in the list is passed to vfs_mount_option(). This is the default when the - ->parse_monolithic() operation is NULL. + ->parse_monolithic() method is NULL. + + (*) int vfs_get_tree(struct fs_context *fc); + + Get or create the mountable root and superblock, using the parameters in + the filesystem context to select/configure the superblock. This invokes + the ->get_tree() method. + + (*) struct vfsmount *vfs_create_mount(struct fs_context *fc); + + Create a mount given the parameters in the specified filesystem context. + Note that this does not attach the mount to anything. + + +=========================== +SUPERBLOCK CREATION HELPERS +=========================== + +A number of VFS helpers are available for use by filesystems for the creation +or looking up of superblocks. + + (*) struct super_block * + sget_fc(struct fs_context *fc, + int (*test)(struct super_block *sb, struct fs_context *fc), + int (*set)(struct super_block *sb, struct fs_context *fc)); + + This is the core routine. If test is non-NULL, it searches for an + existing superblock matching the criteria held in the fs_context, using + the test function to match them. If no match is found, a new superblock + is created and the set function is called to set it up. + + Prior to the set function being called, fc->s_fs_info will be transferred + to sb->s_fs_info - and fc->s_fs_info will be cleared if set returns + success (ie. 0). + +The following helpers all wrap sget_fc(): + + (*) int vfs_get_super(struct fs_context *fc, + enum vfs_get_super_keying keying, + int (*fill_super)(struct super_block *sb, + struct fs_context *fc)) + + This creates/looks up a deviceless superblock. The keying indicates how + many superblocks of this type may exist and in what manner they may be + shared: + + (1) vfs_get_single_super + + Only one such superblock may exist in the system. Any further + attempt to get a new superblock gets this one (and any parameter + differences are ignored). + + (2) vfs_get_keyed_super + + Multiple superblocks of this type may exist and they're keyed on + their s_fs_info pointer (for example this may refer to a + namespace). + + (3) vfs_get_independent_super + + Multiple independent superblocks of this type may exist. This + function never matches an existing one and always creates a new + one. ===================== @@ -454,35 +520,22 @@ There's a core description struct that links everything together: struct fs_parameter_description { const char name[16]; - u8 nr_params; - u8 nr_alt_keys; - u8 nr_enums; - bool ignore_unknown; - bool no_source; - const char *const *keys; - const struct constant_table *alt_keys; const struct fs_parameter_spec *specs; const struct fs_parameter_enum *enums; }; For example: - enum afs_param { + enum { Opt_autocell, Opt_bar, Opt_dyn, Opt_foo, Opt_source, - nr__afs_params }; static const struct fs_parameter_description afs_fs_parameters = { .name = "kAFS", - .nr_params = nr__afs_params, - .nr_alt_keys = ARRAY_SIZE(afs_param_alt_keys), - .nr_enums = ARRAY_SIZE(afs_param_enums), - .keys = afs_param_keys, - .alt_keys = afs_param_alt_keys, .specs = afs_param_specs, .enums = afs_param_enums, }; @@ -494,28 +547,24 @@ The members are as follows: The name to be used in error messages generated by the parse helper functions. - (2) u8 nr_params; + (2) const struct fs_parameter_specification *specs; - The number of discrete parameter identifiers. This indicates the number - of elements in the ->types[] array and also limits the values that may be - used in the values that the ->keys[] array maps to. + Table of parameter specifications, terminated with a null entry, where the + entries are of type: - It is expected that, for example, two parameters that are related, say - "acl" and "noacl" with have the same ID, but will be flagged to indicate - that one is the inverse of the other. The value can then be picked out - from the parse result. - - (3) const struct fs_parameter_specification *specs; - - Table of parameter specifications, where the entries are of type: - - struct fs_parameter_type { - enum fs_parameter_spec type:8; - u8 flags; + struct fs_parameter_spec { + const char *name; + u8 opt; + enum fs_parameter_type type:8; + unsigned short flags; }; - and the parameter identifier is the index to the array. 'type' indicates - the desired value type and must be one of: + The 'name' field is a string to match exactly to the parameter key (no + wildcards, patterns and no case-independence) and 'opt' is the value that + will be returned by the fs_parser() function in the case of a successful + match. + + The 'type' field indicates the desired value type and must be one of: TYPE NAME EXPECTED VALUE RESULT IN ======================= ======================= ===================== @@ -525,85 +574,65 @@ The members are as follows: fs_param_is_u32_octal 32-bit octal int result->uint_32 fs_param_is_u32_hex 32-bit hex int result->uint_32 fs_param_is_s32 32-bit signed int result->int_32 + fs_param_is_u64 64-bit unsigned int result->uint_64 fs_param_is_enum Enum value name result->uint_32 fs_param_is_string Arbitrary string param->string fs_param_is_blob Binary blob param->blob fs_param_is_blockdev Blockdev path * Needs lookup fs_param_is_path Path * Needs lookup - fs_param_is_fd File descriptor param->file - - And each parameter can be qualified with 'flags': - - fs_param_v_optional The value is optional - fs_param_neg_with_no If key name is prefixed with "no", it is false - fs_param_neg_with_empty If value is "", it is false - fs_param_deprecated The parameter is deprecated. - - For example: - - static const struct fs_parameter_spec afs_param_specs[nr__afs_params] = { - [Opt_autocell] = { fs_param_is flag }, - [Opt_bar] = { fs_param_is_enum }, - [Opt_dyn] = { fs_param_is flag }, - [Opt_foo] = { fs_param_is_bool, fs_param_neg_with_no }, - [Opt_source] = { fs_param_is_string }, - }; + fs_param_is_fd File descriptor result->int_32 Note that if the value is of fs_param_is_bool type, fs_parse() will try to match any string value against "0", "1", "no", "yes", "false", "true". - [!] NOTE that the table must be sorted according to primary key name so - that ->keys[] is also sorted. + Each parameter can also be qualified with 'flags': - (4) const char *const *keys; + fs_param_v_optional The value is optional + fs_param_neg_with_no result->negated set if key is prefixed with "no" + fs_param_neg_with_empty result->negated set if value is "" + fs_param_deprecated The parameter is deprecated. - Table of primary key names for the parameters. There must be one entry - per defined parameter. The table is optional if ->nr_params is 0. The - table is just an array of names e.g.: + These are wrapped with a number of convenience wrappers: - static const char *const afs_param_keys[nr__afs_params] = { - [Opt_autocell] = "autocell", - [Opt_bar] = "bar", - [Opt_dyn] = "dyn", - [Opt_foo] = "foo", - [Opt_source] = "source", + MACRO SPECIFIES + ======================= =============================================== + fsparam_flag() fs_param_is_flag + fsparam_flag_no() fs_param_is_flag, fs_param_neg_with_no + fsparam_bool() fs_param_is_bool + fsparam_u32() fs_param_is_u32 + fsparam_u32oct() fs_param_is_u32_octal + fsparam_u32hex() fs_param_is_u32_hex + fsparam_s32() fs_param_is_s32 + fsparam_u64() fs_param_is_u64 + fsparam_enum() fs_param_is_enum + fsparam_string() fs_param_is_string + fsparam_blob() fs_param_is_blob + fsparam_bdev() fs_param_is_blockdev + fsparam_path() fs_param_is_path + fsparam_fd() fs_param_is_fd + + all of which take two arguments, name string and option number - for + example: + + static const struct fs_parameter_spec afs_param_specs[] = { + fsparam_flag ("autocell", Opt_autocell), + fsparam_flag ("dyn", Opt_dyn), + fsparam_string ("source", Opt_source), + fsparam_flag_no ("foo", Opt_foo), + {} }; - [!] NOTE that the table must be sorted such that the table can be searched - with bsearch() using strcmp(). This means that the Opt_* values must - correspond to the entries in this table. - - (5) const struct constant_table *alt_keys; - u8 nr_alt_keys; - - Table of additional key names and their mappings to parameter ID plus the - number of elements in the table. This is optional. The table is just an - array of { name, integer } pairs, e.g.: - - static const struct constant_table afs_param_keys[] = { - { "baz", Opt_bar }, - { "dynamic", Opt_dyn }, - }; - - [!] NOTE that the table must be sorted such that strcmp() can be used with - bsearch() to search the entries. - - The parameter ID can also be fs_param_key_removed to indicate that a - deprecated parameter has been removed and that an error will be given. - This differs from fs_param_deprecated where the parameter may still have - an effect. - - Further, the behaviour of the parameter may differ when an alternate name - is used (for instance with NFS, "v3", "v4.2", etc. are alternate names). + An addition macro, __fsparam() is provided that takes an additional pair + of arguments to specify the type and the flags for anything that doesn't + match one of the above macros. (6) const struct fs_parameter_enum *enums; - u8 nr_enums; - Table of enum value names to integer mappings and the number of elements - stored therein. This is of type: + Table of enum value names to integer mappings, terminated with a null + entry. This is of type: struct fs_parameter_enum { - u8 param_id; + u8 opt; char name[14]; u8 value; }; @@ -621,11 +650,6 @@ The members are as follows: try to look the value up in the enum table and the result will be stored in the parse result. - (7) bool no_source; - - If this is set, fs_parse() will ignore any "source" parameter and not - pass it to the filesystem. - The parser should be pointed to by the parser pointer in the file_system_type struct as this will provide validation on registration (if CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from @@ -650,9 +674,8 @@ process the parameters it is given. int value; }; - and it must be sorted such that it can be searched using bsearch() using - strcmp(). If a match is found, the corresponding value is returned. If a - match isn't found, the not_found value is returned instead. + If a match is found, the corresponding value is returned. If a match + isn't found, the not_found value is returned instead. (*) bool validate_constant_table(const struct constant_table *tbl, size_t tbl_size, @@ -665,36 +688,36 @@ process the parameters it is given. should just be set to lie inside the low-to-high range. If all is good, true is returned. If the table is invalid, errors are - logged to dmesg, the stack is dumped and false is returned. + logged to dmesg and false is returned. + + (*) bool fs_validate_description(const struct fs_parameter_description *desc); + + This performs some validation checks on a parameter description. It + returns true if the description is good and false if it is not. It will + log errors to dmesg if validation fails. (*) int fs_parse(struct fs_context *fc, - const struct fs_param_parser *parser, + const struct fs_parameter_description *desc, struct fs_parameter *param, - struct fs_param_parse_result *result); + struct fs_parse_result *result); This is the main interpreter of parameters. It uses the parameter - description (parser) to look up the name of the parameter to use and to - convert that to a parameter ID (stored in result->key). + description to look up a parameter by key name and to convert that to an + option number (which it returns). If successful, and if the parameter type indicates the result is a boolean, integer or enum type, the value is converted by this function and - the result stored in result->{boolean,int_32,uint_32}. + the result stored in result->{boolean,int_32,uint_32,uint_64}. If a match isn't initially made, the key is prefixed with "no" and no value is present then an attempt will be made to look up the key with the prefix removed. If this matches a parameter for which the type has flag - fs_param_neg_with_no set, then a match will be made and the value will be - set to false/0/NULL. + fs_param_neg_with_no set, then a match will be made and result->negated + will be set to true. - If the parameter is successfully matched and, optionally, parsed - correctly, 1 is returned. If the parameter isn't matched and - parser->ignore_unknown is set, then 0 is returned. Otherwise -EINVAL is - returned. - - (*) bool fs_validate_description(const struct fs_parameter_description *desc); - - This is validates the parameter description. It returns true if the - description is good and false if it is not. + If the parameter isn't matched, -ENOPARAM will be returned; if the + parameter is matched, but the value is erroneous, -EINVAL will be + returned; otherwise the parameter's option number will be returned. (*) int fs_lookup_param(struct fs_context *fc, struct fs_parameter *value, From 8c7ae38d1ce12a0eaeba655df8562552b3596c7f Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 27 Mar 2019 22:48:02 +0000 Subject: [PATCH 393/468] afs: Fix StoreData op marshalling The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls generated by ->setattr() ops for the purpose of expanding a file is incorrect due to older documentation incorrectly describing the way the RPC 'FileLength' parameter is meant to work. The older documentation says that this is the length the file is meant to end up at the end of the operation; however, it was never implemented this way in any of the servers, but rather the file is truncated down to this before the write operation is effected, and never expanded to it (and, indeed, it was renamed to 'TruncPos' in 2014). Fix this by setting the position parameter to the new file length and doing a zero-lengh write there. The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file it then mmaps. This can be tested by giving the following test program a filename in an AFS directory: #include #include #include #include #include int main(int argc, char *argv[]) { char *p; int fd; if (argc != 2) { fprintf(stderr, "Format: test-trunc-mmap \n"); exit(2); } fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC); if (fd < 0) { perror(argv[1]); exit(1); } if (ftruncate(fd, 0x140008) == -1) { perror("ftruncate"); exit(1); } p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { perror("mmap"); exit(1); } p[0] = 'a'; if (munmap(p, 4096) < 0) { perror("munmap"); exit(1); } if (close(fd) < 0) { perror("close"); exit(1); } exit(0); } Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Jonathan Billings Tested-by: Jonathan Billings Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- fs/afs/fsclient.c | 6 +++--- fs/afs/yfsclient.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index ca08c83168f5..0b37867b5c20 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -1515,8 +1515,8 @@ static int afs_fs_setattr_size64(struct afs_fs_cursor *fc, struct iattr *attr) xdr_encode_AFS_StoreStatus(&bp, attr); - *bp++ = 0; /* position of start of write */ - *bp++ = 0; + *bp++ = htonl(attr->ia_size >> 32); /* position of start of write */ + *bp++ = htonl((u32) attr->ia_size); *bp++ = 0; /* size of write */ *bp++ = 0; *bp++ = htonl(attr->ia_size >> 32); /* new file length */ @@ -1564,7 +1564,7 @@ static int afs_fs_setattr_size(struct afs_fs_cursor *fc, struct iattr *attr) xdr_encode_AFS_StoreStatus(&bp, attr); - *bp++ = 0; /* position of start of write */ + *bp++ = htonl(attr->ia_size); /* position of start of write */ *bp++ = 0; /* size of write */ *bp++ = htonl(attr->ia_size); /* new file length */ diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 5aa57929e8c2..6e97a42d24d1 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -1514,7 +1514,7 @@ static int yfs_fs_setattr_size(struct afs_fs_cursor *fc, struct iattr *attr) bp = xdr_encode_u32(bp, 0); /* RPC flags */ bp = xdr_encode_YFSFid(bp, &vnode->fid); bp = xdr_encode_YFS_StoreStatus(bp, attr); - bp = xdr_encode_u64(bp, 0); /* position of start of write */ + bp = xdr_encode_u64(bp, attr->ia_size); /* position of start of write */ bp = xdr_encode_u64(bp, 0); /* size of write */ bp = xdr_encode_u64(bp, attr->ia_size); /* new file length */ yfs_check_req(call, bp); From f6027c81099e87516d27d123867f10abd04b2d38 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Thu, 28 Mar 2019 16:49:48 +0100 Subject: [PATCH 394/468] x86/cpufeature: Fix __percpu annotation in this_cpu_has() &cpu_info.x86_capability is __percpu, and the second argument of x86_this_cpu_test_bit() is expected to be __percpu. Don't cast the __percpu away and then implicitly add it again. This gets rid of 106 lines of sparse warnings with the kernel config I'm using. Signed-off-by: Jann Horn Signed-off-by: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Masahiro Yamada Cc: Nadav Amit Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190328154948.152273-1-jannh@google.com --- arch/x86/include/asm/cpufeature.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index ce95b8cbd229..0e56ff7e4848 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -112,8 +112,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; test_cpu_cap(c, bit)) #define this_cpu_has(bit) \ - (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ - x86_this_cpu_test_bit(bit, (unsigned long *)&cpu_info.x86_capability)) + (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ + x86_this_cpu_test_bit(bit, \ + (unsigned long __percpu *)&cpu_info.x86_capability)) /* * This macro is for detection of features which need kernel From e5545c94e43b8f6599ffc01df8d1aedf18ee912a Mon Sep 17 00:00:00 2001 From: Andrey Smirnov Date: Mon, 25 Mar 2019 23:32:08 -0700 Subject: [PATCH 395/468] gpio: of: Check propname before applying "cs-gpios" quirks SPI GPIO device has more than just "cs-gpio" property in its node and would request those GPIOs as a part of its initialization. To avoid applying CS-specific quirk to all of them add a check to make sure that propname is "cs-gpios". Signed-off-by: Andrey Smirnov Cc: Linus Walleij Cc: Bartosz Golaszewski Cc: Chris Healy Cc: linux-gpio@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Linus Walleij --- drivers/gpio/gpiolib-of.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index 8b9c3ab70f6e..ee7f08386a72 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -120,7 +120,8 @@ static void of_gpio_flags_quirks(struct device_node *np, * to determine if the flags should have inverted semantics. */ if (IS_ENABLED(CONFIG_SPI_MASTER) && - of_property_read_bool(np, "cs-gpios")) { + of_property_read_bool(np, "cs-gpios") && + !strcmp(propname, "cs-gpios")) { struct device_node *child; u32 cs; int ret; From 7ce40277bf848391705011ba37eac2e377cbd9e6 Mon Sep 17 00:00:00 2001 From: Andrey Smirnov Date: Mon, 25 Mar 2019 23:32:09 -0700 Subject: [PATCH 396/468] gpio: of: Check for "spi-cs-high" in child instead of parent node "spi-cs-high" is going to be specified in child node of an SPI controller's representing attached SPI device, so change the code to look for it there, instead of checking parent node. Signed-off-by: Andrey Smirnov Cc: Linus Walleij Cc: Bartosz Golaszewski Cc: Chris Healy Cc: linux-gpio@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Linus Walleij --- drivers/gpio/gpiolib-of.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index ee7f08386a72..0220dd6d64ed 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -143,16 +143,16 @@ static void of_gpio_flags_quirks(struct device_node *np, * conflict and the "spi-cs-high" flag will * take precedence. */ - if (of_property_read_bool(np, "spi-cs-high")) { + if (of_property_read_bool(child, "spi-cs-high")) { if (*flags & OF_GPIO_ACTIVE_LOW) { pr_warn("%s GPIO handle specifies active low - ignored\n", - of_node_full_name(np)); + of_node_full_name(child)); *flags &= ~OF_GPIO_ACTIVE_LOW; } } else { if (!(*flags & OF_GPIO_ACTIVE_LOW)) pr_info("%s enforce active low on chipselect handle\n", - of_node_full_name(np)); + of_node_full_name(child)); *flags |= OF_GPIO_ACTIVE_LOW; } break; From 552c69b1dc714854a5f4e27d37a43c6d797adf7d Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 7 Mar 2019 15:27:43 -0800 Subject: [PATCH 397/468] KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT Explicitly zero out quadrant and invalid instead of inheriting them from the root_mmu. Functionally, this patch is a nop as we (should) never set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only allowed if EPT is used for L1, and the root_mmu will never be invalid at this point. Explicitly setting flags sets the stage for repurposing the legacy paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which point 'smm' would be the only flag to be inherited from root_mmu. Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7837ab001d80..01bb090aaa5c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -4918,11 +4918,15 @@ static union kvm_mmu_role kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, bool execonly) { - union kvm_mmu_role role; + union kvm_mmu_role role = {0}; + union kvm_mmu_page_role root_base = vcpu->arch.root_mmu.mmu_role.base; - /* Base role is inherited from root_mmu */ - role.base.word = vcpu->arch.root_mmu.mmu_role.base.word; - role.ext = kvm_calc_mmu_role_ext(vcpu); + /* Legacy paging and SMM flags are inherited from root_mmu */ + role.base.smm = root_base.smm; + role.base.nxe = root_base.nxe; + role.base.cr0_wp = root_base.cr0_wp; + role.base.smep_andnot_wp = root_base.smep_andnot_wp; + role.base.smap_andnot_wp = root_base.smap_andnot_wp; role.base.level = PT64_ROOT_4LEVEL; role.base.direct = false; @@ -4930,6 +4934,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, role.base.guest_mode = true; role.base.access = ACC_ALL; + role.ext = kvm_calc_mmu_role_ext(vcpu); role.ext.execonly = execonly; return role; From 47c42e6b4192a2ac8b6c9858ebcf400a9eff7a10 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 7 Mar 2019 15:27:44 -0800 Subject: [PATCH 398/468] KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' The cr4_pae flag is a bit of a misnomer, its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named. For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8-bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a unique scenario as the size of the entries are not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Reported-by: Vitaly Kuznetsov Tested-by: Vitaly Kuznetsov Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/mmu.txt | 11 +++++---- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kvm/mmu.c | 38 +++++++++++++++++++------------ arch/x86/kvm/mmutrace.h | 4 ++-- 4 files changed, 35 insertions(+), 22 deletions(-) diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index f365102c80f5..2efe0efc516e 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -142,7 +142,7 @@ Shadow pages contain the following information: If clear, this page corresponds to a guest page table denoted by the gfn field. role.quadrant: - When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit + When role.gpte_is_8_bytes=0, the guest uses 32-bit gptes while the host uses 64-bit sptes. That means a guest page table contains more ptes than the host, so multiple shadow pages are needed to shadow one guest page. For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the @@ -158,9 +158,9 @@ Shadow pages contain the following information: The page is invalid and should not be used. It is a root page that is currently pinned (by a cpu hardware register pointing to it); once it is unpinned it will be destroyed. - role.cr4_pae: - Contains the value of cr4.pae for which the page is valid (e.g. whether - 32-bit or 64-bit gptes are in use). + role.gpte_is_8_bytes: + Reflects the size of the guest PTE for which the page is valid, i.e. '1' + if 64-bit gptes are in use, '0' if 32-bit gptes are in use. role.nxe: Contains the value of efer.nxe for which the page is valid. role.cr0_wp: @@ -173,6 +173,9 @@ Shadow pages contain the following information: Contains the value of cr4.smap && !cr0.wp for which the page is valid (pages for which this is true are different from other pages; see the treatment of cr0.wp=0 below). + role.ept_sp: + This is a virtual flag to denote a shadowed nested EPT page. ept_sp + is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination. role.smm: Is 1 if the page is valid in system management mode. This field determines which of the kvm_memslots array was used to build this diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a5db4475e72d..88f5192ce05e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -253,14 +253,14 @@ struct kvm_mmu_memory_cache { * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used * by indirect shadow page can not be more than 15 bits. * - * Currently, we used 14 bits that are @level, @cr4_pae, @quadrant, @access, + * Currently, we used 14 bits that are @level, @gpte_is_8_bytes, @quadrant, @access, * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp. */ union kvm_mmu_page_role { u32 word; struct { unsigned level:4; - unsigned cr4_pae:1; + unsigned gpte_is_8_bytes:1; unsigned quadrant:2; unsigned direct:1; unsigned access:3; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 01bb090aaa5c..776a58b00682 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -182,7 +182,7 @@ struct kvm_shadow_walk_iterator { static const union kvm_mmu_page_role mmu_base_role_mask = { .cr0_wp = 1, - .cr4_pae = 1, + .gpte_is_8_bytes = 1, .nxe = 1, .smep_andnot_wp = 1, .smap_andnot_wp = 1, @@ -2205,6 +2205,7 @@ static bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, static void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list); + #define for_each_valid_sp(_kvm, _sp, _gfn) \ hlist_for_each_entry(_sp, \ &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \ @@ -2215,12 +2216,17 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, for_each_valid_sp(_kvm, _sp, _gfn) \ if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else +static inline bool is_ept_sp(struct kvm_mmu_page *sp) +{ + return sp->role.cr0_wp && sp->role.smap_andnot_wp; +} + /* @sp->gfn should be write-protected at the call site */ static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, struct list_head *invalid_list) { - if (sp->role.cr4_pae != !!is_pae(vcpu) - || vcpu->arch.mmu->sync_page(vcpu, sp) == 0) { + if ((!is_ept_sp(sp) && sp->role.gpte_is_8_bytes != !!is_pae(vcpu)) || + vcpu->arch.mmu->sync_page(vcpu, sp) == 0) { kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); return false; } @@ -2423,7 +2429,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, role.level = level; role.direct = direct; if (role.direct) - role.cr4_pae = 0; + role.gpte_is_8_bytes = true; role.access = access; if (!vcpu->arch.mmu->direct_map && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) { @@ -4794,7 +4800,6 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu, role.base.access = ACC_ALL; role.base.nxe = !!is_nx(vcpu); - role.base.cr4_pae = !!is_pae(vcpu); role.base.cr0_wp = is_write_protection(vcpu); role.base.smm = is_smm(vcpu); role.base.guest_mode = is_guest_mode(vcpu); @@ -4815,6 +4820,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only) role.base.ad_disabled = (shadow_accessed_mask == 0); role.base.level = kvm_x86_ops->get_tdp_level(vcpu); role.base.direct = true; + role.base.gpte_is_8_bytes = true; return role; } @@ -4879,6 +4885,7 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only) role.base.smap_andnot_wp = role.ext.cr4_smap && !is_write_protection(vcpu); role.base.direct = !is_paging(vcpu); + role.base.gpte_is_8_bytes = !!is_pae(vcpu); if (!is_long_mode(vcpu)) role.base.level = PT32E_ROOT_LEVEL; @@ -4919,21 +4926,24 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, bool execonly) { union kvm_mmu_role role = {0}; - union kvm_mmu_page_role root_base = vcpu->arch.root_mmu.mmu_role.base; - /* Legacy paging and SMM flags are inherited from root_mmu */ - role.base.smm = root_base.smm; - role.base.nxe = root_base.nxe; - role.base.cr0_wp = root_base.cr0_wp; - role.base.smep_andnot_wp = root_base.smep_andnot_wp; - role.base.smap_andnot_wp = root_base.smap_andnot_wp; + /* SMM flag is inherited from root_mmu */ + role.base.smm = vcpu->arch.root_mmu.mmu_role.base.smm; role.base.level = PT64_ROOT_4LEVEL; + role.base.gpte_is_8_bytes = true; role.base.direct = false; role.base.ad_disabled = !accessed_dirty; role.base.guest_mode = true; role.base.access = ACC_ALL; + /* + * WP=1 and NOT_WP=1 is an impossible combination, use WP and the + * SMAP variation to denote shadow EPT entries. + */ + role.base.cr0_wp = true; + role.base.smap_andnot_wp = true; + role.ext = kvm_calc_mmu_role_ext(vcpu); role.ext.execonly = execonly; @@ -5184,7 +5194,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa, gpa, bytes, sp->role.word); offset = offset_in_page(gpa); - pte_size = sp->role.cr4_pae ? 8 : 4; + pte_size = sp->role.gpte_is_8_bytes ? 8 : 4; /* * Sometimes, the OS only writes the last one bytes to update status @@ -5208,7 +5218,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte) page_offset = offset_in_page(gpa); level = sp->role.level; *nspte = 1; - if (!sp->role.cr4_pae) { + if (!sp->role.gpte_is_8_bytes) { page_offset <<= 1; /* 32->64 */ /* * A 32-bit pde maps 4MB while the shadow pdes map diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 9f6c855a0043..dd30dccd2ad5 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -29,10 +29,10 @@ \ role.word = __entry->role; \ \ - trace_seq_printf(p, "sp gfn %llx l%u%s q%u%s %s%s" \ + trace_seq_printf(p, "sp gfn %llx l%u %u-byte q%u%s %s%s" \ " %snxe %sad root %u %s%c", \ __entry->gfn, role.level, \ - role.cr4_pae ? " pae" : "", \ + role.gpte_is_8_bytes ? 8 : 4, \ role.quadrant, \ role.direct ? " direct" : "", \ access_str[role.access], \ From 5e124900c6ebf5dfbe31b7b67073a64b2da14967 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:38 -0800 Subject: [PATCH 399/468] KVM: doc: Fix incorrect word ordering regarding supported use of APIs Per Paolo[1], instantiating multiple VMs in a single process is legal; but this conflicts with KVM's API documentation, which states: The only supported use is one virtual machine per process, and one vcpu per thread. However, an earlier section in the documentation states: Only run VM ioctls from the same process (address space) that was used to create the VM. and: Only run vcpu ioctls from the same thread that was used to create the vcpu. This suggests that the conflicting documentation is simply an incorrect ordering of of words, i.e. what's really meant is that a virtual machine can't be shared across multiple processes and a vCPU can't be shared across multiple threads. Tweak the blurb on issuing ioctls to use a more assertive tone, and rewrite the "supported use" sentence to reference said blurb instead of poorly restating it in different terms. Opportunistically add missing punctuation. [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com Fixes: 9c1b96e34717 ("KVM: Document basic API") Signed-off-by: Sean Christopherson [Improve notes on asynchronous ioctl] Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7de9eee73fcd..158805532083 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -5,24 +5,26 @@ The Definitive KVM (Kernel-based Virtual Machine) API Documentation ---------------------- The kvm API is a set of ioctls that are issued to control various aspects -of a virtual machine. The ioctls belong to three classes +of a virtual machine. The ioctls belong to three classes: - System ioctls: These query and set global attributes which affect the whole kvm subsystem. In addition a system ioctl is used to create - virtual machines + virtual machines. - VM ioctls: These query and set attributes that affect an entire virtual machine, for example memory layout. In addition a VM ioctl is used to create virtual cpus (vcpus). - Only run VM ioctls from the same process (address space) that was used - to create the VM. + VM ioctls must be issued from the same process (address space) that was + used to create the VM. - vcpu ioctls: These query and set attributes that control the operation of a single virtual cpu. - Only run vcpu ioctls from the same thread that was used to create the - vcpu. + vcpu ioctls should be issued from the same thread that was used to create + the vcpu, except for asynchronous vcpu ioctl that are marked as such in + the documentation. Otherwise, the first ioctl after switching threads + could see a performance impact. 2. File descriptors @@ -41,9 +43,8 @@ In general file descriptors can be migrated among processes by means of fork() and the SCM_RIGHTS facility of unix domain socket. These kinds of tricks are explicitly not supported by kvm. While they will not cause harm to the host, their actual behavior is not guaranteed by -the API. The only supported use is one virtual machine per process, -and one vcpu per thread. - +the API. See "General description" for details on the ioctl usage +model that is supported by KVM. It is important to note that althought VM ioctls may only be issued from the process that created the VM, a VM's lifecycle is associated with its @@ -515,11 +516,15 @@ c) KVM_INTERRUPT_SET_LEVEL Note that any value for 'irq' other than the ones stated above is invalid and incurs unexpected behavior. +This is an asynchronous vcpu ioctl and can be invoked from any thread. + MIPS: Queues an external interrupt to be injected into the virtual CPU. A negative interrupt number dequeues the interrupt. +This is an asynchronous vcpu ioctl and can be invoked from any thread. + 4.17 KVM_DEBUG_GUEST @@ -2493,7 +2498,7 @@ KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm, machine checks needing further payload are not supported by this ioctl) -Note that the vcpu ioctl is asynchronous to vcpu execution. +This is an asynchronous vcpu ioctl and can be invoked from any thread. 4.78 KVM_PPC_GET_HTAB_FD @@ -3042,8 +3047,7 @@ KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall KVM_S390_MCHK - machine check interrupt; parameters in .mchk - -Note that the vcpu ioctl is asynchronous to vcpu execution. +This is an asynchronous vcpu ioctl and can be invoked from any thread. 4.94 KVM_S390_GET_IRQ_STATE From ddba91801aeb5c160b660caed1800eb3aef403f8 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:39 -0800 Subject: [PATCH 400/468] KVM: Reject device ioctls from processes other than the VM's creator KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 16 +++++++++++----- virt/kvm/kvm_main.c | 3 +++ 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 158805532083..88ecbc7645db 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -13,7 +13,7 @@ of a virtual machine. The ioctls belong to three classes: - VM ioctls: These query and set attributes that affect an entire virtual machine, for example memory layout. In addition a VM ioctl is used to - create virtual cpus (vcpus). + create virtual cpus (vcpus) and devices. VM ioctls must be issued from the same process (address space) that was used to create the VM. @@ -26,6 +26,11 @@ of a virtual machine. The ioctls belong to three classes: the documentation. Otherwise, the first ioctl after switching threads could see a performance impact. + - device ioctls: These query and set attributes that control the operation + of a single device. + + device ioctls must be issued from the same process (address space) that + was used to create the VM. 2. File descriptors ------------------- @@ -34,10 +39,11 @@ The kvm API is centered around file descriptors. An initial open("/dev/kvm") obtains a handle to the kvm subsystem; this handle can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this handle will create a VM file descriptor which can be used to issue VM -ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu -and return a file descriptor pointing to it. Finally, ioctls on a vcpu -fd can be used to control the vcpu, including the important task of -actually running guest code. +ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will +create a virtual cpu or device and return a file descriptor pointing to +the new resource. Finally, ioctls on a vcpu or device fd can be used +to control the vcpu or device. For vcpus, this includes the important +task of actually running guest code. In general file descriptors can be migrated among processes by means of fork() and the SCM_RIGHTS facility of unix domain socket. These diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f25aa98a94df..55fe8e20d8fd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2905,6 +2905,9 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl, { struct kvm_device *dev = filp->private_data; + if (dev->kvm->mm != current->mm) + return -EIO; + switch (ioctl) { case KVM_SET_DEVICE_ATTR: return kvm_device_ioctl_attr(dev, dev->ops->set_attr, arg); From 05d5a48635259e621ea26d01e8316c6feeb34190 Mon Sep 17 00:00:00 2001 From: "Singh, Brijesh" Date: Fri, 15 Feb 2019 17:24:12 +0000 Subject: [PATCH 401/468] KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: Venkatesh Srinivas Cc: Jim Mattson Cc: Tom Lendacky Cc: Borislav Petkov Cc: Joerg Roedel Cc: "Radim Krčmář" Cc: Paolo Bonzini Signed-off-by: Brijesh Singh Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.c | 8 +++++--- arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 6 ++++++ 4 files changed, 45 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 88f5192ce05e..5b03006c00be 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1192,6 +1192,8 @@ struct kvm_x86_ops { int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu, uint16_t *vmcs_version); uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu); + + bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 776a58b00682..f6d760dcdb75 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -5408,10 +5408,12 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, * This can happen if a guest gets a page-fault on data access but the HW * table walker is not able to read the instruction page (e.g instruction * page is not present in memory). In those cases we simply restart the - * guest. + * guest, with the exception of AMD Erratum 1096 which is unrecoverable. */ - if (unlikely(insn && !insn_len)) - return 1; + if (unlikely(insn && !insn_len)) { + if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu)) + return 1; + } er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index b5b128a0a051..426039285fd1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7098,6 +7098,36 @@ static int nested_enable_evmcs(struct kvm_vcpu *vcpu, return -ENODEV; } +static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu) +{ + bool is_user, smap; + + is_user = svm_get_cpl(vcpu) == 3; + smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP); + + /* + * Detect and workaround Errata 1096 Fam_17h_00_0Fh + * + * In non SEV guest, hypervisor will be able to read the guest + * memory to decode the instruction pointer when insn_len is zero + * so we return true to indicate that decoding is possible. + * + * But in the SEV guest, the guest memory is encrypted with the + * guest specific key and hypervisor will not be able to decode the + * instruction pointer so we will not able to workaround it. Lets + * print the error and request to kill the guest. + */ + if (is_user && smap) { + if (!sev_guest(vcpu->kvm)) + return true; + + pr_err_ratelimited("KVM: Guest triggered AMD Erratum 1096\n"); + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + } + + return false; +} + static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .cpu_has_kvm_support = has_svm, .disabled_by_bios = is_disabled, @@ -7231,6 +7261,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .nested_enable_evmcs = nested_enable_evmcs, .nested_get_evmcs_version = nested_get_evmcs_version, + + .need_emulation_on_page_fault = svm_need_emulation_on_page_fault, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c73375e01ab8..6aa84e09217b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7409,6 +7409,11 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) return 0; } +static bool vmx_need_emulation_on_page_fault(struct kvm_vcpu *vcpu) +{ + return 0; +} + static __init int hardware_setup(void) { unsigned long host_bndcfgs; @@ -7711,6 +7716,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .set_nested_state = NULL, .get_vmcs12_pages = NULL, .nested_enable_evmcs = NULL, + .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault, }; static void vmx_cleanup_l1d_flush(void) From 711eff3a8fa1d6193139a895524240912011b4dc Mon Sep 17 00:00:00 2001 From: Krish Sadhukhan Date: Thu, 7 Feb 2019 14:05:30 -0500 Subject: [PATCH 402/468] kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check is performed on vmentry of L2 guests: On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field must each contain a canonical address. Signed-off-by: Krish Sadhukhan Reviewed-by: Mihai Carabas Reviewed-by: Jim Mattson Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/nested.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index f24a2c225070..153e539c29c9 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2585,6 +2585,11 @@ static int nested_check_host_control_regs(struct kvm_vcpu *vcpu, !nested_host_cr4_valid(vcpu, vmcs12->host_cr4) || !nested_cr3_valid(vcpu, vmcs12->host_cr3)) return -EINVAL; + + if (is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu) || + is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)) + return -EINVAL; + /* * If the load IA32_EFER VM-exit control is 1, bits reserved in the * IA32_EFER MSR must be 0 in the field for that register. In addition, From 4d66623cfba0949b2f0d669bd2ae732124c99ded Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Thu, 27 Sep 2018 08:31:26 +0800 Subject: [PATCH 403/468] KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() * nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is non-zero. * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages() never return zero. Based on these two reasons, we can merge the two *if* clause and use the return value from kvm_mmu_calculate_mmu_pages() directly. This simplify the code and also eliminate the possibility for reader to believe nr_mmu_pages would be zero. Signed-off-by: Wei Yang Reviewed-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu.c | 2 +- arch/x86/kvm/x86.c | 8 ++------ 3 files changed, 4 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5b03006c00be..679168931c40 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1254,7 +1254,7 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm, gfn_t gfn_offset, unsigned long mask); void kvm_mmu_zap_all(struct kvm *kvm); void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen); -unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); +unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm); void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3); diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f6d760dcdb75..5a9981465fbb 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -6028,7 +6028,7 @@ int kvm_mmu_module_init(void) /* * Calculate mmu pages needed for kvm. */ -unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm) +unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm) { unsigned int nr_mmu_pages; unsigned int nr_pages = 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 65e4559eef2f..491e92383da8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9429,13 +9429,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_memory_slot *new, enum kvm_mr_change change) { - int nr_mmu_pages = 0; - if (!kvm->arch.n_requested_mmu_pages) - nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm); - - if (nr_mmu_pages) - kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages); + kvm_mmu_change_mmu_pages(kvm, + kvm_mmu_calculate_default_mmu_pages(kvm)); /* * Dirty logging tracks sptes in 4k granularity, meaning that large From 3d9683cf3bfb6d4e4605a153958dfca7e18b52f2 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 18 Mar 2019 18:08:12 +0900 Subject: [PATCH 404/468] KVM: export and iif KVM is supported I do not see any consistency about headers_install of and . According to my analysis of Linux 5.1-rc1, there are 3 groups: [1] Both and are exported alpha, arm, hexagon, mips, powerpc, s390, sparc, x86 [2] is exported, but is not arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc, parisc, sh, unicore32, xtensa [3] Neither nor is exported csky, nds32, riscv This does not match to the actual KVM support. At least, [2] is half-baked. Nor do arch maintainers look like they care about this. For example, commit 0add53713b1c ("microblaze: Add missing kvm_para.h to Kbuild") exported to user-space in order to fix an in-kernel build error. We have two ways to make this consistent: [A] export both and for all architectures, irrespective of the KVM support [B] Match the header export of and to the KVM support My first attempt was [A] because the code looks cleaner, but Paolo suggested [B]. So, this commit goes with [B]. For most architectures, was moved to the kernel-space. I changed include/uapi/linux/Kbuild so that it checks generated asm/kvm_para.h as well as check-in ones. After this commit, there will be two groups: [1] Both and are exported arm, arm64, mips, powerpc, s390, x86 [2] Neither nor is exported alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze, nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa Signed-off-by: Masahiro Yamada Acked-by: Cornelia Huck Signed-off-by: Paolo Bonzini --- arch/alpha/include/asm/Kbuild | 1 + arch/alpha/include/uapi/asm/kvm_para.h | 2 -- arch/arc/include/asm/Kbuild | 1 + arch/arc/include/uapi/asm/Kbuild | 1 - arch/arm/include/uapi/asm/Kbuild | 1 + arch/arm/include/uapi/asm/kvm_para.h | 2 -- arch/c6x/include/asm/Kbuild | 1 + arch/c6x/include/uapi/asm/Kbuild | 1 - arch/h8300/include/asm/Kbuild | 1 + arch/h8300/include/uapi/asm/Kbuild | 1 - arch/hexagon/include/asm/Kbuild | 1 + arch/hexagon/include/uapi/asm/kvm_para.h | 2 -- arch/ia64/include/asm/Kbuild | 1 + arch/ia64/include/uapi/asm/Kbuild | 1 - arch/m68k/include/asm/Kbuild | 1 + arch/m68k/include/uapi/asm/Kbuild | 1 - arch/microblaze/include/asm/Kbuild | 1 + arch/microblaze/include/uapi/asm/Kbuild | 1 - arch/nios2/include/asm/Kbuild | 1 + arch/nios2/include/uapi/asm/Kbuild | 1 - arch/openrisc/include/asm/Kbuild | 1 + arch/openrisc/include/uapi/asm/Kbuild | 1 - arch/parisc/include/asm/Kbuild | 1 + arch/parisc/include/uapi/asm/Kbuild | 1 - arch/sh/include/asm/Kbuild | 1 + arch/sh/include/uapi/asm/Kbuild | 1 - arch/sparc/include/asm/Kbuild | 1 + arch/sparc/include/uapi/asm/kvm_para.h | 2 -- arch/unicore32/include/asm/Kbuild | 1 + arch/unicore32/include/uapi/asm/Kbuild | 1 - arch/xtensa/include/asm/Kbuild | 1 + arch/xtensa/include/uapi/asm/Kbuild | 1 - include/uapi/linux/Kbuild | 2 ++ 33 files changed, 18 insertions(+), 20 deletions(-) delete mode 100644 arch/alpha/include/uapi/asm/kvm_para.h delete mode 100644 arch/arm/include/uapi/asm/kvm_para.h delete mode 100644 arch/hexagon/include/uapi/asm/kvm_para.h delete mode 100644 arch/sparc/include/uapi/asm/kvm_para.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index dc0ab28baca1..70b783333965 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,6 +6,7 @@ generic-y += exec.h generic-y += export.h generic-y += fb.h generic-y += irq_work.h +generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h generic-y += preempt.h diff --git a/arch/alpha/include/uapi/asm/kvm_para.h b/arch/alpha/include/uapi/asm/kvm_para.h deleted file mode 100644 index baacc4996d18..000000000000 --- a/arch/alpha/include/uapi/asm/kvm_para.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#include diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index b41f8881ecc8..decc306a3b52 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -11,6 +11,7 @@ generic-y += hardirq.h generic-y += hw_irq.h generic-y += irq_regs.h generic-y += irq_work.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/arc/include/uapi/asm/Kbuild b/arch/arc/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/arc/include/uapi/asm/Kbuild +++ b/arch/arc/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/arm/include/uapi/asm/Kbuild b/arch/arm/include/uapi/asm/Kbuild index 23b4464c0995..ce8573157774 100644 --- a/arch/arm/include/uapi/asm/Kbuild +++ b/arch/arm/include/uapi/asm/Kbuild @@ -3,3 +3,4 @@ generated-y += unistd-common.h generated-y += unistd-oabi.h generated-y += unistd-eabi.h +generic-y += kvm_para.h diff --git a/arch/arm/include/uapi/asm/kvm_para.h b/arch/arm/include/uapi/asm/kvm_para.h deleted file mode 100644 index baacc4996d18..000000000000 --- a/arch/arm/include/uapi/asm/kvm_para.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#include diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild index 63b4a1705182..249c9f6f26dc 100644 --- a/arch/c6x/include/asm/Kbuild +++ b/arch/c6x/include/asm/Kbuild @@ -19,6 +19,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h diff --git a/arch/c6x/include/uapi/asm/Kbuild b/arch/c6x/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/c6x/include/uapi/asm/Kbuild +++ b/arch/c6x/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild index 3e7c8ecf151e..e3dead402e5f 100644 --- a/arch/h8300/include/asm/Kbuild +++ b/arch/h8300/include/asm/Kbuild @@ -23,6 +23,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += linkage.h generic-y += local.h generic-y += local64.h diff --git a/arch/h8300/include/uapi/asm/Kbuild b/arch/h8300/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/h8300/include/uapi/asm/Kbuild +++ b/arch/h8300/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index b25fd42aa0f4..d046e8ccdf78 100644 --- a/arch/hexagon/include/asm/Kbuild +++ b/arch/hexagon/include/asm/Kbuild @@ -19,6 +19,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/hexagon/include/uapi/asm/kvm_para.h b/arch/hexagon/include/uapi/asm/kvm_para.h deleted file mode 100644 index baacc4996d18..000000000000 --- a/arch/hexagon/include/uapi/asm/kvm_para.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#include diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index 43e21fe3499c..11f191689c9e 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -2,6 +2,7 @@ generated-y += syscall_table.h generic-y += compat.h generic-y += exec.h generic-y += irq_work.h +generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h generic-y += preempt.h diff --git a/arch/ia64/include/uapi/asm/Kbuild b/arch/ia64/include/uapi/asm/Kbuild index 20018cb883a9..62a9522af51e 100644 --- a/arch/ia64/include/uapi/asm/Kbuild +++ b/arch/ia64/include/uapi/asm/Kbuild @@ -1,2 +1 @@ generated-y += unistd_64.h -generic-y += kvm_para.h diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index 95f8f631c4df..2c359d9e80f6 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild @@ -13,6 +13,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/m68k/include/uapi/asm/Kbuild b/arch/m68k/include/uapi/asm/Kbuild index 8a7ad40be463..7417847dc438 100644 --- a/arch/m68k/include/uapi/asm/Kbuild +++ b/arch/m68k/include/uapi/asm/Kbuild @@ -1,2 +1 @@ generated-y += unistd_32.h -generic-y += kvm_para.h diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild index 791cc8d54d0a..1a8285c3f693 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -17,6 +17,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += linkage.h generic-y += local.h generic-y += local64.h diff --git a/arch/microblaze/include/uapi/asm/Kbuild b/arch/microblaze/include/uapi/asm/Kbuild index 3ce84fbb2678..13f59631c576 100644 --- a/arch/microblaze/include/uapi/asm/Kbuild +++ b/arch/microblaze/include/uapi/asm/Kbuild @@ -1,3 +1,2 @@ generated-y += unistd_32.h -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild index 8fde4fa2c34f..88a667d12aaa 100644 --- a/arch/nios2/include/asm/Kbuild +++ b/arch/nios2/include/asm/Kbuild @@ -23,6 +23,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h diff --git a/arch/nios2/include/uapi/asm/Kbuild b/arch/nios2/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/nios2/include/uapi/asm/Kbuild +++ b/arch/nios2/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild index 5a73e2956ac4..22aa97136c01 100644 --- a/arch/openrisc/include/asm/Kbuild +++ b/arch/openrisc/include/asm/Kbuild @@ -20,6 +20,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h diff --git a/arch/openrisc/include/uapi/asm/Kbuild b/arch/openrisc/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/openrisc/include/uapi/asm/Kbuild +++ b/arch/openrisc/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild index 6f49e77d82a2..9bcd0c903dbb 100644 --- a/arch/parisc/include/asm/Kbuild +++ b/arch/parisc/include/asm/Kbuild @@ -11,6 +11,7 @@ generic-y += irq_regs.h generic-y += irq_work.h generic-y += kdebug.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/parisc/include/uapi/asm/Kbuild b/arch/parisc/include/uapi/asm/Kbuild index 22fdbd08cdc8..2bd5b392277c 100644 --- a/arch/parisc/include/uapi/asm/Kbuild +++ b/arch/parisc/include/uapi/asm/Kbuild @@ -1,3 +1,2 @@ generated-y += unistd_32.h generated-y += unistd_64.h -generic-y += kvm_para.h diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild index a6ef3fee5f85..7bf2cb680d32 100644 --- a/arch/sh/include/asm/Kbuild +++ b/arch/sh/include/asm/Kbuild @@ -9,6 +9,7 @@ generic-y += emergency-restart.h generic-y += exec.h generic-y += irq_regs.h generic-y += irq_work.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/sh/include/uapi/asm/Kbuild b/arch/sh/include/uapi/asm/Kbuild index ecfbd40924dd..b8812c74c1de 100644 --- a/arch/sh/include/uapi/asm/Kbuild +++ b/arch/sh/include/uapi/asm/Kbuild @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 generated-y += unistd_32.h -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index b82f64e28f55..a22cfd5c0ee8 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -9,6 +9,7 @@ generic-y += exec.h generic-y += export.h generic-y += irq_regs.h generic-y += irq_work.h +generic-y += kvm_para.h generic-y += linkage.h generic-y += local.h generic-y += local64.h diff --git a/arch/sparc/include/uapi/asm/kvm_para.h b/arch/sparc/include/uapi/asm/kvm_para.h deleted file mode 100644 index baacc4996d18..000000000000 --- a/arch/sparc/include/uapi/asm/kvm_para.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#include diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild index 1d1544b6ca74..d77d953c04c1 100644 --- a/arch/unicore32/include/asm/Kbuild +++ b/arch/unicore32/include/asm/Kbuild @@ -18,6 +18,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h diff --git a/arch/unicore32/include/uapi/asm/Kbuild b/arch/unicore32/include/uapi/asm/Kbuild index 755bb11323d8..1c72f04ff75d 100644 --- a/arch/unicore32/include/uapi/asm/Kbuild +++ b/arch/unicore32/include/uapi/asm/Kbuild @@ -1,2 +1 @@ -generic-y += kvm_para.h generic-y += ucontext.h diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild index 42b6cb3d16f7..3843198e03d4 100644 --- a/arch/xtensa/include/asm/Kbuild +++ b/arch/xtensa/include/asm/Kbuild @@ -15,6 +15,7 @@ generic-y += irq_work.h generic-y += kdebug.h generic-y += kmap_types.h generic-y += kprobes.h +generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h diff --git a/arch/xtensa/include/uapi/asm/Kbuild b/arch/xtensa/include/uapi/asm/Kbuild index 8a7ad40be463..7417847dc438 100644 --- a/arch/xtensa/include/uapi/asm/Kbuild +++ b/arch/xtensa/include/uapi/asm/Kbuild @@ -1,2 +1 @@ generated-y += unistd_32.h -generic-y += kvm_para.h diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index 5f24b50c9e88..059dc2bedaf6 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -7,5 +7,7 @@ no-export-headers += kvm.h endif ifeq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/uapi/asm/kvm_para.h),) +ifeq ($(wildcard $(objtree)/arch/$(SRCARCH)/include/generated/uapi/asm/kvm_para.h),) no-export-headers += kvm_para.h endif +endif From f285c633cb6d68d2bf3a8ad65bee3835aac9886c Mon Sep 17 00:00:00 2001 From: Ben Gardon Date: Tue, 12 Mar 2019 11:45:59 -0700 Subject: [PATCH 405/468] kvm: mmu: Used range based flushing in slot_handle_level_range Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address in slot_handle_level_range. When range based flushes are not enabled kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs. This changes the behavior of many functions that indirectly use slot_handle_level_range, iff the range based flushes are enabled. The only potential problem I see with this is that kvm->tlbs_dirty will be cleared less often, however the only caller of slot_handle_level_range that checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which checks it and does a kvm_flush_remote_tlbs after calling kvm_unmap_hva_range anyway. Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and without this patch. The patch introduced no new failures. Signed-off-by: Ben Gardon Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 5a9981465fbb..eee455a8a612 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -5526,7 +5526,9 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot, if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { if (flush && lock_flush_tlb) { - kvm_flush_remote_tlbs(kvm); + kvm_flush_remote_tlbs_with_address(kvm, + start_gfn, + iterator.gfn - start_gfn + 1); flush = false; } cond_resched_lock(&kvm->mmu_lock); @@ -5534,7 +5536,8 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot, } if (flush && lock_flush_tlb) { - kvm_flush_remote_tlbs(kvm); + kvm_flush_remote_tlbs_with_address(kvm, start_gfn, + end_gfn - start_gfn + 1); flush = false; } From ca0488aadd014809862428cde896a6a6e8f13e42 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 15 Mar 2019 18:58:15 +0100 Subject: [PATCH 406/468] kvm: don't redefine flags as something else MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The function irqfd_wakeup() has flags defined as __poll_t and then it has additional flags which is used for irqflags. Redefine the inner flags variable as iflags so it does not shadow the outer flags. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: kvm@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Paolo Bonzini --- virt/kvm/eventfd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 4325250afd72..001aeda4c154 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -214,9 +214,9 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) if (flags & EPOLLHUP) { /* The eventfd is closing, detach from KVM */ - unsigned long flags; + unsigned long iflags; - spin_lock_irqsave(&kvm->irqfds.lock, flags); + spin_lock_irqsave(&kvm->irqfds.lock, iflags); /* * We must check if someone deactivated the irqfd before @@ -230,7 +230,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) if (irqfd_is_active(irqfd)) irqfd_deactivate(irqfd); - spin_unlock_irqrestore(&kvm->irqfds.lock, flags); + spin_unlock_irqrestore(&kvm->irqfds.lock, iflags); } return 0; From 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 7 Mar 2019 15:43:02 -0800 Subject: [PATCH 407/468] KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li Cc: Jim Mattson Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/vmx.c | 13 ------------- arch/x86/kvm/vmx/vmx.h | 1 - arch/x86/kvm/x86.c | 12 ++++++++++++ 4 files changed, 13 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 679168931c40..264814f26ce0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -568,6 +568,7 @@ struct kvm_vcpu_arch { bool tpr_access_reporting; u64 ia32_xss; u64 microcode_version; + u64 arch_capabilities; /* * Paging state of the vcpu diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6aa84e09217b..ab432a930ae8 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1683,12 +1683,6 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = to_vmx(vcpu)->spec_ctrl; break; - case MSR_IA32_ARCH_CAPABILITIES: - if (!msr_info->host_initiated && - !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) - return 1; - msr_info->data = to_vmx(vcpu)->arch_capabilities; - break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -1895,11 +1889,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; - case MSR_IA32_ARCH_CAPABILITIES: - if (!msr_info->host_initiated) - return 1; - vmx->arch_capabilities = data; - break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -4088,8 +4077,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } - vmx->arch_capabilities = kvm_get_arch_capabilities(); - vm_exit_controls_init(vmx, vmx_vmexit_ctrl()); /* 22.2.1, 20.8.1 */ diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 1554cb45b393..a1e00d0a2482 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -190,7 +190,6 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif - u64 arch_capabilities; u64 spec_ctrl; u32 vm_entry_controls_shadow; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 491e92383da8..9fc378531ca7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2443,6 +2443,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (msr_info->host_initiated) vcpu->arch.microcode_version = data; break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vcpu->arch.arch_capabilities = data; + break; case MSR_EFER: return set_efer(vcpu, data); case MSR_K7_HWCR: @@ -2747,6 +2752,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_UCODE_REV: msr_info->data = vcpu->arch.microcode_version; break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = vcpu->arch.arch_capabilities; + break; case MSR_IA32_TSC: msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset; break; @@ -8733,6 +8744,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) { + vcpu->arch.arch_capabilities = kvm_get_arch_capabilities(); vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT; kvm_vcpu_mtrr_init(vcpu); vcpu_load(vcpu); From 2bdb76c015df7125783d8394d6339d181cb5bc30 Mon Sep 17 00:00:00 2001 From: Xiaoyao Li Date: Fri, 8 Mar 2019 15:57:20 +0800 Subject: [PATCH 408/468] kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs Since MSR_IA32_ARCH_CAPABILITIES is emualted unconditionally even if host doesn't suppot it. We should move it to array emulated_msrs from arry msrs_to_save, to report to userspace that guest support this msr. Signed-off-by: Xiaoyao Li Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9fc378531ca7..9f279bb145e5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1125,7 +1125,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, - MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES, + MSR_IA32_SPEC_CTRL, MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH, MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK, MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B, @@ -1158,6 +1158,7 @@ static u32 emulated_msrs[] = { MSR_IA32_TSC_ADJUST, MSR_IA32_TSCDEADLINE, + MSR_IA32_ARCH_CAPABILITIES, MSR_IA32_MISC_ENABLE, MSR_IA32_MCG_STATUS, MSR_IA32_MCG_CTL, From 013cc6ebbf41496ce4badedd71ea6d4a6d198c14 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 13 Mar 2019 18:13:42 +0100 Subject: [PATCH 409/468] x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init When userspace initializes guest vCPUs it may want to zero all supported MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/ HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") we began doing stimer_mark_pending() unconditionally on every config change. The issue I'm observing manifests itself as following: - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER; - kvm_hv_has_stimer_pending() starts returning true; - kvm_vcpu_has_events() starts returning true; - kvm_arch_vcpu_runnable() starts returning true; - when kvm_arch_vcpu_ioctl_run() gets into (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case: - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns immediately, avoiding normal wait path; - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing userspace to retry. So instead of normal wait path we get a busy loop on all secondary vCPUs before they get INIT signal. This seems to be undesirable, especially given that this happens even when Hyper-V extensions are not used. Generally, it seems to be pointless to mark an stimer as pending in stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing kvm_hv_process_stimers() will do is clear the corresponding bit. We may just not mark disabled timers as pending instead. Fixes: f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") Signed-off-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini --- arch/x86/kvm/hyperv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 27c43525a05f..421899f6ad7b 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -526,7 +526,9 @@ static int stimer_set_config(struct kvm_vcpu_hv_stimer *stimer, u64 config, new_config.enable = 0; stimer->config.as_uint64 = new_config.as_uint64; - stimer_mark_pending(stimer, false); + if (stimer->config.enable) + stimer_mark_pending(stimer, false); + return 0; } @@ -542,7 +544,10 @@ static int stimer_set_count(struct kvm_vcpu_hv_stimer *stimer, u64 count, stimer->config.enable = 0; else if (stimer->config.auto_enable) stimer->config.enable = 1; - stimer_mark_pending(stimer, false); + + if (stimer->config.enable) + stimer_mark_pending(stimer, false); + return 0; } From 45def77ebf79e2e8942b89ed79294d97ce914fa0 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 11 Mar 2019 20:01:05 -0700 Subject: [PATCH 410/468] KVM: x86: update %rip after emulating IO Most (all?) x86 platforms provide a port IO based reset mechanism, e.g. OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM that it is doing a reset, e.g. Qemu jams vCPU state and resumes running. To avoid corruping %rip after such a reset, commit 0967b7bf1c22 ("KVM: Skip pio instruction when it is emulated, not executed") changed the behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the instruction prior to exiting to userspace. Full emulation doesn't need such tricks becase re-emulating the instruction will naturally handle %rip being changed to point at the reset vector. Updating %rip prior to executing to userspace has several drawbacks: - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation fails it will likely yell about the wrong address. - Single step exits to userspace for are effectively dropped as KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO. - Behavior of PIO emulation is different depending on whether it goes down the fast path or the slow path. Rather than skip the PIO instruction before exiting to userspace, snapshot the linear %rip and cancel PIO completion if the current value does not match the snapshot. For a 64-bit vCPU, i.e. the most common scenario, the snapshot and comparison has negligible overhead as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra VMREAD in this case. All other alternatives to snapshotting the linear %rip that don't rely on an explicit reset announcenment suffer from one corner case or another. For example, canceling PIO completion on any write to %rip fails if userspace does a save/restore of %rip, and attempting to avoid that issue by canceling PIO only if %rip changed then fails if PIO collides with the reset %rip. Attempting to zero in on the exact reset vector won't work for APs, which means adding more hooks such as the vCPU's MP_STATE, and so on and so forth. Checking for a linear %rip match technically suffers from corner cases, e.g. userspace could theoretically rewrite the underlying code page and expect a different instruction to execute, or the guest hardcodes a PIO reset at 0xfffffff0, but those are far, far outside of what can be considered normal operation. Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O") Cc: Reported-by: Jim Mattson Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 36 ++++++++++++++++++++++++--------- 2 files changed, 27 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 264814f26ce0..159b5988292f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -350,6 +350,7 @@ struct kvm_mmu_page { }; struct kvm_pio_request { + unsigned long linear_rip; unsigned long count; int in; int port; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9f279bb145e5..099b851dabaf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6535,14 +6535,27 @@ int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu, } EXPORT_SYMBOL_GPL(kvm_emulate_instruction_from_buffer); +static int complete_fast_pio_out(struct kvm_vcpu *vcpu) +{ + vcpu->arch.pio.count = 0; + + if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip))) + return 1; + + return kvm_skip_emulated_instruction(vcpu); +} + static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) { unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt, size, port, &val, 1); - /* do not return to emulator after return from userspace */ - vcpu->arch.pio.count = 0; + + if (!ret) { + vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu); + vcpu->arch.complete_userspace_io = complete_fast_pio_out; + } return ret; } @@ -6553,6 +6566,11 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu) /* We should only ever be called with arch.pio.count equal to 1 */ BUG_ON(vcpu->arch.pio.count != 1); + if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip))) { + vcpu->arch.pio.count = 0; + return 1; + } + /* For size less than 4 we merge, else we zero extend */ val = (vcpu->arch.pio.size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0; @@ -6565,7 +6583,7 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu) vcpu->arch.pio.port, &val, 1); kvm_register_write(vcpu, VCPU_REGS_RAX, val); - return 1; + return kvm_skip_emulated_instruction(vcpu); } static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, @@ -6584,6 +6602,7 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, return ret; } + vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu); vcpu->arch.complete_userspace_io = complete_fast_pio_in; return 0; @@ -6591,16 +6610,13 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in) { - int ret = kvm_skip_emulated_instruction(vcpu); + int ret; - /* - * TODO: we might be squashing a KVM_GUESTDBG_SINGLESTEP-triggered - * KVM_EXIT_DEBUG here. - */ if (in) - return kvm_fast_pio_in(vcpu, size, port) && ret; + ret = kvm_fast_pio_in(vcpu, size, port); else - return kvm_fast_pio_out(vcpu, size, port) && ret; + ret = kvm_fast_pio_out(vcpu, size, port); + return ret && kvm_skip_emulated_instruction(vcpu); } EXPORT_SYMBOL_GPL(kvm_fast_pio); From 8df98ae0ab2ead9a02228756eec26f8d7b17f499 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 13 Mar 2019 13:19:26 -0700 Subject: [PATCH 411/468] KVM: selftests: assert on exit reason in CR4/cpuid sync test ...so that the test doesn't end up in an infinite loop if it fails for whatever reason, e.g. SHUTDOWN due to gcc inserting stack canary code into ucall() and attempting to derefence a null segment. Fixes: ca359066889f7 ("kvm: selftests: add cr4_cpuid_sync_test") Cc: Wei Huang Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- .../kvm/x86_64/cr4_cpuid_sync_test.c | 35 ++++++++++--------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c index d503a51fad30..7c2c4d4055a8 100644 --- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c +++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c @@ -87,22 +87,25 @@ int main(int argc, char *argv[]) while (1) { rc = _vcpu_run(vm, VCPU_ID); - if (run->exit_reason == KVM_EXIT_IO) { - switch (get_ucall(vm, VCPU_ID, &uc)) { - case UCALL_SYNC: - /* emulate hypervisor clearing CR4.OSXSAVE */ - vcpu_sregs_get(vm, VCPU_ID, &sregs); - sregs.cr4 &= ~X86_CR4_OSXSAVE; - vcpu_sregs_set(vm, VCPU_ID, &sregs); - break; - case UCALL_ABORT: - TEST_ASSERT(false, "Guest CR4 bit (OSXSAVE) unsynchronized with CPUID bit."); - break; - case UCALL_DONE: - goto done; - default: - TEST_ASSERT(false, "Unknown ucall 0x%x.", uc.cmd); - } + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Unexpected exit reason: %u (%s),\n", + run->exit_reason, + exit_reason_str(run->exit_reason)); + + switch (get_ucall(vm, VCPU_ID, &uc)) { + case UCALL_SYNC: + /* emulate hypervisor clearing CR4.OSXSAVE */ + vcpu_sregs_get(vm, VCPU_ID, &sregs); + sregs.cr4 &= ~X86_CR4_OSXSAVE; + vcpu_sregs_set(vm, VCPU_ID, &sregs); + break; + case UCALL_ABORT: + TEST_ASSERT(false, "Guest CR4 bit (OSXSAVE) unsynchronized with CPUID bit."); + break; + case UCALL_DONE: + goto done; + default: + TEST_ASSERT(false, "Unknown ucall 0x%x.", uc.cmd); } } From 0a3f29b5a77d6c27796d7a7adabafd199dc066d5 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 13 Mar 2019 16:19:30 -0700 Subject: [PATCH 412/468] KVM: selftests: explicitly disable PIE for tests KVM selftests embed the guest "image" as a function in the test itself and extract the guest code at runtime by manually parsing the elf headers. The parsing is very simple and doesn't supporting fancy things like position independent executables. Recent versions of gcc enable pie by default, which results in triple fault shutdowns in the guest due to the virtual address in the headers not matching up with the virtual address retrieved from the function pointer. Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- tools/testing/selftests/kvm/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 3c1f4bdf9000..73d59c9d94a3 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -29,7 +29,7 @@ LIBKVM += $(LIBKVM_$(UNAME_M)) INSTALL_HDR_PATH = $(top_srcdir)/usr LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/ LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include -CFLAGS += -O2 -g -std=gnu99 -I$(LINUX_TOOL_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude -I$( Date: Wed, 13 Mar 2019 12:43:14 -0700 Subject: [PATCH 413/468] KVM: selftests: disable stack protector for all KVM tests Since 4.8.3, gcc has enabled -fstack-protector by default. This is problematic for the KVM selftests as they do not configure fs or gs segments (the stack canary is pulled from fs:0x28). With the default behavior, gcc will insert a stack canary on any function that creates buffers of 8 bytes or more. As a result, ucall() will hit a triple fault shutdown due to reading a bad fs segment when inserting its stack canary, i.e. every test fails with an unexpected SHUTDOWN. Fixes: 14c47b7530e2d ("kvm: selftests: introduce ucall") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- tools/testing/selftests/kvm/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 73d59c9d94a3..7514fcea91a7 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -29,8 +29,8 @@ LIBKVM += $(LIBKVM_$(UNAME_M)) INSTALL_HDR_PATH = $(top_srcdir)/usr LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/ LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include -CFLAGS += -O2 -g -std=gnu99 -no-pie -I$(LINUX_TOOL_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude -I$( Date: Wed, 13 Mar 2019 16:49:31 -0700 Subject: [PATCH 414/468] KVM: selftests: complete IO before migrating guest state Documentation/virtual/kvm/api.txt states: NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations. Because guest state may be inconsistent, starting state migration after an IO exit without first completing IO may result in test failures, e.g. a proposed change to KVM's handling of %rip in its fast PIO handling[1] will cause the new VM, i.e. the post-migration VM, to have its %rip set to the IN instruction that triggered KVM_EXIT_IO, leading to a test assertion due to a stage mismatch. For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip the test if it's not available. The addition of KVM_CAP_IMMEDIATE_EXIT predates the state selftest by more than a year. [1] https://patchwork.kernel.org/patch/10848545/ Fixes: fa3899add1056 ("kvm: selftests: add basic test for state save and restore") Reported-by: Jim Mattson Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- tools/testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 16 ++++++++++++++++ .../testing/selftests/kvm/x86_64/state_test.c | 18 ++++++++++++++++-- 3 files changed, 33 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index a84785b02557..07b71ad9734a 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -102,6 +102,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva); struct kvm_run *vcpu_state(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); +void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_set_mp_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_mp_state *mp_state); void vcpu_regs_get(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_regs *regs); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index b52cfdefecbf..efa0aad8b3c6 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1121,6 +1121,22 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid) return rc; } +void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid) +{ + struct vcpu *vcpu = vcpu_find(vm, vcpuid); + int ret; + + TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); + + vcpu->state->immediate_exit = 1; + ret = ioctl(vcpu->fd, KVM_RUN, NULL); + vcpu->state->immediate_exit = 0; + + TEST_ASSERT(ret == -1 && errno == EINTR, + "KVM_RUN IOCTL didn't exit immediately, rc: %i, errno: %i", + ret, errno); +} + /* * VM VCPU Set MP State * diff --git a/tools/testing/selftests/kvm/x86_64/state_test.c b/tools/testing/selftests/kvm/x86_64/state_test.c index 4b3f556265f1..30f75856cf39 100644 --- a/tools/testing/selftests/kvm/x86_64/state_test.c +++ b/tools/testing/selftests/kvm/x86_64/state_test.c @@ -134,6 +134,11 @@ int main(int argc, char *argv[]) struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); + if (!kvm_check_cap(KVM_CAP_IMMEDIATE_EXIT)) { + fprintf(stderr, "immediate_exit not available, skipping test\n"); + exit(KSFT_SKIP); + } + /* Create VM */ vm = vm_create_default(VCPU_ID, 0, guest_code); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); @@ -156,8 +161,6 @@ int main(int argc, char *argv[]) stage, run->exit_reason, exit_reason_str(run->exit_reason)); - memset(®s1, 0, sizeof(regs1)); - vcpu_regs_get(vm, VCPU_ID, ®s1); switch (get_ucall(vm, VCPU_ID, &uc)) { case UCALL_ABORT: TEST_ASSERT(false, "%s at %s:%d", (const char *)uc.args[0], @@ -176,6 +179,17 @@ int main(int argc, char *argv[]) uc.args[1] == stage, "Unexpected register values vmexit #%lx, got %lx", stage, (ulong)uc.args[1]); + /* + * When KVM exits to userspace with KVM_EXIT_IO, KVM guarantees + * guest state is consistent only after userspace re-enters the + * kernel with KVM_RUN. Complete IO prior to migrating state + * to a new VM. + */ + vcpu_run_complete_io(vm, VCPU_ID); + + memset(®s1, 0, sizeof(regs1)); + vcpu_regs_get(vm, VCPU_ID, ®s1); + state = vcpu_save_state(vm, VCPU_ID); kvm_vm_release(vm); From 919f6cd8bb2fe7151f8aecebc3b3d1ca2567396e Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:40 -0800 Subject: [PATCH 415/468] KVM: doc: Document the life cycle of a VM and its resources The series to add memcg accounting to KVM allocations[1] states: There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it, rather it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed. The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations. Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources. Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to to hold a reference to struct kvm via another method, e.g. debugfs. [1]https://patchwork.kernel.org/patch/10806707/ Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 88ecbc7645db..f39969d0121e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -69,6 +69,23 @@ by and on behalf of the VM's process may not be freed/unaccounted when the VM is shut down. +It is important to note that althought VM ioctls may only be issued from +the process that created the VM, a VM's lifecycle is associated with its +file descriptor, not its creator (process). In other words, the VM and +its resources, *including the associated address space*, are not freed +until the last reference to the VM's file descriptor has been released. +For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will +not be freed until both the parent (original) process and its child have +put their references to the VM's file descriptor. + +Because a VM's resources are not freed until the last reference to its +file descriptor is released, creating additional references to a VM via +via fork(), dup(), etc... without careful consideration is strongly +discouraged and may have unwanted side effects, e.g. memory allocated +by and on behalf of the VM's process may not be freed/unaccounted when +the VM is shut down. + + 3. Extensions ------------- From e2788c4a41cb5fa68096f5a58bccacec1a700295 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 28 Mar 2019 17:22:31 +0100 Subject: [PATCH 416/468] Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION The documentation does not mention how to delete a slot, add the information. Reported-by: Nathaniel McCallum Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index f39969d0121e..67068c47c591 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1114,14 +1114,12 @@ struct kvm_userspace_memory_region { #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) -This ioctl allows the user to create or modify a guest physical memory -slot. When changing an existing slot, it may be moved in the guest -physical memory space, or its flags may be modified. It may not be -resized. Slots may not overlap in guest physical address space. -Bits 0-15 of "slot" specifies the slot id and this value should be -less than the maximum number of user memory slots supported per VM. -The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, -if this capability is supported by the architecture. +This ioctl allows the user to create, modify or delete a guest physical +memory slot. Bits 0-15 of "slot" specify the slot id and this value +should be less than the maximum number of user memory slots supported per +VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, +if this capability is supported by the architecture. Slots may not +overlap in guest physical address space. If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" specifies the address space which is being modified. They must be @@ -1130,6 +1128,10 @@ KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces are unrelated; the restriction on overlapping slots only applies within each address space. +Deleting a slot is done by passing zero for memory_size. When changing +an existing slot, it may be moved in the guest physical memory space, +or its flags may be modified, but it may not be resized. + Memory for the region is taken starting at the address denoted by the field userspace_addr, which must point at user addressable memory for the entire memory slot size. Any object may back this memory, including From f7299d441a4da8a5088e651ea55023525a793a13 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 28 Mar 2019 14:13:47 +0100 Subject: [PATCH 417/468] gpio: of: Fix of_gpiochip_add() error path If the call to of_gpiochip_scan_gpios() in of_gpiochip_add() fails, no error handling is performed. This lead to the need of callers to call of_gpiochip_remove() on failure, which causes "BAD of_node_put() on ..." if the failure happened before the call to of_node_get(). Fix this by adding proper error handling. Note that calling gpiochip_remove_pin_ranges() multiple times causes no harm: subsequent calls are a no-op. Fixes: dfbd379ba9b7431e ("gpio: of: Return error if gpio hog configuration failed") Signed-off-by: Geert Uytterhoeven Reviewed-by: Mukesh Ojha Signed-off-by: Linus Walleij --- drivers/gpio/gpiolib-of.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index 0220dd6d64ed..6a3ec575a404 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -718,7 +718,13 @@ int of_gpiochip_add(struct gpio_chip *chip) of_node_get(chip->of_node); - return of_gpiochip_scan_gpios(chip); + status = of_gpiochip_scan_gpios(chip); + if (status) { + of_node_put(chip->of_node); + gpiochip_remove_pin_ranges(chip); + } + + return status; } void of_gpiochip_remove(struct gpio_chip *chip) From 1aa176ef5a451adc0546d5aaa3fb107975c786b7 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Wed, 27 Mar 2019 17:21:42 -0700 Subject: [PATCH 418/468] Yama: mark local symbols as static sparse complains that Yama defines functions and a variable as non-static even though they don't exist in any header. Fix it by making them static. Co-developed-by: Mukesh Ojha Signed-off-by: Mukesh Ojha Signed-off-by: Jann Horn [kees: merged similar static-ness fixes into a single patch] Link: https://lkml.kernel.org/r/20190326230841.87834-1-jannh@google.com Link: https://lkml.kernel.org/r/1553673018-19234-1-git-send-email-mojha@codeaurora.org Signed-off-by: Kees Cook Signed-off-by: James Morris --- security/yama/yama_lsm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c index 57cc60722dd3..efac68556b45 100644 --- a/security/yama/yama_lsm.c +++ b/security/yama/yama_lsm.c @@ -206,7 +206,7 @@ static void yama_ptracer_del(struct task_struct *tracer, * yama_task_free - check for task_pid to remove from exception list * @task: task being removed */ -void yama_task_free(struct task_struct *task) +static void yama_task_free(struct task_struct *task) { yama_ptracer_del(task, task); } @@ -222,7 +222,7 @@ void yama_task_free(struct task_struct *task) * Return 0 on success, -ve on error. -ENOSYS is returned when Yama * does not handle the given option. */ -int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3, +static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { int rc = -ENOSYS; @@ -401,7 +401,7 @@ static int yama_ptrace_access_check(struct task_struct *child, * * Returns 0 if following the ptrace is allowed, -ve on error. */ -int yama_ptrace_traceme(struct task_struct *parent) +static int yama_ptrace_traceme(struct task_struct *parent) { int rc = 0; @@ -452,7 +452,7 @@ static int yama_dointvec_minmax(struct ctl_table *table, int write, static int zero; static int max_scope = YAMA_SCOPE_NO_ATTACH; -struct ctl_path yama_sysctl_path[] = { +static struct ctl_path yama_sysctl_path[] = { { .procname = "kernel", }, { .procname = "yama", }, { } From ce9fb53c72834646f26ecb2213e40e6876048f87 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Thu, 28 Mar 2019 11:38:06 +0100 Subject: [PATCH 419/468] gpio: mockup: use simple_read_from_buffer() in debugfs read callback Calling read() for a single byte read will return 2 currently. Use simple_read_from_buffer() which correctly handles all sizes. Fixes: 2a9e27408e12 ("gpio: mockup: rework debugfs interface") Reviewed-by: Mukesh Ojha Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpio-mockup.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c index 74ba8b1d71d8..b6a4efce7c92 100644 --- a/drivers/gpio/gpio-mockup.c +++ b/drivers/gpio/gpio-mockup.c @@ -204,10 +204,9 @@ static ssize_t gpio_mockup_debugfs_read(struct file *file, struct gpio_mockup_chip *chip; struct seq_file *sfile; struct gpio_chip *gc; - int val, rv, cnt; + int val, cnt; char buf[3]; - if (*ppos != 0) return 0; @@ -219,12 +218,7 @@ static ssize_t gpio_mockup_debugfs_read(struct file *file, val = gpio_mockup_get(gc, priv->offset); cnt = snprintf(buf, sizeof(buf), "%d\n", val); - rv = copy_to_user(usr_buf, buf, cnt); - if (rv) - return rv; - - *ppos += cnt; - return cnt; + return simple_read_from_buffer(usr_buf, size, ppos, buf, cnt); } static ssize_t gpio_mockup_debugfs_write(struct file *file, From 988aef9e8b0dd46b55ad08b1522429739e26122d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Mar 2019 08:41:04 +0100 Subject: [PATCH 420/468] nvme-tcp: fix an endianess miss-annotation nvme_tcp_end_request just takes the status value and the converts it to little endian as well as shifting for the phase bit. Fixes: 43ce38a6d823 ("nvme-tcp: support C2HData with SUCCESS flag") Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg --- drivers/nvme/host/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index e7e08889865e..68c49dd67210 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -627,7 +627,7 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb, return ret; } -static inline void nvme_tcp_end_request(struct request *rq, __le16 status) +static inline void nvme_tcp_end_request(struct request *rq, u16 status) { union nvme_result res = {}; From cc2278c413c3a06a93c23ee8722e4dd3d621de12 Mon Sep 17 00:00:00 2001 From: Martin George Date: Wed, 27 Mar 2019 09:52:56 +0100 Subject: [PATCH 421/468] nvme-multipath: relax ANA state check When undergoing state transitions I/O might be requeued, hence we should always call nvme_mpath_set_live() to schedule requeue_work whenever the nvme device is live, independent on whether the old state was live or not. Signed-off-by: Martin George Signed-off-by: Gargi Srinivas Signed-off-by: Hannes Reinecke Signed-off-by: Christoph Hellwig --- drivers/nvme/host/multipath.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 2839bb70badf..f0716f6ce41f 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -404,15 +404,12 @@ static inline bool nvme_state_is_live(enum nvme_ana_state state) static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc, struct nvme_ns *ns) { - enum nvme_ana_state old; - mutex_lock(&ns->head->lock); - old = ns->ana_state; ns->ana_grpid = le32_to_cpu(desc->grpid); ns->ana_state = desc->state; clear_bit(NVME_NS_ANA_PENDING, &ns->flags); - if (nvme_state_is_live(ns->ana_state) && !nvme_state_is_live(old)) + if (nvme_state_is_live(ns->ana_state)) nvme_mpath_set_live(ns); mutex_unlock(&ns->head->lock); } From 02db99548d3608a625cf481cff2bb7b626829b3f Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Wed, 27 Mar 2019 17:07:22 +0800 Subject: [PATCH 422/468] nvmet: fix building bvec from sg list There are two mistakes for building bvec from sg list for file backed ns: - use request data length to compute number of io vector, this way doesn't consider sg->offset, and the result may be smaller than required io vectors - bvec->bv_len isn't capped by sg->length This patch fixes this issue by building bvec from sg directly, given the whole IO stack is ready for multi-page bvec. Reported-by: Yi Zhang Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver") Signed-off-by: Ming Lei Signed-off-by: Christoph Hellwig --- drivers/nvme/target/io-cmd-file.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c index 3e43212d3c1c..bc6ebb51b0bf 100644 --- a/drivers/nvme/target/io-cmd-file.c +++ b/drivers/nvme/target/io-cmd-file.c @@ -75,11 +75,11 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns) return ret; } -static void nvmet_file_init_bvec(struct bio_vec *bv, struct sg_page_iter *iter) +static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg) { - bv->bv_page = sg_page_iter_page(iter); - bv->bv_offset = iter->sg->offset; - bv->bv_len = PAGE_SIZE - iter->sg->offset; + bv->bv_page = sg_page(sg); + bv->bv_offset = sg->offset; + bv->bv_len = sg->length; } static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos, @@ -128,14 +128,14 @@ static void nvmet_file_io_done(struct kiocb *iocb, long ret, long ret2) static bool nvmet_file_execute_io(struct nvmet_req *req, int ki_flags) { - ssize_t nr_bvec = DIV_ROUND_UP(req->data_len, PAGE_SIZE); - struct sg_page_iter sg_pg_iter; + ssize_t nr_bvec = req->sg_cnt; unsigned long bv_cnt = 0; bool is_sync = false; size_t len = 0, total_len = 0; ssize_t ret = 0; loff_t pos; - + int i; + struct scatterlist *sg; if (req->f.mpool_alloc && nr_bvec > NVMET_MAX_MPOOL_BVEC) is_sync = true; @@ -147,8 +147,8 @@ static bool nvmet_file_execute_io(struct nvmet_req *req, int ki_flags) } memset(&req->f.iocb, 0, sizeof(struct kiocb)); - for_each_sg_page(req->sg, &sg_pg_iter, req->sg_cnt, 0) { - nvmet_file_init_bvec(&req->f.bvec[bv_cnt], &sg_pg_iter); + for_each_sg(req->sg, sg, req->sg_cnt, i) { + nvmet_file_init_bvec(&req->f.bvec[bv_cnt], sg); len += req->f.bvec[bv_cnt].bv_len; total_len += req->f.bvec[bv_cnt].bv_len; bv_cnt++; @@ -225,7 +225,7 @@ static void nvmet_file_submit_buffered_io(struct nvmet_req *req) static void nvmet_file_execute_rw(struct nvmet_req *req) { - ssize_t nr_bvec = DIV_ROUND_UP(req->data_len, PAGE_SIZE); + ssize_t nr_bvec = req->sg_cnt; if (!req->sg_cnt || !nr_bvec) { nvmet_req_complete(req, 0); From a536b49785759bf99465fdf6e248d34322123fcd Mon Sep 17 00:00:00 2001 From: Max Gurtovoy Date: Thu, 28 Mar 2019 12:54:03 +0200 Subject: [PATCH 423/468] nvmet: fix error flow during ns enable In case we fail to enable p2pmem on the current namespace, disable the backing store device before exiting. Cc: Stephen Bates Signed-off-by: Max Gurtovoy Signed-off-by: Christoph Hellwig --- drivers/nvme/target/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 2d73b66e3686..b3e765a95af8 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -509,7 +509,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns) ret = nvmet_p2pmem_ns_enable(ns); if (ret) - goto out_unlock; + goto out_dev_disable; list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) nvmet_p2pmem_ns_add_p2p(ctrl, ns); @@ -550,7 +550,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns) out_dev_put: list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) pci_dev_put(radix_tree_delete(&ctrl->p2p_ns_map, ns->nsid)); - +out_dev_disable: nvmet_ns_dev_disable(ns); goto out_unlock; } From c8fa7a807f3c5f946bd92076fbaf7826edb650dc Mon Sep 17 00:00:00 2001 From: Solomon Tan Date: Fri, 22 Mar 2019 13:22:55 +0800 Subject: [PATCH 424/468] perf cs-etm: Add missing case value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The following error was thrown when compiling `tools/perf` using OpenCSD v0.11.1. This patch fixes said error. CC util/intel-pt-decoder/intel-pt-log.o CC util/cs-etm-decoder/cs-etm-decoder.o util/cs-etm-decoder/cs-etm-decoder.c: In function ‘cs_etm_decoder__buffer_range’: util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value ‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum] switch (elem->last_i_type) { ^~~~~~ CC util/intel-pt-decoder/intel-pt-decoder.o cc1: all warnings being treated as errors Because `OCSD_INSTR_WFI_WFE` case was added only in v0.11.0, the minimum required OpenCSD library version for this patch is no longer v0.10.0. Signed-off-by: Solomon Tan Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Mathieu Poirier Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Robert Walker Cc: Suzuki K Poulouse Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190322052255.GA4809@w-OptiPlex-7050 Signed-off-by: Arnaldo Carvalho de Melo --- tools/build/feature/test-libopencsd.c | 4 ++-- tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c index d68eb4fb40cc..2b0e02c38870 100644 --- a/tools/build/feature/test-libopencsd.c +++ b/tools/build/feature/test-libopencsd.c @@ -4,9 +4,9 @@ /* * Check OpenCSD library version is sufficient to provide required features */ -#define OCSD_MIN_VER ((0 << 16) | (10 << 8) | (0)) +#define OCSD_MIN_VER ((0 << 16) | (11 << 8) | (0)) #if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER) -#error "OpenCSD >= 0.10.0 is required" +#error "OpenCSD >= 0.11.0 is required" #endif int main(void) diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index ba4c623cd8de..39fe21e1cf93 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -387,6 +387,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; case OCSD_INSTR_ISB: case OCSD_INSTR_DSB_DMB: + case OCSD_INSTR_WFI_WFE: case OCSD_INSTR_OTHER: default: packet->last_instr_taken_branch = false; From f3b4e06b3bda759afd042d3d5fa86bea8f1fe278 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Mon, 25 Mar 2019 15:51:35 +0200 Subject: [PATCH 425/468] perf intel-pt: Fix TSC slip A TSC packet can slip past MTC packets so that the timestamp appears to go backwards. One estimate is that can be up to about 40 CPU cycles, which is certainly less than 0x1000 TSC ticks, but accept slippage an order of magnitude more to be on the safe side. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Fixes: 79b58424b821c ("perf tools: Add Intel PT support for decoding MTC packets") Link: http://lkml.kernel.org/r/20190325135135.18348-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../util/intel-pt-decoder/intel-pt-decoder.c | 20 ++++++++----------- 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 6e03db142091..872fab163585 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -251,19 +251,15 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params) if (!(decoder->tsc_ctc_ratio_n % decoder->tsc_ctc_ratio_d)) decoder->tsc_ctc_mult = decoder->tsc_ctc_ratio_n / decoder->tsc_ctc_ratio_d; - - /* - * Allow for timestamps appearing to backwards because a TSC - * packet has slipped past a MTC packet, so allow 2 MTC ticks - * or ... - */ - decoder->tsc_slip = multdiv(2 << decoder->mtc_shift, - decoder->tsc_ctc_ratio_n, - decoder->tsc_ctc_ratio_d); } - /* ... or 0x100 paranoia */ - if (decoder->tsc_slip < 0x100) - decoder->tsc_slip = 0x100; + + /* + * A TSC packet can slip past MTC packets so that the timestamp appears + * to go backwards. One estimate is that can be up to about 40 CPU + * cycles, which is certainly less than 0x1000 TSC ticks, but accept + * slippage an order of magnitude more to be on the safe side. + */ + decoder->tsc_slip = 0x10000; intel_pt_log("timestamp: mtc_shift %u\n", decoder->mtc_shift); intel_pt_log("timestamp: tsc_ctc_ratio_n %u\n", decoder->tsc_ctc_ratio_n); From 4e8a5c1551370ebc0fbdb8f5c33dad13e45bdc99 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Thu, 14 Mar 2019 15:00:10 +0100 Subject: [PATCH 426/468] perf evsel: Fix max perf_event_attr.precise_ip detection After a discussion with Andi, move the perf_event_attr.precise_ip detection for maximum precise config (via :P modifier or for default cycles event) to perf_evsel__open(). The current detection in perf_event_attr__set_max_precise_ip() is tricky, because precise_ip config is specific for given event and it currently checks only hw cycles. We now check for valid precise_ip value right after failing sys_perf_event_open() for specific event, before any of the perf_event_attr fallback code gets executed. This way we get the proper config in perf_event_attr together with allowed precise_ip settings. We can see that code activity with -vv, like: $ perf record -vv ls ... ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 ... precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -95 decreasing precise_ip by one (2) ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 ... precise_ip 2 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 = 4 ... Suggested-by: Andi Kleen Signed-off-by: Jiri Olsa Link: http://lkml.kernel.org/n/tip-dkvxxbeg7lu74155d4jhlmc9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/evlist.c | 29 ---------------- tools/perf/util/evlist.h | 2 -- tools/perf/util/evsel.c | 72 ++++++++++++++++++++++++++++++++-------- 3 files changed, 59 insertions(+), 44 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index ec78e93085de..6689378ee577 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -231,35 +231,6 @@ void perf_evlist__set_leader(struct perf_evlist *evlist) } } -void perf_event_attr__set_max_precise_ip(struct perf_event_attr *pattr) -{ - struct perf_event_attr attr = { - .type = PERF_TYPE_HARDWARE, - .config = PERF_COUNT_HW_CPU_CYCLES, - .exclude_kernel = 1, - .precise_ip = 3, - }; - - event_attr_init(&attr); - - /* - * Unnamed union member, not supported as struct member named - * initializer in older compilers such as gcc 4.4.7 - */ - attr.sample_period = 1; - - while (attr.precise_ip != 0) { - int fd = sys_perf_event_open(&attr, 0, -1, -1, 0); - if (fd != -1) { - close(fd); - break; - } - --attr.precise_ip; - } - - pattr->precise_ip = attr.precise_ip; -} - int __perf_evlist__add_default(struct perf_evlist *evlist, bool precise) { struct perf_evsel *evsel = perf_evsel__new_cycles(precise); diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index dcb68f34d2cd..6a94785b9100 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -315,8 +315,6 @@ void perf_evlist__to_front(struct perf_evlist *evlist, void perf_evlist__set_tracking_event(struct perf_evlist *evlist, struct perf_evsel *tracking_evsel); -void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr); - struct perf_evsel * perf_evlist__find_evsel_by_str(struct perf_evlist *evlist, const char *str); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 7835e05f0c0a..66d066f18b5b 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -295,7 +295,6 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise) if (!precise) goto new_event; - perf_event_attr__set_max_precise_ip(&attr); /* * Now let the usual logic to set up the perf_event_attr defaults * to kick in when we return and before perf_evsel__open() is called. @@ -305,6 +304,8 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise) if (evsel == NULL) goto out; + evsel->precise_max = true; + /* use asprintf() because free(evsel) assumes name is allocated */ if (asprintf(&evsel->name, "cycles%s%s%.*s", (attr.precise_ip || attr.exclude_kernel) ? ":" : "", @@ -1083,7 +1084,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts, } if (evsel->precise_max) - perf_event_attr__set_max_precise_ip(attr); + attr->precise_ip = 3; if (opts->all_user) { attr->exclude_kernel = 1; @@ -1749,6 +1750,59 @@ static bool ignore_missing_thread(struct perf_evsel *evsel, return true; } +static void display_attr(struct perf_event_attr *attr) +{ + if (verbose >= 2) { + fprintf(stderr, "%.60s\n", graph_dotted_line); + fprintf(stderr, "perf_event_attr:\n"); + perf_event_attr__fprintf(stderr, attr, __open_attr__fprintf, NULL); + fprintf(stderr, "%.60s\n", graph_dotted_line); + } +} + +static int perf_event_open(struct perf_evsel *evsel, + pid_t pid, int cpu, int group_fd, + unsigned long flags) +{ + int precise_ip = evsel->attr.precise_ip; + int fd; + + while (1) { + pr_debug2("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", + pid, cpu, group_fd, flags); + + fd = sys_perf_event_open(&evsel->attr, pid, cpu, group_fd, flags); + if (fd >= 0) + break; + + /* + * Do quick precise_ip fallback if: + * - there is precise_ip set in perf_event_attr + * - maximum precise is requested + * - sys_perf_event_open failed with ENOTSUP error, + * which is associated with wrong precise_ip + */ + if (!precise_ip || !evsel->precise_max || (errno != ENOTSUP)) + break; + + /* + * We tried all the precise_ip values, and it's + * still failing, so leave it to standard fallback. + */ + if (!evsel->attr.precise_ip) { + evsel->attr.precise_ip = precise_ip; + break; + } + + pr_debug2("\nsys_perf_event_open failed, error %d\n", -ENOTSUP); + evsel->attr.precise_ip--; + pr_debug2("decreasing precise_ip by one (%d)\n", evsel->attr.precise_ip); + display_attr(&evsel->attr); + } + + return fd; +} + int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, struct thread_map *threads) { @@ -1824,12 +1878,7 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, if (perf_missing_features.sample_id_all) evsel->attr.sample_id_all = 0; - if (verbose >= 2) { - fprintf(stderr, "%.60s\n", graph_dotted_line); - fprintf(stderr, "perf_event_attr:\n"); - perf_event_attr__fprintf(stderr, &evsel->attr, __open_attr__fprintf, NULL); - fprintf(stderr, "%.60s\n", graph_dotted_line); - } + display_attr(&evsel->attr); for (cpu = 0; cpu < cpus->nr; cpu++) { @@ -1841,13 +1890,10 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, group_fd = get_group_fd(evsel, cpu, thread); retry_open: - pr_debug2("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", - pid, cpus->map[cpu], group_fd, flags); - test_attr__ready(); - fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu], - group_fd, flags); + fd = perf_event_open(evsel, pid, cpus->map[cpu], + group_fd, flags); FD(evsel, cpu, thread) = fd; From be709d48329a500621d2a05835283150ae137b45 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 25 Mar 2019 14:06:07 -0300 Subject: [PATCH 427/468] tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h To deal with the move of some defines from asm-generic/mmap-common.h to linux/mman.h done in: 746c9398f5ac ("arch: move common mmap flags to linux/mman.h") The generated mmap_flags array stays the same: $ tools/perf/trace/beauty/mmap_flags.sh static const char *mmap_flags[] = { [ilog2(0x40) + 1] = "32BIT", [ilog2(0x01) + 1] = "SHARED", [ilog2(0x02) + 1] = "PRIVATE", [ilog2(0x10) + 1] = "FIXED", [ilog2(0x20) + 1] = "ANONYMOUS", [ilog2(0x100000) + 1] = "FIXED_NOREPLACE", [ilog2(0x0100) + 1] = "GROWSDOWN", [ilog2(0x0800) + 1] = "DENYWRITE", [ilog2(0x1000) + 1] = "EXECUTABLE", [ilog2(0x2000) + 1] = "LOCKED", [ilog2(0x4000) + 1] = "NORESERVE", [ilog2(0x8000) + 1] = "POPULATE", [ilog2(0x10000) + 1] = "NONBLOCK", [ilog2(0x20000) + 1] = "STACK", [ilog2(0x40000) + 1] = "HUGETLB", [ilog2(0x80000) + 1] = "SYNC", }; $ And to have the system's sys/mman.h find the definition of MAP_SHARED and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h in a way that keeps it the same as the kernel's, need for keeping the Android's NDK cross build working. This silences these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h' diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h Cc: Adrian Hunter Cc: Arnd Bergmann Cc: Jiri Olsa Cc: Michael S. Tsirkin Cc: Namhyung Kim Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/arch/alpha/include/uapi/asm/mman.h | 2 -- tools/arch/mips/include/uapi/asm/mman.h | 2 -- tools/arch/parisc/include/uapi/asm/mman.h | 2 -- tools/arch/xtensa/include/uapi/asm/mman.h | 2 -- .../uapi/asm-generic/mman-common-tools.h | 23 +++++++++++++++++++ tools/include/uapi/asm-generic/mman-common.h | 4 +--- tools/include/uapi/asm-generic/mman.h | 2 +- tools/include/uapi/linux/mman.h | 4 ++++ tools/perf/Makefile.perf | 4 ++-- tools/perf/check-headers.sh | 2 +- tools/perf/trace/beauty/mmap_flags.sh | 14 ++++++++--- 11 files changed, 43 insertions(+), 18 deletions(-) create mode 100644 tools/include/uapi/asm-generic/mman-common-tools.h diff --git a/tools/arch/alpha/include/uapi/asm/mman.h b/tools/arch/alpha/include/uapi/asm/mman.h index c317d3e6867a..ea6a255ae61f 100644 --- a/tools/arch/alpha/include/uapi/asm/mman.h +++ b/tools/arch/alpha/include/uapi/asm/mman.h @@ -27,8 +27,6 @@ #define MAP_NONBLOCK 0x40000 #define MAP_NORESERVE 0x10000 #define MAP_POPULATE 0x20000 -#define MAP_PRIVATE 0x02 -#define MAP_SHARED 0x01 #define MAP_STACK 0x80000 #define PROT_EXEC 0x4 #define PROT_GROWSDOWN 0x01000000 diff --git a/tools/arch/mips/include/uapi/asm/mman.h b/tools/arch/mips/include/uapi/asm/mman.h index de2206883abc..c8acaa138d46 100644 --- a/tools/arch/mips/include/uapi/asm/mman.h +++ b/tools/arch/mips/include/uapi/asm/mman.h @@ -28,8 +28,6 @@ #define MAP_NONBLOCK 0x20000 #define MAP_NORESERVE 0x0400 #define MAP_POPULATE 0x10000 -#define MAP_PRIVATE 0x002 -#define MAP_SHARED 0x001 #define MAP_STACK 0x40000 #define PROT_EXEC 0x04 #define PROT_GROWSDOWN 0x01000000 diff --git a/tools/arch/parisc/include/uapi/asm/mman.h b/tools/arch/parisc/include/uapi/asm/mman.h index 1bd78758bde9..f9fd1325f5bd 100644 --- a/tools/arch/parisc/include/uapi/asm/mman.h +++ b/tools/arch/parisc/include/uapi/asm/mman.h @@ -27,8 +27,6 @@ #define MAP_NONBLOCK 0x20000 #define MAP_NORESERVE 0x4000 #define MAP_POPULATE 0x10000 -#define MAP_PRIVATE 0x02 -#define MAP_SHARED 0x01 #define MAP_STACK 0x40000 #define PROT_EXEC 0x4 #define PROT_GROWSDOWN 0x01000000 diff --git a/tools/arch/xtensa/include/uapi/asm/mman.h b/tools/arch/xtensa/include/uapi/asm/mman.h index 34dde6f44dae..f2b08c990afc 100644 --- a/tools/arch/xtensa/include/uapi/asm/mman.h +++ b/tools/arch/xtensa/include/uapi/asm/mman.h @@ -27,8 +27,6 @@ #define MAP_NONBLOCK 0x20000 #define MAP_NORESERVE 0x0400 #define MAP_POPULATE 0x10000 -#define MAP_PRIVATE 0x002 -#define MAP_SHARED 0x001 #define MAP_STACK 0x40000 #define PROT_EXEC 0x4 #define PROT_GROWSDOWN 0x01000000 diff --git a/tools/include/uapi/asm-generic/mman-common-tools.h b/tools/include/uapi/asm-generic/mman-common-tools.h new file mode 100644 index 000000000000..af7d0d3a3182 --- /dev/null +++ b/tools/include/uapi/asm-generic/mman-common-tools.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef __ASM_GENERIC_MMAN_COMMON_TOOLS_ONLY_H +#define __ASM_GENERIC_MMAN_COMMON_TOOLS_ONLY_H + +#include + +/* We need this because we need to have tools/include/uapi/ included in the tools + * header search path to get access to stuff that is not yet in the system's + * copy of the files in that directory, but since this cset: + * + * 746c9398f5ac ("arch: move common mmap flags to linux/mman.h") + * + * We end up making sys/mman.h, that is in the system headers, to not find the + * MAP_SHARED and MAP_PRIVATE defines because they are not anymore in our copy + * of asm-generic/mman-common.h. So we define them here and include this header + * from each of the per arch mman.h headers. + */ +#ifndef MAP_SHARED +#define MAP_SHARED 0x01 /* Share changes */ +#define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x03 /* share + validate extension flags */ +#endif +#endif // __ASM_GENERIC_MMAN_COMMON_TOOLS_ONLY_H diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h index e7ee32861d51..abd238d0f7a4 100644 --- a/tools/include/uapi/asm-generic/mman-common.h +++ b/tools/include/uapi/asm-generic/mman-common.h @@ -15,9 +15,7 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */ -#define MAP_SHARED 0x01 /* Share changes */ -#define MAP_PRIVATE 0x02 /* Changes are private */ -#define MAP_SHARED_VALIDATE 0x03 /* share + validate extension flags */ +/* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x20 /* don't use a file */ diff --git a/tools/include/uapi/asm-generic/mman.h b/tools/include/uapi/asm-generic/mman.h index 653687d9771b..36c197fc44a0 100644 --- a/tools/include/uapi/asm-generic/mman.h +++ b/tools/include/uapi/asm-generic/mman.h @@ -2,7 +2,7 @@ #ifndef __ASM_GENERIC_MMAN_H #define __ASM_GENERIC_MMAN_H -#include +#include #define MAP_GROWSDOWN 0x0100 /* stack-like segment */ #define MAP_DENYWRITE 0x0800 /* ETXTBSY */ diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h index d0f515d53299..fc1a64c3447b 100644 --- a/tools/include/uapi/linux/mman.h +++ b/tools/include/uapi/linux/mman.h @@ -12,6 +12,10 @@ #define OVERCOMMIT_ALWAYS 1 #define OVERCOMMIT_NEVER 2 +#define MAP_SHARED 0x01 /* Share changes */ +#define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x03 /* share + validate extension flags */ + /* * Huge page size encoding when MAP_HUGETLB is specified, and a huge page * size other than the default is desired. See hugetlb_encode.h. diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 01f7555fd933..e8c9f77e9010 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -481,8 +481,8 @@ $(madvise_behavior_array): $(madvise_hdr_dir)/mman-common.h $(madvise_behavior_t mmap_flags_array := $(beauty_outdir)/mmap_flags_array.c mmap_flags_tbl := $(srctree)/tools/perf/trace/beauty/mmap_flags.sh -$(mmap_flags_array): $(asm_generic_uapi_dir)/mman.h $(asm_generic_uapi_dir)/mman-common.h $(mmap_flags_tbl) - $(Q)$(SHELL) '$(mmap_flags_tbl)' $(asm_generic_uapi_dir) $(arch_asm_uapi_dir) > $@ +$(mmap_flags_array): $(linux_uapi_dir)/mman.h $(asm_generic_uapi_dir)/mman.h $(asm_generic_uapi_dir)/mman-common.h $(mmap_flags_tbl) + $(Q)$(SHELL) '$(mmap_flags_tbl)' $(linux_uapi_dir) $(asm_generic_uapi_dir) $(arch_asm_uapi_dir) > $@ mount_flags_array := $(beauty_outdir)/mount_flags_array.c mount_flags_tbl := $(srctree)/tools/perf/trace/beauty/mount_flags.sh diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh index 7b55613924de..c68ee06cae63 100755 --- a/tools/perf/check-headers.sh +++ b/tools/perf/check-headers.sh @@ -103,7 +103,7 @@ done # diff with extra ignore lines check arch/x86/lib/memcpy_64.S '-I "^EXPORT_SYMBOL" -I "^#include "' check arch/x86/lib/memset_64.S '-I "^EXPORT_SYMBOL" -I "^#include "' -check include/uapi/asm-generic/mman.h '-I "^#include <\(uapi/\)*asm-generic/mman-common.h>"' +check include/uapi/asm-generic/mman.h '-I "^#include <\(uapi/\)*asm-generic/mman-common\(-tools\)*.h>"' check include/uapi/linux/mman.h '-I "^#include <\(uapi/\)*asm/mman.h>"' # diff non-symmetric files diff --git a/tools/perf/trace/beauty/mmap_flags.sh b/tools/perf/trace/beauty/mmap_flags.sh index 32bac9c0d694..5f5eefcb3c74 100755 --- a/tools/perf/trace/beauty/mmap_flags.sh +++ b/tools/perf/trace/beauty/mmap_flags.sh @@ -1,15 +1,18 @@ #!/bin/sh # SPDX-License-Identifier: LGPL-2.1 -if [ $# -ne 2 ] ; then +if [ $# -ne 3 ] ; then [ $# -eq 1 ] && hostarch=$1 || hostarch=`uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/` + linux_header_dir=tools/include/uapi/linux header_dir=tools/include/uapi/asm-generic arch_header_dir=tools/arch/${hostarch}/include/uapi/asm else - header_dir=$1 - arch_header_dir=$2 + linux_header_dir=$1 + header_dir=$2 + arch_header_dir=$3 fi +linux_mman=${linux_header_dir}/mman.h arch_mman=${arch_header_dir}/mman.h # those in egrep -vw are flags, we want just the bits @@ -20,6 +23,11 @@ egrep -q $regex ${arch_mman} && \ (egrep $regex ${arch_mman} | \ sed -r "s/$regex/\2 \1/g" | \ xargs printf "\t[ilog2(%s) + 1] = \"%s\",\n") +egrep -q $regex ${linux_mman} && \ +(egrep $regex ${linux_mman} | \ + egrep -vw 'MAP_(UNINITIALIZED|TYPE|SHARED_VALIDATE)' | \ + sed -r "s/$regex/\2 \1/g" | \ + xargs printf "\t[ilog2(%s) + 1] = \"%s\",\n") ([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+ Date: Mon, 25 Mar 2019 14:22:47 -0300 Subject: [PATCH 428/468] tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition To get the changes in: ab3948f58ff8 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") And silence this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h' diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Joel Fernandes (Google) Cc: Linus Torvalds Cc: Namhyung Kim Link: https://lkml.kernel.org/n/tip-lvfx5cgf0xzmdi9mcjva1ttl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/fcntl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/include/uapi/linux/fcntl.h b/tools/include/uapi/linux/fcntl.h index 6448cdd9a350..a2f8658f1c55 100644 --- a/tools/include/uapi/linux/fcntl.h +++ b/tools/include/uapi/linux/fcntl.h @@ -41,6 +41,7 @@ #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ #define F_SEAL_GROW 0x0004 /* prevent file from growing */ #define F_SEAL_WRITE 0x0008 /* prevent writes */ +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ /* (1U << 31) is reserved for signed error codes */ /* From 949af89af02c2d66db973c5bca01b7858e1ce0ba Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 25 Mar 2019 14:25:33 -0300 Subject: [PATCH 429/468] tools arch x86: Sync asm/cpufeatures.h with the kernel sources To get the changes from: 52f64909409c ("x86: Add TSX Force Abort CPUID/MSR") That don't cause any changes in the generated perf binaries. And silence this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra (Intel) Cc: Thomas Gleixner Link: https://lkml.kernel.org/n/tip-zv8kw8vnb1zppflncpwfsv2w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index 6d6122524711..981ff9479648 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -344,6 +344,7 @@ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */ #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_TSX_FORCE_ABORT (18*32+13) /* "" TSX_FORCE_ABORT */ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ From 82392516e9e0818cd2227cf9a16205c90a6cacfa Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 25 Mar 2019 14:28:20 -0300 Subject: [PATCH 430/468] tools headers uapi: Update drm/i915_drm.h To get the changes in: e46c2e99f600 ("drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)") That don't cause changes in the generated perf binaries. To silence this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Tvrtko Ursulin Link: https://lkml.kernel.org/n/tip-h6bspm1nomjnpr90333rrx7q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/drm/i915_drm.h | 64 +++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/tools/include/uapi/drm/i915_drm.h b/tools/include/uapi/drm/i915_drm.h index 298b2e197744..397810fa2d33 100644 --- a/tools/include/uapi/drm/i915_drm.h +++ b/tools/include/uapi/drm/i915_drm.h @@ -1486,9 +1486,73 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */ #define I915_CONTEXT_DEFAULT_PRIORITY 0 #define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */ + /* + * When using the following param, value should be a pointer to + * drm_i915_gem_context_param_sseu. + */ +#define I915_CONTEXT_PARAM_SSEU 0x7 __u64 value; }; +/** + * Context SSEU programming + * + * It may be necessary for either functional or performance reason to configure + * a context to run with a reduced number of SSEU (where SSEU stands for Slice/ + * Sub-slice/EU). + * + * This is done by configuring SSEU configuration using the below + * @struct drm_i915_gem_context_param_sseu for every supported engine which + * userspace intends to use. + * + * Not all GPUs or engines support this functionality in which case an error + * code -ENODEV will be returned. + * + * Also, flexibility of possible SSEU configuration permutations varies between + * GPU generations and software imposed limitations. Requesting such a + * combination will return an error code of -EINVAL. + * + * NOTE: When perf/OA is active the context's SSEU configuration is ignored in + * favour of a single global setting. + */ +struct drm_i915_gem_context_param_sseu { + /* + * Engine class & instance to be configured or queried. + */ + __u16 engine_class; + __u16 engine_instance; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 flags; + + /* + * Mask of slices to enable for the context. Valid values are a subset + * of the bitmask value returned for I915_PARAM_SLICE_MASK. + */ + __u64 slice_mask; + + /* + * Mask of subslices to enable for the context. Valid values are a + * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK. + */ + __u64 subslice_mask; + + /* + * Minimum/Maximum number of EUs to enable per subslice for the + * context. min_eus_per_subslice must be inferior or equal to + * max_eus_per_subslice. + */ + __u16 min_eus_per_subslice; + __u16 max_eus_per_subslice; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd; +}; + enum drm_i915_oa_format { I915_OA_FORMAT_A13 = 1, /* HSW only */ I915_OA_FORMAT_A29, /* HSW only */ From 8142bd82a59e452fefea7b21113101d6a87d9fa8 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 25 Mar 2019 11:34:04 -0300 Subject: [PATCH 431/468] tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd To pick up the changes introduced in the following csets: 2b188cc1bb85 ("Add io_uring IO interface") edafccee56ff ("io_uring: add support for pre-mapped user IO buffers") 3eb39f47934f ("signal: add pidfd_send_signal() syscall") This makes 'perf trace' to become aware of these new syscalls, so that one can use them like 'perf trace -e ui_uring*,*signal' to do a system wide strace-like session looking at those syscalls, for instance. For example: # perf trace -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla Summary of events: io_uring-cp (383), 1208866 events, 100.0% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) -------------- ------ -------- ------ ------- ------- ------ io_uring_enter 605780 2955.615 0.000 0.005 33.804 1.94% openat 4 459.446 0.004 114.861 459.435 100.00% munmap 4 0.073 0.009 0.018 0.042 44.03% mmap 10 0.054 0.002 0.005 0.026 43.24% brk 28 0.038 0.001 0.001 0.003 7.51% io_uring_setup 1 0.030 0.030 0.030 0.030 0.00% mprotect 4 0.014 0.002 0.004 0.005 14.32% close 5 0.012 0.001 0.002 0.004 28.87% fstat 3 0.006 0.001 0.002 0.003 35.83% read 4 0.004 0.001 0.001 0.002 13.58% access 1 0.003 0.003 0.003 0.003 0.00% lseek 3 0.002 0.001 0.001 0.001 9.00% arch_prctl 2 0.002 0.001 0.001 0.001 0.69% execve 1 0.000 0.000 0.000 0.000 0.00% # # perf trace -e io_uring* -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla Summary of events: io_uring-cp (390), 1191250 events, 100.0% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) -------------- ------ -------- ------ ------ ------ ------ io_uring_enter 597093 2706.060 0.001 0.005 14.761 1.10% io_uring_setup 1 0.038 0.038 0.038 0.038 0.00% # More work needed to make the tools/perf/examples/bpf/augmented_raw_syscalls.c BPF program to copy the 'struct io_uring_params' arguments to perf's ring buffer so that 'perf trace' can use the BTF info put in place by pahole's conversion of the kernel DWARF and then auto-beautify those arguments. This patch produces the expected change in the generated syscalls table for x86_64: --- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before 2019-03-26 13:37:46.679057774 -0300 +++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2019-03-26 13:38:12.755990383 -0300 @@ -334,5 +334,9 @@ static const char *syscalltbl_x86_64[] = [332] = "statx", [333] = "io_pgetevents", [334] = "rseq", + [424] = "pidfd_send_signal", + [425] = "io_uring_setup", + [426] = "io_uring_enter", + [427] = "io_uring_register", }; -#define SYSCALLTBL_x86_64_MAX_ID 334 +#define SYSCALLTBL_x86_64_MAX_ID 427 This silences these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Cc: Adrian Hunter Cc: Andrii Nakryiko Cc: Christian Brauner Cc: Daniel Borkmann Cc: Jens Axboe Cc: Jiri Olsa Cc: Martin KaFai Lau Cc: Namhyung Kim Cc: Song Liu Cc: Yonghong Song Link: https://lkml.kernel.org/n/tip-p0ars3otuc52x5iznf21shhw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/asm-generic/unistd.h | 11 ++++++++++- tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 4 ++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h index 12cdf611d217..dee7292e1df6 100644 --- a/tools/include/uapi/asm-generic/unistd.h +++ b/tools/include/uapi/asm-generic/unistd.h @@ -824,8 +824,17 @@ __SYSCALL(__NR_futex_time64, sys_futex) __SYSCALL(__NR_sched_rr_get_interval_time64, sys_sched_rr_get_interval) #endif +#define __NR_pidfd_send_signal 424 +__SYSCALL(__NR_pidfd_send_signal, sys_pidfd_send_signal) +#define __NR_io_uring_setup 425 +__SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) +#define __NR_io_uring_enter 426 +__SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) +#define __NR_io_uring_register 427 +__SYSCALL(__NR_io_uring_register, sys_io_uring_register) + #undef __NR_syscalls -#define __NR_syscalls 424 +#define __NR_syscalls 428 /* * 32 bit systems traditionally used different diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl index 2ae92fddb6d5..92ee0b4378d4 100644 --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl @@ -345,6 +345,10 @@ 334 common rseq __x64_sys_rseq # don't use numbers 387 through 423, add new calls after the last # 'common' entry +424 common pidfd_send_signal __x64_sys_pidfd_send_signal +425 common io_uring_setup __x64_sys_io_uring_setup +426 common io_uring_enter __x64_sys_io_uring_enter +427 common io_uring_register __x64_sys_io_uring_register # # x32-specific system call numbers start at 512 to avoid cache impact From 707c373c846cf6e27a47a2c093d243a35c691b62 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 26 Mar 2019 13:45:58 -0300 Subject: [PATCH 432/468] tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources To pick up the changes in: 2b57ecd0208f ("KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()") That don't cause any changes in the tools. This silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/kvm.h' differs from latest version at 'arch/powerpc/include/uapi/asm/kvm.h' diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Paul Mackerras Cc: Suraj Jitindar Singh Link: https://lkml.kernel.org/n/tip-4pb7ywp9536hub2pnj4hu6i4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/arch/powerpc/include/uapi/asm/kvm.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/arch/powerpc/include/uapi/asm/kvm.h b/tools/arch/powerpc/include/uapi/asm/kvm.h index 8c876c166ef2..26ca425f4c2c 100644 --- a/tools/arch/powerpc/include/uapi/asm/kvm.h +++ b/tools/arch/powerpc/include/uapi/asm/kvm.h @@ -463,10 +463,12 @@ struct kvm_ppc_cpu_char { #define KVM_PPC_CPU_CHAR_BR_HINT_HONOURED (1ULL << 58) #define KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF (1ULL << 57) #define KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS (1ULL << 56) +#define KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST (1ull << 54) #define KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY (1ULL << 63) #define KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR (1ULL << 62) #define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ULL << 61) +#define KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE (1ull << 58) /* Per-vcpu XICS interrupt controller state */ #define KVM_REG_PPC_ICP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c) From 977c7a6d1e263ff1d755f28595b99e4bc0c48a9f Mon Sep 17 00:00:00 2001 From: Wei Li Date: Thu, 28 Feb 2019 17:20:03 +0800 Subject: [PATCH 433/468] perf machine: Update kernel map address and re-order properly Since commit 1fb87b8e9599 ("perf machine: Don't search for active kernel start in __machine__create_kernel_maps"), the __machine__create_kernel_maps() just create a map what start and end are both zero. Though the address will be updated later, the order of map in the rbtree may be incorrect. The commit ee05d21791db ("perf machine: Set main kernel end address properly") fixed the logic in machine__create_kernel_maps(), but it's still wrong in function machine__process_kernel_mmap_event(). To reproduce this issue, we need an environment which the module address is before the kernel text segment. I tested it on an aarch64 machine with kernel 4.19.25: [root@localhost hulk]# grep _stext /proc/kallsyms ffff000008081000 T _stext [root@localhost hulk]# grep _etext /proc/kallsyms ffff000009780000 R _etext [root@localhost hulk]# tail /proc/modules hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000 nvme_core 126976 7 nvme, Live 0xffff0000018b6000 mdio 20480 1 ixgbe, Live 0xffff0000018ab000 hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000 hns_mdio 20480 2 - Live 0xffff000001822000 hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000 dm_mirror 40960 0 - Live 0xffff000001804000 dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000 dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000 dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000 [root@localhost hulk]# Before fix: [root@localhost bin]# perf record sleep 3 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ] [root@localhost bin]# perf buildid-list -i perf.data 4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso] [root@localhost bin]# perf buildid-list -i perf.data -H 0000000000000000000000000000000000000000 /proc/kcore [root@localhost bin]# After fix: [root@localhost tools]# ./perf/perf record sleep 3 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ] [root@localhost tools]# ./perf/perf buildid-list -i perf.data 28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms] 106c14ce6e4acea3453e484dc604d66666f08a2f [vdso] [root@localhost tools]# ./perf/perf buildid-list -i perf.data -H 28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore Signed-off-by: Wei Li Acked-by: Jiri Olsa Acked-by: Namhyung Kim Cc: Alexander Shishkin Cc: David Ahern Cc: Hanjun Guo Cc: Kim Phillips Cc: Li Bin Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/machine.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 61959aba7e27..3c520baa198c 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1421,6 +1421,20 @@ static void machine__set_kernel_mmap(struct machine *machine, machine->vmlinux_map->end = ~0ULL; } +static void machine__update_kernel_mmap(struct machine *machine, + u64 start, u64 end) +{ + struct map *map = machine__kernel_map(machine); + + map__get(map); + map_groups__remove(&machine->kmaps, map); + + machine__set_kernel_mmap(machine, start, end); + + map_groups__insert(&machine->kmaps, map); + map__put(map); +} + int machine__create_kernel_maps(struct machine *machine) { struct dso *kernel = machine__get_kernel(machine); @@ -1453,17 +1467,11 @@ int machine__create_kernel_maps(struct machine *machine) goto out_put; } - /* we have a real start address now, so re-order the kmaps */ - map = machine__kernel_map(machine); - - map__get(map); - map_groups__remove(&machine->kmaps, map); - - /* assume it's the last in the kmaps */ - machine__set_kernel_mmap(machine, addr, ~0ULL); - - map_groups__insert(&machine->kmaps, map); - map__put(map); + /* + * we have a real start address now, so re-order the kmaps + * assume it's the last in the kmaps + */ + machine__update_kernel_mmap(machine, addr, ~0ULL); } if (machine__create_extra_kernel_maps(machine, kernel)) @@ -1599,7 +1607,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine, if (strstr(kernel->long_name, "vmlinux")) dso__set_short_name(kernel, "[kernel.vmlinux]", false); - machine__set_kernel_mmap(machine, event->mmap.start, + machine__update_kernel_mmap(machine, event->mmap.start, event->mmap.start + event->mmap.len); /* From 8453c936db20489dbf0957187dca9a2656a2a7b6 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Wed, 27 Mar 2019 09:28:25 +0200 Subject: [PATCH 434/468] perf scripts python: exported-sql-viewer.py: Fix never-ending loop pyside version 1 fails to handle python3 large integers in some cases, resulting in Qt getting into a never-ending loop. This affects: samples Table samples_view Table All branches Report Selected branches Report Add workarounds for those cases. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Fixes: beda0e725e5f ("perf script python: Add Python3 support to exported-sql-viewer.py") Link: http://lkml.kernel.org/r/20190327072826.19168-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../scripts/python/exported-sql-viewer.py | 60 +++++++++++++++---- 1 file changed, 50 insertions(+), 10 deletions(-) diff --git a/tools/perf/scripts/python/exported-sql-viewer.py b/tools/perf/scripts/python/exported-sql-viewer.py index e38518cdcbc3..0cf30956064a 100755 --- a/tools/perf/scripts/python/exported-sql-viewer.py +++ b/tools/perf/scripts/python/exported-sql-viewer.py @@ -107,6 +107,7 @@ import os from PySide.QtCore import * from PySide.QtGui import * from PySide.QtSql import * +pyside_version_1 = True from decimal import * from ctypes import * from multiprocessing import Process, Array, Value, Event @@ -1526,6 +1527,19 @@ def BranchDataPrep(query): " (" + dsoname(query.value(15)) + ")") return data +def BranchDataPrepWA(query): + data = [] + data.append(query.value(0)) + # Workaround pyside failing to handle large integers (i.e. time) in python3 by converting to a string + data.append("{:>19}".format(query.value(1))) + for i in xrange(2, 8): + data.append(query.value(i)) + data.append(tohex(query.value(8)).rjust(16) + " " + query.value(9) + offstr(query.value(10)) + + " (" + dsoname(query.value(11)) + ")" + " -> " + + tohex(query.value(12)) + " " + query.value(13) + offstr(query.value(14)) + + " (" + dsoname(query.value(15)) + ")") + return data + # Branch data model class BranchModel(TreeModel): @@ -1553,7 +1567,11 @@ class BranchModel(TreeModel): " AND evsel_id = " + str(self.event_id) + " ORDER BY samples.id" " LIMIT " + str(glb_chunk_sz)) - self.fetcher = SQLFetcher(glb, sql, BranchDataPrep, self.AddSample) + if pyside_version_1 and sys.version_info[0] == 3: + prep = BranchDataPrepWA + else: + prep = BranchDataPrep + self.fetcher = SQLFetcher(glb, sql, prep, self.AddSample) self.fetcher.done.connect(self.Update) self.fetcher.Fetch(glb_chunk_sz) @@ -2079,14 +2097,6 @@ def IsSelectable(db, table, sql = ""): return False return True -# SQL data preparation - -def SQLTableDataPrep(query, count): - data = [] - for i in xrange(count): - data.append(query.value(i)) - return data - # SQL table data model item class SQLTableItem(): @@ -2110,7 +2120,7 @@ class SQLTableModel(TableModel): self.more = True self.populated = 0 self.column_headers = column_headers - self.fetcher = SQLFetcher(glb, sql, lambda x, y=len(column_headers): SQLTableDataPrep(x, y), self.AddSample) + self.fetcher = SQLFetcher(glb, sql, lambda x, y=len(column_headers): self.SQLTableDataPrep(x, y), self.AddSample) self.fetcher.done.connect(self.Update) self.fetcher.Fetch(glb_chunk_sz) @@ -2154,6 +2164,12 @@ class SQLTableModel(TableModel): def columnHeader(self, column): return self.column_headers[column] + def SQLTableDataPrep(self, query, count): + data = [] + for i in xrange(count): + data.append(query.value(i)) + return data + # SQL automatic table data model class SQLAutoTableModel(SQLTableModel): @@ -2182,8 +2198,32 @@ class SQLAutoTableModel(SQLTableModel): QueryExec(query, "SELECT column_name FROM information_schema.columns WHERE table_schema = '" + schema + "' and table_name = '" + select_table_name + "'") while query.next(): column_headers.append(query.value(0)) + if pyside_version_1 and sys.version_info[0] == 3: + if table_name == "samples_view": + self.SQLTableDataPrep = self.samples_view_DataPrep + if table_name == "samples": + self.SQLTableDataPrep = self.samples_DataPrep super(SQLAutoTableModel, self).__init__(glb, sql, column_headers, parent) + def samples_view_DataPrep(self, query, count): + data = [] + data.append(query.value(0)) + # Workaround pyside failing to handle large integers (i.e. time) in python3 by converting to a string + data.append("{:>19}".format(query.value(1))) + for i in xrange(2, count): + data.append(query.value(i)) + return data + + def samples_DataPrep(self, query, count): + data = [] + for i in xrange(9): + data.append(query.value(i)) + # Workaround pyside failing to handle large integers (i.e. time) in python3 by converting to a string + data.append("{:>19}".format(query.value(9))) + for i in xrange(10, count): + data.append(query.value(i)) + return data + # Base class for custom ResizeColumnsToContents class ResizeColumnsToContentsBase(QObject): From 606bd60ab6fbcb7f73deeef4fa37cfd5e447a200 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Wed, 27 Mar 2019 09:28:26 +0200 Subject: [PATCH 435/468] perf scripts python: exported-sql-viewer.py: Fix python3 support Unlike python2, python3 strings are not compatible with byte strings. That results in disassembly not working for the branches reports. Fixup those places overlooked in the port to python3. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Fixes: beda0e725e5f ("perf script python: Add Python3 support to exported-sql-viewer.py") Link: http://lkml.kernel.org/r/20190327072826.19168-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../perf/scripts/python/exported-sql-viewer.py | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/tools/perf/scripts/python/exported-sql-viewer.py b/tools/perf/scripts/python/exported-sql-viewer.py index 0cf30956064a..74ef92f1d19a 100755 --- a/tools/perf/scripts/python/exported-sql-viewer.py +++ b/tools/perf/scripts/python/exported-sql-viewer.py @@ -2908,9 +2908,13 @@ class LibXED(): ok = self.xed_format_context(2, inst.xedp, inst.bufferp, sizeof(inst.buffer), ip, 0, 0) if not ok: return 0, "" + if sys.version_info[0] == 2: + result = inst.buffer.value + else: + result = inst.buffer.value.decode() # Return instruction length and the disassembled instruction text # For now, assume the length is in byte 166 - return inst.xedd[166], inst.buffer.value + return inst.xedd[166], result def TryOpen(file_name): try: @@ -2926,9 +2930,14 @@ def Is64Bit(f): header = f.read(7) f.seek(pos) magic = header[0:4] - eclass = ord(header[4]) - encoding = ord(header[5]) - version = ord(header[6]) + if sys.version_info[0] == 2: + eclass = ord(header[4]) + encoding = ord(header[5]) + version = ord(header[6]) + else: + eclass = header[4] + encoding = header[5] + version = header[6] if magic == chr(127) + "ELF" and eclass > 0 and eclass < 3 and encoding > 0 and encoding < 3 and version == 1: result = True if eclass == 2 else False return result From e94d6b7f615e6dfbaf9fba7db6011db561461d0c Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Fri, 15 Mar 2019 11:00:14 -0700 Subject: [PATCH 436/468] perf pmu: Fix parser error for uncore event alias Perf fails to parse uncore event alias, for example: # perf stat -e unc_m_clockticks -a --no-merge sleep 1 event syntax error: 'unc_m_clockticks' \___ parser error Current code assumes that the event alias is from one specific PMU. To find the PMU, perf strcmps the PMU name of event alias with the real PMU name on the system. However, the uncore event alias may be from multiple PMUs with common prefix. The PMU name of uncore event alias is the common prefix. For example, UNC_M_CLOCKTICKS is clock event for iMC, which include 6 PMUs with the same prefix "uncore_imc" on a skylake server. The real PMU names on the system for iMC are uncore_imc_0 ... uncore_imc_5. The strncmp is used to only check the common prefix for uncore event alias. With the patch: # perf stat -e unc_m_clockticks -a --no-merge sleep 1 Performance counter stats for 'system wide': 723,594,722 unc_m_clockticks [uncore_imc_5] 724,001,954 unc_m_clockticks [uncore_imc_3] 724,042,655 unc_m_clockticks [uncore_imc_1] 724,161,001 unc_m_clockticks [uncore_imc_4] 724,293,713 unc_m_clockticks [uncore_imc_2] 724,340,901 unc_m_clockticks [uncore_imc_0] 1.002090060 seconds time elapsed Signed-off-by: Kan Liang Acked-by: Jiri Olsa Cc: Andi Kleen Cc: Thomas Richter Cc: stable@vger.kernel.org Fixes: ea1fa48c055f ("perf stat: Handle different PMU names with common prefix") Link: http://lkml.kernel.org/r/1552672814-156173-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/pmu.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 6199a3174ab9..e0429f4ef335 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -732,10 +732,20 @@ static void pmu_add_cpu_aliases(struct list_head *head, struct perf_pmu *pmu) if (!is_arm_pmu_core(name)) { pname = pe->pmu ? pe->pmu : "cpu"; + + /* + * uncore alias may be from different PMU + * with common prefix + */ + if (pmu_is_uncore(name) && + !strncmp(pname, name, strlen(pname))) + goto new_alias; + if (strcmp(pname, name)) continue; } +new_alias: /* need type casts to override 'const' */ __perf_pmu__new_alias(head, NULL, (char *)pe->name, (char *)pe->desc, (char *)pe->event, From 6f845ebec2706841d15831fab3ffffcfd9e676fa Mon Sep 17 00:00:00 2001 From: Mahesh Salgaonkar Date: Tue, 26 Mar 2019 18:00:31 +0530 Subject: [PATCH 437/468] powerpc/pseries/mce: Fix misleading print for TLB mutlihit On pseries, TLB multihit are reported as D-Cache Multihit. This is because the wrongly populated mc_err_types[] array. Per PAPR, TLB error type is 0x04 and mc_err_types[4] points to "D-Cache" instead of "TLB" string. Fixup the mc_err_types[] array. Machine check error type per PAPR: 0x00 = Uncorrectable Memory Error (UE) 0x01 = SLB error 0x02 = ERAT Error 0x04 = TLB error 0x05 = D-Cache error 0x07 = I-Cache error Fixes: 8f0b80561f21 ("powerpc/pseries: Display machine check error details.") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Aneesh Kumar K.V Signed-off-by: Mahesh Salgaonkar Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/pseries/ras.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index d97d52772789..452dcfd7e5dd 100644 --- a/arch/powerpc/platforms/pseries/ras.c +++ b/arch/powerpc/platforms/pseries/ras.c @@ -550,6 +550,7 @@ static void pseries_print_mce_info(struct pt_regs *regs, "UE", "SLB", "ERAT", + "Unknown", "TLB", "D-Cache", "Unknown", From f560bd19d2fe0e54851d706b72acbc6f2eed3567 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Thu, 28 Mar 2019 12:42:33 +0100 Subject: [PATCH 438/468] x86/realmode: Make set_real_mode_mem() static inline Remove the unused @size argument and move it into a header file, so it can be inlined. [ bp: Massage. ] Signed-off-by: Matteo Croce Signed-off-by: Borislav Petkov Reviewed-by: Mukesh Ojha Cc: Ard Biesheuvel Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-efi Cc: platform-driver-x86@vger.kernel.org Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190328114233.27835-1-mcroce@redhat.com --- arch/x86/include/asm/realmode.h | 6 +++++- arch/x86/platform/efi/quirks.c | 2 +- arch/x86/realmode/init.c | 9 +-------- 3 files changed, 7 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 63b3393bd98e..c53682303c9c 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -77,7 +77,11 @@ static inline size_t real_mode_size_needed(void) return ALIGN(real_mode_blob_end - real_mode_blob, PAGE_SIZE); } -void set_real_mode_mem(phys_addr_t mem, size_t size); +static inline void set_real_mode_mem(phys_addr_t mem) +{ + real_mode_header = (struct real_mode_header *) __va(mem); +} + void reserve_real_mode(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 458a0e2bcc57..a25a9fd987a9 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -449,7 +449,7 @@ void __init efi_free_boot_services(void) */ rm_size = real_mode_size_needed(); if (rm_size && (start + rm_size) < (1<<20) && size >= rm_size) { - set_real_mode_mem(start, rm_size); + set_real_mode_mem(start); start += rm_size; size -= rm_size; } diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 47d097946872..7dce39c8c034 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -15,13 +15,6 @@ u32 *trampoline_cr4_features; /* Hold the pgd entry used on booting additional CPUs */ pgd_t trampoline_pgd_entry; -void __init set_real_mode_mem(phys_addr_t mem, size_t size) -{ - void *base = __va(mem); - - real_mode_header = (struct real_mode_header *) base; -} - void __init reserve_real_mode(void) { phys_addr_t mem; @@ -40,7 +33,7 @@ void __init reserve_real_mode(void) } memblock_reserve(mem, size); - set_real_mode_mem(mem, size); + set_real_mode_mem(mem); } static void __init setup_real_mode(void) From 9c38f1f044080392603c497ecca4d7d09876ff99 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Mon, 25 Mar 2019 15:16:47 +0000 Subject: [PATCH 439/468] kconfig/[mn]conf: handle backspace (^H) key Backspace is not working on some terminal emulators which do not send the key code defined by terminfo. Terminals either send '^H' (8) or '^?' (127). But currently only '^?' is handled. Let's also handle '^H' for those terminals. Signed-off-by: Changbin Du Signed-off-by: Masahiro Yamada --- scripts/kconfig/lxdialog/inputbox.c | 3 ++- scripts/kconfig/nconf.c | 2 +- scripts/kconfig/nconf.gui.c | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/scripts/kconfig/lxdialog/inputbox.c b/scripts/kconfig/lxdialog/inputbox.c index 611945611bf8..1dcfb288ee63 100644 --- a/scripts/kconfig/lxdialog/inputbox.c +++ b/scripts/kconfig/lxdialog/inputbox.c @@ -113,7 +113,8 @@ int dialog_inputbox(const char *title, const char *prompt, int height, int width case KEY_DOWN: break; case KEY_BACKSPACE: - case 127: + case 8: /* ^H */ + case 127: /* ^? */ if (pos) { wattrset(dialog, dlg.inputbox.atr); if (input_x == 0) { diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c index a4670f4e825a..ac92c0ded6c5 100644 --- a/scripts/kconfig/nconf.c +++ b/scripts/kconfig/nconf.c @@ -1048,7 +1048,7 @@ static int do_match(int key, struct match_state *state, int *ans) state->match_direction = FIND_NEXT_MATCH_UP; *ans = get_mext_match(state->pattern, state->match_direction); - } else if (key == KEY_BACKSPACE || key == 127) { + } else if (key == KEY_BACKSPACE || key == 8 || key == 127) { state->pattern[strlen(state->pattern)-1] = '\0'; adj_match_dir(&state->match_direction); } else diff --git a/scripts/kconfig/nconf.gui.c b/scripts/kconfig/nconf.gui.c index 7be620a1fcdb..77f525a8617c 100644 --- a/scripts/kconfig/nconf.gui.c +++ b/scripts/kconfig/nconf.gui.c @@ -439,7 +439,8 @@ int dialog_inputbox(WINDOW *main_window, case KEY_F(F_EXIT): case KEY_F(F_BACK): break; - case 127: + case 8: /* ^H */ + case 127: /* ^? */ case KEY_BACKSPACE: if (cursor_position > 0) { memmove(&result[cursor_position-1], From 8aafaaf2212192012f5bae305bb31cdf7681d777 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Thu, 28 Mar 2019 11:44:59 +0100 Subject: [PATCH 440/468] iommu/amd: Reserve exclusion range in iova-domain If a device has an exclusion range specified in the IVRS table, this region needs to be reserved in the iova-domain of that device. This hasn't happened until now and can cause data corruption on data transfered with these devices. Treat exclusion ranges as reserved regions in the iommu-core to fix the problem. Fixes: be2a022c0dd0 ('x86, AMD IOMMU: add functions to parse IOMMU memory mapping requirements for devices') Signed-off-by: Joerg Roedel Reviewed-by: Gary R Hook --- drivers/iommu/amd_iommu.c | 9 ++++++--- drivers/iommu/amd_iommu_init.c | 7 ++++--- drivers/iommu/amd_iommu_types.h | 2 ++ 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 21cb088d6687..f7cdd2ab7f11 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3169,21 +3169,24 @@ static void amd_iommu_get_resv_regions(struct device *dev, return; list_for_each_entry(entry, &amd_iommu_unity_map, list) { + int type, prot = 0; size_t length; - int prot = 0; if (devid < entry->devid_start || devid > entry->devid_end) continue; + type = IOMMU_RESV_DIRECT; length = entry->address_end - entry->address_start; if (entry->prot & IOMMU_PROT_IR) prot |= IOMMU_READ; if (entry->prot & IOMMU_PROT_IW) prot |= IOMMU_WRITE; + if (entry->prot & IOMMU_UNITY_MAP_FLAG_EXCL_RANGE) + /* Exclusion range */ + type = IOMMU_RESV_RESERVED; region = iommu_alloc_resv_region(entry->address_start, - length, prot, - IOMMU_RESV_DIRECT); + length, prot, type); if (!region) { dev_err(dev, "Out of memory allocating dm-regions\n"); return; diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index f773792d77fd..1b1378619fc9 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -2013,6 +2013,9 @@ static int __init init_unity_map_range(struct ivmd_header *m) if (e == NULL) return -ENOMEM; + if (m->flags & IVMD_FLAG_EXCL_RANGE) + init_exclusion_range(m); + switch (m->type) { default: kfree(e); @@ -2059,9 +2062,7 @@ static int __init init_memory_definitions(struct acpi_table_header *table) while (p < end) { m = (struct ivmd_header *)p; - if (m->flags & IVMD_FLAG_EXCL_RANGE) - init_exclusion_range(m); - else if (m->flags & IVMD_FLAG_UNITY_MAP) + if (m->flags & (IVMD_FLAG_UNITY_MAP | IVMD_FLAG_EXCL_RANGE)) init_unity_map_range(m); p += m->length; diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index eae0741f72dc..87965e4d9647 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -374,6 +374,8 @@ #define IOMMU_PROT_IR 0x01 #define IOMMU_PROT_IW 0x02 +#define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE (1 << 2) + /* IOMMU capabilities */ #define IOMMU_CAP_IOTLB 24 #define IOMMU_CAP_NPCACHE 26 From 33bac912840fe64dbc15556302537dc6a17cac63 Mon Sep 17 00:00:00 2001 From: Gao Xiang Date: Fri, 29 Mar 2019 04:14:58 +0800 Subject: [PATCH 441/468] staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir() After commit 419d6efc50e9, kernel cannot be crashed in the namei path. However, corrupted nameoff can do harm in the process of readdir for scenerios without dm-verity as well. Fix it now. Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations") Cc: # 4.19+ Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Signed-off-by: Greg Kroah-Hartman --- drivers/staging/erofs/dir.c | 45 ++++++++++++++++++++----------------- 1 file changed, 25 insertions(+), 20 deletions(-) diff --git a/drivers/staging/erofs/dir.c b/drivers/staging/erofs/dir.c index 829f7b12e0dc..9bbc68729c11 100644 --- a/drivers/staging/erofs/dir.c +++ b/drivers/staging/erofs/dir.c @@ -23,6 +23,21 @@ static const unsigned char erofs_filetype_table[EROFS_FT_MAX] = { [EROFS_FT_SYMLINK] = DT_LNK, }; +static void debug_one_dentry(unsigned char d_type, const char *de_name, + unsigned int de_namelen) +{ +#ifdef CONFIG_EROFS_FS_DEBUG + /* since the on-disk name could not have the trailing '\0' */ + unsigned char dbg_namebuf[EROFS_NAME_LEN + 1]; + + memcpy(dbg_namebuf, de_name, de_namelen); + dbg_namebuf[de_namelen] = '\0'; + + debugln("found dirent %s de_len %u d_type %d", dbg_namebuf, + de_namelen, d_type); +#endif +} + static int erofs_fill_dentries(struct dir_context *ctx, void *dentry_blk, unsigned int *ofs, unsigned int nameoff, unsigned int maxsize) @@ -33,14 +48,10 @@ static int erofs_fill_dentries(struct dir_context *ctx, de = dentry_blk + *ofs; while (de < end) { const char *de_name; - int de_namelen; + unsigned int de_namelen; unsigned char d_type; -#ifdef CONFIG_EROFS_FS_DEBUG - unsigned int dbg_namelen; - unsigned char dbg_namebuf[EROFS_NAME_LEN]; -#endif - if (unlikely(de->file_type < EROFS_FT_MAX)) + if (de->file_type < EROFS_FT_MAX) d_type = erofs_filetype_table[de->file_type]; else d_type = DT_UNKNOWN; @@ -48,26 +59,20 @@ static int erofs_fill_dentries(struct dir_context *ctx, nameoff = le16_to_cpu(de->nameoff); de_name = (char *)dentry_blk + nameoff; - de_namelen = unlikely(de + 1 >= end) ? - /* last directory entry */ - strnlen(de_name, maxsize - nameoff) : - le16_to_cpu(de[1].nameoff) - nameoff; + /* the last dirent in the block? */ + if (de + 1 >= end) + de_namelen = strnlen(de_name, maxsize - nameoff); + else + de_namelen = le16_to_cpu(de[1].nameoff) - nameoff; /* a corrupted entry is found */ - if (unlikely(de_namelen < 0)) { + if (unlikely(nameoff + de_namelen > maxsize || + de_namelen > EROFS_NAME_LEN)) { DBG_BUGON(1); return -EIO; } -#ifdef CONFIG_EROFS_FS_DEBUG - dbg_namelen = min(EROFS_NAME_LEN - 1, de_namelen); - memcpy(dbg_namebuf, de_name, dbg_namelen); - dbg_namebuf[dbg_namelen] = '\0'; - - debugln("%s, found de_name %s de_len %d d_type %d", __func__, - dbg_namebuf, de_namelen, d_type); -#endif - + debug_one_dentry(d_type, de_name, de_namelen); if (!dir_emit(ctx, de_name, de_namelen, le64_to_cpu(de->nid), d_type)) /* stopped by some reason */ From cc26358f89c3e493b54766b1ca56cfc6b14db78a Mon Sep 17 00:00:00 2001 From: Malcolm Priestley Date: Wed, 27 Mar 2019 18:45:26 +0000 Subject: [PATCH 442/468] staging: vt6655: Remove vif check from vnt_interrupt A check for vif is made in vnt_interrupt_work. There is a small chance of leaving interrupt disabled while vif is NULL and the work hasn't been scheduled. Signed-off-by: Malcolm Priestley CC: stable@vger.kernel.org # v4.2+ Signed-off-by: Greg Kroah-Hartman --- drivers/staging/vt6655/device_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c index 83f1a1cf9182..c6bb4aaf9bd0 100644 --- a/drivers/staging/vt6655/device_main.c +++ b/drivers/staging/vt6655/device_main.c @@ -1137,8 +1137,7 @@ static irqreturn_t vnt_interrupt(int irq, void *arg) { struct vnt_private *priv = arg; - if (priv->vif) - schedule_work(&priv->interrupt_work); + schedule_work(&priv->interrupt_work); MACvIntDisable(priv->PortOffset); From c412a769d2452161e97f163c4c4f31efc6626f06 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Thu, 28 Mar 2019 20:43:15 -0700 Subject: [PATCH 443/468] kasan: fix variable 'tag' set but not used warning set_tag() compiles away when CONFIG_KASAN_SW_TAGS=n, so make arch_kasan_set_tag() a static inline function to fix warnings below. mm/kasan/common.c: In function '__kasan_kmalloc': mm/kasan/common.c:475:5: warning: variable 'tag' set but not used [-Wunused-but-set-variable] u8 tag; ^~~ Link: http://lkml.kernel.org/r/20190307185244.54648-1-cai@lca.pw Signed-off-by: Qian Cai Reviewed-by: Andrey Konovalov Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index 3e0c11f7d7a1..3ce956efa0cb 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -163,7 +163,10 @@ static inline u8 random_tag(void) #endif #ifndef arch_kasan_set_tag -#define arch_kasan_set_tag(addr, tag) ((void *)(addr)) +static inline const void *arch_kasan_set_tag(const void *addr, u8 tag) +{ + return addr; +} #endif #ifndef arch_kasan_reset_tag #define arch_kasan_reset_tag(addr) ((void *)(addr)) From cae85cb8add35f678cf487139d05e083ce2f570a Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 28 Mar 2019 20:43:19 -0700 Subject: [PATCH 444/468] mm/memory.c: fix modifying of page protection by insert_pfn() Aneesh has reported that PPC triggers the following warning when excercising DAX code: IP set_pte_at+0x3c/0x190 LR insert_pfn+0x208/0x280 Call Trace: insert_pfn+0x68/0x280 dax_iomap_pte_fault.isra.7+0x734/0xa40 __xfs_filemap_fault+0x280/0x2d0 do_wp_page+0x48c/0xa40 __handle_mm_fault+0x8d0/0x1fd0 handle_mm_fault+0x140/0x250 __do_page_fault+0x300/0xd60 handle_page_fault+0x18 Now that is WARN_ON in set_pte_at which is VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); The problem is that on some architectures set_pte_at() cannot cope with a situation where there is already some (different) valid entry present. Use ptep_set_access_flags() instead to modify the pfn which is built to deal with modifying existing PTE. Link: http://lkml.kernel.org/r/20190311084537.16029-1-jack@suse.cz Fixes: b2770da64254 "mm: add vm_insert_mixed_mkwrite()" Signed-off-by: Jan Kara Reported-by: "Aneesh Kumar K.V" Reviewed-by: Aneesh Kumar K.V Acked-by: Dan Williams Cc: Chandan Rajendra Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 47fe250307c7..ab650c21bccd 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1549,10 +1549,12 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte))); goto out_unlock; } - entry = *pte; - goto out_mkwrite; - } else - goto out_unlock; + entry = pte_mkyoung(*pte); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); + if (ptep_set_access_flags(vma, addr, pte, entry, 1)) + update_mmu_cache(vma, addr, pte); + } + goto out_unlock; } /* Ok, finally just insert the thing.. */ @@ -1561,7 +1563,6 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, else entry = pte_mkspecial(pfn_t_pte(pfn, prot)); -out_mkwrite: if (mkwrite) { entry = pte_mkyoung(entry); entry = maybe_mkwrite(pte_mkdirty(entry), vma); From 44dc1b1fab787d265b9b3064bd564c87b6b86397 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Thu, 28 Mar 2019 20:43:23 -0700 Subject: [PATCH 445/468] mm/debug.c: add a cast to u64 for atomic64_read() atomic64_read() on ppc64le returns "long int", so fix the same way as commit d549f545e690 ("drm/virtio: use %llu format string form atomic64_t") by adding a cast to u64, which makes it work on all arches. In file included from ./include/linux/printk.h:7, from ./include/linux/kernel.h:15, from mm/debug.c:9: mm/debug.c: In function 'dump_mm': ./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 19 has type 'long int' [-Wformat=] #define KERN_SOH "A" /* ASCII Start Of Header */ ^~~~~~ ./include/linux/kern_levels.h:8:20: note: in expansion of macro 'KERN_SOH' #define KERN_EMERG KERN_SOH "0" /* system is unusable */ ^~~~~~~~ ./include/linux/printk.h:297:9: note: in expansion of macro 'KERN_EMERG' printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~ mm/debug.c:133:2: note: in expansion of macro 'pr_emerg' pr_emerg("mm %px mmap %px seqnum %llu task_size %lu" ^~~~~~~~ mm/debug.c:140:17: note: format string is defined here "pinned_vm %llx data_vm %lx exec_vm %lx stack_vm %lx" ~~~^ %lx Link: http://lkml.kernel.org/r/20190310183051.87303-1-cai@lca.pw Fixes: 70f8a3ca68d3 ("mm: make mm->pinned_vm an atomic64 counter") Signed-off-by: Qian Cai Acked-by: Davidlohr Bueso Cc: Jason Gunthorpe Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/debug.c b/mm/debug.c index c0b31b6c3877..45d9eb77b84e 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -168,7 +168,7 @@ void dump_mm(const struct mm_struct *mm) mm_pgtables_bytes(mm), mm->map_count, mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm, - atomic64_read(&mm->pinned_vm), + (u64)atomic64_read(&mm->pinned_vm), mm->data_vm, mm->exec_vm, mm->stack_vm, mm->start_code, mm->end_code, mm->start_data, mm->end_data, mm->start_brk, mm->brk, mm->start_stack, From c1e287c11b752b055257196c5e98e4e91f401b32 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Thu, 28 Mar 2019 20:43:27 -0700 Subject: [PATCH 446/468] mailmap: add Changbin Du Add my email in the mailmap file to have a consistent shortlog output. Link: http://lkml.kernel.org/r/20190308142103.4929-1-changbin.du@gmail.com Signed-off-by: Changbin Du Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index 37e1847c7988..b2cde8668dcc 100644 --- a/.mailmap +++ b/.mailmap @@ -224,3 +224,5 @@ Yakir Yang Yusuke Goda Gustavo Padovan Gustavo Padovan +Changbin Du +Changbin Du From 73601ea5b7b18eb234219ae2adf77530f389da79 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 28 Mar 2019 20:43:30 -0700 Subject: [PATCH 447/468] fs/open.c: allow opening only regular files during execve() syzbot is hitting lockdep warning [1] due to trying to open a fifo during an execve() operation. But we don't need to open non regular files during an execve() operation, for all files which we will need are the executable file itself and the interpreter programs like /bin/sh and ld-linux.so.2 . Since the manpage for execve(2) says that execve() returns EACCES when the file or a script interpreter is not a regular file, and the manpage for uselib(2) says that uselib() can return EACCES, and we use FMODE_EXEC when opening for execve()/uselib(), we can bail out if a non regular file is requested with FMODE_EXEC set. Since this deadlock followed by khungtaskd warnings is trivially reproducible by a local unprivileged user, and syzbot's frequent crash due to this deadlock defers finding other bugs, let's workaround this deadlock until we get a chance to find a better solution. [1] https://syzkaller.appspot.com/bug?id=b5095bfec44ec84213bac54742a82483aad578ce Link: http://lkml.kernel.org/r/1552044017-7890-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot Fixes: 8924feff66f35fe2 ("splice: lift pipe_lock out of splice_to_pipe()") Signed-off-by: Tetsuo Handa Acked-by: Kees Cook Cc: Al Viro Cc: Eric Biggers Cc: Dmitry Vyukov Cc: [4.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/open.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/open.c b/fs/open.c index 0285ce7dbd51..f1c2f855fd43 100644 --- a/fs/open.c +++ b/fs/open.c @@ -733,6 +733,12 @@ static int do_dentry_open(struct file *f, return 0; } + /* Any file opened for execve()/uselib() has to be a regular file. */ + if (unlikely(f->f_flags & FMODE_EXEC && !S_ISREG(inode->i_mode))) { + error = -EACCES; + goto cleanup_file; + } + if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { error = get_write_access(inode); if (unlikely(error)) From 9b7ea46a82b31c74a37e6ff1c2a1df7d53e392ab Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Thu, 28 Mar 2019 20:43:34 -0700 Subject: [PATCH 448/468] mm/hotplug: fix offline undo_isolate_page_range() Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") introduced move_pfn_range_to_zone() which calls memmap_init_zone() during onlining a memory block. memmap_init_zone() will reset pagetype flags and makes migrate type to be MOVABLE. However, in __offline_pages(), it also call undo_isolate_page_range() after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed __first_valid_page() to skip offline pages, undo_isolate_page_range() here just waste CPU cycles looping around the offlining PFN range while doing nothing, because __first_valid_page() will return NULL as offline_isolated_pages() has already marked all memory sections within the pfn range as offline via offline_mem_sections(). Also, after calling the "useless" undo_isolate_page_range() here, it reaches the point of no returning by notifying MEM_OFFLINE. Those pages will be marked as MIGRATE_MOVABLE again once onlining. The only thing left to do is to decrease the number of isolated pageblocks zone counter which would make some paths of the page allocation slower that the above commit introduced. Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages on ppc64, an "int" should still be enough to represent the number of pageblocks there. Fix an incorrect comment along the way. [cai@lca.pw: v4] Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai Acked-by: Michal Hocko Reviewed-by: Oscar Salvador Cc: Vlastimil Babka Cc: [4.13+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-isolation.h | 10 ------- mm/memory_hotplug.c | 17 +++++++++--- mm/page_alloc.c | 2 +- mm/page_isolation.c | 48 +++++++++++++++++++++------------- mm/sparse.c | 2 +- 5 files changed, 45 insertions(+), 34 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 4eb26d278046..280ae96dc4c3 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -41,16 +41,6 @@ int move_freepages_block(struct zone *zone, struct page *page, /* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. - * If specified range includes migrate types other than MOVABLE or CMA, - * this will fail with -EBUSY. - * - * For isolating all pages in the range finally, the caller have to - * free all pages in the range. test_page_isolated() can be used for - * test it. - * - * The following flags are allowed (they can be combined in a bit mask) - * SKIP_HWPOISON - ignore hwpoison pages - * REPORT_FAILURE - report details about the failure to isolate the range */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f767582af4f8..0e0a16021fd5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1576,7 +1576,7 @@ static int __ref __offline_pages(unsigned long start_pfn, { unsigned long pfn, nr_pages; long offlined_pages; - int ret, node; + int ret, node, nr_isolate_pageblock; unsigned long flags; unsigned long valid_start, valid_end; struct zone *zone; @@ -1602,10 +1602,11 @@ static int __ref __offline_pages(unsigned long start_pfn, ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, SKIP_HWPOISON | REPORT_FAILURE); - if (ret) { + if (ret < 0) { reason = "failure to isolate range"; goto failed_removal; } + nr_isolate_pageblock = ret; arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1657,8 +1658,16 @@ static int __ref __offline_pages(unsigned long start_pfn, /* Ok, all of our target is isolated. We cannot do rollback at this point. */ offline_isolated_pages(start_pfn, end_pfn); - /* reset pagetype flags and makes migrate type to be MOVABLE */ - undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + + /* + * Onlining will reset pagetype flags and makes migrate type + * MOVABLE, so just need to decrease the number of isolated + * pageblocks zone counter here. + */ + spin_lock_irqsave(&zone->lock, flags); + zone->nr_isolate_pageblock -= nr_isolate_pageblock; + spin_unlock_irqrestore(&zone->lock, flags); + /* removal success */ adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages); zone->present_pages -= offlined_pages; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 03fcf73d47da..d96ca5bc555b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8233,7 +8233,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, ret = start_isolate_page_range(pfn_max_align_down(start), pfn_max_align_up(end), migratetype, 0); - if (ret) + if (ret < 0) return ret; /* diff --git a/mm/page_isolation.c b/mm/page_isolation.c index ce323e56b34d..bf4159d771c7 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -160,27 +160,36 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) return NULL; } -/* - * start_isolate_page_range() -- make page-allocation-type of range of pages - * to be MIGRATE_ISOLATE. - * @start_pfn: The lower PFN of the range to be isolated. - * @end_pfn: The upper PFN of the range to be isolated. - * @migratetype: migrate type to set in error recovery. +/** + * start_isolate_page_range() - make page-allocation-type of range of pages to + * be MIGRATE_ISOLATE. + * @start_pfn: The lower PFN of the range to be isolated. + * @end_pfn: The upper PFN of the range to be isolated. + * start_pfn/end_pfn must be aligned to pageblock_order. + * @migratetype: Migrate type to set in error recovery. + * @flags: The following flags are allowed (they can be combined in + * a bit mask) + * SKIP_HWPOISON - ignore hwpoison pages + * REPORT_FAILURE - report details about the failure to + * isolate the range * * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in * the range will never be allocated. Any free pages and pages freed in the - * future will not be allocated again. - * - * start_pfn/end_pfn must be aligned to pageblock_order. - * Return 0 on success and -EBUSY if any part of range cannot be isolated. + * future will not be allocated again. If specified range includes migrate types + * other than MOVABLE or CMA, this will fail with -EBUSY. For isolating all + * pages in the range finally, the caller have to free all pages in the range. + * test_page_isolated() can be used for test it. * * There is no high level synchronization mechanism that prevents two threads - * from trying to isolate overlapping ranges. If this happens, one thread + * from trying to isolate overlapping ranges. If this happens, one thread * will notice pageblocks in the overlapping range already set to isolate. * This happens in set_migratetype_isolate, and set_migratetype_isolate - * returns an error. We then clean up by restoring the migration type on - * pageblocks we may have modified and return -EBUSY to caller. This + * returns an error. We then clean up by restoring the migration type on + * pageblocks we may have modified and return -EBUSY to caller. This * prevents two threads from simultaneously working on overlapping ranges. + * + * Return: the number of isolated pageblocks on success and -EBUSY if any part + * of range cannot be isolated. */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype, int flags) @@ -188,6 +197,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned long pfn; unsigned long undo_pfn; struct page *page; + int nr_isolate_pageblock = 0; BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages)); @@ -196,13 +206,15 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, pfn < end_pfn; pfn += pageblock_nr_pages) { page = __first_valid_page(pfn, pageblock_nr_pages); - if (page && - set_migratetype_isolate(page, migratetype, flags)) { - undo_pfn = pfn; - goto undo; + if (page) { + if (set_migratetype_isolate(page, migratetype, flags)) { + undo_pfn = pfn; + goto undo; + } + nr_isolate_pageblock++; } } - return 0; + return nr_isolate_pageblock; undo: for (pfn = start_pfn; pfn < undo_pfn; diff --git a/mm/sparse.c b/mm/sparse.c index 69904aa6165b..56e057c432f9 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -567,7 +567,7 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn) } #ifdef CONFIG_MEMORY_HOTREMOVE -/* Mark all memory sections within the pfn range as online */ +/* Mark all memory sections within the pfn range as offline */ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn; From e6a9467ea14bae8691b0f72c500510c42ea8edb8 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Thu, 28 Mar 2019 20:43:38 -0700 Subject: [PATCH 449/468] ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that we always grab cluster locks in order of increasing inode number. Unfortunately, we forget to swap the inode record buffer head pointers when we've done this, which leads to incorrect bookkeepping when we're trying to make the two inodes have the same refcount tree. This has the effect of causing filesystem shutdowns if you're trying to reflink data from inode 100 into inode 97, where inode 100 already has a refcount tree attached and inode 97 doesn't. The reflink code decides to copy the refcount tree pointer from 100 to 97, but uses inode 97's inode record to open the tree root (which it doesn't have) and blows up. This issue causes filesystem shutdowns and metadata corruption! Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia Fixes: 29ac8e856cb369 ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features") Signed-off-by: Darrick J. Wong Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Joseph Qi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/refcounttree.c | 42 +++++++++++++++++++++++------------------ 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index a35259eebc56..1dc9a08e8bdc 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -4719,22 +4719,23 @@ loff_t ocfs2_reflink_remap_blocks(struct inode *s_inode, /* Lock an inode and grab a bh pointing to the inode. */ int ocfs2_reflink_inodes_lock(struct inode *s_inode, - struct buffer_head **bh1, + struct buffer_head **bh_s, struct inode *t_inode, - struct buffer_head **bh2) + struct buffer_head **bh_t) { - struct inode *inode1; - struct inode *inode2; + struct inode *inode1 = s_inode; + struct inode *inode2 = t_inode; struct ocfs2_inode_info *oi1; struct ocfs2_inode_info *oi2; + struct buffer_head *bh1 = NULL; + struct buffer_head *bh2 = NULL; bool same_inode = (s_inode == t_inode); + bool need_swap = (inode1->i_ino > inode2->i_ino); int status; /* First grab the VFS and rw locks. */ lock_two_nondirectories(s_inode, t_inode); - inode1 = s_inode; - inode2 = t_inode; - if (inode1->i_ino > inode2->i_ino) + if (need_swap) swap(inode1, inode2); status = ocfs2_rw_lock(inode1, 1); @@ -4757,17 +4758,13 @@ int ocfs2_reflink_inodes_lock(struct inode *s_inode, trace_ocfs2_double_lock((unsigned long long)oi1->ip_blkno, (unsigned long long)oi2->ip_blkno); - if (*bh1) - *bh1 = NULL; - if (*bh2) - *bh2 = NULL; - /* We always want to lock the one with the lower lockid first. */ if (oi1->ip_blkno > oi2->ip_blkno) mlog_errno(-ENOLCK); /* lock id1 */ - status = ocfs2_inode_lock_nested(inode1, bh1, 1, OI_LS_REFLINK_TARGET); + status = ocfs2_inode_lock_nested(inode1, &bh1, 1, + OI_LS_REFLINK_TARGET); if (status < 0) { if (status != -ENOENT) mlog_errno(status); @@ -4776,15 +4773,25 @@ int ocfs2_reflink_inodes_lock(struct inode *s_inode, /* lock id2 */ if (!same_inode) { - status = ocfs2_inode_lock_nested(inode2, bh2, 1, + status = ocfs2_inode_lock_nested(inode2, &bh2, 1, OI_LS_REFLINK_TARGET); if (status < 0) { if (status != -ENOENT) mlog_errno(status); goto out_cl1; } - } else - *bh2 = *bh1; + } else { + bh2 = bh1; + } + + /* + * If we swapped inode order above, we have to swap the buffer heads + * before passing them back to the caller. + */ + if (need_swap) + swap(bh1, bh2); + *bh_s = bh1; + *bh_t = bh2; trace_ocfs2_double_lock_end( (unsigned long long)oi1->ip_blkno, @@ -4794,8 +4801,7 @@ int ocfs2_reflink_inodes_lock(struct inode *s_inode, out_cl1: ocfs2_inode_unlock(inode1, 1); - brelse(*bh1); - *bh1 = NULL; + brelse(bh1); out_rw2: ocfs2_rw_unlock(inode2, 1); out_i2: From 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a Mon Sep 17 00:00:00 2001 From: Nicolas Boichat Date: Thu, 28 Mar 2019 20:43:42 -0700 Subject: [PATCH 450/468] mm: add support for kmem caches in DMA32 zone Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables", v6. This is a followup to the discussion in [1], [2]. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For L1 tables that are bigger than a page, we can just use __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA). For L2 tables that only take 1KB, it would be a waste to allocate a full page, so we considered 3 approaches: 1. This series, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. [3] This series is the most memory-efficient approach. stable@ note: We confirmed that this is a regression, and IOMMU errors happen on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek platforms (and maybe others?). [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html [3] https://patchwork.codeaurora.org/patch/671639/ This patch (of 3): IOMMUs using ARMv7 short-descriptor format require page tables to be allocated within the first 4GB of RAM, even on 64-bit systems. On arm64, this is done by passing GFP_DMA32 flag to memory allocation functions. For IOMMU L2 tables that only take 1KB, it would be a waste to allocate a full page using get_free_pages, so we considered 3 approaches: 1. This patch, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. This change makes it possible to create a custom cache in DMA32 zone using kmem_cache_create, then allocate memory using kmem_cache_alloc. We do not create a DMA32 kmalloc cache array, as there are currently no users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK. This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32 kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and unnecessary). Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat Acked-by: Vlastimil Babka Acked-by: Will Deacon Cc: Robin Murphy Cc: Joerg Roedel Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Michal Hocko Cc: Mel Gorman Cc: Sasha Levin Cc: Huaisheng Ye Cc: Mike Rapoport Cc: Yong Wu Cc: Matthias Brugger Cc: Tomasz Figa Cc: Yingjoe Chen Cc: Christoph Hellwig Cc: Matthew Wilcox Cc: Hsin-Yi Wang Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/slab.h | 2 ++ mm/slab.c | 2 ++ mm/slab.h | 3 ++- mm/slab_common.c | 2 +- mm/slub.c | 5 +++++ 5 files changed, 12 insertions(+), 2 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 11b45f7ae405..9449b19c5f10 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -32,6 +32,8 @@ #define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U) /* Use GFP_DMA memory */ #define SLAB_CACHE_DMA ((slab_flags_t __force)0x00004000U) +/* Use GFP_DMA32 memory */ +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x00008000U) /* DEBUG: Store the last owner for bug hunting */ #define SLAB_STORE_USER ((slab_flags_t __force)0x00010000U) /* Panic if kmem_cache_create() fails */ diff --git a/mm/slab.c b/mm/slab.c index 28652e4218e0..329bfe67f2ca 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2115,6 +2115,8 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags) cachep->allocflags = __GFP_COMP; if (flags & SLAB_CACHE_DMA) cachep->allocflags |= GFP_DMA; + if (flags & SLAB_CACHE_DMA32) + cachep->allocflags |= GFP_DMA32; if (flags & SLAB_RECLAIM_ACCOUNT) cachep->allocflags |= __GFP_RECLAIMABLE; cachep->size = size; diff --git a/mm/slab.h b/mm/slab.h index e5e6658eeacc..43ac818b8592 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -127,7 +127,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size, /* Legal flag mask for kmem_cache_create(), for various configurations */ -#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | SLAB_PANIC | \ +#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ + SLAB_CACHE_DMA32 | SLAB_PANIC | \ SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS ) #if defined(CONFIG_DEBUG_SLAB) diff --git a/mm/slab_common.c b/mm/slab_common.c index 03eeb8b7b4b1..58251ba63e4a 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -53,7 +53,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work, SLAB_FAILSLAB | SLAB_KASAN) #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \ - SLAB_ACCOUNT) + SLAB_CACHE_DMA32 | SLAB_ACCOUNT) /* * Merge control. If this is set then no merging of slab caches will occur. diff --git a/mm/slub.c b/mm/slub.c index 1b08fbcb7e61..d30ede89f4a6 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3589,6 +3589,9 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) if (s->flags & SLAB_CACHE_DMA) s->allocflags |= GFP_DMA; + if (s->flags & SLAB_CACHE_DMA32) + s->allocflags |= GFP_DMA32; + if (s->flags & SLAB_RECLAIM_ACCOUNT) s->allocflags |= __GFP_RECLAIMABLE; @@ -5679,6 +5682,8 @@ static char *create_unique_id(struct kmem_cache *s) */ if (s->flags & SLAB_CACHE_DMA) *p++ = 'd'; + if (s->flags & SLAB_CACHE_DMA32) + *p++ = 'D'; if (s->flags & SLAB_RECLAIM_ACCOUNT) *p++ = 'a'; if (s->flags & SLAB_CONSISTENCY_CHECKS) From 0a352554da69b02f75ca3389c885c741f1f63235 Mon Sep 17 00:00:00 2001 From: Nicolas Boichat Date: Thu, 28 Mar 2019 20:43:46 -0700 Subject: [PATCH 451/468] iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 is defined (e.g. on arm64 platforms). For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc, as this is not strictly necessary, and would cause a warning in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK. Also, print an error when the physical address does not fit in 32-bit, to make debugging easier in the future. Link: http://lkml.kernel.org/r/20181210011504.122604-3-drinkcat@chromium.org Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") Signed-off-by: Nicolas Boichat Acked-by: Will Deacon Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Hsin-Yi Wang Cc: Huaisheng Ye Cc: Joerg Roedel Cc: Joonsoo Kim Cc: Matthew Wilcox Cc: Matthias Brugger Cc: Mel Gorman Cc: Michal Hocko Cc: Mike Rapoport Cc: Pekka Enberg Cc: Robin Murphy Cc: Sasha Levin Cc: Tomasz Figa Cc: Vlastimil Babka Cc: Yingjoe Chen Cc: Yong Wu Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/iommu/io-pgtable-arm-v7s.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index f101afc315ab..9a8a8870e267 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -160,6 +160,14 @@ #define ARM_V7S_TCR_PD1 BIT(5) +#ifdef CONFIG_ZONE_DMA32 +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 +#define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32 +#else +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA +#define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA +#endif + typedef u32 arm_v7s_iopte; static bool selftest_running; @@ -197,13 +205,16 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, void *table = NULL; if (lvl == 1) - table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size)); + table = (void *)__get_free_pages( + __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size)); else if (lvl == 2) - table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA); + table = kmem_cache_zalloc(data->l2_tables, gfp); phys = virt_to_phys(table); - if (phys != (arm_v7s_iopte)phys) + if (phys != (arm_v7s_iopte)phys) { /* Doesn't fit in PTE */ + dev_err(dev, "Page table does not fit in PTE: %pa", &phys); goto out_free; + } if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) { dma = dma_map_single(dev, table, size, DMA_TO_DEVICE); if (dma_mapping_error(dev, dma)) @@ -733,7 +744,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2", ARM_V7S_TABLE_SIZE(2), ARM_V7S_TABLE_SIZE(2), - SLAB_CACHE_DMA, NULL); + ARM_V7S_TABLE_SLAB_FLAGS, NULL); if (!data->l2_tables) goto out_free_data; From a953e7721fa9999fd628885ed451e16641a23d1e Mon Sep 17 00:00:00 2001 From: Souptick Joarder Date: Thu, 28 Mar 2019 20:43:51 -0700 Subject: [PATCH 452/468] include/linux/hugetlb.h: convert to use vm_fault_t kbuild produces the below warning: tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 5453a3df2a5eb49bc24615d4cf0d66b2aae05e5f commit 3d3539018d2c ("mm: create the new vm_fault_t type") reproduce: # apt-get install sparse git checkout 3d3539018d2cbd12e5af4a132636ee7fd8d43ef0 make ARCH=x86_64 allmodconfig make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' >> mm/memory.c:3968:21: sparse: incorrect type in assignment (different >> base types) @@ expected restricted vm_fault_t [usertype] ret @@ >> got e] ret @@ mm/memory.c:3968:21: expected restricted vm_fault_t [usertype] ret mm/memory.c:3968:21: got int This patch converts to return vm_fault_t type for hugetlb_fault() when CONFIG_HUGETLB_PAGE=n. Regarding the sparse warning, Luc said: : This is the expected behaviour. The constant 0 is magic regarding bitwise : types but ({ ...; 0; }) is not, it is just an ordinary expression of type : 'int'. : : So, IMHO, Souptick's patch is the right thing to do. Link: http://lkml.kernel.org/r/20190318162604.GA31553@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder Reviewed-by: Mike Kravetz Cc: Matthew Wilcox Cc: Luc Van Oostenryck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ea35263eb76b..11943b60f208 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -203,7 +203,6 @@ static inline void hugetlb_show_meminfo(void) #define pud_huge(x) 0 #define is_hugepage_only_range(mm, addr, len) 0 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) -#define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; }) #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \ src_addr, pagep) ({ BUG(); 0; }) #define huge_pte_offset(mm, address, sz) 0 @@ -234,6 +233,13 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb, { BUG(); } +static inline vm_fault_t hugetlb_fault(struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long address, + unsigned int flags) +{ + BUG(); + return 0; +} #endif /* !CONFIG_HUGETLB_PAGE */ /* From a7f40cfe3b7ada57af9b62fd28430eeb4a7cfcb7 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 28 Mar 2019 20:43:55 -0700 Subject: [PATCH 453/468] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified When MPOL_MF_STRICT was specified and an existing page was already on a node that does not follow the policy, mbind() should return -EIO. But commit 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") broke the rule. And commit c8633798497c ("mm: mempolicy: mbind and migrate_pages support thp migration") didn't return the correct value for THP mbind() too. If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing page was already on a node that does not follow the policy. And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified. Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c [akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") Signed-off-by: Yang Shi Signed-off-by: Oscar Salvador Reported-by: Cyril Hrubis Suggested-by: Kirill A. Shutemov Acked-by: Rafael Aquini Reviewed-by: Oscar Salvador Acked-by: David Rientjes Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 40 +++++++++++++++++++++++++++++++++------- 1 file changed, 33 insertions(+), 7 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index af171ccb56a2..2219e747df49 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -428,6 +428,13 @@ static inline bool queue_pages_required(struct page *page, return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT); } +/* + * queue_pages_pmd() has three possible return values: + * 1 - pages are placed on the right node or queued successfully. + * 0 - THP was split. + * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing + * page was already on a node that does not follow the policy. + */ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, unsigned long end, struct mm_walk *walk) { @@ -437,7 +444,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, unsigned long flags; if (unlikely(is_pmd_migration_entry(*pmd))) { - ret = 1; + ret = -EIO; goto unlock; } page = pmd_page(*pmd); @@ -454,8 +461,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, ret = 1; flags = qp->flags; /* go to thp migration */ - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { + if (!vma_migratable(walk->vma)) { + ret = -EIO; + goto unlock; + } + migrate_page_add(page, qp->pagelist, flags); + } else + ret = -EIO; unlock: spin_unlock(ptl); out: @@ -480,8 +494,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr, ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { ret = queue_pages_pmd(pmd, ptl, addr, end, walk); - if (ret) + if (ret > 0) return 0; + else if (ret < 0) + return ret; } if (pmd_trans_unstable(pmd)) @@ -502,11 +518,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr, continue; if (!queue_pages_required(page, qp)) continue; - migrate_page_add(page, qp->pagelist, flags); + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { + if (!vma_migratable(vma)) + break; + migrate_page_add(page, qp->pagelist, flags); + } else + break; } pte_unmap_unlock(pte - 1, ptl); cond_resched(); - return 0; + return addr != end ? -EIO : 0; } static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask, @@ -576,7 +597,12 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, unsigned long endvma = vma->vm_end; unsigned long flags = qp->flags; - if (!vma_migratable(vma)) + /* + * Need check MPOL_MF_STRICT to return -EIO if possible + * regardless of vma_migratable + */ + if (!vma_migratable(vma) && + !(flags & MPOL_MF_STRICT)) return 1; if (endvma > end) @@ -603,7 +629,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, } /* queue pages from current vma */ - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) + if (flags & MPOL_MF_VALID) return 0; return 1; } From 5ae2efb1dea9f537453e841714e3ee2757595aec Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Thu, 28 Mar 2019 20:44:01 -0700 Subject: [PATCH 454/468] mm/debug.c: fix __dump_page when mapping->host is not set While debugging something, I added a dump_page() into do_swap_page(), and I got the splat from below. The issue happens when dereferencing mapping->host in __dump_page(): ... else if (mapping) { pr_warn("%ps ", mapping->a_ops); if (mapping->host->i_dentry.first) { struct dentry *dentry; dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias); pr_warn("name:\"%pd\" ", dentry); } } ... Swap address space does not contain an inode information, and so mapping->host equals NULL. Although the dump_page() call was added artificially into do_swap_page(), I am not sure if we can hit this from any other path, so it looks worth fixing it. We can easily do that by checking mapping->host first. Link: http://lkml.kernel.org/r/20190318072931.29094-1-osalvador@suse.de Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page") Signed-off-by: Oscar Salvador Acked-by: Michal Hocko Acked-by: Hugh Dickins Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/debug.c b/mm/debug.c index 45d9eb77b84e..eee9c221280c 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -79,7 +79,7 @@ void __dump_page(struct page *page, const char *reason) pr_warn("ksm "); else if (mapping) { pr_warn("%ps ", mapping->a_ops); - if (mapping->host->i_dentry.first) { + if (mapping->host && mapping->host->i_dentry.first) { struct dentry *dentry; dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias); pr_warn("name:\"%pd\" ", dentry); From b736523f0759d1debeb56f8e0c4c87a2bea0fb23 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 28 Mar 2019 20:44:05 -0700 Subject: [PATCH 455/468] include/linux/list.h: fix list_is_first() kernel-doc Fix typo of kernel-doc parameter notation (there should be no space between '@' and the parameter name). Also fixes bogus kernel-doc notation output formatting. Link: http://lkml.kernel.org/r/ddce8b80-9a8a-d52d-3546-87b2211c089a@infradead.org Fixes: 70b44595eafe9 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Randy Dunlap Acked-by: Mel Gorman Reviewed-by: William Kucharski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/list.h b/include/linux/list.h index 79626b5ab36c..58aa3adf94e6 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -207,7 +207,7 @@ static inline void list_bulk_move_tail(struct list_head *head, } /** - * list_is_first -- tests whether @ list is the first entry in list @head + * list_is_first -- tests whether @list is the first entry in list @head * @list: the entry to test * @head: the head of the list */ From eebf36480678f948b3ed15d56ca7b8e6194e7c18 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 28 Mar 2019 20:44:09 -0700 Subject: [PATCH 456/468] fs/proc/kcore.c: make kcore_modules static Fix sparse warning: fs/proc/kcore.c:591:19: warning: symbol 'kcore_modules' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20190320135417.13272-1-yuehaibing@huawei.com Signed-off-by: YueHaibing Acked-by: Mukesh Ojha Cc: Alexey Dobriyan Cc: Omar Sandoval Cc: James Morse Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/kcore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index d29d869abec1..f5834488b67d 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -615,7 +615,7 @@ static void __init proc_kcore_text_init(void) /* * MODULES_VADDR has no intersection with VMALLOC_ADDR. */ -struct kcore_list kcore_modules; +static struct kcore_list kcore_modules; static void __init add_modules_range(void) { if (MODULES_VADDR != VMALLOC_START && MODULES_END != VMALLOC_END) { From fcfc2aa0185f4a731d05a21e9f359968fdfd02e7 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Thu, 28 Mar 2019 20:44:13 -0700 Subject: [PATCH 457/468] ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK There are a few system calls (pselect, ppoll, etc) which replace a task sigmask while they are running in a kernel-space When a task calls one of these syscalls, the kernel saves a current sigmask in task->saved_sigmask and sets a syscall sigmask. On syscall-exit-stop, ptrace traps a task before restoring the saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by saved_sigmask, when the task returns to user-space. This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag. Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com Fixes: 29000caecbe8 ("ptrace: add ability to get/set signal-blocked mask") Signed-off-by: Andrei Vagin Acked-by: Oleg Nesterov Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched/signal.h | 18 ++++++++++++++++++ kernel/ptrace.c | 15 +++++++++++++-- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index ae5655197698..e412c092c1e8 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -418,10 +418,20 @@ static inline void set_restore_sigmask(void) set_thread_flag(TIF_RESTORE_SIGMASK); WARN_ON(!test_thread_flag(TIF_SIGPENDING)); } + +static inline void clear_tsk_restore_sigmask(struct task_struct *tsk) +{ + clear_tsk_thread_flag(tsk, TIF_RESTORE_SIGMASK); +} + static inline void clear_restore_sigmask(void) { clear_thread_flag(TIF_RESTORE_SIGMASK); } +static inline bool test_tsk_restore_sigmask(struct task_struct *tsk) +{ + return test_tsk_thread_flag(tsk, TIF_RESTORE_SIGMASK); +} static inline bool test_restore_sigmask(void) { return test_thread_flag(TIF_RESTORE_SIGMASK); @@ -439,6 +449,10 @@ static inline void set_restore_sigmask(void) current->restore_sigmask = true; WARN_ON(!test_thread_flag(TIF_SIGPENDING)); } +static inline void clear_tsk_restore_sigmask(struct task_struct *tsk) +{ + tsk->restore_sigmask = false; +} static inline void clear_restore_sigmask(void) { current->restore_sigmask = false; @@ -447,6 +461,10 @@ static inline bool test_restore_sigmask(void) { return current->restore_sigmask; } +static inline bool test_tsk_restore_sigmask(struct task_struct *tsk) +{ + return tsk->restore_sigmask; +} static inline bool test_and_clear_restore_sigmask(void) { if (!current->restore_sigmask) diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 771e93f9c43f..6f357f4fc859 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -29,6 +29,7 @@ #include #include #include +#include /* * Access another process' address space via ptrace. @@ -924,18 +925,26 @@ int ptrace_request(struct task_struct *child, long request, ret = ptrace_setsiginfo(child, &siginfo); break; - case PTRACE_GETSIGMASK: + case PTRACE_GETSIGMASK: { + sigset_t *mask; + if (addr != sizeof(sigset_t)) { ret = -EINVAL; break; } - if (copy_to_user(datavp, &child->blocked, sizeof(sigset_t))) + if (test_tsk_restore_sigmask(child)) + mask = &child->saved_sigmask; + else + mask = &child->blocked; + + if (copy_to_user(datavp, mask, sizeof(sigset_t))) ret = -EFAULT; else ret = 0; break; + } case PTRACE_SETSIGMASK: { sigset_t new_set; @@ -961,6 +970,8 @@ int ptrace_request(struct task_struct *child, long request, child->blocked = new_set; spin_unlock_irq(&child->sighand->siglock); + clear_tsk_restore_sigmask(child); + ret = 0; break; } From c4efe484b5f0d768e23c9731082fec827723e738 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Thu, 28 Mar 2019 20:44:16 -0700 Subject: [PATCH 458/468] mm/memory_hotplug.c: fix notification in offline error path When start_isolate_page_range() returned -EBUSY in __offline_pages(), it calls memory_notify(MEM_CANCEL_OFFLINE, &arg) with an uninitialized "arg". As the result, it triggers warnings below. Also, it is only necessary to notify MEM_CANCEL_OFFLINE after MEM_GOING_OFFLINE. page:ffffea0001200000 count:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x3fffe000001000(reserved) raw: 003fffe000001000 ffffea0001200008 ffffea0001200008 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: unmovable page WARNING: CPU: 25 PID: 1665 at mm/kasan/common.c:665 kasan_mem_notifier+0x34/0x23b CPU: 25 PID: 1665 Comm: bash Tainted: G W 5.0.0+ #94 Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017 RIP: 0010:kasan_mem_notifier+0x34/0x23b RSP: 0018:ffff8883ec737890 EFLAGS: 00010206 RAX: 0000000000000246 RBX: ff10f0f4435f1000 RCX: f887a7a21af88000 RDX: dffffc0000000000 RSI: 0000000000000020 RDI: ffff8881f221af88 RBP: ffff8883ec737898 R08: ffff888000000000 R09: ffffffffb0bddcd0 R10: ffffed103e857088 R11: ffff8881f42b8443 R12: dffffc0000000000 R13: 00000000fffffff9 R14: dffffc0000000000 R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560fbd31d730 CR3: 00000004049c6003 CR4: 00000000001606a0 Call Trace: notifier_call_chain+0xbf/0x130 __blocking_notifier_call_chain+0x76/0xc0 blocking_notifier_call_chain+0x16/0x20 memory_notify+0x1b/0x20 __offline_pages+0x3e2/0x1210 offline_pages+0x11/0x20 memory_block_action+0x144/0x300 memory_subsys_offline+0xe5/0x170 device_offline+0x13f/0x1e0 state_store+0xeb/0x110 dev_attr_store+0x3f/0x70 sysfs_kf_write+0x104/0x150 kernfs_fop_write+0x25c/0x410 __vfs_write+0x66/0x120 vfs_write+0x15a/0x4f0 ksys_write+0xd2/0x1b0 __x64_sys_write+0x73/0xb0 do_syscall_64+0xeb/0xb78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f14f75cc3b8 RSP: 002b:00007ffe84d01d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f14f75cc3b8 RDX: 0000000000000008 RSI: 0000563f8e433d70 RDI: 0000000000000001 RBP: 0000563f8e433d70 R08: 000000000000000a R09: 00007ffe84d018f0 R10: 000000000000000a R11: 0000000000000246 R12: 00007f14f789e780 R13: 0000000000000008 R14: 00007f14f7899740 R15: 0000000000000008 Link: http://lkml.kernel.org/r/20190320204255.53571-1-cai@lca.pw Fixes: 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure") Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Signed-off-by: Qian Cai Reviewed-by: Andrew Morton Cc: [5.0.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0e0a16021fd5..0082d699be94 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1699,12 +1699,12 @@ static int __ref __offline_pages(unsigned long start_pfn, failed_removal_isolated: undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + memory_notify(MEM_CANCEL_OFFLINE, &arg); failed_removal: pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n", (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1, reason); - memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ mem_hotplug_done(); return ret; From f5777bc2d9cf0712554228b1a7927b6f13f5c1f0 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Thu, 28 Mar 2019 20:44:21 -0700 Subject: [PATCH 459/468] mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate() Due to has_unmovable_pages() taking an incorrect irqsave flag instead of the isolation flag in set_migratetype_isolate(), there are issues with HWPOSION and error reporting where dump_page() is not called when there is an unmovable page. Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory") Acked-by: Michal Hocko Reviewed-by: Oscar Salvador Signed-off-by: Qian Cai Cc: [5.0.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_isolation.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index bf4159d771c7..019280712e1b 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -59,7 +59,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. * We just check MOVABLE pages. */ - if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype, flags)) + if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype, + isol_flags)) ret = 0; /* From 0bc9f5d14a93971c6cd9c0d81b0fc154fc54c65d Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 28 Mar 2019 20:44:24 -0700 Subject: [PATCH 460/468] drivers/block/zram/zram_drv.c: fix idle/writeback string compare Makoto report a below KASAN error: zram does out-of-bounds read. Because strscpy copies from source up to count bytes unconditionally. It could cause out-of-bounds read on next object in slab. To prevent it, use strlcpy which checks source's length automatically. BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154 Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314 .. Call trace: strscpy+0x68/0x154 idle_store+0xc4/0x34c dev_attr_store+0x50/0x6c sysfs_kf_write+0x98/0xb4 kernfs_fop_write+0x198/0x260 __vfs_write+0x10c/0x338 vfs_write+0x114/0x238 SyS_write+0xc8/0x168 __sys_trace_return+0x0/0x4 Allocated by task 1314: __kmalloc+0x280/0x318 kernfs_fop_write+0xac/0x260 __vfs_write+0x10c/0x338 vfs_write+0x114/0x238 SyS_write+0xc8/0x168 __sys_trace_return+0x0/0x4 Freed by task 2855: kfree+0x138/0x630 kernfs_put_open_node+0x10c/0x124 kernfs_fop_release+0xd8/0x114 __fput+0x130/0x2a4 ____fput+0x1c/0x28 task_work_run+0x16c/0x1c8 do_notify_resume+0x2bc/0x107c work_pending+0x8/0x10 The buggy address belongs to the object at ffffffc0c3495a00 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffffffc0c3495a00, ffffffc0c3495a80) The buggy address belongs to the page: page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000010200(slab|head) page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org Cc: [5.0] Signed-off-by: Minchan Kim Reported-by: Makoto Wu Reviewed-by: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/zram/zram_drv.c | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index e7a5f1d1c314..399cad7daae7 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -290,18 +290,8 @@ static ssize_t idle_store(struct device *dev, struct zram *zram = dev_to_zram(dev); unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; int index; - char mode_buf[8]; - ssize_t sz; - sz = strscpy(mode_buf, buf, sizeof(mode_buf)); - if (sz <= 0) - return -EINVAL; - - /* ignore trailing new line */ - if (mode_buf[sz - 1] == '\n') - mode_buf[sz - 1] = 0x00; - - if (strcmp(mode_buf, "all")) + if (!sysfs_streq(buf, "all")) return -EINVAL; down_read(&zram->init_lock); @@ -635,25 +625,15 @@ static ssize_t writeback_store(struct device *dev, struct bio bio; struct bio_vec bio_vec; struct page *page; - ssize_t ret, sz; - char mode_buf[8]; - int mode = -1; + ssize_t ret; + int mode; unsigned long blk_idx = 0; - sz = strscpy(mode_buf, buf, sizeof(mode_buf)); - if (sz <= 0) - return -EINVAL; - - /* ignore trailing newline */ - if (mode_buf[sz - 1] == '\n') - mode_buf[sz - 1] = 0x00; - - if (!strcmp(mode_buf, "idle")) + if (sysfs_streq(buf, "idle")) mode = IDLE_WRITEBACK; - else if (!strcmp(mode_buf, "huge")) + else if (sysfs_streq(buf, "huge")) mode = HUGE_WRITEBACK; - - if (mode == -1) + else return -EINVAL; down_read(&zram->init_lock); From d2b2c6dd227ba5b8a802858748ec9a780cb75b47 Mon Sep 17 00:00:00 2001 From: Lars Persson Date: Thu, 28 Mar 2019 20:44:28 -0700 Subject: [PATCH 461/468] mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL and SIGSEGV that could not be traced back to a userspace code bug. They had all the magic signs of an I/D cache coherency issue. Now recently we noticed that the /proc/sys/vm/compact_memory interface was quite efficient at provoking this class of userspace crashes. Studying the code in mm/migrate.c there is a distinction made between migrating a page that is mapped at the instant of migration and one that is not mapped. Our problem turned out to be the non-mapped pages. For the non-mapped page the code performs a copy of the page content and all relevant meta-data of the page without doing the required D-cache maintenance. This leaves dirty data in the D-cache of the CPU and on the 1004K cores this data is not visible to the I-cache. A subsequent page-fault that triggers a mapping of the page will happily serve the process with potentially stale code. What about ARM then, this bug should have seen greater exposure? Well ARM became immune to this flaw back in 2010, see commit c01778001a4f ("ARM: 6379/1: Assume new page cache pages have dirty D-cache"). My proposed fix moves the D-cache maintenance inside move_to_new_page to make it common for both cases. Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com Fixes: 97ee0524614 ("flush cache before installing new page at migraton") Signed-off-by: Lars Persson Reviewed-by: Paul Burton Acked-by: Mel Gorman Cc: Ralf Baechle Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/migrate.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index ac6f4939bb59..663a5449367a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -248,10 +248,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, pte = swp_entry_to_pte(entry); } else if (is_device_public_page(new)) { pte = pte_mkdevmap(pte); - flush_dcache_page(new); } - } else - flush_dcache_page(new); + } #ifdef CONFIG_HUGETLB_PAGE if (PageHuge(new)) { @@ -995,6 +993,13 @@ static int move_to_new_page(struct page *newpage, struct page *page, */ if (!PageMappingFlags(page)) page->mapping = NULL; + + if (unlikely(is_zone_device_page(newpage))) { + if (is_device_public_page(newpage)) + flush_dcache_page(newpage); + } else + flush_dcache_page(newpage); + } out: return rc; From 4462996ea3cc6bcf3c4efbd7bd2514a15dd8ece4 Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Thu, 28 Mar 2019 20:44:32 -0700 Subject: [PATCH 462/468] checkpatch: add %pt as a valid vsprintf extension Commit 4d42c44727a0 ("lib/vsprintf: Print time and date in human readable format via %pt") introduced a new extension, %pt. Add it in the list of valid extensions. Link: http://lkml.kernel.org/r/20190314203719.29130-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 5b756278df13..a09333fd7cef 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5977,7 +5977,7 @@ sub process { while ($fmt =~ /(\%[\*\d\.]*p(\w))/g) { $specifier = $1; $extension = $2; - if ($extension !~ /[SsBKRraEhMmIiUDdgVCbGNOx]/) { + if ($extension !~ /[SsBKRraEhMmIiUDdgVCbGNOxt]/) { $bad_specifier = $specifier; last; } From 2620327852478e695afb2eebe66c354b3bc456cc Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 28 Mar 2019 20:44:36 -0700 Subject: [PATCH 463/468] fs: fs_parser: fix printk format warning Fix printk format warning (seen on i386 builds) by using ptrdiff format specifier (%t): fs/fs_parser.c:413:6: warning: format `%lu' expects argument of type `long unsigned int', but argument 3 has type `int' [-Wformat=] Link: http://lkml.kernel.org/r/19432668-ffd3-fbb2-af4f-1c8e48f6cc81@infradead.org Signed-off-by: Randy Dunlap Acked-by: Geert Uytterhoeven Cc: David Howells Cc: Alexander Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/fs_parser.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fs_parser.c b/fs/fs_parser.c index 842e8f749db6..570d71043acf 100644 --- a/fs/fs_parser.c +++ b/fs/fs_parser.c @@ -410,7 +410,7 @@ bool fs_validate_description(const struct fs_parameter_description *desc) for (param = desc->specs; param->name; param++) { if (param->opt == e->opt && param->type != fs_param_is_enum) { - pr_err("VALIDATE %s: e[%lu] enum val for %s\n", + pr_err("VALIDATE %s: e[%tu] enum val for %s\n", name, e - desc->enums, param->name); good = false; } From 23da9588037ecdd4901db76a5b79a42b529c4ec3 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 28 Mar 2019 20:44:40 -0700 Subject: [PATCH 464/468] fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links Syzkaller reports: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599 Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91 RSP: 0018:ffff8881d828f238 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267 RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178 RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259 R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4 R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000 FS: 00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0xbc/0x1000 [br_netfilter] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 770020de38961fd0 ]--- A new dir entry can be created in get_subdir and its 'header->parent' is set to NULL. Only after insert_header success, it will be set to 'dir', otherwise 'header->parent' is set to NULL and drop_sysctl_table is called. However in err handling path of get_subdir, drop_sysctl_table also be called on 'new->header' regardless its value of parent pointer. Then put_links is called, which triggers NULL-ptr deref when access member of header->parent. In fact we have multiple error paths which call drop_sysctl_table() there, upon failure on insert_links() we also call drop_sysctl_table().And even in the successful case on __register_sysctl_table() we still always call drop_sysctl_table().This patch fix it. Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing Reported-by: Hulk Robot Acked-by: Luis Chamberlain Cc: Kees Cook Cc: Alexey Dobriyan Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Al Viro Cc: Eric W. Biederman Cc: [3.4+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/proc_sysctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 4d598a399bbf..d65390727541 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1626,7 +1626,8 @@ static void drop_sysctl_table(struct ctl_table_header *header) if (--header->nreg) return; - put_links(header); + if (parent) + put_links(header); start_unregistering(header); if (!--header->count) kfree_rcu(header, rcu); From 2623c4fbe2ad1341ff2d1e12410d0afdae2490ca Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 29 Mar 2019 12:36:04 -0700 Subject: [PATCH 465/468] LSM: Revive CONFIG_DEFAULT_SECURITY_* for "make oldconfig" Commit 70b62c25665f636c ("LoadPin: Initialize as ordered LSM") removed CONFIG_DEFAULT_SECURITY_{SELINUX,SMACK,TOMOYO,APPARMOR,DAC} from security/Kconfig and changed CONFIG_LSM to provide a fixed ordering as a default value. That commit expected that existing users (upgrading from Linux 5.0 and earlier) will edit CONFIG_LSM value in accordance with their CONFIG_DEFAULT_SECURITY_* choice in their old kernel configs. But since users might forget to edit CONFIG_LSM value, this patch revives the choice (only for providing the default value for CONFIG_LSM) in order to make sure that CONFIG_LSM reflects CONFIG_DEFAULT_SECURITY_* from their old kernel configs. Note that since TOMOYO can be fully stacked against the other legacy major LSMs, when it is selected, it explicitly disables the other LSMs to avoid them also initializing since TOMOYO does not expect this currently. Reported-by: Jakub Kicinski Reported-by: Randy Dunlap Fixes: 70b62c25665f636c ("LoadPin: Initialize as ordered LSM") Co-developed-by: Tetsuo Handa Signed-off-by: Tetsuo Handa Signed-off-by: Kees Cook Acked-by: Casey Schaufler Signed-off-by: James Morris --- security/Kconfig | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/security/Kconfig b/security/Kconfig index 1d6463fb1450..353cfef71d4e 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -239,8 +239,46 @@ source "security/safesetid/Kconfig" source "security/integrity/Kconfig" +choice + prompt "First legacy 'major LSM' to be initialized" + default DEFAULT_SECURITY_SELINUX if SECURITY_SELINUX + default DEFAULT_SECURITY_SMACK if SECURITY_SMACK + default DEFAULT_SECURITY_TOMOYO if SECURITY_TOMOYO + default DEFAULT_SECURITY_APPARMOR if SECURITY_APPARMOR + default DEFAULT_SECURITY_DAC + + help + This choice is there only for converting CONFIG_DEFAULT_SECURITY + in old kernel configs to CONFIG_LSM in new kernel configs. Don't + change this choice unless you are creating a fresh kernel config, + for this choice will be ignored after CONFIG_LSM has been set. + + Selects the legacy "major security module" that will be + initialized first. Overridden by non-default CONFIG_LSM. + + config DEFAULT_SECURITY_SELINUX + bool "SELinux" if SECURITY_SELINUX=y + + config DEFAULT_SECURITY_SMACK + bool "Simplified Mandatory Access Control" if SECURITY_SMACK=y + + config DEFAULT_SECURITY_TOMOYO + bool "TOMOYO" if SECURITY_TOMOYO=y + + config DEFAULT_SECURITY_APPARMOR + bool "AppArmor" if SECURITY_APPARMOR=y + + config DEFAULT_SECURITY_DAC + bool "Unix Discretionary Access Controls" + +endchoice + config LSM string "Ordered list of enabled LSMs" + default "yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK + default "yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR + default "yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO + default "yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC default "yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor" help A comma-separated list of LSMs, in initialization order. From 0aab8e4df4702b31314a27ec4b0631dfad0fae0a Mon Sep 17 00:00:00 2001 From: Kangjie Lu Date: Sat, 9 Mar 2019 00:04:11 -0600 Subject: [PATCH 466/468] leds: pca9532: fix a potential NULL pointer dereference In case of_match_device cannot find a match, return -EINVAL to avoid NULL pointer dereference. Fixes: fa4191a609f2 ("leds: pca9532: Add device tree support") Signed-off-by: Kangjie Lu Signed-off-by: Jacek Anaszewski --- drivers/leds/leds-pca9532.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/leds/leds-pca9532.c b/drivers/leds/leds-pca9532.c index 7fea18b0c15d..7cb4d685a1f1 100644 --- a/drivers/leds/leds-pca9532.c +++ b/drivers/leds/leds-pca9532.c @@ -513,6 +513,7 @@ static int pca9532_probe(struct i2c_client *client, const struct i2c_device_id *id) { int devid; + const struct of_device_id *of_id; struct pca9532_data *data = i2c_get_clientdata(client); struct pca9532_platform_data *pca9532_pdata = dev_get_platdata(&client->dev); @@ -528,8 +529,11 @@ static int pca9532_probe(struct i2c_client *client, dev_err(&client->dev, "no platform data\n"); return -EINVAL; } - devid = (int)(uintptr_t)of_match_device( - of_pca9532_leds_match, &client->dev)->data; + of_id = of_match_device(of_pca9532_leds_match, + &client->dev); + if (unlikely(!of_id)) + return -EINVAL; + devid = (int)(uintptr_t) of_id->data; } else { devid = id->driver_data; } From 909346433064b8d840dc82af26161926b8d37558 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 14 Mar 2019 15:06:14 +0100 Subject: [PATCH 467/468] leds: trigger: netdev: use memcpy in device_name_store If userspace doesn't end the input with a newline (which can easily happen if the write happens from a C program that does write(fd, iface, strlen(iface))), we may end up including garbage from a previous, longer value in the device_name. For example # cat device_name # printf 'eth12' > device_name # cat device_name eth12 # printf 'eth3' > device_name # cat device_name eth32 I highly doubt anybody is relying on this behaviour, so switch to simply copying the bytes (we've already checked that size is < IFNAMSIZ) and unconditionally zero-terminate it; of course, we also still have to strip a trailing newline. This is also preparation for future patches. Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Signed-off-by: Rasmus Villemoes Acked-by: Pavel Machek Signed-off-by: Jacek Anaszewski --- drivers/leds/trigger/ledtrig-netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c index 167a94c02d05..136f86a1627d 100644 --- a/drivers/leds/trigger/ledtrig-netdev.c +++ b/drivers/leds/trigger/ledtrig-netdev.c @@ -122,7 +122,8 @@ static ssize_t device_name_store(struct device *dev, trigger_data->net_dev = NULL; } - strncpy(trigger_data->device_name, buf, size); + memcpy(trigger_data->device_name, buf, size); + trigger_data->device_name[size] = 0; if (size > 0 && trigger_data->device_name[size - 1] == '\n') trigger_data->device_name[size - 1] = 0; From 79a3aaa7b82e3106be97842dedfd8429248896e6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 31 Mar 2019 14:39:29 -0700 Subject: [PATCH 468/468] Linux 5.1-rc3 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 52f067eadc48..026fbc450906 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 5 PATCHLEVEL = 1 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -rc3 NAME = Shy Crocodile # *DOCUMENTATION*