The size computations done in the ioctl function use an integer.
If userspace submits a request with req->cmd_nr or req->cmd_buf_nr
set to INT_MAX, the integer computations overflow later, leading
to potential (kernel) memory corruption.
Prevent this issue by enforcing a limit on the number of submitted
commands, so that we have enough headroom later for the size
computations.
Note that this change has no impact on the currently available
users in userspace, like e.g. libdrm/exynos.
While at it, also make a comment about the size computation more
detailed.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
We were trying to print an error message if we timed out here, but the
loop actually ends with "tries" set to UINT_MAX and not zero. Fix this
by changing from tries-- to --tries.
A for loop would actually be the most natural way to do this. My fix
means we only loop 99 times instead of 100 but that's probably ok.
Fixes: a696394c52 ('drm/exynos: mixer: simplify loop in vp_win_reset()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
This patch replaces specific atomic commit function
with atomic helper commit one.
For this, it removes existing atomic commit function
and relevant code specific to Exynos DRM and makes
atomic helper commit to be used instead.
Below are changes for the use of atomic helper commit:
- add atomic_commit_tail callback specific to Exynos DRM
. default implemention of atomic helper doesn't mesh well
with runtime PM so the device driver which supports runtime
PM should call drm_atomic_helper_commit_modeset_enables function
prior to drm_atomic_helper_commit_planes function call.
atomic_commit_tail callback implements this call ordering.
- allow plane commit only in case that CRTC device is enabled.
. for this, it calls atomic_helper_commit_planes function
with DRM_PLANE_COMMIT_ACTIVE_ONLY flag in atomic_commit_tail callback.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com>
This patch removes exynos_drm_crtc_cancel_page_flip call
when drm is closed because at that time, events will be released
by drm_events_release function.
Changelog v1:
- remove exynos_drm_crtc_cancel_page_flip function also because
this funtion isn't used anymore.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
This patch adds runtime support calls to notify device core when MIC
device is really in use. Runtime PM is implemented by enabling and
disabling clocks like in other Exynos DRM subdrivers. Adding runtime
PM support is needed to let power domain with this device to be turned
off when display is not used.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
This is the deprecated function for when you embedded the framebuffer
somewhere else (which breaks refcounting). But exynos is using
drm_framebuffer_remove and a free-standing fb, so this is rendundant.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
The OF graph is not necessary because the panel is a child of
dsi. therefore, the parse_dt function of dsi does not need to
check the remote_node connected to the panel. and the whole
parse_dt function should be refactored later.
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Before applying the patch, used the of_get_videomode function to
parse the display-timings in the panel which is the child driver
of dsi in the devicetree. this is wrong. So removed the
of_get_videomode and fixed to get videomode struct through
mode_set callback function.
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
drivers/gpu/drm/sti/sti_plane.c:76:33: error: ‘struct drm_framebuffer’
has no member named ‘pixel_format’; did you mean ‘format’?
I didn't look to hard at the casting to a char * and just did a
mechanical transformation of s/pixel_format/format->format/ as given in
commit 438b74a549 ("drm: Nuke fb->pixel_format").
Fixes: 438b74a549 ("drm: Nuke fb->pixel_format")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Vincent Abriou <vincent.abriou@st.com>
Acked-by: Vincent Abriou <vincent.abriou@st.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- cleanups&fixes for dw-hdmi bride driver (Laurent)
- updates for adv bridge driver (John Stultz) for nexus
- drm_crtc_from_index helper rollout (Shawn Guo)
- removing drm_framebuffer_unregister_private from drivers&core
- target_vblank (Andrey Grodzovsky)
- misc tiny stuff
* tag 'drm-misc-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc: (49 commits)
drm: qxl: Open code teardown function for qxl
drm: qxl: Open code probing sequence for qxl
drm/bridge: adv7511: Re-write the i2c address before EDID probing
drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID
drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally
drm/bridge: adv7511: Enable HPD interrupts to support hotplug and improve monitor detection
drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
drm: vc4: use crtc helper drm_crtc_from_index()
drm: tegra: use crtc helper drm_crtc_from_index()
drm: nouveau: use crtc helper drm_crtc_from_index()
drm: mediatek: use crtc helper drm_crtc_from_index()
drm: kirin: use crtc helper drm_crtc_from_index()
drm: exynos: use crtc helper drm_crtc_from_index()
dt-bindings: display: dw-hdmi: Clean up DT bindings documentation
drm: bridge: dw-hdmi: Assert SVSRET before resetting the PHY
drm: bridge: dw-hdmi: Fix the name of the PHY reset macros
drm: bridge: dw-hdmi: Define and use macros for PHY register addresses
drm: bridge: dw-hdmi: Detect PHY type at runtime
drm: bridge: dw-hdmi: Handle overflow workaround based on device version
...
Final block of feature work for 4.11:
- gen8 pd cleanup from Matthew Auld
- more cleanups for view/vma (Chris)
- dmc support on glk (Anusha Srivatsa)
- use core crc api (Tomue)
- track wedged requests using fence.error (Chris)
- lots of psr fixes (Nagaraju, Vathsala)
- dp mst support, acked for merging through drm-intel by Takashi
(Libin)
- huc loading support, including uapi for libva to use it (Anusha
Srivatsa)
* tag 'drm-intel-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-intel: (111 commits)
drm/i915: Update DRIVER_DATE to 20170123
drm/i915: reinstate call to trace_i915_vma_bind
drm/i915: Assert that created vma has a whole number of pages
drm/i915: Assert the drm_mm_node is allocated when on the VM lists
drm/i915: Treat an error from i915_vma_instance() as unlikely
drm/i915: Reject vma creation larger than address space
drm/i915: Use common LRU inactive vma bumping for unpin_from_display
drm/i915: Do an unlocked wait before set-cache-level ioctl
drm/i915/huc: Assert that HuC vma is placed in GuC accessible range
drm/i915/huc: Avoid attempting to authenticate non-existent fw
drm/i915: Set adjustment to zero on Up/Down interrupts if freq is already max/min
drm/i915: Remove the double handling of 'flags from intel_mode_from_pipe_config()
drm/i915: Remove crtc->config usage from intel_modeset_readout_hw_state()
drm/i915: Release temporary load-detect state upon switching
drm/i915: Remove i915_gem_object_to_ggtt()
drm/i915: Remove i915_vma_create from VMA API
drm/i915: Add a check that the VMA instance we lookup matches the request
drm/i915: Rename some warts in the VMA API
drm/i915: Track pinned vma in intel_plane_state
drm/i915/get_params: Add HuC status to getparams
...
This reverts commit 54a07c7bb0,
and reinstates the original.
[airlied: this might be a bad plan for git].
commit 3846fd9b86
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jan 11 10:01:17 2017 +0100
drm/probe-helpers: Drop locking from poll_enable
It was only needed to protect the connector_list walking, see
commit 8c4ccc4ab6
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Jul 9 23:44:26 2015 +0200
drm/probe-helper: Grab mode_config.mutex in poll_init/enable
Unfortunately the commit message of that patch fails to mention that
the new locking check was for the connector_list.
But that requirement disappeared in
commit c36a3254f7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Dec 15 16:58:43 2016 +0100
drm: Convert all helpers to drm_connector_list_iter
and so we can drop this again.
This fixes a locking inversion on nouveau, where the rpm code needs to
re-enable. But in other places the rpm_get() calls are nested within
the big modeset locks.
While at it, also improve the kerneldoc for these two functions a
notch.
v2: Update the kerneldoc even more to explain that these functions
can't be called concurrently, or bad things happen (Chris).
Backmerge Linus master to get the connector locking revert.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: (645 commits)
sysctl: fix proc_doulongvec_ms_jiffies_minmax()
Revert "drm/probe-helpers: Drop locking from poll_enable"
MAINTAINERS: add Dan Streetman to zbud maintainers
MAINTAINERS: add Dan Streetman to zswap maintainers
mm: do not export ioremap_page_range symbol for external module
mn10300: fix build error of missing fpu_save()
romfs: use different way to generate fsid for BLOCK or MTD
frv: add missing atomic64 operations
mm, page_alloc: fix premature OOM when racing with cpuset mems update
mm, page_alloc: move cpuset seqcount checking to slowpath
mm, page_alloc: fix fast-path race with cpuset update or removal
mm, page_alloc: fix check for NULL preferred_zone
kernel/panic.c: add missing \n
fbdev: color map copying bounds checking
frv: add atomic64_add_unless()
mm/mempolicy.c: do not put mempolicy before using its nodemask
radix-tree: fix private list warnings
Documentation/filesystems/proc.txt: add VmPin
mm, memcg: do not retry precharge charges
proc: add a schedule point in proc_pid_readdir()
...
We perform the conversion between kernel jiffies and ms only when
exporting kernel value to user space.
We need to do the opposite operation when value is written by user.
Only matters when HZ != 1000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A bunch of fixes to the Intel drivers: broxton, baytrail.
Bugs related to register offsets, IRQ, debounce functionality.
- Fix a conflict amongst UART settings on the meson.
- Fix the ethernet setting on the Uniphier.
- A compilation warning squelched.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYifv7AAoJEEEQszewGV1zvWQQALAfKAjbQAzO8hUnb0A1Gu+C
BsZ5baygZN78Xk6WLbGGqyX9Yo7rZaznXabx0TDn2ND/RC0gGTmsM5GV2ay7KoDW
Rru5dVzuYZEXuy/Pwah9TdzRRqD8GNPHuCCxU/Iq8fYGaV1ORcyjOCYcHLZ8GftA
Nh7eqgGL14kQnzZDh3jL/V4PY6x0bzCZz6piU1j2WAFqPHCDRmCGPXQjH0OgfG43
dUvqHx8y3+rZvCsZU7JS7R7ZQYQ96DYELw3O1Li1W9Lga3sqkhMt2wA1jdTG5xtD
97PKj6+5BYJ3PL9L09yx6jQ3IYznol2kvR1YTgF9d4Z6Zjo96z2iXI/MxyoF07ea
jueAA+m7nhf34cJzRcNlMhIjbcckUpt3nTsVBTE3BNdUwAvs9Vseh2Ft8scBFvTP
IKeIDqX3X961DgtdH/q02xuxYBvOrr8jQEsk/HdMvl1fw1EYoiW6zp3lVqY426Rh
O+kwuy1ubj8gv+dJ1tHMMx523OzGv+Z2PXMoWoNhxEe3bIYDlSLYnSQVp+AOm6vR
GQW5Q0OwFLnlMDL3WIow6SZRDaPJq6N+xojbObYp6Nh78q/XAQbeIa3p5DzSXyGb
DoLeXRMfQZ1Evcz1RX1uzIjBjXtAF7m2SA+Zz7/LVFux8weyDgnHq04v0BemyWF7
jutvZhAYGewIPDzK/Qry
=AXUf
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"A bunch of pin control fixes for v4.10 that didn't get sent off until
now, sorry for the delay.
It's only driver fixes:
- A bunch of fixes to the Intel drivers: broxton, baytrail. Bugs
related to register offsets, IRQ, debounce functionality.
- Fix a conflict amongst UART settings on the meson.
- Fix the ethernet setting on the Uniphier.
- A compilation warning squelched"
* tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: uniphier: fix Ethernet (RMII) pin-mux setting for LD20
pinctrl: meson: fix uart_ao_b for GXBB and GXL/GXM
pinctrl: amd: avoid maybe-uninitalized warning
pinctrl: baytrail: Do not add all GPIOs to IRQ domain
pinctrl: baytrail: Rectify debounce support
pinctrl: intel: Set pin direction properly
pinctrl: broxton: Use correct PADCFGLOCK offset
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYiQ5uAAoJEAx081l5xIa+YTwP/jBQ7yjAACjpPS5SI3ProLd6
FZ8QrvjSv65ez8b59MEKe1Ld5qSyF61CyorA0F3oxA/dV2SG14sjRI8AXteKkoQN
XSk2NGiub7c3+Edtr2Cr+pzK91pqwHjtEBLh2Ftex2HrCCq2AP9HNgxU0l0056sH
hfplVP1kwavgn2zpbXA99JjoNHk3+UMX/UbUibH3lOhJkaM401cpBhco9yh07rK5
XfckHjnQwTH5jtvocorOq7Zh/cSWFqTWdw+kkW8TawHzT9GdoYSh6B0XR57LLwyB
yQ0tHa706oqjWMUW0kxOKB/EjEGsyKDSMuc52qOisvwrgmw0JS/RbqHctfjsGrrb
leCtffxe4yFgdB9jFbvyrY5tYt2lrxLm87oiYeB3T/r2SnWhkmyZsNCIJ+IGOkId
1DyuvZnh2UztHpru+TE5frALvM6I/gCF6UwdITAm++yJp87OqWo3fckz+KG5ln6Q
5c0p2jLR15Ii3tbs2q277FZMMOYoO6NMrcDVoDv1pqtG1vFA8ydcLyyPUVBdDF3Z
fpGHjOvGXLoQN4QFXXU/6utlZ6Twn0WGfPHQUH6iNL8pxY1JuyyrAkZS6ubFWNQd
dt71uznvVMszLAT95pUBhjmtttVh+XIH0BYVejFflpC/JoxR5xWce2Ka0rVPwcHe
kpv4K+s1PeTBKijd/t5s
=PHLN
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux
Pull drm revert from Dave Airlie:
"Revert one patch missing some prereqs.
One of the connector fixes was missing some prereqs, we have an
alternate driver fix that should work that I'll send tomorrow.
Today is a holiday here so quickly smashing this out"
Daniel Vetter explains:
"I pushed a locking change to fix a nouveau rpm issue to -fixes that
needed the connector_list rework. And that's only in -next, but I
missed that. Dave has the revert in a pull, and he'll follow-up with
the hack nouveau patch for 4.10, and then we'll reapply the proper fix
again for -next and revert the hacks. A bit a mess, but should be
sorted soon"
* tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/probe-helpers: Drop locking from poll_enable"
This reverts commit 3846fd9b86.
There were some precursor commits missing for this around connector
locking, we should probably merge Lyude's nouveau avoid the problem patch.
ARM DMA fixes
vhost vsock bugfix
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYh9ZlAAoJECgfDbjSjVRptqsIAJ03KvIYEUH2+kWr21UWWoFk
wYGGtf2Voexnj6SVcY6JMTG4J0PF5y1K9XL1/Z0kXFQWy+PCqp4bdE7PUpxs8PlB
ShapIoMTU7dfSL2cFfrYb3Nx4hFiube3HXSYz5LhGUBwWT7v5j9fRodoNZjyi3rs
faNKR9psNHFpkSgRTJKVnj53/dU+HcSP1S+x9qpkS2bYHvLA2vQo/FaKg/M8i9jt
JzCcX/RbOij9DlHgzT64cFTnIZVekauUsQAJcs8e/SHhwm7CLAlrVIrjcG4H4mvg
XIKcL1YJKVrJzjnDcezSkfQpc+oPn2t4Qk+VOcnqbHPsg23Hr5Rj8krixCl6XmQ=
=Yeqz
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost fixes from Michael Tsirkin:
- ARM DMA fixes
- vhost vsock bugfix
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vring: Force use of DMA API for ARM-based systems with legacy devices
virtio_mmio: Set DMA masks appropriately
vhost/vsock: handle vhost_vq_init_access() error
Merge fixes from Andrew Morton:
"26 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
MAINTAINERS: add Dan Streetman to zbud maintainers
MAINTAINERS: add Dan Streetman to zswap maintainers
mm: do not export ioremap_page_range symbol for external module
mn10300: fix build error of missing fpu_save()
romfs: use different way to generate fsid for BLOCK or MTD
frv: add missing atomic64 operations
mm, page_alloc: fix premature OOM when racing with cpuset mems update
mm, page_alloc: move cpuset seqcount checking to slowpath
mm, page_alloc: fix fast-path race with cpuset update or removal
mm, page_alloc: fix check for NULL preferred_zone
kernel/panic.c: add missing \n
fbdev: color map copying bounds checking
frv: add atomic64_add_unless()
mm/mempolicy.c: do not put mempolicy before using its nodemask
radix-tree: fix private list warnings
Documentation/filesystems/proc.txt: add VmPin
mm, memcg: do not retry precharge charges
proc: add a schedule point in proc_pid_readdir()
mm: alloc_contig: re-allow CMA to compact FS pages
mm/slub.c: trace free objects at KERN_INFO
...
Recently, I've found cases in which ioremap_page_range was used
incorrectly, in external modules, leading to crashes. This can be
partly attributed to the fact that ioremap_page_range is lower-level,
with fewer protections, as compared to the other functions that an
external module would typically call. Those include:
ioremap_cache
ioremap_nocache
ioremap_prot
ioremap_uc
ioremap_wc
ioremap_wt
...each of which wraps __ioremap_caller, which in turn provides a safer
way to achieve the mapping.
Therefore, stop EXPORT-ing ioremap_page_range.
Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_FPU is not enabled on arch/mn10300, <asm/switch_to.h> causes
a build error with a call to fpu_save():
kernel/built-in.o: In function `.L410':
core.c:(.sched.text+0x28a): undefined reference to `fpu_save'
Fix this by including <asm/fpu.h> in <asm/switch_to.h> so that an empty
static inline fpu_save() is defined.
Link: http://lkml.kernel.org/r/dc421c4f-4842-4429-1b99-92865c2f24b6@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8a59f5d252 ("fs/romfs: return f_fsid for statfs(2)") generates
a 64bit id from sb->s_bdev->bd_dev. This is only correct when romfs is
defined with CONFIG_ROMFS_ON_BLOCK. If romfs is only defined with
CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, referencing sb->s_bdev->bd_dev
will triger an oops.
Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y,
both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined.
Therefore when calling huge_encode_dev() to generate a 64bit id, I use
the follow order to choose parameter,
- CONFIG_ROMFS_ON_BLOCK defined
use sb->s_bdev->bd_dev
- CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined
use sb->s_dev when,
- both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined
leave id as 0
When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev
is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index,
otherwise sb->s_dev is 0.
This is a try-best effort to generate a uniq file system ID, if all the
above conditions are not meet, f_fsid of this romfs instance will be 0.
Generally only one romfs can be built on single MTD block device, this
method is enough to identify multiple romfs instances in a computer.
Link: http://lkml.kernel.org/r/1482928596-115155-1-git-send-email-colyli@suse.de
Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Nong Li <nongli1031@gmail.com>
Tested-by: Nong Li <nongli1031@gmail.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some more atomic64 operations were missing and as a result frv
allmodconfig was failing. Add the missing operations.
Link: http://lkml.kernel.org/r/1485193844-12850-1-git-send-email-sudip.mukherjee@codethink.co.uk
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory. The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.
The problem comes from insufficient protection against cpuset changes,
which can cause get_page_from_freelist() to consider all zones as
non-eligible due to nodemask and/or current->mems_allowed. This was
masked in the past by sufficient retries, but since commit 682a3385e7
("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
the preferred_zoneref once, and don't iterate over the whole zonelist in
further attempts, thus the only eligible zones might be placed in the
zonelist before our starting point and we always miss them.
A previous patch fixed this problem for current->mems_allowed. However,
cpuset changes also update the task's mempolicy nodemask. The fix has
two parts. We have to repeat the preferred_zoneref search when we
detect cpuset update by way of seqcount, and we have to check the
seqcount before considering OOM.
[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a preparation for the following patch to make review simpler.
While the primary motivation is a bug fix, this also simplifies the fast
path, although the moved code is only enabled when cpusets are in use.
Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory. The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.
One possible cause is that in the fast path we find the preferred
zoneref according to current mems_allowed, so that it points to the
middle of the zonelist, skipping e.g. zones of node 1 completely. If
the mems_allowed is updated to contain only node 1, we never reach it in
the zonelist, and trigger OOM before checking the cpuset_mems_cookie.
This patch fixes the particular case by redoing the preferred zoneref
search if we switch back to the original nodemask. The condition is
also slightly changed so that when the last non-root cpuset is removed,
we don't miss it.
Note that this is not a full fix, and more patches will follow.
Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz
Fixes: 682a3385e7 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "fix premature OOM regression in 4.7+ due to cpuset races".
This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1]. The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice. That's why the patches try to
be not too intrusive.
Unfortunately why investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOM's
back, but that seems to be much older and harder to fix problem. I have
posted a RFC [2] but I believe that fixing the recent regressions has a
higher priority.
Longer-term we might try to think how to fix the cpuset mess in a better
and less error prone way. I was for example very surprised to learn,
that cpuset updates change not only task->mems_allowed, but also
nodemask of mempolicies. Until now I expected the parameter to
alloc_pages_nodemask() to be stable. I wonder why do we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset. I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form. So we
have both crazy complexity and overhead, AFAICS.
[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz
This patch (of 4):
Since commit c33d6c06f6 ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification. We
check the zoneref pointer which is never NULL and we should check the zone
pointer. Also document this in first_zones_zonelist() comment per Michal
Hocko.
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a new line. Fix it.
Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Copying color maps to userspace doesn't check the value of to->start,
which will cause kernel heap buffer OOB read due to signedness wraps.
CVE-2016-8405
Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Peter Pi (@heisecode) of Trend Micro
Cc: Min Chong <mchong@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The build of frv allmodconfig was failing with the error:
lib/atomic64_test.c:209:9: error:
implicit declaration of function 'atomic64_add_unless'
All the atomic64 operations were defined in frv, but
atomic64_add_unless() was not done.
Implement atomic64_add_unless() as done in other arches.
Link: http://lkml.kernel.org/r/1484781236-6698-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit be97a41b29 ("mm/mempolicy.c: merge alloc_hugepage_vma to
alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
mpol_cond_put() before accessing the embedded nodemask by
__alloc_pages_nodemask(). The commit log says it's so "we can use a
single exit path within the function" but that's clearly wrong. We can
still do that when doing mpol_cond_put() after the allocation attempt.
Make sure the mempolicy is not freed prematurely, otherwise
__alloc_pages_nodemask() can end up using a bogus nodemask, which could
lead e.g. to premature OOM.
Fixes: be97a41b29 ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The newly introduced warning in radix_tree_free_nodes() was testing the
wrong variable; it should have been 'old' instead of 'node'.
Fixes: ea07b862ac ("mm: workingset: fix use-after-free in shadow node shrinker")
Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
added VmPin in /proc/<pid>/status. Report that in
Documentation/filesystems/proc.txt
Also move Umask after Name to keep correct order.
Link: http://lkml.kernel.org/r/20170114201219.30387-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When memory.move_charge_at_immigrate is enabled and precharges are
depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
increase the size of the precharge.
Prevent precharges from ever looping by setting __GFP_NORETRY. This was
probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
pointless as written.
Fixes: 0029e19ebf ("mm: memcontrol: remove explicit OOM parameter in charge path")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701130208510.69402@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have seen proc_pid_readdir() invocations holding cpu for more than 50
ms. Add a cond_resched() to be gentle with other tasks.
[akpm@linux-foundation.org: coding style fix]
Link: http://lkml.kernel.org/r/1484238380.15816.42.camel@edumazet-glaptop3.roam.corp.google.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 73e64c51af ("mm, compaction: allow compaction for GFP_NOFS
requests") changed compation to skip FS pages if not explicitly allowed
to touch them, but missed to update the CMA compact_control.
This leads to a very high isolation failure rate, crippling performance
of CMA even on a lightly loaded system. Re-allow CMA to compact FS
pages by setting the correct GFP flags, restoring CMA behavior and
performance to the kernel 4.9 level.
Fixes: 73e64c51af (mm, compaction: allow compaction for GFP_NOFS requests)
Link: http://lkml.kernel.org/r/20170113115155.24335-1-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently when trace is enabled (e.g. slub_debug=T,kmalloc-128 ) the
trace messages are mostly output at KERN_INFO. However the trace code
also calls print_section() to hexdump the head of a free object. This
is hard coded to use KERN_ERR, meaning the console is deluged with trace
messages even if we've asked for quiet.
Fix this the obvious way but adding a level parameter to
print_section(), allowing calls from the trace code to use the same
trace level as other trace messages.
Link: http://lkml.kernel.org/r/20170113154850.518-1-daniel.thompson@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With >=32 CPUs the userfaultfd selftest triggered a graceful but
unexpected SIGBUS because VM_FAULT_RETRY was returned by
handle_userfault() despite the UFFDIO_COPY wasn't completed.
This seems caused by rwsem waking the thread blocked in
handle_userfault() and we can't run up_read() before the wait_event
sequence is complete.
Keeping the wait_even sequence identical to the first one, would require
running userfaultfd_must_wait() again to know if the loop should be
repeated, and it would also require retaking the rwsem and revalidating
the whole vma status.
It seems simpler to wait the targeted wakeup so that if false wakeups
materialize we still wait for our specific wakeup event, unless of
course there are signals or the uffd was released.
Debug code collecting the stack trace of the wakeup showed this:
$ ./userfaultfd 100 99999
nr_pages: 25600, nr_pages_per_cpu: 800
bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
bounces: 99997, mode: rnd ver poll, Bus error (core dumped)
save_stack_trace+0x2b/0x50
try_to_wake_up+0x2a6/0x580
wake_up_q+0x32/0x70
rwsem_wake+0xe0/0x120
call_rwsem_wake+0x1b/0x30
up_write+0x3b/0x40
vm_mmap_pgoff+0x9c/0xc0
SyS_mmap_pgoff+0x1a9/0x240
SyS_mmap+0x22/0x30
entry_SYSCALL_64_fastpath+0x1f/0xbd
0xffffffffffffffff
FAULT_FLAG_ALLOW_RETRY missing 70
CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G W 4.8.0+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xb8/0x112
handle_userfault+0x572/0x650
handle_mm_fault+0x12cb/0x1520
__do_page_fault+0x175/0x500
trace_do_page_fault+0x61/0x270
do_async_page_fault+0x19/0x90
async_page_fault+0x25/0x30
This always happens when the main userfault selftest thread is running
clone() while glibc runs either mprotect or mmap (both taking mmap_sem
down_write()) to allocate the thread stack of the background threads,
while locking/userfault threads already run at full throttle and are
susceptible to false wakeups that may cause handle_userfault() to return
before than expected (which results in graceful SIGBUS at the next
attempt).
This was reproduced only with >=32 CPUs because the loop to start the
thread where clone() is too quick with fewer CPUs, while with 32 CPUs
there's already significant activity on ~32 locking and userfault
threads when the last background threads are started with clone().
This >=32 CPUs SMP race condition is likely reproducible only with the
selftest because of the much heavier userfault load it generates if
compared to real apps.
We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a
patch floating around that provides it also hidden this problem but in
reality only is successfully at hiding the problem.
False wakeups could still happen again the second time
handle_userfault() is invoked, even if it's a so rare race condition
that getting false wakeups twice in a row is impossible to reproduce.
This full fix is needed for correctness, the only alternative would be
to allow VM_FAULT_RETRY to be returned infinitely. With this fix the WP
support can stick to a strict "one more" VM_FAULT_RETRY logic (no need
of returning it infinite times to avoid the SIGBUS).
Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Shubham Kumar Sharma <shubham.kumar.sharma@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc-7 produces a harmless false-postive warning about a possible NULL
pointer access:
drivers/memstick/core/memstick.c: In function 'h_memstick_read_dev_id':
drivers/memstick/core/memstick.c:309:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(mrq->data, buf, mrq->data_len);
This can't happen because the caller sets the command to 'MS_TPC_READ_REG',
which causes the data direction to be 'READ' and the NULL pointer not
accessed.
As a simple workaround for the warning, we can pass a pointer to the
data that we actually want to read into. This is not needed here, but
also harmless, and lets the compiler know that the access is ok.
Link: http://lkml.kernel.org/r/20170111144143.548867-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On an overloaded system, it is possible that a change in the watchdog
threshold can be delayed long enough to trigger a false positive.
This can easily be achieved by having a cpu spinning indefinitely on a
task, while another cpu updates watchdog threshold.
What happens is while trying to park the watchdog threads, the hrtimers
on the other cpus trigger and reprogram themselves with the new slower
watchdog threshold. Meanwhile, the nmi watchdog is still programmed
with the old faster threshold.
Because the one cpu is blocked, it prevents the thread parking on the
other cpus from completing, which is needed to shutdown the nmi watchdog
and reprogram it correctly. As a result, a false positive from the nmi
watchdog is reported.
Fix this by setting a park_in_progress flag to block all lockups until
the parking is complete.
Fix provided by Ulrich Obergfell.
[akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As reported by Arnd:
https://lkml.org/lkml/2017/1/10/756
Compiling with the following configuration:
# CONFIG_EXT2_FS is not set
# CONFIG_EXT4_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_FS_IOMAP depends on the above filesystems, as is not set
CONFIG_FS_DAX=y
generates build warnings about unused functions in fs/dax.c:
fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function]
static int dax_insert_mapping(struct address_space *mapping,
^~~~~~~~~~~~~~~~~~
fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function]
static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
^~~~~~~~~~~~~
fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function]
static int dax_load_hole(struct address_space *mapping, void **entry,
^~~~~~~~~~~~~
fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function]
static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
^~~~~~~~~~~~~~~~~~
Now that the struct buffer_head based DAX fault paths and I/O path have
been removed we really depend on iomap support being present for DAX.
Make this explicit by selecting FS_IOMAP if we compile in DAX support.
This allows us to remove conditional selections of FS_IOMAP when FS_DAX
was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c.
Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 19be0eaffa ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead. Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.
However, a similar check in huge_memory.c was forgotten. As a result,
remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets
FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
true.
While in this state the process is stil SIGKILLable, but little else
works (e.g. no ptrace attach, no other signals). This is easily
reproduced with the following code (assuming thp are set to always):
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#define TEST_SIZE 5 * 1024 * 1024
int main(void) {
int status;
pid_t child;
int fd = open("/proc/self/mem", O_RDWR);
void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
assert(addr != MAP_FAILED);
pid_t parent_pid = getpid();
if ((child = fork()) == 0) {
void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
assert(addr2 != MAP_FAILED);
memset(addr2, 'a', TEST_SIZE);
pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
return 0;
}
assert(child == waitpid(child, &status, 0));
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
return 0;
}
Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit. The same pattern exists
in follow_devmap_pmd. However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
online_{kernel|movable} is used to change the memory zone to
ZONE_{NORMAL|MOVABLE} and online the memory.
To check that memory zone can be changed, zone_can_shift() is used.
Currently the function returns minus integer value, plus integer
value and 0. When the function returns minus or plus integer value,
it means that the memory zone can be changed to ZONE_{NORNAL|MOVABLE}.
But when the function returns 0, there are two meanings.
One of the meanings is that the memory zone does not need to be changed.
For example, when memory is in ZONE_NORMAL and onlined by online_kernel
the memory zone does not need to be changed.
Another meaning is that the memory zone cannot be changed. When memory
is in ZONE_NORMAL and onlined by online_movable, the memory zone may
not be changed to ZONE_MOVALBE due to memory online limitation(see
Documentation/memory-hotplug.txt). In this case, memory must not be
onlined.
The patch changes the return type of zone_can_shift() so that memory
online operation fails when memory zone cannot be changed as follows:
Before applying patch:
# grep -A 35 "Node 2" /proc/zoneinfo
Node 2, zone Normal
<snip>
node_scanned 0
spanned 8388608
present 7864320
managed 7864320
# echo online_movable > memory4097/state
# grep -A 35 "Node 2" /proc/zoneinfo
Node 2, zone Normal
<snip>
node_scanned 0
spanned 8388608
present 8388608
managed 8388608
online_movable operation succeeded. But memory is onlined as
ZONE_NORMAL, not ZONE_MOVABLE.
After applying patch:
# grep -A 35 "Node 2" /proc/zoneinfo
Node 2, zone Normal
<snip>
node_scanned 0
spanned 8388608
present 7864320
managed 7864320
# echo online_movable > memory4097/state
bash: echo: write error: Invalid argument
# grep -A 35 "Node 2" /proc/zoneinfo
Node 2, zone Normal
<snip>
node_scanned 0
spanned 8388608
present 7864320
managed 7864320
online_movable operation failed because of failure of changing
the memory zone from ZONE_NORMAL to ZONE_MOVABLE
Fixes: df429ac039 ("memory-hotplug: more general validation of zone during online")
Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Booting Linux on an ARM fastmodel containing an SMMU emulation results
in an unexpected I/O page fault from the legacy virtio-blk PCI device:
[ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002
[ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
[ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000
[ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
<system hangs failing to read partition table>
This is because the legacy virtio-blk device is behind an SMMU, so we
have consequently swizzled its DMA ops and configured the SMMU to
translate accesses. This then requires the vring code to use the DMA API
to establish translations, otherwise all transactions will result in
fatal faults and termination.
Given that ARM-based systems only see an SMMU if one is really present
(the topology is all described by firmware tables such as device-tree or
IORT), then we can safely use the DMA API for all legacy virtio devices.
Modern devices can advertise the prescense of an IOMMU using the
VIRTIO_F_IOMMU_PLATFORM feature flag.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 876945dbf6 ("arm64: Hook up IOMMU dma_ops")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Once DMA API usage is enabled, it becomes apparent that virtio-mmio is
inadvertently relying on the default 32-bit DMA mask, which leads to
problems like rapidly exhausting SWIOTLB bounce buffers.
Ensure that we set the appropriate 64-bit DMA mask whenever possible,
with the coherent mask suitably limited for the legacy vring as per
a0be1db430 ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio
devices").
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Fixes: b42111382f ("virtio_mmio: Use the DMA API if enabled")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Propagate the error when vhost_vq_init_access() fails and set
vq->private_data to NULL.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SHORT SUMMARY OF CHANGES FOR LINUS
MAINTAINERS:
- Add myself to X86 PLATFORM DRIVERS as a co-maintainer
ideapad-laptop:
- handle ACPI event 1
intel_mid_powerbtn:
- Set IRQ_ONESHOT
surface3-wmi:
- fix uninitialized symbol
- Shut up unused-function warning
mlx-platform:
- free first dev on error
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEhiZOUlnC9oKN3n3AmT3/83c5Sy0FAliHm4gACgkQmT3/83c5
Sy0g1A/+PkP7Eqgd6wR1EO589ZVuJh9eMy1Oa65MBxOZMI0XH9CCAgvZyOKMwcZV
Pyjufm6VFWNroxvgLIgpo2j6+fwHd04+yzDlGIv9qKkwsMwjbOi0UyGV+NuI8mZC
7YnZA8r2zQ7Mhyzjw0khvL00h1vYkfXWFtxD4p3x2d1Qb7TUT1Yo58vlaHpPygfY
6ghlYSzyD0gXC10Fa5QT5eUVzE8L4y1RpKPklX9ihwuntIvcqsV+Caz/81iK8lHP
qMesQDSoxH61AZRYdLQ2QHP6k7Y+EwY/40YGs6mjY2HEn5w0zEbRN8jPK4u09Gcj
qay5DKSNlSLXvnIHbXtmlozsz/gowwMAUGFH19Q72tDLOHMlIGbqBQA4Rfz4ADqX
b61zOhTI8Xho2vz6KO2aQsQaIEXpDhw+mWlwFq+qyCCl4fs3QRXIz/sVQae1uM6C
BwrPJgAPLuBKTkLI/gb5XLR/u4nDTC4rix9r1IrABxKQNQgMm5KtWnSmuGfM0gvs
SJQ75JijkA6e3+NxVqcWJgSAUWkkIwDNXGe78RWFet0CTcjMAByIlwVFQy9CTj1T
UUWyq7Gh34KndQJ1/SpzTd5aqxK+bxoMKJ4AIy88pM73IrsnLWIB7Y8FUgbmAbqi
c9BSEfN6LVnBXOW2IWXkdh25l0MaJlvkjlvvvuwXYDEmGz4HDpM=
=XDl+
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform-driver fixes from Andy Shevchenko:
"This is my first pull request since I become a co-maintainer of
Platform Drivers x86 subsystem. It's a bit bigger than usual due to
material collected for almost two weeks in a row.
MAINTAINERS:
- Add myself to X86 PLATFORM DRIVERS as a co-maintainer
ideapad-laptop:
- handle ACPI event 1
intel_mid_powerbtn:
- Set IRQ_ONESHOT
surface3-wmi:
- fix uninitialized symbol
- Shut up unused-function warning
mlx-platform:
- free first dev on error"
* tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86:
MAINTAINERS: Add myself to X86 PLATFORM DRIVERS as a co-maintainer
platform/x86: ideapad-laptop: handle ACPI event 1
platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT
platform/x86: surface3-wmi: fix uninitialized symbol
platform/x86: surface3-wmi: Shut up unused-function warning
platform/x86: mlx-platform: free first dev on error
Pull namespace fix from Eric Biederman:
"This has a single brown bag fix.
The possible deadlock with dec_pid_namespaces that I had thought was
fixed earlier turned out only to have been moved. So instead of being
cleaver this change takes ucounts_lock with irqs disabled. So
dec_ucount can be used from any context without fear of deadlock.
The items accounted for dec_ucount and inc_ucount are all
comparatively heavy weight objects so I don't exepct this will have
any measurable performance impact"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Make ucounts lock irq-safe