Commit graph

5348 commits

Author SHA1 Message Date
Rafael J. Wysocki
c24efa6732 PM: runtime: Capture device status before disabling runtime PM
In some cases (for example, during system-wide suspend and resume of
devices) it is useful to know whether or not runtime PM has ever been
enabled for a given device and, if so, what the runtime PM status of
it had been right before runtime PM was disabled for it last time.

For this reason, introduce a new struct dev_pm_info field called
last_status that will be used for capturing the runtime PM status of
the device when its power.disable_depth counter changes from 0 to 1.

The new field will be set to RPM_INVALID to start with and whenever
power.disable_depth changes from 1 to 0, so it will be valid only
when runtime PM of the device is currently disabled, but it has been
enabled at least once.

Immediately use power.last_status in rpm_resume() to make it handle
the case when PM runtime is disabled for the device, but its runtime
PM status is RPM_ACTIVE more consistently.  Namely, make it return 1
if power.last_status is also equal to RPM_ACTIVE in that case (the
idea being that if the status was RPM_ACTIVE last time when
power.disable_depth was changing from 0 to 1 and it is still
RPM_ACTIVE, it can be assumed to reflect what happened to the device
last time when it was using runtime PM) and -EACCES otherwise.

Update the documentation to provide a description of last_status and
change the description of pm_runtime_resume() in it to reflect the
new behavior of rpm_active().

While at it, rearrange the code in pm_runtime_enable() to be more
straightforward and replace the WARN() macro in it with a pr_warn()
invocation which is less disruptive.

Link: https://lore.kernel.org/linux-pm/20211026222626.39222-1-ulf.hansson@linaro.org/t/#u
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:16:40 +01:00
Thomas Gleixner
a80713fea3 platform-msi: Simplify platform device MSI code
The allocation code is overly complex. It tries to have the MSI index space
packed, which is not working when an interrupt is freed. There is no
requirement for this. The only requirement is that the MSI index is unique.

Move the MSI descriptor allocation into msi_domain_populate_irqs() and use
the Linux interrupt number as MSI index which fulfils the unique
requirement.

This requires to lock the MSI descriptors which makes the lock order
reverse to the regular MSI alloc/free functions vs. the domain
mutex. Assign a seperate lockdep class for these MSI device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.956731741@linutronix.de
2021-12-16 22:22:19 +01:00
Thomas Gleixner
653b50c5f9 platform-msi: Let core code handle MSI descriptors
Use the core functionality for platform MSI interrupt domains. The platform
device MSI interrupt domains will be converted in a later step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.903173257@linutronix.de
2021-12-16 22:22:19 +01:00
Thomas Gleixner
125282cd4f genirq/msi: Move descriptor list to struct msi_device_data
It's only required when MSI is in use.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210747.650487479@linutronix.de
2021-12-16 22:22:16 +01:00
Thomas Gleixner
dba27c7fa3 platform-msi: Use msi_desc::msi_index
Use the common msi_index member and get rid of the pointless wrapper struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.413638645@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
fc22e7dbcd platform-msi: Store platform private data pointer in msi_device_data
Storing the platform private data in a MSI descriptor is sloppy at
best. The data belongs to the device and not to the descriptor.
Add a pointer to struct msi_device_data and store the pointer there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.287680528@linutronix.de
2021-12-16 22:16:39 +01:00
Thomas Gleixner
9835cec6d5 platform-msi: Rename functions and clarify comments
It's hard to distinguish what platform_msi_domain_alloc() and
platform_msi_domain_alloc_irqs() are about. Make the distinction more
explicit and add comments which explain the use cases properly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.228706214@linutronix.de
2021-12-16 22:16:39 +01:00
Thomas Gleixner
25ce693ef7 platform-msi: Let the core code handle sysfs groups
Set the domain info flag and remove the local sysfs code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.109408832@linutronix.de
2021-12-16 22:16:39 +01:00
Thomas Gleixner
077aeadb6c platform-msi: Allocate MSI device data on first use
Allocate the MSI device data on first invocation of the allocation function
for platform MSI private data.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221813.805529729@linutronix.de
2021-12-16 22:16:38 +01:00
Thomas Gleixner
34fff62827 device: Move MSI related data into a struct
The only unconditional part of MSI data in struct device is the irqdomain
pointer. Everything else can be allocated on demand. Create a data
structure and move the irqdomain pointer into it. The other MSI specific
parts are going to be removed from struct device in later steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211210221813.617178827@linutronix.de
2021-12-16 22:16:38 +01:00
Mateusz Jończyk
0dd8d6cb9e rtc: Check return value from mc146818_get_time()
There are 4 users of mc146818_get_time() and none of them was checking
the return value from this function. Change this.

Print the appropriate warnings in callers of mc146818_get_time() instead
of in the function mc146818_get_time() itself, in order not to add
strings to rtc-mc146818-lib.c, which is kind of a library.

The callers of alpha_rtc_read_time() and cmos_read_time() may use the
contents of (struct rtc_time *) even when the functions return a failure
code. Therefore, set the contents of (struct rtc_time *) to 0x00,
which looks more sensible then 0xff and aligns with the (possibly
stale?) comment in cmos_read_time:

	/*
	 * If pm_trace abused the RTC for storage, set the timespec to 0,
	 * which tells the caller that this RTC value is unusable.
	 */

For consistency, do this in mc146818_get_time().

Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a
second. It is very unlikely, though, that the RTC suddenly stops
working and mc146818_get_time() would consistently fail.

Only compile-tested on alpha.

Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: linux-alpha@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211210200131.153887-4-mat.jonczyk@o2.pl
2021-12-16 21:50:06 +01:00
Jarkko Sakkinen
50468e4313 x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
== Problem ==

The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems.  It can be as small as dozens of MB's
and as large as many GB's on servers.  Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.

== Solution ==

Introduce a new sysfs file:

	/sys/devices/system/node/nodeX/x86/sgx_total_bytes

to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.

'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system.  They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available.  'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.

== Implementation Details ==

Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node:

== ABI Design Discussion ==

As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node.  Essentially, a
single value would rule out NUMA optimizations for enclaves.

Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory.  Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM, but we
need for SGX:

	MemTotal:       xxxx kB // sgx_total_bytes (implemented here)
	MemFree:        yyyy kB // sgx_free_bytes
	SwapTotal:      zzzz kB // sgx_swapped_bytes

So, at *least* three.  I think we will eventually end up needing
something more along the lines of a dozen.  A new directory (as
opposed to being in the nodeX/ "root") directory avoids cluttering the
root with several "sgx_*" files.

Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific.  It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX.  Using "sgx/"
as opposed to "x86/" was also considered.  But, there is a real chance
this can get used for other arch-specific purposes.

[ dhansen: rewrite changelog ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
2021-12-09 07:02:22 -08:00
Thomas Gleixner
cd119b09a8 PCI/MSI: Move msi_lock to struct pci_dev
It's only required for PCI/MSI. So no point in having it in every struct
device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211206210224.925241961@linutronix.de
2021-12-09 11:52:22 +01:00
Daniel Scally
c097af1d0a device property: Check fwnode->secondary when finding properties
fwnode_property_get_reference_args() searches for named properties
against a fwnode_handle, but these could instead be against the fwnode's
secondary. If the property isn't found against the primary, check the
secondary to see if it's there instead.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Daniel Scally <djrscally@gmail.com>
Link: https://lore.kernel.org/r/20211128232455.39332-1-djrscally@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 16:42:43 +01:00
Ira Weiny
e1b5186810 Documentation/auxiliary_bus: Move the text into the code
The code and documentation are more difficult to maintain when kept
separately.  This is further compounded when the standard structure
documentation infrastructure is not used.

Move the documentation into the code, use the standard documentation
infrastructure, add current documented functions, and reference the text
in the rst file.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20211202044305.4006853-8-ira.weiny@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 16:41:50 +01:00
Ira Weiny
8a2d6ffe77 Documentation/auxiliary_bus: Clarify the release of devices from find device
auxiliary_find_device() takes a proper get_device() reference on the
device before returning the matched device.

Users of this call should be informed that they need to properly release
this reference with put_device().

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20211202044305.4006853-7-ira.weiny@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 16:41:50 +01:00
Ira Weiny
05021dca78 Documentation/auxiliary_bus: Clarify __auxiliary_driver_register
__auxiliary_driver_register is not intended to be called directly unless
a custom name is required.  Add documentation for this fact.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20211202044305.4006853-5-ira.weiny@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 16:41:50 +01:00
Ira Weiny
b247703873 Documentation/auxiliary_bus: Clarify auxiliary_device creation
The documentation for creating an auxiliary device is a 3 step not a 2
step process.  Specifically the requirements of setting the name, id,
dev.release, and dev.parent fields was not clear as a precursor to the '2
step' process documented.

Clarify by declaring this a 3 step process starting with setting the
fields of struct auxiliary_device correctly.

Also add some sample code and tie the change into the rest of the
documentation.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20211202044305.4006853-2-ira.weiny@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 16:41:50 +01:00
Heiko Carstens
f1045056c7 topology/sysfs: rework book and drawer topology ifdefery
Provide default defines for the topology_book_[id|cpumask] and
topology_drawer_[id|cpumask] macros just like for each other topology
level.
This way all topology levels are handled in a similar way. Still the
the book and drawer levels are only used on s390, and also the sysfs
attributes are only created on s390. However other architectures may
opt in if wanted.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129130309.3256168-4-hca@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 15:58:27 +01:00
Heiko Carstens
e795707703 topology/sysfs: export cluster attributes only if an architectures has support
The cluster_id and cluster_cpus topology sysfs attributes have been
added with commit c5e22feffd ("topology: Represent clusters of CPUs
within a die").

They are currently only used for x86, arm64, and riscv (via generic
arch topology), however they are still present with bogus default
values for all other architectures. Instead of enforcing such new
sysfs attributes to all architectures, make them only optional visible
if an architecture opts in by defining both the topology_cluster_id
and topology_cluster_cpumask attributes.

This is similar to what was done when the book and drawer topology
levels were introduced: avoid useless and therefore confusing sysfs
attributes for architectures which cannot make use of them.

This should not break any existing applications, since this is a
new interface introduced with the v5.16 merge window.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129130309.3256168-3-hca@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 15:58:27 +01:00
Heiko Carstens
2c4dcd7fd5 topology/sysfs: export die attributes only if an architectures has support
The die_id and die_cpus topology sysfs attributes have been added with
commit 0e344d8c70 ("cpu/topology: Export die_id") and commit
2e4c54dac7 ("topology: Create core_cpus and die_cpus sysfs attributes").

While they are currently only used and useful for x86 they are still
present with bogus default values for all architectures. Instead of
enforcing such new sysfs attributes to all architectures, make them
only optional visible if an architecture opts in by defining both the
topology_die_id and topology_die_cpumask attributes.

This is similar to what was done when the book and drawer topology
levels were introduced: avoid useless and therefore confusing sysfs
attributes for architectures which cannot make use of them.

This should not break any existing applications, since this is a
rather new interface and applications should be able to handle also
older kernel versions without such attributes - besides that they
contain only useful information for x86.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129130309.3256168-2-hca@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-03 15:58:27 +01:00
Cai Huoqing
2043727c28 driver core: platform: Make use of the helper function dev_err_probe()
When possible using dev_err_probe() helps to properly deal with the
PROBE_DEFER error, the benefit is that DEFER issue will be logged
in the devices_deferred debugfs file.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20211105071509.969-1-caihuoqing@baidu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-27 10:56:45 +01:00
Heikki Krogerus
2338e7bcef device property: Remove device_add_properties() API
There are no more users for it.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24 15:28:01 +01:00
Heikki Krogerus
982b94ba09 driver core: Don't call device_remove_properties() from device_del()
All the drivers that relied on device_del() to call
device_remove_properties() have now been converted to either
use device_create_managed_software_node() instead of
device_add_properties(), or to register the software node
completely separately from the device.

This will make it finally possible to share and reuse the
software nodes that hold the additional device properties.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24 15:28:01 +01:00
Lukasz Luba
7e97b3dc25 arch_topology: Remove unused topology_set_thermal_pressure() and related
There is no need of this function (and related) since code has been
converted to use the new arch_update_thermal_pressure() API. The old
code can be removed.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-23 15:10:26 +05:30
Lukasz Luba
c214f12416 arch_topology: Introduce thermal pressure update function
The thermal pressure is a mechanism which is used for providing
information about reduced CPU performance to the scheduler. Usually code
has to convert the value from frequency units into capacity units,
which are understandable by the scheduler. Create a common conversion code
which can be just used via a handy API.

Internally, the topology_update_thermal_pressure() operates on frequency
in MHz and max CPU frequency is taken from 'freq_factor' (per-cpu).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-23 15:10:26 +05:30
Ansuel Smith
02d6fdecb9
regmap: allow to define reg_update_bits for no bus configuration
Some device requires a special handling for reg_update_bits and can't use
the normal regmap read write logic. An example is when locking is
handled by the device and rmw operations requires to do atomic operations.
Allow to declare a dedicated function in regmap_config for
reg_update_bits in no bus configuration.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20211104150040.1260-1-ansuelsmth@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-11-15 13:27:13 +00:00
Wang ShaoBo
4cc4cc28ec arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
When testing cpu online and offline, warning happened like this:

[  146.746743] WARNING: CPU: 92 PID: 974 at kernel/sched/topology.c:2215 build_sched_domains+0x81c/0x11b0
[  146.749988] CPU: 92 PID: 974 Comm: kworker/92:2 Not tainted 5.15.0 #9
[  146.750402] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.79 08/21/2021
[  146.751213] Workqueue: events cpuset_hotplug_workfn
[  146.751629] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  146.752048] pc : build_sched_domains+0x81c/0x11b0
[  146.752461] lr : build_sched_domains+0x414/0x11b0
[  146.752860] sp : ffff800040a83a80
[  146.753247] x29: ffff800040a83a80 x28: ffff20801f13a980 x27: ffff20800448ae00
[  146.753644] x26: ffff800012a858e8 x25: ffff800012ea48c0 x24: 0000000000000000
[  146.754039] x23: ffff800010ab7d60 x22: ffff800012f03758 x21: 000000000000005f
[  146.754427] x20: 000000000000005c x19: ffff004080012840 x18: ffffffffffffffff
[  146.754814] x17: 3661613030303230 x16: 30303078303a3239 x15: ffff800011f92b48
[  146.755197] x14: ffff20be3f95cef6 x13: 2e6e69616d6f642d x12: 6465686373204c4c
[  146.755578] x11: ffff20bf7fc83a00 x10: 0000000000000040 x9 : 0000000000000000
[  146.755957] x8 : 0000000000000002 x7 : ffffffffe0000000 x6 : 0000000000000002
[  146.756334] x5 : 0000000090000000 x4 : 00000000f0000000 x3 : 0000000000000001
[  146.756705] x2 : 0000000000000080 x1 : ffff800012f03860 x0 : 0000000000000001
[  146.757070] Call trace:
[  146.757421]  build_sched_domains+0x81c/0x11b0
[  146.757771]  partition_sched_domains_locked+0x57c/0x978
[  146.758118]  rebuild_sched_domains_locked+0x44c/0x7f0
[  146.758460]  rebuild_sched_domains+0x2c/0x48
[  146.758791]  cpuset_hotplug_workfn+0x3fc/0x888
[  146.759114]  process_one_work+0x1f4/0x480
[  146.759429]  worker_thread+0x48/0x460
[  146.759734]  kthread+0x158/0x168
[  146.760030]  ret_from_fork+0x10/0x20
[  146.760318] ---[ end trace 82c44aad6900e81a ]---

For some architectures like risc-v and arm64 which use common code
clear_cpu_topology() in shutting down CPUx, When CONFIG_SCHED_CLUSTER
is set, cluster_sibling in cpu_topology of each sibling adjacent
to CPUx is missed clearing, this causes checking failed in
topology_span_sane() and rebuilding topology failure at end when CPU online.

Different sibling's cluster_sibling in cpu_topology[] when CPU92 offline
(CPU 92, 93, 94, 95 are in one cluster):

Before revision:
CPU                 [92]      [93]      [94]      [95]
cluster_sibling     [92]     [92-95]   [92-95]   [92-95]

After revision:
CPU                 [92]      [93]      [94]      [95]
cluster_sibling     [92]     [93-95]   [93-95]   [93-95]

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20211110095856.469360-1-bobo.shaobowang@huawei.com
2021-11-11 13:09:33 +01:00
Linus Torvalds
d422555f32 More power management updates for 5.16-rc1
- Fix 2 intel_pstate driver regressions related to the HWP interrupt
    handling added recently (Srinivas Pandruvada).
 
  - Fix intel_pstate driver regression introduced during the 5.11
    cycle and causing HWP desired performance to be mishandled in
    some cases when switching driver modes and during system
    suspend and shutdown (Rafael Wysocki).
 
  - Fix system-wide device suspend and resume locking to avoid
    deadlocks when device objects are deleted during a system-wide
    PM transition (Rafael Wysocki).
 
  - Modify system-wide suspend of devices to prevent cpuidle drivers
    based on runtime PM from misbehaving during the "no IRQ" phase of
    it (Ulf Hansson).
 
  - Fix return value of _opp_add_static_v2() helper (YueHaibing).
 
  - Fix required-opp handle count (Pavankumar Kondeti).
 
  - Add resource managed OPP helpers, update dev_pm_opp_attach_genpd(),
    update their devfreq users, and make minor DT binding change (Dmitry
    Osipenko).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGL0/4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxaeMP/09xsXf0LP2LhP9YY77kvdrMONjyh1nj
 XuelrhuWl1Cz2X+gh3d7zXrjbuPW0lZF8Syc8GY1Ct8o3omQ8jnyiH1LIBfkDVx7
 WqVk8rGMMBXNpg3Os+MMADm3s0woW8c7IF+5cYwxL03OaOGH8uHiZxS9ZTLocH6H
 sOh9EAy72b/vLCq4aAzrG8ux10k9t8+2A6EuxzKrkO8rz/egtLjb8KznCER4n4nx
 lJ5Us+Kpo9TwuQ4pwujcvk7M9U2boHeKEk8ItiQUQwpW1VJsEUg1konz/UziE3Nf
 2vHZIellKs3RB/cVzFZsIJJt0etDss6YtivAMdpgBS+ciiZYgyjqKY+whZ/eBrue
 7nn79kSIhi0TVsVnM02HUYpz8oWHo0P0xqZr0t8IzOU+xvJF1e3fb/AkiQbGhYPs
 OXUTDqdOHFPplymRy6yrq3NqXyfqYVEKMHXWsQ+uY6FhOvVQTlkBdWscF9inoSDf
 LCmoc+xYLIl9R79syq9BqmVDE9cKLTZowmkexKI7MNu32ZQrPrKNdjMbYvvU0Sxf
 248RsZokiobnmXZdEupIEKXmI4ANyZFbeU0CL22MY3B14YcRqN7xPWXz2a/6fXWL
 h+DS8GCO6j3fgdrywUwA9/LyxR4SOJYEzrC3m34atufMm9LFYII6ShTh96DDPQV6
 i5hcVlC5wXAb
 =lduU
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These fix three intel_pstate driver regressions, fix locking in the
  core code suspending and resuming devices during system PM
  transitions, fix the handling of cpuidle drivers based on runtime PM
  during system-wide suspend, fix two issues in the operating
  performance points (OPP) framework and resource-managed helpers to it.

  Specifics:

   - Fix two intel_pstate driver regressions related to the HWP
     interrupt handling added recently (Srinivas Pandruvada).

   - Fix intel_pstate driver regression introduced during the 5.11 cycle
     and causing HWP desired performance to be mishandled in some cases
     when switching driver modes and during system suspend and shutdown
     (Rafael Wysocki).

   - Fix system-wide device suspend and resume locking to avoid
     deadlocks when device objects are deleted during a system-wide PM
     transition (Rafael Wysocki).

   - Modify system-wide suspend of devices to prevent cpuidle drivers
     based on runtime PM from misbehaving during the "no IRQ" phase of
     it (Ulf Hansson).

   - Fix return value of _opp_add_static_v2() helper (YueHaibing).

   - Fix required-opp handle count (Pavankumar Kondeti).

   - Add resource managed OPP helpers, update dev_pm_opp_attach_genpd(),
     update their devfreq users, and make minor DT binding change
     (Dmitry Osipenko)"

* tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Avoid calling put_device() under dpm_list_mtx
  cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable
  cpufreq: intel_pstate: Fix unchecked MSR 0x773 access
  cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
  PM: sleep: Fix runtime PM based cpuidle support
  dt-bindings: opp: Allow multi-worded OPP entry name
  opp: Fix return in _opp_add_static_v2()
  PM / devfreq: tegra30: Check whether clk_round_rate() returns zero rate
  PM / devfreq: tegra30: Use resource-managed helpers
  PM / devfreq: Add devm_devfreq_add_governor()
  opp: Add more resource-managed variants of dev_pm_opp_of_add_table()
  opp: Change type of dev_pm_opp_attach_genpd(names) argument
  opp: Fix required-opps phandle array count check
2021-11-10 11:59:55 -08:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
David Hildenbrand
50f9481ed9 mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Mike Rapoport
4421cca0a3 memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()

The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.

    @@
    identifier vaddr;
    expression size;
    @@
    (
    - memblock_phys_free(__pa(vaddr), size);
    + memblock_free(vaddr, size);
    |
    - memblock_free_ptr(vaddr, size);
    + memblock_free(vaddr, size);
    )

[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au

Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
fa27717110 memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().

Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().

Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
5787ea5bed arch_numa: simplify numa_distance allocation
Patch series "memblock: cleanup memblock_free interface", v2.

This is the fix for memblock freeing APIs mismatch [1].

The first patch is a cleanup of numa_distance allocation in arch_numa
I've spotted during the conversion.  The second patch is a fix for Xen
memory freeing on some of the error paths.

[1] https://lore.kernel.org/all/CAHk-=wj9k4LZTz+svCxLYs5Y1=+yKrbAUArH1+ghyG3OLd8VVg@mail.gmail.com

This patch (of 6):

Memory allocation of numa_distance uses memblock_phys_alloc_range()
without actual range limits, converts the returned physical address to
virtual and then only uses the virtual address for further
initialization.

Simplify this by replacing memblock_phys_alloc_range() with
memblock_alloc().

Link: https://lkml.kernel.org/r/20210930185031.18648-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210930185031.18648-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:40 -07:00
Kefeng Wang
09cea61950 arm64: support page mapping percpu first chunk allocator
Percpu embedded first chunk allocator is the firstly option, but it
could fails on ARM64, eg,

  percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000
  percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000
  percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000

then we could get

  WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838

and the system could not boot successfully.

Let's implement page mapping percpu first chunk allocator as a fallback
to the embedding allocator to increase the robustness of the system.

Link: https://lkml.kernel.org/r/20210910053354.26721-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:37 -07:00
Rafael J. Wysocki
2aa36604e8 PM: sleep: Avoid calling put_device() under dpm_list_mtx
It is generally unsafe to call put_device() with dpm_list_mtx held,
because the given device's release routine may carry out an action
depending on that lock which then may deadlock, so modify the
system-wide suspend and resume of devices to always drop dpm_list_mtx
before calling put_device() (and adjust white space somewhat while
at it).

For instance, this prevents the following splat from showing up in
the kernel log after a system resume in certain configurations:

[ 3290.969514] ======================================================
[ 3290.969517] WARNING: possible circular locking dependency detected
[ 3290.969519] 5.15.0+ #2420 Tainted: G S
[ 3290.969523] ------------------------------------------------------
[ 3290.969525] systemd-sleep/4553 is trying to acquire lock:
[ 3290.969529] ffff888117ab1138 ((wq_completion)hci0#2){+.+.}-{0:0}, at: flush_workqueue+0x87/0x4a0
[ 3290.969554]
               but task is already holding lock:
[ 3290.969556] ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0
[ 3290.969571]
               which lock already depends on the new lock.

[ 3290.969573]
               the existing dependency chain (in reverse order) is:
[ 3290.969575]
               -> #3 (dpm_list_mtx){+.+.}-{3:3}:
[ 3290.969583]        __mutex_lock+0x9d/0xa30
[ 3290.969591]        device_pm_add+0x2e/0xe0
[ 3290.969597]        device_add+0x4d5/0x8f0
[ 3290.969605]        hci_conn_add_sysfs+0x43/0xb0 [bluetooth]
[ 3290.969689]        hci_conn_complete_evt.isra.71+0x124/0x750 [bluetooth]
[ 3290.969747]        hci_event_packet+0xd6c/0x28a0 [bluetooth]
[ 3290.969798]        hci_rx_work+0x213/0x640 [bluetooth]
[ 3290.969842]        process_one_work+0x2aa/0x650
[ 3290.969851]        worker_thread+0x39/0x400
[ 3290.969859]        kthread+0x142/0x170
[ 3290.969865]        ret_from_fork+0x22/0x30
[ 3290.969872]
               -> #2 (&hdev->lock){+.+.}-{3:3}:
[ 3290.969881]        __mutex_lock+0x9d/0xa30
[ 3290.969887]        hci_event_packet+0xba/0x28a0 [bluetooth]
[ 3290.969935]        hci_rx_work+0x213/0x640 [bluetooth]
[ 3290.969978]        process_one_work+0x2aa/0x650
[ 3290.969985]        worker_thread+0x39/0x400
[ 3290.969993]        kthread+0x142/0x170
[ 3290.969999]        ret_from_fork+0x22/0x30
[ 3290.970004]
               -> #1 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}:
[ 3290.970013]        process_one_work+0x27d/0x650
[ 3290.970020]        worker_thread+0x39/0x400
[ 3290.970028]        kthread+0x142/0x170
[ 3290.970033]        ret_from_fork+0x22/0x30
[ 3290.970038]
               -> #0 ((wq_completion)hci0#2){+.+.}-{0:0}:
[ 3290.970047]        __lock_acquire+0x15cb/0x1b50
[ 3290.970054]        lock_acquire+0x26c/0x300
[ 3290.970059]        flush_workqueue+0xae/0x4a0
[ 3290.970066]        drain_workqueue+0xa1/0x130
[ 3290.970073]        destroy_workqueue+0x34/0x1f0
[ 3290.970081]        hci_release_dev+0x49/0x180 [bluetooth]
[ 3290.970130]        bt_host_release+0x1d/0x30 [bluetooth]
[ 3290.970195]        device_release+0x33/0x90
[ 3290.970201]        kobject_release+0x63/0x160
[ 3290.970211]        dpm_resume+0x164/0x3e0
[ 3290.970215]        dpm_resume_end+0xd/0x20
[ 3290.970220]        suspend_devices_and_enter+0x1a4/0xba0
[ 3290.970229]        pm_suspend+0x26b/0x310
[ 3290.970236]        state_store+0x42/0x90
[ 3290.970243]        kernfs_fop_write_iter+0x135/0x1b0
[ 3290.970251]        new_sync_write+0x125/0x1c0
[ 3290.970257]        vfs_write+0x360/0x3c0
[ 3290.970263]        ksys_write+0xa7/0xe0
[ 3290.970269]        do_syscall_64+0x3a/0x80
[ 3290.970276]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3290.970284]
               other info that might help us debug this:

[ 3290.970285] Chain exists of:
                 (wq_completion)hci0#2 --> &hdev->lock --> dpm_list_mtx

[ 3290.970297]  Possible unsafe locking scenario:

[ 3290.970299]        CPU0                    CPU1
[ 3290.970300]        ----                    ----
[ 3290.970302]   lock(dpm_list_mtx);
[ 3290.970306]                                lock(&hdev->lock);
[ 3290.970310]                                lock(dpm_list_mtx);
[ 3290.970314]   lock((wq_completion)hci0#2);
[ 3290.970319]
                *** DEADLOCK ***

[ 3290.970321] 7 locks held by systemd-sleep/4553:
[ 3290.970325]  #0: ffff888103bcd448 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xa7/0xe0
[ 3290.970341]  #1: ffff888115a14488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x103/0x1b0
[ 3290.970355]  #2: ffff888100f719e0 (kn->active#233){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1b0
[ 3290.970369]  #3: ffffffff82661048 (autosleep_lock){+.+.}-{3:3}, at: state_store+0x12/0x90
[ 3290.970384]  #4: ffffffff82658ac8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9f/0x310
[ 3290.970399]  #5: ffffffff827f2a48 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x4c/0x80
[ 3290.970416]  #6: ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0
[ 3290.970428]
               stack backtrace:
[ 3290.970431] CPU: 3 PID: 4553 Comm: systemd-sleep Tainted: G S                5.15.0+ #2420
[ 3290.970438] Hardware name: Dell Inc. XPS 13 9380/0RYJWW, BIOS 1.5.0 06/03/2019
[ 3290.970441] Call Trace:
[ 3290.970446]  dump_stack_lvl+0x44/0x57
[ 3290.970454]  check_noncircular+0x105/0x120
[ 3290.970468]  ? __lock_acquire+0x15cb/0x1b50
[ 3290.970474]  __lock_acquire+0x15cb/0x1b50
[ 3290.970487]  lock_acquire+0x26c/0x300
[ 3290.970493]  ? flush_workqueue+0x87/0x4a0
[ 3290.970503]  ? __raw_spin_lock_init+0x3b/0x60
[ 3290.970510]  ? lockdep_init_map_type+0x58/0x240
[ 3290.970519]  flush_workqueue+0xae/0x4a0
[ 3290.970526]  ? flush_workqueue+0x87/0x4a0
[ 3290.970544]  ? drain_workqueue+0xa1/0x130
[ 3290.970552]  drain_workqueue+0xa1/0x130
[ 3290.970561]  destroy_workqueue+0x34/0x1f0
[ 3290.970572]  hci_release_dev+0x49/0x180 [bluetooth]
[ 3290.970624]  bt_host_release+0x1d/0x30 [bluetooth]
[ 3290.970687]  device_release+0x33/0x90
[ 3290.970695]  kobject_release+0x63/0x160
[ 3290.970705]  dpm_resume+0x164/0x3e0
[ 3290.970710]  ? dpm_resume_early+0x251/0x3b0
[ 3290.970718]  dpm_resume_end+0xd/0x20
[ 3290.970723]  suspend_devices_and_enter+0x1a4/0xba0
[ 3290.970737]  pm_suspend+0x26b/0x310
[ 3290.970746]  state_store+0x42/0x90
[ 3290.970755]  kernfs_fop_write_iter+0x135/0x1b0
[ 3290.970764]  new_sync_write+0x125/0x1c0
[ 3290.970777]  vfs_write+0x360/0x3c0
[ 3290.970785]  ksys_write+0xa7/0xe0
[ 3290.970794]  do_syscall_64+0x3a/0x80
[ 3290.970803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3290.970811] RIP: 0033:0x7f41b1328164
[ 3290.970819] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 4a d2 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[ 3290.970824] RSP: 002b:00007ffe6ae21b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3290.970831] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f41b1328164
[ 3290.970836] RDX: 0000000000000004 RSI: 000055965e651070 RDI: 0000000000000004
[ 3290.970839] RBP: 000055965e651070 R08: 000055965e64f390 R09: 00007f41b1e3d1c0
[ 3290.970843] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000004
[ 3290.970846] R13: 0000000000000001 R14: 000055965e64f2b0 R15: 0000000000000004

Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-05 16:10:11 +01:00
Ulf Hansson
a2bd7be12b PM: sleep: Fix runtime PM based cpuidle support
In the cpuidle-psci case, runtime PM in combination with the generic PM
domain (genpd), may be used when entering/exiting a shared idlestate. More
precisely, genpd relies on runtime PM to be enabled for the attached device
(in this case it belongs to a CPU), to properly manage the reference
counting of its PM domain.

This works fine most of the time, but during system suspend in
dpm_suspend_late(), the PM core disables runtime PM for all devices. Beyond
this point, calls to pm_runtime_get_sync() to runtime resume a device may
fail and therefore it could also mess up the reference counting in genpd.

To fix this problem, let's call wake_up_all_idle_cpus() in
dpm_suspend_late(), prior to disabling runtime PM. In this way a device
that belongs to a CPU, becomes runtime resumed through cpuidle-psci and
stays like that, because the runtime PM usage count has been bumped in
device_prepare().

Diagnosed-by: Maulik Shah <mkshah@codeaurora.org>
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:45:38 +01:00
Linus Torvalds
95faf6ba65 Driver core changes for 5.16-rc1
Here is the big set of driver core changes for 5.16-rc1.
 
 All of these have been in linux-next for a while now with no reported
 problems.
 
 Included in here are:
 	- big update and cleanup of the sysfs abi documentation files
 	  and scripts from Mauro.  We are almost at the place where we
 	  can properly check that the running kernel's sysfs abi is
 	  documented fully.
 	- firmware loader updates
 	- dyndbg updates
 	- kernfs cleanups and fixes from Christoph
 	- device property updates
 	- component fix
 	- other minor driver core cleanups and fixes
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPbjQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ync9gCfXKMUI1GAnCfJWAwTdTcd18q5akoAoMw32/AH
 0yh5TjAWFyFd7xz5d7qs
 =itsC
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core changes for 5.16-rc1.

  All of these have been in linux-next for a while now with no reported
  problems.

  Included in here are:

   - big update and cleanup of the sysfs abi documentation files and
     scripts from Mauro. We are almost at the place where we can
     properly check that the running kernel's sysfs abi is documented
     fully.

   - firmware loader updates

   - dyndbg updates

   - kernfs cleanups and fixes from Christoph

   - device property updates

   - component fix

   - other minor driver core cleanups and fixes"

* tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits)
  device property: Drop redundant NULL checks
  x86/build: Tuck away built-in firmware under FW_LOADER
  vmlinux.lds.h: wrap built-in firmware support under FW_LOADER
  firmware_loader: move struct builtin_fw to the only place used
  x86/microcode: Use the firmware_loader built-in API
  firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE()
  firmware_loader: formalize built-in firmware API
  component: do not leave master devres group open after bind
  dyndbg: refine verbosity 1-4 summary-detail
  gpiolib: acpi: Replace custom code with device_match_acpi_handle()
  i2c: acpi: Replace custom function with device_match_acpi_handle()
  driver core: Provide device_match_acpi_handle() helper
  dyndbg: fix spurious vNpr_info change
  dyndbg: no vpr-info on empty queries
  dyndbg: vpr-info on remove-module complete, not starting
  device property: Add missed header in fwnode.h
  Documentation: dyndbg: Improve cli param examples
  dyndbg: Remove support for ddebug_query param
  dyndbg: make dyndbg a known cli param
  dyndbg: show module in vpr-info in dd-exec-queries
  ...
2021-11-04 08:32:38 -07:00
Linus Torvalds
833db72142 Power management updates for 5.16-rc1
- Add support for inefficient operating performance points to the
    Energy Model and modify cpufreq to use them properly (Vincent
    Donnefort).
 
  - Rearrange the DTPM framework code to simplify it and make it easier
    to follow (Daniel Lezcano).
 
  - Fix power intialization in DTPM (Daniel Lezcano).
 
  - Add CPU load consideration when estimating the instaneous power
    consumption in DTPM (Daniel Lezcano).
 
  - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang
    Rui).
 
  - Make intel_pstate process HWP Guaranteed change notifications from
    the processor (Srinivas Pandruvada).
 
  - Fix typo in cpufreq.h (Rafael Wysocki).
 
  - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
 
  - Fix the parameter usage of the newly added perf-domain API (Hector
    Yuan).
 
  - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
    Guenter Roeck, and Arnd Bergmann).
 
  - Fix kobject memory leaks in cpuidle error paths (Anel Orazgaliyeva).
 
  - Make intel_idle enable interrupts before entering C1 on some Xeon
    processor models (Artem Bityutskiy).
 
  - Clean up hib_wait_io() (Falla Coulibaly).
 
  - Fix sparse warnings in hibernation-related code (Anders Roxell).
 
  - Use vzalloc() and kzalloc() instead of their open-coded
    equivalents in hibernation-related code (Cai Huoqing).
 
  - Prevent user space from crashing the kernel by attempting to
    restore the system state from a swap partition in use (Ye Bin).
 
  - Do not let "syscore" devices runtime-suspend during system PM
    transitions (Rafael Wysocki).
 
  - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).
 
  - Pause cpuidle later and resume it earlier during system PM
    transitions (Rafael Wysocki).
 
  - Make system suspend code use valid_state() consistently (Rafael
    Wysocki).
 
  - Add support for enabling wakeup IRQs after invoking the
    ->runtime_suspend() callback and make two drivers use it (Chunfeng
    Yun).
 
  - Make the association of ACPI device objects with PCI devices more
    straightforward and simplify the code doing that for all devices
    in general (Rafael Wysocki).
 
  - Eliminate struct pci_platform_pm_ops and handle the both of its
    users (PCI and Intel MID) directly in the PCI bus code (Rafael
    Wysocki).
 
  - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki).
 
  - Fix ordering of operations in pci_back_from_sleep() (Rafael
    Wysocki).
 
  - Make exynos-ppmu use hyphens in DT properties (Krzysztof
    Kozlowski).
 
  - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof
    Kozlowski).
 
  - Strengthen check for freq_table in devfreq (Samuel Holland).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGBkiQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxDZwP/1kyhVdk+FSMMwEfhn8NriOkE8drJ3nB
 Y/PWI93oPhgA7ITMwBv4ufLcxx6uYu5bctVDx3XG2lqvFC30t1fzTosJiwireRaS
 WC5p7ZTqAECxd1LiZWPjL5EzHmtbyzrz3/AxSxOEN1K66qENBd6q5QldKZylxhY3
 bawCQjabz6CLXvOjzdv8M8A7Fmd8LTcjHI5Y3/IOcDydPJqyN4/rDCoqft3/xcNB
 kwN6de73aQwB3AQYufS/VAiNN4XOOkhPwF4QfULmAlnMdCsov6YzahtMB2+oG7O4
 G3DF/OVFrONr3GPMMuMJSC6GXyFiBuW8FRva4W9HpY0MA8xVGLPUpwlpaFVaX1+c
 vAYcRBTyJvOWgRap8+q+UKTlkj37pAgHp7kRiaO1wkVnKxJB1w40OSJZO1nnsExe
 3qeCJHOJ9r+S/FsSPKCmws8vr0XQH5wPXY639Kmj9OI/t3gXGrfy3cXm9pa+gSh0
 eMyHxtCp5ItT7V2FMpYh+wn+wfe5h//sK3tESZs+h6FKwJG1hYIbG4+F3ztIgzHp
 t0rT3JXZIkY41KREGFhCMS9+wnLugOik21w9O0qVZfn/dJtDe73Kely7rY/EA3Mw
 H4aBJDD19BvbIKqaTguxJXEc9zJI737fy/Ze4rrzTDkbXU8qVmjvFoEl2i/Ef4o2
 b6aiDdz3V/CW
 =L2Jo
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These make the power management of PCI devices with ACPI companions
  more straightforwad, add support for inefficient operating performance
  points to the Energy model and make cpufreq handle them as
  appropriate, rearrange the handling of cpuidle during system PM
  transitions, update a few cpufreq drivers and intel_idle, fix assorded
  issues and clean up code in multiple places.

  Specifics:

   - Add support for inefficient operating performance points to the
     Energy Model and modify cpufreq to use them properly (Vincent
     Donnefort).

   - Rearrange the DTPM framework code to simplify it and make it easier
     to follow (Daniel Lezcano).

   - Fix power intialization in DTPM (Daniel Lezcano).

   - Add CPU load consideration when estimating the instaneous power
     consumption in DTPM (Daniel Lezcano).

   - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang
     Rui).

   - Make intel_pstate process HWP Guaranteed change notifications from
     the processor (Srinivas Pandruvada).

   - Fix typo in cpufreq.h (Rafael Wysocki).

   - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).

   - Fix the parameter usage of the newly added perf-domain API (Hector
     Yuan).

   - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
     Guenter Roeck, and Arnd Bergmann).

   - Fix kobject memory leaks in cpuidle error paths (Anel
     Orazgaliyeva).

   - Make intel_idle enable interrupts before entering C1 on some Xeon
     processor models (Artem Bityutskiy).

   - Clean up hib_wait_io() (Falla Coulibaly).

   - Fix sparse warnings in hibernation-related code (Anders Roxell).

   - Use vzalloc() and kzalloc() instead of their open-coded equivalents
     in hibernation-related code (Cai Huoqing).

   - Prevent user space from crashing the kernel by attempting to
     restore the system state from a swap partition in use (Ye Bin).

   - Do not let "syscore" devices runtime-suspend during system PM
     transitions (Rafael Wysocki).

   - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).

   - Pause cpuidle later and resume it earlier during system PM
     transitions (Rafael Wysocki).

   - Make system suspend code use valid_state() consistently (Rafael
     Wysocki).

   - Add support for enabling wakeup IRQs after invoking the
     ->runtime_suspend() callback and make two drivers use it (Chunfeng
     Yun).

   - Make the association of ACPI device objects with PCI devices more
     straightforward and simplify the code doing that for all devices in
     general (Rafael Wysocki).

   - Eliminate struct pci_platform_pm_ops and handle the both of its
     users (PCI and Intel MID) directly in the PCI bus code (Rafael
     Wysocki).

   - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki).

   - Fix ordering of operations in pci_back_from_sleep() (Rafael
     Wysocki).

   - Make exynos-ppmu use hyphens in DT properties (Krzysztof
     Kozlowski).

   - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof
     Kozlowski).

   - Strengthen check for freq_table in devfreq (Samuel Holland)"

* tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (49 commits)
  cpufreq: Fix parameter in parse_perf_domain()
  usb: mtu3: enable wake-up interrupt after runtime_suspend called
  usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called
  PM / wakeirq: support enabling wake-up irq after runtime_suspend called
  PM / devfreq: Strengthen check for freq_table
  devfreq: exynos-ppmu: simplify parsing event-type from DT
  devfreq: exynos-ppmu: use node names with hyphens
  cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
  PM: suspend: Use valid_state() consistently
  PM: sleep: Pause cpuidle later and resume it earlier during system transitions
  PM: suspend: Do not pause cpuidle in the suspend-to-idle path
  PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
  PM: hibernate: Get block device exclusively in swsusp_check()
  powercap/drivers/dtpm: Fix power limit initialization
  powercap/drivers/dtpm: Scale the power with the load
  powercap/drivers/dtpm: Use container_of instead of a private data field
  powercap/drivers/dtpm: Simplify the dtpm table
  powercap/drivers/dtpm: Encapsulate even more the code
  PM: hibernate: swap: Use vzalloc() and kzalloc()
  PM: hibernate: fix sparse warnings
  ...
2021-11-02 16:04:28 -07:00
Rafael J. Wysocki
b62b306469 Merge branch 'pm-sleep'
Merge updates related to system sleep for 5.16-rc1:

 - Clean up hib_wait_io() (Falla Coulibaly).

 - Fix sparse warnings in hibernation-related code (Anders Roxell).

 - Use vzalloc() and kzalloc() instead of their open-coded
   equivalents in hibernation-related code (Cai Huoqing).

 - Prevent user space from crashing the kernel by attempting to
   restore the system state from a swap partition in use (Ye Bin).

 - Do not let "syscore" devices runtime-suspend during system PM
   transitions (Rafael Wysocki).

 - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).

 - Pause cpuidle later and resume it earlier during system PM
   transitions (Rafael Wysocki).

 - Make system suspend code use valid_state() consistently (Rafael
   Wysocki).

 - Add support for enabling wakeup IRQs after invoking the
   ->runtime_suspend() callback and make two drivers use it (Chunfeng
   Yun).

* pm-sleep:
  usb: mtu3: enable wake-up interrupt after runtime_suspend called
  usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called
  PM / wakeirq: support enabling wake-up irq after runtime_suspend called
  PM: suspend: Use valid_state() consistently
  PM: sleep: Pause cpuidle later and resume it earlier during system transitions
  PM: suspend: Do not pause cpuidle in the suspend-to-idle path
  PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
  PM: hibernate: Get block device exclusively in swsusp_check()
  PM: hibernate: swap: Use vzalloc() and kzalloc()
  PM: hibernate: fix sparse warnings
  Revert "PM: sleep: Do not assume that "mem" is always present"
  PM: hibernate: Remove blk_status_to_errno in hib_wait_io
  PM: sleep: Do not assume that "mem" is always present
2021-11-02 19:11:49 +01:00
Linus Torvalds
fc02cb2b37 Core:
- Remove socket skb caches
 
  - Add a SO_RESERVE_MEM socket op to forward allocate buffer space
    and avoid memory accounting overhead on each message sent
 
  - Introduce managed neighbor entries - added by control plane and
    resolved by the kernel for use in acceleration paths (BPF / XDP
    right now, HW offload users will benefit as well)
 
  - Make neighbor eviction on link down controllable by userspace
    to work around WiFi networks with bad roaming implementations
 
  - vrf: Rework interaction with netfilter/conntrack
 
  - fq_codel: implement L4S style ce_threshold_ect1 marking
 
  - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
 
 BPF:
 
  - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
    as implemented in LLVM14
 
  - Introduce bpf_get_branch_snapshot() to capture Last Branch Records
 
  - Implement variadic trace_printk helper
 
  - Add a new Bloomfilter map type
 
  - Track <8-byte scalar spill and refill
 
  - Access hw timestamp through BPF's __sk_buff
 
  - Disallow unprivileged BPF by default
 
  - Document BPF licensing
 
 Netfilter:
 
  - Introduce egress hook for looking at raw outgoing packets
 
  - Allow matching on and modifying inner headers / payload data
 
  - Add NFT_META_IFTYPE to match on the interface type either from
    ingress or egress
 
 Protocols:
 
  - Multi-Path TCP:
    - increase default max additional subflows to 2
    - rework forward memory allocation
    - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS
 
  - MCTP flow support allowing lower layer drivers to configure msg
    muxing as needed
 
  - Automatic Multicast Tunneling (AMT) driver based on RFC7450
 
  - HSR support the redbox supervision frames (IEC-62439-3:2018)
 
  - Support for the ip6ip6 encapsulation of IOAM
 
  - Netlink interface for CAN-FD's Transmitter Delay Compensation
 
  - Support SMC-Rv2 eliminating the current same-subnet restriction,
    by exploiting the UDP encapsulation feature of RoCE adapters
 
  - TLS: add SM4 GCM/CCM crypto support
 
  - Bluetooth: initial support for link quality and audio/codec
    offload
 
 Driver APIs:
 
  - Add a batched interface for RX buffer allocation in AF_XDP
    buffer pool
 
  - ethtool: Add ability to control transceiver modules' power mode
 
  - phy: Introduce supported interfaces bitmap to express MAC
    capabilities and simplify PHY code
 
  - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks
 
 New drivers:
 
  - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)
 
  - Ethernet driver for ASIX AX88796C SPI device (x88796c)
 
 Drivers:
 
  - Broadcom PHYs
    - support 72165, 7712 16nm PHYs
    - support IDDQ-SR for additional power savings
 
  - PHY support for QCA8081, QCA9561 PHYs
 
  - NXP DPAA2: support for IRQ coalescing
 
  - NXP Ethernet (enetc): support for software TCP segmentation
 
  - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
    Gigabit-capable IP found on RZ/G2L SoC
 
  - Intel 100G Ethernet
    - support for eswitch offload of TC/OvS flow API, including
      offload of GRE, VxLAN, Geneve tunneling
    - support application device queues - ability to assign Rx and Tx
      queues to application threads
    - PTP and PPS (pulse-per-second) extensions
 
  - Broadcom Ethernet (bnxt)
    - devlink health reporting and device reload extensions
 
  - Mellanox Ethernet (mlx5)
    - offload macvlan interfaces
    - support HW offload of TC rules involving OVS internal ports
    - support HW-GRO and header/data split
    - support application device queues
 
  - Marvell OcteonTx2:
    - add XDP support for PF
    - add PTP support for VF
 
  - Qualcomm Ethernet switch (qca8k): support for QCA8328
 
  - Realtek Ethernet DSA switch (rtl8366rb)
    - support bridge offload
    - support STP, fast aging, disabling address learning
    - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch
 
  - Mellanox Ethernet/IB switch (mlxsw)
    - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
    - offload root TBF qdisc as port shaper
    - support multiple routing interface MAC address prefixes
    - support for IP-in-IP with IPv6 underlay
 
  - MediaTek WiFi (mt76)
    - mt7921 - ASPM, 6GHz, SDIO and testmode support
    - mt7915 - LED and TWT support
 
  - Qualcomm WiFi (ath11k)
    - include channel rx and tx time in survey dump statistics
    - support for 80P80 and 160 MHz bandwidths
    - support channel 2 in 6 GHz band
    - spectral scan support for QCN9074
    - support for rx decapsulation offload (data frames in 802.3
      format)
 
  - Qualcomm phone SoC WiFi (wcn36xx)
    - enable Idle Mode Power Save (IMPS) to reduce power consumption
      during idle
 
  - Bluetooth driver support for MediaTek MT7922 and MT7921
 
  - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x
    and Realtek 8822C/8852A
 
  - Microsoft vNIC driver (mana)
    - support hibernation and kexec
 
  - Google vNIC driver (gve)
    - support for jumbo frames
    - implement Rx page reuse
 
 Refactor:
 
  - Make all writes to netdev->dev_addr go thru helpers, so that we
    can add this address to the address rbtree and handle the updates
 
  - Various TCP cleanups and optimizations including improvements
    to CPU cache use
 
  - Simplify the gnet_stats, Qdisc stats' handling and remove
    qdisc->running sequence counter
 
  - Driver changes and API updates to address devlink locking
    deficiencies
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S
 IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf
 wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z
 wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS
 n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO
 lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD
 +qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld
 6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ
 QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde
 0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg
 QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr
 84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE=
 =srde
 -----END PGP SIGNATURE-----

Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Remove socket skb caches

   - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and
     avoid memory accounting overhead on each message sent

   - Introduce managed neighbor entries - added by control plane and
     resolved by the kernel for use in acceleration paths (BPF / XDP
     right now, HW offload users will benefit as well)

   - Make neighbor eviction on link down controllable by userspace to
     work around WiFi networks with bad roaming implementations

   - vrf: Rework interaction with netfilter/conntrack

   - fq_codel: implement L4S style ce_threshold_ect1 marking

   - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()

  BPF:

   - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
     as implemented in LLVM14

   - Introduce bpf_get_branch_snapshot() to capture Last Branch Records

   - Implement variadic trace_printk helper

   - Add a new Bloomfilter map type

   - Track <8-byte scalar spill and refill

   - Access hw timestamp through BPF's __sk_buff

   - Disallow unprivileged BPF by default

   - Document BPF licensing

  Netfilter:

   - Introduce egress hook for looking at raw outgoing packets

   - Allow matching on and modifying inner headers / payload data

   - Add NFT_META_IFTYPE to match on the interface type either from
     ingress or egress

  Protocols:

   - Multi-Path TCP:
      - increase default max additional subflows to 2
      - rework forward memory allocation
      - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS

   - MCTP flow support allowing lower layer drivers to configure msg
     muxing as needed

   - Automatic Multicast Tunneling (AMT) driver based on RFC7450

   - HSR support the redbox supervision frames (IEC-62439-3:2018)

   - Support for the ip6ip6 encapsulation of IOAM

   - Netlink interface for CAN-FD's Transmitter Delay Compensation

   - Support SMC-Rv2 eliminating the current same-subnet restriction, by
     exploiting the UDP encapsulation feature of RoCE adapters

   - TLS: add SM4 GCM/CCM crypto support

   - Bluetooth: initial support for link quality and audio/codec offload

  Driver APIs:

   - Add a batched interface for RX buffer allocation in AF_XDP buffer
     pool

   - ethtool: Add ability to control transceiver modules' power mode

   - phy: Introduce supported interfaces bitmap to express MAC
     capabilities and simplify PHY code

   - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks

  New drivers:

   - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)

   - Ethernet driver for ASIX AX88796C SPI device (x88796c)

  Drivers:

   - Broadcom PHYs
      - support 72165, 7712 16nm PHYs
      - support IDDQ-SR for additional power savings

   - PHY support for QCA8081, QCA9561 PHYs

   - NXP DPAA2: support for IRQ coalescing

   - NXP Ethernet (enetc): support for software TCP segmentation

   - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
     Gigabit-capable IP found on RZ/G2L SoC

   - Intel 100G Ethernet
      - support for eswitch offload of TC/OvS flow API, including
        offload of GRE, VxLAN, Geneve tunneling
      - support application device queues - ability to assign Rx and Tx
        queues to application threads
      - PTP and PPS (pulse-per-second) extensions

   - Broadcom Ethernet (bnxt)
      - devlink health reporting and device reload extensions

   - Mellanox Ethernet (mlx5)
      - offload macvlan interfaces
      - support HW offload of TC rules involving OVS internal ports
      - support HW-GRO and header/data split
      - support application device queues

   - Marvell OcteonTx2:
      - add XDP support for PF
      - add PTP support for VF

   - Qualcomm Ethernet switch (qca8k): support for QCA8328

   - Realtek Ethernet DSA switch (rtl8366rb)
      - support bridge offload
      - support STP, fast aging, disabling address learning
      - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch

   - Mellanox Ethernet/IB switch (mlxsw)
      - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
      - offload root TBF qdisc as port shaper
      - support multiple routing interface MAC address prefixes
      - support for IP-in-IP with IPv6 underlay

   - MediaTek WiFi (mt76)
      - mt7921 - ASPM, 6GHz, SDIO and testmode support
      - mt7915 - LED and TWT support

   - Qualcomm WiFi (ath11k)
      - include channel rx and tx time in survey dump statistics
      - support for 80P80 and 160 MHz bandwidths
      - support channel 2 in 6 GHz band
      - spectral scan support for QCN9074
      - support for rx decapsulation offload (data frames in 802.3
        format)

   - Qualcomm phone SoC WiFi (wcn36xx)
      - enable Idle Mode Power Save (IMPS) to reduce power consumption
        during idle

   - Bluetooth driver support for MediaTek MT7922 and MT7921

   - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and
     Realtek 8822C/8852A

   - Microsoft vNIC driver (mana)
      - support hibernation and kexec

   - Google vNIC driver (gve)
      - support for jumbo frames
      - implement Rx page reuse

  Refactor:

   - Make all writes to netdev->dev_addr go thru helpers, so that we can
     add this address to the address rbtree and handle the updates

   - Various TCP cleanups and optimizations including improvements to
     CPU cache use

   - Simplify the gnet_stats, Qdisc stats' handling and remove
     qdisc->running sequence counter

   - Driver changes and API updates to address devlink locking
     deficiencies"

* tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits)
  Revert "net: avoid double accounting for pure zerocopy skbs"
  selftests: net: add arp_ndisc_evict_nocarrier
  net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  net: arp: introduce arp_evict_nocarrier sysctl parameter
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c
  net: avoid double accounting for pure zerocopy skbs
  tcp: rename sk_wmem_free_skb
  netdevsim: fix uninit value in nsim_drv_configure_vfs()
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  ...
2021-11-02 06:20:58 -07:00
Linus Torvalds
d2cdb12231 regmap: Update for v5.16
This update has a single change which will use the maximum transfer and
 message sizes advertised by SPI controllers to configure limits within
 the regmap core, ensuring better interoperation.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmF/6F8ACgkQJNaLcl1U
 h9Calgf+PaI6wr17nU0/m4N50Vjat0bL7iBuEQYPKFWu963M0QasPZEKteJV7sa+
 8XVxcBh43E2unQkY+iMfldmBIEfZJwmsaWecdxSair6cWcm8MT6agOOkl6KjiCHM
 AQiPI2+UbXDHnPociBmyA+MqI3hWSPZIMirVC+3WAFGLs8wHuRe0G97egkcPh88a
 doQ5f2Ei7eCgP4uy+p93S4QWXbsbQVm/yszw0BH0gaqnBylwUfQtU1vdlLyBXW4y
 9bB1MnzCdlLbKf4bQd7NcjB2bA8lgAuZ3t5rJoRjZtnf+//fRTB1cu/UkLRFlZme
 vsrt60wo52r2v5tiLKA4wrgcvEFR+A==
 =8MR4
 -----END PGP SIGNATURE-----

Merge tag 'regmap-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap update from Mark Brown:
 "A single change to use the maximum transfer and message sizes
  advertised by SPI controllers to configure limits within the
  regmap core, ensuring better interoperation"

* tag 'regmap-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: spi: Set regmap max raw r/w from max_transfer_size
2021-11-01 19:01:51 -07:00
Linus Torvalds
9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Jakub Kicinski
7df621a3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  7b50ecfcc6 ("net: Rename ->stream_memory_read to ->sock_is_readable")
  4c1e34c0db ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f6 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 10:43:58 -07:00
Linus Torvalds
8685de2ed8 regmap: Fix for v5.15
This fixes a potential double free when handling an out of memory error
 inserting a node into an rbtree regcache.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmF6lHQACgkQJNaLcl1U
 h9DNugf/WXDIhnn1ToxWpe80tGZFgTcyXN1T6QfCiP5iKbOsYHFYrITDUhnbCrV+
 4z5YngU8Jw2bz6nx72Yt3c4bwQRl6ZESgH6Rupr1nClQO06k8bLySLVWJPH1V3y4
 ofzdSc7x6JVHaVLj09NR6N3rrOKbmMxS8Ny0RMr7Futu0cPog0/7f57zG/Mkq9Fw
 vBsLvtpwt9cAOPRl7lxiU3NXKVdWYyIXxOCGW7z+e7u/HK7W5ByELrgXo1BFLXST
 dUks5d5qKHmExB/7plxv8mxOQlNHVimTxXpt2sL9Z9FoQR3OCs5kfG+eaIDyx5UW
 Y10gaxu61sRO6g9EqUPhfzOkiAYk5Q==
 =hMOi
 -----END PGP SIGNATURE-----

Merge tag 'regmap-fix-v5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "This fixes a potential double free when handling an out of memory
  error inserting a node into an rbtree regcache"

* tag 'regmap-fix-v5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Fix possible double-free in regcache_rbtree_exit()
2021-10-28 10:00:58 -07:00
Chunfeng Yun
259714100d PM / wakeirq: support enabling wake-up irq after runtime_suspend called
When the dedicated wake IRQ is level trigger, and it uses the
device's low-power status as the wakeup source, that means if the
device is not in low-power state, the wake IRQ will be triggered
if enabled; For this case, need enable the wake IRQ after running
the device's ->runtime_suspend() which make it enter low-power state.

e.g.
Assume the wake IRQ is a low level trigger type, and the wakeup
signal comes from the low-power status of the device.
The wakeup signal is low level at running time (0), and becomes
high level when the device enters low-power state (runtime_suspend
(1) is called), a wakeup event at (2) make the device exit low-power
state, then the wakeup signal also becomes low level.

                ------------------
               |           ^     ^|
----------------           |     | --------------
 |<---(0)--->|<--(1)--|   (3)   (2)    (4)

if enable the wake IRQ before running runtime_suspend during (0),
a wake IRQ will arise, it causes resume immediately;
it works if enable wake IRQ ( e.g. at (3) or (4)) after running
->runtime_suspend().

This patch introduces a new status WAKE_IRQ_DEDICATED_REVERSE to
optionally support enabling wake IRQ after running ->runtime_suspend().

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-27 20:49:32 +02:00
Andy Shevchenko
27e0bcd029 device property: Drop redundant NULL checks
In cases when functions are called via fwnode operations,
we already know that this is software node we are dealing
with, hence no need to check if it's NULL, it can't be,

Reported-by: YE Chengfeng <cyeaa@connect.ust.hk>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20211026162954.89811-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-26 19:14:01 +02:00
Rafael J. Wysocki
23f62d7ab2 PM: sleep: Pause cpuidle later and resume it earlier during system transitions
Commit 8651f97bd9 ("PM / cpuidle: System resume hang fix with
cpuidle") that introduced cpuidle pausing during system suspend
did that to work around a platform firmware issue causing systems
to hang during resume if CPUs were allowed to enter idle states
in the system suspend and resume code paths.

However, pausing cpuidle before the last phase of suspending
devices is the source of an otherwise arbitrary difference between
the suspend-to-idle path and other system suspend variants, so it is
cleaner to do that later, before taking secondary CPUs offline (it
is still safer to take secondary CPUs offline with cpuidle paused,
though).

Modify the code accordingly, but in order to avoid code duplication,
introduce new wrapper functions, pm_sleep_disable_secondary_cpus()
and pm_sleep_enable_secondary_cpus(), to combine cpuidle_pause()
and cpuidle_resume(), respectively, with the handling of secondary
CPUs during system-wide transitions to sleep states.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26 15:52:07 +02:00
Rafael J. Wysocki
8d89835b04 PM: suspend: Do not pause cpuidle in the suspend-to-idle path
It is pointless to pause cpuidle in the suspend-to-idle path,
because it is going to be resumed in the same path later and
pausing it does not serve any particular purpose in that case.

Rework the code to avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26 15:52:07 +02:00
Sean Anderson
65aa371ea5 net: Convert more users of mdiobus_* to mdiodev_*
This converts users of mdiobus to mdiodev using the following semantic
patch:

@@
identifier mdiodev;
expression regnum;
@@

- mdiobus_read(mdiodev->bus, mdiodev->addr, regnum)
+ mdiodev_read(mdiodev, regnum)

@@
identifier mdiodev;
expression regnum, val;
@@

- mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val)
+ mdiodev_write(mdiodev, regnum, val)

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24 13:40:33 +01:00
Lucas Tanure
f231ff38b7
regmap: spi: Set regmap max raw r/w from max_transfer_size
Set regmap raw read/write from spi max_transfer_size
so regmap_raw_read/write can split the access into chunks

Signed-off-by: Lucas Tanure <tanureal@opensource.cirrus.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
[André: fix build warning]
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Link: https://lore.kernel.org/r/20211021132721.13669-1-andrealmeid@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-23 19:55:03 +01:00
Rafael J. Wysocki
928265e360 PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.

Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.

Fixes: 057d51a126 ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-22 18:09:58 +02:00
Luis Chamberlain
e2e2c0f20f firmware_loader: move struct builtin_fw to the only place used
Now that x86 doesn't abuse picking at internals to the firmware
loader move out the built-in firmware struct to its only user.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211021155843.1969401-5-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-22 14:13:53 +02:00
Luis Chamberlain
48d09e9787 firmware_loader: formalize built-in firmware API
Formalize the built-in firmware with a proper API. This can later
be used by other callers where all they need is built-in firmware.

We export the firmware_request_builtin() call for now only
under the TEST_FIRMWARE symbol namespace as there are no
direct modular users for it. If they pop up they are free
to export it generally. Built-in code always gets access to
the callers and we'll demonstrate a hidden user which has been
lurking in the kernel for a while and the reason why using a
proper API was better long term.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211021155843.1969401-2-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-22 14:13:44 +02:00
David S. Miller
bdfa75ad70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.

With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22 11:41:16 +01:00
Kai Vehmanen
c87761db21 component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.

This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:

component_master_add_with_match()
[  126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[  126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[  126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[  126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)

component_del() called() at DRM/i915 unbind()
[  137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[  137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[  137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a77 ("drivers/base: fix devres handling for master device")

Fixes: 9e1ccb4a77 ("drivers/base: fix devres handling for master device")
Cc: stable@vger.kernel.org
Acked-by: Imre Deak <imre.deak@intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-21 13:01:56 +02:00
Andy Shevchenko
a164ff53cb driver core: Provide device_match_acpi_handle() helper
We have a couple of users of this helper, make it available for them.

The prototype for the helper is specifically crafted in order to be
easily used with bus_find_device() call. That's why its location is
in the driver core rather than ACPI.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20211014134756.39092-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20 19:38:29 +02:00
Greg Kroah-Hartman
b5bc8ac25a Merge 5.15-rc6 into driver-core-next
We need the driver-core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-18 09:43:37 +02:00
Linus Torvalds
cf52ad5ff1 Driver core fixes for 5.15-rc6
Here are some small driver core fixes for 5.15-rc6, all of which have
 been in linux-next for a while with no reported issues.
 
 They include:
 	- kernfs negative dentry bugfix
 	- simple pm bus fixes to resolve reported issues
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYWvzFg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymfLQCfSCP698AAvoCgG0fOfLakFkw80h0AoKIVm3lk
 t0GUdnplU18CjnO5M1Zj
 =+dh9
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core fixes for 5.15-rc6, all of which have
  been in linux-next for a while with no reported issues.

  They include:

   - kernfs negative dentry bugfix

   - simple pm bus fixes to resolve reported issues"

* tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers: bus: Delete CONFIG_SIMPLE_PM_BUS
  drivers: bus: simple-pm-bus: Add support for probing simple bus only devices
  driver core: Reject pointless SYNC_STATE_ONLY device links
  kernfs: don't create a negative dentry if inactive node exists
2021-10-17 17:17:28 -10:00
Jonathan Cameron
c5e22feffd topology: Represent clusters of CPUs within a die
Both ACPI and DT provide the ability to describe additional layers of
topology between that of individual cores and higher level constructs
such as the level at which the last level cache is shared.
In ACPI this can be represented in PPTT as a Processor Hierarchy
Node Structure [1] that is the parent of the CPU cores and in turn
has a parent Processor Hierarchy Nodes Structure representing
a higher level of topology.

For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each
cluster has 4 cpus. All clusters share L3 cache data, but each cluster
has local L3 tag. On the other hand, each clusters will share some
internal system bus.

+-----------------------------------+                          +---------+
|  +------+    +------+             +--------------------------+         |
|  | CPU0 |    | cpu1 |             |    +-----------+         |         |
|  +------+    +------+             |    |           |         |         |
|                                   +----+    L3     |         |         |
|  +------+    +------+   cluster   |    |    tag    |         |         |
|  | CPU2 |    | CPU3 |             |    |           |         |         |
|  +------+    +------+             |    +-----------+         |         |
|                                   |                          |         |
+-----------------------------------+                          |         |
+-----------------------------------+                          |         |
|  +------+    +------+             +--------------------------+         |
|  |      |    |      |             |    +-----------+         |         |
|  +------+    +------+             |    |           |         |         |
|                                   |    |    L3     |         |         |
|  +------+    +------+             +----+    tag    |         |         |
|  |      |    |      |             |    |           |         |         |
|  +------+    +------+             |    +-----------+         |         |
|                                   |                          |         |
+-----------------------------------+                          |   L3    |
                                                               |   data  |
+-----------------------------------+                          |         |
|  +------+    +------+             |    +-----------+         |         |
|  |      |    |      |             |    |           |         |         |
|  +------+    +------+             +----+    L3     |         |         |
|                                   |    |    tag    |         |         |
|  +------+    +------+             |    |           |         |         |
|  |      |    |      |             |    +-----------+         |         |
|  +------+    +------+             +--------------------------+         |
+-----------------------------------|                          |         |
+-----------------------------------|                          |         |
|  +------+    +------+             +--------------------------+         |
|  |      |    |      |             |    +-----------+         |         |
|  +------+    +------+             |    |           |         |         |
|                                   +----+    L3     |         |         |
|  +------+    +------+             |    |    tag    |         |         |
|  |      |    |      |             |    |           |         |         |
|  +------+    +------+             |    +-----------+         |         |
|                                   |                          |         |
+-----------------------------------+                          |         |
+-----------------------------------+                          |         |
|  +------+    +------+             +--------------------------+         |
|  |      |    |      |             |   +-----------+          |         |
|  +------+    +------+             |   |           |          |         |
|                                   |   |    L3     |          |         |
|  +------+    +------+             +---+    tag    |          |         |
|  |      |    |      |             |   |           |          |         |
|  +------+    +------+             |   +-----------+          |         |
|                                   |                          |         |
+-----------------------------------+                          |         |
+-----------------------------------+                          |         |
|  +------+    +------+             +--------------------------+         |
|  |      |    |      |             |  +-----------+           |         |
|  +------+    +------+             |  |           |           |         |
|                                   |  |    L3     |           |         |
|  +------+    +------+             +--+    tag    |           |         |
|  |      |    |      |             |  |           |           |         |
|  +------+    +------+             |  +-----------+           |         |
|                                   |                          +---------+
+-----------------------------------+

That means spreading tasks among clusters will bring more bandwidth
while packing tasks within one cluster will lead to smaller cache
synchronization latency. So both kernel and userspace will have
a chance to leverage this topology to deploy tasks accordingly to
achieve either smaller cache latency within one cluster or an even
distribution of load among clusters for higher throughput.

This patch exposes cluster topology to both kernel and userspace.
Libraried like hwloc will know cluster by cluster_cpus and related
sysfs attributes. PoC of HWLOC support at [2].

Note this patch only handle the ACPI case.

Special consideration is needed for SMT processors, where it is
necessary to move 2 levels up the hierarchy from the leaf nodes
(thus skipping the processor core level).

Note that arm64 / ACPI does not provide any means of identifying
a die level in the topology but that may be unrelate to the cluster
level.

[1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node
    structure (Type 0)
[2] https://github.com/hisilicon/hwloc/tree/linux-cluster

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com
2021-10-15 11:25:15 +02:00
Jakub Kicinski
e15f5972b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/ioam6.sh
  7b1700e009 ("selftests: net: modify IOAM tests for undef bits")
  bf77b1400a ("selftests: net: Test for the IOAM encapsulation with IPv6")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14 16:50:14 -07:00
Yang Yingliang
55e6d80378
regmap: Fix possible double-free in regcache_rbtree_exit()
In regcache_rbtree_insert_to_block(), when 'present' realloc failed,
the 'blk' which is supposed to assign to 'rbnode->block' will be freed,
so 'rbnode->block' points a freed memory, in the error handling path of
regcache_rbtree_init(), 'rbnode->block' will be freed again in
regcache_rbtree_exit(), KASAN will report double-free as follows:

BUG: KASAN: double-free or invalid-free in kfree+0xce/0x390
Call Trace:
 slab_free_freelist_hook+0x10d/0x240
 kfree+0xce/0x390
 regcache_rbtree_exit+0x15d/0x1a0
 regcache_rbtree_init+0x224/0x2c0
 regcache_init+0x88d/0x1310
 __regmap_init+0x3151/0x4a80
 __devm_regmap_init+0x7d/0x100
 madera_spi_probe+0x10f/0x333 [madera_spi]
 spi_probe+0x183/0x210
 really_probe+0x285/0xc30

To fix this, moving up the assignment of rbnode->block to immediately after
the reallocation has succeeded so that the data structure stays valid even
if the second reallocation fails.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 3f4ff561bc ("regmap: rbtree: Make cache_present bitmap per node")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211012023735.1632786-1-yangyingliang@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-12 11:48:43 +01:00
Linus Torvalds
fa58787605 linux-kselftest-kunit-fixes-5.15-rc6
This KUnit fixes update for Linux 5.15-rc6 consists of:
 
 - Fixes to address the structleak plugin causing the stack frame size
   to grow immensely when used with KUnit. Fixes include adding a new
   makefile to disable structleak and using it from KUnit iio, device
   property, thunderbolt, and bitfield tests to disable it.
 
 - KUnit framework reference count leak in kfree_at_end
 
 - KUnit tool fix to resolve conflict between --json and --raw_output
   and generate correct test output in either case.
 
 - kernel-doc warnings due to mismatched arg names
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmFkqr8ACgkQCwJExA0N
 QxyziBAAn+02Iq8Kd1p9/+ajr78iQKmjIpSHH51R6wPwOBJg0RgusyuheJ/pccl1
 OOt7sAIdoBWlbZjkwi9SOoFabdvtAO6i7SWpw1R0AdWT5iG4ewe17ODIJiGQ6l9E
 4fNayU+XFunXSqxxEsiSJYO5ztoyDDNOqWJ1a43r+9j6ApsHPH3aKwtx4hFutaJA
 TAUvSOJOMv93SWTHdQh1ihHMVBkhvdnnpSvsRzJFzkEmI5rmL+p7HcbuM0Zk/gyD
 9obygEx8QsJ4pg1AJs8qAj1YbV599qbVhBYYr3EewWscyWME4RrIiXPGImcYKN3/
 99S5pOVTI+wp12r5c8WOdXta+Qe+1RvT9wvTPS+msGQakQhIq0eNePtO1AwIoIor
 S+36JbJdW1rOepg1WssL55F6+re//GYdluSUGwmwYCL3eUEIMpfRqeamg7BKQH3j
 gfukFJvLzzdLiyeYRg2f/G3Vxwa18shlIJgz+6ADCmemura3waeelTG0zFWiNsd5
 XYXXhrxV8eq+Cj5ee/iBddKIgn2+QjFtDmZtsLtiZts3aV7zlugYm2bAYCN1Yxsu
 ZDvymnZUozoOCRFvBe60Eo4DSTDTnmAaRRnuX5L/x1ybpAUvwZQxzyNDIuvfZKQ9
 JB8LrxAJDz7G4jj/9OY5IeacoXNKqDVZYK+Kb9L7hAS08ITk6vU=
 =08Ra
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kunit fixes from Shuah Khan:

 - Fixes to address the structleak plugin causing the stack frame size
   to grow immensely when used with KUnit. Fixes include adding a new
   makefile to disable structleak and using it from KUnit iio, device
   property, thunderbolt, and bitfield tests to disable it.

 - KUnit framework reference count leak in kfree_at_end

 - KUnit tool fix to resolve conflict between --json and --raw_output
   and generate correct test output in either case.

 - kernel-doc warnings due to mismatched arg names

* tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: fix kernel-doc warnings due to mismatched arg names
  bitfield: build kunit tests without structleak plugin
  thunderbolt: build kunit tests without structleak plugin
  device property: build kunit tests without structleak plugin
  iio/test-format: build kunit tests without structleak plugin
  gcc-plugins/structleak: add makefile var for disabling structleak
  kunit: fix reference count leak in kfree_at_end
  kunit: tool: better handling of quasi-bool args (--json, --raw_output)
2021-10-11 17:25:08 -07:00
Jakub Kicinski
9fe1155233 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07 15:24:06 -07:00
Jakub Kicinski
433baf0719 device property: move mac addr helpers to eth.c
Move the mac address helpers out, eth.c already contains
a bunch of similar helpers.

Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07 13:39:51 +01:00
Brendan Higgins
6a1e2d93d5 device property: build kunit tests without structleak plugin
The structleak plugin causes the stack frame size to grow immensely when
used with KUnit:

../drivers/base/test/property-entry-test.c:492:1: warning: the frame size of 2832 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/base/test/property-entry-test.c:322:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/base/test/property-entry-test.c:250:1: warning: the frame size of 4976 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/base/test/property-entry-test.c:115:1: warning: the frame size of 3280 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Turn it off in this file.

Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-06 17:53:42 -06:00
Saravana Kannan
f729a592ad driver core: Reject pointless SYNC_STATE_ONLY device links
SYNC_STATE_ONLY device links intentionally allow cycles because cyclic
sync_state() dependencies are valid and necessary.

However a SYNC_STATE_ONLY device link where the consumer and the supplier
are the same device is pointless because the device link would be deleted
as soon as the device probes (because it's also the consumer) and won't
affect when the sync_state() callback is called. It's a waste of CPU cycles
and memory to create this device link. So reject any attempts to create
such a device link.

Fixes: 05ef983e0d ("driver core: Add device link support for SYNC_STATE_ONLY flag")
Cc: stable <stable@vger.kernel.org>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210929190549.860541-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 17:45:54 +02:00
Luis Chamberlain
0f8d7ccc2e firmware_loader: add a sanity check for firmware_request_builtin()
Right now firmware_request_builtin() is used internally only
and so we have control over the callers. But if we want to expose
that API more broadly we should ensure the firmware pointer
is valid.

This doesn't fix any known issue, it just prepares us to later
expose this API to other users.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20210917182226.3532898-4-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 16:26:49 +02:00
Luis Chamberlain
7c4fd90741 firmware_loader: split built-in firmware call
There are two ways the firmware_loader can use the built-in
firmware: with or without the pre-allocated buffer. We already
have one explicit use case for each of these, and so split them
up so that it is clear what the intention is on the caller side.

This also paves the way so that eventually other callers outside
of the firmware loader can uses these if and when needed.

While at it, adopt the firmware prefix for the routine names.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20210917182226.3532898-3-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 16:26:37 +02:00
Luis Chamberlain
f7a07f7b96 firmware_loader: fix pre-allocated buf built-in firmware use
The firmware_loader can be used with a pre-allocated buffer
through the use of the API calls:

  o request_firmware_into_buf()
  o request_partial_firmware_into_buf()

If the firmware was built-in and present, our current check
for if the built-in firmware fits into the pre-allocated buffer
does not return any errors, and we proceed to tell the caller
that everything worked fine. It's a lie and no firmware would
end up being copied into the pre-allocated buffer. So if the
caller trust the result it may end up writing a bunch of 0's
to a device!

Fix this by making the function that checks for the pre-allocated
buffer return non-void. Since the typical use case is when no
pre-allocated buffer is provided make this return successfully
for that case. If the built-in firmware does *not* fit into the
pre-allocated buffer size return a failure as we should have
been doing before.

I'm not aware of users of the built-in firmware using the API
calls with a pre-allocated buffer, as such I doubt this fixes
any real life issue. But you never know... perhaps some oddball
private tree might use it.

In so far as upstream is concerned this just fixes our code for
correctness.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20210917182226.3532898-2-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 16:26:29 +02:00
Mianhan Liu
30b7ecf731 drivers/base/component.c: remove superfluous header files from component.c
component.c hasn't use any macro or function declared in linux/kref.h.
Thus, these files can be removed from component.c safely without
affecting the compilation of the drivers/base/ module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Link: https://lore.kernel.org/r/20210928193849.28717-1-liumh1@shanghaitech.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 16:21:10 +02:00
Mianhan Liu
b39214911a drivers/base/arch_topology.c: remove superfluous header
arch_topology.c hasn't use any macro or function declared in linux/percpu.h,
linux/smp.h and linux/string.h.
Thus, these files can be removed from arch_topology.c safely without
affecting the compilation of the drivers/base/ module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Link: https://lore.kernel.org/r/20210928193138.24192-1-liumh1@shanghaitech.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 16:21:01 +02:00
Max Gurtovoy
d460d7f7bb driver core: use NUMA_NO_NODE during device_initialize
Don't use (-1) constant for setting initial device node. Instead, use
the generic NUMA_NO_NODE definition to indicate that "no node id
specified".

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20211004133453.18881-1-mgurtovoy@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 15:42:22 +02:00
Yang Yingliang
df0a181494 driver core: Fix possible memory leak in device_link_add()
I got memory leak as follows:

unreferenced object 0xffff88801f0b2200 (size 64):
  comm "i2c-lis2hh12-21", pid 5455, jiffies 4294944606 (age 15.224s)
  hex dump (first 32 bytes):
    72 65 67 75 6c 61 74 6f 72 3a 72 65 67 75 6c 61  regulator:regula
    74 6f 72 2e 30 2d 2d 69 32 63 3a 31 2d 30 30 31  tor.0--i2c:1-001
  backtrace:
    [<00000000bf5b0c3b>] __kmalloc_track_caller+0x19f/0x3a0
    [<0000000050da42d9>] kvasprintf+0xb5/0x150
    [<000000004bbbed13>] kvasprintf_const+0x60/0x190
    [<00000000cdac7480>] kobject_set_name_vargs+0x56/0x150
    [<00000000bf83f8e8>] dev_set_name+0xc0/0x100
    [<00000000cc1cf7e3>] device_link_add+0x6b4/0x17c0
    [<000000009db9faed>] _regulator_get+0x297/0x680
    [<00000000845e7f2b>] _devm_regulator_get+0x5b/0xe0
    [<000000003958ee25>] st_sensors_power_enable+0x71/0x1b0 [st_sensors]
    [<000000005f450f52>] st_accel_i2c_probe+0xd9/0x150 [st_accel_i2c]
    [<00000000b5f2ab33>] i2c_device_probe+0x4d8/0xbe0
    [<0000000070fb977b>] really_probe+0x299/0xc30
    [<0000000088e226ce>] __driver_probe_device+0x357/0x500
    [<00000000c21dda32>] driver_probe_device+0x4e/0x140
    [<000000004e650441>] __device_attach_driver+0x257/0x340
    [<00000000cf1891b8>] bus_for_each_drv+0x166/0x1e0

When device_register() returns an error, the name allocated in dev_set_name()
will be leaked, the put_device() should be used instead of kfree() to give up
the device reference, then the name will be freed in kobject_cleanup() and the
references of consumer and supplier will be decreased in device_link_release_fn().

Fixes: 287905e68d ("driver core: Expose device link details in sysfs")
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210930085714.2057460-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05 15:39:29 +02:00
Greg Kroah-Hartman
bb76c82358 Merge 5.15-rc4 into driver-core-next
We need the driver core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-04 09:20:57 +02:00
Linus Torvalds
84928ce3bb Driver core fixes for 5.15-rc4
Here are some driver core and kernfs fixes for reported issues for
 5.15-rc4.  These fixes include:
 	- kernfs positive dentry bugfix
 	- debugfs_create_file_size error path fix
 	- cpumask sysfs file bugfix to preserve the user/kernel abi (has
 	  been reported multiple times.)
 	- devlink fixes for mdiobus devices as reported by the subsystem
 	  maintainers.
 
 Also included in here are some devlink debugging changes to make it
 easier for people to report problems when asked.  They have already
 helped with the mdiobus and other subsystems reporting issues.
 
 All of these have been linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYVmC1A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykfCQCgtQ0kdPoPdUG6mPCn45rrbJQLBY4AnRL5JyhO
 zB60l6C2EHHnLnRxMnQq
 =gygB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core and kernfs fixes for reported issues for
  5.15-rc4. These fixes include:

   - kernfs positive dentry bugfix

   - debugfs_create_file_size error path fix

   - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
     reported multiple times.)

   - devlink fixes for mdiobus devices as reported by the subsystem
     maintainers.

  Also included in here are some devlink debugging changes to make it
  easier for people to report problems when asked. They have already
  helped with the mdiobus and other subsystems reporting issues.

  All of these have been linux-next for a while with no reported issues"

* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: also call kernfs_set_rev() for positive dentry
  driver core: Add debug logs when fwnode links are added/deleted
  driver core: Create __fwnode_link_del() helper function
  driver core: Set deferred probe reason when deferred by driver core
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  driver core: fw_devlink: Improve handling of cyclic dependencies
  cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error
2021-10-03 11:10:09 -07:00
Saravana Kannan
ebd6823af3 driver core: Add debug logs when fwnode links are added/deleted
This will help with debugging fw_devlink issues.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915172808.620546-4-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28 09:48:48 +02:00
Saravana Kannan
76f130810b driver core: Create __fwnode_link_del() helper function
The same code is repeated in multiple locations. Create a helper
function for it.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915172808.620546-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28 09:48:48 +02:00
Saravana Kannan
68223eeec7 driver core: Set deferred probe reason when deferred by driver core
When the driver core defers the probe of a device, set the deferred
probe reason so that it's easier to debug. The deferred probe reason is
available in debugfs under devices_deferred.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915172808.620546-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28 09:48:48 +02:00
Linus Torvalds
47d7e65d64 Device properties framework fix for 5.15-rc3
Fix software node refcount imbalance on device removal (Laurentiu Tudor).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmFOBoUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxcgQP/RJGlD5rm2W3LSTi0MpdHm88PWTH/6JL
 XYqGDJ42DOS6+pm7KiCoI+XZ7plXtT9FL3kPwgj4+RSHfZQV7r5DVOFwXUa5lpkG
 IDeU8H/DXil/9VZSioxCN7HSE1mUsMj1eZM4odf9eZBiccLxdn050k7by4ZVt0Dx
 3OnBJ8Tv+4jp27XuuEiOFku2m2QA0myK0Fg7tHQ8D6UoBgfxawVMkFsDrNFOBEsx
 xKmGXml7VFsEbX6+FeirJCRLTGln2EOLbdHcwMbCzaOzgEp9MvuLm96PlLANyohk
 rZoAoasXmjMMgHk2tOC5nkEtxINVKWqM18HQqyzBLtx9fn/YP94HXXC00SzsZAwJ
 NQ8vpwtbHzHI8QydS8lJ9Sp8PLQGcHRrbut5Is1oXF3f8uNViyE1IHmuPh/AW0Mg
 UjQ/cqZ1yWDoYYaJ2Ar2G6uwFyfMmgPhoNOlUmW1niZqEm3iMA1VRd2EbSXiAbpu
 ExLzNh7dljDoVgP6Vzhqlj6oyV4JYZCHIuMjBixmheVb282pqanQaxsk877rTQiR
 KAk9d/V7GluXvNnkVQA1j8k8ONqEx4M2dxgn6tIFMGhT7eQeq/tjlljl7x3C1OW5
 iTQPmXzEr1mLPJ8HwdNLDFDbNZ1HEsLUS+tb6nz+1uwvyvhnJqEX2ewbx5nRUdrY
 XBMVa46lxeh8
 =PFOV
 -----END PGP SIGNATURE-----

Merge tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull device properties framework fix from Rafael Wysocki:
 "Fix software node refcount imbalance on device removal (Laurentiu
  Tudor)"

* tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  software node: balance refcount for managed software nodes
2021-09-24 11:20:29 -07:00
Saravana Kannan
5501765a02 driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.

However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.

Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.

Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes: ea718c6990 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915170940.617415-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-23 19:26:54 +02:00
Douglas Anderson
7065f92255 driver core: Clarify that dev_err_probe() is OK even w/out -EPROBE_DEFER
There is some debate about whether it's deemed acceptable to call
dev_err_probe() if you know that the error code can never be
-EPROBE_DEFER. Clarify in the function comments that this is
OK. Specifically this makes us able to transform code like this:

  ret = do_something_that_cant_defer();
  if (ret < 0) {
    dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret));
    return ret;
  }

to code like this:
  ret = do_something_that_cant_defer();
  if (ret < 0)
    return dev_err_probe(dev, ret, "The foo failed to bar\n");

It is also possible that in the future folks might want a CONFIG
option to strip out all probe error strings to save space (keeping
non-probe errors) with the argument that probe errors rarely happen
after bringup. Having probe errors reported with a consistent function
would allow that.

Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20210916161931.1.I32bea713bd6c6fb419a24da73686145742b6c117@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-21 18:29:35 +02:00
Saravana Kannan
2de9d8e0d2 driver core: fw_devlink: Improve handling of cyclic dependencies
When we have a dependency of the form:

Device-A -> Device-C
	Device-B

Device-C -> Device-B

Where,
* Indentation denotes "child of" parent in previous line.
* X -> Y denotes X is consumer of Y based on firmware (Eg: DT).

We have cyclic dependency: device-A -> device-C -> device-B -> device-A

fw_devlink current treats device-C -> device-B dependency as an invalid
dependency and doesn't enforce it but leaves the rest of the
dependencies as is.

While the current behavior is necessary, it is not sufficient if the
false dependency in this example is actually device-A -> device-C. When
this is the case, device-C will correctly probe defer waiting for
device-B to be added, but device-A will be incorrectly probe deferred by
fw_devlink waiting on device-C to probe successfully. Due to this, none
of the devices in the cycle will end up probing.

To fix this, we need to go relax all the dependencies in the cycle like
we already do in the other instances where fw_devlink detects cycles.
A real world example of this was reported[1] and analyzed[2].

[1] - https://lore.kernel.org/lkml/0a2c4106-7f48-2bb5-048e-8c001a7c3fda@samsung.com/
[2] - https://lore.kernel.org/lkml/CAGETcx8peaew90SWiux=TyvuGgvTQOmO4BFALz7aj0Za5QdNFQ@mail.gmail.com/

Fixes: f9aa460672 ("driver core: Refactor fw_devlink feature")
Cc: stable <stable@vger.kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915170940.617415-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-21 18:21:48 +02:00
Linus Torvalds
c6460daea2 xen: branch for v5.15-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYUR8DAAKCRCAXGG7T9hj
 vjclAP0ekisZSabOsmzM0iz8rpuYj113/kQozokFeFD5eQlEiAD/b5LytLosic/1
 5XKa5fbOyimAyRBwblWhpLLhrW6uiAg=
 =WoAB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - The first hunk of a Xen swiotlb fixup series fixing multiple minor
   issues and doing some small cleanups

 - Some further Xen related fixes avoiding WARN() splats when running as
   Xen guests or dom0

 - A Kconfig fix allowing the pvcalls frontend to be built as a module

* tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  swiotlb-xen: drop DEFAULT_NSLABS
  swiotlb-xen: arrange to have buffer info logged
  swiotlb-xen: drop leftover __ref
  swiotlb-xen: limit init retries
  swiotlb-xen: suppress certain init retries
  swiotlb-xen: maintain slab count properly
  swiotlb-xen: fix late init retry
  swiotlb-xen: avoid double free
  xen/pvcalls: backend can be a module
  xen: fix usage of pmd_populate in mremap for pv guests
  xen: reset legacy rtc flag for PV domU
  PM: base: power: don't try to use non-existing RTC for storing data
  xen/balloon: use a kernel thread instead a workqueue
2021-09-17 08:31:49 -07:00
Laurentiu Tudor
5aeb05b27f software node: balance refcount for managed software nodes
software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
software nodes, thus leading to underflow errors. Balance the refcount by
bumping it in the device_create_managed_software_node() function.

The error [1] was encountered after adding a .shutdown() op to our
fsl-mc-bus driver.

[1]
pc : refcount_warn_saturate+0xf8/0x150
lr : refcount_warn_saturate+0xf8/0x150
sp : ffff80001009b920
x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
Call trace:
 refcount_warn_saturate+0xf8/0x150
 kobject_put+0x10c/0x120
 software_node_notify+0xd8/0x140
 device_platform_notify+0x4c/0xb4
 device_del+0x188/0x424
 fsl_mc_device_remove+0x2c/0x4c
 rebofind sp.c__fsl_mc_device_remove+0x14/0x2c
 device_for_each_child+0x5c/0xac
 dprc_remove+0x9c/0xc0
 fsl_mc_driver_remove+0x28/0x64
 __device_release_driver+0x188/0x22c
 device_release_driver+0x30/0x50
 bus_remove_device+0x128/0x134
 device_del+0x16c/0x424
 fsl_mc_bus_remove+0x8c/0x114
 fsl_mc_bus_shutdown+0x14/0x20
 platform_shutdown+0x28/0x40
 device_shutdown+0x15c/0x330
 __do_sys_reboot+0x218/0x2a0
 __arm64_sys_reboot+0x28/0x34
 invoke_syscall+0x48/0x114
 el0_svc_common+0x40/0xdc
 do_el0_svc+0x2c/0x94
 el0_svc+0x2c/0x54
 el0t_64_sync_handler+0xa8/0x12c
 el0t_64_sync+0x198/0x19c
---[ end trace 32eb1c71c7d86821 ]---

Fixes: 151f6ff78c ("software node: Provide replacement for device_add_properties()")
Reported-by: Jon Nettleton <jon@solid-run.com>
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: 5.12+ <stable@vger.kernel.org> # 5.12+
[ rjw: Fix up the software_node_notify() invocation ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-16 13:13:48 +02:00
Linus Torvalds
77e02cf57b memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.

Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:

   https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/

or just random memory corruption if the debug checks don't catch it:

   https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/

I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.

I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.

So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer.  And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 13:23:22 -07:00
Cai Huoqing
86854b4379 driver core: platform: Make use of the helper macro SET_RUNTIME_PM_OPS()
Use the helper macro SET_RUNTIME_PM_OPS() instead of the verbose
operators ".runtime_suspend/.runtime_resume", because the
SET_RUNTIME_PM_OPS() is a nice helper macro that could be brought
in to make code a little clearer, a little more concise.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210828090219.1177-1-caihuoqing@baidu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-14 16:52:41 +02:00
Juergen Gross
0560204b36 PM: base: power: don't try to use non-existing RTC for storing data
If there is no legacy RTC device, don't try to use it for storing trace
data across suspend/resume.

Cc: <stable@vger.kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210903084937.19392-2-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-14 09:56:20 +02:00
Rafael J. Wysocki
be2d24336f Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
* pm-cpufreq:
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()

* pm-sleep:
  PM: sleep: core: Avoid setting power.must_resume to false
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

* pm-em:
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments
2021-09-10 20:26:08 +02:00
Linus Torvalds
30f3490978 More power management updates for 5.15-rc1
- Add new cpufreq driver for the MediaTek MT6779 platform called
    mediatek-hw along with corresponding DT bindings (Hector.Yuan).
 
  - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
    Gopinath).
 
  - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
    policy flag (Taniya Das).
 
  - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
    Andersson).
 
  - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
    flag (Viresh Kumar).
 
  - Add new cpufreq driver callback to allow drivers to register
    with the Energy Model in a consistent way and make several
    drivers use it (Viresh Kumar).
 
  - Change the remaining users of the .ready() cpufreq driver callback
    to move the code from it elsewhere and drop it from the cpufreq
    core (Viresh Kumar).
 
  - Revert recent intel_pstate change adding HWP guaranteed performance
    change notification support to it that led to problems, because
    the notification in question is triggered prematurely on some
    systems (Rafael Wysocki).
 
  - Convert the OPP DT bindings to DT schema and clean them up while
    at it (Rob Herring).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41VgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxCN4P+gMjMrrZmuU6gZsbbvpDlaBhCd2Xq3TD
 xR/DMDi7znkh3TUX3uwL+xnr+k0krIH0jBIeUQUE7NeNIoT6wgbjJ4Ty5rFq76qB
 AODmmZ4vO7lmnupSyqUQbHfYohDmyICSKiStf8UOEj1o+jSWNmrgUYUv0tDtDUH+
 Cn0vByah8gJAnoZX8Y8BM1jmRc3YoNHWpvtTQIhIBPkVZ//+NOKvDZvwUUPZFb+M
 1PzMSfX7WsIDiUrUHpdvtZsoBniaMk0WS1EqVBRvEprqUXad1eHF19yuhtLxeUPH
 8xh/7o8kYzjqVJvs7blTT8DztxRDScWHeGKSVdwoEJupbCwc5R3qfGaD6PWyhI1x
 9R5Swsp64nLptTCwH7ZmgdJbC9IqN3cz1Nadd5v2Q2wr21KvZnj7zI2ijkPKGnZo
 kqYQHghqnDkGPFVjdls/RKUXGCQIXZFQb+FeuyCvpVlz9Ol8+DnTYtgFjjc6VcU1
 kApIqsE8V8GQEyzmm/OIAf6xtsA+mEUUh3Qds16KNCwBCRglC8I6v5IWnBc8PEJz
 a+wtwjx+tyKSSlAMEvcDNtrWVN+3JNrwCFG+Q+QjMrwLiAgACrtJZ6e6PmLj9sZv
 FPbZM8rzbzN7Zqd7XVNY37KHjqNs7zzDScAnATTUCKThp8ijDgtmG3ZhExmOPayc
 74aQLham4TBO
 =WSpr
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are mostly ARM cpufreq driver updates, including one new
  MediaTek driver that has just passed all of the reviews, with the
  addition of a revert of a recent intel_pstate commit, some core
  cpufreq changes and a DT-related update of the operating performance
  points (OPP) support code.

  Specifics:

   - Add new cpufreq driver for the MediaTek MT6779 platform called
     mediatek-hw along with corresponding DT bindings (Hector.Yuan).

   - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
     Gopinath).

   - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
     policy flag (Taniya Das).

   - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
     Andersson).

   - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
     flag (Viresh Kumar).

   - Add new cpufreq driver callback to allow drivers to register with
     the Energy Model in a consistent way and make several drivers use
     it (Viresh Kumar).

   - Change the remaining users of the .ready() cpufreq driver callback
     to move the code from it elsewhere and drop it from the cpufreq
     core (Viresh Kumar).

   - Revert recent intel_pstate change adding HWP guaranteed performance
     change notification support to it that led to problems, because the
     notification in question is triggered prematurely on some systems
     (Rafael Wysocki).

   - Convert the OPP DT bindings to DT schema and clean them up while at
     it (Rob Herring)"

* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
  cpufreq: mediatek-hw: Add support for CPUFREQ HW
  cpufreq: Add of_perf_domain_get_sharing_cpumask
  dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
  cpufreq: Remove ready() callback
  cpufreq: sh: Remove sh_cpufreq_cpu_ready()
  cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
  cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
  cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
  cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
  cpufreq: scmi: Use .register_em() to register with energy model
  cpufreq: vexpress: Use .register_em() to register with energy model
  cpufreq: scpi: Use .register_em() to register with energy model
  dt-bindings: opp: Convert to DT schema
  dt-bindings: Clean-up OPP binding node names in examples
  ARM: dts: omap: Drop references to opp.txt
  cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
  cpufreq: omap: Use .register_em() to register with energy model
  cpufreq: mediatek: Use .register_em() to register with energy model
  cpufreq: imx6q: Use .register_em() to register with energy model
  ...
2021-09-08 16:38:25 -07:00
Linus Torvalds
2d338201d5 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...
2021-09-08 12:55:35 -07:00
David Hildenbrand
3fcebf9020 mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow for hotplugged
KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
can have, primarily, because there is no coordiantion across memory
devices and we don't want to create zone-imbalances accidentially when
unplugging memory.

However, within a single memory device it's different.  Let's allow for
KERNEL memory within a dynamic memory group to allow for more MOVABLE
within the same memory group.  The only thing we have to take care of is
that the managing driver avoids zone imbalances by unplugging MOVABLE
memory first, otherwise there can be corner cases where unplug of memory
could result in (accidential) zone imbalances.

virtio-mem is the only user of dynamic memory groups and recently added
support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
don't need a new toggle to enable it for dynamic memory groups.

We limit this handling to dynamic memory groups, because:

* We want to keep the runtime overhead for collecting stats when
  onlining a single memory block small.  We tend to have only a handful of
  dynamic memory groups, but we can have quite some static memory groups
  (e.g., 256 DIMMs).

* It doesn't make too much sense for static memory groups, as we try
  onlining all applicable memory blocks either completely to ZONE_MOVABLE
  or not.  In ordinary operation, we won't have a mixture of zones within
  a static memory group.

When adding memory to a dynamic memory group, we'll first online memory to
ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
online the next unit(s) to ZONE_NORMAL, until we can online the next
unit(s) to ZONE_MOVABLE.

For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
result in a layout like:

  [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
  ^ movable memory due to early kernel memory
			   ^ allows for more movable memory ...
			      ^-----^ ... here
				       ^ allows for more movable memory ...
				          ^-----^ ... here

While the created layout is sub-optimal when it comes to contiguous zones,
it gives us the maximum flexibility when dynamically growing/shrinking a
device; we can grow small VMs really big in small steps, and still shrink
reliably to e.g., 1/4 of the maximum VM size in this example, removing
full memory blocks along with meta data more reliably.

Mark dynamic memory groups in the xarray such that we can efficiently
iterate over them when collecting stats.  In usual setups, we have one
virtio-mem device per NUMA node, and usually only a small number of NUMA
nodes.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
David Hildenbrand
445fcf7c72 mm/memory_hotplug: memory group aware "auto-movable" online policy
Use memory groups to improve our "auto-movable" onlining policy:

1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE
   only if all other memory blocks in the group are either MOVABLE or could
   be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture.

2. For dynamic memory groups (e.g., a virtio-mem device), online a
   memory block MOVABLE only if all other memory blocks inside the
   current unit are either MOVABLE or could be onlined MOVABLE. For a
   virtio-mem device with a device block size with 512 MiB, all 128 MiB
   memory blocks wihin a 512 MiB unit will either be MOVABLE or not, not
   a mixture.

We have to pass the memory group to zone_for_pfn_range() to take the
memory group into account.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
David Hildenbrand
836809ec75 mm/memory_hotplug: track present pages in memory groups
Let's track all present pages in each memory group.  Especially, track
memory present in ZONE_MOVABLE and memory present in one of the kernel
zones (which really only is ZONE_NORMAL right now as memory groups only
apply to hotplugged memory) separately within a memory group, to prepare
for making smart auto-online decision for individual memory blocks within
a memory group based on group statistics.

Link: https://lkml.kernel.org/r/20210806124715.17090-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
David Hildenbrand
028fc57a1c drivers/base/memory: introduce "memory groups" to logically group memory blocks
In our "auto-movable" memory onlining policy, we want to make decisions
across memory blocks of a single memory device.  Examples of memory
devices include ACPI memory devices (in the simplest case a single DIMM)
and virtio-mem.  For now, we don't have a connection between a single
memory block device and the real memory device.  Each memory device
consists of 1..X memory block devices.

Let's logically group memory blocks belonging to the same memory device in
"memory groups".  Memory groups can span multiple physical ranges and a
memory group itself does not contain any information regarding physical
ranges, only properties (e.g., "max_pages") necessary for improved memory
onlining.

Introduce two memory group types:

1) Static memory group: E.g., a single ACPI memory device, consisting
   of 1..X memory resources.  A memory group consists of 1..Y memory
   blocks.  The whole group is added/removed in one go.  If any part
   cannot get offlined, the whole group cannot be removed.

2) Dynamic memory group: E.g., a single virtio-mem device.  Memory is
   dynamically added/removed in a fixed granularity, called a "unit",
   consisting of 1..X memory blocks.  A unit is added/removed in one go.
   If any part of a unit cannot get offlined, the whole unit cannot be
   removed.

In case of 1) we usually want either all memory managed by ZONE_MOVABLE or
none.  In case of 2) we usually want to have as many units as possible
managed by ZONE_MOVABLE.  We want a single unit to be of the same type.

For now, memory groups are an internal concept that is not exposed to user
space; we might want to change that in the future, though.

add_memory() users can specify a mgid instead of a nid when passing the
MHP_NID_IS_MGID flag.

Link: https://lkml.kernel.org/r/20210806124715.17090-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
David Hildenbrand
4b09700244 mm: track present early pages per zone
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3.

I. Goal

The goal of this series is improving in-kernel auto-online support.  It
tackles the fundamental problems that:

 1) We can create zone imbalances when onlining all memory blindly to
    ZONE_MOVABLE, in the worst case crashing the system. We have to know
    upfront how much memory we are going to hotplug such that we can
    safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
    via "online_movable". This is far from practical and only applicable in
    limited setups -- like inside VMs under the RHV/oVirt hypervisor which
    will never hotplug more than 3 times the boot memory (and the
    limitation is only in place due to the Linux limitation).

 2) We see more setups that implement dynamic VM resizing, hot(un)plugging
    memory to resize VM memory. In these setups, we might hotplug a lot of
    memory, but it might happen in various small steps in both directions
    (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
    primary driver of this upstream right now, performing such dynamic
    resizing NUMA-aware via multiple virtio-mem devices.

    Onlining all hotplugged memory to ZONE_NORMAL means we basically have
    no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
    easily run into zone imbalances when growing a VM. We want a mixture,
    and we want as much memory as reasonable/configured in ZONE_MOVABLE.
    Details regarding zone imbalances can be found at [1].

 3) Memory devices consist of 1..X memory block devices, however, the
    kernel doesn't really track the relationship. Consequently, also user
    space has no idea. We want to make per-device decisions.

    As one example, for memory hotunplug it doesn't make sense to use a
    mixture of zones within a single DIMM: we want all MOVABLE if
    possible, otherwise all !MOVABLE, because any !MOVABLE part will easily
    block the whole DIMM from getting hotunplugged.

    As another example, virtio-mem operates on individual units that span
    1..X memory blocks. Similar to a DIMM, we want a unit to either be all
    MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however,
    all units of a virtio-mem device logically belong together and are
    managed (added/removed) by a single driver. We want as much memory of
    a virtio-mem device to be MOVABLE as possible.

 4) We want memory onlining to be done right from the kernel while adding
    memory, not triggered by user space via udev rules; for example, this
    is reqired for fast memory hotplug for drivers that add individual
    memory blocks, like virito-mem. We want a way to configure a policy in
    the kernel and avoid implementing advanced policies in user space.

The auto-onlining support we have in the kernel is not sufficient.  All we
have is a) online everything MOVABLE (online_movable) b) online everything
!MOVABLE (online_kernel) c) keep zones contiguous (online).  This series
allows configuring c) to mean instead "online movable if possible
according to the coniguration, driven by a maximum MOVABLE:KERNEL ratio"
-- a new onlining policy.

II. Approach

This series does 3 things:

 1) Introduces the "auto-movable" online policy that initially operates on
    individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
    to make a decision whether a memory block will be onlined to
    ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
    memory does not allow for more MOVABLE memory (details in the
    patches). CMA memory is treated like MOVABLE memory.

 2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
    groups and uses group information to make decisions in the
    "auto-movable" online policy across memory blocks of a single memory
    device (modeled as memory group). More details can be found in patch
    #3 or in the DIMM example below.

 3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
    allowing ZONE_NORMAL memory within a dynamic memory group to allow for
    more ZONE_MOVABLE memory within the same memory group. The target use
    case is dynamic VM resizing using virtio-mem. See the virtio-mem
    example below.

I remember that the basic idea of using a ratio to implement a policy in
the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I
lost the pointer to that discussion).

For me, the main use case is using it along with virtio-mem (and DIMMs /
ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the
amount of memory we can hotunplug reliably again if we might eventually
hotplug a lot of memory to a VM.

III. Target Usage

The target usage will be:

 1) Linux boots with "mhp_default_online_type=offline"

 2) User space (e.g., systemd unit) configures memory onlining (according
    to a config file and system properties), for example:
    * Setting memory_hotplug.online_policy=auto-movable
    * Setting memory_hotplug.auto_movable_ratio=301
    * Setting memory_hotplug.auto_movable_numa_aware=true

 3) User space enabled auto onlining via "echo online >
    /sys/devices/system/memory/auto_online_blocks"

 4) User space triggers manual onlining of all already-offline memory
    blocks (go over offline memory blocks and set them to "online")

IV. Example

For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
301% results in the following layout:
	Memory block 0-15:    DMA32   (early)
	Memory block 32-47:   Normal  (early)
	Memory block 48-79:   Movable (DIMM 0)
	Memory block 80-111:  Movable (DIMM 1)
	Memory block 112-143: Movable (DIMM 2)
	Memory block 144-275: Normal  (DIMM 3)
	Memory block 176-207: Normal  (DIMM 4)
	... all Normal
	(-> hotplugged Normal memory does not allow for more Movable memory)

For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
will result in the following layout:
	Memory block 0-15:    DMA32   (early)
	Memory block 32-47:   Normal  (early)
	Memory block 48-143:  Movable (virtio-mem, first 12 GiB)
	Memory block 144:     Normal  (virtio-mem, next 128 MiB)
	Memory block 145-147: Movable (virtio-mem, next 384 MiB)
	Memory block 148:     Normal  (virtio-mem, next 128 MiB)
	Memory block 149-151: Movable (virtio-mem, next 384 MiB)
	... Normal/Movable mixture as above
	(-> hotplugged Normal memory allows for more Movable memory within
	    the same device)

Which gives us maximum flexibility when dynamically growing/shrinking a
VM in smaller steps.

V. Doc Update

I'll update the memory-hotplug.rst documentation, once the overhaul [1] is
usptream. Until then, details can be found in patch #2.

VI. Future Work

 1) Use memory groups for ppc64 dlpar
 2) Being able to specify a portion of (early) kernel memory that will be
    excluded from the ratio. Like "128 MiB globally/per node" are excluded.

    This might be helpful when starting VMs with extremely small memory
    footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting
    the first hotplugged units getting onlined to ZONE_MOVABLE. One
    alternative would be a trigger to not consider ZONE_DMA memory
    in the ratio. We'll have to see if this is really rrequired.
 3) Indicate to user space that MOVABLE might be a bad idea -- especially
    relevant when memory ballooning without support for balloon compaction
    is active.

This patch (of 9):

For implementing a new memory onlining policy, which determines when to
online memory blocks to ZONE_MOVABLE semi-automatically, we need the
number of present early (boot) pages -- present pages excluding hotplugged
pages.  Let's track these pages per zone.

Pass a page instead of the zone to adjust_present_page_count(), similar as
adjust_managed_page_count() and derive the zone from the page.

It's worth noting that a memory block to be offlined/onlined is either
completely "early" or "not early".  add_memory() and friends can only add
complete memory blocks and we only online/offline complete (individual)
memory blocks.

Link: https://lkml.kernel.org/r/20210806124715.17090-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210806124715.17090-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
Mike Rapoport
859a85ddf9 mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE".

After recent updates to freeing unused parts of the memory map, no
architecture can have holes in the memory map within a pageblock.  This
makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration
option redundant.

The first patch removes them both in a mechanical way and the second patch
simplifies memory_hotplug::test_pages_in_a_zone() that had
pfn_valid_within() surrounded by more logic than simple if.

This patch (of 2):

After recent changes in freeing of the unused parts of the memory map and
rework of pfn_valid() in arm and arm64 there are no architectures that can
have holes in the memory map within a pageblock and so nothing can enable
CONFIG_HOLES_IN_ZONE which guards non trivial implementation of
pfn_valid_within().

With that, pfn_valid_within() is always hardwired to 1 and can be
completely removed.

Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE.

Link: https://lkml.kernel.org/r/20210713080035.7464-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210713080035.7464-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:22 -07:00
Rafael J. Wysocki
eabf9e616e Merge branch 'pm-cpufreq'
* pm-cpufreq:
  Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
  cpufreq: mediatek-hw: Add support for CPUFREQ HW
  cpufreq: Add of_perf_domain_get_sharing_cpumask
  dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
  cpufreq: Remove ready() callback
  cpufreq: sh: Remove sh_cpufreq_cpu_ready()
  cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
  cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
  cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
  cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
  cpufreq: scmi: Use .register_em() to register with energy model
  cpufreq: vexpress: Use .register_em() to register with energy model
  cpufreq: scpi: Use .register_em() to register with energy model
  cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
  cpufreq: omap: Use .register_em() to register with energy model
  cpufreq: mediatek: Use .register_em() to register with energy model
  cpufreq: imx6q: Use .register_em() to register with energy model
  cpufreq: dt: Use .register_em() to register with energy model
  cpufreq: Add callback to register with energy model
  cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
2021-09-08 16:42:02 +02:00
Prasad Sodagudi
4a9344cd0a PM: sleep: core: Avoid setting power.must_resume to false
There are variables(power.may_skip_resume and dev->power.must_resume)
and DPM_FLAG_MAY_SKIP_RESUME flags to control the resume of devices after
a system wide suspend transition.

Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows
its "noirq" and "early" resume callbacks to be skipped if the device
can be left in suspend after a system-wide transition into the working
state. PM core determines that the driver's "noirq" and "early" resume
callbacks should be skipped or not with dev_pm_skip_resume() function by
checking power.may_skip_resume variable.

power.must_resume variable is getting set to false in __device_suspend()
function without checking device's DPM_FLAG_MAY_SKIP_RESUME settings.
In problematic scenario, where all the devices in the suspend_late
stage are successful and some device can fail to suspend in
suspend_noirq phase. So some devices successfully suspended in suspend_late
stage are not getting chance to execute __device_suspend_noirq()
to set dev->power.must_resume variable to true and not getting
resumed in early_resume phase.

Add a check for device's DPM_FLAG_MAY_SKIP_RESUME flag before
setting power.must_resume variable in __device_suspend function.

Fixes: 6e176bf8d4 ("PM: sleep: core: Do not skip callbacks in the resume phase")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07 21:18:34 +02:00
Sergey Shtylyov
d216bfb4d7 PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()
This function has the 'irq' parameter which isn't ever used, so drop it.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07 21:10:09 +02:00
Linus Torvalds
3de18c865f Merge branch 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
 "A new feature called restricted DMA pools. It allows SWIOTLB to
  utilize per-device (or per-platform) allocated memory pools instead of
  using the global one.

  The first big user of this is ARM Confidential Computing where the
  memory for DMA operations can be set per platform"

* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
  swiotlb: use depends on for DMA_RESTRICTED_POOL
  of: restricted dma: Don't fail device probe on rmem init failure
  of: Move of_dma_set_restricted_buffer() into device.c
  powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
  s390/pv: fix the forcing of the swiotlb
  swiotlb: Free tbl memory in swiotlb_exit()
  swiotlb: Emit diagnostic in swiotlb_exit()
  swiotlb: Convert io_default_tlb_mem to static allocation
  of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
  swiotlb: add overflow checks to swiotlb_bounce
  swiotlb: fix implicit debugfs declarations
  of: Add plumbing for restricted DMA pool
  dt-bindings: of: Add restricted DMA pool
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  ...
2021-09-03 10:34:44 -07:00
Linus Torvalds
14726903c8 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
2021-09-03 10:08:28 -07:00
Mike Rapoport
a7259df767 memblock: make memblock_find_in_range method private
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.

memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.

Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.

This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().

Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr>			[riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:17 -07:00
Ohhoon Kwon
fc1f5e980a mm: sparse: pass section_nr to find_memory_block
With CONFIG_SPARSEMEM_EXTREME enabled, __section_nr() which converts
mem_section to section_nr could be costly since it iterates all section
roots to check if the given mem_section is in its range.

On the other hand, __nr_to_section() which converts section_nr to
mem_section can be done in O(1).

Let's pass section_nr instead of mem_section ptr to find_memory_block() in
order to reduce needless iterations.

Link: https://lkml.kernel.org/r/20210707150212.855-3-ohoono.kwon@samsung.com
Signed-off-by: Ohhoon Kwon <ohoono.kwon@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:14 -07:00
Linus Torvalds
a9c9a6f741 SCSI misc on 20210902
This series consists of the usual driver updates (ufs, qla2xxx,
 target, smartpqi, lpfc, mpt3sas).  The core change causing the most
 churn was replacing the command request field request with a macro,
 allowing us to offset map to it and remove the redundant field; the
 same was also done for the tag field.  The most impactful change is
 the final removal of scsi_ioctl, which has been deprecated for over a
 decade.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYTD/TiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdUkAQCjb3Ux
 4K9438mMelHlzM4er1S1IJ0WNnvObaVMNO9LBwD+JUz+rHsrKvuEX9j3g3C3u6JH
 hC3BUEW8f2LLnujWanQ=
 =lC5o
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, qla2xxx,
  target, smartpqi, lpfc, mpt3sas).

  The core change causing the most churn was replacing the command
  request field request with a macro, allowing us to offset map to it
  and remove the redundant field; the same was also done for the tag
  field.

  The most impactful change is the final removal of scsi_ioctl, which
  has been deprecated for over a decade"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
  scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
  scsi: ufs: ufs-exynos: Fix static checker warning
  scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Copyright updates for 14.0.0.1 patches
  scsi: lpfc: Update lpfc version to 14.0.0.1
  scsi: lpfc: Add bsg support for retrieving adapter cmf data
  scsi: lpfc: Add cmf_info sysfs entry
  scsi: lpfc: Add debugfs support for cm framework buffers
  scsi: lpfc: Add support for maintaining the cm statistics buffer
  scsi: lpfc: Add rx monitoring statistics
  scsi: lpfc: Add support for the CM framework
  scsi: lpfc: Add cmfsync WQE support
  scsi: lpfc: Add support for cm enablement buffer
  scsi: lpfc: Add cm statistics buffer support
  scsi: lpfc: Add EDC ELS support
  scsi: lpfc: Expand FPIN and RDF receive logging
  scsi: lpfc: Add MIB feature enablement support
  scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
  scsi: fc: Add EDC ELS definition
  ...
2021-09-02 15:09:46 -07:00
Linus Torvalds
75d6e7d9ce Nothing changed in the clk framework core this time around. We did get
some updates to the basic clk types to use determine_rate for the
 divider type and add a power of two fractional divider flag though.
 Otherwise, this is a collection of clk driver updates. More than half
 the diffstat is in the Qualcomm clk driver where we add a bunch of data
 to describe clks on various SoCs and fix bugs. The other big new thing
 in here is the Mediatek MT8192 clk driver. That's been under review for
 a while and it's nice to see that it's finally upstream.
 
 Beyond that it's the usual set of minor fixes and tweaks to clk drivers.
 There are some non-clk driver bits in here which have all been acked by
 the respective maintainers.
 
 New Drivers:
  - Support video, gpu, display clks on qcom sc7280 SoCs
  - GCC clks on qcom MSM8953, SM4250/6115, and SM6350 SoCs
  - Multimedia clks (MMCC) on qcom MSM8994/MSM8992
  - RPMh clks on qcom SM6350 SoCs
  - Support for Mediatek MT8192 SoCs
  - Add display (DU and DSI) clocks on Renesas R-Car V3U
  - Add I2C, DMAC, USB, sound (SSIF-2), GPIO, CANFD, and ADC clocks and
    resets on Renesas RZ/G2L
 
 Updates:
  - Support the SD/OE pin on IDT VersaClock 5 and 6 clock generators
  - Add power of two flag to fractional divider clk type
  - Migrate some clk drivers to clk_divider_ops.determine_rate
  - Migrate to clk_parent_data in gcc-sdm660
  - Fix CLKOUT clocks on i.MX8MM and i.MX8MN by using imx_clk_hw_mux2
  - Switch from .round_rate to .determine_rate in clk-divider-gate
  - Fix clock tree update for TF-A controlled clocks for all i.MX8M
  - Add missing M7 core clock for i.MX8MN
  - YAML conversion of rk3399 clock controller binding
  - Removal of GRF dependency for the rk3328/rk3036 pll types
  - Drop CLK_IS_CRITICAL flag from Tegra fuse clk
  - Make CLK_R9A06G032 Kconfig symbol invisible
  - Convert various DT bindings to YAML
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmExEooRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSXXBhAAvhHm4fcm3fRjNdfImd+jDEl8XSvg+w43
 adSnmVxbYM6ZVNOiJ4CJWHbj0hOY/PJnsQYWbV0xXvXW+zXva6p495MMHHOGSi2o
 lMgZVMvj5UAwu304ZC9Xfn31dwo8XdGrltp4JqIcI2NEBMh1/PlZW22esT+jDiWN
 3SWFD3M7lu88xTREyiEu11FY3z/KiGzbGlqYcbivx1X0sHVnBRbl4qcqZway+BmQ
 95Ma4YWwhvDGYc+ypKH2EPxs/LikHXj05nMooigy65DOQ5wrM4L1eWkwmVUf6h+e
 t4x7sAVysLnkihzdH5r2pw6CcAIom76v8w0+maSfk+jINUu1LeGVuat1eXSesFTu
 49o+uTKRghkUe/Qh6r+7lbo8AZXQq+wUsLTYRuaWT/mSb+svAtJaUWAru8tJnMlH
 oK6OehcQwz4nGhH0HnBK1jCVdtgckxPBw8F/GYN9rYhsccIe0XmFjX1rzMM3s8De
 PLl6QO7Xzd+xb/FwAU8+S1WpKFdPU6ILTUnI2Ma3Mn/gfjZEZHvWAdTjo4oZGEsw
 +N4n924ArptbeSLRrlNUtqx4BVDL5yo54xS5gefNpmD5yezO7aoUtN0aGcBq+01p
 Qw0N5hKtcdsNYLBEFSvBGcZZmErMZbPwMXHWiUwNymXBDzJKgj5d+ks+1vJ3iCNW
 R5r9hvATJPQ=
 =Rrqg
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "Nothing changed in the clk framework core this time around. We did get
  some updates to the basic clk types to use determine_rate for the
  divider type and add a power of two fractional divider flag though.

  Otherwise, this is a collection of clk driver updates. More than half
  the diffstat is in the Qualcomm clk driver where we add a bunch of
  data to describe clks on various SoCs and fix bugs. The other big new
  thing in here is the Mediatek MT8192 clk driver. That's been under
  review for a while and it's nice to see that it's finally upstream.

  Beyond that it's the usual set of minor fixes and tweaks to clk
  drivers. There are some non-clk driver bits in here which have all
  been acked by the respective maintainers.

  New Drivers:
   - Support video, gpu, display clks on qcom sc7280 SoCs
   - GCC clks on qcom MSM8953, SM4250/6115, and SM6350 SoCs
   - Multimedia clks (MMCC) on qcom MSM8994/MSM8992
   - RPMh clks on qcom SM6350 SoCs
   - Support for Mediatek MT8192 SoCs
   - Add display (DU and DSI) clocks on Renesas R-Car V3U
   - Add I2C, DMAC, USB, sound (SSIF-2), GPIO, CANFD, and ADC clocks and
     resets on Renesas RZ/G2L

  Updates:
   - Support the SD/OE pin on IDT VersaClock 5 and 6 clock generators
   - Add power of two flag to fractional divider clk type
   - Migrate some clk drivers to clk_divider_ops.determine_rate
   - Migrate to clk_parent_data in gcc-sdm660
   - Fix CLKOUT clocks on i.MX8MM and i.MX8MN by using imx_clk_hw_mux2
   - Switch from .round_rate to .determine_rate in clk-divider-gate
   - Fix clock tree update for TF-A controlled clocks for all i.MX8M
   - Add missing M7 core clock for i.MX8MN
   - YAML conversion of rk3399 clock controller binding
   - Removal of GRF dependency for the rk3328/rk3036 pll types
   - Drop CLK_IS_CRITICAL flag from Tegra fuse clk
   - Make CLK_R9A06G032 Kconfig symbol invisible
   - Convert various DT bindings to YAML"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (128 commits)
  dt-bindings: clock: samsung: fix header path in example
  clk: tegra: fix old-style declaration
  clk: qcom: Add SM6350 GCC driver
  MAINTAINERS: clock: include S3C and S5P in Samsung SoC clock entry
  dt-bindings: clock: samsung: convert S5Pv210 AudSS to dtschema
  dt-bindings: clock: samsung: convert Exynos AudSS to dtschema
  dt-bindings: clock: samsung: convert Exynos4 to dtschema
  dt-bindings: clock: samsung: convert Exynos3250 to dtschema
  dt-bindings: clock: samsung: convert Exynos542x to dtschema
  dt-bindings: clock: samsung: add bindings for Exynos external clock
  dt-bindings: clock: samsung: convert Exynos5250 to dtschema
  clk: vc5: Add properties for configuring SD/OE behavior
  clk: vc5: Use dev_err_probe
  dt-bindings: clk: vc5: Add properties for configuring the SD/OE pin
  dt-bindings: clock: brcm,iproc-clocks: fix armpll properties
  clk: zynqmp: Fix kernel-doc format
  clk: at91: clk-generated: Limit the requested rate to our range
  clk: ralink: avoid to set 'CLK_IS_CRITICAL' flag for gates
  clk: zynqmp: Fix a memory leak
  clk: zynqmp: Check the return type
  ...
2021-09-02 14:17:24 -07:00
Linus Torvalds
df43d90382 printk changes for 5.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmEt+hwACgkQUqAMR0iA
 lPLppBAAiyrUNVmqqtdww+IJajEs1uD/4FqPsysHRwroHBFymJeQG1XCwUpDZ7jj
 6gXT0chxyjQE18gT/W9nf+PSmA9XvIVA1WSR+WCECTNW3YoZXqtgwiHfgnitXYku
 HlmoZLthYeuoXWw2wn+hVLfTRh6VcPHYEaC21jXrs6B1pOXHbvjJ5eTLHlX9oCfL
 UKSK+jFTHAJcn/GskRzviBe0Hpe8fqnkRol2XX13ltxqtQ73MjaGNu7imEH6/Pa7
 /MHXWtuWJtOvuYz17aztQP4Qwh1xy+kakMy3aHucdlxRBTP4PTzzTuQI3L/RYi6l
 +ttD7OHdRwqFAauBLY3bq3uJjYb5v/64ofd8DNnT2CJvtznY8wrPbTdFoSdPcL2Q
 69/opRWHcUwbU/Gt4WLtyQf3Mk0vepgMbbVg1B5SSy55atRZaXMrA2QJ/JeawZTB
 KK6D/mE7ccze/YFzsySunCUVKCm0veoNxEAcakCCZKXSbsvd1MYcIRC0e+2cv6e5
 2NEH7gL4dD+5tqu5nzvIuKDn3NrDQpbi28iUBoFbkxRgcVyvHJ9AGSa62wtb5h3D
 OgkqQMdVKBbjYNeUodPlQPzmXZDasytavyd0/BC/KENOcBvU/8gW++2UZTfsh/1A
 dLjgwFBdyJncQcCS9Abn20/EKntbIMEX8NLa97XWkA3fuzMKtak=
 =yEVq
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Optionally, provide an index of possible printk messages via
   <debugfs>/printk/index/. It can be used when monitoring important
   kernel messages on a farm of various hosts. The monitor has to be
   updated when some messages has changed or are not longer available by
   a newly deployed kernel.

 - Add printk.console_no_auto_verbose boot parameter. It allows to
   generate crash dump even with slow consoles in a reasonable time
   frame.

 - Remove printk_safe buffers. The messages are always stored directly
   to the main logbuffer, even in NMI or recursive context. Also it
   allows to serialize syslog operations by a mutex instead of a spin
   lock.

 - Misc clean up and build fixes.

* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/index: Fix -Wunused-function warning
  lib/nmi_backtrace: Serialize even messages about idle CPUs
  printk: Add printk.console_no_auto_verbose boot parameter
  printk: Remove console_silent()
  lib/test_scanf: Handle n_bits == 0 in random tests
  printk: syslog: close window between wait and read
  printk: convert @syslog_lock to mutex
  printk: remove NMI tracking
  printk: remove safe buffers
  printk: track/limit recursion
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: Move the printk() kerneldoc comment to its new home
  printk/index: Fix warning about missing prototypes
  MIPS/asm/printk: Fix build failure caused by printk
  printk: index: Add indexing support to dev_printk
  printk: Userspace format indexing support
  printk: Rework parse_prefix into printk_parse_prefix
  printk: Straighten out log_flags into printk_info_flags
  string_helpers: Escape double quotes in escape_special
  printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01 18:41:13 -07:00
Linus Torvalds
c6c3c5704b Driver core update for 5.15-rc1
Here is the big set of driver core patches for 5.15-rc1.
 
 These do change a number of different things across different
 subsystems, and because of that, there were 2 stable tags created that
 might have already come into your tree from different pulls that did the
 following
 	- changed the bus remove callback to return void
 	- sysfs iomem_get_mapping rework
 
 The latter one will cause a tiny merge issue with your tree, as there
 was a last-minute fix for this in 5.14 in your tree, but the fixup
 should be "obvious".  If you want me to provide a fixed merge for this,
 please let me know.
 
 Other than those two things, there's only a few small things in here:
 	- kernfs performance improvements for huge numbers of sysfs
 	  users at once
 	- tiny api cleanups
 	- other minor changes
 
 All of these have been in linux-next for a while with no reported
 problems, other than the before-mentioned merge issue.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYS+FLQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylXuACfWECnysDtXNe66DdETCFs1a1RToYAoMokWeU5
 s8VFP1NY2BjmxJbkebLL
 =8kVu
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core patches for 5.15-rc1.

  These do change a number of different things across different
  subsystems, and because of that, there were 2 stable tags created that
  might have already come into your tree from different pulls that did
  the following

   - changed the bus remove callback to return void

   - sysfs iomem_get_mapping rework

  Other than those two things, there's only a few small things in here:

   - kernfs performance improvements for huge numbers of sysfs users at
     once

   - tiny api cleanups

   - other minor changes

  All of these have been in linux-next for a while with no reported
  problems, other than the before-mentioned merge issue"

* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
  MAINTAINERS: Add dri-devel for component.[hc]
  driver core: platform: Remove platform_device_add_properties()
  ARM: tegra: paz00: Handle device properties with software node API
  bitmap: extend comment to bitmap_print_bitmask/list_to_buf
  drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
  topology: use bin_attribute to break the size limitation of cpumap ABI
  lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
  cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
  sysfs: Rename struct bin_attribute member to f_mapping
  sysfs: Invoke iomem_get_mapping() from the sysfs open callback
  debugfs: Return error during {full/open}_proxy_open() on rmmod
  zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
  zorro: Simplify remove callback
  sh: superhyway: Simplify check in remove callback
  nubus: Simplify check in remove callback
  nubus: Make struct nubus_driver::remove return void
  kernfs: dont call d_splice_alias() under kernfs node lock
  kernfs: use i_lock to protect concurrent inode updates
  kernfs: switch kernfs to use an rwsem
  kernfs: use VFS negative dentry caching
  ...
2021-09-01 08:44:42 -07:00
Linus Torvalds
8e235ff9a1 Device properties framework updates for 5.15-rc1
Improve secondary firmware nodes handling in
 fwnode_graph_get_next_endpoint() (Daniel Scally).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEtI+0SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxcL4QAKNMT3O4IJp41NEghT5k3amkVpJsXy91
 RNKGNJM4ov178VoXWBj7ZWtzpdxEnh1R0idnTuR9BG1o3UbXAUnOrXHlS//uWzv7
 0PCR+pa/jKSWmyH0I8sjVpz3I2z15DUSIY9isIPrLF8mS8Ii+whw5BJDofklweA6
 EmEz52Gm1lP6gEF/6zsto0yJggHwWqhXtUpg8P1YB7NKN7AQ8CXYzthHcmQ+c7dP
 jRvCVtRY5f4whvgkqfmcDzQsIe96CphTw1tdvO8iaDWIfwzbcV8UjMrHSgDHw5QD
 BrrwqGKdO++bYSx8SJR3d6BMbN5UOkl/bTdWgIEaueiI3QFc7mrXf7W9uSA5CJKa
 FFFyLa8osnuLcQ7AwNxYxJ1KWVSWPdAWi8ihKhfYkChLCwlSUjysk9s/gYvt2P+y
 YDwybZwCO/0OFPi7YM2Dl/rBUlXT4Jz6Dcpvjj58Ram81Vq6hlC0g0IPHWigjbzv
 ox2nFcnXL35nsYnuFWpY2azPZX9UpDCPruIKmzP9y369QtcTkPTCAM1WMJya5Eqk
 2/Pp/Wa8C6/obeoOatIe4AVpNEPhcFbxixulJoai2MGGf80GKwY+Ec7mKZ3NZA/0
 j/PZodWwF0dm2MzjGZkykfGhlqdJfvh8lWBehv5N3Fk+c30AhwrSXquUaOomwYOn
 fkgxL1qChNl9
 =pc2K
 -----END PGP SIGNATURE-----

Merge tag 'devprop-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull device properties framework updates from Rafael Wysocki:
 "These improve the handling of secondary firmware nodes in
  fwnode_graph_get_next_endpoint() (Daniel Scally)"

* tag 'devprop-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "media: device property: Call fwnode_graph_get_endpoint_by_id() for fwnode->secondary"
  device property: Check fwnode->secondary in fwnode_graph_get_next_endpoint()
2021-08-31 13:36:01 -07:00
Linus Torvalds
6f1e8b12ee ACPI updates for 5.15-rc1
- Update ACPICA code in the kernel to upstream revision 20210730
    including the following changes:
    * Add support for the AEST table (data compiler) to iASL (Bob
      Moore).
    * Fix an if statement (add parens) (Bob Moore).
    * Drop trailing semicolon from some macros (Bob Moore).
    * Fix compilation of WPBT table with no command-line arguments
      in iASL (Bob Moore).
    * Add method name "_DIS" for use with aslmethod.c (Bob Moore).
    * Add new DBG2 Serial Port Subtypes (Marcin Wojtas).
 
  - Add new PCH FIVR methods to the DPTF code (Srinivas Pandruvada).
 
  - Add support for the new 16550-compatible Serial Port Subtype to
    the SPCR table parsing code (Marcin Wojtas).
 
  - Add DMI quirk for Lenovo Yoga 9 (14INTL5) to the ACPI button
    driver (Ulrich Huber).
 
  - Add LoongArch support for ACPI_PROCESSOR/ACPI_NUMA (Huacai Chen).
 
  - Add memory semantics to acpi_os_map_memory() (Lorenzo Pieralisi).
 
  - Replace deprecated CPU-hotplug functions in the ACPI processor
    driver (Sebastian Andrzej Siewior).
 
  - Optimize I2C-bus handling in the XPower PMIC driver (Hans de Goede).
 
  - Make platform-profile catch profile changes initiated by user space
    and notify user processes of them (Hans de Goede).
 
  - Clean up the ACPI companion binding and unbinding code and update
    debug messaging in the ACPI power resources code (Rafael Wysocki).
 
  - Clean up a couple of code pieces related to configfs (Andy
    Shevchenko).
 
  - Rearrange the FPDT table parsing code to avoid printing warning
    messages for reserved record types (Adrian Huang).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEtI2kSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxVXIP/jfi52N6F/PngBXR0f2CG1w9z02vHNej
 ylf0wruJ52MlVVBzizHHTTh/OtHnAziWsFzVieMkPCc1+xZXIORbGuoEEZw3E+Pz
 MUL2QjwLcSYcSqmC1D/aU51aFZLo/26R9ODAMNNzIFqMIbWq9sKCWliQXPKI+/f9
 0zuiKYx3alVGEHU1Gl+qzIppnXBdyeI+irDM7mCA5W4anlmCj1tn36yK4deatx5f
 NiwHpuC71ddVKHlI/UICmtIBXBCTULKYuqcHN28E1Vhn/4ieXxEmIrFoKeMd6Zhe
 hTejCyejwp+vvoqRl4UPmIkC5KPUbTmpsrNWvzvOQyssZq0sorg7IAu/kM9ePJZD
 VnaKT1JBWhACp4XhqfqvI8UoES9C8a39q2nXrGwLUy8/3x+F2EsOn/Awl6KHGu5f
 HuVCYoQFPY0OHjz6CAwsw0iuL1Qcj4bf/ixm89bBCQmBEyX5WhpD+gEQB1YnjYYm
 qctzqz60mBF7RDyGqIGWirOfgkbriJ8QnTxkdv1SYfJiOu5V0vg7d22ESOX6YPze
 PmF3OWC4YOcQHsHKuMB8z3X9GW+cP7pohmcFhdaFQ8g1cqqEhkjCtcC9jSTSktuY
 ck+0uy5R8gU/OkVcXWznQlVa26wdfa4VcVNSY6JR6Xy/v5a5AgkQPGbN6x92jXpS
 75h9WtL17/mO
 =YRjN
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA kernel code to upstream revision 20210730,
  clean up the ACPI companion binding code, optimize the I2C handling in
  the XPower PMIC driver, add 16550-compatible Serial Port Subtype
  support to the SPCR parsing code, add a few LoongArch support bits,
  add a ne quirk to the button driver, add new PCH FIVR methods to the
  DPTF code, replace deprecated CPU-hotplug functions in the processor
  driver, improve the acpi_os_map_memory() handling on non-x86 and do
  some assorted cleanups.

  Specifics:

   - Update ACPICA code in the kernel to upstream revision 20210730
     including the following changes:
       - Add support for the AEST table (data compiler) to iASL (Bob
         Moore)
       - Fix an if statement (add parens) (Bob Moore)
       - Drop trailing semicolon from some macros (Bob Moore)
       - Fix compilation of WPBT table with no command-line arguments in
         iASL (Bob Moore)
       - Add method name "_DIS" for use with aslmethod.c (Bob Moore)
       - Add new DBG2 Serial Port Subtypes (Marcin Wojtas)

   - Add new PCH FIVR methods to the DPTF code (Srinivas Pandruvada)

   - Add support for the new 16550-compatible Serial Port Subtype to the
     SPCR table parsing code (Marcin Wojtas)

   - Add DMI quirk for Lenovo Yoga 9 (14INTL5) to the ACPI button driver
     (Ulrich Huber)

   - Add LoongArch support for ACPI_PROCESSOR/ACPI_NUMA (Huacai Chen)

   - Add memory semantics to acpi_os_map_memory() (Lorenzo Pieralisi)

   - Replace deprecated CPU-hotplug functions in the ACPI processor
     driver (Sebastian Andrzej Siewior)

   - Optimize I2C-bus handling in the XPower PMIC driver (Hans de Goede)

   - Make platform-profile catch profile changes initiated by user space
     and notify user processes of them (Hans de Goede)

   - Clean up the ACPI companion binding and unbinding code and update
     debug messaging in the ACPI power resources code (Rafael Wysocki)

   - Clean up a couple of code pieces related to configfs (Andy
     Shevchenko)

   - Rearrange the FPDT table parsing code to avoid printing warning
     messages for reserved record types (Adrian Huang)"

* tag 'acpi-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
  ACPI: power: Drop name from struct acpi_power_resource
  ACPI: power: Use acpi_handle_debug() to print debug messages
  ACPI: tables: FPDT: Do not print FW_BUG message if record types are reserved
  ACPI: button: Add DMI quirk for Lenovo Yoga 9 (14INTL5)
  ACPI: Add memory semantics to acpi_os_map_memory()
  ACPI: SPCR: Add support for the new 16550-compatible Serial Port Subtype
  ACPI: platform-profile: call sysfs_notify() from platform_profile_store()
  ACPICA: Update version to 20210730
  ACPICA: Add method name "_DIS" For use with aslmethod.c
  ACPICA: iASL: Fix for WPBT table with no command-line arguments
  ACPICA: Headers: Add new DBG2 Serial Port Subtypes
  ACPICA: Macros should not use a trailing semicolon
  ACPICA: Fix an if statement (add parens)
  ACPICA: iASL: Add support for the AEST table (data compiler)
  ACPI: processor: Replace deprecated CPU-hotplug functions
  ACPI: DPTF: Add new PCH FIVR methods
  ACPI: configfs: Make get_header() to return error pointer
  ACPI: configfs: Use sysfs_emit() in "show" functions
  driver core: Split device_platform_notify()
  software nodes: Split software_node_notify()
  ...
2021-08-31 13:29:22 -07:00
Linus Torvalds
5cbba60596 Power management updates for 5.15-rc1
- Address 3 PCI device power management issues (Rafael Wysocki).
 
  - Add Power Limit4 support for Alder Lake to the Intel RAPL power
    capping driver (Sumeet Pawnikar).
 
  - Add HWP guaranteed performance change notification support to
    the intel_pstate driver (Srinivas Pandruvada).
 
  - Replace deprecated CPU-hotplug functions in code related to power
    management (Sebastian Andrzej Siewior).
 
  - Update CPU PM notifiers to use raw spinlocks (Valentin Schneider).
 
  - Add support for 'required-opps' DT property to the generic power
    domains (genpd) framework and use this property for I2C on ARM64
    sc7180 (Rajendra Nayak).
 
  - Fix Kconfig issue related to genpd (Geert Uytterhoeven).
 
  - Increase energy calculation precision in the Energy Model (Lukasz
    Luba).
 
  - Fix kobject deletion in the exit code of the schedutil cpufreq
    governor (Kevin Hao).
 
  - Unmark some functions as kernel-doc in the PM core to avoid
    false-positive documentation build warnings (Randy Dunlap).
 
  - Check RTC features instead of ops in suspend_test Alexandre
    Belloni).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEtIvYSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxnQ8QAK2QCyfFAdrPE3zn+XdTcVC8rpYhL0d2
 3YpbS2obZ4lOHsrNy7SrrU5NdgvCFPHl14Py5rxu4QZW8EF5km6SqJYj9MhrDsEm
 wJ/ZM3Zbaj98MAls7pulHxabMo8hvGfMSmcNOFfgh7yqCm5eu8gNnt/LbLaB9IvI
 v46ZGNvnpbtkWgk4Eaq+v3Y5/+4CoUfuAFn7K21SHNcgzDbhE3Ii6cam/rwPeCdn
 a+LcDLzeDQpnnYKukx7Zyk7Z+FblXWHWZ/ClLcR8JgYYSrbrXxSLtRdM5EtLje2J
 ZnqFKqmlMcq2OhTjBSoHSJjiOkxyz5eWWrEf7d1/oH19WSW+rv5VfZUXAkYtfGKo
 OJChuIomrLNqJhAwTnxG3fnVh2NMhLwDVEjkFgvej4shCZXhoVgWu39mWnbyTxV3
 57adiuVUv6AboH6Lyvi/yzV7sWmZbL/U/YVvt+Z9SEh0syy6YBb87bW3Mr1qnLeG
 QF4AmF553qzHwd9uxUoFfiUoBxH1GvNmOWtx6lYKhy0lm3AwvW1XHKqsDG4KbVJg
 zzok7J2yv1/1rd8dJ+8tCwyyC3WJjNjtq/ULOltugwQpqVK4PCwWAUmjJaTP3Gpp
 ZH/bGuLSQWxxiinwKCkFSxwbewJu+AvtoXYKBfk/Gf5O6GyUyryBbBMO9dsqP27A
 Zg3P4+ejS7RN
 =sx6G
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These address some PCI device power management issues, add new
  hardware support to the RAPL power capping driver, add HWP guaranteed
  performance change notification support to the intel_pstate driver,
  replace deprecated CPU-hotplug functions in a few places, update CPU
  PM notifiers to use raw spinlocks, update the PM domains framework
  (new DT property support, Kconfig fix), do a couple of cleanups in
  code related to system sleep, and improve the energy model and the
  schedutil cpufreq governor.

  Specifics:

   - Address 3 PCI device power management issues (Rafael Wysocki).

   - Add Power Limit4 support for Alder Lake to the Intel RAPL power
     capping driver (Sumeet Pawnikar).

   - Add HWP guaranteed performance change notification support to the
     intel_pstate driver (Srinivas Pandruvada).

   - Replace deprecated CPU-hotplug functions in code related to power
     management (Sebastian Andrzej Siewior).

   - Update CPU PM notifiers to use raw spinlocks (Valentin Schneider).

   - Add support for 'required-opps' DT property to the generic power
     domains (genpd) framework and use this property for I2C on ARM64
     sc7180 (Rajendra Nayak).

   - Fix Kconfig issue related to genpd (Geert Uytterhoeven).

   - Increase energy calculation precision in the Energy Model (Lukasz
     Luba).

   - Fix kobject deletion in the exit code of the schedutil cpufreq
     governor (Kevin Hao).

   - Unmark some functions as kernel-doc in the PM core to avoid
     false-positive documentation build warnings (Randy Dunlap).

   - Check RTC features instead of ops in suspend_test Alexandre
     Belloni)"

* tag 'pm-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: domains: Fix domain attach for CONFIG_PM_OPP=n
  powercap: Add Power Limit4 support for Alder Lake SoC
  cpufreq: intel_pstate: Process HWP Guaranteed change notification
  thermal: intel: Allow processing of HWP interrupt
  notifier: Remove atomic_notifier_call_chain_robust()
  PM: cpu: Make notifier chain use a raw_spinlock_t
  PM: sleep: unmark 'state' functions as kernel-doc
  arm64: dts: sc7180: Add required-opps for i2c
  PM: domains: Add support for 'required-opps' to set default perf state
  opp: Don't print an error if required-opps is missing
  cpufreq: schedutil: Use kobject release() method to free sugov_tunables
  PM: EM: Increase energy calculation precision
  PM: sleep: check RTC features instead of ops in suspend_test
  PM: sleep: s2idle: Replace deprecated CPU-hotplug functions
  cpufreq: Replace deprecated CPU-hotplug functions
  powercap: intel_rapl: Replace deprecated CPU-hotplug functions
  PCI: PM: Enable PME if it can be signaled from D3cold
  PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
  PCI: Use pci_update_current_state() in pci_enable_device_flags()
2021-08-31 13:21:58 -07:00
Rafael J. Wysocki
b2a6181e27 Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq driver changes for v5.15 from Viresh Kumar:

"This contains:

 - Update cpufreq-dt blocklist with more platforms (Bjorn Andersson).

 - Allow freq changes from any CPU for qcom-hw driver (Taniya Das).

 - Add DSVS interrupt's support for qcom-hw driver (Thara Gopinath).

 - A new callback (->register_em()) to register EM at a more convenient
   point of time."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
  cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
  cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
  cpufreq: scmi: Use .register_em() to register with energy model
  cpufreq: vexpress: Use .register_em() to register with energy model
  cpufreq: scpi: Use .register_em() to register with energy model
  cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
  cpufreq: omap: Use .register_em() to register with energy model
  cpufreq: mediatek: Use .register_em() to register with energy model
  cpufreq: imx6q: Use .register_em() to register with energy model
  cpufreq: dt: Use .register_em() to register with energy model
  cpufreq: Add callback to register with energy model
  cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
2021-08-31 14:02:16 +02:00
Linus Torvalds
7d6e3fa87e Updates to the interrupt core and driver subsystems:
Core changes:
 
    - The usual set of small fixes and improvements all over the place, but nothing
      outstanding
 
 MSI changes:
 
    - Further consolidation of the PCI/MSI interrupt chip code
 
    - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts
      of platform devices in the same way as PCI exposes them.
 
 Driver changes:
 
    - Support for ARM GICv3 EPPI partitions
 
    - Treewide conversion to generic_handle_domain_irq() for all chained
      interrupt controllers
 
    - Conversion to bitmap_zalloc() throughout the irq chip drivers
 
    - The usual set of small fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsnpsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoS+/EACQdpRkzl3IDIYqThxVZ8KQzp2rKKVn
 qisAQiWg/6koNJx/yYy62KNAUyKjCIObNtRnWi7OAOx6OvNtQTD2WOLAwkh3Pgw1
 8ePYYl55k+yCs8VoITsZM9jYeO+Tk878pU2A6R943zR+g6G7bskGJrxEyZ9TbzIe
 qKfusNKnRY9/jMQaRALUAAtA9VIVR867GqORX5X8hKz8yE2rqlpb4y+1CFba5BTV
 Vlxw7cIXvXBn7BKAom5diRqEGDNJEbX+56jJ7yDZshgLo7m11D7QLw72kmb6TNVC
 g7PchvFi4afpc1ifEAAp0tk4RiSIAQ91nS3n0+jLcLbodOjIkl14eY02ZCJGAP29
 uslyzUbmy1wgejG6CA63JtZ4MYdrf/OSMGuoN78qnOKYcIsWFzOvlJmBWWNW34qW
 LCaUF9QdJ/slXu6B4vIx30GfN9q4myml8bFUobE5q9mBRrEk4R0B7iyBvPu1xKYr
 ZEan67prI5VEu+afJGpp4r294m4HNVkMLfl3nYmE5+y4MoLeMNKDY3IPTvI9iP4G
 kaFgoPvQo23WnuclNYpJ+CaA4aRASlB2nTY+oAXIYfehbey9EW5vq4/EK864ek6w
 oyUTepxxNhE81tG2jpQbf2tR4COsEHy986clxqPP4AvsZXcbypCw8O2FcflpQbHO
 5DLEAfTmp7cziQ==
 =qyll
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates to the interrupt core and driver subsystems:

  Core changes:

   - The usual set of small fixes and improvements all over the place,
     but nothing stands out

  MSI changes:

   - Further consolidation of the PCI/MSI interrupt chip code

   - Make MSI sysfs code independent of PCI/MSI and expose the MSI
     interrupts of platform devices in the same way as PCI exposes them.

  Driver changes:

   - Support for ARM GICv3 EPPI partitions

   - Treewide conversion to generic_handle_domain_irq() for all chained
     interrupt controllers

   - Conversion to bitmap_zalloc() throughout the irq chip drivers

   - The usual set of small fixes and improvements"

* tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  platform-msi: Add ABI to show msi_irqs of platform devices
  genirq/msi: Move MSI sysfs handling from PCI to MSI core
  genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
  irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy
  irqdomain: Export irq_domain_disconnect_hierarchy()
  irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
  irqchip/apple-aic: Fix irq_disable from within irq handlers
  pinctrl/rockchip: drop the gpio related codes
  gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type
  gpio/rockchip: support next version gpio controller
  gpio/rockchip: use struct rockchip_gpio_regs for gpio controller
  gpio/rockchip: add driver for rockchip gpio
  dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank
  pinctrl/rockchip: add pinctrl device to gpio bank struct
  pinctrl/rockchip: separate struct rockchip_pin_bank to a head file
  pinctrl/rockchip: always enable clock for gpio controller
  genirq: Fix kernel doc indentation
  EDAC/altera: Convert to generic_handle_domain_irq()
  powerpc: Bulk conversion to generic_handle_domain_irq()
  nios2: Bulk conversion to generic_handle_domain_irq()
  ...
2021-08-30 14:38:37 -07:00
Linus Torvalds
4aed6ee53f regmap: Updates for v5.15
A few small fixes for regmaps this time, plus support for
 allowing drivers to select raw spinlocks for the locks in order
 to allow usage in interrutpt controllers.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmEsyXYTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0DigB/92NOc4pBVTGIRKL+zaM2YMVaH2wCzp
 +9B6A+W/9ZZEjaHPNHKeQJjcSGcC7bo8YmXgcH+eNUUG5PsGcNurxOpp4+hmIUKn
 QrhP+vqk9/fMh2cLsekkpcsYOmbRNaapNzhSCPW5MxQuVHohoE46dNjf74pqvO4d
 3SC9iJnTf78DCJsaDZRVDXf9d8mqaoEIdAVSOFwzT6afIrjG+rCyuzgPNV5G9DMC
 0haTCIoIVsWmhC9wc7r7KFyY6k/kyg+BzMX+opHujGo1uZZKWbnv7ca7bwKay2Jk
 Hvrf3dHWiLmHCH6hTwA2qFXMZu2qr1iJuBJ8KfvWlFULGpJ9ZlOhsvro
 =1pX9
 -----END PGP SIGNATURE-----

Merge tag 'regmap-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "A few small fixes for regmaps this time, plus support for allowing
  drivers to select raw spinlocks for the locks in order to allow usage
  in interrutpt controllers"

* tag 'regmap-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: teach regmap to use raw spinlocks if requested in the config
  regmap: allow const array for {devm_,}regmap_field_bulk_alloc reg_fields
  regmap: Prefer unsigned int to bare use of unsigned
  regmap: fix the offset of register error log
2021-08-30 11:33:34 -07:00
Rafael J. Wysocki
7c85154643 Merge branches 'acpi-numa', 'acpi-glue', 'acpi-config' and 'acpi-pmic'
* acpi-numa:
  ACPI: Add LoongArch support for ACPI_PROCESSOR/ACPI_NUMA

* acpi-glue:
  driver core: Split device_platform_notify()
  software nodes: Split software_node_notify()
  ACPI: glue: Eliminate acpi_platform_notify()
  ACPI: bus: Rename functions to avoid name collision
  ACPI: glue: Change return type of two functions to void
  ACPI: glue: Rearrange acpi_device_notify()

* acpi-config:
  ACPI: configfs: Make get_header() to return error pointer
  ACPI: configfs: Use sysfs_emit() in "show" functions

* acpi-pmic:
  ACPI / PMIC: XPower: optimize MIPI PMIQ sequence I2C-bus accesses
  ACPI / PMIC: XPower: optimize I2C-bus accesses
2021-08-30 19:30:37 +02:00
Rafael J. Wysocki
bc0d0b1dfe Merge back new PM domains material for v5.15. 2021-08-30 19:20:32 +02:00
Thara Gopinath
275157b367 cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
Add interrupt support to notify the kernel of h/w initiated frequency
throttling by LMh. Convey this to scheduler via thermal presssure
interface.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
[Viresh: Added changes for arch_topology.c to fix build errors ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-08-30 10:43:35 +05:30
Geert Uytterhoeven
656164181e PM: domains: Fix domain attach for CONFIG_PM_OPP=n
If CONFIG_PM_OPP=n, of_get_required_opp_performance_state() always
returns -EOPNOTSUPP, and all drivers for devices that are part of a PM
Domain fail to probe with:

    failed to set required performance state for power-domain foo: -95
    probe of bar failed with error -95

Fix this by treating -EOPNOTSUPP the same as -ENODEV.

Fixes: c016baf7dc ("PM: domains: Add support for 'required-opps' to set default perf state")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-27 20:48:23 +02:00
Dmitry Baryshkov
a649136b17 PM: runtime: add devm_pm_clk_create helper
A typical code pattern for pm_clk_create() call is to call it in the
_probe function and to call pm_clk_destroy() both from _probe error path
and from _remove function. For some drivers the whole remove function
would consist of the call to pm_remove_disable().

Add helper function to replace this bolierplate piece of code. Calling
devm_pm_clk_create() removes the need for calling pm_clk_destroy() both
in the probe()'s error path and in the remove() function.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210731195034.979084-3-dmitry.baryshkov@linaro.org
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-08-26 11:28:07 -07:00
Dmitry Baryshkov
b3636a3a2c PM: runtime: add devm_pm_runtime_enable helper
A typical code pattern for pm_runtime_enable() call is to call it in the
_probe function and to call pm_runtime_disable() both from _probe error
path and from _remove function. For some drivers the whole remove
function would consist of the call to pm_remove_disable().

Add helper function to replace this bolierplate piece of code. Calling
devm_pm_runtime_enable() removes the need for calling
pm_runtime_disable() both in the probe()'s error path and in the
remove() function.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210731195034.979084-2-dmitry.baryshkov@linaro.org
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-08-26 11:27:51 -07:00
Mark Brown
ca5537c9be
Merge remote-tracking branch 'regmap/for-5.15' into regmap-next 2021-08-26 13:45:27 +01:00
Mark Brown
d287801c49
Merge series "Use raw spinlocks in the ls-extirq driver" from Vladimir Oltean <vladimir.oltean@nxp.com>:
The ls-extirq irqchip driver accesses regmap inside its implementation
of the struct irq_chip :: irq_set_type method, and currently regmap
only knows to lock using normal spinlocks. But the method above wants
raw spinlock context, so this isn't going to work and triggers a
"[ BUG: Invalid wait context ]" splat.

The best we can do given the arrangement of the code is to patch regmap
and the syscon driver: regmap to support raw spinlocks, and syscon to
request them on behalf of its ls-extirq consumer.

Link: https://lore.kernel.org/lkml/20210825135438.ubcuxm5vctt6ne2q@skbuf/T/#u

Vladimir Oltean (2):
  regmap: teach regmap to use raw spinlocks if requested in the config
  mfd: syscon: request a regmap with raw spinlocks for some devices

 drivers/base/regmap/internal.h |  4 ++++
 drivers/base/regmap/regmap.c   | 35 +++++++++++++++++++++++++++++-----
 drivers/mfd/syscon.c           | 16 ++++++++++++++++
 include/linux/regmap.h         |  2 ++
 4 files changed, 52 insertions(+), 5 deletions(-)

--
2.25.1

base-commit: 6efb943b86
2021-08-26 13:40:35 +01:00
Vladimir Oltean
67021f25d9
regmap: teach regmap to use raw spinlocks if requested in the config
Some drivers might access regmap in a context where a raw spinlock is
held. An example is drivers/irqchip/irq-ls-extirq.c, which calls
regmap_update_bits() from struct irq_chip :: irq_set_type, which is a
method called by __irq_set_trigger() under the desc->lock raw spin lock.

Since desc->lock is a raw spin lock and the regmap internal lock for
mmio is a plain spinlock (which can become sleepable on RT), this is an
invalid locking scheme and we get a splat stating that this is a
"[ BUG: Invalid wait context ]".

It seems reasonable for regmap to have an option use a raw spinlock too,
so add that in the config such that drivers can request it.

Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210825205041.927788-2-vladimir.oltean@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26 12:07:32 +01:00
Dmitry Osipenko
3c5a272202 PM: domains: Improve runtime PM performance state handling
GENPD core doesn't support handling performance state changes while
consumer device is runtime-suspended or when runtime PM is disabled.
GENPD core may override performance state that was configured by device
driver while RPM of the device was disabled or device was RPM-suspended.

Let's close that gap by allowing drivers to control performance state
while RPM of a consumer device is disabled and to set up performance
state of RPM-suspended device that will be applied by GENPD core on
RPM-resume of the device.

Fixes: 5937c3ce21 ("PM: domains: Drop/restore performance state votes for devices at runtime PM")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25 20:15:54 +02:00
Barry Song
00ed1401a0 platform-msi: Add ABI to show msi_irqs of platform devices
PCI devices expose the associated MSI interrupts via sysfs, but platform
devices which utilize MSI interrupts do not. This information is important
for user space tools to optimize affinity settings.

Utilize the generic MSI sysfs facility to expose this information for
platform MSI.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210813035628.6844-3-21cnbao@gmail.com
2021-08-24 09:16:20 +02:00
Heikki Krogerus
bd1e336aa8 driver core: platform: Remove platform_device_add_properties()
There are no more users for it. The last place where it's
called is in platform_device_register_full(). Replacing that
call with device_create_managed_software_node() and
removing the function.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20210817102449.39994-3-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18 16:08:11 +02:00
Rajendra Nayak
c016baf7dc PM: domains: Add support for 'required-opps' to set default perf state
Some devices within power domains with performance states do not
support DVFS, but still need to vote on a default/static state
while they are active. They can express this using the 'required-opps'
property in device tree, which points to the phandle of the OPP
supported by the corresponding power-domains.

Add support to parse this information from DT and then set the
specified performance state during attach and drop it on detach.
runtime suspend/resume callbacks already have logic to drop/set
the vote as needed and should take care of dropping the default
perf state vote on runtime suspend and restore it back on runtime
resume.

Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:45:29 +02:00
Daniel Scally
a908877056 Revert "media: device property: Call fwnode_graph_get_endpoint_by_id() for fwnode->secondary"
This reverts commit acd418bfcf. Checking for
endpoints against fwnode->secondary in fwnode_graph_get_next_endpoint() is
a better way to do this since that function is also used in a bunch of
other places, for instance sensor drivers checking that they do have an
endpoint connected during probe.

This reversion depends on the previous patch in this series, "device property:
Check fwnode->secondary in fwnode_graph_get_next_endpoint()".

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:35:42 +02:00
Daniel Scally
b5b41ab6b0 device property: Check fwnode->secondary in fwnode_graph_get_next_endpoint()
Sensor drivers often check for an endpoint to make sure that they're
connected to a consuming device like a CIO2 during .probe(). Some of
those endpoints might be in the form of software_nodes assigned as
a secondary to the device's fwnode_handle. Account for this possibility
in fwnode_graph_get_next_endpoint() to avoid having to do it in the
sensor drivers themselves.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:35:41 +02:00
Tian Tao
75bd50fa84 drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
Reading /sys/devices/system/cpu/cpuX/nodeX/ returns cpumap and cpulist.
However, the size of this file is limited to PAGE_SIZE because of the
limitation for sysfs attribute.

This patch moves to use bin_attribute to extend the ABI to be more
than one page so that cpumap bitmask and list won't be potentially
trimmed.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lore.kernel.org/r/20210806110251.560-5-song.bao.hua@hisilicon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13 10:27:49 +02:00
Tian Tao
bb9ec13d15 topology: use bin_attribute to break the size limitation of cpumap ABI
Reading /sys/devices/system/cpu/cpuX/topology/ returns cpu topology.
However, the size of this file is limited to PAGE_SIZE because of
the limitation for sysfs attribute.
This patch moves to use bin_attribute to extend the ABI to be more
than one page so that cpumap bitmask and list won't be potentially
trimmed.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lore.kernel.org/r/20210806110251.560-4-song.bao.hua@hisilicon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13 10:27:49 +02:00
Thomas Gleixner
77e89afc25 PCI/MSI: Protect msi_desc::masked for multi-MSI
Multi-MSI uses a single MSI descriptor and there is a single mask register
when the device supports per vector masking. To avoid reading back the mask
register the value is cached in the MSI descriptor and updates are done by
clearing and setting bits in the cache and writing it to the device.

But nothing protects msi_desc::masked and the mask register from being
modified concurrently on two different CPUs for two different Linux
interrupts which belong to the same multi-MSI descriptor.

Add a lock to struct device and protect any operation on the mask and the
mask register with it.

This makes the update of msi_desc::masked unconditional, but there is no
place which requires a modification of the hardware register without
updating the masked cache.

msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
up changes.

The problem goes way back to the initial support of multi-MSI, but picking
the commit which introduced the mask cache is a valid cut off point
(2.6.30).

Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
2021-08-10 10:59:20 +02:00
Adrian Hunter
bf25967ac5 scsi: ufshcd: Fix device links when BOOT WLUN fails to probe
Managed device links are deleted by device_del(). However it is possible to
add a device link to a consumer before device_add(), and then discovering
an error prevents the device from being used. In that case normally
references to the device would be dropped and the device would be deleted.
However the device link holds a reference to the device, so the device link
and device remain indefinitely (unless the supplier is deleted).

For UFSHCD, if a LUN fails to probe (e.g. absent BOOT WLUN), the device
will not have been registered but can still have a device link holding a
reference to the device. The unwanted device link will prevent runtime
suspend indefinitely.

Amend device link removal to accept removal of a link with an unregistered
consumer device (suggested by Rafael), and fix UFSHCD by explicitly
deleting the device link when SCSI destroys the SCSI device.

Link: https://lore.kernel.org/r/a1c9bac8-b560-b662-f0aa-58c7e000cbbd@intel.com
Fixes: b294ff3e34 ("scsi: ufs: core: Enable power management for wlun")
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-09 23:19:26 -04:00
Greg Kroah-Hartman
bd935a7b21 Merge 5.14-rc5 into driver-core-next
We need the driver core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09 09:03:47 +02:00
Icenowy Zheng
29c34975c9
regmap: allow const array for {devm_,}regmap_field_bulk_alloc reg_fields
The reg_fields array fed to {devm_}regmap_field_bulk_alloc is currently
not const, which is not correct on semantics (the functions shouldn't
change reg_field contents) and prevents pre-defined const reg_field
array to be used.

As the implementation of this function doesn't change the content of it,
just add const to its prototype.

Signed-off-by: Icenowy Zheng <icenowy@sipeed.com>
Link: https://lore.kernel.org/r/20210802063741.76301-1-icenowy@sipeed.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-02 12:21:32 +01:00
Anirudh Rayabharam
75d95e2e39 firmware_loader: fix use-after-free in firmware_fallback_sysfs
This use-after-free happens when a fw_priv object has been freed but
hasn't been removed from the pending list (pending_fw_head). The next
time fw_load_sysfs_fallback tries to insert into the list, it ends up
accessing the pending_list member of the previously freed fw_priv.

The root cause here is that all code paths that abort the fw load
don't delete it from the pending list. For example:

        _request_firmware()
          -> fw_abort_batch_reqs()
              -> fw_state_aborted()

To fix this, delete the fw_priv from the list in __fw_set_state() if
the new state is DONE or ABORTED. This way, all aborts will remove
the fw_priv from the list. Accordingly, remove calls to list_del_init
that were being made before calling fw_state_(aborted|done).

Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
list if it is already aborted. Instead, just jump out and return early.

Fixes: bcfbd3523f ("firmware: fix a double abort case with fw_load_sysfs_fallback")
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot+de271708674e2093097b@syzkaller.appspotmail.com
Tested-by: syzbot+de271708674e2093097b@syzkaller.appspotmail.com
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20210728085107.4141-3-mail@anirudhrb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-29 17:22:15 +02:00
Anirudh Rayabharam
0d6434e10b firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs_fallback
The only motivation for using -EAGAIN in commit 0542ad88fb
("firmware loader: Fix _request_firmware_load() return val for fw load
abort") was to distinguish the error from -ENOMEM, and so there is no
real reason in keeping it. -EAGAIN is typically used to tell the
userspace to try something again and in this case re-using the sysfs
loading interface cannot be retried when a timeout happens, so the
return value is also bogus.

-ETIMEDOUT is received when the wait times out and returning that
is much more telling of what the reason for the failure was. So, just
propagate that instead of returning -EAGAIN.

Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210728085107.4141-2-mail@anirudhrb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-29 17:22:11 +02:00
Filip Schauer
4d1014c181 drivers core: Fix oops when driver probe fails
dma_range_map is freed to early, which might cause an oops when
a driver probe fails.
 Call trace:
  is_free_buddy_page+0xe4/0x1d4
  __free_pages+0x2c/0x88
  dma_free_contiguous+0x64/0x80
  dma_direct_free+0x38/0xb4
  dma_free_attrs+0x88/0xa0
  dmam_release+0x28/0x34
  release_nodes+0x78/0x8c
  devres_release_all+0xa8/0x110
  really_probe+0x118/0x2d0
  __driver_probe_device+0xc8/0xe0
  driver_probe_device+0x54/0xec
  __driver_attach+0xe0/0xf0
  bus_for_each_dev+0x7c/0xc8
  driver_attach+0x30/0x3c
  bus_add_driver+0x17c/0x1c4
  driver_register+0xc0/0xf8
  __platform_driver_register+0x34/0x40
  ...

This issue is introduced by commit d0243bbd5d ("drivers core:
Free dma_range_map when driver probe failed"). It frees
dma_range_map before the call to devres_release_all, which is too
early. The solution is to free dma_range_map only after
devres_release_all.

Fixes: d0243bbd5d ("drivers core: Free dma_range_map when driver probe failed")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Filip Schauer <filip@mg6.at>
Link: https://lore.kernel.org/r/20210727112311.GA7645@DESKTOP-E8BN1B0.localdomain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27 14:44:43 +02:00
Greg Kroah-Hartman
bdac4d8abb Merge 5.14-rc3 into driver-core-next
We need the driver-core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27 09:22:08 +02:00
Will Deacon
463e862ac6 swiotlb: Convert io_default_tlb_mem to static allocation
Since commit 69031f5008 ("swiotlb: Set dev->dma_io_tlb_mem to the
swiotlb pool used"), 'struct device' may hold a copy of the global
'io_default_tlb_mem' pointer if the device is using swiotlb for DMA. A
subsequent call to swiotlb_exit() will therefore leave dangling pointers
behind in these device structures, resulting in KASAN splats such as:

  |  BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
  |  Read of size 8 at addr ffff8881d7830000 by task swapper/0/0
  |
  |  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
  |  Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
  |  Call Trace:
  |   <IRQ>
  |   dump_stack+0x9c/0xcf
  |   print_address_description.constprop.0+0x18/0x130
  |   kasan_report.cold+0x7f/0x111
  |   __iommu_dma_unmap_swiotlb+0x64/0xb0
  |   nvme_pci_complete_rq+0x73/0x130
  |   blk_complete_reqs+0x6f/0x80
  |   __do_softirq+0xfc/0x3be

Convert 'io_default_tlb_mem' to a static structure, so that the
per-device pointers remain valid after swiotlb_exit() has been invoked.
All users are updated to reference the static structure directly, using
the 'nslabs' field to determine whether swiotlb has been initialised.
The 'slots' array is still allocated dynamically and referenced via a
pointer rather than a flexible array member.

Cc: Claire Chang <tientzu@chromium.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fixes: 69031f5008 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Claire Chang <tientzu@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2021-07-23 20:14:43 -04:00
Jinchao Wang
e7deeb9d79 driver: base: Prefer unsigned int to bare use of unsigned
Fix checkpatch warnings:
    WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Signed-off-by: Jinchao Wang <wjc@cdjrlc.com>
Link: https://lore.kernel.org/r/20210628171907.63646-2-wjc@cdjrlc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 17:30:09 +02:00
Xiongfeng Wang
e022eac85e cacheinfo: clear cache_leaves(cpu) in free_cache_attributes()
On ARM64, when PPTT(Processor Properties Topology Table) is not
implemented in ACPI boot, we will goto 'free_ci' with the following
print:
  Unable to detect cache hierarchy for CPU 0

But some other codes may still use 'num_leaves' to iterate through the
'info_list', such as get_cpu_cacheinfo_id(). If 'info_list' is NULL , it
would crash. So clear 'num_leaves' in free_cache_attributes().

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/1626226375-58730-1-git-send-email-wangxiongfeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 17:29:40 +02:00
Adrian Hunter
e64daad660 driver core: Prevent warning when removing a device link from unregistered consumer
sysfs_remove_link() causes a warning if the parent directory does not
exist. That can happen if the device link consumer has not been registered.
So do not attempt sysfs_remove_link() in that case.

Fixes: 287905e68d ("driver core: Expose device link details in sysfs")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # 5.9+
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210716114408.17320-2-adrian.hunter@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 17:28:42 +02:00
Peter Ujfalusi
4afa0c22ee driver core: auxiliary bus: Fix memory leak when driver_register() fail
If driver_register() returns with error we need to free the memory
allocated for auxdrv->driver.name before returning from
__auxiliary_driver_register()

Fixes: 7de3697e9c ("Add auxiliary bus support")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210713093438.3173-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 16:36:06 +02:00
Zhen Lei
f04948dea2 driver core: Fix error return code in really_probe()
In the case of error handling, the error code returned by the subfunction
should be propagated instead of 0.

Fixes: 1901fb2604 ("Driver core: fix "driver" symlink timing")
Fixes: 23b6904442 ("driver core: add dev_groups to all drivers")
Fixes: 8fd456ec0c ("driver core: Add state_synced sysfs file for devices that support it")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210707074301.2722-1-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 16:16:23 +02:00
Zhen Lei
3ecc8cb7c0 firmware: fix theoretical UAF race with firmware cache and resume
This race was discovered when I carefully analyzed the code to locate
another firmware-related UAF issue. It can be triggered only when the
firmware load operation is executed during suspend. This possibility is
almost impossible because there are few firmware load and suspend actions
in the actual environment.

		CPU0			CPU1
__device_uncache_fw_images():		assign_fw():
					fw_cache_piggyback_on_request()
					<----- P0
	spin_lock(&fwc->name_lock);
	...
	list_del(&fce->list);
	spin_unlock(&fwc->name_lock);

	uncache_firmware(fce->name);
					<----- P1
					kref_get(&fw_priv->ref);

If CPU1 is interrupted at position P0, the new 'fce' has been added to the
list fwc->fw_names by the fw_cache_piggyback_on_request(). In this case,
CPU0 executes __device_uncache_fw_images() and will be able to see it when
it traverses list fwc->fw_names. Before CPU1 executes kref_get() at P1, if
CPU0 further executes uncache_firmware(), the count of fw_priv->ref may
decrease to 0, causing fw_priv to be released in advance.

Move kref_get() to the lock protection range of fwc->name_lock to fix it.

Fixes: ac39b3ea73 ("firmware loader: let caching firmware piggyback on loading firmware")
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210719064531.3733-2-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 16:11:42 +02:00
Jinchao Wang
16b0dd4092 driver: base: Replace symbolic permissions with octal permissions
Resolve following checkpatch issue,
Replace symbolic permissions with octal permissions

Signed-off-by: Jinchao Wang <wjc@cdjrlc.com>
Link: https://lore.kernel.org/r/20210626094606.53152-1-wjc@cdjrlc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 14:56:00 +02:00
Uwe Kleine-König
fc7a6209d5 bus: Make remove callback return void
The driver core ignores the return value of this callback because there
is only little it can do when a device disappears.

This is the final bit of a long lasting cleanup quest where several
buses were converted to also return void from their remove callback.
Additionally some resource leaks were fixed that were caused by drivers
returning an error code in the expectation that the driver won't go
away.

With struct bus_type::remove returning void it's prevented that newly
implemented buses return an ignored error code and so don't anticipate
wrong expectations for driver authors.

Reviewed-by: Tom Rix <trix@redhat.com> (For fpga)
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com> (For drivers/s390 and drivers/vfio)
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> (For ARM, Amba and related parts)
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Chen-Yu Tsai <wens@csie.org> (for sunxi-rsb)
Acked-by: Pali Rohár <pali@kernel.org>
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> (for media)
Acked-by: Hans de Goede <hdegoede@redhat.com> (For drivers/platform)
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-By: Vinod Koul <vkoul@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com> (For xen)
Acked-by: Lee Jones <lee.jones@linaro.org> (For mfd)
Acked-by: Johannes Thumshirn <jth@kernel.org> (For mcb)
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> (For slimbus)
Acked-by: Kirti Wankhede <kwankhede@nvidia.com> (For vfio)
Acked-by: Maximilian Luz <luzmaximilian@gmail.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> (For ulpi and typec)
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (For ipack)
Acked-by: Geoff Levand <geoff@infradead.org> (For ps3)
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com> (For thunderbolt)
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> (For intel_th)
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> (For pcmcia)
Acked-by: Rafael J. Wysocki <rafael@kernel.org> (For ACPI)
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> (rpmsg and apr)
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> (For intel-ish-hid)
Acked-by: Dan Williams <dan.j.williams@intel.com> (For CXL, DAX, and NVDIMM)
Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> (For isa)
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (For firewire)
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> (For hid)
Acked-by: Thorsten Scherer <t.scherer@eckelmann.de> (For siox)
Acked-by: Sven Van Asbroeck <TheSven73@gmail.com> (For anybuss)
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> (For MMC)
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210713193522.1770306-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 11:53:42 +02:00
Chris Down
ad7d61f159 printk: index: Add indexing support to dev_printk
While for most kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably be used to indicate issues,
in order to react to production issues quickly we sometimes need to work
with the interface which most kernel developers naturally use when
developing: printk, and printk-esques like dev_printk.

dev_printk is by far the most likely custom subsystem printk to benefit
from the printk indexing infrastructure, since niche device issues
brought about by production changes, firmware upgrades, and the like are
one of the most common things that we need printk infrastructure's
assistance to monitor.

Often these errors were never expected to practically manifest in
reality, and exhibit in code without extensive (or any) metrics present.
As such, there are typically very few options for issue detection
available to those with large fleets at the time the incident happens,
and we thus benefit strongly from monitoring netconsole in these
instances.

As such, add the infrastructure for dev_printk to be indexed in the
printk index. Even on a minimal kernel config, the coverage of the base
kernel's printk index is significantly improved:

Before:

    [root@ktst ~]# wc -l /sys/kernel/debug/printk/index/vmlinux
    4497 /sys/kernel/debug/printk/index/vmlinux

After:

    [root@ktst ~]# wc -l /sys/kernel/debug/printk/index/vmlinux
    5573 /sys/kernel/debug/printk/index/vmlinux

In terms of implementation, in order to trivially disambiguate them,
dev_printk is now a macro which wraps _dev_printk.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/959c7aed1017cb2c9de922e0a820d397e29c6a5a.1623775748.git.chris@chrisdown.name
2021-07-19 12:13:06 +02:00