We want ppc64 to be able to select between optimised assembly
checksum routines in big endian and the generic lib/checksum.c
routines in little endian.
The lpfc driver is forcing CONFIG_GENERIC_CSUM on which means
we are unable to make the decision to enable it in the arch
Kconfig. If the option exists it is always forced on.
This got introduced in 3.10 via commit 6a7252fdb0 ([SCSI] lpfc:
fix up Kconfig dependencies). I spoke to Randy about it and
the original issue was with CRC_T10DIF not being defined.
As such, remove the select of CONFIG_GENERIC_CSUM.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
commit 385904f819 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
It's not allowed to clear masks of a cpuset if there're tasks in it,
but it's broken:
# mkdir /cgroup/sub
# echo 0 > /cgroup/sub/cpuset.cpus
# echo 0 > /cgroup/sub/cpuset.mems
# echo $$ > /cgroup/sub/tasks
# echo > /cgroup/sub/cpuset.cpus
(should fail)
This bug was introduced by commit 88fa523bff
("cpuset: allow to move tasks to empty cpusets").
tj: Dropped temp bool variables and nestes the conditionals directly.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes DT parsing and uses cpu->of_node instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Cc: Jason Cooper <jason@lakedaemon.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Cc: Deepak Sikri <sikrid@qti.qualcomm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Cc: Mark Langsdorf <mark.langsdorf@calxeda.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Now that the cpu device registration initialises the of_node(if available)
appropriately for all the cpus, parsing here is redundant.
This patch removes all DT parsing and uses cpu->of_node instead.
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Since the CPU device nodes can be retrieved using arch_of_get_cpu_node,
we can use it to avoid parsing the cpus node searching the cpu nodes and
mapping to logical index.
This patch removes parsing DT for cpu nodes by using of_get_cpu_node.
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Currently set_secondary_cpus_clock assume the CPU logical ordering
and the MPDIR in DT are same, which is incorrect.
Since the CPU device nodes can be retrieved in the logical ordering
using the DT helper, we can remove the devices tree parsing.
This patch removes DT parsing by making use of of_get_cpu_node.
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Jason Cooper <jason@lakedaemon.net>
Acked-by: Gregory Clement <gregory.clement@free-electrons.com>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Currently the topology code computes cpu capacity and stores it in
the list along with hwid(which is MPIDR) as it parses the CPU nodes
in the device tree. This is required as it needs to be mapped to the
logical CPU later.
Since the CPU device nodes can be retrieved in the logical ordering
using DT/OF helpers, its possible to store cpu_capacity also in logical
ordering and avoid storing hwid for each entry.
This patch removes hwid by making use of of_get_cpu_node.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Multiple drivers need to get the cpu device node from the cpu logical
index and then access the of_node.
This patch adds helper function to fetch the device node directly.
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
CPUs are also registered as devices but the of_node in these cpu
devices are not initialized. Currently different drivers requiring
to access cpu device node are parsing the nodes themselves and
initialising the of_node in cpu device.
The of_node in all the cpu devices needs to be initialized properly
and at one place. The best place to update this is CPU subsystem
driver when registering the cpu devices.
The OF/DT core library now provides of_get_cpu_node to retrieve a cpu
device node for a given logical index by abstracting the architecture
specific details.
This patch uses of_get_cpu_node to assign of_node when registering the
cpu devices.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
OF/DT core library now provides architecture specific hook to match the
logical cpu index with the corresponding physical identifier. Most of the
cpu DT node parsing and initialisation is contained in devtree.c. So it's
better to define ARM specific arch_match_cpu_phys_id there.
This mainly helps to avoid replication of the code doing CPU node parsing
and physical(MPIDR) to logical mapping.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
This patch moves the generalized implementation of of_get_cpu_node from
PowerPC to DT core library, thereby adding support for retrieving cpu
node for a given logical cpu index on any architecture.
The CPU subsystem can now use this function to assign of_node in the
cpu device while registering CPUs.
It is recommended to use these helper function only in pre-SMP/early
initialisation stages to retrieve CPU device node pointers in logical
ordering. Once the cpu devices are registered, it can be retrieved easily
from cpu device of_node which avoids unnecessary parsing and matching.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Currently different drivers requiring to access cpu device node are
parsing the device tree themselves. Since the ordering in the DT need
not match the logical cpu ordering, the parsing logic needs to consider
that. However, this has resulted in lots of code duplication and in some
cases even incorrect logic.
It's better to consolidate them by adding support for getting cpu
device node for a given logical cpu index in DT core library. However
logical to physical index mapping can be architecture specific.
PowerPC has it's own implementation to get the cpu node for a given
logical index.
This patch refactors the current implementation of of_get_cpu_node.
This in preparation to move the implementation to DT core library.
It separates out the logical to physical mapping so that a default
matching of the physical id to the logical cpu index can be added
when moved to common code. Architecture specific code can override it.
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Grant Likely <grant.likely@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
This patch removes the declaration of the function 'of_get_cpu_node'
which is not defined for openrisc. This is in preparation to move
it's definition from PPC to DT common code.
Again it could be there as it was originally copied from powerpc.
Acked-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
This patch removes the declaration of the function 'of_get_cpu_node'
which is not defined for microblaze. This is in preparation to move
it's definition from PPC to DT common code.
Michal Simek says: "it was just there because Microblaze
was based on powerpc code"
Acked-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
The VLAN code needs to know the length of the per-port VLAN bitmap to
perform its most basic operations (retrieving VLAN informations, removing
VLANs, forwarding database manipulation, etc). Unfortunately, in the
current implementation we are using a macro that indicates the bitmap
size in longs in places where the size in bits is expected, which in
some cases can cause what appear to be random failures.
Use the correct macro.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 77145f1cbd was introduced
error which cause that reclocking on nv40 not working anymore.
There is missing assigment of return value from pll_calc to ret.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allocating type=0 marks the memory as free. This allows the ltcg memory
to be allocated twice.
Add a BUG_ON in core/mm.c to prevent this ever happening again.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some registers were not initialized in init, this causes them to be
uninitialized after suspend.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit dceef5d87 (drm/nouveau/fb: initialise vram controller as pfb
sub-object) moved some code around and introduced these null derefs.
pfb->ram is set to the new ram object outside of this ctor.
Reported-by: Ronald Uitermark <ronald645@gmail.com>
Tested-by: Ronald Uitermark <ronald645@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
John W. Linville says:
====================
Regarding the iwlwifi bits, Johannes says:
"We revert an rfkill bugfix that unfortunately caused more bugs, shuffle
some code to avoid touching the PCIe device before it's enabled and
disconnect if firmware fails to do our bidding. I also have Stanislaw's
fix to not crash in some channel switch scenarios."
As for the mac80211 bits, Johannes says:
"This time, I have one fix from Dan Carpenter for users of
nl80211hdr_put(), and one fix from myself fixing a regression with the
libertas driver."
Along with the above...
Dan Carpenter fixes some incorrectly placed "address of" operators
in hostap that caused copying of junk data.
Jussi Kivilinna corrects zd1201 to use an allocated buffer rather
than the stack for a URB operation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit
ee80fbf301 ("packet: account statistics only in tpacket_stats_u")
cleaned up the getsockopt PACKET_STATISTICS code.
This also changed semantics. Historically, tp_packets included
tp_drops on return. The commit removed the line that adds tp_drops
into tp_packets.
This patch reinstates the old semantics.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to fix a problem in the rtl8211 where the driver
wasn't properly enabled the interrupt on link change status.
it has to enable the ineterrupt on the bit 10 in the register 18
(INER).
Reported-by: Sharma Bhupesh <B45370@freescale.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Check if the skb has been correctly prepared before going on
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABCAAGBQJSEkvHAAoJEADl0hg6qKeOyTQP/ifIXk5t26Tu8GTCH+lQnF36
1HY4nkEhLBkrEaKv0RXXEwLCDe1Gk8INewSXhtgDe7v696287zvxSDiftxXOwSn8
EkrP3jakxqNgyEstVUMxXuHQMxn8YsOnU+u4L4MZvcsWNmh1V8FzNLxPDWF3Z0bi
ycXhFI+BR+waWzFd8rVZ5sJ00ZhgSuM5vJ/uQ28kT8DyDZXz0I0mvve7ZUh5fczc
L7vvnju9VRq84RxV6bQwf9hXDk54fCLz22WSMolrqaHCl0XF4OAu6OVcYBLA0bp7
GUU7fS8IUiqAuC02FS5HEYPy1VErCok8hP/fzvjz8Bxuzz0I5SdYPurFTZQzAx73
U0GCLtNOE7zkwIsRbKhMdUcB6DoFZJVUaLo8YS9E1tl/nn1oRILFGTFb2T9/WzQ8
lbCkmYm+WhLdKeZbLkf8PPs9PDrhRQKr/QRHrCHVKh4rqzP1BUm4FqIBXo71Kiiz
no4GJoern8vW0CzoR59P5++/iFOCVTIx4ZJWvnYjWbqsYRazKjjCFtHffpz6mz+1
pHlrdYAZo+DOvme/2putfe6ViR+bA3lPxPkM7k3gADMifJcCAl3D7OM53QaDSKAk
Gw3yHafxBaFPXFdqxQkkC6ks7T6qoTsPI2lLqUG6srU3XA399bWVcLq7X9JEcR+9
ODwzHPj9fD5CZCe22lIO
=gllO
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- Check if the skb has been correctly prepared before going on
Do not clear Broadcast/Multicast/Unicast Wake Flag or LanWake in
Config5. This is necessary to preserve WOL state when the driver is
loaded. Although the r8168 vendor driver does not write Config5 (it has
been commented out), Hayes Wang from Realtek said that masking bits like
this is more sensible.
Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bad things happen if ACPI hotplug events are handled during system
PM transitions, especially if devices are removed as a result.
To prevent those bad things from happening, acquire acpi_scan_lock
when a PM transition is started and release it when that transition
is complete or has been aborted.
This fixes resume lockup on my test-bed Acer Aspire S5 that happens
when Thunderbolt devices are disconnected from the machine while
suspended.
Also fixes the analogous problem for Mika Westerberg on an
Intel DZ77RE-75K board.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
If via_ircc_open() fails, data structures of the driver left uninitialized,
but probe (via_init_one()) returns zero. That can lead to null pointer dereference
in via_remove_one(), since it does not check drvdata for NULL.
The patch implements proper error code propagation.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user turns off VNET_HDR support on the
macvtap device, there is no way to provide any
offload information to the user. So, it's safer
to ignore offload setting then depend on the user
setting them correctly.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user turns off IFF_VNET_HDR flag, attempts to change
offload features via TUNSETOFFLOAD do not work. This could cause
GSO packets to be delivered to the user when the user is
not prepared to handle them.
To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is
disabled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In macvtap, tap_features specific the features of that the user
has specified via ioctl(). If we treat macvtap as a macvlan+tap
then we could all the tap a pseudo-device and give it other features
like SG and GSO. Then we can stop using the features of lower
device (macvlan) when forwarding the traffic the tap.
This solves the issue of possible checksum offload mismatch between
tap feature and macvlan features.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent.
The "when" field must be set for such skb. It's used in tcp_rearm_rto
for example. If the "when" field isn't set, the retransmit timeout can
be calculated incorrectly and a tcp connected can stop for two minutes
(TCP_RTO_MAX).
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The branch emulation needs to handle the OCTEON BBIT instructions,
otherwise we get SIGILL instead of emulation.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5726/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
kernel BUG at drivers/xen/events.c:1328!
RCU has detected that a CPU has not entered a quiescent state within the
grace period. It needs to send the CPU a reschedule IPI if it is not
offline. rcu_implicit_offline_qs() does this check:
/*
* If the CPU is offline, it is in a quiescent state. We can
* trust its state not to change because interrupts are disabled.
*/
if (cpu_is_offline(rdp->cpu)) {
rdp->offline_fqs++;
return 1;
}
Else the CPU is online. Send it a reschedule IPI.
The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()). See start_secondary():
set_cpu_online(smp_processor_id(), true);
...
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:
/*
* Wait until the cpu which brought this one up marked it
* online before enabling interrupts. If we don't do that then
* we can end up waking up the softirq thread before this cpu
* reached the active state, which makes the scheduler unhappy
* and schedule the softirq thread on the wrong cpu. This is
* only observable with forced threaded interrupts, but in
* theory it could also happen w/o them. It's just way harder
* to achieve.
*/
while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
cpu_relax();
/* enable local interrupts */
local_irq_enable();
The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug. In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.
The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:
kernel BUG at drivers/xen/events.c:1328!
xen_send_IPI_one()
xen_smp_send_reschedule()
rcu_implicit_offline_qs()
rcu_implicit_dynticks_qs()
force_qs_rnp()
force_quiescent_state()
__rcu_process_callbacks()
rcu_process_callbacks()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.
There is at least one other place that has caused the same crash:
xen_smp_send_reschedule()
wake_up_idle_cpu()
add_timer_on()
clocksource_watchdog()
call_timer_fn()
run_timer_softirq()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
xen_hvm_callback_vector()
clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:
/*
* Cycle through CPUs to check if the CPUs stay synchronized
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).
But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."
The conclusion was that:
"
1. At the IPI sender side:
It is incorrect to send an IPI to an offline CPU (cpu not present in
the cpu_online_mask). There are numerous places where we check this
and warn/complain.
2. At the IPI receiver side:
It is incorrect to let the world know of our presence (by setting
ourselves in global bitmasks) until our initialization steps are complete
to such an extent that we can handle the consequences (such as
receiving interrupts without crashing the sender etc.)
" (from Srivatsa)
As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.
It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.
This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.
Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When a event is being bound to a VCPU there is a window between the
EVTCHNOP_bind_vpcu call and the adjustment of the local per-cpu masks
where an event may be lost. The hypervisor upcalls the new VCPU but
the kernel thinks that event is still bound to the old VCPU and
ignores it.
There is even a problem when the event is being bound to the same VCPU
as there is a small window beween the clear_bit() and set_bit() calls
in bind_evtchn_to_cpu(). When scanning for pending events, the kernel
may read the bit when it is momentarily clear and ignore the event.
Avoid this by masking the event during the whole bind operation.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: stable@vger.kernel.org
The sizeof() argument in init_evtchn_cpu_bindings() is incorrect
resulting in only the first 64 (or 32 in 32-bit guests) ports having
their bindings being initialized to VCPU 0.
In most cases this does not cause a problem as request_irq() will set
the irq affinity which will set the correct local per-cpu mask.
However, if the request_irq() is called on a VCPU other than 0, there
is a window between the unmasking of the event and the affinity being
set were an event may be lost because it is not locally unmasked on
any VCPU. If request_irq() is called on VCPU 0 then local irqs are
disabled during the window and the race does not occur.
Fix this by initializing all NR_EVENT_CHANNEL bits in the local
per-cpu masks.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org
If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.
There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.
We only do this for dom0, as that is where tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.
This fixes a boot failure on hosts with tboot.
tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 0000000000100000 - 0000000000800000 (usable)
(XEN) 0000000000800000 - 0000000000972000 (unusable)
tboot marked this region as unusable.
(XEN) 0000000000972000 - 00000000cf200000 (usable)
(XEN) 00000000cf200000 - 00000000cf38f000 (reserved)
(XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data)
(XEN) 00000000cf3ce000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fe000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To iterate over all policies we currently iterate over all online
CPUs and then get the policy for each of them which is suboptimal.
Use the newly created cpufreq_policy_list for this purpose instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_policy_cpu per-cpu variables are used for storing the ID of
the CPU that manages the given CPU's policy. However, we also store
a policy pointer for each cpu in cpufreq_cpu_data, so the
cpufreq_policy_cpu information is simply redundant.
It is better to use cpufreq_cpu_data to retrieve a policy and get
policy->cpu from there, so make that happen everywhere and drop the
cpufreq_policy_cpu per-cpu variables which aren't necessary any more.
[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put
governor module as we are sure event can only be START/STOP here.
Remove the useless check.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_policy_list is a list of active policies. We do remove
policies from this list when all CPUs belonging to that policy are
removed. But during system suspend we don't really free a policy
struct as it will be used again during resume, so we didn't remove
it from cpufreq_policy_list as well..
However, this is incorrect. We are saying this policy isn't valid
anymore and must not be referenced (though we haven't freed it), but
it can still be used by code that iterates over cpufreq_policy_list.
Remove policy from this list during system suspend as well.
Of course, we must add it back whenever the first CPU belonging to
that policy shows up.
[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>