linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-03 15:47:36 +00:00

Author	SHA1	Message	Date
Glauber Costa	9ddabbe72e	KVM: KVM Steal time guest/host interface To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:03 +03:00
Paul Mackerras	aa04b4cc5b	KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	371fefd6f2	KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
David Gibson	54738c0971	KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:56 +03:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
Scott Wood	a4cd8b23ac	KVM: PPC: e500: enable magic page This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:37 +03:00
Avi Kivity	411c588dfb	KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:26 +03:00
Jan Kiszka	58f0964ee4	KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentation The documented behavior did not match the implemented one (which also never changed). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:19 +03:00
Jan Kiszka	91e3d71db2	KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation Neither host_irq nor the guest_msi struct are used anymore today. Tag the former, drop the latter to avoid confusion. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:16 +03:00
Jan Kiszka	7f4382e8fd	KVM: Fixup documentation section numbering Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:10 +03:00
Sasha Levin	55399a02e9	KVM: Document KVM_IOEVENTFD Document KVM_IOEVENTFD that can be used to receive notifications of PIO/MMIO events without triggering an exit. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:56 +03:00
Nadav Har'El	823e396558	KVM: nVMX: Documentation This patch includes a brief introduction to the nested vmx feature in the Documentation/kvm directory. The document also includes a copy of the vmcs12 structure, as requested by Avi Kivity. [marcelo: move to Documentation/virtual/kvm] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:22 +03:00
Avi Kivity	e76779339b	KVM: Document KVM_GET_LAPIC, KVM_SET_LAPIC ioctl Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:55 +03:00
Andrea Righi	9b61fc4cf3	Documentation: fix cgroup blkio throttle filenames All the blkio.throttle.* file names are incorrectly reported without ".throttle" in the documentation. Fix it. Signed-off-by: Andrea Righi <andrea@betterlinux.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 13:17:51 -07:00
Jesper Juhl	316b379988	Documentation: update CodingStyle memory allocators The list of available general purpose memory allocators in Documentation/CodingStyle chapter 14 is incomplete. This patch adds the missing vzalloc() to the list. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 13:17:51 -07:00
Linus Torvalds	ba466c74d9	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Update documentation regarding driver removal PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist.	2011-07-03 13:33:16 -07:00
Linus Torvalds	b775c38925	Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: hwmon: (k10temp) Update documentation for Fam12h hwmon-vid: Fix typo in VIA CPU name hwmon: (f71882fg) Add support for the F71869A hwmon: Use <> rather than () around my e-mail address hwmon: (emc6w201) Properly handle all errors	2011-07-03 11:12:06 -07:00
Clemens Ladisch	af75d5b771	hwmon: (k10temp) Update documentation for Fam12h Add some CPU series IDs and links to the Fam12h datasheets. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2011-07-03 13:32:54 +02:00
Hans de Goede	5da556e33f	hwmon: (f71882fg) Add support for the F71869A The F71869A is almost the same as the F71869F/E, except that it has the normal number of temp and pwm zones for a F71882FG derived chip, rather then the limited number of the F71869F/E. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Max Baldwin <archerseven@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2011-07-03 13:32:53 +02:00
Rafael J. Wysocki	f5da24dbed	PM / Runtime: Update documentation regarding driver removal Commit `e1866b33b1` (PM / Runtime: Rework runtime PM handling during driver removal) forgot to update the documentation in Documentation/power/runtime_pm.txt to match the new code in drivers/base/dd.c. Update that documentation to match the code it describes. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Kevin Hilman <khilman@ti.com>	2011-07-02 14:27:11 +02:00
Kevin Hilman	5efb54cc3f	PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist. Replace reference to pm_runtime_idle_sync() in the driver core with pm_runtime_put_sync() which is used in the code. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-02 14:27:03 +02:00
Linus Torvalds	2e34b429a4	Merge branch 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 * 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: MAINTAINERS: add myself as maintainer of USB/IP usb: r8a66597-hcd: fix cannot detect low/full speed device USB: ehci-ath79: fix a NULL pointer dereference USB: Add new FT232H chip to drivers/usb/serial/ftdi_sio.c usb/isp1760: Fix bug preventing the unlinking of control urbs USB: Fix up URB error codes to reflect implementation. xhci: Always set urb->status to zero for isoc endpoints. xhci: Add reset on resume quirk for asrock p67 host xHCI 1.0: Incompatible Device Error USB: don't let errors prevent system sleep USB: don't let the hub driver prevent system sleep USB: change maintainership of ohci-hcd and ehci-hcd xHCI 1.0: Force Stopped Event(FSE) xhci: Don't warn about zeroed bMaxBurst descriptor field. USB: Free bandwidth when usb_disable_device is called. xhci: Reject double add of active endpoints. USB: TI 3410/5052 USB Serial Driver: Fix mem leak when firmware is too big. usb: musb: gadget: clear TXPKTRDY flag when set FLUSHFIFO usb: musb: host: compare status for negative error values	2011-06-28 11:15:17 -07:00
Rafael J. Wysocki	ca9c6890b5	PM / Domains: Update documentation Commit `4d27e9dcff` (PM: Make power domain callbacks take precedence over subsystem ones) forgot to update the device power management documentation to take changes made by it into account. Correct that mistake. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-06-21 23:25:32 +02:00
Rafael J. Wysocki	78420884e6	PM: Update documentation regarding sysdevs The part of Documentation/power/devices.txt regarding sysdevs is not valid any more after commit `2e711c04db` (PM: Remove sysdev suspend, resume and shutdown operations), so remove it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-06-21 23:19:15 +02:00
Kevin Hilman	129b656a0d	PM / Runtime: Update doc: usage count no longer incremented across system PM Commit `e866500247` (PM: Allow pm_runtime_suspend() to succeed during system suspend) removed usage count increment across system PM. Update doc to reflect this. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-06-21 23:17:59 +02:00
Linus Torvalds	357ed6b1a1	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Move RCU_BOOST #ifdefs to header file rcu: use softirq instead of kthreads except when RCU_BOOST=y rcu: Use softirq to address performance regression rcu: Simplify curing of load woes	2011-06-19 08:56:56 -07:00
Sarah Sharp	a9e758634f	USB: Fix up URB error codes to reflect implementation. Documentation/usb/error-codes.txt mentions that urb->status can be set to -EXDEV, if the isochronous transfer was not fully completed. However, in practice, EHCI, UHCI, and OHCI all only set -EXDEV in the individual frame status, never in the URB status. Those host controller actually always pass in a zero status to usb_hcd_giveback_urb, and rely on the core to set the appropriate status value. The xHCI driver ran into issues with the uvcvideo driver when it tried to set -EXDEV in urb->status, because the driver refused to submit URBs, and the userspace camera application's video froze. Clean up the documentation to reflect the actual implementation. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu>	2011-06-17 11:28:21 -07:00
Jörg Sommer	67de0162fb	Documentation: fix cgroup typos and formatting Fix format and spelling. Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:52:50 -07:00
Jörg Sommer	f6e07d3807	Documentation: update cgroupfs mount point According to commit `676db4af04` ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation. Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:52:50 -07:00
Maxin B. John	06a2c45d6b	Documentation: update kmemleak supported archs Instead of listing the architectures that are supported by kmemleak in Documentation/kmemleak.txt, just refer people to the list of supported architecutures in lib/Kconfig.debug so that Documentation/kmemleak.txt does not need more updates for this. Signed-off-by: Maxin B. John <maxin.john@gmail.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:52:50 -07:00
Andrew Murray	04c55715cb	Documentation: update printk-formats.txt This patch updates the incomplete documentation concerning the printk extended format specifiers. Signed-off-by: Andrew Murray <amurray@mpc-data.co.uk> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:52:50 -07:00
akpm@linux-foundation.org	31b5f8eeec	Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt Commit `a77aea9201` ("cgroup: remove the ns_cgroup") removed the ns_cgroup but it forgot to remove the related doc in feature-removal-schedule.txt. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:04:02 -07:00
Ying Han	50c35e5ba2	memcg: add documentation for the memory.numastat API [akpm@linux-foundation.org: rework text, fit it into 80-cols] Signed-off-by: Ying Han <yinghan@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
Michael Hennerich	a59ec1e7ff	backlight: new driver for the ADP8870 backlight devices Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
Shaohua Li	09223371de	rcu: Use softirq to address performance regression Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: "Alex,Shi" <alex.shi@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-06-14 15:25:39 -07:00
Linus Torvalds	81eb3dd843	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md/raid5: remove unusual use of bio_iovec_idx() md/raid5: fix FUA request handling in ops_run_io() md/raid5: fix raid5_set_bi_hw_segments md:Documentation/md.txt - fix typo md/bitmap: remove unused fields from struct bitmap md/bitmap: use proper accessor macro md: check ->hot_remove_disk when removing disk md: Using poll /proc/mdstat can monitor the events of adding a spare disks MD: use is_power_of_2 macro MD: raid5 do not set fullsync MD: support initial bitmap creation in-kernel MD: add sync_super to mddev_t struct MD: raid1 changes to allow use by device mapper MD: move thread wakeups into resume MD: possible typo MD: no sync IO while suspended MD: no integrity register if no gendisk	2011-06-14 11:21:21 -07:00
NeilBrown	f699bf2328	md:Documentation/md.txt - fix typo Reported-by: CoolCold <coolthecold@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-06-09 11:43:04 +10:00
Alan Stern	21c13a4f7b	usb-storage: redo incorrect reads Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Sven Geggus <sven-usbst@geggus.net> Tested-by: Paul Hartman <paul.hartman+linux@gmail.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> CC: <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-06-07 09:05:42 -07:00
Linus Torvalds	f0f52a9463	Merge git://git.infradead.org/iommu-2.6 * git://git.infradead.org/iommu-2.6: intel-iommu: Fix off-by-one in RMRR setup intel-iommu: Add domain check in domain_remove_one_dev_info intel-iommu: Remove Host Bridge devices from identity mapping intel-iommu: Use coherent DMA mask when requested intel-iommu: Dont cache iova above 32bit intel-iommu: Speed up processing of the identity_mapping function intel-iommu: Check for identity mapping candidate using system dma mask intel-iommu: Only unlink device domains from iommu intel-iommu: Enable super page (2MiB, 1GiB, etc.) support intel-iommu: Flush unmaps at domain_exit intel-iommu: Remove obsolete comment from detect_intel_iommu intel-iommu: fix VT-d PMR disable for TXT on S3 resume	2011-06-02 05:48:50 +09:00
Youquan Song	6dd9a7c737	intel-iommu: Enable super page (2MiB, 1GiB, etc.) support There are no externally-visible changes with this. In the loop in the internal __domain_mapping() function, we simply detect if we are mapping: - size >= 2MiB, and - virtual address aligned to 2MiB, and - physical address aligned to 2MiB, and - on hardware that supports superpages. (and likewise for larger superpages). We automatically use a superpage for such mappings. We never have to worry about breaking superpages, since we trust that we will always unmap the same range that was mapped. So all we need to do is ensure that dma_pte_clear_range() will also cope with superpages. Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so it can return a PTE at the appropriate level rather than always extending the page tables all the way down to level 1. Again, this is simplified by the fact that we should never encounter existing small pages when we're creating a mapping; any old mapping that used the same virtual range will have been entirely removed and its obsolete page tables freed. Provide an 'intel_iommu=sp_off' argument on the command line as a chicken bit. Not that it should ever be required. == The original commit seen in the iommu-2.6.git was Youquan's implementation (and completion) of my own half-baked code which I'd typed into an email. Followed by half a dozen subsequent 'fixes'. I've taken the unusual step of rewriting history and collapsing the original commits in order to keep the main history simpler, and make life easier for the people who are going to have to backport this to older kernels. And also so I can give it a more coherent commit comment which (hopefully) gives a better explanation of what's going on. The original sequence of commits leading to identical code was: Youquan Song (3): intel-iommu: super page support intel-iommu: Fix superpage alignment calculation error intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte() David Woodhouse (4): intel-iommu: Precalculate superpage support for dmar_domain intel-iommu: Fix hardware_largepage_caps() intel-iommu: Fix inappropriate use of superpages in __domain_mapping() intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2011-06-01 12:26:35 +01:00
Rusty Russell	990c91f0af	lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY. No virtio device does this any more, so no need to clutter lguest with it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-05-30 11:14:12 +09:30
Rusty Russell	bc805a03c2	lguest: fix up compilation after move `ed16648eb5` "Move kvm, uml, and lguest subdirectories" broke the lguest example launcher. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-05-30 11:14:12 +09:30
Linus Torvalds	2ba781ced9	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (43 commits) acer-wmi: support integer return type from WMI methods msi-laptop: fix section mismatch in reference from the function load_scm_model_init acer-wmi: support to set communication device state by new wmid method acer-wmi: allow 64-bits return buffer from WMI methods acer-wmi: check the existence of internal 3G device when set capability platform/x86:delete two unused variables support wlan hotkey on Acer Travelmate 5735Z platform-x86: intel_mid_thermal: Fix memory leak platform/x86: Fix Makefile for intel_mid_powerbtn platform/x86: Simplify intel_mid_powerbtn acer-wmi: Delete out-of-date documentation acerhdf: Clean up includes acerhdf: Drop pointless dependency on THERMAL_HWMON acer-wmi: Update MAINTAINERS wmi: Orphan ACPI-WMI driver tc1100-wmi: Orphan driver acer-wmi: does not allow negative number set to initial device state platform/oaktrail: ACPI EC Extra driver for Oaktrail thinkpad_acpi: Convert printks to pr_<level> thinkpad_acpi: Correct !CONFIG_THINKPAD_ACPI_VIDEO warning ...	2011-05-29 11:44:33 -07:00
Matthew Garrett	437cb0dbd1	Merge branch 'x86-platform-next' into x86-platform	2011-05-29 14:27:13 -04:00
Linus Torvalds	daa94222b6	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI EC: remove redundant code ACPI: Add D3 cold state ACPI: processor: fix processor_physically_present in UP kernel ACPI: Split out custom_method functionality into an own driver ACPI: Cleanup custom_method debug stuff ACPI EC: enable MSI workaround for Quanta laptops ACPICA: Update to version 20110413 ACPICA: Execute an orphan _REG method under the EC device ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place ACPICA: Update internal address SpaceID for DataTable regions ACPICA: Add more methods eligible for NULL package element removal ACPICA: Split all internal Global Lock functions to new file - evglock ACPI: EC: add another DMI check for ASUS hardware ACPI EC: remove dead code ACPICA: Fix code divergence of global lock handling ACPICA: Use acpi_os_create_lock interface ACPI: osl, add acpi_os_create_lock interface ACPI:Fix goto flows in thermal-sys	2011-05-29 11:19:16 -07:00
Linus Torvalds	f310642123	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds	2011-05-29 11:18:09 -07:00
Len Brown	5d4c47e019	x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param mwait_idle() is a C1-only idle loop intended to be more efficient than HLT on SMP hardware that supports it. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle(). Deprecate mwait_idle() and the "idle=mwait" cmdline param to simplify the x86 idle code. After this change, kernels configured with (!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware that support MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be used. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>	2011-05-29 03:39:17 -04:00
Len Brown	cdaab4a0d3	x86 idle: deprecate "no-hlt" cmdline param We'd rather that modern machines not check if HLT works on every entry into idle, for the benefit of machines that had marginal electricals 15-years ago. If those machines are still running the upstream kernel, they can use "idle=poll". The only difference will be that they'll now invoke HLT in machine_hlt(). cc: x86@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>	2011-05-29 03:39:16 -04:00
Len Brown	99c6322143	x86 idle APM: deprecate CONFIG_APM_CPU_IDLE We don't want to export the pm_idle function pointer to modules. Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to. CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit uniprocessor laptops that are over 10 years old. It calls into the BIOS during idle, and is known to cause a number of machines to fail. Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting pm_idle. Any systems that were calling into the APM BIOS at run-time will simply use HLT instead. cc: x86@kernel.org cc: Jiri Kosina <jkosina@suse.cz> cc: stable@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>	2011-05-29 03:39:15 -04:00
Len Brown	3b70b2e5fc	x86 idle floppy: deprecate disable_hlt() Plan to remove floppy_disable_hlt in 2012, an ancient workaround with comments that it should be removed. This allows us to remove clutter and a run-time branch from the idle code. WARN_ONCE() on invocation until it is removed. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>	2011-05-29 03:39:15 -04:00

1 2 3 4 5 ...

8931 commits