Commit graph

99722 commits

Author SHA1 Message Date
Yinghai Lu
87a1c441e1 x86: get x86_phys_bits early
when try to make hpet_enable use io_remap instead fixmap got

ioremap: invalid physical address fed00000
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:161 __ioremap_caller+0x8c/0x2f3()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-rc9-tip-01873-ga9827e7-dirty #358

Call Trace:
 [<ffffffff8026615e>] warn_on_slowpath+0x6c/0xa7
 [<ffffffff802e2313>] ? __slab_alloc+0x20a/0x3fb
 [<ffffffff802d85c5>] ? mpol_new+0x88/0x17d
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8024b0d2>] __ioremap_caller+0x8c/0x2f3
 [<ffffffff80e86dbd>] ? hpet_enable+0x39/0x241
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8024b466>] ioremap_nocache+0x2a/0x40
 [<ffffffff80e86dbd>] hpet_enable+0x39/0x241
 [<ffffffff80e7a1f6>] hpet_time_init+0x21/0x4e
 [<ffffffff80e730e9>] start_kernel+0x302/0x395
 [<ffffffff80e722aa>] x86_64_start_reservations+0xb9/0xd4
 [<ffffffff80e722fe>] ? x86_64_init_pda+0x39/0x4f
 [<ffffffff80e72400>] x86_64_start_kernel+0xec/0x107

---[ end trace a7919e7f17c0a725 ]---

it seems for amd system that is set later...
try to move setting early in early_identify_cpu.
and remove same code for intel and centaur.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:24:16 +02:00
Yinghai Lu
32b23e9a73 x86: max_low_pfn_mapped fix #4
only add direct mapping for aperture

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:24:16 +02:00
Mike Travis
11369f356b x86: change _node_to_cpumask_ptr to return const ptr
* Strengthen the return type for the _node_to_cpumask_ptr to be
    a const pointer.  This adds compiler checking to insure that
    node_to_cpumask_map[] is not changed inadvertently.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 19:11:58 +02:00
Maciej W. Rozycki
ce8b06b985 x86: I/O APIC: remove an IRQ2-mask hack
Now that IRQ2 is never made available to the I/O APIC, there is no need
to special-case it and mask as a workaround for broken systems.  Actually,
because of the former, mask_IO_APIC_irq(2) is a no-op already.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 11:43:48 +02:00
Yinghai Lu
3d88cca708 x86: fix numaq_tsc_disable calling
got this on a test-system:

 calling  numaq_tsc_disable+0x0/0x39
 NUMAQ: disabling TSC
 initcall numaq_tsc_disable+0x0/0x39 returned 0 after 0 msecs

that's because we should not be using arch_initcall to call numaq_tsc_disable.

need to call it in setup_arch before time_init()/tsc_init()
and call it in init_intel() to make the cpu feature bits right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:45 +02:00
Yinghai Lu
7b479becdb x86, e820: remove end_user_pfn
end_user_pfn used to modify the meaning of the e820 maps.

Now that all e820 operations are cleaned up, unified, tightened up,
the e820 map always get updated to reality, we don't need to keep
this secondary mechanism anymore.

If you hit this commit in bisection it means something slipped through.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:40 +02:00
Yinghai Lu
9958e810f8 x86: max_low_pfn_mapped fix, #3
optimization: try to merge the range with same page size in
init_memory_mapping, to get the best possible linear mappings set up.

thus when GBpages is not there, we could do 2M pages.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:16 +02:00
Yinghai Lu
965194c15d x86: max_low_pfn_mapped fix, #2
tighten the boundary checks around max_low_pfn_mapped - dont overmap
nor undermap into holes.

also print out tseg for AMD cpus, for diagnostic purposes.
(this is an SMM area, and we split up any big mappings around that area)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:16 +02:00
Yinghai Lu
7ab073b6e0 x86: max_low_pfn_mapped fix, #1
fix crash on Ingo's big box:

calling  pci_iommu_init+0x0/0x17
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ d0000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
BUG: unable to handle kernel paging request at ffff88000003be88
IP: [<ffffffff8026d377>] __alloc_pages_internal+0xc3/0x3f2
PGD 202063 PUD 206063 PMD 22fc00163 PTE 3b162
Oops: 0000 [1] SMP

and e820 is:

 BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
 BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
 BIOS-e820: 000000007ff70000 - 000000007ff86000 (ACPI data)
 BIOS-e820: 000000007ff86000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 0000000080000000 - 00000000cfe00000 (usable)
 BIOS-e820: 00000000cfe00000 - 00000000d0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000830000000 (usable)

system has 32 GB RAM installed.

max_low_pfn_mapped is 0xcfe00, and GART aperture is not mapped.

So try to use init_memory_mapping to map that area, because the iommu
thinks that area is ram ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:15 +02:00
Ingo Molnar
ae94b8075a Merge branch 'linus' into x86/core
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:29:02 +02:00
Roland McGrath
eca91e7838 x86_64: fix delayed signals
On three of the several paths in entry_64.S that call
do_notify_resume() on the way back to user mode, we fail to properly
check again for newly-arrived work that requires another call to
do_notify_resume() before going to user mode.  These paths set the
mask to check only _TIF_NEED_RESCHED, but this is wrong.  The other
paths that lead to do_notify_resume() do this correctly already, and
entry_32.S does it correctly in all cases.

All paths back to user mode have to check all the _TIF_WORK_MASK
flags at the last possible stage, with interrupts disabled.
Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
set any time after we entered do_notify_resume().  More work flags
can be set (or left set) synchronously inside do_notify_resume(), as
TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
(which then send an asynchronous interrupt).

There are many different scenarios that could hit this bug, most of
them races.  The simplest one to demonstrate does not require any
race: when one signal has done handler setup at the check before
returning from a syscall, and there is another signal pending that
should be handled.  The second signal's handler should interrupt the
first signal handler before it actually starts (so the interrupted PC
is still at the handler's entry point).  Instead, it runs away until
the next kernel entry (next syscall, tick, etc).

This test behaves correctly on 32-bit kernels, and fails on 64-bit
(either 32-bit or 64-bit test binary).  With this fix, it works.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/ucontext.h>

    #ifndef REG_RIP
    #define REG_RIP REG_EIP
    #endif

    static sig_atomic_t hit1, hit2;

    static void
    handler (int sig, siginfo_t *info, void *ctx)
    {
      ucontext_t *uc = ctx;

      if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
        {
          if (sig == SIGUSR1)
            hit1 = 1;
          else
            hit2 = 1;
        }

      printf ("%s at %#lx\n", strsignal (sig),
              uc->uc_mcontext.gregs[REG_RIP]);
    }

    int
    main (void)
    {
      struct sigaction sa;
      sigset_t set;

      sigemptyset (&sa.sa_mask);
      sa.sa_flags = SA_SIGINFO;
      sa.sa_sigaction = &handler;

      if (sigaction (SIGUSR1, &sa, NULL)
          || sigaction (SIGUSR2, &sa, NULL))
        return 2;

      sigemptyset (&set);
      sigaddset (&set, SIGUSR1);
      sigaddset (&set, SIGUSR2);
      if (sigprocmask (SIG_BLOCK, &set, NULL))
        return 3;

      printf ("main at %p, handler at %p\n", &main, &handler);

      raise (SIGUSR1);
      raise (SIGUSR2);

      if (sigprocmask (SIG_UNBLOCK, &set, NULL))
        return 4;

      if (hit1 + hit2 == 1)
        {
          puts ("PASS");
          return 0;
        }

      puts ("FAIL");
      return 1;
    }

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:11:10 +02:00
Rafael J. Wysocki
da1f29f5df x86: remove conflicting nx6325 and nx6125 quirks
We have two conflicting DMA-based quirks in there for the same set of
boxes (HP nx6325 and nx6125) and one of them actually breaks my box.

So remove the extra code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: =?iso-8859-1?q?T=F6r=F6k_Edwin?= <edwintorok@gmail.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 06:44:58 +02:00
Linus Torvalds
a26929fb48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [PATCH] IPMI: return correct value from ipmi_write
2008-07-11 17:00:17 -07:00
Mark Rustad
3976df9b04 [PATCH] IPMI: return correct value from ipmi_write
This patch corrects the handling of write operations to the IPMI watchdog
to work as intended by returning the number of characters actually
processed. Without this patch, an "echo V >/dev/watchdog" enables the
watchdog if IPMI is providing the watchdog function.

Signed-off-by: Mark Rustad <MRustad@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2008-07-11 20:31:05 +00:00
Ingo Molnar
6c82a000a2 Merge branch 'x86/generalize-visws' into x86/core 2008-07-11 21:22:18 +02:00
Maciej W. Rozycki
5b4d2386c2 x86: Recover timer_ack lost in the merge of the NMI watchdog
In the course of the recent unification of the NMI watchdog an assignment
to timer_ack to switch off unnecesary POLL commands to the 8259A in the
case of a watchdog failure has been accidentally removed.  The statement
used to be limited to the 32-bit variation as since the rewrite of the
timer code it has been relevant for the 82489DX only.  This change brings
it back.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:03 +02:00
Maciej W. Rozycki
af174783b9 x86: I/O APIC: Never configure IRQ2
There is no such entity as ISA IRQ2.  The ACPI spec does not make it
explicitly clear, but does not preclude it either -- all it says is ISA
legacy interrupts are identity mapped by default (subject to overrides),
but it does not state whether IRQ2 exists or not.  As a result if there is
no IRQ0 override, then IRQ2 is normally initialised as an ISA interrupt,
which implies an edge-triggered line, which is unmasked by default as this
is what we do for edge-triggered I/O APIC interrupts so as not to miss an
edge.

To the best of my knowledge it is useless, as IRQ2 has not been in use
since the PC/AT as back then it was taken by the 8259A cascade interrupt
to the slave, with the line position in the slot rerouted to newly-created
IRQ9.  No device could thus make use of this line with the pair of 8259A
chips.  Now in theory INTIN2 of the I/O APIC may be usable, but the
interrupt of the device wired to it would not be available in the PIC mode
at all, so I seriously doubt if anybody decided to reuse it for a regular
device.

However there are two common uses of INTIN2.  One is for IRQ0, with an
ACPI interrupt override (or its equivalent in the MP table).  But in this
case IRQ2 is gone entirely with INTIN0 left vacant.  The other one is for
an 8959A ExtINTA cascade.  In this case IRQ0 goes to INTIN0 and if ACPI is
used INTIN2 is assumed to be IRQ2 (there is no override and ACPI has no
way to report ExtINTA interrupts).  This is where a problem happens.

The problem is INTIN2 is configured as a native APIC interrupt, with a
vector assigned and the mask cleared.  And the line may indeed get active
and inject interrupts if the master 8959A has its timer interrupt enabled
(it might happen for other interrupts too, but they are normally masked in
the process of rerouting them to the I/O APIC).  There are two cases where
it will happen:

* When the I/O APIC NMI watchdog is enabled.  This is actually a misnomer
  as the watchdog pulses are delivered through the 8259A to the LINT0
  inputs of all the local APICs in the system.  The implication is the
  output of the master 8259A goes high and low repeatedly, signalling
  interrupts to INTIN2 which is enabled too!

  [The origin of the name is I think for a brief period during the
  development we had a capability in our code to configure the watchdog to
  use an I/O APIC input; that would be INTIN2 in this scenario.]

* When the native route of IRQ0 via INTIN0 fails for whatever reason -- as
  it happens with the system considered here.  In this scenario the timer
  pulse is delivered through the 8259A to LINT0 input of the local APIC of
  the bootstrap processor, quite similarly to how is done for the watchdog
  described above.  The result is, again, INTIN2 receives these pulses
  too.  Rafael's system used to escape this scenario, because an incorrect
  IRQ0 override would occupy INTIN2 and prevent it from being unmasked.

My conclusion is IRQ2 should be excluded from configuration in all the
cases and the current exception for ACPI systems should be lifted.  The
reason being the exception not only being useless, but harmful as well.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:03 +02:00
Maciej W. Rozycki
c88ac1df48 x86: L-APIC: Always fully configure IRQ0
Unlike the 32-bit one, the 64-bit variation of the LVT0 setup code for
the "8259A Virtual Wire" through the local APIC timer configuration does
not fully configure the relevant irq_chip structure.  Instead it relies on
the preceding I/O APIC code to have set it up, which does not happen if
the I/O APIC variants have not been tried.

The patch includes corresponding changes to the 32-bit variation too
which make them both the same, barring a small syntactic difference
involving sequence of functions in the source.  That should work as an aid
with the upcoming merge.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:02 +02:00
Maciej W. Rozycki
1baea6e2fe x86: L-APIC: Set IRQ0 as edge-triggered
IRQ0 is edge-triggered, but the "8259A Virtual Wire" through the local
APIC configuration in the 32-bit version uses the "fasteoi" handler
suitable for level-triggered APIC interrupt.  Rewrite code so that the
"edge" handler is used.  The 64-bit version uses different code and is
unaffected.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:02 +02:00
Glauber Costa
392a0fc96b x86: merge dwarf2 headers
Merge dwarf2_32.h and dwarf2_64.h into dwarf2.h.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:49:39 +02:00
Glauber Costa
d73a731abe x86: use AS_CFI instead of UNWIND_INFO
In dwarf2_32.h, test for CONFIG_AS_CFI instead of
CONFIG_UNWIND_INFO. Turns out that searching for UNWIND_INFO
returns no match in any Kconfig or Makefile, so we're really
just throwing everything away regarding dwarf frames for i386.

The test that generates CONFIG_AS_CFI does not have anything
x86_64-specific, and right now, checking V=1 builds shows me
that the flags is there anyway, although unused.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:49:35 +02:00
Glauber Costa
70f1bba4c8 x86: use ignore macro instead of hash comment
In dwarf_64.h header, use the "ignore" macro the way
i386 does.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:49:32 +02:00
Glauber Costa
557d7d4e29 x86: use matching CFI_ENDPROC
The RING0_INT_FRAME macro defines a CFI_STARTPROC.
So we should really be using CFI_ENDPROC after it.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:49:28 +02:00
Linus Torvalds
4d727a781f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-acpi: don't call sleeping function from invalid context
  Added Targa Visionary 1000 IDE adapter to pata_sis.c
  libata-acpi: filter out DIPM enable
2008-07-11 11:37:55 -07:00
Dave Chinner
49641f1acf Fix reference counting race on log buffers
When we release the iclog, we do an atomic_dec_and_lock to determine if
we are the last reference and need to trigger update of log headers and
writeout.  However, in xlog_state_get_iclog_space() we also need to
check if we have the last reference count there.  If we do, we release
the log buffer, otherwise we decrement the reference count.

But the compare and decrement in xlog_state_get_iclog_space() is not
atomic, so both places can see a reference count of 2 and neither will
release the iclog.  That leads to a filesystem hang.

Close the race by replacing the atomic_read() and atomic_dec() pair with
atomic_add_unless() to ensure that they are executed atomically.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Tim Shimmin <tes@sgi.com>
Tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-11 11:37:18 -07:00
Ingo Molnar
d9fc3fd3fa x86: fix savesegment() bug causing crashes on 64-bit
i spent a fair amount of time chasing a 64-bit bootup crash that manifested
itself as bootup segfaults:

  S10network[1825]: segfault at 7f3e2b5d16b8 ip 00000031108748c9 sp 00007fffb9c14c70 error 4 in libc-2.7.so[3110800000+14d000]

eventually causing init to die and panic the system:

  Kernel panic - not syncing: Attempted to kill init!
  Pid: 1, comm: init Not tainted 2.6.26-rc9-tip #13878

after a maratonic bisection session, the bad commit turned out to be:

| b7675791859075418199c7af86a116ea34eaf5bd is first bad commit
| commit b7675791859075418199c7af86a116ea34eaf5bd
| Author: Jeremy Fitzhardinge <jeremy@goop.org>
| Date:   Wed Jun 25 00:19:00 2008 -0400
|
|     x86: remove open-coded save/load segment operations
|
|     This removes a pile of buggy open-coded implementations of savesegment
|     and loadsegment.

after some more bisection of this patch itself, it turns out that what
makes the difference are the savesegment() changes to __switch_to().

Taking a look at this portion of arch/x86/kernel/process_64.o revealed
this crutial difference:

| good:    99c:       8c e0                   mov    %fs,%eax
|          99e:       89 45 cc                mov    %eax,-0x34(%rbp)
|
| bad:     99c:       8c 65 cc                mov    %fs,-0x34(%rbp)

which is due to:

|                 unsigned fsindex;
| -               asm volatile("movl %%fs,%0" : "=r" (fsindex));
| +               savesegment(fs, fsindex);

savesegment() is implemented as:

 #define savesegment(seg, value)                                \
          asm("mov %%" #seg ",%0":"=rm" (value) : : "memory")

note the "m" modifier - it allows GCC to generate the segment move
into a memory operand as well.

But regarding segment operands there's a subtle detail in the x86
instruction set: the above 16-bit moves are zero-extend, but only
if it goes to a register.

If it goes to a memory operand, -0x34(%rbp) in the above case, there's
no zero-extend to 32-bit and the instruction will only save 16 bits
instead of the intended 32-bit.

The other 16 bits is random data - which can cause problems when that
value is used later on.

The solution is to only allow segment operands to go to registers.
This fix allows my test-system to boot up without crashing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 19:51:47 +02:00
Jeremy Fitzhardinge
b6ad92d4fa x86_64: vdso32 cleanup using feature flags
Use the X86_FEATURE_SYSENTER32 to remove hard-coded CPU vendor check.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:44:58 +02:00
Jeremy Fitzhardinge
8d28aab59f x86_64: add pseudo-features for 32-bit compat syscall
Add pseudo-feature bits to describe whether the CPU supports sysenter
and/or syscall from ia32-compat userspace.  This removes a hardcoded
test in vdso32-setup.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:44:57 +02:00
Zhang Rui
3c1e389634 libata-acpi: don't call sleeping function from invalid context
The problem is introduced by commit
664d080c41.

acpi_evaluate_integer is a sleeping function,
and it should not be called with spin_lock_irqsave.
https://bugzilla.redhat.com/show_bug.cgi?id=451399

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-11 09:42:03 -04:00
Kai Krakow
edb804713f Added Targa Visionary 1000 IDE adapter to pata_sis.c
This enables short 40-wire detection for my laptop thus
enabling UDMA/100.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-11 09:38:24 -04:00
Tejun Heo
b344991ace libata-acpi: filter out DIPM enable
Some BIOSen enable DIPM via _GTF which causes command timeouts under
certain configuration.  This didn't occur on 2.6.25 because 2.6.25
defaulted to SRST, so _GTF wasn't executed during boot probe, so ahci
host reset disabled DIPM and as _GTF wasn't executed after SRST, DIPM
wasn't enabled.  On 2.6.26, hardreset is used during probe and after
probe _GTF is executed enabling DIPM and thus the failures.

This patch could theoretically disable DIPM on machines which used to
have it enabled on 2.6.25 but AFAIK ahci is currently the only driver
which uses SATA ACPI hierarchy (_SDD) and as the host reset would have
always disabled DIPM, this shouldn't happen.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-11 09:38:23 -04:00
Ingo Molnar
3d0decc4f4 x86: fix tsc unification buglet with ftrace and stackprotector
Yinghai Lu reported crashes on 64-bit x86:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 IP: [<ffffffff80253b17>] hrtick_start_fair+0x89/0x173
 [...]

And with a long session of debugging and a lot of difficulty, tracked it down
to this commit:

 --------------->
 8fbbc4b45c is first bad commit
 commit 8fbbc4b45c
 Author: Alok Kataria <akataria@vmware.com>
 Date:   Tue Jul 1 11:43:34 2008 -0700

     x86: merge tsc_init and clocksource code
 <--------------

The problem is that the TSC unification missed these Makefile rules
in arch/x86/kernel/Makefile:

  # Do not profile debug and lowlevel utilities
  CFLAGS_REMOVE_tsc_64.o = -pg
  CFLAGS_REMOVE_tsc_32.o = -pg
  ...
  CFLAGS_tsc_64.o         := $(nostackp)
  ...

which rules make sure that various instrumentation and debugging
facilities are disabled for code that might end up in a VDSO - such as
the TSC code.

Reported-and-bisected-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Conflicts:

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:09:15 +02:00
Yinghai Lu
f361a450bf x86: introduce max_low_pfn_mapped for 64-bit
when more than 4g memory is installed, don't map the big hole below 4g.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:24:04 +02:00
Yinghai Lu
f302a5bbe5 x86: reserve SLIT
save the SLIT, in case we are using fixmap to read it, and that fixmap
could be cleared by others.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:22:33 +02:00
Yinghai Lu
69a7704d7a x86: e820: user-defined memory maps: remove the range instead of update it to reserved
also let mem= to print out modified e820 map too

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:21:24 +02:00
Paul Gortmaker
61ca9daa2c rtc: fix reported IRQ rate for when HPET is enabled
The IRQ rate reported back by the RTC is incorrect when HPET is enabled.

Newer hardware that has HPET to emulate the legacy RTC device gets this value
wrong since after it sets the rate, it returns before setting the variable
used to report the IRQ rate back to users of the device -- so the set rate and
the reported rate get out of sync.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Brownell <david-b@pacbell.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 18:04:43 -07:00
Uwe Kleine-König
ac310bb5db Fix name of Russell King in various comments
This patch was created by

	git grep -E -l 'Rus(el|s?e)l King' | xargs -r -t perl -p -i -e 's/Rus(el|s?e)l King/Russell King/g'

Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Most-Definitely-Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 18:04:43 -07:00
Eugene Surovegin
a7de3902ed rapidio: fix device reference counting
Fix RapidIO device reference counting.

Signed-of-by: Eugene Surovegin <ebs@ebshome.net>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 18:04:43 -07:00
Marcin Obara
fb0e7e11d0 tpm: add Intel TPM TIS device HID
This patch adds Intel TPM TIS device HID:  ICO0102

Signed-off-by: Marcin Obara <marcin_obara@users.sourceforge.net>
Acked-by: Marcel Selhorst <tpm@selhorst.net>
Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 18:04:43 -07:00
Linus Torvalds
e5a5816f78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  tun: Persistent devices can get stuck in xoff state
  xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
  ipv6: missed namespace context in ipv6_rthdr_rcv
  netlabel: netlink_unicast calls kfree_skb on error path by itself
  ipv4: fib_trie: Fix lookup error return
  tcp: correct kcalloc usage
  ip: sysctl documentation cleanup
  Documentation: clarify tcp_{r,w}mem sysctl docs
  netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP
  netfilter: nf_conntrack_tcp: fix endless loop
  libertas: fix memory alignment problems on the blackfin
  zd1211rw: stop beacons on remove_interface
  rt2x00: Disable synchronization during initialization
  rc80211_pid: Fix fast_start parameter handling
  sctp: Add documentation for sctp sysctl variable
  ipv6: fix race between ipv6_del_addr and DAD timer
  irda: Fix netlink error path return value
  irda: New device ID for nsc-ircc
  irda: via-ircc proper dma freeing
  sctp: Mark the tsn as received after all allocations finish
  ...
2008-07-10 17:58:47 -07:00
Max Krasnyansky
e35259a953 tun: Persistent devices can get stuck in xoff state
The scenario goes like this. App stops reading from tun/tap.
TX queue gets full and driver does netif_stop_queue().
App closes fd and TX queue gets flushed as part of the cleanup.
Next time the app opens tun/tap and starts reading from it but
the xoff state is not cleared. We're stuck.
Normally xoff state is cleared when netdev is brought up. But
in the case of persistent devices this happens only during
initial setup.

The fix is trivial. If device is already up when an app opens
it we clear xoff state and that gets things moving again.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:59:11 -07:00
Steffen Klassert
ccf9b3b83d xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for
the selector family. Userspace applications can set this flag to leave
the selector family of the xfrm_state unspecified.  This can be used
to to handle inter family tunnels if the selector is not set from
userspace.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:55:37 -07:00
Denis V. Lunev
0ce28553cc ipv6: missed namespace context in ipv6_rthdr_rcv
Signed-off-by: Denis V. Lunev <den@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:54:50 -07:00
Denis V. Lunev
fe785bee05 netlabel: netlink_unicast calls kfree_skb on error path by itself
So, no need to kfree_skb here on the error path. In this case we can
simply return.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:53:39 -07:00
Ben Hutchings
2e655571c6 ipv4: fib_trie: Fix lookup error return
In commit a07f5f508a "[IPV4] fib_trie: style
cleanup", the changes to check_leaf() and fn_trie_lookup() were wrong - where
fn_trie_lookup() would previously return a negative error value from
check_leaf(), it now returns 0.
 
Now fn_trie_lookup() doesn't appear to care about plen, so we can revert
check_leaf() to returning the error value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: William Boughton <bill@boughton.de>
Acked-by: Stephen Heminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:52:52 -07:00
Milton Miller
3d8ea1fd70 tcp: correct kcalloc usage
kcalloc is supposed to be called with the count as its first argument and
the element size as the second.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:51:32 -07:00
Stephen Hemminger
4edc2f3416 ip: sysctl documentation cleanup
Reduced version of the spelling cleanup patch.

Take out the confusing language in tcp_frto, and organize the
undocumented values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:50:26 -07:00
J. Bruce Fields
53025f5efd Documentation: clarify tcp_{r,w}mem sysctl docs
Fix some of the defaults and attempt to clarify some language.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:47:41 -07:00
Dmitry Adamushko
bdb2192851 slub: Fix use-after-preempt of per-CPU data structure
Vegard Nossum reported a crash in kmem_cache_alloc():

	BUG: unable to handle kernel paging request at da87d000
	IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
	*pde = 28180163 *pte = 1a87d160
	Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
	Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
	EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
	EIP is at kmem_cache_alloc+0xc7/0xe0
	EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
	ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
	DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

and analyzed it:

  "The register %ecx looks innocent but is very important here. The disassembly:

       mov    %edx,%ecx
       shr    $0x2,%ecx
       rep stos %eax,%es:(%edi) <-- the fault

   So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
   (0x6b6b6b6b >> 2 == 0x1adadada.)

   %ecx is the counter for the memset, from here:

       memset(object, 0, c->objsize);

  i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
  Where did "c" come from? Uh-oh...

       c = get_cpu_slab(s, smp_processor_id());

  This looks like it has very much to do with CPU hotplug/unplug. Is
  there a race between SLUB/hotplug since the CPU slab is used after it
  has been freed?"

Good analysis.

Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
can be migrated on another CPU right after local_irq_restore() and
before memset().  The inital cpu can become offline in the mean time (or
a migration is a consequence of the CPU going offline) so its
'kmem_cache_cpu' structure gets freed ( slab_cpuup_callback).

At some point of time the caller continues on another CPU having an
obsolete pointer...

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 15:18:50 -07:00
Hugh Dickins
96a8e13ed4 exec: fix stack excutability without PT_GNU_STACK
Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
stack vma is now created before the right personality has been established:
so breaking old binaries using nested function trampolines.

Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup.  Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.

Reported-by: pageexec@freemail.hu
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 13:25:43 -07:00