Use key/offset caching to change /proc/net/route (use by iputils route)
from O(n^2) to O(n). This improves performance from 30sec with 160,000
routes to 1sec.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes possible problems when trie_firstleaf() returns NULL
to trie_leafindex().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various RFCs have all sorts of things to say about the CS field of the
DSCP value. In particular they try to make the distinction between
values that should be used by "user applications" and things like
routing daemons.
This seems to have influenced the CAP_NET_ADMIN check which exists for
IP_TOS socket option settings, but in fact it has an off-by-one error
so it wasn't allowing CS5 which is meant for "user applications" as
well.
Further adding to the inconsistency and brokenness here, IPV6 does not
validate the DSCP values specified for the IPV6_TCLASS socket option.
The real actual uses of these TOS values are system specific in the
final analysis, and these RFC recommendations are just that, "a
recommendation". In fact the standards very purposefully use
"SHOULD" and "SHOULD NOT" when describing how these values can be
used.
In the final analysis the only clean way to provide consistency here
is to remove the CAP_NET_ADMIN check. The alternatives just don't
work out:
1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
setups.
2) If we just fix the off-by-one error in the class comparison in
IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
default. So people will just ask for a sysctl asking to
override that.
I checked several other freely available kernel trees and they
do not make any privilege checks in this area like we do. For
the BSD stacks, this goes back all the way to Stevens Volume 2
and beyond.
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if we don't want to register the WMI driver, we should initialize
the wmi_blocks list to be empty, since we don't want the wmi helper
functions to oops just because that basic list has not even been set up.
With this, "find_guid()" will happily return "not found" rather than
oopsing all over the place, and the callers will then just automatically
return false or AE_NOT_FOUND as appropriate.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The makefile magic for installing the 32-bit vdso images on disk had a
little error. A single-line change would fix that bug, but this does a
little more to reduce the error-prone duplication of this bit of
makefile variable magic.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully populated
[out to MAXNUMNODES] nodemask. set_mempolicy() [in do_set_mempolicy()]
calls contextualize_policy() which requires that the nodemask be a
subset of the current task's mems_allowed; else EINVAL will be returned.
A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
i.e., nodes with memory. So, a fully populated nodemask will be
declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the nodemask
that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy() will
complain [again, resulting in EINVAL] if the nodemask contains any
memoryless nodes. This is somewhat redundant as mpol_new() will remove
memoryless nodes for interleave policy, as will bind_zonelist()--called
by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
[ Patch history:
v1 -> v2:
- Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
- As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
v2 -> v3:
- As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner. ]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So I spent a while pounding my head against my monitor trying to figure
out the vmsplice() vulnerability - how could a failure to check for
*read* access turn into a root exploit? It turns out that it's a buffer
overflow problem which is made easy by the way get_user_pages() is
coded.
In particular, "len" is a signed int, and it is only checked at the
*end* of a do {} while() loop. So, if it is passed in as zero, the loop
will execute once and decrement len to -1. At that point, the loop will
proceed until the next invalid address is found; in the process, it will
likely overflow the pages array passed in to get_user_pages().
I think that, if get_user_pages() has been asked to grab zero pages,
that's what it should do. Thus this patch; it is, among other things,
enough to block the (already fixed) root exploit and any others which
might be lurking in similar code. I also think that the number of pages
should be unsigned, but changing the prototype of this function probably
requires some more careful review.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt is already the maintainer of SLOB which is one of the "SLAB" allocators in
the kernel so add him to MAINTAINERS.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
sata_mv: platform driver allocs dma without create
pata_ninja32: setup changes
pata_legacy: typo fix
pata_amd: Note in the module description it handles Nvidia
sata_mv: fix loop with last port
libata: ignore deverr on SETXFER if mode is configured
pata_via: fix SATA cable detection on cx700
This avoids warnings with unreferenced variables in the !NUMA case.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 313abe55 ("mlx4_core: For 64-bit systems, vmap() kernel queue
buffers") caused this to pop up on powerpc allyesconfig, looks like a
missing include file:
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
drivers/net/mlx4/alloc.c:162: error: implicit declaration of function 'vmap'
drivers/net/mlx4/alloc.c:162: error: 'VM_MAP' undeclared (first use in this function)
drivers/net/mlx4/alloc.c:162: error: (Each undeclared identifier is reported only once
drivers/net/mlx4/alloc.c:162: error: for each function it appears in.)
drivers/net/mlx4/alloc.c:162: warning: assignment makes pointer from integer without a cast
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_free':
drivers/net/mlx4/alloc.c:187: error: implicit declaration of function 'vunmap'
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit bdc807871d broke the build
for this config because the sim_defconfig selects CONFIG_HZ=250
but include/asm-ia64/param.h has an ifdef for the simulator to
force HZ to 32. So we ended up with a kernel/timeconst.h set
for HZ=250 ... which then failed the check for the right HZ
value and died with:
Drop the #ifdef magic from param.h and make force CONFIG_HZ=32
directly for the simulator.
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the sata_mv driver is used as a platform driver,
mv_create_dma_pools() is never called so it fails when trying
to alloc in mv_pool_start().
Signed-off-by: Byron Bradley <byron.bbradley@gmail.com>
Acked-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Forcibly set more of the configuration at init time. This seems to fix at
least one problem reported. We don't know what most of these bits do, but
we do know what windows stuffs there.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some controllers (VIA CX700) raise device error on SETXFER even after
mode configuration succeeded. Update ata_dev_set_mode() such that
device error is ignored if transfer mode is configured correctly. To
implement this, device is revalidated even after device error on
SETXFER.
This fixes kernel bugzilla bug 8563.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
SUNPRC: Fix printk format warning
nfsd: clean up svc_reserve_auth()
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
NLM: have server-side RPC clients default to soft RPC tasks
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
When make -s support were added to filechk to
combination created with make V=1 were not
covered.
Fix it by explicitly cover this case too.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Mike Frysinger <vapier@gentoo.org>
This patch fixes a use-after-free introduced by
commit a79d8e93d3 and spotted by the
Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix following warnings:
WARNING: drivers/net/sis190.o(.text+0x103): Section mismatch in reference from the function sis190_get_mac_addr() to the function .devinit.text:sis190_get_mac_addr_from_apc()
WARNING: drivers/net/sis190.o(.text+0x10e): Section mismatch in reference from the function sis190_get_mac_addr() to the function .devinit.text:sis190_get_mac_addr_from_eeprom()
Annotate sis190_get_mac_addr() with __devinit.
Signed-off-by: Sergio Luis <sergio@uece.br>
sis190.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word. But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface. Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.
[ Thanks to Grant Grundler for pointing out to me that this is exactly
what the write immediately above this is doing -- enabling device 8 to
respond to config space cycles.
- Matthew
Grant also says:
"Can you also add a comment which points at the Intel
documentation?
The 'Intel E7320 Memory Controller Hub (MCH) Datasheet' at
http://download.intel.com/design/chipsets/datashts/30300702.pdf
Page 69 documents register F4h (DEVPRES1).
And I just doubled checked that the 0xf4 register value is
restored later in the quirk (obvious when you look at the code
but not from the patch"
so here it is.
- Linus ]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
additional check of s390dbf level results in better performance
if the default low debugging level is active.
Signed-off-by: Peter Tiedemann <ptiedem@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since lcs makes use of 1 debug area only, the number of debug areas
is reduced, while the number of pages per area is increased.
Signed-off-by: Peter Tiedemann <ptiedem@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dummy NOP actions for fsm-statemachines have to be defined
separately for every using module of fsm-statemachines.
Thus the generic name fsm_action_nop is replaced by
module specific name netiucv_action_nop.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Volatile variables queme_switch and pk_delay are not used anyway.
They are just a left over from an unused timer based packing logic.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
New chipsets introduced variant Rx FIFO sizes that need to be taken into
account when setting up the tx pause watermarks. This patch introduces
the new device feature flags based on a version and implements the new
watermarks.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch supports a new fix in hardware regarding tx collisions. In
the cases where we are in autoneg mode and the link partner is in forced
mode, we need to setup the tx deferral register differently in order to
reduce collisions on the wire.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When ARP completes due to a request rather than a reply the neighbor is
marked NUD_STALE instead of reachable (see arp_process()). The handler
for the resulting netevent needs to check also for NUD_STALE.
Failure to use the arp entry can cause RDMA connection failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Show whether the MAC address was read from the EEPROM or
the onboard PAR registers.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Reading the ID register does not always return the correct ID
from the device, so we retry several times to see if we get
a correct value.
These failures seem to be excaserbated by the speed of the
access to the chip (possibly time between issuing the address
and then the data cycle).
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add entry to handle the MII ioctl() calls via the
generic_mii_ioctl call.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Allow the platform data to specify to the DM9000 driver
that there is no posibility of an attached EEPROM on the
device, so default all reads to 0xff and ignore any
write operations.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The writing of the data should implicitly truncate
the data to 8bits, so do not bother with the ands
in the code.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove the cal_CRC as this is basically wrappering the
ether_crc_le function, and is only being used by the
multicast hash table functions.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The code was using a delay of 8ms, when it should have been
using the EEPROM status flag from the device to indicate the
EEPROM transaction had finished.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Use the netif_msg_*() macros to enable the debugging based
on the board's msg_enable field. The output still goes via
the dev_dbg() macros, so will be tagged and output as
appropriate.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
We have a perfectly good version control system, so we do not
need to duplicate change comments in the header for this code.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>