Commit graph

690 commits

Author SHA1 Message Date
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Alan Stern
a120586873 [PATCH] Allow NULL pointers in percpu_free
The patch (as824b) makes percpu_free() ignore NULL arguments, as one would
expect for a deallocation routine.  (Note that free_percpu is #defined as
percpu_free in include/linux/percpu.h.) A few callers are updated to remove
now-unneeded tests for NULL.  A few other callers already seem to assume
that passing a NULL pointer to percpu_free() is okay!

The patch also removes an unnecessary NULL check in percpu_depopulate().

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Masahide NAKAMURA
4e33fa14fa [IPV6] RAW: Don't release unlocked sock.
When user builds IPv6 header and send it through raw socket, kernel
tries to release unlocked sock. (Kernel log shows
"BUG: bad unlock balance detected" with enabled debug option.)

The lock is held only for non-hdrincl sock in this function
then this patch fix to do nothing about lock for hdrincl one.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:09 -08:00
YOSHIFUJI Hideaki
9a217a1c7e [IPV6]: Repair IPv6 Fragments
The commit "[IPV6]: Use kmemdup" (commit-id:
af879cc704) broke IPv6 fragments.

Bug was spotted by Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:08 -08:00
Dmitry Mishin
74c9c0c17d [NETFILTER]: Fix {ip,ip6,arp}_tables hook validation
Commit 590bdf7fd2 introduced a regression
in match/target hook validation. mark_source_chains builds a bitmask
for each rule representing the hooks it can be reached from, which is
then used by the matches and targets to make sure they are only called
from valid hooks. The patch moved the match/target specific validation
before the mark_source_chains call, at which point the mask is always zero.

This patch returns back to the old order and moves the standard checks
to mark_source_chains. This allows to get rid of a special case for
standard targets as a nice side-effect.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:02 -08:00
Patrick McHardy
a3c479772c [NETFILTER]: Mark old IPv4-only connection tracking scheduled for removal
Also remove the references to "new connection tracking" from Kconfig.
After some short stabilization period of the new connection tracking
helpers/NAT code the old one will be removed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:11:01 -08:00
Patrick McHardy
bff9a89bca [NETFILTER]: nf_conntrack: endian annotations
Resync with Al Viro's ip_conntrack annotations and fix a missed
spot in ip_nat_proto_icmp.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:05:08 -08:00
Andrew Morton
b6332e6cf9 [TCP]: Fix warnings with TCP_MD5SIG disabled.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:52 -08:00
Adrian Bunk
f5b99bcddd [NET]: Possible cleanups.
This patch contains the following possible cleanups:
- make the following needlessly global functions statis:
  - ipv4/tcp.c: __tcp_alloc_md5sig_pool()
  - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
  - ipv4/udplite.c: udplite_rcv()
  - ipv4/udplite.c: udplite_err()
- make the following needlessly global structs static:
  - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
  - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
  - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
- net/ipv{4,6}/udplite.c: remove inline's from static functions
                          (gcc should know best when to inline them)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:51 -08:00
Patrick McHardy
76592584be [NETFILTER]: Fix PROC_FS=n warnings
Fix some unused function/variable warnings.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:34 -08:00
Patrick McHardy
baf7b1e112 [NETFILTER]: x_tables: add NFLOG target
Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6.
Currently we have two (unsupported by userspace) hacks in the LOG and ULOG
targets to optionally call to the nflog API. They lack a few features,
namely the IPv4 and IPv6 LOG targets can not specify a number of arguments
related to nfnetlink_log, while the ULOG target is only available for IPv4.
Remove those hacks and add a clean way to use nfnetlink_log.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:31 -08:00
Patrick McHardy
933a41e7e1 [NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:18 -08:00
Patrick McHardy
f8eb24a89a [NETFILTER]: nf_conntrack: move extern declaration to header files
Using extern in a C file is a bad idea because the compiler can't
catch type errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:16 -08:00
Martin Josefsson
605dcad6c8 [NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:09 -08:00
Gerrit Renker
4c0a6cb0db [UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.

Furthermore, there is the following code reduplication:
 * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
 * do_udp{,v6}_setsockopt is identical up to the following differerence
	--v4 in contrast to v4 additionally allows the experimental encapsulation
          types  UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
	--the remainder is identical between v4 and v6
   I believe that this difference is of little relevance.

The advantages in not duplicating twice almost completely identical code.

The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:45 -08:00
Thomas Graf
e3703b3de1 [RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code
IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similiar
way, therefore rtnl_put_cacheinfo() is added to reuse code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:44 -08:00
Ville Nuorvala
107a5fe619 [IPV6]: Improve IPv6 tunnel error reporting
Log an error if the remote tunnel endpoint is unable to handle
tunneled packets.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:27 -08:00
Ville Nuorvala
6fb32ddeb2 [IPV6]: Don't allocate memory for Tunnel Encapsulation Limit Option
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:26 -08:00
Ville Nuorvala
305d4b3ce8 [IPV6]: Allow link-local tunnel endpoints
Allow link-local tunnel endpoints if the underlying link is defined.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:25 -08:00
Ville Nuorvala
09c6bbf090 [IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime
Doing the mandatory tunnel endpoint checks when the tunnel is set up
isn't enough as interfaces can go up or down and addresses can be
added or deleted after this. The checks need to be done realtime when
the tunnel is processing a packet.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:24 -08:00
Ville Nuorvala
567131a722 [IPV6]: Fix SIOCCHGTUNNEL bug in IPv6 tunnels
A logic bug in tunnel lookup could result in duplicate tunnels when
changing an existing device.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:24 -08:00
Al Viro
ff1dcadb1b [NET]: Split skb->csum
... into anonymous union of __wsum and __u32 (csum and csum_offset resp.)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:18 -08:00
Al Viro
8e5200f540 [NET]: Fix assorted misannotations (from md5 and udplite merges).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:16 -08:00
Adrian Bunk
89c8945815 [IPV6] net/ipv6/sit.c: make 2 functions static
This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:15 -08:00
Arnaldo Carvalho de Melo
af879cc704 [IPV6]: Use kmemdup
Code diff stats:

[acme@newtoy net-2.6.20]$ codiff /tmp/ipv6.ko.before /tmp/ipv6.ko.after
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/ip6_output.c:
  ip6_output      |  -52
  ip6_append_data |   +2
 2 functions changed, 2 bytes added, 52 bytes removed

/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/addrconf.c:
  addrconf_sysctl_register |  -27
 1 function changed, 27 bytes removed

/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c:
  tcp_v6_syn_recv_sock  |  -32
  tcp_v6_parse_md5_keys |  -24
 2 functions changed, 56 bytes removed

/tmp/ipv6.ko.after:
 5 functions changed, 2 bytes added, 135 bytes removed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:58 -08:00
David S. Miller
7d9e9b3df4 [IPV6]: udp.c build fix
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:46 -08:00
Al Viro
f6ab028804 [NET]: Make mangling a checksum (0 -> 0xffff on the wire) explicit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:39 -08:00
Al Viro
b51655b958 [NET]: Annotate __skb_checksum_complete() and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:38 -08:00
Al Viro
5f92a7388a [NET]: Annotate callers of the reset of checksum.h stuff.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:34 -08:00
Al Viro
868c86bcb5 [NET]: annotate csum_ipv6_magic() callers in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:31 -08:00
Al Viro
e69a4adc66 [IPV6]: Misc endianness annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:52 -08:00
Gerrit Renker
ba4e58eca8 [NET]: Supporting UDP-Lite (RFC 3828) in Linux
This is a revision of the previously submitted patch, which alters
the way files are organized and compiled in the following manner:

	* UDP and UDP-Lite now use separate object files
	* source file dependencies resolved via header files
	  net/ipv{4,6}/udp_impl.h
	* order of inclusion files in udp.c/udplite.c adapted
	  accordingly

[NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)

This patch adds support for UDP-Lite to the IPv4 stack, provided as an
extension to the existing UDPv4 code:
        * generic routines are all located in net/ipv4/udp.c
        * UDP-Lite specific routines are in net/ipv4/udplite.c
        * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
        * shared API with extensions for partial checksum coverage

[NET/IPv6]: Extension for UDP-Lite over IPv6

It extends the existing UDPv6 code base with support for UDP-Lite
in the same manner as per UDPv4. In particular,
        * UDPv6 generic and shared code is in net/ipv6/udp.c
        * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
        * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
        * support for IPV6_ADDRFORM
        * aligned the coding style of protocol initialisation with af_inet6.c
        * made the error handling in udpv6_queue_rcv_skb consistent;
          to return `-1' on error on all error cases
        * consolidation of shared code

[NET]: UDP-Lite Documentation and basic XFRM/Netfilter support

The UDP-Lite patch further provides
        * API documentation for UDP-Lite
        * basic xfrm support
        * basic netfilter support for IPv4 and IPv6 (LOG target)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:46 -08:00
Thomas Graf
6051e2f4fb [IPv6] prefix: Convert RTM_NEWPREFIX notifications to use the new netlink api
RTM_GETPREFIX is completely unused and is thus removed.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:45 -08:00
Thomas Graf
04561c1fe7 [IPv6] iflink: Convert IPv6's RTM_GETLINK to use the new netlink api
By replacing the current method of exporting the device configuration
which included allocating a temporary buffer, copying ipv6_devconf
into it and copying that buffer into the message with a method that
uses nla_reserve() allowing to copy the device configuration directly
into the skb data buffer, a GFP_ATOMIC allocation could be removed.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:44 -08:00
David S. Miller
a928630a2f [TCP]: Fix some warning when MD5 is disabled.
Just some mis-placed ifdefs:

net/ipv4/tcp_minisocks.c: In function ‘tcp_twsk_destructor’:
net/ipv4/tcp_minisocks.c:364: warning: unused variable ‘twsk’
net/ipv6/tcp_ipv6.c:1846: warning: ‘tcp_sock_ipv6_specific’ defined but not used
net/ipv6/tcp_ipv6.c:1877: warning: ‘tcp_sock_ipv6_mapped_specific’ defined but not used

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:43 -08:00
YOSHIFUJI Hideaki
cfb6eeb4c8 [TCP]: MD5 Signature Option (RFC2385) support.
Based on implementation by Rick Payne.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:39 -08:00
Gerrit Renker
b9df3cb8cf [TCP/DCCP]: Introduce net_xmit_eval
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.

This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:27 -08:00
Brian Haley
d3a1be9cba [IPv6]: Only modify checksum for UDP
Only change upper-layer checksum from 0 to 0xFFFF for UDP (as RFC 768
states), not for others as RFC 4443 doesn't require it.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:13 -08:00
Thomas Graf
f465e489c4 [IPv6] rules: Remove bogus tos validation check
Noticed by Al Viro:
     (frh->tos & ~IPV6_FLOWINFO_MASK))
where IPV6_FLOWINFO_MASK is htonl(0xfffffff) and frh->tos
is u8, which makes no sense here...

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:12 -08:00
Thomas Graf
339bf98ffc [NETLINK]: Do precise netlink message allocations where possible
Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.

Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:11 -08:00
Gerrit Renker
a94f723d59 [TCP]: Remove dead code in init_sequence
This removes two redundancies:

1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence()
   is always true, due to
	* tcp_v6_conn_request() is the only function calling this one
	* tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
	  tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]

2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
   never used.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:10 -08:00
YOSHIFUJI Hideaki
a11d206d0f [IPV6]: Per-interface statistics support.
For IP MIB (RFC4293).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:08 -08:00
YOSHIFUJI Hideaki
7a3025b1b3 [IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}.
Otherwise, we will see a lot of casts...

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:07 -08:00
YOSHIFUJI Hideaki
40aa7b90a9 [IPV6] ROUTE: Use &rt->u.dst instead of cast.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:06 -08:00
YOSHIFUJI Hideaki
33e93c9699 [IPV6] ROUTE: Use macros to format /proc/net/ipv6_route.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:05 -08:00
David S. Miller
931731123a [TCP]: Don't set SKB owner in tcp_transmit_skb().
The data itself is already charged to the SKB, doing
the skb_set_owner_w() just generates a lot of noise and
extra atomics we don't really need.

Lmbench improvements on lat_tcp are minimal:

before:
TCP latency using localhost: 23.2701 microseconds
TCP latency using localhost: 23.1994 microseconds
TCP latency using localhost: 23.2257 microseconds

after:
TCP latency using localhost: 22.8380 microseconds
TCP latency using localhost: 22.9465 microseconds
TCP latency using localhost: 22.8462 microseconds

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:52 -08:00
David S. Miller
9ec75fe85c [IPV6] tcp: Fix typo _read_mostly --> __read_mostly.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:46 -08:00
Eric Dumazet
72a3effaf6 [NET]: Size listen hash tables using backlog hint
We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for
each LISTEN socket, regardless of various parameters (listen backlog for
example)

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of listen hash table a dynamic parameter,
depending of :
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application  (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM
usage.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:44 -08:00
Thomas Graf
1f6c9557e8 [NET] rules: Share common attribute validation policy
Move the attribute policy for the non-specific attributes into
net/fib_rules.h and include it in the respective protocols.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:41 -08:00
Thomas Graf
b8964ed9fa [NET] rules: Protocol independant mark selector
Move mark selector currently implemented per protocol into
the protocol independant part.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:41 -08:00
Thomas Graf
47dcf0cb10 [NET]: Rethink mark field in struct flowi
Now that all protocols have been made aware of the mark
field it can be moved out of the union thus simplyfing
its usage.

The config options in the IPv4/IPv6/DECnet subsystems
to enable respectively disable mark based routing only
obfuscate the code with ifdefs, the cost for the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark based routing by
default.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:39 -08:00
Thomas Graf
82e91ffef6 [NET]: Turn nfmark into generic mark
nfmark is being used in various subsystems and has become
the defacto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:38 -08:00
Al Viro
ae08e1f092 [IPV6]: ip6_output annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:26 -08:00
Al Viro
fede70b986 [IPV6]: annotate inet6_csk_search_req()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:22 -08:00
Al Viro
90bcaf7b4a [IPV6]: flowlabels are net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:21 -08:00
Al Viro
8a74ff7770 [IPV6]: annotate ipv6 mcast
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:15 -08:00
Al Viro
04ce69093f [IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:12 -08:00
Al Viro
8c689a6eae [XFRM]: misc annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:11 -08:00
Al Viro
d2ecd9ccd0 [IPV6]: annotate inet6_hashtables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:10 -08:00
David S. Miller
d54a81d341 [IPV6] NDISC: Calculate packet length correctly for allocation.
MAX_HEADER does not include the ipv6 header length in it,
so we need to add it in explicitly.

With help from YOSHIFUJI Hideaki.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:06:31 -08:00
YOSHIFUJI Hideaki
f2776ff047 [IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture.
TCP and RAW do not have this issue.  Closes Bug #7432.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-21 17:41:56 -08:00
Yasuyuki Kozakai
53ab61c6d8 [IPV6] IP6TUNNEL: Add missing nf_reset() on input path.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21 16:16:27 -08:00
Yasuyuki Kozakai
b3fdd9f115 [IPV6] IP6TUNNEL: Delete all tunnel device when unloading module.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21 16:16:26 -08:00
YOSHIFUJI Hideaki
ea659e0775 [IPV6] ROUTE: Do not enable router reachability probing in router mode.
RFC4191 explicitly states that the procedures are applicable to
hosts only.  We should not have changed behavior of routers.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21 16:16:25 -08:00
YOSHIFUJI Hideaki
557e92efd4 [IPV6] ROUTE: Prefer reachable nexthop only if the caller requests.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21 16:16:24 -08:00
YOSHIFUJI Hideaki
ea73ee23c4 [IPV6] ROUTE: Try to use router which is not known unreachable.
Only routers in "FAILED" state should be considered unreachable.
Otherwise, we do not try to use speicific routes unless all least specific
routers are considered unreachable.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21 16:16:23 -08:00
Patrick McHardy
337dde798d [NETFILTER]: ip6_tables: use correct nexthdr value in ipv6_find_hdr()
nexthdr is NEXTHDR_FRAGMENT, the nexthdr value from the fragment header
is hp->nexthdr.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15 21:18:50 -08:00
Patrick McHardy
d8a585d78e [NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue
Based on patch by James D. Nurmi:

I've got some code very dependant on nfnetlink_queue, and turned up a
large number of warns coming from skb_trim.  While it's quite possibly
my code, having not seen it on older kernels made me a bit suspect.

Anyhow, based on some googling I turned up this thread:
http://lkml.org/lkml/2006/8/13/56

And believe the issue to be related, so attached is a small patch to
the kernel -- not sure if this is completely correct, but for anyone
else hitting the WARN_ON(1) in skbuff.h, it might be helpful..

Signed-off-by: James D. Nurmi <jdnurmi@gmail.com>

Ported to ip6_queue and nfnetlink_queue and added return value
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15 21:18:48 -08:00
Patrick McHardy
daccff024f [IPV6]: Give sit driver an appropriate module alias.
It would be nice to keep things working even with this built as a
module, it took me some time to realize my IPv6 tunnel was broken
because of the missing sit module. This module alias fixes things
until distributions have added an appropriate alias to modprobe.conf.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-05 15:47:04 -08:00
Dmitry Mishin
36f73d0c3b [IPV6]: Add ndisc_netdev_notifier unregister.
If inet6_init() fails later than ndisc_init() call, or IPv6 module is
unloaded, ndisc_netdev_notifier call remains in the list and will follows in
oops later.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-05 14:11:33 -08:00
Al Viro
5b1225454f [IPV6]: File the fingerprints off ah6->spi/esp6->spi
In theory these are opaque 32bit values.  However, we end up
allocating them sequentially in host-endian and stick unchanged
on the wire.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-01 15:42:35 -08:00
James Morris
1b7c2dbc07 [IPV6]: fix flowlabel seqfile handling
There's a bug in the seqfile show operation for flowlabel objects, where 
each hash chain is traversed cumulatively for each element.  The following 
function is called for each element of each chain:

static void ip6fl_fl_seq_show(struct seq_file *seq, struct ip6_flowlabel *fl)
{
        while(fl) {
                seq_printf...
		
		fl = fl->next;
	}
}

Thus, objects can appear mutliple times when reading 
/proc/net/ip6_flowlabel, as the above is called for each element in the 
chain.

The solution is to remove the while() loop from the above, and traverse 
each chain exactly once, per the patch below.  This also removes the 
ip6fl_fl_seq_show() function, which does nothing else.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-31 00:43:44 -08:00
James Morris
c6817e4c32 [IPV6]: return EINVAL for invalid address with flowlabel lease request
Currently, when an application requests a lease for a flowlabel via the 
IPV6_FLOWLABEL_MGR socket option, no error is returned if an invalid type 
of destination address is supplied as part of the request, leading to a 
silent failure.  This patch ensures that EINVAL is returned to the 
application in this case.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 18:56:06 -08:00
Dmitry Mishin
590bdf7fd2 [NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables
There is a number of issues in parsing user-provided table in
translate_table(). Malicious user with CAP_NET_ADMIN may crash system by
passing special-crafted table to the *_tables.

The first issue is that mark_source_chains() function is called before entry
content checks. In case of standard target, mark_source_chains() function
uses t->verdict field in order to determine new position. But the check, that
this field leads no further, than the table end, is in check_entry(), which
is called later, than mark_source_chains().

The second issue, that there is no check that target_offset points inside
entry. If so, *_ITERATE_MATCH macro will follow further, than the entry
ends. As a result, we'll have oops or memory disclosure.

And the third issue, that there is no check that the target is completely
inside entry. Results are the same, as in previous issue.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 15:24:44 -08:00
Patrick McHardy
844dc7c880 [NETFILTER]: remove masq/NAT from ip6tables Kconfig help
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 15:24:43 -08:00
James Morris
bcd620757d [IPV6]: fix lockup via /proc/net/ip6_flowlabel
There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where,
after finding a flowlabel, the code will loop forever not finding any
further flowlabels, first traversing the rest of the hash bucket then just
looping.

This patch fixes the problem by breaking after the hash bucket has been
traversed.

Note that this bug can cause lockups and oopses, and is trivially invoked
by an unpriveleged user.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 15:24:42 -08:00
Heiko Carstens
a27b58fed9 [NET]: fix uaccess handling
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 15:24:41 -08:00
Patrick McHardy
6d381634d2 [NETFILTER]: Fix ip6_tables extension header bypass bug
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on extension header matches.

When extension headers occur in the non-first fragment after the fragment
header (possibly with an incorrect nexthdr value in the fragment header)
a rule looking for this extension header will never match.

Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
Since all extension headers are before the protocol header this makes sure
an extension header is either not present or in the first fragment, where
we can properly parse it.

With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-24 16:15:10 -07:00
Patrick McHardy
51d8b1a652 [NETFILTER]: Fix ip6_tables protocol bypass bug
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on protocol matches.

When the protocol header doesn't follow the fragment header immediately,
the fragment header contains the protocol number of the next extension
header. When the extension header and the protocol header are sent in
a second fragment a rule like "ip6tables .. -p udp -j DROP" will never
match.

Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.

With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-24 16:14:04 -07:00
Thomas Graf
375216ad0c [IPv6] fib: initialize tb6_lock in common place to give lockdep a key
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-21 20:20:54 -07:00
David S. Miller
6723ab549d [IPV6]: Fix route.c warnings when multiple tables are disabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 21:20:57 -07:00
Thomas Graf
9ce8ade015 [IPv6] route: Fix prohibit and blackhole routing decision
Lookups resolving to ip6_blk_hole_entry must result in silently
discarding the packets whereas an ip6_pkt_prohibit_entry is
supposed to cause an ICMPV6_ADM_PROHIBITED message to be sent.

Thanks to Kim Nordlund <kim.nordlund@nokia.com> for noticing
this bug.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 20:46:54 -07:00
Ville Nuorvala
22e1e4d8dc [IPV6]: Always copy rt->u.dst.error when copying a rt6_info.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 19:55:30 -07:00
Ville Nuorvala
264e91b68a [IPV6]: Make IPV6_SUBTREES depend on IPV6_MULTIPLE_TABLES.
As IPV6_SUBTREES can't work without IPV6_MULTIPLE_TABLES have IPV6_SUBTREES
depend on it.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 19:55:29 -07:00
Ville Nuorvala
e0eda7bbaa [IPV6]: Clean up BACKTRACK().
The fn check is unnecessary as fn can never be NULL in BACKTRACK().

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 19:55:28 -07:00
Ville Nuorvala
4251320fa2 [IPV6]: Make sure error handling is done when calling ip6_route_output().
As ip6_route_output() never returns NULL, error checking must be done by
looking at dst->error in stead of comparing dst against NULL.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 19:55:27 -07:00
Jan Dittmer
39c850863d [IPV6] sit: Add missing MODULE_LICENSE
This is missing the MODULE_LICENSE statements and taints the kernel
upon loading. License is obvious from the beginning of the file.

Signed-off-by: Jan Dittmer <jdi@l4x.org>
Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15 23:14:21 -07:00
YOSHIFUJI Hideaki
f1a95859a8 [IPV6]: Remove bogus WARN_ON in Proxy-NA handling.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15 23:14:20 -07:00
Thomas Graf
adaa70bbdf [IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors
Fixes rt6_lookup() to provide the source address in the flow
and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in
the flow.

Avoids unnecessary prefix comparisons by checking for a prefix
length first.

Fixes the rule logic to not match packets if a source selector
has been specified but no source address is available.

Thanks to Kim Nordlund <kim.nordlund@nokia.com> for working
on this patch with me.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15 23:14:19 -07:00
YOSHIFUJI Hideaki
9469c7b4aa [NET]: Use typesafe inet_twsk() inline function instead of cast.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:58 -07:00
YOSHIFUJI Hideaki
4244f8a9f8 [TCP]: Use TCPOLEN_TSTAMP_ALIGNED macro instead of magic number.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:54 -07:00
Joerg Roedel
0be669bb37 [IPV6]: Seperate sit driver to extra module (addrconf.c changes)
This patch contains the changes to net/ipv6/addrconf.c to remove sit
specific code if the sit driver is not selected.

Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:52 -07:00
Joerg Roedel
989e5b96e1 [IPV6]: Seperate sit driver to extra module
This patch removes the driver of the IPv6-in-IPv4 tunnel driver (sit)
from the IPv6 module. It adds an option to Kconfig which makes it
possible to compile it as a seperate module.

Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:50 -07:00
Venkat Yekkirala
5b368e61c2 IPsec: correct semantics for SELinux policy matching
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.

The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.

This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).

Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL.  We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely).  This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

This patch: Fix the selinux side of things.

This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.

Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-10-11 23:59:37 -07:00
Linus Torvalds
fefd26b3b8 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
  Remove all inclusions of <linux/config.h>

Manually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory.
2006-10-04 09:59:57 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Diego Beltrami
0a69452cb4 [XFRM]: BEET mode
This patch introduces the BEET mode (Bound End-to-End Tunnel) with as
specified by the ietf draft at the following link:

http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt

The patch provides only single family support (i.e. inner family =
outer family).

Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com>
Signed-off-by: Miika Komu     <miika@iki.fi>
Signed-off-by: Herbert Xu     <herbert@gondor.apana.org.au>
Signed-off-by: Abhinav Pathak <abhinav.pathak@hiit.fi>
Signed-off-by: Jeff Ahrenholz <ahrenholz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-04 00:31:09 -07:00
Herbert Xu
1e0c14f49d [UDP]: Fix MSG_PROBE crash
UDP tracks corking status through the pending variable.  The
IP layer also tracks it through the socket write queue.  It
is possible for the two to get out of sync when MSG_PROBE is
used.

This patch changes UDP to check the write queue to ensure
that the two stay in sync.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-04 00:31:00 -07:00
Herbert Xu
132a55f3c5 [UDP6]: Fix flowi clobbering
The udp6_sendmsg function uses a shared buffer to store the
flow without taking any locks.  This leads to races with SMP.
This patch moves the flowi object onto the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-04 00:30:59 -07:00
Herbert Xu
1a9e9ef684 [IPV6]: Disable SG for GSO unless we have checksum
Because the system won't turn off the SG flag for us we
need to do this manually on the IPv6 path.  Otherwise we
will throw IPv6 packets with bad checksums at the hardware.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:45 -07:00