Commit Graph

109 Commits

Author SHA1 Message Date
Jason A. Donenfeld a251c17aa5 treewide: use get_random_u32() when possible
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:58 -06:00
Pengcheng Yang eba1a872cb ipvs: correctly print the memory size of ip_vs_conn_tab
The memory size of ip_vs_conn_tab changed after we use hlist
instead of list.

Fixes: 731109e784 ("ipvs: use hlist instead of list")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-19 11:04:06 +02:00
Andrea Claudi 69e73dbfda ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
ip_vs_conn_tab_bits may be provided by the user through the
conn_tab_bits module parameter. If this value is greater than 31, or
less than 0, the shift operator used to derive tab_size causes undefined
behaviour.

Fix this checking ip_vs_conn_tab_bits value to be in the range specified
in ipvs Kconfig. If not, simply use default value.

Fixes: 6f7edb4881 ("IPVS: Allow boot time change of hash size")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-14 00:57:28 +02:00
longguang.yue 073b04e76b ipvs: inspect reply packets from DR/TUN real servers
Just like for MASQ, inspect the reply packets coming from DR/TUN
real servers and alter the connection's state and timeout
according to the protocol.

It's ipvs's duty to do traffic statistic if packets get hit,
no matter what mode it is.

Signed-off-by: longguang.yue <bigclouds@163.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12 01:57:34 +02:00
Andrew Sy Kim 35dfb01314 ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1
When expire_nodest_conn=1 and a destination is deleted, IPVS does not
expire the existing connections until the next matching incoming packet.
If there are many connection entries from a single client to a single
destination, many packets may get dropped before all the connections are
expired (more likely with lots of UDP traffic). An optimization can be
made where upon deletion of a destination, IPVS queues up delayed work
to immediately expire any connections with a deleted destination. This
ensures any reused source ports from a client (within the IPVS timeouts)
are scheduled to new real servers instead of silently dropped.

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-22 01:17:59 +02:00
Julian Anastasov f9200a52ee ipvs: avoid expiring many connections from timer
Add new functions ip_vs_conn_del() and ip_vs_conn_del_put()
to release many IPVS connections in process context.
They are suitable for connections found in table
when we do not want to overload the timers.

Currently, the change is useful for the dropentry delayed
work but it will be used also in following patch
when flushing connections to failed destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-01 10:18:20 +02:00
Thomas Gleixner 2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Matteo Croce b71ed54dc2 ipvs: don't show negative times in ip_vs_conn
Since commit 500462a9de ("timers: Switch to a non-cascading wheel"),
timers duration can last even 12.5% more than the scheduled interval.

IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync,
which shows the remaining time before that a connection expires.
The default expire time for a connection is 60 seconds, and the
expiration timer can fire even 4 seconds later than the scheduled time.
The expiration time is calculated subtracting jiffies to the scheduled
expiration time, and it's shown as a huge number when the timer fires late,
since both values are unsigned.

This can confuse script and tools which relies on it, like ipvsadm:

    root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done
    TCP 00:05  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:04  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:03  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:02  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:01  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:00  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:44 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:43 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:42 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:41 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:40 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:39 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:57 +02:00
Julian Anastasov 762c400766 ipvs: drop conn templates under attack
Before now, connection templates were ignored by the random
dropentry procedure. But Michal Koutný suggests that we
should add exception for connections under SYN attack.
He provided patch that implements it for TCP:

<quote>

IPVS includes protection against filling the ip_vs_conn_tab by
dropping 1/32 of feasible entries every second. The template
entries (for persistent services) are never directly deleted by
this mechanism but when a picked TCP connection entry is being
dropped (1), the respective template entry is dropped too (realized
by expiring 60 seconds after the connection entry being dropped).

There is another mechanism that removes connection entries when they
time out (2), in this case the associated template entry is not deleted.
Under SYN flood template entries would accumulate (due to their entry
longer timeout).

The accumulation takes place also with drop_entry being enabled. Roughly
15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
(1) and are removed by the timeout mechanism (2)(defaults to 60 seconds
for SYN_RECV), thus template entries would still accumulate.

The patch ensures that when a connection entry times out, we also remove
the template entry from the table. To prevent breaking persistent
services (since the connection may time out in already established state)
we add a new entry flag to protect templates what spawned at least one
established TCP connection.

</quote>

We already added ASSURED flag for the templates in previous patch, so
that we can use it now to decide which connection templates should be
dropped under attack. But we also have some cases that need special
handling.

We modify the dropentry procedure as follows:

- Linux timers currently use LIFO ordering but we can not rely on
this to drop controlling connections. So, set cp->timeout to 0
to indicate that connection was dropped and that on expiration we
should try to drop our controlling connections. As result, we can
now avoid the ip_vs_conn_expire_now call.

- move the cp->n_control check above, so that it avoids restarting
the timer for controlling connections when not needed.

- drop unassured connection templates here if they are not referred
by any connections.

On connection expiration: if connection was dropped (cp->timeout=0)
try to drop our controlling connection except if it is a template
in assured state.

In ip_vs_conn_flush change order of ip_vs_conn_expire_now calls
according to the LIFO timer expiration order. It should work
faster for controlling connections with single controlled one.

Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:41 +02:00
Julian Anastasov ec1b28ca96 ipvs: provide just conn to ip_vs_state_name
In preparation for followup patches, provide just the cp
ptr to ip_vs_state_name.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:39 +02:00
Kees Cook 42bc47b353 treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vmalloc(a * b)

with:
        vmalloc(array_size(a, b))

as well as handling cases of:

        vmalloc(a * b * c)

with:

        vmalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vmalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vmalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vmalloc(C1 * C2 * C3, ...)
|
  vmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vmalloc(C1 * C2, ...)
|
  vmalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds cf626b0da7 Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull procfs updates from Al Viro:
 "Christoph's proc_create_... cleanups series"

* 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
  xfs, proc: hide unused xfs procfs helpers
  isdn/gigaset: add back gigaset_procinfo assignment
  proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
  tty: replace ->proc_fops with ->proc_show
  ide: replace ->proc_fops with ->proc_show
  ide: remove ide_driver_proc_write
  isdn: replace ->proc_fops with ->proc_show
  atm: switch to proc_create_seq_private
  atm: simplify procfs code
  bluetooth: switch to proc_create_seq_data
  netfilter/x_tables: switch to proc_create_seq_private
  netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
  neigh: switch to proc_create_seq_data
  hostap: switch to proc_create_{seq,single}_data
  bonding: switch to proc_create_seq_data
  rtc/proc: switch to proc_create_single_data
  drbd: switch to proc_create_single
  resource: switch to proc_create_seq_data
  staging/rtl8192u: simplify procfs code
  jfs: simplify procfs code
  ...
2018-06-04 10:00:01 -07:00
Christoph Hellwig c350637227 proc: introduce proc_create_net{,_data}
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release.  All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:24:30 +02:00
Julian Anastasov a050d345ce ipvs: fix refcount usage for conns in ops mode
Connections in One-packet scheduling mode (-o, --ops) are
removed with refcnt=0 because they are not hashed in conn table.
To avoid refcount_dec reporting this as error, change them to be
removed with refcount_dec_if_one as all other connections.

refcount_t hit zero at ip_vs_conn_put+0x31/0x40 [ip_vs]
in sh[15519], uid/euid: 497/497
WARNING: CPU: 0 PID: 15519 at ../kernel/panic.c:657
refcount_error_report+0x94/0x9e
Modules linked in: ip_vs_rr cirrus ttm sb_edac
edac_core drm_kms_helper crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc mousedev drm aesni_intel aes_x86_64
crypto_simd glue_helper cryptd psmouse evdev input_leds led_class
intel_agp fb_sys_fops syscopyarea sysfillrect intel_rapl_perf mac_hid
intel_gtt serio_raw sysimgblt agpgart i2c_piix4 i2c_core ata_generic
pata_acpi floppy cfg80211 rfkill button loop macvlan ip_vs
nf_conntrack libcrc32c crc32c_generic ip_tables x_tables ipv6
crc_ccitt autofs4 ext4 crc16 mbcache jbd2 fscrypto ata_piix libata
atkbd libps2 scsi_mod crc32c_intel i8042 rtc_cmos serio af_packet
dm_mod dax fuse xen_netfront xen_blkfront
CPU: 0 PID: 15519 Comm: sh Tainted: G        W
4.15.17 #1-NixOS
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
RIP: 0010:refcount_error_report+0x94/0x9e
RSP: 0000:ffffa344dde039c8 EFLAGS: 00010296
RAX: 0000000000000057 RBX: ffffffff92f20e06 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffa344dde165c0
RBP: ffffa344dde03b08 R08: 0000000000000218 R09: 0000000000000004
R10: ffffffff93006a80 R11: 0000000000000001 R12: ffffa344d68cd100
R13: 00000000000001f1 R14: ffffffff92f12fb0 R15: 0000000000000004
FS:  00007fc9d2040fc0(0000) GS:ffffa344dde00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000262a000 CR3: 0000000016a0c004 CR4: 00000000001606f0
Call Trace:
 <IRQ>
 ex_handler_refcount+0x4e/0x80
 fixup_exception+0x33/0x40
 do_trap+0x83/0x140
 do_error_trap+0x83/0xf0
 ? ip_vs_conn_drop_conntrack+0x120/0x1a5 [ip_vs]
 ? ip_finish_output2+0x29c/0x390
 ? ip_finish_output2+0x1a2/0x390
 invalid_op+0x1b/0x40
RIP: 0010:ip_vs_conn_put+0x31/0x40 [ip_vs]
RSP: 0000:ffffa344dde03bb8 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffa344df31cf00 RCX: ffffa344d7450198
RDX: 0000000000000003 RSI: 00000000fffffe01 RDI: ffffa344d7450140
RBP: 0000000000000002 R08: 0000000000000476 R09: 0000000000000000
R10: ffffa344dde03b28 R11: ffffa344df200000 R12: ffffa344d7d09000
R13: ffffa344def3a980 R14: ffffffffc04f6e20 R15: 0000000000000008
 ip_vs_in.part.29.constprop.36+0x34f/0x640 [ip_vs]
 ? ip_vs_conn_out_get+0xe0/0xe0 [ip_vs]
 ip_vs_remote_request4+0x47/0xa0 [ip_vs]
 ? ip_vs_in.part.29.constprop.36+0x640/0x640 [ip_vs]
 nf_hook_slow+0x43/0xc0
 ip_local_deliver+0xac/0xc0
 ? ip_rcv_finish+0x400/0x400
 ip_rcv+0x26c/0x380
 __netif_receive_skb_core+0x3a0/0xb10
 ? inet_gro_receive+0x23c/0x2b0
 ? netif_receive_skb_internal+0x24/0xb0
 netif_receive_skb_internal+0x24/0xb0
 napi_gro_receive+0xb8/0xe0
 xennet_poll+0x676/0xb40 [xen_netfront]
 net_rx_action+0x139/0x3a0
 __do_softirq+0xde/0x2b4
 irq_exit+0xae/0xb0
 xen_evtchn_do_upcall+0x2c/0x40
 xen_hvm_callback_vector+0x7d/0x90
 </IRQ>
RIP: 0033:0x7fc9d11c91f9
RSP: 002b:00007ffebe8a2ea0 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff0c
RAX: 00000000ffffffff RBX: 0000000002609808 RCX: 0000000000000054
RDX: 0000000000000001 RSI: 0000000002605440 RDI: 00000000025f940e
RBP: 00000000025f940e R08: 000000000260213d R09: 1999999999999999
R10: 000000000262a808 R11: 00000000025f942d R12: 00000000025f940e
R13: 00007fc9d1301e20 R14: 00000000025f9408 R15: 00007fc9d1302720
Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 e0 05 00
00 45 8b 84 24 38 04 00 00 41 89 c1 48 89 de 48 c7 c7 a8 2f f2 92 e8
7c fa ff ff <0f> 0b 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5
41 56

Reported-by: Net Filter <netfilternetfilter@gmail.com>
Fixes: b54ab92b84 ("netfilter: refcounter conversions")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-08 14:15:12 +02:00
Alexey Dobriyan 4c87158dae netfilter: delete /proc THIS_MODULE references
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

	-               if (de->proc_fops)
	-                       inode->i_fop = de->proc_fops;
	+               if (de->proc_fops) {
	+                       if (S_ISREG(inode->i_mode))
	+                               inode->i_fop = &proc_reg_file_ops;
	+                       else
	+                               inode->i_fop = de->proc_fops;
	+               }

VFS stopped pinning module at this point.

# ipvs
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-19 14:10:53 +01:00
Gao Feng 6b3d933000 netfilter: ipvs: Remove useless ipvsh param of frag_safe_skb_hp
The param of frag_safe_skb_hp, ipvsh, isn't used now. So remove it and
update the callers' codes too.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:02 +01:00
Linus Torvalds 5bbcc0f595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Maintain the TCP retransmit queue using an rbtree, with 1GB
      windows at 100Gb this really has become necessary. From Eric
      Dumazet.

   2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

   3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
      Lunn.

   4) Add meter action support to openvswitch, from Andy Zhou.

   5) Add a data meta pointer for BPF accessible packets, from Daniel
      Borkmann.

   6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

   7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

   8) More work to move the RTNL mutex down, from Florian Westphal.

   9) Add 'bpftool' utility, to help with bpf program introspection.
      From Jakub Kicinski.

  10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
      Dangaard Brouer.

  11) Support 'blocks' of transformations in the packet scheduler which
      can span multiple network devices, from Jiri Pirko.

  12) TC flower offload support in cxgb4, from Kumar Sanghvi.

  13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
      Leitner.

  14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

  15) Add RED qdisc offloadability, and use it in mlxsw driver. From
      Nogah Frankel.

  16) eBPF based device controller for cgroup v2, from Roman Gushchin.

  17) Add some fundamental tracepoints for TCP, from Song Liu.

  18) Remove garbage collection from ipv6 route layer, this is a
      significant accomplishment. From Wei Wang.

  19) Add multicast route offload support to mlxsw, from Yotam Gigi"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
  tcp: highest_sack fix
  geneve: fix fill_info when link down
  bpf: fix lockdep splat
  net: cdc_ncm: GetNtbFormat endian fix
  openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
  netem: remove unnecessary 64 bit modulus
  netem: use 64 bit divide by rate
  tcp: Namespace-ify sysctl_tcp_default_congestion_control
  net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
  ipv6: set all.accept_dad to 0 by default
  uapi: fix linux/tls.h userspace compilation error
  usbnet: ipheth: prevent TX queue timeouts when device not ready
  vhost_net: conditionally enable tx polling
  uapi: fix linux/rxrpc.h userspace compilation errors
  net: stmmac: fix LPI transitioning for dwmac4
  atm: horizon: Fix irq release error
  net-sysfs: trigger netlink notification on ifalias change via sysfs
  openvswitch: Using kfree_rcu() to simplify the code
  openvswitch: Make local function ovs_nsh_key_attr_size() static
  openvswitch: Fix return value check in ovs_meter_cmd_features()
  ...
2017-11-15 11:56:19 -08:00
Kees Cook 8ef81c6548 netfilter: ipvs: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: lvs-devel@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
2017-11-08 15:53:58 -08:00
Helge Deller c5cc0c6971 netfilter: ipvs: Use %pS printk format for direct addresses
The debug and error printk functions in ipvs uses wrongly the %pF instead of
the %pS printk format specifier for printing symbols for the address returned
by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64
architectures.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Cc: lvs-devel@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06 14:44:20 +01:00
Reshetova, Elena b54ab92b84 netfilter: refcounter conversions
refcount_t type and corresponding API (see include/linux/refcount.h)
should be used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-17 12:49:43 +01:00
Alexey Dobriyan 5b5e0928f7 lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Marco Angaroni 3ec10d3a2b ipvs: update real-server binding of outgoing connections in SIP-pe
Previous patch that introduced handling of outgoing packets in SIP
persistent-engine did not call ip_vs_check_template() in case packet was
matching a connection template. Assumption was that real-server was
healthy, since it was sending a packet just in that moment.

There are however real-server fault conditions requiring that association
between call-id and real-server (represented by connection template)
gets updated. Here is an example of the sequence of events:
  1) RS1 is a back2back user agent that handled call-id1 and call-id2
  2) RS1 is down and was marked as unavailable
  3) new message from outside comes to IPVS with call-id1
  4) IPVS reschedules the message to RS2, which becomes new call handler
  5) RS2 forwards the message outside, translating call-id1 to call-id2
  6) inside pe->conn_out() IPVS matches call-id2 with existing template
  7) IPVS does not change association call-id2 <-> RS1
  8) new message comes from client with call-id2
  9) IPVS reschedules the message to a real-server potentially different
     from RS2, which is now the correct destination

This patch introduces ip_vs_check_template() call in the handling of
outgoing packets for SIP-pe. And also introduces a second optional
argument for ip_vs_check_template() that allows to check if dest
associated to a connection template is the same dest that was identified
as the source of the packet. This is to change the real-server bound to a
particular call-id independently from its availability status: the idea
is that it's more reliable, for in->out direction (where internal
network can be considered trusted), to always associate a call-id with
the last real-server that used it in one of its messages. Think about
above sequence of events where, just after step 5, RS1 returns instead
to be available.

Comparison of dests is done by simply comparing pointers to struct
ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its
memory address, but represent a different real-server in terms of
ip-address / port.

Fixes: 39b9722315 ("ipvs: handle connections started by real-servers")
Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2016-06-06 09:47:25 +09:00
Marco Angaroni 698e2a8dca ipvs: make drop_entry protection effective for SIP-pe
DoS protection policy that deletes connections to avoid out of memory is
currently not effective for SIP-pe plus OPS-mode for two reasons:
  1) connection templates (holding SIP call-id) are always skipped in
     ip_vs_random_dropentry()
  2) in_pkts counter (used by drop_entry algorithm) is not incremented
     for connection templates

This patch addresses such problems with the following changes:
  a) connection templates associated (via their dest) to virtual-services
     configured in OPS mode are included in ip_vs_random_dropentry()
     monitoring. This applies to SIP-pe over UDP (which requires OPS mode),
     but is more general principle: when OPS is controlled by templates
     memory can be used only by templates themselves, since OPS conns are
     deleted after packet is forwarded.
  b) OPS connections, if controlled by a template, cause increment of
     in_pkts counter of their template. This is already happening but only
     in case director is in master-slave mode (see ip_vs_sync_conn()).

Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2016-05-06 16:26:23 +09:00
Marco Angaroni 8fb04d9fc7 ipvs: don't alter conntrack in OPS mode
When using OPS mode in conjunction with SIP persistent-engine, packets
originating from the same ip-address/port could be balanced to different
real servers, and (to properly handle SIP responses) OPS connections
are created in the in-out direction too, where ip_vs_update_conntrack()
is called to modify the reply tuple.

As a result, there can be collision of conntrack tuples, causing random
packet drops, as explained below:

conntrack1: orig=CIP->VIP, reply=RIP1->CIP
conntrack2: orig=RIP2->CIP, reply=CIP->VIP

Tuple CIP->VIP is both in orig of conntrack1 and reply of conntrack2.
The collision triggers packet drop inside nf_conntrack processing.

In addition, the current implementation deletes the conntrack object at
every expire of an OPS connection (once every forwarded packet), to have
it recreated from scratch at next packet traversing IPVS.

Since in OPS mode, by definition, we don't expect any associated
response, the choices implemented in this patch are:
a) don't call nf_conntrack_alter_reply() for OPS connections inside
   ip_vs_update_conntrack().
b) don't delete the conntrack object at OPS connection expire.

The result is that created conntrack objects for each tuple CIP->VIP,
RIP-N->CIP, etc. are left in UNREPLIED state and not modified by IPVS
OPS connection management. This eliminates packet drops and leaves
a single conntrack object for each tuple packets are sent from.

Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20 12:34:17 +10:00
Marco Angaroni 013b042465 ipvs: optimize release of connections in OPS mode
One-packet-scheduling is the most expensive mode in IPVS from
performance point of view: for each packet to be processed a new
connection data structure is created and, after packet is sent,
deleted by starting a new timer set to expire immediately.

SIP persistent-engine needs OPS mode to have Call-ID based load
balancing, so OPS mode performance has negative impact in SIP
protocol load balancing.

This patch aims to improve performance of OPS mode by means of the
following changes in the release mechanism of OPS connections:
a) call expire callback ip_vs_conn_expire() directly instead of
   starting a timer programmed to fire immediately.
b) avoid call_rcu() overhead inside expire callback, since OPS
   connection are not inserted in the hash-table and last just the
   time to process the packet, hence there is no concurrent access
   to such data structures.

Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20 12:34:17 +10:00
Simon Horman 92240e8dc0 ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup}
If CONFIG_PROC_FS is undefined then the arguments of proc_create()
and remove_proc_entry() are unused. As a result the net variables of
ip_vs_conn_net_{init,cleanup} are unused.

net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’:
net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ [-Wunused-variable]
net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’:
net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ [-Wunused-variable]
...

Resolve this by dereferencing net as needed rather than storing it
in a variable.

Fixes: 3d99376689 ("ipvs: Pass ipvs not net into ip_vs_control_net_(init|cleanup)")
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2015-10-07 10:12:00 +09:00
Eric W. Biederman 2f3edc6a5b ipvs: Pass ipvs not net into ip_vs_conn_net_init and ip_vs_conn_net_cleanup
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:41 +09:00
Eric W. Biederman d889717aaf ipvs: Pass ipvs not net into ip_vs_conn_net_flush
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:41 +09:00
Eric W. Biederman 754b81a357 ipvs: Pass ipvs not net to ip_vs_conn_hashkey
Use the address of struct netns_ipvs in the hash not the address of
struct net.  Both addresses are equally valid candidates and by using
the address of struct netns_ipvs there becomes no need deal with
struct net in this part of the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:41 +09:00
Eric W. Biederman 0cf705c8c2 ipvs: Pass ipvs into conn_out_get
Move the hack of relying on "net_ipvs(skb_net(skb))" to derive the
ipvs up a layer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:41 +09:00
Eric W. Biederman ab16197642 ipvs: Pass ipvs into .conn_in_get and ip_vs_conn_in_get_proto
Stop relying on "net_ipvs(skb_net(skb))" to derive the ipvs as
skb_net is a hack.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:41 +09:00
Eric W. Biederman f5099dd4d9 ipvs: Pass ipvs into ip_vs_conn_fill_param_proto
Move the ugly hack net_ipvs(skb_net(skb)) up a layer in the call stack
so it is easier to remove.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:40 +09:00
Eric W. Biederman 423b55954d ipvs: Pass ipvs not net to ip_vs_random_drop_entry
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:39 +09:00
Eric W. Biederman b61a8c1a40 ipvs: Pass ipvs not net to ip_vs_sync_conn
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:38 +09:00
Eric W. Biederman 18d6ade63c ipvs: Pass ipvs not net to ip_vs_proto_data_get
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:35 +09:00
Eric W. Biederman dc2add6f2e ipvs: Pass ipvs not net to ip_vs_find_dest
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:34 +09:00
Eric W. Biederman 19913dec1b ipvs: Pass ipvs not net to ip_vs_fill_conn
ipvs is what is actually desired so change the parameter and the modify
the callers to pass struct netns_ipvs.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:33 +09:00
Eric W. Biederman e64e2b460c ipvs: Store ipvs not net in struct ip_vs_conn_param
In practice struct netns_ipvs is as meaningful as struct net and more
useful as it holds the ipvs specific data.  So store a pointer to
struct netns_ipvs.

Update the accesses of param->net to access param->ipvs->net instead.

When lookup up struct ip_vs_conn in a hash table replace comparisons
of cp->net with comparisons of cp->ipvs which is possible
now that ipvs is present in ip_vs_conn_param.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:33 +09:00
Eric W. Biederman 58dbc6f260 ipvs: Store ipvs not net in struct ip_vs_conn
In practice struct netns_ipvs is as meaningful as struct net and more
useful as it holds the ipvs specific data.  So store a pointer to
struct netns_ipvs.

Update the accesses of conn->net to access conn->ipvs->net instead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24 09:34:33 +09:00
Alex Gartrell 802c41adcf ipvs: drop inverse argument to conn_{in,out}_get
No longer necessary since the information is included in the ip_vs_iphdr
itself.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01 10:33:37 +09:00
Julian Anastasov f18ae7206e ipvs: use the new dest addr family field
Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:28 +09:00
Alex Gartrell 8052ba2925 ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc)

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell 655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Julian Anastasov 2627b7e15c ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
commit 8f4e0a1868 ("IPVS netns exit causes crash in conntrack")
added second ip_vs_conn_drop_conntrack call instead of just adding
the needed check. As result, the first call still can cause
crash on netns exit. Remove it.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-07-16 09:39:28 +09:00
Michal Kubecek 2a971354e7 ipvs: fix AF assignment in ip_vs_conn_new()
If a fwmark is passed to ip_vs_conn_new(), it is passed in
vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in
vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we
may copy only first 4 bytes of an IPv6 address into cp->daddr.

Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-02-04 21:13:47 +09:00
Aruna-Hewapathirane 63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Julian Anastasov acaac5d8bb ipvs: drop SCTP connections depending on state
Drop SCTP connections under load (dropentry context) depending
on the protocol state, just like for TCP: INIT conns are
dropped immediately, established are dropped randomly while
connections in progress or shutdown are skipped.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Simon Horman a38e5e230e ipvs: use cond_resched_rcu() helper when walking connections
This avoids the situation where walking of a large number of connections
may prevent scheduling for a long time while also avoiding excessive
calls to rcu_read_unlock() and rcu_read_lock().

Note that in the case of !CONFIG_PREEMPT_RCU this will
add a call to cond_resched().

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 14:23:18 +02:00
Julian Anastasov 7cf2eb7bcc ipvs: fix sparse warnings for ip_vs_conn listing
kbuild test robot reports for sparse warnings
in commit 088339a57d ("ipvs: convert connection locking"):

net/netfilter/ipvs/ip_vs_conn.c:962:13: warning: context imbalance
	in 'ip_vs_conn_array' - wrong count at exit
include/linux/rcupdate.h:326:30: warning: context imbalance in
	'ip_vs_conn_seq_next' - unexpected unlock
include/linux/rcupdate.h:326:30: warning: context imbalance in
	'ip_vs_conn_seq_stop' - unexpected unlock

Fix it by running ip_vs_conn_array under RCU lock
to avoid conditional locking and by adding proper RCU
annotations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-23 11:43:05 +09:00