Commit graph

3168 commits

Author SHA1 Message Date
Herbert Xu
b824478b21 rhashtable: Add multiple rehash support
This patch adds the missing bits to allow multiple rehashes.  The
read-side as well as remove already handle this correctly.  So it's
only the rehasher and insertion that need modification to handle
this.

Note that this patch doesn't actually enable it so for now rehashing
is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
18093d1c0d rhashtable: Shrink to fit
This patch changes rhashtable_shrink to shrink to the smallest
size possible rather than halving the table.  This is needed
because with multiple rehashing we will defer shrinking until
all other rehashing is done, meaning that when we do shrink
we may be able to shrink a lot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
31ccde2dac rhashtable: Allow hashfn to be unset
Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This function provides a solution to this problem by allowing
hashfn to be unset.  In which case rhashtable will automatically
set it to jhash.  Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:51 -04:00
Herbert Xu
d88252f9bb rhashtable: Add barrier to ensure we see new tables in walker
The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:51 -04:00
David S. Miller
0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Herbert Xu
dc0ee268d8 rhashtable: Rip out obsolete out-of-line interface
Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
b182aa6e96 test_rhashtable: Use inlined rhashtable interface
This patch converts test_rhashtable to the inlined rhashtable
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
02fd97c3d4 rhashtable: Allow hash/comparison functions to be inlined
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable.  We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.

The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.

This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.

This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not.  It
also adds a new function obj_cmpfn to compare a key against an
object.  This means that the caller no longer needs to supply
explicit compare functions.

All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
488fb86ee9 rhashtable: Make rhashtable_init params argument const
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.

This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Thomas Graf
a998f712f7 rhashtable: Round up/down min/max_size to ensure we respect limit
Round up min_size respectively round down max_size to the next power
of two to make sure we always respect the limit specified by the
user. This is required because we compare the table size against the
limit before we expand or shrink.

Also fixes a minor bug where we modified min_size in the params
provided instead of the copy stored in struct rhashtable.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19 21:02:23 -04:00
Herbert Xu
e2e21c1c58 rhashtable: Remove max_shift and min_shift
Now that nobody uses max_shift and min_shift, we can safely remove
them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:41 -04:00
Herbert Xu
4f509df4f5 test_rhashtable: Use rhashtable max_size instead of max_shift
This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Herbert Xu
c2e213cff7 rhashtable: Introduce max_size/min_size
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Herbert Xu
6aebd94084 rhashtable: Remove shift from bucket_table
Keeping both size and shift is silly.  We only need one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Thomas Graf
617011e7d5 rhashtable: Avoid calculating hash again to unlock
Caching the lock pointer avoids having to hash on the object
again to unlock the bucket locks.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 17:14:34 -04:00
Thomas Graf
db4374f48a rhashtable: Annotate RCU locking of walkers
Fixes the following sparse warnings:

lib/rhashtable.c:767:5: warning: context imbalance in 'rhashtable_walk_start' - wrong count at exit
lib/rhashtable.c:849:6: warning: context imbalance in 'rhashtable_walk_stop' - unexpected unlock

Fixes: f2dba9c6ff ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:24:13 -04:00
Herbert Xu
565e86404e rhashtable: Fix rhashtable_remove failures
The commit 9d901bc051 ("rhashtable:
Free bucket tables asynchronously after rehash") causes gratuitous
failures in rhashtable_remove.

The reason is that it inadvertently introduced multiple rehashing
from the perspective of readers.  IOW it is now possible to see
more than two tables during a single RCU critical section.

Fortunately the other reader rhashtable_lookup already deals with
this correctly thanks to c4db8848af
("rhashtable: rhashtable: Move future_tbl into struct bucket_table")
so only rhashtable_remove is broken by this change.

This patch fixes this by looping over every table from the first
one to the last or until we find the element that we were trying
to delete.

Incidentally the simple test for detecting rehashing to prevent
starting another shrinking no longer works.  Since it isn't needed
anyway (the work queue and the mutex serves as a natural barrier
to unnecessary rehashes) I've simply killed the test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 22:22:08 -04:00
Herbert Xu
963ecbd41a rhashtable: Fix use-after-free in rhashtable_walk_stop
The commit c4db8848af ("rhashtable:
Move future_tbl into struct bucket_table") introduced a use-after-
free bug in rhashtable_walk_stop because it dereferences tbl after
droping the RCU read lock.

This patch fixes it by moving the RCU read unlock down to the bottom
of rhashtable_walk_stop.  In fact this was how I had it originally
but it got dropped while rearranging patches because this one
depended on the async freeing of bucket_table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 22:22:08 -04:00
Herbert Xu
c4db8848af rhashtable: Move future_tbl into struct bucket_table
This patch moves future_tbl to open up the possibility of having
multiple rehashes on the same table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
63d512d0cf rhashtable: Add rehash counter to bucket_table
This patch adds a rehash counter to bucket_table to indicate
the last bucket that has been rehashed.  This serves two purposes:

1. Any bucket that has been rehashed can never gain a new object.
2. If the rehash counter reaches the size of the table, the table
will forever remain empty.

This patch also downsizes bucket_table->size to an unsigned int
since we do not support sizes greater than 32 bits yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
9d901bc051 rhashtable: Free bucket tables asynchronously after rehash
There is in fact no need to wait for an RCU grace period in the
rehash function, since all insertions are guaranteed to go into
the new table through spin locks.

This patch uses call_rcu to free the old/rehashed table at our
leisure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
5269b53da4 rhashtable: Move seed init into bucket_table_alloc
It seems that I have already made every rehash redo the random
seed even though my commit message indicated otherwise :)

Since we have already taken that step, this patch goes one step
further and moves the seed initialisation into bucket_table_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
8f2484bdb5 rhashtable: Use SINGLE_DEPTH_NESTING
We only nest one level deep there is no need to roll our own
subclasses.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
eddee5ba34 rhashtable: Fix walker behaviour during rehash
Previously whenever the walker encountered a resize it simply
snaps back to the beginning and starts again.  However, this only
works if the rehash started and completed while the walker was
idle.

If the walker attempts to restart while the rehash is still ongoing,
we may miss objects that we shouldn't have.

This patch fixes this by making the walker walk the old table
followed by the new table just like all other readers.  If a
rehash is detected we will still signal our caller of the fact
so they can prepare for duplicates but we will simply continue
the walk onto the new table after the old one is finished either
by us or by the rehasher.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Linus Torvalds
f788baadbd Merge branch 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull gadgetfs fixes from Al Viro:
 "Assorted fixes around AIO on gadgetfs: leaks, use-after-free, troubles
  caused by ->f_op flipping"

* 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gadgetfs: really get rid of switching ->f_op
  gadgetfs: get rid of flipping ->f_op in ep_config()
  gadget: switch ep_io_operations to ->read_iter/->write_iter
  gadgetfs: use-after-free in ->aio_read()
  gadget/function/f_fs.c: switch to ->{read,write}_iter()
  gadget/function/f_fs.c: use put iov_iter into io_data
  gadget/function/f_fs.c: close leaks
  move iov_iter.c from mm/ to lib/
  new helper: dup_iter()
2015-03-13 10:55:32 -07:00
Herbert Xu
393619474e rhashtable: Fix read-side crash during rehash
This patch fixes a typo rhashtable_lookup_compare where we fail
to recompute the hash when looking up the new table.  This causes
elements to be missed and potentially a crash during a resize.

Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Daniel Borkmann
a5b6846f9e rhashtable: kill ht->shift atomic operations
Commit c0c09bfdc4 ("rhashtable: avoid unnecessary wakeup for worker
queue") changed ht->shift to be atomic, which is actually unnecessary.

Instead of leaving the current shift in the core rhashtable structure,
it can be cached inside the individual bucket tables.

There, it will only be initialized once during a new table allocation
in the shrink/expansion slow path, and from then onward it stays immutable
for the rest of the bucket table liftime.

That allows shift to be non-atomic. The patch also moves hash_rnd
management into the table setup. The rhashtable structure now consumes
3 instead of 4 cachelines.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Herbert Xu
9497df88ab rhashtable: Fix reader/rehash race
There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.

This patch closes this window by adding smp_wmb/smp_rmb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Herbert Xu
ec9f71c59e rhashtable: Remove obj_raw_hashfn
Now that the only caller of obj_raw_hashfn is head_hashfn, we can
simply kill it and fold it into the latter.

This patch also moves the common shift from head_hashfn/key_hashfn
into rht_bucket_index.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu
cffaa9cb92 rhashtable: Remove key length argument to key_hashfn
key_hashfn has only one caller and it doesn't really need to supply
the key length as an extra parameter.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu
eca8493330 rhashtable: Use head_hashfn instead of obj_raw_hashfn
Now that we don't have cross-table hashes, we no longer need to
keep the entire hash value so all users of obj_raw_hashfn can
use head_hashfn instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu
8d2b18793d rhashtable: Move masking back into key_hashfn
This patch reverts commit c88455ce50
("rhashtable: key_hashfn() must return full hash value") because
the only user of it always masks the hash value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu
84ed82b74d rhashtable: Add annotation to nested lock
Commit aa34a6cb04 ("rhashtable:
Add arbitrary rehash function") killed the annotation on the
nested lock which leads to bitching from lockdep.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:53:40 -04:00
Herbert Xu
aa34a6cb04 rhashtable: Add arbitrary rehash function
This patch adds a rehash function that supports the use of any
hash function for the new table.  This is needed to support changing
the random seed value during the lifetime of the hash table.

However for now the random seed value is still constant and the
rehash function is simply used to replace the existing expand/shrink
functions.

[ ASSERT_BUCKET_LOCK() and thus debug_dump_table() +
  debug_dump_buckets() are not longer used, so delete them
  entirely. -DaveM ]

Signed-off-by: Herbert Xu <herbert.xu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:36:21 -04:00
Herbert Xu
988dfbd795 rhashtable: Move hash_rnd into bucket_table
Currently hash_rnd is a parameter that users can set.  However,
no existing users set this parameter.  It is also something that
people are unlikely to want to set directly since it's just a
random number.

In preparation for allowing the reseeding/rehashing of rhashtable,
this patch moves hash_rnd into bucket_table so that it's now an
internal state rather than a parameter.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:28:25 -04:00
Linus Torvalds
e7901af143 This includes fixes for seq_buf_bprintf() truncation issue. It also
contains fixes to ftrace when /proc/sys/kernel/ftrace_enabled and
 function tracing are started. Doing the following causes some issues:
 
  # echo 0 > /proc/sys/kernel/ftrace_enabled
  # echo function_graph > /sys/kernel/debug/tracing/current_tracer
  # echo 1 > /proc/sys/kernel/ftrace_enabled
  # echo nop > /sys/kernel/debug/tracing/current_tracer
  # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 
 As well as with function tracing too. Pratyush Anand first reported
 this issue to me and supplied a patch. When I tested this on my x86
 test box, it caused thousands of backtraces and warnings to appear in
 dmesg, which also caused a denial of service (a warning for every
 function that was listed). I applied Pratyush's patch but it did not
 fix the issue for me. I looked into it and found a slight problem
 with trampoline accounting. I fixed it and sent Pratyush a patch, but
 he said that it did not fix the issue for him.
 
 I later learned tha Pratyush was using an ARM64 server, and when I tested
 on my ARM board, I was able to reproduce the same issue as Pratyush.
 After applying his patch, it fixed the problem. The above test uncovered
 two different bugs, one in x86 and one in ARM and ARM64. As this looked
 like it would affect PowerPC, I tested it on my PPC64 box. It too broke,
 but neither the patch that fixed ARM or x86 fixed this box (the changes
 were all in generic code!). The above test, uncovered two more bugs that
 affected PowerPC. Again, the changes were only done to generic code.
 It's the way the arch code expected things to be done that was different
 between the archs. Some where more sensitive than others.
 
 The rest of this series fixes the PPC bugs as well.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/cQSAAoJEEjnJuOKh9lde9sH/1MAPq+6jr7YaEFru0GKajE9
 rVHjw8rde/I4tN2UxIVk+Qm6pXRZYpv3OKxHT48EHzkvgm++voioykpJP4IEVrP5
 mEDuIcYe28csE2nV5u5Q9kwnZoC86TQW5nVV6zB1Gx/3IEzA8Z046jAov40Jya0y
 zqHc/U43JeeVIDIOkwjzbH6OaFEDP13FkF3TO502WJhJLqMo+kPOalIgv0eauKzy
 lVCQBSC4WS3rVsgW4W3dSrEBaUxbJxgunjxOuV2DwHj5eghHq0M2MKeIUxBz0PuN
 wnhTrpf5cAfshTvYHxKlE0uItdyYfVb7UChAD5zTbBL4kMUFhpb183zVKH8K8kU=
 =8R8y
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull seq-buf/ftrace fixes from Steven Rostedt:
 "This includes fixes for seq_buf_bprintf() truncation issue.  It also
  contains fixes to ftrace when /proc/sys/kernel/ftrace_enabled and
  function tracing are started.  Doing the following causes some issues:

    # echo 0 > /proc/sys/kernel/ftrace_enabled
    # echo function_graph > /sys/kernel/debug/tracing/current_tracer
    # echo 1 > /proc/sys/kernel/ftrace_enabled
    # echo nop > /sys/kernel/debug/tracing/current_tracer
    # echo function_graph > /sys/kernel/debug/tracing/current_tracer

  As well as with function tracing too.  Pratyush Anand first reported
  this issue to me and supplied a patch.  When I tested this on my x86
  test box, it caused thousands of backtraces and warnings to appear in
  dmesg, which also caused a denial of service (a warning for every
  function that was listed).  I applied Pratyush's patch but it did not
  fix the issue for me.  I looked into it and found a slight problem
  with trampoline accounting.  I fixed it and sent Pratyush a patch, but
  he said that it did not fix the issue for him.

  I later learned tha Pratyush was using an ARM64 server, and when I
  tested on my ARM board, I was able to reproduce the same issue as
  Pratyush.  After applying his patch, it fixed the problem.  The above
  test uncovered two different bugs, one in x86 and one in ARM and
  ARM64.  As this looked like it would affect PowerPC, I tested it on my
  PPC64 box.  It too broke, but neither the patch that fixed ARM or x86
  fixed this box (the changes were all in generic code!).  The above
  test, uncovered two more bugs that affected PowerPC.  Again, the
  changes were only done to generic code.  It's the way the arch code
  expected things to be done that was different between the archs.  Some
  where more sensitive than others.

  The rest of this series fixes the PPC bugs as well"

* tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled
  ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl
  ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl
  seq_buf: Fix seq_buf_bprintf() truncation
  seq_buf: Fix seq_buf_vprintf() truncation
2015-03-09 18:44:06 -07:00
Steven Rostedt (Red Hat)
4d4eb4d4fb seq_buf: Fix seq_buf_bprintf() truncation
In seq_buf_bprintf(), bstr_printf() is used to copy the format into the
buffer remaining in the seq_buf structure. The return of bstr_printf()
is the amount of characters written to the buffer excluding the '\0',
unless the line was truncated!

If the line copied does not fit, it is truncated, and a '\0' is added
to the end of the buffer. But in this case, '\0' is included in the length
of the line written. To know if the buffer had overflowed, the return
length will be the same or greater than the length of the buffer passed in.

The check in seq_buf_bprintf() only checked if the length returned from
bstr_printf() would fit in the buffer, as the seq_buf_bprintf() is only
to be an all or nothing command. It either writes all the string into
the seq_buf, or none of it. If the string is truncated, the pointers
inside the seq_buf must be reset to what they were when the function was
called. This is not the case. On overflow, it copies only part of the string.

The fix is to change the overflow check to see if the length returned from
bstr_printf() is less than the length remaining in the seq_buf buffer, and not
if it is less than or equal to as it currently does. Then seq_buf_bprintf()
will know if the write from bstr_printf() was truncated or not.

Link: http://lkml.kernel.org/r/1425500481.2712.27.camel@perches.com

Cc: stable@vger.kernel.org
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-04 23:40:19 -05:00
Steven Rostedt (Red Hat)
4a8fe4e181 seq_buf: Fix seq_buf_vprintf() truncation
In seq_buf_vprintf(), vsnprintf() is used to copy the format into the
buffer remaining in the seq_buf structure. The return of vsnprintf()
is the amount of characters written to the buffer excluding the '\0',
unless the line was truncated!

If the line copied does not fit, it is truncated, and a '\0' is added
to the end of the buffer. But in this case, '\0' is included in the length
of the line written. To know if the buffer had overflowed, the return
length will be the same as the length of the buffer passed in.

The check in seq_buf_vprintf() only checked if the length returned from
vsnprintf() would fit in the buffer, as the seq_buf_vprintf() is only
to be an all or nothing command. It either writes all the string into
the seq_buf, or none of it. If the string is truncated, the pointers
inside the seq_buf must be reset to what they were when the function was
called. This is not the case. On overflow, it copies only part of the string.

The fix is to change the overflow check to see if the length returned from
vsnprintf() is less than the length remaining in the seq_buf buffer, and not
if it is less than or equal to as it currently does. Then seq_buf_vprintf()
will know if the write from vsnpritnf() was truncated or not.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-04 09:56:02 -05:00
Linus Torvalds
789d7f60cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If an IPVS tunnel is created with a mixed-family destination
    address, it cannot be removed.  Fix from Alexey Andriyanov.

 2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
    Neira Ayuso.

 3) Generic statistics infrastructure can reference variables sitting on
    a released function stack, therefore use dynamic allocation always.
    Fix from Ignacy Gawędzki.

 4) skb_copy_bits() return value test is inverted in ip_check_defrag().

 5) Fix network namespace exit in openvswitch, we have to release all of
    the per-net vports.  From Pravin B Shelar.

 6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.

 7) Fix rhashtable grow/shrink behavior, only expand during inserts and
    shrink during deletes.  From Daniel Borkmann.

 8) Netdevice names with semicolons should never be allowed, because
    they serve as a separator.  From Matthew Thode.

 9) Use {,__}set_current_state() where appropriate, from Fabian
    Frederick.

10) Revert byte queue limits support in r8169 driver, it's causing
    regressions we can't figure out.

11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
    measure packets in flight, properly use tcp_packets_in_flight()
    instead.  From Neal Cardwell.

12) Fix accidental removal of support for bluetooth in CSR based Intel
    wireless cards.  From Marcel Holtmann.

13) We accidently added a behavioral change between native and compat
    tasks, wrt testing the MSG_CMSG_COMPAT bit.  Just ignore it if the
    user happened to set it in a native binary as that was always the
    behavior we had.  From Catalin Marinas.

14) Check genlmsg_unicast() return valud in hwsim netlink tx frame
    handling, from Bob Copeland.

15) Fix stale ->radar_required setting in mac80211 that can prevent
    starting new scans, from Eliad Peller.

16) Fix memory leak in nl80211 monitor, from Johannes Berg.

17) Fix race in TX index handling in xen-netback, from David Vrabel.

18) Don't enable interrupts in amx-xgbe driver until all software et al.
    state is ready for the interrupt handler to run.  From Thomas
    Lendacky.

19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
    W Biederman.

20) The amount of header space needed in macvtap was not calculated
    properly, fix it otherwise we splat past the beginning of the
    packet.  From Eric Dumazet.

21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.

22) Don't raw initialize or mod timers, use setup_timer() and
    mod_timer() instead.  From Vaishali Thakkar.

23) Fix software maintained statistics in bcmgenet and systemport
    drivers, from Florian Fainelli.

24) DMA descriptor updates in sh_eth need proper memory barriers, from
    Ben Hutchings.

25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
    Kubecek.

26) Openvswitch's non-masked set actions aren't constructed properly
    into netlink messages, fix from Joe Stringer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  openvswitch: Fix serialization of non-masked set actions.
  gianfar: Reduce logging noise seen due to phy polling if link is down
  ibmveth: Add function to enable live MAC address changes
  net: bridge: add compile-time assert for cb struct size
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  sh_eth: Really fix padding of short frames on TX
  Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
  sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
  sh_eth: Ensure proper ordering of descriptor active bit write/read
  net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
  net/mlx4_core: Fix wrong mask and error flow for the update-qp command
  net: systemport: fix software maintained statistics
  net: bcmgenet: fix software maintained statistics
  rxrpc: don't multiply with HZ twice
  rxrpc: terminate retrans loop when sending of skb fails
  net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
  net: pasemi: Use setup_timer and mod_timer
  net: stmmac: Use setup_timer and mod_timer
  net: 8390: axnet_cs: Use setup_timer and mod_timer
  net: 8390: pcnet_cs: Use setup_timer and mod_timer
  ...
2015-03-03 15:30:07 -08:00
Eric Dumazet
5beb5c90c1 rhashtable: use cond_resched()
If a hash table has 128 slots and 16384 elems, expand to 256 slots
takes more than one second. For larger sets, a soft lockup is detected.

Holding cpu for that long, even in a work queue is a show stopper
for non preemptable kernels.

cond_resched() at strategic points to allow process scheduler
to reschedule us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 17:55:14 -05:00
Daniel Borkmann
4c4b52d9b2 rhashtable: remove indirection for grow/shrink decision functions
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:06:02 -05:00
Daniel Borkmann
8331de75cb rhashtable: unconditionally grow when max_shift is not specified
While commit c0c09bfdc4 ("rhashtable: avoid unnecessary wakeup for
worker queue") rightfully moved part of the decision making of
whether we should expand or shrink from the expand/shrink functions
themselves into insert/delete functions in order to avoid unnecessary
worker wake-ups, it however introduced a regression by doing so.

Before that change, if no max_shift was specified (= 0) on rhashtable
initialization, rhashtable_expand() would just grow unconditionally
and lets the available memory be the limiting factor. After that
change, if no max_shift was specified, there would be _no_ expansion
step at all.

Given that netlink and tipc have a max_shift specified, it was not
visible there, but Josh Hunt reported that if nft that starts out
with a default element hint of 3 if not otherwise provided, would
slow i.e. inserts down trememdously as it cannot grow larger to
relax table occupancy.

Given that the test case verifies shrinks/expands manually, we also
must remove pointer to the helper functions to explicitly avoid
parallel resizing on insertions/deletions. test_bucket_stats() and
test_rht_lookup() could also be wrapped around rhashtable mutex to
explicitly synchronize a walk from resizing, but I think that defeats
the actual test case which intended to have explicit test steps,
i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object
verification after each stage.

Reported-by: Josh Hunt <johunt@akamai.com>
Fixes: c0c09bfdc4 ("rhashtable: avoid unnecessary wakeup for worker queue")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Josh Hunt <johunt@akamai.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:06:02 -05:00
Sasha Levin
71bb0012c3 rhashtable: initialize all rhashtable walker members
Commit f2dba9c6ff ("rhashtable: Introduce rhashtable_walk_*") forgot to
initialize the members of struct rhashtable_walker after allocating it, which
caused an undefined value for 'resize' which is used later on.

Fixes: f2dba9c6ff ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-23 15:23:19 -05:00
Daniel Borkmann
6dd0c1655b rhashtable: allow to unload test module
There's no good reason why to disallow unloading of the rhashtable
test case module.

Commit 9d6dbe1bba moved the code from a boot test into a stand-alone
module, but only converted the subsys_initcall() handler into a
module_init() function without a related exit handler, and thus
preventing the test module from unloading.

Fixes: 9d6dbe1bba ("rhashtable: Make selftest modular")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:38:10 -05:00
Daniel Borkmann
eb6d1abf1b rhashtable: better high order allocation attempts
When trying to allocate future tables via bucket_table_alloc(), it seems
overkill on large table shifts that we probe for kzalloc() unconditionally
first, as it's likely to fail.

Only probe with kzalloc() for more reasonable table sizes and use vzalloc()
either as a fallback on failure or directly in case of large table sizes.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:38:09 -05:00
Daniel Borkmann
342100d937 rhashtable: don't test for shrink on insert, expansion on delete
Restore pre 54c5b7d311 behaviour and only probe for expansions on inserts
and shrinks on deletes. Currently, it will happen that on initial inserts
into a sparse hash table, we may i.e. shrink it first simply because it's
not fully populated yet, only to later realize that we need to grow again.

This however is counter intuitive, e.g. an initial default size of 64
elements is already small enough, and in case an elements size hint is given
to the hash table by a user, we should avoid unnecessary expansion steps,
so a shrink is clearly unintended here.

Fixes: 54c5b7d311 ("rhashtable: introduce rhashtable_wakeup_worker helper function")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:38:09 -05:00
Daniel Borkmann
b7f5e5c7f8 rhashtable: don't allocate ht structure on stack in test_rht_init
With object runtime debugging enabled, the rhashtable test suite
will rightfully throw a warning "ODEBUG: object is on stack, but
not annotated" from rhashtable_init().

This is because run_work is (correctly) being initialized via
INIT_WORK(), and not annotated by INIT_WORK_ONSTACK(). Meaning,
rhashtable_init() is okay as is, we just need to move ht e.g.,
into global scope.

It never triggered anything, since test_rhashtable is rather a
controlled environment and effectively runs to completion, so
that stack memory is not vanishing underneath us, we shouldn't
confuse any testers with it though.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 16:33:30 -05:00
Linus Torvalds
b11a278397 Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kconfig updates from Michal Marek:
 "Yann E Morin was supposed to take over kconfig maintainership, but
  this hasn't happened.  So I'm sending a few kconfig patches that I
  collected:

   - Fix for missing va_end in kconfig
   - merge_config.sh displays used if given too few arguments
   - s/boolean/bool/ in Kconfig files for consistency, with the plan to
     only support bool in the future"

* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kconfig: use va_end to match corresponding va_start
  merge_config.sh: Display usage if given too few arguments
  kconfig: use bool instead of boolean for type definition attributes
2015-02-19 10:36:45 -08:00
Linus Torvalds
53861af9a1 OK, this has the big virtio 1.0 implementation, as specified by OASIS.
On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to
 double-check the implementation.
 
 Then comes the inevitable fixes and cleanups from that work.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5
 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn
 YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH
 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq
 vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt
 X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7
 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I
 z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n
 JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx
 mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY
 Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78
 eI3FDUWsE36NqE5ECWmz
 =ivCe
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

  On top of tht is the major rework of lguest, to use PCI and virtio
  1.0, to double-check the implementation.

  Then comes the inevitable fixes and cleanups from that work"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
  virtio_net: unconditionally define struct virtio_net_hdr_v1.
  tools/lguest: don't use legacy definitions for net device in example launcher.
  virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
  tools/lguest: use common error macros in the example launcher.
  tools/lguest: give virtqueues names for better error messages
  tools/lguest: more documentation and checking of virtio 1.0 compliance.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: don't start devices until DRIVER_OK status set.
  tools/lguest: handle indirect partway through chain.
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: handle device reset correctly in example launcher.
  virtual: Documentation: simplify and generalize paravirt_ops.txt
  lguest: remove NOTIFY call and eventfd facility.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: always put console in PCI slot #1.
  ...
2015-02-18 09:24:01 -08:00
Al Viro
d879cb8341 move iov_iter.c from mm/ to lib/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-17 22:22:17 -05:00