Commit graph

199 commits

Author SHA1 Message Date
Jon Paul Maloy
22d85c7942 tipc: change sk_buffer handling in tipc_link_xmit()
When the function tipc_link_xmit() is given a buffer list for
transmission, it currently consumes the list both when transmission
is successful and when it fails, except for the special case when
it encounters link congestion.

This behavior is inconsistent, and needs to be corrected if we want
to avoid problems in later commits in this series.

In this commit, we change this to let the function consume the list
only when transmission is successful, and leave the list with the
sender in all other cases. We also modifiy the socket code so that
it adapts to this change, i.e., purges the list when a non-congestion
error code is returned.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:41:15 -07:00
Jon Paul Maloy
9d13ec65ed tipc: introduce link entry structure to struct tipc_node
struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.

We now group those into a new struct 'tipc_link_entry', and intoduce
one single array consisting of such enties. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:41:14 -07:00
Jon Paul Maloy
7d967b673c tipc: purge backlog queue counters when broadcast link is reset
In commit 1f66d161ab
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.

Unfortunately, we missed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permananent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.

We fix this bug with this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:43:02 -07:00
Jon Paul Maloy
dd3f9e70f5 tipc: add packet sequence number at instant of transmission
Currently, the packet sequence number is updated and added to each
packet at the moment a packet is added to the link backlog queue.
This is wasteful, since it forces the code to traverse the send
packet list packet by packet when adding them to the backlog queue.
It would be better to just splice the whole packet list into the
backlog queue when that is the right action to do.

In this commit, we do this change. Also, since the sequence numbers
cannot now be assigned to the packets at the moment they are added
the backlog queue, we do instead calculate and add them at the moment
of transmission, when the backlog queue has to be traversed anyway.
We do this in the function tipc_link_push_packet().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 12:24:46 -04:00
Jon Paul Maloy
a97b9d3fa9 tipc: rename fields in struct tipc_link
We rename some fields in struct tipc_link, in order to give them more
descriptive names:

next_in_no -> rcv_nxt
next_out_no-> snd_nxt
fsm_msg_cnt-> silent_intv_cnt
cont_intv  -> keepalive_intv
last_retransmitted -> last_retransm

There are no functional changes in this commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 12:24:46 -04:00
Richard Alpe
670f4f8818 tipc: add broadcast link window set/get to nl api
Add the ability to get or set the broadcast link window through the
new netlink API. The functionality was unintentionally missing from
the new netlink API. Adding this means that we also fix the breakage
in the old API when coming through the compat layer.

Fixes: 37e2d4843f (tipc: convert legacy nl link prop set to nl compat)
Reported-by: Tomi Ollila <tomi.ollila@iki.fi>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:40:02 -04:00
Jon Paul Maloy
ed193ece26 tipc: simplify link mtu negotiation
When a link is being established, the two endpoints advertise their
respective interface MTU in the transmitted RESET and ACTIVATE messages.
If there is any difference, the lower of the two MTUs will be selected
for use by both endpoints.

However, as a remnant of earlier attempts to introduce TIPC level
routing. there also exists an MTU discovery mechanism. If an intermediate
node has a lower MTU than the two endpoints, they will discover this
through a bisectional approach, and finally adopt this MTU for common use.

Since there is no TIPC level routing, and probably never will be,
this mechanism doesn't make any sense, and only serves to make the
link level protocol unecessarily complex.

In this commit, we eliminate the MTU discovery algorithm,and fall back
to the simple MTU advertising approach. This change is fully backwards
compatible.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:27:12 -04:00
Ying Xue
8a0f6ebe84 tipc: involve reference counter for node structure
TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().

Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:40:28 -07:00
Ying Xue
b952b2befb tipc: fix potential deadlock when all links are reset
[   60.988363] ======================================================
[   60.988754] [ INFO: possible circular locking dependency detected ]
[   60.989152] 3.19.0+ #194 Not tainted
[   60.989377] -------------------------------------------------------
[   60.989781] swapper/3/0 is trying to acquire lock:
[   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.990743]
[   60.990743] but task is already holding lock:
[   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.991738]
[   60.991738] which lock already depends on the new lock.
[   60.991738]
[   60.992174]
[   60.992174] the existing dependency chain (in reverse order) is:
[   60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
[   60.992174] other info that might help us debug this:
[   60.992174]
[   60.992174]  Possible unsafe locking scenario:
[   60.992174]
[   60.992174]        CPU0                    CPU1
[   60.992174]        ----                    ----
[   60.992174]   lock(&(&bclink->lock)->rlock);
[   60.992174]                                lock(&(&n_ptr->lock)->rlock);
[   60.992174]                                lock(&(&bclink->lock)->rlock);
[   60.992174]   lock(&(&n_ptr->lock)->rlock);
[   60.992174]
[   60.992174]  *** DEADLOCK ***
[   60.992174]
[   60.992174] 3 locks held by swapper/3/0:
[   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:40:27 -07:00
Jon Paul Maloy
1f66d161ab tipc: introduce starvation free send algorithm
Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Ying Xue
bc14b8d6a9 tipc: fix a link reset issue due to retransmission failures
When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:43:32 -04:00
Jon Paul Maloy
e3eea1eb47 tipc: clean up handling of message priorities
Messages transferred by TIPC are assigned an "importance priority", -an
integer value indicating how to treat the message when there is link or
destination socket congestion.

There is no separate header field for this value. Instead, the message
user values have been chosen in ascending order according to perceived
importance, so that the message user field can be used for this.

This is not a good solution. First, we have many more users than the
needed priority levels, so we end up with treating more priority
levels than necessary. Second, the user field cannot always
accurately reflect the priority of the message. E.g., a message
fragment packet should really have the priority of the enveloped
user data message, and not the priority of the MSG_FRAGMENTER user.
Until now, we have been working around this problem in different ways,
but it is now time to implement a consistent way of handling such
priorities, although still within the constraint that we cannot
allocate any more bits in the regular data message header for this.

In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
that will be the only one used apart from the four (lower) user data
levels. All non-data messages map down to this priority. Furthermore,
we take some free bits from the MSG_FRAGMENTER header and allocate
them to store the priority of the enveloped message. We then adjust
the functions msg_importance()/msg_set_importance() so that they
read/set the correct header fields depending on user type.

This small protocol change is fully compatible, because the code at
the receiving end of a link currently reads the importance level
only from user data messages, where there is no change.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Jon Paul Maloy
05dcc5aa4d tipc: split link outqueue
struct tipc_link contains one single queue for outgoing packets,
where both transmitted and waiting packets are queued.

This infrastructure is hard to maintain, because we need
to keep a number of fields to keep track of which packets are
sent or unsent, and the number of packets in each category.

A lot of code becomes simpler if we split this queue into a transmission
queue, where sent/unacknowledged packets are kept, and a backlog queue,
where we keep the not yet sent packets.

In this commit we do this separation.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Jon Paul Maloy
2cdf3918e4 tipc: eliminate unnecessary call to broadcast ack function
The unicast packet header contains a broadcast acknowledge sequence
number, that may need to be conveyed to the broadcast link for proper
treatment. Currently, the function tipc_rcv(), which is on the most
critical data path, calls the function tipc_bclink_acknowledge() to
have this done. This call is made for each received packet, and results
in the unconditional grabbing of the broadcast link spinlock.

This is unnecessary, since we can see directly from tipc_rcv() if
the acknowledged number differs from what has been previously acked
from the node in question. In the vast majority of cases the numbers
won't differ, and there is nothing to update.

We now make the call to tipc_bclink_acknowledge() conditional
to that the two ack values differ.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:38:32 -04:00
Richard Alpe
f2b3b2d4cc tipc: convert legacy nl link stat to nl compat
Add functionality for safely appending string data to a TLV without
keeping write count in the caller.

Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 13:20:47 -08:00
Richard Alpe
bfb3e5dd8d tipc: move and rename the legacy nl api to "nl compat"
The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 13:20:47 -08:00
Jon Paul Maloy
cb1b728096 tipc: eliminate race condition at multicast reception
In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:03 -08:00
Jon Paul Maloy
3c724acdd5 tipc: simplify socket multicast reception
The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:03 -08:00
Jon Paul Maloy
c637c10355 tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:02 -08:00
Jon Paul Maloy
c5898636c4 tipc: reduce usage of context info in socket and link
The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:00:01 -08:00
Ying Xue
3474753954 tipc: make tipc node address support net namespace
If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
1da465683a tipc: make tipc broadcast link support net namespace
TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
7f9f95d9d9 tipc: make bearer list support net namespace
Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue
f2f9800d49 tipc: make tipc node table aware of net namespace
Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue
c93d3baa24 tipc: involve namespace infrastructure
Involve namespace infrastructure, make the "tipc_net_id" global
variable aware of per namespace, and rename it to "net_id". In
order that the conversion can be successfully done, an instance
of networking namespace must be passed to relevant functions,
allowing them to access the "net_id" variable of per namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Jon Maloy
703068eee6 tipc: fix bug in broadcast retransmit code
In commit 58dc55f256 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Erik Hugne
4988bb4a3f tipc: fix missing spinlock init and nullptr oops
commit 908344cdda ("tipc: fix bug in multicast congestion
handling") introduced two bugs with the bclink wakeup
function. This commit fixes the missing spinlock init for the
waiting_sks list. We also eliminate the race condition
between the waiting_sks length check/dequeue operations in
tipc_bclink_wakeup_users by simply removing the redundant
length check.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Tero Aho <Tero.Aho@coriant.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:41:54 -05:00
Ying Xue
a6ca109443 tipc: use generic SKB list APIs to manage TIPC outgoing packet chains
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
bc6fecd409 tipc: use generic SKB list APIs to manage deferred queue of link
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's deferred queue, simplifying relevant code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58dc55f256 tipc: use generic SKB list APIs to manage link transmission queue
Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
47b4c9a82f tipc: clean up the process of link pushing packets
In original tipc_link_push_packet(), it pushes messages from protocol
message queue, retransmission queue and next_out queue. But as the two
first queues are removed, we can simplify its relevant code through
deleting tipc_link_push_queue().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:16 -05:00
Richard Alpe
d8182804cf tipc: fix sparse warnings in new nl api
Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:10:23 -05:00
Richard Alpe
7be57fc691 tipc: add link get/dump to new netlink api
Add TIPC_NL_LINK_GET command to the new tipc netlink API.

This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).

The information about a link includes name, transmission info,
properties and link statistics.

As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.

Netlink logical layout of link response message:
    -> port
        -> name
        -> MTU
        -> RX
        -> TX
        -> up flag
        -> active flag
        -> properties
           -> priority
           -> tolerance
           -> window
        -> statistics
            -> rx_info
            -> rx_fragments
            -> rx_fragmented
            -> rx_bundles
            -> rx_bundled
            -> tx_info
            -> tx_fragments
            -> tx_fragmented
            -> tx_bundles
            -> tx_bundled
            -> msg_prof_tot
            -> msg_len_cnt
            -> msg_len_tot
            -> msg_len_p0
            -> msg_len_p1
            -> msg_len_p2
            -> msg_len_p3
            -> msg_len_p4
            -> msg_len_p5
            -> msg_len_p6
            -> rx_states
            -> rx_probes
            -> rx_nacks
            -> rx_deferred
            -> tx_states
            -> tx_probes
            -> tx_nacks
            -> tx_acks
            -> retransmitted
            -> duplicates
            -> link_congs
            -> max_queue
            -> avg_queue

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Jon Maloy
908344cdda tipc: fix bug in multicast congestion handling
One aim of commit 50100a5e39 ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.

If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be waked up when the congestion abates as it should be.

This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.

This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.

v2: using explicit 'unsigned int' declaration instead of 'uint', as
per comment from David Miller.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 14:50:15 -04:00
Jon Paul Maloy
2e84c60b77 tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy
50100a5e39 tipc: use pseudo message to wake up sockets after link congestion
The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy
6f92ee54b3 tipc: ensure sequential message delivery across dual bearers
When we run broadcast packets over dual bearers/interfaces, the
current transmission code is flipping bearers between each sent
packet, with the purpose of leveraging the double bandwidth
available. The receiving bclink is resequencing the packets if
needed, so all messages are delivered upwards from the broadcast
link in the correct order, even if they may arrive in concurrent
interrupts.

However, at the moment of delivery upwards to the socket, we release
all spinlocks (bclink_lock, node_lock), so it is still possible
that arriving messages bypass each other before they reach the socket
queue.

We fix this by applying the same technique we are using for unicast
traffic. We use a link selector (i.e., the last bit of sending port
number) to ensure that messages from the same sender socket always are
sent over the same bearer. This guarantees sequential delivery between
socket pairs, which is sufficient to satisfy the protocol spec, as well
as all known user requirements.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
9fbfb8b120 tipc: rename temporarily named functions
After the previous commit, we can now give the functions with temporary
names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
names.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
c4116e1057 tipc: remove unreferenced functions
We can now remove a number of functions which have become obsolete
and unreferenced through this commit series. There are no functional
changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
0abd8ff21f tipc: start using the new multicast functions
In this commit, we convert the socket multicast send function to
directly call the new multicast/broadcast function (tipc_bclink_xmit2())
introduced in the previous commit. We do this instead of letting the
call go via the now obsolete tipc_port_mcast_xmit(), hence saving
a call level and some code complexity.

We also remove the initial destination lookup at the message sending
side, and replace that with an unconditional lookup at the receiving
side, including on the sending node itself. This makes the destination
lookup and message transfer more uniform than before.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
Jon Paul Maloy
078bec826f tipc: add new functions for multicast and broadcast distribution
We add a new broadcast link transmit function in bclink.c and a new
receive function in socket.c. The purpose is to move the branching
between external and internal destination down to the link layer,
just as we have done with unicast in earlier commits. We also make
use of the new link-independent fragmentation support that was
introduced in an earlier commit series.

This gives a shorter and simpler code path, and makes it possible
to obtain copy-free buffer delivery to all node local destination
sockets.

The new transmission code is added in parallel with the existing one,
and will be used by the socket multicast send function in the next
commit in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
Jon Paul Maloy
999417549c tipc: clear 'next'-pointer of message fragments before reassembly
If the 'next' pointer of the last fragment buffer in a message is not
zeroed before reassembly, we risk ending up with a corrupt message,
since the reassembly function itself isn't doing this.

Currently, when a buffer is retrieved from the deferred queue of the
broadcast link, the next pointer is not cleared, with the result as
described above.

This commit corrects this, and thereby fixes a bug that may occur when
long broadcast messages are transmitted across dual interfaces. The bug
has been present since 40ba3cdf54 ("tipc:
message reassembly using fragment chain")

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11 15:02:10 -07:00
Octavian Purdila
bad93e9d4e net: add __pskb_copy_fclone and pskb_copy_for_clone
There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:38:02 -07:00
Jon Paul Maloy
37e22164a8 tipc: rename and move message reassembly function
The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassemby function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.

In this commit rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:19:48 -04:00
Ying Xue
3f5a12bd9f tipc: avoid to asynchronously reset all links
Postpone the actions of resetting all links until after bclink
lock is released, avoiding to asynchronously reset all links.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:45 -04:00
Ying Xue
eb8b00f5f2 tipc: convert allocations of global variables associated with bclink
Convert allocations of global variables associated with bclink from
static way to dynamical way for the convenience of bclink instance
initialisation. Meanwhile, this also helps TIPC support name space
in the future easily.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:45 -04:00
Ying Xue
d69afc90b8 tipc: define new functions to operate bc_lock
As we are going to do more jobs when bc_lock is released, the two
operations of holding/releasing the lock should be encapsulated with
functions. In addition, we move bc_lock spin lock into tipc_bclink
structure avoiding to define the global variable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
28dd94187a tipc: use bc_lock to protect node map in bearer structure
The node map variable - 'nodes' in bearer structure is only used by
bclink. When bclink accesses it, bc_lock is held. But when change it,
for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest()
the bc_lock is not taken at all. To avoid any inconsistent data, we
should always grab bc_lock while accessing node map variable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 21:17:53 -04:00
Ying Xue
7216cd949c tipc: purge tipc_net_lock lock
Now tipc routing hierarchy comprises the structures 'node', 'link'and
'bearer'. The whole hierarchy is protected by a big read/write lock,
tipc_net_lock, to ensure that nothing is added or removed while code
is accessing any of these structures. Obviously the locking policy
makes node, link and bearer components closely bound together so that
their relationship becomes unnecessarily complex. In the worst case,
such locking policy not only has a negative influence on performance,
but also it's prone to lead to deadlock occasionally.

In order o decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:

- Bearer level
  RTNL lock is used on update side, and RCU is used on read side.
  Meanwhile, all bearer instances including broadcast bearer are
  saved into bearer_list array.

- Node and link level
  All node instances are saved into two tipc_node_list and node_htable
  lists. The two lists are protected by node_list_lock on write side,
  and they are guarded with RCU lock on read side. All members in node
  structure including link instances are protected by node spin lock.

- The relationship between bearer and node
  When link accesses bearer, it first needs to find the bearer with
  its bearer identity from the bearer_list array. When bearer accesses
  node, it can iterate the node_htable hash list with the node
  address to find the corresponding node.

In the new locking policy, every component has its private locking
solution and the relationship between bearer and node is very simple,
that is, they can find each other with node address or bearer identity
from node_htable hash list or bearer_list array.

Until now above all changes have been done, so tipc_net_lock can be
removed safely.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 21:17:53 -04:00
Ying Xue
7a2f7d18e7 tipc: decouple the relationship between bearer and link
Currently on both paths of message transmission and reception, the
read lock of tipc_net_lock must be held before bearer is accessed,
while the write lock of tipc_net_lock has to be taken before bearer
is configured. Although it can ensure that bearer is always valid on
the two data paths, link and bearer is closely bound together.

So as the part of effort of removing tipc_net_lock, the locking
policy of bearer protection will be adjusted as below: on the two
data paths, RCU is used, and on the configuration path of bearer,
RTNL lock is applied.

Now RCU just covers the path of message reception. To make it possible
to protect the path of message transmission with RCU, link should not
use its stored bearer pointer to access bearer, but it should use the
bearer identity of its attached bearer as index to get bearer instance
from bearer_list array, which can help us decouple the relationship
between bearer and link. As a result, bearer on the path of message
transmission can be safely protected by RCU when we access bearer_list
array within RCU lock protection.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 21:17:53 -04:00
Ying Xue
f8322dfce5 tipc: convert bearer_list to RCU list
Convert bearer_list to RCU list. It's protected by RTNL lock on
update side, and RCU read lock is applied to read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 21:17:52 -04:00
Ying Xue
987b58be37 tipc: make broadcast bearer store in bearer_list array
Now unicast bearer is dynamically allocated and placed into its
identity specified slot of bearer_list array. When we search
bearer_list array with a bearer identity, the corresponding bearer
instance can be found. But broadcast bearer is statically allocated
and it is not located in the bearer_list array yet. So we decide to
enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last
slot stores the broadcast bearer so that the broadcast bearer can
be found from bearer_list array with MAX_BEARERS as index. The
change will help us reduce the complex relationship between bearer
and link in the future.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
f47de12b06 tipc: remove active flag from tipc_bearer structure
After the allocation of tipc_bearer structure instance is converted
from statical way to dynamical way, we identify whether a certain
tipc_bearer structure pointer is valid by checking whether the pointer
is NULL or not. So the active flag in tipc_bearer structure becomes
redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
3874ccbba8 tipc: convert tipc_bearers array to pointer list
As part of the effort to introduce RCU protection for the bearer
list, we first need to change it to a list of pointers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
247f0f3c31 tipc: align tipc function names with common naming practice in the network
Rename the following functions, which are shorter and more in line
with common naming practice in the network subsystem.

tipc_bclink_send_msg->tipc_bclink_xmit
tipc_bclink_recv_pkt->tipc_bclink_rcv
tipc_disc_recv_msg->tipc_disc_rcv
tipc_link_send_proto_msg->tipc_link_proto_xmit
link_recv_proto_msg->tipc_link_proto_rcv
link_send_sections_long->tipc_link_iovec_long_xmit
tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
tipc_link_send_sync->tipc_link_sync_xmit
tipc_link_recv_sync->tipc_link_sync_rcv
tipc_link_send_buf->__tipc_link_xmit
tipc_link_send->tipc_link_xmit
tipc_link_send_names->tipc_link_names_xmit
tipc_named_recv->tipc_named_rcv
tipc_link_recv_bundle->tipc_link_bundle_rcv
tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
link_send_long_buf->tipc_link_frag_xmit

tipc_multicast->tipc_port_mcast_xmit
tipc_port_recv_mcast->tipc_port_mcast_rcv
tipc_port_reject_sections->tipc_port_iovec_reject
tipc_port_recv_proto_msg->tipc_port_proto_rcv
tipc_connect->tipc_port_connect
__tipc_connect->__tipc_port_connect
__tipc_disconnect->__tipc_port_disconnect
tipc_disconnect->tipc_port_disconnect
tipc_shutdown->tipc_port_shutdown
tipc_port_recv_msg->tipc_port_rcv
tipc_port_recv_sections->tipc_port_iovec_rcv

release->tipc_release
accept->tipc_accept
bind->tipc_bind
get_name->tipc_getname
poll->tipc_poll
send_msg->tipc_sendmsg
send_packet->tipc_send_packet
send_stream->tipc_send_stream
recv_msg->tipc_recvmsg
recv_stream->tipc_recv_stream
connect->tipc_connect
listen->tipc_listen
shutdown->tipc_shutdown
setsockopt->tipc_setsockopt
getsockopt->tipc_getsockopt

Above changes have no impact on current users of the functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 17:31:59 -05:00
Ying Xue
a83045292d tipc: remove bearer_lock from tipc_bearer struct
After the earlier commits ("tipc: remove 'links' list from
tipc_bearer struct") and ("tipc: introduce new spinlock to protect
struct link_req"), there is no longer any need to protect struct
link_req or or any link list by use of bearer_lock. Furthermore,
we have eliminated the need for using bearer_lock during downcalls
(send) from the link to the bearer, since we have ensured that
bearers always have a longer life cycle that their associated links,
and always contain valid data.

So, the only need now for a lock protecting bearers is for guaranteeing
consistency of the bearer list itself. For this, it is sufficient, at
least for the time being, to continue applying 'net_lock´ in write mode.

By removing bearer_lock we also pre-empt introduction of issue b) descibed
in the previous commit "tipc: remove 'links' list from tipc_bearer struct":

"b) When the outer protection from net_lock is gone, taking
    bearer_lock and node_lock in opposite order of method 1) and 2)
    will become an obvious deadlock hazard".

Therefore, we now eliminate the bearer_lock spinlock.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:57:07 -05:00
Jon Paul Maloy
03b9201793 tipc: stricter behavior of message reassembly function
The function tipc_link_recv_fragment(struct sk_buff **buf) currently
leaves the value of the input buffer pointer undefined when it returns,
except when the return code indicates that the reassembly is complete.
This despite the fact that it always consumes the input buffer.

Here, we enforce a stricter behavior by this function, ensuring that
the returned buffer pointer is non-NULL if and only if the reassembly
is complete. This makes it possible to test for the buffer pointer as
criteria for successful reassembly.

We also rename the function to tipc_link_frag_rcv(), which is both
shorter and more in line with common naming practice in the network
subsystem.

Apart from the new name, these changes have no impact on current
users of the function, but makes it more practical for use in some
planned future commits.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:57:05 -05:00
Jon Paul Maloy
581465fa28 tipc: make link start event synchronous
When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.

We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:44:26 -05:00
Erik Hugne
512137eeff tipc: remove interface state mirroring in bearer
struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.

This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.

Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.

CPU0                                    CPU1
----                                    ----
Time 0: bearer_disable()                link_timeout()
Time 1:   spin_lock_bh(&b_ptr->lock)      tipc_link_push_queue()
Time 2:   tipc_link_delete()                tipc_bearer_blocked(b_ptr)
Time 3:     k_cancel_timer(&req->timer)       spin_lock_bh(&b_ptr->lock)
Time 4:       del_timer_sync(&req->timer)

I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.

We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.

An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated to the bearer immediately after an
interface goes down.

A user might notice one slight change in link behaviour after this
change. When an interface goes down, (e.g. through a NETDEV_DOWN
event) all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:30:29 -05:00
Erik Hugne
40ba3cdf54 tipc: message reassembly using fragment chain
When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.

Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.

To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Erik Hugne
528f6f4bf3 tipc: don't reroute message fragments
When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.

Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Paul Gortmaker
ae8509c420 tipc: cosmetic realignment of function arguments
No runtime code changes here.  Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Gerlando Falauto
488fc9af82 tipc: pskb_copy() buffers when sending on more than one bearer
When sending packets, TIPC bearers use skb_clone() before writing their
hardware header. This will however NOT copy the data buffer.
So when the same packet is sent over multiple bearers (to reach multiple
nodes), the same socket buffer data will be treated by multiple
tipc_media drivers which will write their own hardware header through
dev_hard_header().
Most of the time this is not a problem, because by the time the
packet is processed by the second media, it has already been sent over
the first one. However, when the first transmission is delayed (e.g.
because of insufficient bandwidth or through a shaper), the next bearer
will overwrite the hardware header, resulting in the packet being sent:
a) with the wrong source address, when bearers of the same type,
e.g. ethernet, are involved
b) with a completely corrupt header, or even dropped, when bearers of
different types are involved.

So when the same socket buffer is to be sent multiple times, send a
pskb_copy() instead (from the second instance on), and release it
afterwards (the bearer will skb_clone() it anyway).

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-03 16:08:58 -04:00
Gerlando Falauto
77861d9c00 tipc: tipc_bcbearer_send(): simplify bearer selection
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-03 16:08:58 -04:00
Gerlando Falauto
e616071094 tipc: cosmetic: clean up comments and break a long line
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-03 16:08:58 -04:00
Patrick McHardy
8aeb89f214 tipc: move bcast_addr from struct tipc_media to struct tipc_bearer
Some network protocols, like InfiniBand, don't have a fixed broadcast
address but one that depends on the configuration. Move the bcast_addr
to struct tipc_bearer and initialize it with the broadcast address of
the network device when the bearer is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:18:33 -04:00
Erik Hugne
c5c73dca59 tipc: fix missing spinlock init in broadcast code
After commit 3c294cb3 "tipc: remove the bearer congestion mechanism",
we try to grab the broadcast bearer lock when sending multicast
messages over the broadcast link. This will cause an oops because
the lock is never initialized. This is an old bug, but the lock
was never actually used before commit 3c294cb3, so that why it was
not visible until now.  The oops will look something like:

	BUG: spinlock bad magic on CPU#2, daemon/147
	lock: bcast_bearer+0x48/0xffffffffffffd19a [tipc],
	.magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
	Pid: 147, comm: daemon Not tainted 3.8.0-rc3+ #206
	Call Trace:
	spin_dump+0x8a/0x8f
	spin_bug+0x21/0x26
	do_raw_spin_lock+0x114/0x150
	_raw_spin_lock_bh+0x19/0x20
	tipc_bearer_blocked+0x1f/0x40 [tipc]
	tipc_link_send_buf+0x82/0x280 [tipc]
	? __alloc_skb+0x9f/0x2b0
	tipc_bclink_send_msg+0x77/0xa0 [tipc]
	tipc_multicast+0x11b/0x1b0 [tipc]
	send_msg+0x225/0x530 [tipc]
	sock_sendmsg+0xca/0xe0

The above can be triggered by running the multicast demo program.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-15 15:40:56 -05:00
Ying Xue
389dd9bcf6 tipc: rename supported flag to recv_permitted
Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node.  Convert it to a bool at the same time,
since it is not used to store any numerical values.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 07:50:51 -05:00
Ying Xue
3c294cb374 tipc: remove the bearer congestion mechanism
Currently at the TIPC bearer layer there is the following congestion
mechanism:

Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer.  When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.

However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.

But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer.  So the code maintaing the state
information is not removed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-21 20:07:25 -05:00
Erik Hugne
dc1aed37d1 tipc: phase out most of the struct print_buf usage
The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does.  It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer.  All callers of
this function that used to pass a print_buf are updated.

Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.

Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate.  This function is removed since it
is no longer referenced nor needed.

A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed.  This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.

The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-07-13 19:33:28 -04:00
Erik Hugne
2cf8aa19fe tipc: use standard printk shortcut macros (pr_err etc.)
All messages should go directly to the kernel log.  The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.

Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.

Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG).  The latter will be
phased out completely

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-07-13 19:24:44 -04:00
Ben Hutchings
2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
Paul Gortmaker
617d3c7a50 tipc: compress out gratuitous extra carriage returns
Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-30 15:53:56 -04:00
Allan Stephens
5f6d9123f1 tipc: Eliminate trivial buffer manipulation helper routines
Gets rid of two inlined routines that simply call existing sk_buff
manipulation routines, since there is no longer any extra processing
done by the helper routines.

Note that these changes are essentially cosmetic in nature, and have
no impact on the actual operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-24 17:05:16 -05:00
Allan Stephens
63e7f1ac28 tipc: Prevent loss of fragmented messages over broadcast link
Modifies broadcast link so that an incoming fragmented message is not
lost if reassembly cannot begin because there currently is no buffer
big enough to hold the entire reassembled message. The broadcast link
now ignores the first fragment completely, which causes the sending node
to retransmit the first fragment so that reassembly can be re-attempted.

Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.

To do this cleanly without duplicaton, a new bclink_accept_pkt()
function is introduced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:19 -05:00
Allan Stephens
1ec2bb0840 tipc: Remove obsolete broadcast tag capability
Eliminates support for the broadcast tag field, which is no longer
used by broadcast link NACK messages.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:18 -05:00
Allan Stephens
7a54d4a99d tipc: Major redesign of broadcast link ACK/NACK algorithms
Completely redesigns broadcast link ACK and NACK mechanisms to prevent
spurious retransmit requests in dual LAN networks, and to prevent the
broadcast link from stalling due to the failure of a receiving node to
acknowledge receiving a broadcast message or request its retransmission.

Note: These changes only impact the timing of when ACK and NACK messages
are sent, and not the basic broadcast link protocol itself, so inter-
operability with nodes using the "classic" algorithms is maintained.

The revised algorithms are as follows:

1) An explicit ACK message is still sent after receiving 16 in-sequence
messages, and implicit ACK information continues to be carried in other
unicast link message headers (including link state messages).  However,
the timing of explicit ACKs is now based on the receiving node's absolute
network address rather than its relative network address to ensure that
the failure of another node does not delay the ACK beyond its 16 message
target.

2) A NACK message is now typically sent only when a message gap persists
for two consecutive incoming link state messages; this ensures that a
suspected gap is not confirmed until both LANs in a dual LAN network have
had an opportunity to deliver the message, thereby preventing spurious NACKs.
A NACK message can also be generated by the arrival of a single link state
message, if the deferred queue is so big that the current message gap
cannot be the result of "normal" mis-ordering due to the use of dual LANs
(or one LAN using a bonded interface). Since link state messages typically
arrive at different nodes at different times the problem of multiple nodes
issuing identical NACKs simultaneously is inherently avoided.

3) Nodes continue to "peek" at NACK messages sent by other nodes. If
another node requests retransmission of a message gap suspected (but not
yet confirmed) by the peeking node, the peeking node forgets about the
gap and does not generate a duplicate retransmit request. (If the peeking
node subsequently fails to receive the lost message, later link state
messages will cause it to rediscover and confirm the gap and send another
NACK.)

4) Message gap "equality" is now determined by the start of the gap only.
This is sufficient to deal with the most common cases of message loss,
and eliminates the need for complex end of gap computations.

5) A peeking node no longer tries to determine whether it should send a
complementary NACK, since the most common cases of message loss don't
require it to be sent. Consequently, the node no longer examines the
"broadcast tag" field of a NACK message when peeking.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:18 -05:00
Allan Stephens
b98158e3b3 tipc: Add missing locks in broadcast link statistics accumulation
Ensures that all attempts to update broadcast link statistics are done
only while holding the lock that protects the link's main data structures,
to prevent interference by simultaneous updates caused by messages
arriving on other interfaces.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:18 -05:00
Allan Stephens
0232c5a566 tipc: Fix bug in broadcast link duplicate message statistics
Modifies broadcast link so that it increments the "received duplicate
message" count if an incoming message cannot be added to the deferred
message queue because it is already present in the queue. (The aligns
broadcast link behavior with that of TIPC's unicast links.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:17 -05:00
Allan Stephens
8a275a6a30 tipc: Fix node lock reclamation issues in broadcast link reception
Fixes a pair of problems in broadcast link message reception code
relating to the reclamation of the node lock after consuming an
in-sequence message.

1) Now retests to see if the sending node is still up after reclaiming
   the node lock, and bails out if it is non-operational.

2) Now manipulates the node's deferred message queue only after
   reclaiming the node lock, rather than using queue head pointer
   information that was cached previously.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:17 -05:00
Allan Stephens
57732560d1 tipc: Add missing broadcast link lock when sending NACK
Ensures that any attempt to send a NACK message over TIPC's broadcast
link has exclusive access to the link's main data structures, to prevent
interference with a simultaneous attempt to send other broadcast link
traffic (such as application-generated multicast messages).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-06 16:59:17 -05:00
Paul Gortmaker
a18c4bc3ea tipc: rename struct link* to struct tipc_link*
This converts the following:

	struct link		->	struct tipc_link
	struct link_req		->	struct tipc_link_req
	struct link_name	->	struct tipc_link_name

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-29 21:53:30 -05:00
Paul Gortmaker
7f9ab6ac2e tipc: rename struct bcbearer* to tipc_bcbearer*
This changes both the struct bcbearer and struct bcbearer_pair to
have the "tipc_" prefix.  Runtime behaviour is unchanged.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-29 21:53:30 -05:00
Paul Gortmaker
6765fd6771 tipc: rename struct bclink to struct tipc_bclink
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-29 21:53:30 -05:00
Paul Gortmaker
4584310b4a tipc: rename struct port_list to struct tipc_port_list
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-29 21:53:29 -05:00
Paul Gortmaker
358a0d1c9e tipc: rename struct media to struct tipc_media
Give it a meaningful prefix, as suggested by DaveM, so that it
is consistent with things like struct tipc_bearer, and so it isn't
confused with anything else.  This has no impact on the actual
runtime code behaviour.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-29 21:53:29 -05:00
Allan Stephens
f905730c7e tipc: Allow use of buf_seqno() helper routine by unicast links
Migrates the buf_seqno() helper routine from broadcast link level to
unicast link level so that it can be used both types of TIPC links.
This is a cosmetic change only, and does not affect the operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:54 -05:00
Allan Stephens
3655959143 tipc: Ignore broadcast acknowledgements that are out-of-range
Adds checks to TIPC's broadcast link so that it ignores any
acknowledgement message containing a sequence number that does not
correspond to an unacknowledged message currently in the broadcast
link's transmit queue.

This change prevents the broadcast link from becoming stalled if a
newly booted node receives stale broadcast link acknowledgement
information from another node that has not yet fully synchronized
its end of the broadcast link to reflect the current state of the
new node's end.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:53 -05:00
Allan Stephens
10745cd599 tipc: Flush unsent broadcast messages when contact with last node is lost
Adds code to release any unsent broadcast messages in the broadcast link
transmit queue if TIPC loses contact with its only neighboring node.
Previously, a broadcast link that was in the congested state would hold
on to the unsent messages, even though the messages were now undeliverable.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:52 -05:00
Allan Stephens
9157bafb44 tipc: Minor optimization of broadcast link transmit queue statistic
The two broadcast link statistics fields that are used to derive the
average length of that link's transmit queue are now updated only after
a successful attempt to send a broadcast message, since there is no need
to update these values when an unsuccessful send attempt leaves the
queue unchanged.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:51 -05:00
Allan Stephens
2b78f9a002 tipc: Handle broadcast attempt when no neighboring nodes exist
Adds a check to detect when an attempt is made to send a message
via the broadcast link and no neighboring nodes are currently available
to receive it. Rather than wasting effort passing the message to the
broadcast link and broadcast bearer, who will only throw it away,
TIPC now frees the message immediately and reports success (i.e. the
message has been delivered to all available destinations).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:50 -05:00
Allan Stephens
cd3decdfd1 tipc: Ensure broadcast link spinlock is held when updating node map
Fixes oversight that allowed broadcast link node map to be updated without
first taking the broadcast link spinlock that protects the map. As part
of this fix the node map has been incorporated into the broadcast link
structure to make the need for such protection more evident.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:48 -05:00
Allan Stephens
c47e9b9188 tipc: Eliminate dynamic allocation of broadcast link data structures
Creates global variables to hold the broadcast link's pseudo-bearer and
pseudo-link structures, rather than allocating them dynamically. There
is only a single instance of each structure, and changing over to static
allocation allows elimination of code to handle the cases where dynamic
allocation was unsuccessful.

The memset in the teardown code may look like they aren't used, but
the same teardown code is run when there is a non-fatal error at
init-time, so that stale data isn't present when the user fixes the
cause of the soft error.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-27 11:33:47 -05:00
Allan Stephens
0f38513d22 tipc: Remove obsolete congestion handling when sending a broadcast NACK
Eliminates obsolete code that handles broadcast bearer congestion when
the broadast link sends a NACK message, since the broadcast pseudo-bearer
never becomes blocked.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:37 -04:00
Allan Stephens
9f6bdcd428 tipc: Discard incoming broadcast messages that are unexpected
Modifies TIPC's incoming broadcast packet handler to discard messages
that cannot legally be sent over the broadcast link, including:

- broadcast protocol messages that do no contain state information
- payload messages that are not named multicast messages
- any other form of message except for bundled messages, fragmented
  messages, and name distribution messages.

These checks are needed to prevent TIPC from handing an unexpected
message to a routine that isn't prepared to handle it, which could
lead to incorrect processing (up to and including invalid memory
references caused by attempts to access message fields that aren't
present in the message).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:37 -04:00
Allan Stephens
693d03ae3c tipc: Remove deferred queue head caching during broadcast message reception
Modifies TIPC's incoming broadcast packet handler so that it no longer
pre-reads information about the deferred packet queue, since the cached
value is unreliable once the associated node lock has been released.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:36 -04:00
Allan Stephens
5d3c488dfe tipc: Fix node lock problems during broadcast message reception
Modifies TIPC's incoming broadcast packet handler to ensure that the
node lock associated with the sender of the packet is held whenever
node-related data structure fields are accessed. The routine is also
restructured with a single exit point, making it easier to ensure
the node lock is properly released and the incoming packet is properly
disposed of.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:36 -04:00
Allan Stephens
23f0ff906a tipc: Remove non-executable code to handle broadcast bearer congestion
Eliminates code associated with the sending of unsent broadcast link
traffic when the broadcast pseudo-bearer becomes unblocked following a
temporary congestion situation. This code is non-executable because the
broadcast pseudo-bearer never becomes blocked [see tipc_bcbearer_send()].

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:36 -04:00
Allan Stephens
2ff9f924a5 tipc: Cosmetic changes to broadcast bearer send routine
Updates the comments in the broadcast bearer send routine to more
accurately describe the processing done by the routine. Also replaces
the improper use of a TIPC payload message error status symbol (in a place
that has nothing to do with such errors) with its numeric equivalent.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:36 -04:00
Allan Stephens
2e2d9be845 tipc: Update obsolete references to multicast link
Updates TIPC's broadcast link in a couple of places that were missed
during the transition from its former name ("multicast-link") to its
current name ("broadcast-link"). These changes are essentially cosmetic
and do not affect the overall operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-01 11:16:35 -04:00