linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-03 15:47:36 +00:00

Author	SHA1	Message	Date
Or Gerlitz	faa3ffce78	net/sched: cls_flower: Add support for matching on flags Add UAPI to provide set of flags for matching, where the flags provided from user-space are mapped to flow-dissector flags. The 1st flag allows to match on whether the packet is an IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:32:50 -05:00
Dan Carpenter	d26aac2d87	net: mvneta: Indent some statements These two statements were not indented correctly so it's sort of confusing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:29:21 -05:00
Dan Carpenter	0b801290ea	drivers: net: xgene: uninitialized variable in xgene_enet_free_pagepool() We never set "slots" in this function. Fixes: `a9380b0f7b` ("drivers: net: xgene: Add support for Jumbo frame") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:27:46 -05:00
Peng Tao	635abf0191	vhost: remove unnecessary smp_mb from vhost_work_queue test_and_set_bit() already implies a memory barrier. Signed-off-by: Peng Tao <bergwolf@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:26:46 -05:00
Peng Tao	1440a3a12b	vhost-vsock: remove unused vq variable Signed-off-by: Peng Tao <bergwolf@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:26:46 -05:00
Zhang Shengju	f91c58d68b	icmp: correct return value of icmp_rcv() Currently, icmp_rcv() always return zero on a packet delivery upcall. To make its behavior more compliant with the way this API should be used, this patch changes this to let it return NET_RX_SUCCESS when the packet is proper handled, and NET_RX_DROP otherwise. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 11:24:23 -05:00
David S. Miller	5fccd64aa4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains a large Netfilter update for net-next, to summarise: 1) Add support for stateful objects. This series provides a nf_tables native alternative to the extended accounting infrastructure for nf_tables. Two initial stateful objects are supported: counters and quotas. Objects are identified by a user-defined name, you can fetch and reset them anytime. You can also use a maps to allow fast lookups using any arbitrary key combination. More info at: http://marc.info/?l=netfilter-devel&m=148029128323837&w=2 2) On-demand registration of nf_conntrack and defrag hooks per netns. Register nf_conntrack hooks if we have a stateful ruleset, ie. state-based filtering or NAT. The new nf_conntrack_default_on sysctl enables this from newly created netnamespaces. Default behaviour is not modified. Patches from Florian Westphal. 3) Allocate 4k chunks and then use these for x_tables counter allocation requests, this improves ruleset load time and also datapath ruleset evaluation, patches from Florian Westphal. 4) Add support for ebpf to the existing x_tables bpf extension. From Willem de Bruijn. 5) Update layer 4 checksum if any of the pseudoheader fields is updated. This provides a limited form of 1:1 stateless NAT that make sense in specific scenario, eg. load balancing. 6) Add support to flush sets in nf_tables. This series comes with a new set->ops->deactivate_one() indirection given that we have to walk over the list of set elements, then deactivate them one by one. The existing set->ops->deactivate() performs an element lookup that we don't need. 7) Two patches to avoid cloning packets, thus speed up packet forwarding via nft_fwd from ingress. From Florian Westphal. 8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to prevent infinite loops, patch from Dwip Banerjee. And one minor refactoring from Gao feng. 9) Revisit recent log support for nf_tables netdev families: One patch to ensure that we correctly handle non-ethernet packets. Another patch to add missing logger definition for netdev. Patches from Liping Zhang. 10) Three patches for nft_fib, one to address insufficient register initialization and another to solve incorrect (although harmless) byteswap operation. Moreover update xt_rpfilter and nft_fib to match lbcast packets with zeronet as source, eg. DHCP Discover packets (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang. 11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has been broken in many-cast mode for some little time, let's give them a chance by placing them at the same level as other existing protocols. Thus, users don't explicitly have to modprobe support for this and NAT rules work for them. Some people point to the lack of support in SOHO Linux-based routers that make deployment of new protocols harder. I guess other middleboxes outthere on the Internet are also to blame. Anyway, let's see if this has any impact in the midrun. 12) Skip software SCTP software checksum calculation if the NIC comes with SCTP checksum offload support. From Davide Caratti. 13) Initial core factoring to prepare conversion to hook array. Three patches from Aaron Conole. 14) Gao Feng made a wrong conversion to switch in the xt_multiport extension in a patch coming in the previous batch. Fix it in this batch. 15) Get vmalloc call in sync with kmalloc flags to avoid a warning and likely OOM killer intervention from x_tables. From Marcelo Ricardo Leitner. 16) Update Arturo Borrero's email address in all source code headers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 19:16:46 -05:00
David S. Miller	63c36c40b9	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-12-07 This series contains updates to i40e and i40evf only. Filip modifies the i40e to log link speed change and when the link is brought up and down. Mitch replaces i40e_txd_use_count() with a new function which is slightly faster and better documented so the dim witted can better follow the code. Fixes the locking of the service task so that it is actually done in the service task and not in the scheduling function which calls the service task. Jacob, being the busy little beaver he is, provides most of the changes starting restores a workaround that is still needed in some configurations, specifically the Ethernet Controller XL710 for 40GbE QSFP+. Removes duplicate code and simplifies the i40e_vsi_add_vlan() and i40e_vsi_kill_vlan() functions. Removes detection of PTP frames over L4 (UDP) on the XL710 MAC, since there was a product decision to defeature it. Fixed a previous refactor of active filters which caused issues in the accounting of active_filters. Remaining work was done in the VLAN filters to improve readability and simplify code as much as possible to reduce inconsistencies. Alex fixes foul budget accounting in core code by returning actual work done, capped to budget-1. Henry fixes the "ethtool -p" function for 1G BaseT PHYs. Carolyn adds support for 25G devices for i40e and i40evf. Michal adds functions to apply the correct access method for external PHYs which could use Clause22 or Clause45 depending on the PHY. v2: dropped last patch from previous series, since changes are needed based on feedback from Sergei Shtylyov ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 19:03:23 -05:00
Zhang Shengju	25e3e84b18	dummy: expend mtu range for dummy device After commit `61e84623ac` ("net: centralize net_device min/max MTU checking"), the mtu range for dummy device becomes [68, 1500]. This patch extends it to [0, 65535]. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:29:45 -05:00
Zhang Shengju	e82621e329	nlmon: use core MTU range checking in nlmon driver Since commit `61e84623ac` ("net: centralize net_device min/max MTU checking"), mtu range is checked at dev_set_mtu(). This patch adds min_mtu for nlmon device and remove unnecessary ndo_change_mtu() function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:28:26 -05:00
Gao Feng	a1f5315ce4	driver: macvlan: Remove the rcu member of macvlan_port When free macvlan_port in macvlan_port_destroy, it is safe to free directly because netdev_rx_handler_unregister could enforce one grace period. So it is unnecessary to use kfree_rcu for macvlan_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:22:07 -05:00
Gao Feng	48140a210b	driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu There are two functions which would free the ipvl_port now. The first is ipvlan_port_create. It frees the ipvl_port in the error handler, so it could kfree it directly. The second is ipvlan_port_destroy. It invokes netdev_rx_handler_unregister which enforces one grace period by synchronize_net firstly, so it also could kfree the ipvl_port directly and safely. So it is unnecessary to use kfree_rcu to free ipvl_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:21:21 -05:00
Daniel Borkmann	ef0915cacd	bpf: fix loading of BPF_MAXINSNS sized programs General assumption is that single program can hold up to BPF_MAXINSNS, that is, 4096 number of instructions. It is the case with cBPF and that limit was carried over to eBPF. When recently testing digest, I noticed that it's actually not possible to feed 4096 instructions via bpf(2). The check for > BPF_MAXINSNS was added back then to bpf_check() in `cbd3570086` ("bpf: verifier (add ability to receive verification log)"). However, `09756af468` ("bpf: expand BPF syscall with program load/unload") added yet another check that comes before that into bpf_prog_load(), but this time bails out already in case of >= BPF_MAXINSNS. Fix it up and perform the check early in bpf_prog_load(), so we can drop the second one in bpf_check(). It makes sense, because also a 0 insn program is useless and we don't want to waste any resources doing work up to bpf_check() point. The existing bpf(2) man page documents E2BIG as the official error for such cases, so just stick with it as well. Fixes: `09756af468` ("bpf: expand BPF syscall with program load/unload") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:16:04 -05:00
Niklas Cassel	90364feaf7	net: stmmac: do not call phy_ethtool_ksettings_set from atomic context >From what I can tell, spin_lock(&priv->lock) is not needed, since the phy_ethtool_ksettings_set call is not given the priv struct. phy_start_aneg takes the phydev->lock. Calls to phy_adjust_link from phy_state_machine also takes the phydev->lock. [ 13.718319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97 [ 13.726717] in_atomic(): 1, irqs_disabled(): 0, pid: 1307, name: ethtool [ 13.742115] Hardware name: Axis ARTPEC-6 Platform [ 13.746829] [<80110568>] (unwind_backtrace) from [<8010c2bc>] (show_stack+0x18/0x1c) [ 13.754575] [<8010c2bc>] (show_stack) from [<80433484>] (dump_stack+0x80/0xa0) [ 13.761801] [<80433484>] (dump_stack) from [<80145428>] (___might_sleep+0x108/0x170) [ 13.769554] [<80145428>] (___might_sleep) from [<806c9b50>] (mutex_lock+0x24/0x44) [ 13.777128] [<806c9b50>] (mutex_lock) from [<8050cbc0>] (phy_start_aneg+0x1c/0x13c) [ 13.784783] [<8050cbc0>] (phy_start_aneg) from [<8050d338>] (phy_ethtool_ksettings_set+0x98/0xd0) [ 13.793656] [<8050d338>] (phy_ethtool_ksettings_set) from [<80517adc>] (stmmac_ethtool_set_link_ksettings+0xa0/0xb4) [ 13.804184] [<80517adc>] (stmmac_ethtool_set_link_ksettings) from [<805c5138>] (ethtool_set_settings+0xd4/0x13c) [ 13.814358] [<805c5138>] (ethtool_set_settings) from [<805c9718>] (dev_ethtool+0x13c4/0x211c) [ 13.822882] [<805c9718>] (dev_ethtool) from [<805dc7c0>] (dev_ioctl+0x480/0x8e0) [ 13.830291] [<805dc7c0>] (dev_ioctl) from [<80260e34>] (do_vfs_ioctl+0x94/0xa00) [ 13.837699] [<80260e34>] (do_vfs_ioctl) from [<802617dc>] (SyS_ioctl+0x3c/0x60) [ 13.845011] [<802617dc>] (SyS_ioctl) from [<801088bc>] (__sys_trace_return+0x0/0x10) Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:08:23 -05:00
David S. Miller	64e8de58c5	Merge branch 'ti-cpts-update-and-fixes' Grygorii Strashko says: ==================== net: ethernet: ti: cpts: update and fixes It is preparation series intended to clean up and optimize TI CPTS driver to facilitate further integration with other TI's SoCs like Keystone 2. Changes in v5: - fixed copy paste error in cpts_release - reworked cc.mult/shift and cc_mult initialization Changes in v4: - fixed build error in patch "net: ethernet: ti: cpts: clean up event list if event pool is empty" - rebased on top of net-next Changes in v3: - patches reordered: fixes and small updates moved first - added comments in code about cpts->cc_mult - conversation range (maxsec) limited to 10sec Changes in v2: - patch "net: ethernet: ti: cpts: rework initialization/deinitialization" was split on 4 patches - applied comments from Richard Cochran - dropped patch "net: ethernet: ti: cpts: add return value to tx and rx timestamp funcitons" - new patches added: "net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs" and "clocksource: export the clocks_calc_mult_shift to use by timestamp code" Links on prev versions: v4: https://lkml.org/lkml/2016/12/6/496 v3: https://www.spinics.net/lists/devicetree/msg153474.html v2: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1282034.html v1: http://www.spinics.net/lists/linux-omap/msg131925.html ==================== Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:49 -05:00
Grygorii Strashko	20138cf9ef	net: ethernet: ti: cpts: fix overflow check period The CPTS drivers uses 8sec period for overflow checking with assumption that CPTS retclk will not exceed 500MHz. But that's not true on some TI platforms (Kesytone 2). As result, it is possible that CPTS counter will overflow more than once between two readings. Hence, fix it by selecting overflow check period dynamically as max_sec_before_overflow/2, where max_sec_before_overflow = max_counter_val / rftclk_freq. Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:48 -05:00
Grygorii Strashko	88f0f0b0be	net: ethernet: ti: cpts: calc mult and shift from refclk freq The cyclecounter mult and shift values can be calculated based on the CPTS rfclk frequency and timekeepnig framework provides required algos and API's. Hence, calc mult and shift basing on CPTS rfclk frequency if both cpts_clock_shift and cpts_clock_mult properties are not provided in DT (the basis of calculation algorithm is borrowed from __clocksource_update_freq_scale() commit `7d2f944a2b` ("clocksource: Provide a generic mult/shift factor calculation")). After this change cpts_clock_shift and cpts_clock_mult DT properties will become optional. Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:48 -05:00
Murali Karicheri	5304121ada	clocksource: export the clocks_calc_mult_shift to use by timestamp code The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and requires to know mult and shift factors for timestamp conversion from raw value to nanoseconds (ptp clock). Now these mult and shift factors are calculated manually and provided through DT, which makes very hard to support of a lot number of platforms, especially if CPTS refclk is not the same for some kind of boards and depends on efuse settings (Keystone 2 platforms). Hence, export clocks_calc_mult_shift() to allow drivers like CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of mult and shift factors. Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:48 -05:00
Grygorii Strashko	4a88fb9565	net: ethernet: ti: cpts: move dt props parsing to cpts driver Move DT properties parsing into CPTS driver to simplify CPSW code and CPTS driver porting on other SoC in the future (like Keystone 2) - with this change it will not be required to add the same DT parsing code in Keystone 2 NETCP driver. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:48 -05:00
Grygorii Strashko	8a2c9a5ab4	net: ethernet: ti: cpts: rework initialization/deinitialization The current implementation CPTS initialization and deinitialization (represented by cpts_register/unregister()) does too many static initialization from .ndo_open(), which is reasonable to do once at probe time instead, and also require caller to allocate memory for struct cpts, which is internal for CPTS driver in general. This patch splits CPTS initialization and deinitialization on two parts: - static initializtion cpts_create()/cpts_release() which expected to be executed when parent driver is probed/removed; - dynamic part cpts_register/unregister() which expected to be executed when network device is opened/closed. As result, current code of CPTS parent driver - CPSW - will be simplified (and it also will allow simplify adding support for Keystone 2 devices in the future), plus more initialization errors will be catched earlier. In addition, this change allows to clean up cpts.h for the case when CPTS is disabled. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:47 -05:00
Grygorii Strashko	2a79df3ee9	net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs CPTS module and IRQs are always enabled when CPTS is registered, before starting overflow check work, and disabled during deregistration, when overflow check work has been canceled already. So, It doesn't require to (re)enable CPTS module and IRQs in cpts_overflow_check(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:47 -05:00
WingMan Kwok	e4439fa838	net: ethernet: ti: cpts: clean up event list if event pool is empty When a CPTS user does not exit gracefully by disabling cpts timestamping and leaving a joined multicast group, the system continues to receive and timestamps the ptp packets which eventually occupy all the event list entries. When this happns, the added code tries to remove some list entries which are expired. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:47 -05:00
Grygorii Strashko	8fcd68914e	net: ethernet: ti: cpts: disable cpts when unregistered The cpts now is left enabled after unregistration. Hence, disable it in cpts_unregister(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:47 -05:00
Grygorii Strashko	6c691405bc	net: ethernet: ti: cpts: fix registration order The ptp clock registered before spinlock, which is protecting it, and before timecounter and cyclecounter initialization in cpts_register(). So, ensure that ptp clock is registered the last, after everything else is done. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:46 -05:00
Grygorii Strashko	fd123a9414	net: ethernet: ti: cpts: fix unbalanced clk api usage in cpts_register/unregister There are two issues with TI CPTS code which are reproducible when TI CPSW ethX device passes few up/down iterations: - cpts refclk prepare counter continuously incremented after each up/down iteration; - devm_clk_get(dev, "cpts") is called many times. Hence, fix these issues by using clk_disable_unprepare() in cpts_clk_release() and skipping devm_clk_get() if cpts refclk has been acquired already. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:46 -05:00
Grygorii Strashko	b63ba58ee9	net: ethernet: ti: cpsw: minimize direct access to struct cpts This will provide more flexibility in changing CPTS internals and also required for further changes. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:46 -05:00
Grygorii Strashko	c8395d4e1d	net: ethernet: ti: allow cpts to be built separately TI CPTS IP is used as part of TI OMAP CPSW driver, but it's also present as part of NETCP on TI Keystone 2 SoCs. So, It's required to enable build of CPTS for both this drivers and this can be achieved by allowing CPTS to be built separately. Hence, allow cpts to be built separately and convert it to be a module as both CPSW and NETCP drives can be built as modules. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:46 -05:00
Grygorii Strashko	391fd6caf5	net: ethernet: ti: cpts: switch to readl/writel_relaxed() Switch to readl/writel_relaxed() APIs, because this is recommended API and the CPTS IP is reused on Keystone 2 SoCs where LE/BE modes are supported. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 11:13:45 -05:00
David S. Miller	d3243aef98	Merge branch 'bnxt_en-RDMA' Michael Chan says: ==================== bnxt_en: Add interface to support RDMA driver. This series adds an interface to support a brand new RDMA driver bnxt_re. The first step is to re-arrange some code so that pci_enable_msix() can be called during pci probe. The purpose is to allow the RDMA driver to initialize and stay initialized whether the netdev is up or down. Then we make some changes to VF resource allocation so that there is enough resources to support RDMA. Finally the last patch adds a simple interface to allow the RDMA driver to probe and register itself with any bnxt_en devices that support RDMA. Once registered, the RDMA driver can request MSIX, send fw messages, and receive some notifications. v2: Fixed kbuild test robot warnings. David, please consider this series for net-next. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:27 -05:00
Michael Chan	a588e4580a	bnxt_en: Add interface to support RDMA driver. Since the network driver and RDMA driver operate on the same PCI function, we need to create an interface to allow the RDMA driver to share resources with the network driver. 1. Create a new bnxt_en_dev struct which will be returned by bnxt_ulp_probe() upon success. After that, all calls from the RDMA driver to bnxt_en will pass a pointer to this struct. 2. This struct contains additional function pointers to register, request msix, send fw messages, register for async events. 3. If the RDMA driver wants to enable RDMA on the function, it needs to call the function pointer bnxt_register_device(). A ulp_ops structure is passed for RCU protected upcalls from bnxt_en to the RDMA driver. 4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg() function pointer. 5. 1 stats context is reserved when the RDMA driver registers. MSIX and completion rings are reserved when the RDMA driver calls bnxt_request_msix() function pointer. 6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources will be cleaned up. v2: Fixed 2 uninitialized variable warnings. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:26 -05:00
Michael Chan	a1653b13f1	bnxt_en: Refactor the driver registration function with firmware. The driver register function with firmware consists of passing version information and registering for async events. To support the RDMA driver, the async events that we need to register may change. Separate the driver register function into 2 parts so that we can just update the async events for the RDMA driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:26 -05:00
Michael Chan	e4060d306b	bnxt_en: Reserve RDMA resources by default. If the device supports RDMA, we'll setup network default rings so that there are enough minimum resources for RDMA, if possible. However, the user can still increase network rings to the max if he wants. The actual RDMA resources won't be reserved until the RDMA driver registers. v2: Fix compile warning when BNXT_CONFIG_SRIOV is not set. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:26 -05:00
Michael Chan	7b08f661ab	bnxt_en: Improve completion ring allocation for VFs. All available remaining completion rings not used by the PF should be made available for the VFs so that there are enough rings in the VF to support RDMA. The earlier workaround code of capping the rings by the statistics context is removed. When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources() to restore FW resources. Later on we need to add some logic to account for RDMA resources. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:26 -05:00
Michael Chan	aa8ed021ab	bnxt_en: Move function reset to bnxt_init_one(). Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by the RDMA driver before the network device is opened. So we cannot do function reset in bnxt_open() which will clear all the resources. The proper place to do function reset now is in bnxt_init_one(). If we get AER, we'll do function reset as well. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:25 -05:00
Michael Chan	7809592d3e	bnxt_en: Enable MSIX early in bnxt_init_one(). To better support the new RDMA driver, we need to move pci_enable_msix() from bnxt_open() to bnxt_init_one(). This way, MSIX vectors are available to the RDMA driver whether the network device is up or down. Part of the existing bnxt_setup_int_mode() function is now refactored into a new bnxt_init_int_mode(). bnxt_init_int_mode() is called during bnxt_init_one() to enable MSIX. The remaining logic in bnxt_setup_int_mode() to map the IRQs to the completion rings is called during bnxt_open(). v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:25 -05:00
Michael Chan	33c2657eb6	bnxt_en: Add bnxt_set_max_func_irqs(). By refactoring existing code into this new function. The new function will be used in subsequent patches. v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:59:25 -05:00
Eric Dumazet	5b8e2f61b9	net: sock_rps_record_flow() is for connected sockets Paolo noticed a cache line miss in UDP recvmsg() to access sk_rxhash, sharing a cache line with sk_drops. sk_drops might be heavily incremented by cpus handling a flood targeting this socket. We might place sk_drops on a separate cache line, but lets try to avoid wasting 64 bytes per socket just for this, since we have other bottlenecks to take care of. sock_rps_record_flow() should only access sk_rxhash for connected flows. Testing sk_state for TCP_ESTABLISHED covers most of the cases for connected sockets, for a zero cost, since system calls using sock_rps_record_flow() also access sk->sk_prot which is on the same cache line. A follow up patch will provide a static_key (Jump Label) since most hosts do not even use RFS. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 10:47:35 -05:00
Pablo Neira Ayuso	73c25fb139	netfilter: nft_quota: allow to restore consumed quota Allow to restore consumed quota, this is useful to restore the quota state across reboots. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 14:40:53 +01:00
Willem de Bruijn	2c16d60332	netfilter: xt_bpf: support ebpf Add support for attaching an eBPF object by file descriptor. The iptables binary can be called with a path to an elf object or a pinned bpf object. Also pass the mode and path to the kernel to be able to return it later for iptables dump and save. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:32:35 +01:00
Marcelo Ricardo Leitner	5bad87348c	netfilter: x_tables: avoid warn and OOM killer on vmalloc call Andrey Konovalov reported that this vmalloc call is based on an userspace request and that it's spewing traces, which may flood the logs and cause DoS if abused. Florian Westphal also mentioned that this call should not trigger OOM killer. This patch brings the vmalloc call in sync to kmalloc and disables the warn trace on allocation failure and also disable OOM killer invocation. Note, however, that under such stress situation, other places may trigger OOM killer invocation. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:31:41 +01:00
Pablo Neira Ayuso	8411b6442e	netfilter: nf_tables: support for set flushing This patch adds support for set flushing, that consists of walking over the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set. This patch requires the following changes: 1) Add set->ops->deactivate_one() operation: This allows us to deactivate an element from the set element walk path, given we can skip the lookup that happens in ->deactivate(). 2) Add a new nft_trans_alloc_gfp() function since we need to allocate transactions using GFP_ATOMIC given the set walk path happens with held rcu_read_lock. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:31:40 +01:00
Pablo Neira Ayuso	37df5301a3	netfilter: nft_set: introduce nft_{hash, rbtree}_deactivate_one() This new function allows us to deactivate one single element, this is required by the set flush command that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:31:02 +01:00
Pablo Neira Ayuso	1a37ef769d	netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc() Context is not modified by nft_trans_alloc(), so constify it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:51 +01:00
Davide Caratti	3189a290f9	netfilter: nat: skip checksum on offload SCTP packets SCTP GSO and hardware can do CRC32c computation after netfilter processing, so we can avoid calling sctp_compute_checksum() on skb if skb->ip_summed is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE when the NAT code computes the CRC, to prevent offloaders from computing it again (on ixgbe this resulted in a transmission with wrong L4 checksum). Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:50 +01:00
Liping Zhang	3b760dcb0f	netfilter: rpfilter: bypass ipv4 lbcast packets with zeronet source Otherwise, DHCP Discover packets(0.0.0.0->255.255.255.255) may be dropped incorrectly. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:50 +01:00
Pablo Neira Ayuso	a9fea2a3c3	netfilter: nf_tables: allow to filter stateful object dumps by type This patch adds the netlink code to filter out dump of stateful objects, through the NFTA_OBJ_TYPE netlink attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:49 +01:00
Pablo Neira Ayuso	63aea29060	netfilter: nft_objref: support for stateful object maps This patch allows us to refer to stateful object dictionaries, the source register indicates the key data to be used to look up for the corresponding state object. We can refer to these maps through names or, alternatively, the map transaction id. This allows us to refer to both anonymous and named maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:48 +01:00
Pablo Neira Ayuso	8aeff920dc	netfilter: nf_tables: add stateful object reference to set elements This patch allows you to refer to stateful objects from set elements. This provides the infrastructure to create maps where the right hand side of the mapping is a stateful object. This allows us to build dictionaries of stateful objects, that you can use to perform fast lookups using any arbitrary key combination. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:47 +01:00
Pablo Neira Ayuso	1896531710	netfilter: nft_quota: add depleted flag for objects Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag indicates we have reached overquota. Add pointer to table from nft_object, so we can use it when sending the depletion notification to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 13:22:12 +01:00
Pablo Neira Ayuso	2599e98934	netfilter: nf_tables: notify internal updates of stateful objects Introduce nf_tables_obj_notify() to notify internal state changes in stateful objects. This is used by the quota object to report depletion in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-12-07 12:57:20 +01:00

1 2 3 4 5 ...

637048 commits