We encountered a situation where concurrent invocations of the docker daemon on a machine with an older version of iptables led to nondeterministic errors related to simultaneous invocations of iptables.
While this is best resolved by upgrading iptables itself, the particular situation would have been avoided if the docker daemon simply took care not to concurrently invoke iptables. Of course, external processes could also cause iptables to fail in this way, but invoking docker in parallel seems like a pretty common case.
Signed-off-by: Aaron Davidson <aaron@databricks.com>
When firewalld (or iptables service) restarts/reloads,
all previously added docker firewall rules are flushed.
With firewalld we can react to its Reloaded() [1]
D-Bus signal and recreate the firewall rules.
Also when firewalld gets restarted (stopped & started)
we can catch the NameOwnerChanged signal [2].
To specify which signals we want to react to we use AddMatch [3].
Libvirt has been doing this for quite a long time now.
Docker changes firewall rules in basically 3 places.
1) daemon/networkdriver/portmapper/mapper.go - port mappings
Portmapper fortunately keeps a list of mapped ports,
so we can easily recreate firewall rules on firewalld restart/reload.
The new ReMapAll() function does that.
2) daemon/networkdriver/bridge/driver.go
When setting a bridge, basic firewall rules are created.
This is done once during start; it's parametrized and tracked
nowhere, so how can one know what to set again, and how, when
there's been a firewalld restart/reload?
The only solution that came to my mind is to use closures [4],
i.e. I keep a list of references to closures (anonymous functions
together with a referencing environment) and when there's a firewalld
restart/reload I re-call them in the same order.
3) links/links.go - linking containers
Link is added in Enable() and removed in Disable().
In Enable() we add a callback function, which creates the link,
that's OK so far.
It'd be ideal if we could remove the same function from
the list in Disable(). Unfortunately that's not possible AFAICT,
because we don't know the reference to that function
at that moment, so we can only add a reference to function,
which removes the link. That means that after creating and
removing a link there are 2 functions in the list,
one adding and one removing the link and after
firewalld restart/reload both are called.
It works, but it's far from ideal.
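The closure list described in 2) and 3) can be sketched roughly as follows. This is an illustration only; OnReloaded, replay and reloadCallbacks are names assumed for the sketch, and the D-Bus signal handling that would trigger replay() is left out.

```go
package main

import "sync"

var (
	mu sync.Mutex
	// reloadCallbacks is a hypothetical registry of rule-creating closures,
	// kept in the order in which they were registered.
	reloadCallbacks []func()
)

// OnReloaded registers fn to be re-run after a firewalld restart/reload.
func OnReloaded(fn func()) {
	mu.Lock()
	defer mu.Unlock()
	reloadCallbacks = append(reloadCallbacks, fn)
}

// replay calls every registered closure in registration order. It copies
// the list first so a closure may itself call OnReloaded without deadlock.
func replay() {
	mu.Lock()
	cbs := append([]func(){}, reloadCallbacks...)
	mu.Unlock()
	for _, fn := range cbs {
		fn()
	}
}
```

Because closures cannot be compared or looked up later, Disable() can only append a second closure that removes the link; after a reload both the adding and the removing closure run, which is the non-ideal behaviour noted above.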
[1] https://jpopelka.fedorapeople.org/firewalld/doc/firewalld.dbus.html#FirewallD1.Signals.Reloaded
[2] http://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-name-owner-changed
[3] http://dbus.freedesktop.org/doc/dbus-specification.html#message-bus-routing-match-rules
[4] https://en.wikipedia.org/wiki/Closure_%28computer_programming%29
Signed-off-by: Jiri Popelka <jpopelka@redhat.com>
Firewalld [1] is a firewall-managing daemon with a D-Bus interface.
What sort of problem are we trying to solve with this?
Firewalld internally also executes iptables/ip6tables to change firewall settings.
It might happen on systems where both docker and firewalld are running
concurrently, that both of them try to call iptables at the same time.
The result is that the second one fails because the first one is holding an xtables lock.
One workaround is to use --wait/-w option in both
docker & firewalld when calling iptables.
It's already been done in both upstreams:
b315c380f4b3b451d6f8
But it'd still be better if docker used firewalld when it's running.
Another problem that firewalld support would solve is that
iptables/firewalld service's restart flushes all firewall rules
previously added by docker.
See next patch for possible solution.
This patch utilizes firewalld's D-Bus interface.
If firewalld is running, we call the direct.passthrough() [2] method instead
of executing iptables directly.
direct.passthrough() takes the same arguments as the iptables tool itself
and passes them through to the iptables tool.
It might be better to use other methods, like direct.addChain and
direct.addRule [3] so it'd be more integrated with firewalld, but
that'd make the patch much bigger.
If firewalld is not running, everything works as before.
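The dispatch described above could look roughly like this. It is a sketch with the D-Bus plumbing stubbed out; firewalldRunning, passthrough, execIptables and raw are hypothetical names, and a real passthrough would call firewalld's direct.passthrough() over D-Bus.

```go
package main

import "os/exec"

var (
	// firewalldRunning is a stand-in for a check that firewalld owns its
	// D-Bus name. Assumption: the real check is done over D-Bus.
	firewalldRunning = func() bool { return false }

	// passthrough is a stand-in for the direct.passthrough() D-Bus call,
	// which takes the same arguments as the iptables tool.
	passthrough = func(args ...string) ([]byte, error) { return nil, nil }

	// execIptables runs the iptables binary directly, as before.
	execIptables = func(args ...string) ([]byte, error) {
		return exec.Command("iptables", args...).CombinedOutput()
	}
)

// raw sends the arguments through firewalld when it is running and
// otherwise executes iptables directly.
func raw(args ...string) ([]byte, error) {
	if firewalldRunning() {
		return passthrough(args...)
	}
	return execIptables(args...)
}
```

Since both paths accept the same argument list, existing callers would not need to change.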
[1] http://www.firewalld.org/
[2] https://jpopelka.fedorapeople.org/firewalld/doc/firewalld.dbus.html#FirewallD1.direct.Methods.passthrough
[3] https://jpopelka.fedorapeople.org/firewalld/doc/firewalld.dbus.html#FirewallD1.direct.Methods.addChain
    https://jpopelka.fedorapeople.org/firewalld/doc/firewalld.dbus.html#FirewallD1.direct.Methods.addRule
Signed-off-by: Jiri Popelka <jpopelka@redhat.com>
This modifies iptables.Exists so that it must be called with an explicit
table and chain. This allows us (a) to generate an appropriate command
line for "iptables -C", which was not previously possible, and (b) it
allows us to limit our strings.Contains() search to just the table and
chain in question, preventing erroneous matches against unrelated rules.
Resolves #10781
Signed-off-by: Lars Kellogg-Stedman <lars@redhat.com>
Because the iptables package is `init`ed at the start of the docker
runtime, the iptables --wait command listing all rules is run
even if the command is simply "docker -h". It makes
more sense to both locate the iptables command and check for the
wait flag support at the time iptables is actually used, as it
may not be used at all if certain network support is off/configured
differently.
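Deferring the lookup to first use could be done with sync.Once, roughly as below. This is a sketch under assumptions: initCheck, lookPath and probeWait are hypothetical names, and the probe assumes that a listing run with --wait succeeds only when the flag is supported.

```go
package main

import (
	"errors"
	"os/exec"
	"sync"
)

var (
	initOnce     sync.Once
	iptablesPath string
	supportsWait bool

	// lookPath and probeWait are variables so they can be stubbed in tests.
	lookPath  = exec.LookPath
	probeWait = func(path string) bool {
		// Assumption: this listing fails when --wait is not supported.
		return exec.Command(path, "--wait", "-L", "-n").Run() == nil
	}
)

// initCheck locates iptables and probes for --wait on the first call only.
// Commands that never touch the network code never pay for the probe.
func initCheck() error {
	initOnce.Do(func() {
		path, err := lookPath("iptables")
		if err != nil {
			return
		}
		iptablesPath = path
		supportsWait = probeWait(path)
	})
	if iptablesPath == "" {
		return errors.New("iptables not found")
	}
	return nil
}
```

Every function that actually runs iptables would call initCheck() first, so something like "docker -h" never triggers the listing.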
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This re-applies commit b39d02b with additional iptables rules to solve the issue with containers routing back into themselves.
The previous issue with this attempt was that the DNAT rule would send traffic back into the container it came from. When this happens you have 2 issues.
1) reverse path filtering. The container is going to see the traffic coming in from the outside and it's going to have a source address of itself. So reverse path filtering will kick in and drop the packet.
2) direct return mismatch. Assuming you turned reverse path filtering off, when the packet comes back in, it's going to have a source address of itself, thus when the reply traffic is sent, it's going to have a source address of itself. But the original packet was sent to the host IP address, so the traffic will be dropped because it's coming from an address which the original traffic was not sent to (and likely with an incorrect port as well).
The solution to this is to masquerade the traffic when it gets routed back into the origin container. However for this to work you need to enable hairpin mode on the bridge port, otherwise the kernel will just drop the traffic.
The hairpin mode set is part of libcontainer, while the MASQ change is part of docker.
This reverts commit 63c303eecdbaf4dc7967fd51b82cd447c778cecc.
Docker-DCO-1.1-Signed-off-by: Patrick Hemmer <patrick.hemmer@gmail.com> (github: phemmer)
If iptables version is < 1.4.11, try to delete the rule vs. checking if it exists. Fixes #6831.
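The version gate might be sketched as follows. This is an illustration, not the actual fix; parseVersion and supportsCheck are hypothetical names, and the input is assumed to look like the output of "iptables --version", e.g. "iptables v1.4.21".

```go
package main

import (
	"strconv"
	"strings"
)

// parseVersion extracts major, minor and micro from output such as
// "iptables v1.4.21". It returns false if no three-part version is found.
func parseVersion(out string) ([3]int, bool) {
	var v [3]int
	for _, f := range strings.Fields(out) {
		if !strings.HasPrefix(f, "v") {
			continue
		}
		parts := strings.Split(strings.TrimPrefix(f, "v"), ".")
		if len(parts) != 3 {
			return v, false
		}
		for i, p := range parts {
			n, err := strconv.Atoi(p)
			if err != nil {
				return v, false
			}
			v[i] = n
		}
		return v, true
	}
	return v, false
}

// supportsCheck reports whether v is at least 1.4.11, the threshold named
// in the commit message above.
func supportsCheck(v [3]int) bool {
	minimum := [3]int{1, 4, 11}
	for i := range v {
		if v[i] != minimum[i] {
			return v[i] > minimum[i]
		}
	}
	return true
}
```

When supportsCheck returns false, the caller would fall back to attempting the delete instead of checking whether the rule exists.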
Docker-DCO-1.1-Signed-off-by: Jessica Frazelle <jfrazelle@users.noreply.github.com> (github: jfrazelle)
This reverts commit b39d02b611f1cc0af283f417b73bf0d36f26277a.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Hairpin NAT is currently done by passing through the docker server. If
two containers on the same box try to access each other through exposed
ports and using the host IP the current iptables rules will not match the
DNAT and thus the traffic goes to 'docker -d'.
This change drops the restriction that DNAT traffic must not originate
from docker0. It should be safe to drop this restriction because the
DOCKER chain is already gated by jumps that check for the destination
address to be a local address.
Docker-DCO-1.1-Signed-off-by: Darren Shepherd <darren.s.shepherd@gmail.com> (github: ibuildthecloud)
Allow publicly mapped ports to be made public beyond the host. This is
needed for distros like Fedora and RHEL which have a reject-all rule at
the end of their FORWARD chain.
Docker-DCO-1.1-Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> (github: jpoimboe)