tech-net: Re: Fw: Re: tcp connections lost on interface down

Subject: Re: Fw: Re: tcp connections lost on interface down
To: Michael van Elst <mlelstv@serpens.de>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-net
Date: 08/17/2003 18:18:42
    Date:        Sun, 17 Aug 2003 10:28:22 +0000 (UTC)
    From:        mlelstv@serpens.de (Michael van Elst)
    Message-ID:  <bhnlc6$hs6$1@serpens.de>

  | it would also affect connections where the peer is sending.

It would, yes, but the local system doesn't know that, or at least not
until after the address is reinstated.

  | The peer will notice that your IP-address changed

How?   There's no mechanism for this that I'm aware of.

  | and break his side of the connection.

If it is sending and gets no responses, then yes, it will, eventually,
regardless of whether or not our address changed.

  | But you will not notice anything but wait forever and in case of
  | simple listen-accept-handleio-close daemons you won't wait for new
  | connections to come in, preventing the peer from reestablishing the
  | connection.

So, you're suggesting that the case where our address changed should be
handled completely differently from the case where the network died for
a while?   In both cases we're idle, and we either ignore the event or
never notice it, and so have no option other than to ignore it.   The
other end attempts to send, fails, and abandons the connection (just the
same, either way).

In the one case where the problem happened to be that our address was
removed, we could have detected that, and ended things.   But in the
other case our node still has no idea that anything changed.   Surely your
simple daemon needs to be able to recover from that second case (net
outage for some time so peer has given up), just as much as the first
one (lost addr)?   If it has the mechanism to do that, won't that
mechanism work for the first case just as well?
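
A minimal sketch of one such mechanism, assuming the daemon is happy to
apply its own application-level idle limit to each connection.  The name
handle_one and the idle_timeout parameter are hypothetical, chosen only
for illustration.  The point is that the same timeout returns the daemon
to accept() whether the peer gave up because of a network outage or
because our address went away:

```python
import socket

def handle_one(listener, idle_timeout):
    """Accept one connection, echo what arrives, give up on a silent peer.

    Returns "closed" if the peer closed the connection, "timeout" if
    nothing arrived within idle_timeout seconds, "error" on a socket error.
    """
    conn, _peer = listener.accept()
    with conn:
        # Application-level idle limit (an assumption of this sketch).
        conn.settimeout(idle_timeout)
        try:
            while True:
                data = conn.recv(4096)
                if not data:
                    return "closed"
                conn.sendall(data)
        except socket.timeout:
            # The peer has been silent too long, for whatever reason.
            return "timeout"
        except OSError:
            return "error"
```

A daemon would call handle_one() in a loop, so that after any of the
three outcomes it is listening for new connections again.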

  | Please just drop the suspend mode from your example.

Easy, it makes no difference (just seems a more obvious case for people
to see as desirable).

  | You pop the wireless
  | card from your laptop and wait 3 or 4 days until you re-insert the card.
  | Would you expect the connection to stay if not only the IP address is
  | gone but even the interface itself ?

Yes.  Definitely.   [As Manuel says in a slightly later message than yours,
NetBSD seems to have some "issues" with some of this at the minute, but
eventually someone will get annoyed enough, and fix those.]

  | What happens when the peer is
  | on the same network and you just dropped all routes to it by killing
  | the interface ? The connection would then be dropped rapidly.

No, it wouldn't.  Or shouldn't.   The assumption was an idle connection,
remember?   Nothing will be sent, so no-one will ever notice that the
route is missing.   Once the interface is back (once an interface is back)
and we have the address again, the connection ought to go back to a
working state.    TCP connections are supposed to be able to handle all
kinds of outages - that's one of the design goals (and why having
keepalive on by default was not a good idea).
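
To illustrate keepalive as something an application opts into rather
than a default, here is a small sketch using the standard SO_KEEPALIVE
socket option.  The helper names are hypothetical, and the probe timing
is left to whatever the system is configured with:

```python
import socket

def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is currently set on this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

def set_keepalive(sock, enabled):
    """Turn keepalive probes on or off for this one socket only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE,
                    1 if enabled else 0)
```

An application that wants long idle connections to survive outages
leaves the option off; one that would rather find out about a dead peer
turns it on for its own sockets.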

  | I can understand the need to handle 'delete-then-add' scenarios gracefully,
  | but treating the local computer as more volatile than the network in between,
  | so that the _network code_ has to survive even hardware reconfigurations
  | is a bit far fetched.

No, not at all.   This would be like claiming that filesystems shouldn't
be able to survive hardware reconfigurations.   It is true that many of
them can't - but things like RAID can.  All this is the reason for using
layering (filesystems & drives, network stacks & interfaces) - so
that some degree of insulation is possible.    Making it work perfectly
in all cases isn't easy - but that's no reason to abandon cases where
it does work (even if just by chance).

  | If you want to survive hardware reconfigurations then you can use an
  | abstraction, e.g. a tunnel interface.

Sometimes that helps, sometimes not.   If we want to be able to keep
connections running through address changes, then clearly we need some
different connection identifier - but as connection identifiers are
addresses, and addresses are always assigned to interfaces, using some
new interface, which can hold non-routing-related addresses, helps.
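
As a small illustration of "connection identifiers are addresses": a TCP
connection is named by its two endpoint addresses and ports, which can be
read back from a connected socket.  The helper name connection_id is
hypothetical, and the sketch assumes an AF_INET or AF_INET6 socket:

```python
import socket

def connection_id(sock):
    """Return (local_addr, local_port, remote_addr, remote_port)."""
    local_addr, local_port = sock.getsockname()[:2]
    remote_addr, remote_port = sock.getpeername()[:2]
    return (local_addr, local_port, remote_addr, remote_port)
```

Each end sees the same four values with local and remote swapped, which
is why changing the local address leaves nothing to match incoming
segments against.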

But this only works if the other end (or some helpful intermediary that
has an address that won't change) is willing to use the tunnel.   You
cannot simply create a tunnel interface, assign it an address, and use it
to contact random other systems around the internet; the tunneled packets
just don't "look right".

kre