Subject: Re: Fw: Re: tcp connections lost on interface down
To: None <tech-net@netbsd.org>
From: Michael van Elst <mlelstv@serpens.de>
List: tech-net
Date: 08/17/2003 13:26:06
kre@munnari.OZ.AU (Robert Elz) writes:

>  | The peer will notice that your IP-address changed
>How?   There's no mechanism for this that I'm aware of.

The remote end is trying to send, and its write will time out with an
error. It won't know that our IP address changed; it just knows that
its peer is gone.
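To make the writer's side concrete: a sending application only learns that its peer is unreachable from the error its send eventually returns, and that error says nothing about *why*. A minimal Python sketch, assuming a BSD-style socket API where a retransmission timeout surfaces as ETIMEDOUT (or an ICMP-triggered EHOSTUNREACH/ENETUNREACH). The helper names `peer_gone` and `send_all` and the exact errno set are illustrative assumptions, not taken from any kernel source:

```python
import errno

# Assumed, illustrative set of errno values a sender may see once TCP gives up.
# None of them tells the sender that the peer's address changed.
PEER_GONE_ERRNOS = frozenset({
    errno.ETIMEDOUT,     # retransmission timer expired
    errno.EHOSTUNREACH,  # ICMP host unreachable reported during retransmission
    errno.ENETUNREACH,   # ICMP network unreachable
    errno.ECONNRESET,    # peer answered with a RST (e.g. after a reboot)
})


def peer_gone(exc: OSError) -> bool:
    """Return True if a failed send means the connection is unusable."""
    return exc.errno in PEER_GONE_ERRNOS


def send_all(sock, data: bytes) -> bool:
    """Hypothetical helper: send data, return False if the peer is gone."""
    try:
        sock.sendall(data)
    except OSError as exc:
        if peer_gone(exc):
            return False
        raise  # any other error is not about the peer disappearing
    return True
```

The sender can only conclude "the peer is gone"; distinguishing a changed address from a dead network or host is impossible from the errno alone.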

>So, you're suggesting that the case where what happened is that our
>address changed should be handled completely differently than the case
>where the network died for a while

Yes.

>- in both cases, we're idle, and
>either ignore it, or never notice and so have no option other than to
>ignore it.   The other end is attempting to send, fails, and abandons
>the connection (just the same, either way).

In the case of a reconfigured system, we know exactly what happened.
Why ignore it?

Why doesn't a writer ignore the timeouts? This could just be a
temporary disruption of the network, and days or years later the
session could continue. Instead, it handles temporary disruptions
gracefully (by using a timeout) but eventually fails when it
"assumes" that no recovery is possible.

The difference is that, in case of a disrupted network, a writer
_does_ see the error while a reader cannot distinguish this case
from an idle sender. "Ignoring" the problem is simply a consequence
of not seeing the problem. You have to change the protocol (i.e.
enable keepalives) to see the problem.
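As a concrete illustration of "enable keepalives": on BSD-style socket APIs the reader opts in per socket with the standard SO_KEEPALIVE option. A minimal Python sketch; the helper names are hypothetical, and probe timing is left to the system defaults, since the tuning knobs differ between systems:

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to probe an idle connection so a silent peer is detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(sock: socket.socket) -> bool:
    # Some systems may return the option's flag bit rather than 1,
    # so test for non-zero instead of comparing with 1.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(s)
        print("keepalive on:", keepalive_enabled(s))
    finally:
        s.close()
```

With keepalive on, an idle reader whose peer has vanished eventually gets an error instead of waiting forever; without it, the reader has no way to tell a dead peer from a quiet one.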

Again, this is different from a local administrative decision to
reconfigure the system. The information is there, it can be
delivered immediately, and we don't even need timeouts to detect
the issue.
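The claim that a local reconfiguration needs no timeout can be sketched as a hook on the stack's connection table: when an address is deleted, every connection bound to it is flagged right away. This is a hypothetical Python model, not NetBSD code; `Connection`, `ConnectionTable`, `on_address_removed`, and the choice of EADDRNOTAVAIL as the posted error are all assumptions for illustration:

```python
import errno
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    local_addr: str
    remote_addr: str
    error: int = 0  # pending socket error; 0 means none


@dataclass
class ConnectionTable:
    conns: List[Connection] = field(default_factory=list)

    def on_address_removed(self, addr: str) -> List[Connection]:
        """Hypothetical hook run when an administrator deletes a local address.

        No timer is involved: the stack already knows which connections
        used the address, so it can post an error to them immediately.
        """
        affected = [c for c in self.conns if c.local_addr == addr]
        for c in affected:
            c.error = errno.EADDRNOTAVAIL  # assumed error code
        return affected


table = ConnectionTable([
    Connection("192.0.2.10", "198.51.100.1"),
    Connection("203.0.113.5", "198.51.100.2"),
])
hit = table.on_address_removed("192.0.2.10")
print(len(hit), [c.error != 0 for c in table.conns])  # prints: 1 [True, False]
```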

>  | You pop the wireless
>  | card from your laptop and wait 3 or 4 days until you re-insert the card.
>  | Would you expect the connection to stay if not only the IP address is
>  | gone but even the interface itself ?

>Yes.  Definitely.   [As Manuel says in a slightly later message than yours,
>NetBSD seems to have some "issues" with some of this at the minute, but
>eventually someone will get annoyed enough, and fix those.]

Well, I wouldn't. Instead I would expect to be told that the network
connection is no longer usable.

>  | What happens when the peer is
>  | on the same network and you just dropped all routes to it by killing
>  | the interface? The connection would then be dropped rapidly.

>No it wouldn't.  Or shouldn't.

Yet it does.

>  The assumption was an idle connection
>remember?   Nothing will be being sent, no-one will ever notice that the
>route is missing.   Once the interface is back (once an interface is back)
>and we have the address again, the connection ought to go back to working
>state.    TCP connections are supposed to be able to handle all kinds of
>outages - that's one of the design goals (and why having keepalive on
>by default was not a good idea).

TCP should handle all kinds of outages of the network. It surely shouldn't
survive the outage of the local host. The outage of the "local interface"
is something in between.


>  | I can understand the need to handle 'delete-then-add' scenarios gracefully,
>  | but treating the local computer as more volatile than the network in between,
>  | so that the _network code_ has to survive even hardware reconfigurations,
>  | is a bit far-fetched.

>No, not at all.   This would be like claiming that filesystems shouldn't
>be able to survive hardware reconfigurations.

Filesystems do not survive hardware reconfigurations.


> It is true that many of
>them can't - but things like raid can.

This is a correct point, but I believe it supports my side. The
filesystem doesn't know about the reconfiguration because it is
separated from the underlying disk subsystem. The RAID driver
virtualizes the disk and hides the reconfiguration.

To complete the analogy, you'd need some kind of virtual network that
hides the reconfiguration of the underlying real network interfaces,
which means something like a tunnel.

I know that current tunnel implementations are not complete enough to
solve all the problems that arise when virtualizing network connections.


>But this only works if the other end (or some helpful intermediary that
>has an address that won't change) is willing to use the tunnel, you cannot
>simply create a tunnel interface, assign it an address, and use it to
>contact random other systems around the internet, the tunneled packets
>just don't "look right".

That's why the RAID driver and the disk driver talk through a common
interface.

The same could be done for network interfaces, i.e. create a virtual
(RAID-like) interface that talks to the network peers but uses some
hidden mechanism to drive a real interface. It would provide either
redundancy (e.g. fail-over) or error recovery (i.e. hide the
drop/re-insert events of your wifi card, and also hide the errors
that the hardware reconfiguration would otherwise cause for TCP
connections).
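A fail-over variant of such a virtual interface can be sketched briefly. This is a hypothetical Python model only (`RealInterface`, `FailoverInterface`, and the member names are invented for illustration): the virtual interface keeps a stable identity for the upper layers while choosing the first member that is up, and, like a degraded array with no disks left, it still reports an error (ENETDOWN here, an assumed choice) once no member remains:

```python
import errno
from dataclasses import dataclass
from typing import List


@dataclass
class RealInterface:
    name: str
    up: bool = True


class FailoverInterface:
    """Hypothetical RAID-1-like interface: one identity, several real members."""

    def __init__(self, name: str, members: List[RealInterface]):
        self.name = name
        self.members = list(members)

    def active(self) -> RealInterface:
        # Use the first member that is still up; upper layers only see self.name.
        for member in self.members:
            if member.up:
                return member
        # With every member gone, the failure is reported rather than hidden.
        raise OSError(errno.ENETDOWN, f"{self.name}: no member interface is up")


wifi, wired = RealInterface("wifi0"), RealInterface("wired0")
vif = FailoverInterface("vif0", [wifi, wired])
print(vif.active().name)  # prints: wifi0
wifi.up = False           # the wireless card is popped out
print(vif.active().name)  # prints: wired0
```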

Hiding the errors at the lower level is the wrong idea, just as
hiding disk errors in the disk driver is the wrong idea. If you talk
to an individual disk and it is removed from a disk array, you do
get an I/O error.


-- 
                                Michael van Elst
Internet: mlelstv@serpens.de
                                "A potential Snark may lurk in every tree."