Subject: Problems with outgoing routing of UDP packets
To: None <tech-net@netbsd.org>
From: Tom Ivar Helbekkmo <tih@eunetnorge.no>
List: tech-net
Date: 04/21/2004 10:21:37
I am trying to get my laptop, an IBM X31 with ethernet and wireless,
to work properly in an automated roaming mode, to get seamless
transfers between wired and wireless network access, at work and at
home. My laptop runs -current, as of April 20th. To handle the
address and routing changes, I use dhclient, coupled with ifwatchd and
apmd scripts to remove the default route and trigger dhclient
renegotiation whenever the carrier status on the ethernet changes, or
the laptop resumes from a suspended state. For most of what I need,
this now works well. The exception is UDP, as used by the Coda file
system client software, venus. Venus must not be restarted when the
address and routing table changes occur.
I'm seeing a problem with the transmission of UDP packets, where the
expected failover from an interface that is no longer available to
another one does not happen. Specifically, when sending packets from
a socket not locally bound to an address, transmission follows the
routing table, switching between the two candidate interfaces as the
routes change -- but only as long as the sending machine's address on
the interface currently used for transmitting remains valid. If you
ifconfig away that address while the interface is in use, the socket
can no longer transmit UDP packets, even if the routing table supplies
a good route to the target system over another interface.
To analyze the problem, I run the following little program (under
ktruss, so as to see the error returns from sendto()):
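#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct sockaddr_in sinme;
struct sockaddr_in sinhim;
struct hostent *addr;
char buf[1024];
int fd;

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s host\n", argv[0]);
        return 1;
    }
    /* Resolve the target host given on the command line. */
    addr = gethostbyname(argv[1]);
    if (addr == NULL || addr->h_addrtype != AF_INET) {
        fprintf(stderr, "%s: cannot resolve to an IPv4 address\n", argv[1]);
        return 1;
    }
    memset(&sinhim, 0, sizeof(sinhim));
    sinhim.sin_family = AF_INET;
    memcpy(&sinhim.sin_addr, addr->h_addr, sizeof(sinhim.sin_addr));
    sinhim.sin_port = htons(9);    /* discard service */

    /* Bind to INADDR_ANY, port 0: no specific local address. */
    memset(&sinme, 0, sizeof(sinme));
    sinme.sin_family = AF_INET;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0 || bind(fd, (struct sockaddr *)&sinme, sizeof(sinme)) < 0) {
        perror("socket/bind");
        return 1;
    }
    memset(buf, 0, sizeof(buf));
    strcpy(buf, "Jokum er en fisk");
    /*
     * Send one 512-byte datagram per second, forever. The return
     * value is deliberately not checked; ktruss shows it.
     */
    while (1) {
        sendto(fd, buf, 512, 0, (struct sockaddr *)&sinhim, sizeof(sinhim));
        sleep(1);
    }
}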
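For anyone reproducing this without ktruss, here is a minimal sketch of
a wrapper that reports the result of each sendto() directly. The name
send_and_report() is made up for illustration and is not part of the
test program; it is meant to replace the bare sendto() call in the loop.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in the original test program): send one
 * datagram and print the outcome on stderr. Returns 0 on success, or
 * the errno value left by the failed sendto().
 */
int
send_and_report(int fd, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen)
{
    if (sendto(fd, buf, len, 0, to, tolen) < 0) {
        int err = errno;    /* save it before stdio can change it */

        fprintf(stderr, "sendto: %s (errno %d)\n", strerror(err), err);
        return err;
    }
    fprintf(stderr, "sendto: ok\n");
    return 0;
}
```

In the loop, the call would become
send_and_report(fd, buf, 512, (struct sockaddr *)&sinhim, sizeof(sinhim)),
so that errors such as ENETDOWN or EADDRNOTAVAIL show up by name once
per second.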
I start with the following interfaces configured, and both dhclient
and my ifwatchd and apmd scripts disabled, so as not to get any
unexpected interference from them:
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0
        address: 00:0d:60:80:27:11
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 193.71.2.59 netmask 0xffffff00 broadcast 193.71.2.255
ath0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ssid eunet
        powersave off
        bssid 00:06:b1:41:03:83 chan 1
        address: 00:05:4e:44:93:c4
        media: IEEE802.11 autoselect (DS11)
        status: active
        inet 10.13.65.16 netmask 0xffffff00 broadcast 10.13.65.255
lo0: flags=8009<UP,LOOPBACK,MULTICAST> mtu 33196
        inet 127.0.0.1 netmask 0xff000000
The routing table looks like this:
Destination        Gateway            Flags  Refs    Use    Mtu  Interface
default            193.71.2.1         UGS       2     35      -  wm0
10.13.65/24        link#2             UC        0      0      -  ath0
10.13.65.16        127.0.0.1          UGHS      0      0  33196  lo0
127/8              127.0.0.1          UGRS      0      0  33196  lo0
127.0.0.1          127.0.0.1          UH        4    254  33196  lo0
193.71.2/24        link#1             UC        2      0      -  wm0
193.71.2.1         00:06:b1:0c:0d:2b  UHLc      1      0      -  wm0
193.71.2.52        00:90:27:1b:12:94  UHLc      2    127      -  wm0
193.71.2.59        127.0.0.1          UGHS      0      0  33196  lo0
Here's a sequence of experiments, showing the commands being given,
and their effect on the transmission of UDP packets by the test
program, which is left running (the test program is on a different
console than the other commands, of course):
dejah# ktruss ./test 193.71.2.52
[sending OK over wm0 from 193.71.2.59 to 193.71.2.52]
dejah# ifconfig wm0 down
[gets ENETDOWN from sendto]
dejah# ifconfig wm0 up
[sending OK as before]
dejah# ifconfig wm0 inet 193.71.2.59 delete
[gets EADDRNOTAVAIL from sendto!]
dejah# ifconfig wm0 inet 193.71.2.59
[sending OK again]
dejah# ifconfig wm0 inet 193.71.2.59 delete
dejah# route delete default
dejah# route add default 10.13.65.1
[gets EADDRNOTAVAIL from sendto!]
dejah# ifconfig -a
wm0: flags=8b43<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0
        address: 00:0d:60:80:27:11
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
ath0: flags=8943<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ssid eunet
        powersave off
        bssid 00:06:b1:41:03:83 chan 1
        address: 00:05:4e:44:93:c4
        media: IEEE802.11 autoselect (DS11)
        status: active
        inet 10.13.65.16 netmask 0xffffff00 broadcast 10.13.65.255
lo0: flags=8009<UP,LOOPBACK,MULTICAST> mtu 33196
        inet 127.0.0.1 netmask 0xff000000
dejah# netstat -rn -f inet
Destination        Gateway            Flags  Refs    Use    Mtu  Interface
default            10.13.65.1         UGS       1      6      -  ath0
10.13.65/24        link#2             UC        1      0      -  ath0
10.13.65.1         00:06:b1:41:03:83  UHLc      1      0      -  ath0
10.13.65.16        127.0.0.1          UGHS      0      0  33196  lo0
127/8              127.0.0.1          UGRS      0      0  33196  lo0
127.0.0.1          127.0.0.1          UH        4     20  33196  lo0
193.71.2.59        127.0.0.1          UGHS      0      0  33196  lo0
[stuck with EADDRNOTAVAIL from sendto!]
This situation is plainly wrong. The sendto() should be sending the
data out over ath0, since the routing table clearly shows that as the
only correct option. (Interestingly, EADDRNOTAVAIL is not listed in
the man page as a possible return from sendto().) This smells of
routing information being improperly cached by the socket itself.
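One way to see what the routing table would give a socket that carries
no cached state is to connect() a fresh UDP socket to the target and
read the chosen local address back with getsockname(). This is a
minimal sketch, not something used in the tests described here; the
function name udp_source_for() is hypothetical, and the destination is
assumed to be a dotted-quad IPv4 address.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel which local address a *fresh*
 * UDP socket would use to reach dst. connect() on a datagram socket
 * transmits nothing; it records the peer and selects a local address,
 * which getsockname() then reports. Returns 0 on success and writes
 * the address as text into out; returns -1 on any failure.
 */
int
udp_source_for(const char *dst, char *out, socklen_t outlen)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof(local);
    int fd, ok = -1;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9);    /* discard, as in the test program */
    if (inet_pton(AF_INET, dst, &peer.sin_addr) != 1)
        return -1;
    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        ok = 0;
    close(fd);
    return ok;
}
```

Calling udp_source_for("193.71.2.52", buf, sizeof(buf)) from a separate
process while the test program is stuck would show which source address
a new socket gets, for comparison with what the old socket is still
trying to use.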
Now, in this situation, I abort the ongoing test run, and start a new
one. As expected, it sends properly over the wireless link:
^C
dejah# ktruss ./test 193.71.2.52
[sending OK over ath0 from 10.13.65.16 to 193.71.2.52]
dejah# ifconfig wm0 inet 193.71.2.59 netmask 255.255.255.0
dejah# netstat -rn -f inet
Destination        Gateway            Flags  Refs    Use    Mtu  Interface
default            10.13.65.1         UGS       2    105      -  ath0
10.13.65/24        link#2             UC        1      0      -  ath0
10.13.65.1         00:06:b1:41:03:83  UHLc      1      0      -  ath0
10.13.65.16        127.0.0.1          UGHS      0      0  33196  lo0
127/8              127.0.0.1          UGRS      0      0  33196  lo0
127.0.0.1          127.0.0.1          UH        4     20  33196  lo0
193.71.2/24        link#1             UC        1      0      -  wm0
193.71.2.52        00:90:27:1b:12:94  UHLc      1     17      -  wm0
193.71.2.59        127.0.0.1          UGHS      1    315  33196  lo0
[still sending OK over ath0]
dejah# ifconfig ath0 inet 10.13.65.16 delete
[gets EADDRNOTAVAIL from sendto!]
dejah# netstat -rn -f inet
Destination        Gateway            Flags  Refs    Use    Mtu  Interface
default            10.13.65.1         UGS       2    231      -  ath0
10.13.65.16        127.0.0.1          UGHS      0      0  33196  lo0
127/8              127.0.0.1          UGRS      0      0  33196  lo0
127.0.0.1          127.0.0.1          UH        4     20  33196  lo0
193.71.2/24        link#1             UC        1      0      -  wm0
193.71.2.52        00:90:27:1b:12:94  UHLc      1     29      -  wm0
193.71.2.59        127.0.0.1          UGHS      1    315  33196  lo0
dejah# route delete default
dejah# route add default 193.71.2.1
[still stuck with EADDRNOTAVAIL from sendto!]
dejah# ifconfig ath0 inet 10.13.65.16
[now sends OK over wm0!]
dejah# ifconfig ath0 inet 10.13.65.16 delete
[continues to send OK over wm0!]
Note, in particular, this last sequence of events. Deleting the
sending address pulls the rug out from under the UDP socket, and it
fails to use the other interface, even after explicitly changing the
default route (which shouldn't be needed, since a network route, and a
host route, are already in place). Reinstating the address, with the
new default route in place, causes it to flip over to the "right"
interface, and the old address may then be deleted with no ill effect.
I'm also wondering why, when an explicit direct connection to the
network with the target host came online, the socket didn't start
using the more specific route. This reinforces the assumption that
there is routing information cached in the socket itself that does
not get updated properly in all situations.
I've since run another test, where I send the packets to a remote
destination. With both interfaces active, I can change the default
route back and forth, and watch the UDP socket flip along with it.
The moment I delete the address that's currently being used, the
socket will hang, getting EADDRNOTAVAIL -- and it's stuck there.
Thus, the problem is not tied to the situation where one of the
interfaces shares a network segment with the target.
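Since a newly started test run sends fine in the same situation, one
conceivable application-level workaround is to throw away the socket
when sendto() returns EADDRNOTAVAIL and retry on a new one. The sketch
below is untested against this problem and rests on the assumption
that a fresh socket in the same process behaves like the fresh test
run did; sendto_reopen() is a hypothetical name. It does nothing about
the underlying kernel behaviour.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical workaround sketch: send on *fdp; if sendto() fails
 * with EADDRNOTAVAIL, replace the socket with a fresh, unbound one
 * and retry once. Returns whatever the last sendto() returned, or -1
 * if no replacement socket could be created. Any other error is
 * passed through untouched, with errno as sendto() left it.
 */
ssize_t
sendto_reopen(int *fdp, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen)
{
    ssize_t n;
    int nfd;

    n = sendto(*fdp, buf, len, 0, to, tolen);
    if (n >= 0 || errno != EADDRNOTAVAIL)
        return n;

    /* The old socket appears to be stuck; try a brand new one. */
    nfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (nfd < 0)
        return -1;
    close(*fdp);
    *fdp = nfd;
    return sendto(*fdp, buf, len, 0, to, tolen);
}
```

The replacement socket gets a new local port as well as a new source
address, so this only helps a program whose peers tolerate that.
Whether venus does is not something this sketch addresses.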
-tih
--
Tom Ivar Helbekkmo, Senior System Administrator, EUnet Norway
www.eunet.no T: +47-22092958 M: +47-93013940 F: +47-22092901