Subject: Re: nmap not working?
To: Tom Spindler <dogcow@babymeat.com>
From: Greg Troxel <gdt@ir.bbn.com>
List: current-users
Date: 11/14/2007 11:13:47
Please note that I had nothing to do with adding struct sockaddr_storage
to ifreq.  (I think whether or not that was wise is a tough call; there
are good arguments on both sides.)  I have, however, been fixing
problems resulting from that change, including latent bugs that never
caused problems before.

There are two roots of the SIOCGIFCONF problem:

1) The original definition of the API in 4.4BSD (I think that's where it
started) is sloppy.  When a sockaddr fits in ifreq, the next ifreq
follows.  When it doesn't, the next one starts after the actual
sockaddr's end, determined by sa_len.  The confusion is whether "fits"
means sa_len <= sizeof(struct sockaddr), which used to be the "big"
member of the union, or <= sizeof(ifreq.ifr_ifru), which is really the
size of the union.

The interpretation that I called "crazy" was switching on sizeof(struct
sockaddr), which leads to the case of AF_INET addresses using the full
ifreq (16 name and 128 ifr_ifru), but AF_INET6 using 16 name and sa_len
(26 IIRC) bytes of structure, with the next ifreq coming *before* "(void
*)ifr + sizeof(struct ifreq)" relative to the last one.  I still think
this is a nonsensical rule.  The "sane" interpretation is to decide that
"too big" means "sockaddr doesn't fit in the union".  No one has argued
that my judgement in this case is wrong, but I'm willing to listen.

2) Perhaps because this is apparently a hard API to use correctly (by
observation of broken code), many users don't do it correctly.  I've
seen comparing to sizeof(struct sockaddr) to see whether to use sa_len,
and I've seen just assuming the next ifreq follows as in an array.
Comparing to struct sockaddr used to work, because that was equal to
sizeof(ifr.ifr_ifru), and now fails.  Assuming an array used to be wrong
and result in incorrect behavior, and still is wrong, but now works,
because there are no sockaddrs with sa_len > sizeof(struct
sockaddr_storage).

> On 30 aug 2007, gdt posts to tech-net, <rmiveax9g62.fsf@fnord.ir.bbn.com>:
>
>   racoon doesn't work in current because SIOCGIFCONF is broken. (I know
>   racoon should probably not use SIOCGIFCONF, but instead getifaddrs.  I
>   won't argue, but SIOCGIFCONF should work in any case.)
>
> More analysis of how the ioctl is purportedly broken is in that message.
> However, the root cause of such breakage seems to come from the addition
> of some fields to struct ifreq in late May/early June. AFAICT, no other
> programs using SIOCGIFCONF broke due to those changes.

The code after the addition of sockaddr_storage was truly broken, keying
off the wrong length field to test for where the next ifreq should be put.

It might be that nothing else broke, but I think that's multiple bugs
masking each other.

> On 11 Sep 2007, gdt checks in -r1.200 of src/sys/net/if.c with his fix(es).
>
> On 12 Sep 2007, Patrick Welch mails current-users about dhcpd not working.
> (viz <20070912144453.GA1070@quartz.itdept.newn.cam.ac.uk>)

dhcpd was using struct sockaddr instead of sizeof(ifr.ifr_ifru).
This was a one-line change, plus 17 lines explaining what's going on.

-		if (ifp -> ifr_addr.sa_len > sizeof (struct sockaddr)) {
+		if (ifp -> ifr_addr.sa_len > sizeof (ifp->ifr_ifru)) {

> On 13 Sep 2007, gdt checks in a fix to dhcp's discover.c
>
> On 31 Sep 2007, in <rmi3avrmoxo.fsf@fnord.ir.bbn.com>, gdt analyzes
> even more problems in dhcpd related to SIOCGIFCONF.
>
> On 31 Oct 31, gdt checks in yet another fix to dhcp's discover.c

This was working around another latent bug in the dhcp code that has
presumably been there for years.  SIOCGIFCONF is not guaranteed to
return all the data if the user's buffer is too small, and the only way
you can tell if you got it all if there's still room for a
maximally-sized struct ifreq.  With sockaddr_storage, fewer addresses
fit in the amount of space that was previously used, and machineswith
lots of (usually virtual) interfaces don't get all of them returned.

This is another argument for code not using this ioctl.

> On 8 Nov 2007, smb reports that nmap ain't working in -current (viz
> <20071108140235.4653bf54@berkshire.machshav.com>).
>
> On the 9th, gdt says that nmap is also doing things wrong.

It was comparing to sizeof(struct sockaddr).

> On the 10th, samba is also declared wrong.

The samba code is just plain wrong, for all versions of the ioctl that
have ever existed.  I do not believe that it would ever work on NetBSD
before the sockaddr_storage change.

> On the 11th, it is opined that
> all extant in-tree programs should be fixed.

Sure, and one way to fix is to stop using this ioctl and write
code to use getifaddrs.

> Now, personally, I don't care one way or the other about what the
> "proper" or "sane" interface for SIOCGIFCONF is. (I would note that
> the lignuces don't seem to return IPv6 addresses at all with
> SIOCGIFCONF, thus providing an interesting slant on the sanity-or-not
> of the ioctl; cf http://oss.sgi.com/archives/netdev/2000-02/msg00124.html )

Wow, that's really broken - thanks for pointing me to it.

> However, I do care about random pieces of software breaking - and don't
> really like the idea of NetBSD declaring that it has the One True Way
> of invoking a given ioctl and that everybody else should adapt.

Given the addition of sockaddr_storage, the rest of the choices are more
or less forced.  Code that compares sa_len to sizeof(ifr.ifr_ifru)
should work on all systems, and the original API should have made it
clear that one is supposed to compare to ifr_ifru.  Or made it clear
that one should compared to sizeof(struct sockaddr), in which case we
probably should not have added sockaddr_storage.

> I thus
> don't think my original point is unreasonable: dhcpd, nmap, and samba -
> hardly fringe pieces of software - failed to work properly after the
> ioctl change, and repeated kludges have been committed to restore
> functionality. 

The changes aren't kludges; they are writing the code correctly the way
it should have been written in the first place.

> So, I reiterate my original questions: isn't there far more software
> using SIOCGIFCONF "incorrectly" than correctly?

We don't know because the code that uses it correctly doesn't show up.
But I don't think it matters.  It's clear that this ioctl is hard to
use, and the Linux people made it do something different than on 4.4BSD,
which is further trouble.

> And if this is the case, shouldn't racoon be fixed instead?

Certainly racoon should be fixed to use getifaddrs.  I have limited
cycles, particulaly at the moment, and I've tried to fix things that are
broken in the simplest and easiest way, and explain to others the
issues.  I fixed net/if.c because it broke racoon and that caused me
grief.  Following that, I've felt a bit of a sense of responsibility to
help people having trouble in other programs.

If you're arguing that no effort should be put into fixing existing code
that uses SIOCGIFCONF, then I disagree, but that's mostly history.

You or anyone else changing programs that now use SIOCGIFCONF so that
they use getifaddrs would be welcome progress.  I would do that given
infinite spare time.  Besides the problems with NetBSD's inclusion of
sockaddr_storage (which requires a 1-line fix in code that was
previously almost correct), the behavior of Linux of not returning
AF_INET6 sockaddrs is a stronger argument to change programs to use
getifaddrs.

I think the TODO list is:

  racoon:
    works now
    regenerate config.h or whatever to use getifaddrs

  dhcp:
    works now
    move to 4

  nmap:
    add patch to use ifr_ifru
    change to getifaddrs

  samba:
    add patch to rewrite loop to standard form to find next ifreq
    change to getifaddrs

maybe:
  kernel/libc:
    add option to log first use of SIOCGIFCONF by a process

I'm not aware of anything other programs having trouble.