tech-net archive


Re: refactoring ip_output() and the L2 _output()

>> Well, pushing the ARP table out of the v4 routing table would mean
>> pushing it into somewhere else instead.  Depending on how complex
>> that "somewhere else" turns out to be, this may not be a net win.

>> My guess would be that it would be a win.  But that's just a guess.

> I don't think it would be an improvement - you'd need a separate
> table, and you'd need to do two lookups, one to find the cloning
> route, and then to find the arp entry.  Right now the arp entry, if
> present, is found directly.

It's been long enough I'm not sure, but I don't think so.  I _think_
the ARP routes are found only by ARP-aware code - for IP-layer lookups,
the non-ARP route is more general, in the radix-tree sense, so it's the
one that's found.
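Below is a toy model of why an IP-layer lookup wouldn't land on the ARP entry. It assumes, purely for illustration, that the ARP entry is a host route whose key has a nonzero "tag" byte in sin_zero and whose mask covers every byte, while the network route's mask covers only sin_addr. It is a linear scan that picks the most specific match, not the radix-tree code. The names toy_route, toy_sin, toy_lookup and the tag byte are all hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical toy route: key and mask are whole struct sockaddr_in images. */
struct toy_route {
    struct sockaddr_in key;
    struct sockaddr_in mask;   /* 1 bits mark significant key bits */
    const char *name;
};

/* Build a key: zeroed struct, AF_INET, address, plus a hypothetical tag
 * byte stored in sin_zero[0] (0 for an ordinary IP-layer key). */
static struct sockaddr_in toy_sin(const char *addr, unsigned char tag)
{
    struct sockaddr_in s;
    memset(&s, 0, sizeof s);
    s.sin_family = AF_INET;
    inet_pton(AF_INET, addr, &s.sin_addr);
    s.sin_zero[0] = tag;
    return s;
}

/* Count set bits in a mask; used as a crude measure of specificity. */
static int mask_bits(const struct sockaddr_in *m)
{
    const unsigned char *p = (const unsigned char *)m;
    int bits = 0;
    for (size_t i = 0; i < sizeof *m; i++)
        for (unsigned char b = p[i]; b; b >>= 1)
            bits += b & 1;
    return bits;
}

/* Return the most specific entry whose masked key equals the masked
 * lookup key, or NULL.  Every byte, sin_zero included, is compared. */
static const struct toy_route *toy_lookup(const struct toy_route *tab, size_t n,
                                          const struct sockaddr_in *want)
{
    const struct toy_route *best = NULL;
    int best_bits = -1;
    const unsigned char *w = (const unsigned char *)want;

    for (size_t i = 0; i < n; i++) {
        const unsigned char *k = (const unsigned char *)&tab[i].key;
        const unsigned char *m = (const unsigned char *)&tab[i].mask;
        size_t j;
        for (j = 0; j < sizeof *want; j++)
            if ((w[j] & m[j]) != (k[j] & m[j]))
                break;
        if (j == sizeof *want && mask_bits(&tab[i].mask) > best_bits) {
            best = &tab[i];
            best_bits = mask_bits(&tab[i].mask);
        }
    }
    return best;
}
```

With a table holding a 10.0.0.0/24 route and a tagged host route for 10.0.0.5, a lookup for 10.0.0.5 with zero sin_zero returns the /24 route; only a key carrying the same tag byte finds the host entry.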

>> However, other interfaces accept [the sin_zero bytes], since the
>> routing table code is (perhaps excessively) general and doesn't know
>> they're special;
> The lookup code does not understand address formats; that's a feature
> so that it works with various protocols.

Sure.  But (ab)using it this way, by sticking an extra 8 bytes on the
end and then counting on all of userland to know that it has to zero
them on input and ignore them on output?  Oh, except for the ARP code.
It's just pushing the ugliness around.
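Here is the "doesn't know they're special" point made concrete. Generic table code can only treat a key as opaque bytes, so two IPv4 addresses that agree on family, port and address but differ in sin_zero are different keys to it. This is a minimal sketch: generic_key_equal and sin_equal are hypothetical names, and memcmp stands in for whatever byte comparison the real lookup performs.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Protocol-agnostic comparison: the address is just len opaque bytes,
 * so sin_zero counts exactly as much as sin_addr does. */
static int generic_key_equal(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/* Protocol-aware comparison: only the fields IPv4 gives meaning to. */
static int sin_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_family == b->sin_family &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}
```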

Using the radix-tree code for the ARP table might even be a good idea.
But not the same table that's used for IP-layer IPv4 routing!

...well, that's my opinion. :-)

> It sounds like what's broken is that there is not a documented
> invariant that all regular v4 entries in the routing table have zeros
> in the unused bytes, lack of asserting that if DIAGNOSTIC, and lack
> of zeroing it on the way in.

If you want to keep ARP entries in the same table as IPv4 routes, yes,
that's approximately what I think it would mean to do it right.  You'd
need to be careful you don't break the ARP-aware paths, though.
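The following sketches the invariant described above: zero the bytes on the way in, and assert that they are zero when DIAGNOSTIC is defined. It is written as userland C, with assert(3) standing in for the kernel's KASSERT. The helper names are hypothetical, and placing the calls where an address enters the routing code is an assumption. ARP-aware paths that deliberately use those bytes would have to bypass rt_sin_normalize().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical helper: nonzero iff every sin_zero byte is zero. */
static int sin_zero_is_clear(const struct sockaddr_in *sin)
{
    for (size_t i = 0; i < sizeof sin->sin_zero; i++)
        if (sin->sin_zero[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical "on the way in" step: normalize a copy of a
 * caller-supplied address before it is used as a routing key. */
static void rt_sin_normalize(struct sockaddr_in *sin)
{
    memset(sin->sin_zero, 0, sizeof sin->sin_zero);
}

/* Hypothetical invariant check for a regular (non-ARP) IPv4 entry;
 * the assertion is compiled in only when DIAGNOSTIC is defined. */
static void rt_sin_check(const struct sockaddr_in *sin)
{
#ifdef DIAGNOSTIC
    assert(sin_zero_is_clear(sin));
#else
    (void)sin;
#endif
}
```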

> Agreed, 90%.  The kernel should probably zero sin_zero when needed,
> unless the syscall interface, which means posix, perhaps, documents
> that they must be zero.

Even then, I think it should.  POSIX is more about political
compromises, in many cases, than it is about good interface
engineering.  Having eight bytes of magic data which is barely
mentioned in the manpage (inet(4) lists it in the struct elements but
doesn't mention it elsewhere, at least as of 5.1) but which must be
zero for proper operation - and which, if it's not zero, leads to
mysterious, obscure, and silent failure modes?  Depending on accumulated
cultural lore for people to learn that it's MBZ (must be zero)?  That is
_horrible_ interface
engineering.  I've been working with that struct for over twenty years
and still didn't really understand what the deal was with sin_zero
until I put a couple of days into trying to get rid of it and tracking
down the resulting weird failures.
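In practice, that accumulated lore comes down to the idiom below: zero the whole struct before filling in the fields you care about, so that sin_zero can't carry stack garbage into the kernel. This is a minimal sketch. make_sin is a hypothetical helper, and it leaves sin_len (present on BSD-derived systems, absent on some others) at zero rather than setting it.

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an IPv4 socket address defensively.
 * Returns 0 on success, -1 if addr is not a dotted-quad string. */
static int make_sin(struct sockaddr_in *sin, const char *addr, unsigned short port)
{
    /* Zero everything first: sin_zero (and sin_len, where it exists). */
    memset(sin, 0, sizeof *sin);
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    /* inet_pton returns 1 on success, 0 for an unparsable string. */
    return inet_pton(AF_INET, addr, &sin->sin_addr) == 1 ? 0 : -1;
}
```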

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML      
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
