tech-net archive


Re: refactoring ip_output() and the L2 _output()



Mouse <mouse%Rodents-Montreal.ORG@localhost> writes:

> Well, pushing the ARP table out of the v4 routing table would mean
> pushing it into somewhere else instead.  Depending on how complex that
> "somewhere else" turns out to be, this may not be a net win.
>
> My guess would be that it would be a win.  But that's just a guess.

I don't think it would be an improvement: you'd need a separate table,
and you'd need to do two lookups, one to find the cloning route and
another to find the ARP entry.  Right now the ARP entry, if present, is
found directly.

>> Insofar as ARP installs routing-table entries using a different key
>> than a "standard" IPv4 address and mask, if that is what you're
>> saying that it does, that seems too complicated.
>
> Yes, that's what I think is going on.  There are 8 bytes of information
> in each IPv4 routing table entry that are ignored by almost everything
> that prints routes as routes.  They are the three fields struct
> sockaddr_inarp has that overlay sin_zero in struct sockaddr_in, and, as
> far as I can recall, only ARP code ever uses them.  However, other
> interfaces accept them, since the routing table code is (perhaps
> excessively) general and doesn't know they're special; this is why
> sin_zero must exist and be zeroed by userland: if not, you wind up with
> routes and/or addresses with garbage in the hidden 8 bytes, which don't
> match when they should (because of the garbage) but are hard to detect
> (because most tools don't print those bytes).

The lookup code does not understand address formats; that's a feature so
that it works with various protocols.
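
As a minimal sketch of why the hidden bytes matter to a format-agnostic
lookup, assume the lookup compares keys as opaque byte strings.  The
names make_key() and keys_match() are made up for illustration and are
not the kernel's radix code:

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: fill in an AF_INET key for a dotted-quad address.
 * The memset clears every byte of the structure, including sin_zero.
 */
static void
make_key(struct sockaddr_in *sin, const char *addr)
{
	int ok;

	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	ok = inet_pton(AF_INET, addr, &sin->sin_addr);
	assert(ok == 1);	/* the sketch only accepts valid addresses */
	(void)ok;
}

/*
 * Stand-in for a lookup that does not understand address formats:
 * two keys match only if every byte is identical.
 */
static int
keys_match(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Two keys built by make_key() for the same address match.  Set one byte
of sin_zero in either of them and they stop matching, even though
sin_addr is identical and most tools would print the two as the same
route.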

It sounds like what's broken is that there is no documented invariant
that all regular v4 entries in the routing table have zeros in the
unused bytes, no assertion of that invariant under DIAGNOSTIC, and no
zeroing of those bytes on the way in.  (Also, user-space routing tools
do not print a warning when the bytes are nonzero.)  This seems like it
should be easy to fix, and fixing it seems far less likely to introduce
bad behavior than a grand reorganization.
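
Roughly what I have in mind, as a user-space sketch rather than a
kernel patch.  All three function names are hypothetical, the plain
assert() stands in for a DIAGNOSTIC-only check, and this applies only
to regular v4 entries, not to ARP entries, which use those bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical check: returns 1 if all bytes of sin_zero are zero. */
static int
sin_zero_is_clean(const struct sockaddr_in *sin)
{
	size_t i;

	for (i = 0; i < sizeof(sin->sin_zero); i++)
		if (sin->sin_zero[i] != 0)
			return 0;
	return 1;
}

/* Hypothetical "zero on the way in" step. */
static void
sin_zero_scrub(struct sockaddr_in *sin)
{
	memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}

/*
 * Hypothetical hook for the insert path.  Returns 1 if the caller
 * passed garbage that had to be scrubbed, so a tool could warn.
 */
static int
sin_prepare_key(struct sockaddr_in *sin)
{
	int dirty = !sin_zero_is_clean(sin);

	if (dirty)
		sin_zero_scrub(sin);
	assert(sin_zero_is_clean(sin));	/* the invariant, once documented */
	return dirty;
}
```

A user-space routing tool could call something like
sin_zero_is_clean() on each entry it prints and warn when it returns 0.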

> As I wrote in the change
> I made in my private patch tree when I looked into this,
>
>  * That interfaces like bind(2) or routing socket messages pay
>  *  attention to what userland has in sin_zero is the real bug here.
>  *  (Well, that plus the abuse of the AF_INET routing table to hold
>  *  extra hidden data for structs sockaddr_inarp.)

Agreed, 90%.  The kernel should probably zero sin_zero when needed,
unless the syscall interface (which perhaps means POSIX) documents that
the bytes must be zero.
