Subject: Re: Data alignment
To: Craig Metz <cmetz@inner.net>
From: Dennis Ferguson <dennis@jnx.com>
List: tech-net
Date: 05/21/1996 17:38:46
> 	I noticed that NetBSD (-current May 11) doesn't seem to be as careful
> as 4.4 about aligning header data.
> 
> 	For loopback, with no MAC header, everything's word aligned and happy.
> 
> 	But on Ethernet, NetBSD passes ipintr() a mbuf where mt_data is
> aligned to a two-byte boundary but NOT to a four-byte boundary as it is in
> straight 4.4. Is this a known and/or fixed problem? Unless I'm missing
> something huge, on x86 it would give better total performance to copy the
> headers to an aligned buffer, and, on several RISCy systems, it'll panic.

The basic problem here is that Ethernet has a MAC header which, at 14 octets,
is not a multiple of almost anyone's word size.  You have to deal with this
somehow.

The ideal way to deal with this is to have the hardware write the packet
starting at an address which is two-byte aligned but not four-byte aligned
(that is, 2 mod 4), such that the IP header ends up aligned on a word
boundary when the 14-octet MAC header is stripped off.  This, of course,
requires that the hardware be capable of writing the packet to a
non-word-aligned buffer, or at least of inserting a couple of pad bytes
itself.  I assume NetBSD drivers always do this when the device hardware is
capable; if not, it is a driver bug and needs to be fixed.
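
To make the arithmetic concrete, here is a minimal sketch in C.  It is not
taken from any NetBSD driver; the names RX_ALIGN_PAD, word_aligned(),
rx_frame_start() and ip_header() are made up for illustration, and it assumes
the usual flat address space where the low two bits of a pointer give its
alignment within a 4-byte word.

```c
#include <assert.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14u  /* 6 dst + 6 src + 2 ethertype */
#define RX_ALIGN_PAD   2u  /* hypothetical name; (2 + 14) % 4 == 0 */

/* Nonzero if p sits on a 4-byte boundary (assumes a flat address space). */
static int word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/*
 * Given a word-aligned receive buffer, return the address at which the
 * device should start writing the frame so that the IP header ends up
 * word aligned.
 */
static unsigned char *rx_frame_start(unsigned char *buf)
{
    assert(word_aligned(buf));
    return buf + RX_ALIGN_PAD;
}

/* Address of the IP header once the MAC header is stripped off. */
static unsigned char *ip_header(unsigned char *frame)
{
    return frame + ETHER_HDR_LEN;
}
```

With a word-aligned buf, ip_header(buf) lands at offset 14 (2 mod 4), while
ip_header(rx_frame_start(buf)) lands at offset 16, which is word aligned.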

This leaves the problem of what to do with a device which is incapable
of writing a packet to a non-word-aligned buffer (the DEC 21?40, otherwise
an admirable chip, has this problem).  Here you've got (at best) two
choices: either you leave the packet in the buffer the way the device
put it in there, and do unaligned word accesses, or you copy the packet
header into aligned space, often allocating a new mbuf to do it.

Now in the case where you have a processor which can do unaligned
multibyte accesses I would suggest that, contrary to your assertion
above, you will almost always be better off by leaving the packet in the
buffer the way it is.  In principle it takes only a couple of 32-bit-wide
accesses to process the IP header, and only a couple more to deal with
TCP, and doing this unaligned will almost always cost much less than finding
a new mbuf and copying data.  Only in the case where the device is incapable
of writing the packet correctly, and the processor is incapable of doing
unaligned accesses, should you need to resort to copying.  I assume drivers
for unhappy devices like this do the copy when they need to (the DEC
driver copies if it is running on an Alpha, and leaves the packet alone when
running on an x86).
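
The decision described above can be sketched roughly as follows.  This is an
illustration only, not the DEC driver's code: the function names and the
"strict" flag are hypothetical, and a real driver would typically copy into a
fresh mbuf rather than into the caller-supplied scratch buffer used here.

```c
#include <stdint.h>
#include <string.h>

#define IP_HDR_MIN 20u  /* IPv4 header without options */

/* Read a 32-bit big-endian field one byte at a time; safe at any address. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Return a pointer to an IP header that is safe for word accesses.
 * strict != 0 models a CPU which objects to unaligned loads (hypothetical
 * flag).  If the header is already aligned, or the CPU does not care, the
 * packet is left where the device put it; otherwise the header is copied
 * into the word-aligned scratch space.
 */
static const unsigned char *
ip_header_for_access(const unsigned char *ip,
                     uint32_t scratch[IP_HDR_MIN / 4], int strict)
{
    if (!strict || ((uintptr_t)ip & 3u) == 0)
        return ip;                      /* no copy needed */
    memcpy(scratch, ip, IP_HDR_MIN);    /* the expensive path */
    return (const unsigned char *)scratch;
}
```

Only the combination of an unaligned header and a strict-alignment processor
takes the copy path; every other case returns the original pointer.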

If you can find a driver which doesn't take advantage of the ability of
the hardware to write the packet on a non-word-aligned boundary, or a driver
which leaves packets unaligned on a processor which objects to this, it is
a bug.  Otherwise the lack of alignment is a feature.

Dennis Ferguson