Subject: Re: Some observations about DMA segment count
To: None <thorpej@zembu.com>
From: Darren Reed <darrenr@reed.wattle.id.au>
List: tech-net
Date: 07/25/2001 19:38:05
In some email I received from Jason R Thorpe, sie wrote:
> swinger:thorpej 19$ vmstat -e
> event                               total     rate type
> stge1 txdmaintr                     20504       30 intr
> stge1 rxintr                       282943      422 intr
> stge1 txseg1                        48384       72 misc
> stge1 txseg2                       279768      417 misc
> stge1 txseg3                            8        0 misc
> stge1 rxipsum                      283442      423 misc
> stge1 rxtcpsum                     283414      423 misc
> stge1 txipsum                      234827      350 misc
> stge1 txtcpsum                     234826      350 misc
> swinger:thorpej 20$  
> 
> During my writing of the "stge" driver, I discovered that the vast majority
> of our TCP packets that are sent out are comprised of 2 DMA segments.
> 
> This is probably because tcp_output() allocates an mbuf cluster for the
> data, but the link header has to be prepended .. the prepend allocates
> a new mbuf and sticks it in front because M_EXT is set (thus it is assumed
> that it is not safe to scribble into the m->m_data area).
> 
> Thoughts on what to do about this?  There should certainly be room in front
> of the packet for the link header, and we should try to avoid allocating
> the extra mbuf.

Some random thoughts from me on different things to consider here...

How could fastpath help out here?

Also, don't we have this in tcpcb:

        struct  mbuf *t_template;       /* skeletal packet for transmit */

Why can't that include the IP, TCP and link layer headers?

It would need a flag to indicate that the link layer header is already
present.  The only problem with this is the flag space available in
m_flags... M_ARPDONE for 0x8000, anyone?
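
To make the idea concrete, here is a minimal sketch using a toy packet
structure rather than a real mbuf.  M_ARPDONE and its 0x8000 value are only
the proposal above, not an existing flag, and struct toy_pkt,
toy_template_init() and toy_link_hdr_needed() are hypothetical names made
up for illustration.  It assumes a 14 byte Ethernet header and 40 bytes of
IPv4 + TCP headers without options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define M_ARPDONE      0x8000 /* proposed flag from this mail, not a real one */
#define LINK_HDR_LEN   14     /* Ethernet header length */
#define IP_TCP_HDR_LEN 40     /* 20 byte IPv4 + 20 byte TCP, no options */

/* Toy stand-in for an mbuf holding a transmit template (hypothetical). */
struct toy_pkt {
	int           flags;
	size_t        len;
	unsigned char data[128];
};

/*
 * Build a template that already carries the resolved link layer header
 * in front of the space reserved for the IP and TCP headers, and mark it
 * so the output path knows the header is present.
 */
static void
toy_template_init(struct toy_pkt *t, const unsigned char *lhdr)
{
	assert(t != NULL && lhdr != NULL);
	memset(t, 0, sizeof(*t));
	memcpy(t->data, lhdr, LINK_HDR_LEN);
	t->len = LINK_HDR_LEN + IP_TCP_HDR_LEN;
	t->flags |= M_ARPDONE;
}

/*
 * Number of bytes the link layer output routine still has to prepend:
 * none if the template already includes the header.
 */
static size_t
toy_link_hdr_needed(const struct toy_pkt *p)
{
	return (p->flags & M_ARPDONE) ? 0 : LINK_HDR_LEN;
}
```

With the flag set, the output path would skip the prepend (and the extra
mbuf that goes with it); with the flag clear it would behave as it does
today.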

There's something else to add to this too: in the most common case,
Ethernet, the link header is 14 bytes long, so it needs 2 bytes of padding
in front to keep the IPv4 header 4-byte aligned.  I would imagine something
similar should be happening for IP input packets too... or are drivers
adding 2 bytes of padding before the Ethernet header?  Maybe we can use
0x8000 to say "2 byte padding - use for extra flags"?
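
The arithmetic behind the 2 byte padding can be checked with a short
sketch.  ETHER_ALIGN_PAD, ip_hdr_offset() and is_aligned4() are
hypothetical names used only for illustration, and the buffer itself is
assumed to start on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

#define ETHER_HDR_LEN   14 /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define ETHER_ALIGN_PAD 2  /* hypothetical name for the leading pad */

/*
 * Offset of the IP header from the start of an aligned buffer, given
 * `pad' bytes of padding placed before the Ethernet header.
 */
static size_t
ip_hdr_offset(size_t pad)
{
	assert(pad < 4); /* a larger pad would just repeat modulo 4 */
	return pad + ETHER_HDR_LEN;
}

/* Nonzero if `off' is a multiple of 4. */
static int
is_aligned4(size_t off)
{
	return (off & 3) == 0;
}
```

Without padding the IP header lands at offset 14, which is not a multiple
of 4; with 2 bytes of padding it lands at offset 16, which is.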

Something I've noticed some Solaris drivers doing now is telling IP to
prepend "other" information (not ARP) to the IP packet.  From my
observation, this appears to be information that the driver uses
internally.  I don't consider it a coincidence that the only place I've
seen this is in ATM drivers, but maybe we should think about how we could
benefit from being able to associate some "data" that the network driver
has with a flow (or a route), and how we can signal to that driver whether
or not it is there.  An example of an indirect use might be IPsec providing
a template ESP/AH header, etc.

Darren