Subject: Re: Some observations about DMA segment count
To: None <thorpej@zembu.com>
From: Darren Reed <darrenr@reed.wattle.id.au>
List: tech-net
Date: 07/25/2001 19:38:05
In some email I received from Jason R Thorpe, sie wrote:
> swinger:thorpej 19$ vmstat -e
> event                total  rate type
> stge1 txdmaintr      20504    30 intr
> stge1 rxintr        282943   422 intr
> stge1 txseg1         48384    72 misc
> stge1 txseg2        279768   417 misc
> stge1 txseg3             8     0 misc
> stge1 rxipsum       283442   423 misc
> stge1 rxtcpsum      283414   423 misc
> stge1 txipsum       234827   350 misc
> stge1 txtcpsum      234826   350 misc
> swinger:thorpej 20$
>
> During my writing of the "stge" driver, I discovered that the vast majority
> of our TCP packets that are sent out are comprised of 2 DMA segments.
>
> This is probably because tcp_output() allocates an mbuf cluster for the
> data, but the link header has to be prepended .. the prepend allocates
> a new mbuf and sticks it in front because M_EXT is set (thus it is assumed
> that it is not safe to scribble into the m->m_data area).
>
> Thoughts on what to do about this? There should certainly be room in front
> of the packet for the link header, and we should try to avoid allocating
> the extra mbuf.
Some random thoughts from me on different things to consider here...

How could fastpath help out here?

Also, don't we have this in tcpcb:

	struct mbuf *t_template; /* skeletal packet for transmit */

Why can't that include the IP + TCP + link layer headers? It would need
a flag to indicate that the link layer headers were already present.
The only problem with this is the flag space available in m_flags...
M_ARPDONE for 0x8000, anyone?
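
To make the template idea concrete, here is a minimal user-space sketch.
It is not the kernel mbuf API: struct sk_pkt, the sk_* functions and
SK_LINKDONE are hypothetical names made up for illustration, with
SK_LINKDONE standing in for the proposed M_ARPDONE bit. It assumes the
buffer is created with leading space reserved, so the link header can be
written in place instead of going into a second buffer, and that a flag
records when the header is already present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_BUFSIZE  128u
#define SK_LINKDONE 0x8000u	/* hypothetical; models the proposed M_ARPDONE */

/* Simplified stand-in for an mbuf: one flat buffer with leading space. */
struct sk_pkt {
	unsigned char	 buf[SK_BUFSIZE];
	unsigned char	*data;	/* start of valid bytes (like m_data) */
	size_t		 len;	/* number of valid bytes (like m_len) */
	unsigned	 flags;
};

/* Start an empty packet with `lead` bytes reserved in front of the data. */
static void
sk_init(struct sk_pkt *p, size_t lead)
{
	assert(lead <= SK_BUFSIZE);
	p->data = p->buf + lead;
	p->len = 0;
	p->flags = 0;
}

/* Append `n` bytes after the current data; -1 if they do not fit. */
static int
sk_append(struct sk_pkt *p, const void *src, size_t n)
{
	size_t used = (size_t)(p->data - p->buf) + p->len;

	if (n > SK_BUFSIZE - used)
		return -1;
	memcpy(p->data + p->len, src, n);
	p->len += n;
	return 0;
}

/*
 * Prepend `n` bytes in place.  Returns -1 when there is not enough
 * leading space, which is the case where a real stack would have to
 * allocate another buffer (and so produce another DMA segment).
 */
static int
sk_prepend(struct sk_pkt *p, const void *hdr, size_t n)
{
	if ((size_t)(p->data - p->buf) < n)
		return -1;
	p->data -= n;
	memcpy(p->data, hdr, n);
	p->len += n;
	return 0;
}

/* Output step: add the link header unless the packet already has one. */
static int
sk_output(struct sk_pkt *p, const void *lhdr, size_t n)
{
	if (p->flags & SK_LINKDONE)
		return 0;
	if (sk_prepend(p, lhdr, n) != 0)
		return -1;
	p->flags |= SK_LINKDONE;
	return 0;
}
```

With 16 bytes reserved by sk_init(), a 40 byte IP + TCP header block
followed by sk_output() with a 14 byte link header leaves a single
54 byte buffer, and a second sk_output() call changes nothing because
the flag is already set. With no leading space, sk_output() fails, which
corresponds to the extra-mbuf case described in the quoted text.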

There's something else to add to this too: for the most common case,
Ethernet, the link header is 14 bytes long, so it needs 2 bytes of
padding at the front to ensure the IPv4 header is aligned. I would
imagine something similar should be happening for IP input packets
too... or are drivers adding a 2 byte padding before the Ethernet
header? Maybe we can use 0x8000 to say "2 byte padding - use for extra
flags"?
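
The padding arithmetic can be checked with a small sketch. It assumes
the buffer itself starts on an aligned boundary and that the IPv4 header
should land on a 4 byte boundary; lead_pad() and ip_hdr_offset() are
hypothetical helper names, not existing kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of padding to place in front of a link header of `linkhdr_len`
 * bytes so that whatever follows it starts on an `align`-byte boundary.
 * Assumes the start of the buffer is itself aligned.
 */
static size_t
lead_pad(size_t linkhdr_len, size_t align)
{
	assert(align != 0);
	return (align - (linkhdr_len % align)) % align;
}

/* Offset of the network-layer header from the start of the buffer. */
static size_t
ip_hdr_offset(size_t linkhdr_len, size_t align)
{
	return lead_pad(linkhdr_len, align) + linkhdr_len;
}
```

For a 14 byte Ethernet header and 4 byte alignment, lead_pad(14, 4)
gives 2 and ip_hdr_offset(14, 4) gives 16, so the IPv4 header starts on
a 4 byte boundary. A link header whose length is already a multiple of 4
needs no padding.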

Something I've noticed some Solaris drivers doing now is telling IP to
include "other" information (not ARP) prepended to the IP packet. From
my observation, this appears to be information that the driver uses
internally. I don't consider it a coincidence that the only place I've
seen this is in ATM drivers, but maybe we should think about how we
could benefit from being able to associate some "data" that the network
driver has with a flow (or a route), and how we can signal to that
driver whether or not it is there. An example of an indirect use might
be IPSec providing a template ESP/AH header, etc.

Darren