Subject: Re: CVS commit: src/sys/netinet
To: NetBSD-network <tech-net@NetBSD.org>
From: Matt Thomas <matt@3am-software.com>
List: tech-net
Date: 03/11/2005 13:14:42

At 12:26 PM 3/11/2005, Jonathan Stone wrote:

>In message <6.1.2.0.2.20050311114255.08ca6360@3am-software.com>,
>Matt Thomas writes:
>
>
> >>I think it's harder than that: the two space domains (sosend bufs, vs.
> >>packing into Tx descriptors, with hardware-dependent constraints on
> >>length, alignment, etc.) are incommensurate. Only the driver can really
> >>tell for sure whether a given mbuf chain can be sent via large-send.
> >
> >If the driver can't large-send it, it must perform enough segmentation
> >so that it can be sent.  If that means segmenting down to individual
> >segments, so be it (it can detect that and turn off TSO).
>
>Is there documentation for this API? I pinged Allen privately about
>certain private TCP large-send code about a week ago. Is this the same
>code, or a derivative of it?

Just as much as for the other hardware-assisted APIs. :)
This is completely independent of what Allen was doing (I hadn't
heard about that until now).

> >>Also if (dim, unpleasant) memory serves, the way new bge devices
> >>(5705, and later PCI-e "server" nics?)  support large-send is atrocious.
> >>The chip requires the driver to put a descriptor for the entire
> >>``large send'' into on-chip registers. As far as I could see, bge chips
> >>can only support one outstanding large-send transaction: if you want
> >>to send more, you have to wait for the current large-send to complete,
> >>then field an interrupt.  Thus, to avoid reordering large-send TCP
> >>output with normal non-large send TCP output (TCP reordering being a
> >>Bad Thing), you basically have to stall the entire Tx queue.
> >>(I dunno about the 5706, though).
> >
> >"A descriptor"?  No s/g?
>
>Like I said, dim memory.  IIRC, the 5705 has a 64-bit on-chip register
>which contains large-send state. The actual packet data is in Tx-ring
>descriptors, but (as there's only one on-chip register), I concluded
>you can only enqueue a single large-send segment.
>
>The Linux tg3 driver has TSO support for the 5750. I will check there.
>
>
> >You don't have to stall the entire TX queue,
> >just TCP traffic (and probably only that traffic that would advance the
> >sequence number). :)
>
>Nope. Discounting legacy protocols like IPv6, you need to not reuse
>the same ip_id on other traffic to the remote host, or a downstream
>router could fragment multiple packets with the same ip_id, which
>(combined with drop) could yield incorrect reassembly at the remote end.

Actually, isn't it the tuple of {src ip, dst ip, protocol, id} that
has to be unique?  I've often thought of adding an 'ipidflow' structure
so that the ip_id updating could be more disjoint.  (The logical place
for them would be off the inpcb, with a hash on {dstip,protocol} to
do the lookups off the in_ifaddr.)
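
A rough user-space sketch of the 'ipidflow' idea, for illustration only.
Every name below (struct ipidflow, ipidflow_lookup(), ipidflow_reserve(),
the table size) is hypothetical and is not existing NetBSD code. It
assumes one table per local source address, which is why the source IP is
not part of the key, and it ignores locking and entry expiry entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of slots per source address; must be a power of 2. */
#define IPIDFLOW_HASHSIZE 64

/* One ip_id counter per {dst ip, protocol} pair (names are made up). */
struct ipidflow {
	uint32_t ipf_dst;	/* destination IPv4 address */
	uint8_t  ipf_proto;	/* IP protocol number */
	uint8_t  ipf_inuse;	/* slot is occupied */
	uint16_t ipf_nextid;	/* next ip_id to hand out on this flow */
};

/* Assumed to hang off the per-address structure, so src ip is implicit. */
struct ipidflow_table {
	struct ipidflow ift_slots[IPIDFLOW_HASHSIZE];
};

static unsigned
ipidflow_hash(uint32_t dst, uint8_t proto)
{
	uint32_t h = dst ^ ((uint32_t)proto * 0x9e3779b1u);

	h ^= h >> 16;
	return h & (IPIDFLOW_HASHSIZE - 1);
}

/*
 * Find the flow for {dst, proto}, creating it in the first free slot
 * (linear probing).  Returns NULL if the table is full.
 */
static struct ipidflow *
ipidflow_lookup(struct ipidflow_table *t, uint32_t dst, uint8_t proto)
{
	unsigned start = ipidflow_hash(dst, proto);

	for (unsigned i = 0; i < IPIDFLOW_HASHSIZE; i++) {
		struct ipidflow *f =
		    &t->ift_slots[(start + i) & (IPIDFLOW_HASHSIZE - 1)];

		if (!f->ipf_inuse) {
			f->ipf_dst = dst;
			f->ipf_proto = proto;
			f->ipf_inuse = 1;
			f->ipf_nextid = 0;
			return f;
		}
		if (f->ipf_dst == dst && f->ipf_proto == proto)
			return f;
	}
	return NULL;
}

/*
 * Reserve n consecutive ip_id values on a flow and return the first.
 * A large send that will become n wire packets could reserve all of
 * its ids up front; the counter wraps modulo 2^16.
 */
static uint16_t
ipidflow_reserve(struct ipidflow *f, uint16_t n)
{
	uint16_t first = f->ipf_nextid;

	f->ipf_nextid = (uint16_t)(first + n);
	return first;
}
```

With something like this, a large send to one destination would only
consume ids from its own {dst, protocol} counter, so unrelated traffic
to other hosts (or other protocols) would not have to wait on it.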

>Peeking into the layer-3 payload of every enqueued outbound frame,
>looking for IP traffic that might bump the ip_id on this flow?
>Eeew, *yuck*....

Didn't say it was clean.


-- 
Matt Thomas                     email: matt@3am-software.com
3am Software Foundry              www: http://3am-software.com/bio/matt/
Cupertino, CA              disclaimer: I avow all knowledge of this message.