Subject: Re: CVS commit: src/sys/netinet
To: Matt Thomas <matt@3am-software.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 03/11/2005 12:26:29
In message <6.1.2.0.2.20050311114255.08ca6360@3am-software.com>,
Matt Thomas writes:
>>I think it's harder than that: the two space domains (sosend bufs, vs.
>>packing into Tx descriptors, with hardware-dependent constraints on
>>length, alignment, etc.) are incommensurate. Only the driver can really
>>tell for sure whether a given mbuf chain can be sent via large-send.
>
>If the driver can't large send it, it must perform enough segmentation
>so that it can be sent. If that means going all the way down to individual segments,
>so be it (it can detect that and turn off TSO).
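To make the fallback concrete, here is a minimal sketch of cutting one
large send into MSS-sized pieces in software. The struct and function
names are hypothetical (they are not from any NetBSD API), and it only
computes sequence numbers and lengths; it does not touch mbufs or
headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical description of one TCP segment cut from a large send. */
struct seg {
    uint32_t seq;   /* TCP sequence number of the first payload byte */
    size_t   off;   /* offset of the payload within the large send */
    size_t   len;   /* payload bytes in this segment (<= mss) */
};

/*
 * Split `total` payload bytes starting at sequence number `seq0` into
 * segments of at most `mss` bytes. Writes up to `max` entries to `out`
 * and returns the number of segments needed (which may exceed `max`).
 */
size_t
split_large_send(uint32_t seq0, size_t total, size_t mss,
    struct seg *out, size_t max)
{
    size_t n = 0, off = 0;

    if (mss == 0)
        return 0;
    while (off < total) {
        size_t len = (total - off < mss) ? total - off : mss;

        if (n < max) {
            out[n].seq = seq0 + (uint32_t)off;  /* wraps mod 2^32 */
            out[n].off = off;
            out[n].len = len;
        }
        n++;
        off += len;
    }
    return n;
}
```

With a 4000-byte send and an MSS of 1460 this yields three segments,
the last one carrying the 1080-byte remainder.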
Is there documentation for this API? I pinged Allen privately about
certain private TCP large-send code about a week ago. Is this the same
code, or a derivative of it?
>>Also if (dim, unpleasant) memory serves, the way new bge devices
>>(5705, and later PCI-e "server" nics?) support large-send is atrocious.
>>The chip requires the driver to put a descriptor for the entire
>>``large send'' into on-chip registers. As far as I could see, bge chips
>>can only support one outstanding large-send transaction: if you want
>>to send more, you have to wait for the current large-send to complete,
>>then field an interrupt. Thus, to avoid reordering large-send TCP
>>output with normal non-large send TCP output (TCP reordering being a
>>Bad Thing), you basically have to stall the entire Tx queue.
>>(I dunno about the 5706, though).
>
>"A descriptor"? No s/g?
Like I said, dim memory. IIRC, the 5705 has a 64-bit on-chip register
which contains large-send state. The actual packet data is in Tx-ring
descriptors, but (as there's only one on-chip register), I concluded
you can only enqueue a single large-send segment.
The Linux tg3 driver has TSO support for the 5750. I will check there.
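For what it's worth, the constraint as I remember it would look roughly
like the sketch below. Everything here is hypothetical: the names are
made up, and it models only the "one large-send slot" behaviour I
described, not any real bge register layout or driver code.

```c
#include <stdbool.h>

/* Hypothetical Tx state; not modelled on any real bge register layout. */
struct tx_state {
    bool lso_busy;  /* the single large-send slot is occupied */
    int  stalled;   /* frames held back while the slot is busy */
};

/*
 * Decide whether a frame may go to the hardware now.
 * Returns true if it was handed off, false if it must wait.
 * To keep output in order, every frame waits while a large send is
 * in flight, not just other large sends.
 */
bool
tx_try_send(struct tx_state *tx, bool is_large_send)
{
    if (tx->lso_busy) {
        tx->stalled++;
        return false;
    }
    if (is_large_send)
        tx->lso_busy = true;
    return true;
}

/* Called from the (hypothetical) large-send completion interrupt. */
void
tx_lso_done(struct tx_state *tx)
{
    tx->lso_busy = false;
    tx->stalled = 0;    /* caller would now drain the held-back frames */
}
```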
>You don't have to stall the entire TX queue,
>just TCP traffic (and probably only that traffic that would advance the
>sequence number). :)
Nope. Discounting legacy protocols like IPv6, you need to not reuse
the same ip_id on other traffic to the remote host, or a downstream
router could fragment multiple packets with the same ip_id, which
(combined with drop) could yield incorrect reassembly at the remote end.
Peeking into the layer-3 payload of every enqueued outbound frame,
looking for IP traffic that might bump the ip_id on this flow?
Eeew, *yuck*....
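
For reference, the collision comes from how IPv4 fragments are matched
at the receiver: RFC 791 keys reassembly on source, destination,
protocol and identification, and nothing else. A minimal sketch (the
struct and function names are mine, purely for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* The fields a receiver uses to match IPv4 fragments (RFC 791). */
struct frag_key {
    uint32_t src;    /* source address */
    uint32_t dst;    /* destination address */
    uint8_t  proto;  /* IP protocol number */
    uint16_t id;     /* ip_id */
};

/* True if two fragments would land in the same reassembly queue. */
bool
same_reass_queue(const struct frag_key *a, const struct frag_key *b)
{
    return a->src == b->src && a->dst == b->dst &&
        a->proto == b->proto && a->id == b->id;
}
```

Note that ports are not part of the key, so two different TCP
connections to the same remote host share a reassembly queue whenever
their packets carry the same ip_id.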