Re: NetBSD 5.1 TCP performance issue (lots of ACK)
On 22 Nov, 2011, at 06:32 , Manuel Bouyer wrote:
> On Tue, Nov 22, 2011 at 12:41:13AM -0600, David Young wrote:
>> On Mon, Nov 21, 2011 at 10:11:17PM +0100, Manuel Bouyer wrote:
>>> On Mon, Nov 21, 2011 at 02:58:57PM -0600, Greg Oster wrote:
>>>> Hi Manuel
>>>> Have there been issues with these patches that prevent them from being
>>>> applied to -current and/or pulled up?
>>> Nothing wrong AFAIK, I just got distracted.
>> In the discussion of the patches, people seem to disagree how the
>> patches work to improve the performance as they do, whether the patches
>> are portable, and whether or not the whole patch is necessary or just
>> the bus_dmamap_sync() part of it. I hope our understanding improves
>> before there is a commit. :-/
> There are 2 parts: changes to the interrupt setup, and the reading of
> receive descriptors.
> I didn't see peoples having issues with the interrupt setup.
> As for the discussion if a x86 CPU will reorder reads, I'm sure they do:
> I've had troubles in Xen front/back driver because of this
> (and there is explicit lfence() in the linux Xen drivers, because of this).
Needless to say, the last bit would be entirely inconsistent with section
7.2 of any version of the "Intel 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A: System Programming Guide, Part 1" published more recently
than 2007. I won't repeat what it says here, but it is rather unambiguous
about the fact that newer reads (in program order) are always done after older
reads, at least in the basic instruction set.
Of course if that is always true then it would also imply that an lfence
is useless, because the only thing an lfence instruction does would seem to be
guaranteed even if an lfence instruction isn't there. Yet the lfence
does exist, which made me wonder what it is used for.
After looking for an answer to that it turns out that while read ordering is
guaranteed for loads done using the basic x86 instruction set, it is not
with respect to loads done by certain SSE instructions. The lfence can be
needed if the compiler is generating SSE instructions, and if we now have a
compiler that is more aggressive about finding SSE instructions to generate,
it is possible there will be code which once worked fine without memory
barriers which now needs them. Maybe this is an instance of that?
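
To make the ordering question concrete, here is a rough userland sketch of
the read pattern under discussion: check a "done" bit, then read a second
field that is only valid once that bit is set. The struct, field names and
RX_DONE flag are made up for illustration and are not taken from the patch
or from any driver; a real driver would be reading DMA memory and using
bus_dmamap_sync() rather than C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical receive descriptor; the layout is illustrative only. */
struct rx_desc {
	_Atomic uint32_t status;	/* writer sets RX_DONE last */
	uint32_t length;		/* valid only once RX_DONE is set */
};

#define RX_DONE 0x1u			/* assumed flag value */

/*
 * Return the packet length if the descriptor is complete, -1 otherwise.
 * The acquire fence keeps the load of "length" from being moved ahead of
 * the load of "status", by the compiler or by the CPU.
 */
int
rx_desc_read(struct rx_desc *d)
{
	uint32_t st;

	st = atomic_load_explicit(&d->status, memory_order_relaxed);
	if ((st & RX_DONE) == 0)
		return -1;

	/* Read barrier between the status load and the length load. */
	atomic_thread_fence(memory_order_acquire);

	return (int)d->length;
}
```

With the usual x86 compilers I would expect the acquire fence to emit no
instruction for ordinary loads and only restrain the compiler, which fits
the manual's statement about read ordering. Loads outside that guarantee
would need an explicit lfence.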