tech-net archive

enet(4) problem? (Was: NFS client performance problems)



TL;DR: looks like a problem in enet(4)

On Fri, Dec 25, 2020 at 02:20:13 +0300, Valery Ushakov wrote:

> I've stumbled into a weird performance problem with the NFS client.
> I have CompuLab's Utilite Pro (evbarm) machine running a very current
> -current.  It's connected to my Ubuntu laptop with Ethernet (direct
> cable link) and uses it as an NFS server to get e.g. pkgsrc tree and
> distfiles.  The performance is really bad, e.g. make extract of a
> package may take literal ages (I did a lot of the initial
> investigation for this mail while a python distfile was being
> extracted).  Extracting the 31M uncompressed bash distfile may take,
> from run to run:
> 
> real    2m21.110s
> user    0m0.635s
> sys     0m4.233s
> 
> or
> 
> real    4m52.010s
> user    0m0.769s
> sys     0m4.815s
> 
> or whatever.
> 
> Looking at the traffic with wireshark I can see a curious recurring
> pattern.  Looking at the time/sequence plot one immediately sees short
> bursts of activity separated by huge gaps of no traffic.
> 
> Here's one such instance abridged and wrapped for clarity.  Timestamps
> are shown as delta to the previous frame.  Window scale is 3 for the
> client (utilite) and 7 for the server (majava), i.e. the server has
> plenty of window open.
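>
> (For concreteness: the server's advertised window of 1834 with scale
> factor 7 is 1834 * 128 = 234752 bytes, and the client's window of
> 3225 with scale factor 3 is 3225 * 8 = 25800 bytes, so the stalls
> below are not a case of a closed window.)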
> 
> > 413 00.000351 IP utilite.1011 > majava.nfs: Flags [.],
>           seq 177121:178569, ack 79601, win 3225,
>           options [nop,nop,TS val 111 ecr 1941833655],
>           length 1448:
>           NFS request xid 1992059772 1444 write fh ... 5406 (5406) bytes
>               @ 16384
> > 414 00.000048 IP utilite.1011 > majava.nfs: Flags [.],
>           seq 178569:180017, ack 79601, win 3225,
>           options [nop,nop,TS val 111 ecr 1941833655],
>           length 1448
>   415 00.000009 IP majava.nfs > utilite.1011: Flags [.],
>           ack 180017, win 1834,
>           options [nop,nop,TS val 1941833656 ecr 111],
>           length 0
> > 416 00.000051 IP utilite.1011 > majava.nfs: Flags [.],
>           seq 180017:181465, ack 79601, win 3225,
>           options [nop,nop,TS val 111 ecr 1941833655],
>           length 1448
>   417 00.043745 IP majava.nfs > utilite.1011: Flags [.],
>           ack 181465, win 1834,
>           options [nop,nop,TS val 1941833700 ecr 111],
>           length 0
> > 418 00.994813 IP utilite.1011 > majava.nfs: Flags [P.],
>           seq 181465:182645, ack 79601, win 3225,
>           options [nop,nop,TS val 111 ecr 1941833655],
>           length 1180
>   419 00.000032 IP majava.nfs > utilite.1011: Flags [.],
>           ack 182645, win 1834,
>           options [nop,nop,TS val 1941834694 ecr 111],
>           length 0
> ! 420 00.000007 IP utilite.1011 > majava.nfs: Flags [P.],
>           seq 181465:182645, ack 79601, win 3225,
>           options [nop,nop,TS val 113 ecr 1941833700],
>           length 1180
>   421 00.000009 IP majava.nfs > utilite.1011: Flags [.],
>           ack 182645, win 1834,
>           options [nop,nop,TS val 1941834694 ecr 113,nop,nop,
>                    sack 1 {181465:182645}],
>           length 0
> 
> Here frames 413, 414, 416 and 418 comprise a single NFS write
> request.  All but the last segment are sent very fast.  Then note
> that the last segment (418) is sent after a 1-second delay, and then
> immediately resent (420, marked as "spurious retransmission" by
> wireshark).
> 
> This pattern repeats through the trace.  From time to time a large
> write has its last segment delayed by 1 second, then there's an ACK
> from the server and then that last segment is immediately "spuriously"
> resent.
> 
> Does this ring any bells?  Either from the TCP point of view, or
> about what NFS might be doing here that could trigger this.
> 
> Just copying a large file to the server seems to be ok; the
> time/sequence plot is nice and linear.

Looking at the same traffic from the client, I see a different picture
that complements the server-side story.
 
The NFS write request is sent in one batch of N chunks.  The server
acks chunks up to N-1, but not the last one.  After one second the
client retransmits the last chunk of the batch and gets two acks on
it.
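
In the trace quoted above, for example, N is 4: three 1448-byte
segments (413, 414, 416) plus a final 1180-byte one (418), i.e. 5524
bytes of TCP payload, which matches the 5406 bytes of WRITE data plus,
presumably, a bit over a hundred bytes of RPC record mark and WRITE
call header.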

So what seems to be happening is that enet(4) does not actually
transmit the packet with the last chunk of data (418) along with its
siblings (413, 414, 416).  The server does not see it and so does not
ack it.  Eventually the rexmit timer kicks in and the client's TCP
resends the last chunk in a new packet (420).  At this point the old
last packet (418) gets unstuck and so both the original last packet
and the new copy are sent.

So this seems to be some kind of enet(4) bug.
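
To make the suspected failure mode concrete, here is a sketch of the
usual "fill descriptor, publish it, kick the hardware" pattern that a
descriptor-ring TX path follows.  Everything named sketch_*/SKETCH_*
below is invented for illustration and is not the actual if_enet.c
code; the point is only that the MAC rescans the ring when it is
explicitly kicked, so a lost kick (or a descriptor update that reaches
memory too late) leaves the last segment parked until the next
transmit comes along:

    static void
    sketch_transmit(struct sketch_softc *sc, bus_addr_t buf, int len)
    {
            struct sketch_txdesc *txd = &sc->sc_txring[sc->sc_txnext];

            /* Fill in the descriptor while the CPU still owns it. */
            txd->td_bufaddr = buf;
            txd->td_buflen  = len;
            txd->td_flags  |= SKETCH_TD_READY;   /* device owns it now */

            /* Flush the descriptor to memory (cf. TXDESC_WRITEOUT). */
            sketch_txdesc_sync(sc, sc->sc_txnext);

            /*
             * Kick the MAC so it (re)scans the TX ring.  If this kick
             * is lost, or the descriptor update hasn't reached memory
             * by the time the MAC looks at it, the segment just sits
             * in the ring until the next enqueue kicks it again; here
             * that next enqueue is TCP's retransmission a second
             * later, which flushes out both the stuck original (418)
             * and the fresh copy (420).
             */
            SKETCH_REG_WRITE(sc, SKETCH_TX_KICK, 1);

            sc->sc_txnext = (sc->sc_txnext + 1) % SKETCH_TX_NDESC;
    }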

mlelstv@ pointed out that TXDESC_WRITEOUT and RXDESC_WRITEOUT use
PREWRITE, not POSTWRITE, which seems suspicious.  Unfortunately
changing them to POSTWRITE doesn't seem to help.  Since this problem
doesn't happen with all long WRITEs, there must be something else at
play here too.  If I had to guess: ring buffer wraparound, maybe?
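
For reference, the usual bus_dmamap_sync(9) discipline for a TX path
looks roughly like the fragment below (sc_txdesc_map, txmap and struct
txdesc are placeholders; only the sync operations themselves are the
real API): PRE* ops go before the device is allowed to touch the
memory, POST* ops after it is done with it, and the direction says who
produced the data.

    /* CPU just filled in descriptor i; push it out to memory before
     * the device may read it.  This is what PREWRITE is for. */
    bus_dmamap_sync(sc->sc_dmat, sc->sc_txdesc_map,
        i * sizeof(struct txdesc), sizeof(struct txdesc),
        BUS_DMASYNC_PREWRITE);

    /* Same for the packet data the device is about to DMA out. */
    bus_dmamap_sync(sc->sc_dmat, txmap, 0, txmap->dm_mapsize,
        BUS_DMASYNC_PREWRITE);

    /* ... hardware transmits ... */

    /* After the TX-complete interrupt: POSTREAD before the CPU looks
     * at status the device wrote back into the descriptor, POSTWRITE
     * on the data buffer before it is unloaded and reused. */
    bus_dmamap_sync(sc->sc_dmat, sc->sc_txdesc_map,
        i * sizeof(struct txdesc), sizeof(struct txdesc),
        BUS_DMASYNC_POSTREAD);
    bus_dmamap_sync(sc->sc_dmat, txmap, 0, txmap->dm_mapsize,
        BUS_DMASYNC_POSTWRITE);

If that reading is right, a PREWRITE on the way out is the sync that
actually gets the descriptor to the device, which would also be
consistent with the POSTWRITE change making no difference.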


-uwe

