NetBSD-Bugs archive


kern/55800: Data transfers stall when SACK is enabled



>Number:         55800
>Category:       kern
>Synopsis:       Data transfers stall when SACK is enabled
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 10 08:40:00 +0000 2020
>Originator:     kim%netbsd.org@localhost (Kimmo Suominen)
>Release:        NetBSD 9.99.75 (202011081900Z)
>Organization:
>Environment:
System: NetBSD rendez-vous.gw.fi 9.99.75 NetBSD 9.99.75 (GENERIC) #0: Sun Nov 8 18:27:14 UTC 2020 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

	Transferring files using rsync over ssh stalls after about 1 GB
	of data has been transferred.  (The stall might not be related to
	the amount of data, though.)  The connection is over IPv4.

	Whenever the transfer stalls there is always some unresolved SACK.
	I also observed regular bouts of SACK throughout the transfer, so
	not every occurrence of SACK results in a stall.

	In the stalled state it looks like ssh is not getting any data
	through (and therefore rsync is not receiving anything).  I have
	tcpdump output available here:

	    https://www.netbsd.org/~kim/NB-RSYNC-PROBLEM.txt

	The last transfer stalled at 1:27.  Some packets were then
	exchanged at 2:27 and 3:27, and at 4:27 the connection was closed.
	This would appear to match the sshd_config settings I have:

	    TCPKeepAlive no
	    ClientAliveInterval 3600
	    ClientAliveCountMax 3
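
	The arithmetic behind that match can be sketched as follows.  This
	assumes the session is dropped after roughly ClientAliveInterval
	times ClientAliveCountMax seconds without a response, as
	sshd_config(5) describes; the function names are made up for
	illustration and the 1:27 start time is taken from the capture.

```shell
#!/bin/sh
# Sketch only: estimate when sshd gives up on an unresponsive peer.
# Assumption: timeout ~= ClientAliveInterval * ClientAliveCountMax.

# Print the number of seconds until the session is dropped.
# $1 = ClientAliveInterval (seconds), $2 = ClientAliveCountMax
alive_timeout() {
	echo $(( $1 * $2 ))
}

# Print H:MM for a start time plus an offset in seconds.
# $1 = start hour, $2 = start minute, $3 = offset in seconds
add_seconds() {
	total=$(( $1 * 3600 + $2 * 60 + $3 ))
	printf '%d:%02d\n' $(( total / 3600 % 24 )) $(( total % 3600 / 60 ))
}

timeout=$(alive_timeout 3600 3)
echo "timeout: ${timeout}s"

# Stall observed at 1:27; expected close time under the assumption above.
add_seconds 1 27 "$timeout"
```

	With the values above this gives a timeout of 10800 seconds and a
	close time of 4:27, with the intermediate probes falling at 2:27
	and 3:27.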

	The output on the terminal running rsync is as follows:

	    Timeout, server equinoxe not responding.
	    rsync: connection unexpectedly closed (949438584 bytes received so far) [receiver]
	    rsync error: error in rsync protocol data stream (code 12) at io.c(228) [receiver=3.2.3]
	    rsync: connection unexpectedly closed (14688411 bytes received so far) [generator]
	    rsync error: unexplained error (code 255) at io.c(228) [generator=3.2.3]
	    rsync: [generator] write error: Broken pipe (32)

	I'm guessing the first line is from ssh, although I have not
	verified that.

	The remote side is running the NetBSD 9.1 release:

	    NetBSD 9.1 (GENERIC) #0: Sun Oct 18 19:24:30 UTC 2020

	The local side is running the most recent -current snapshot:

	    NetBSD 9.99.75 (GENERIC) #0: Sun Nov 8 18:27:14 UTC 2020

	When I first noticed the issue I was running a slightly older
	-current (build ID derived from CVS checkout timestamp):

	    NetBSD 9.99.74 (GENERIC.202010172211Z~GW) #1: Sun Oct 18 02:20:50 EEST 2020

>How-To-Repeat:

	This is the command I ran:

	    rsync -aHSs --delete --exclude /branch/ --exclude /daily/ \
		--exclude /git/ --exclude /hg/ --exclude /releases/ \
		--exclude /work/ --exclude /www/ equinoxe:/p/netbsd/ \
		/p/netbsd/

	Possibly any sufficiently large data transfer will do.

>Fix:

	A successful workaround was to disable SACK on the local side:

	    sysctl -w net.inet.tcp.sack.enable=0

	This transfer was using IPv4, but I did also disable IPv6 SACK:

	    sysctl -w net.inet6.tcp6.sack.enable=0
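
	The sysctl -w settings above do not survive a reboot.  A minimal
	sketch for keeping the workaround in place, assuming the stock
	/etc/sysctl.conf is read at boot; the helper name sack_conf_lines
	is made up for illustration, and the append step is left commented
	out because it needs root and changes system configuration:

```shell
#!/bin/sh
# Sketch only: emit sysctl.conf lines that set both SACK knobs
# (the two variable names are the ones used in the workaround above).
# $1 = desired value: 0 to disable SACK, 1 to enable it
sack_conf_lines() {
	for name in net.inet.tcp.sack.enable net.inet6.tcp6.sack.enable; do
		printf '%s=%s\n' "$name" "$1"
	done
}

# Show what would be added to the configuration file.
sack_conf_lines 0

# To apply (as root), something like:
#   sack_conf_lines 0 >> /etc/sysctl.conf
```

	Running sack_conf_lines 1 prints the lines that turn SACK back on
	once a fix is available.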


