Subject: Re: LFS writes and network receive (too much splhigh?)
To: Thor Lancelot Simon <firstname.lastname@example.org>
From: Bill Studenmund <email@example.com>
Date: 10/22/2006 18:18:48
Content-Type: text/plain; charset=us-ascii
On Sun, Oct 22, 2006 at 06:28:32PM -0400, Thor Lancelot Simon wrote:
> On Sun, Oct 22, 2006 at 02:56:29PM -0700, Jason Thorpe wrote:
> > On Oct 22, 2006, at 12:41 PM, Thor Lancelot Simon wrote:
> > >What do you think is going on?
> > I'm not sure yet. Are you absolutely sure it's a problem with
> > servicing the interrupt on time?
> Well, if I run systat vmstat (with interval 1) while this is going on,
> when LFS starts to whack the disk, I see the usual 2000-3000 network
> interrupts per second fall off to near zero until the writes (and
> disk controller) drop back to zero.
> So, while LFS is writing, I will see 10, sometimes 30, very occasionally
> as many as 300 interrupts per second on the network controller's interrupt
> line, right as TCP backs off and throughput goes to hell. When LFS isn't
> writing, I see, as I said, 2000-3000 network interrupts per second -- or
> as many as 7,000, if I turn interrupt moderation down.
> So the reasonable inference, to me, really seems to be that LFS, when it
> writes flat-out for 5 or 10 seconds at a time, is causing the network
> interrupts to not be serviced, which is what's causing TCP to back off.
How much buffering is in your application? I ask because another thing that
could be happening is that the file is locked while it's being flushed, so
the program that's reading the network stalls during the flush. That
means no more packet reception.
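One way to test this, sketched under my own assumptions: have the receiving program time each write to the LFS file. If writes block for seconds while LFS flushes, the program is not reading the socket during that time. `timed_write_all()` and `copy_and_count_stalls()` are hypothetical helpers, not anything in NetBSD; the loop uses plain read(2), so it works on a TCP socket or any other descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static int64_t
elapsed_ns(const struct timespec *start, const struct timespec *end)
{
	return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
	    (int64_t)(end->tv_nsec - start->tv_nsec);
}

/*
 * Hypothetical helper: write all of buf to fd (retrying short writes
 * and EINTR) and report in *stall_ns how long that took.
 * Returns 0 on success, -1 on error with errno set.
 */
static int
timed_write_all(int fd, const void *buf, size_t len, int64_t *stall_ns)
{
	const char *p = buf;
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	*stall_ns = elapsed_ns(&t0, &t1);
	return 0;
}

/*
 * Hypothetical receive loop: copy in_fd (e.g. the TCP socket) to out_fd
 * (e.g. the file on LFS), counting writes that blocked longer than
 * threshold_ns.  Returns bytes copied, or -1 on error.
 */
static long long
copy_and_count_stalls(int in_fd, int out_fd, int64_t threshold_ns,
    int *stalls)
{
	char buf[65536];
	long long total = 0;

	*stalls = 0;
	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			break;		/* peer closed the connection */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		int64_t ns;
		if (timed_write_all(out_fd, buf, (size_t)n, &ns) < 0)
			return -1;
		if (ns > threshold_ns) {
			/* While we sat in write(), nobody was reading the socket. */
			(*stalls)++;
			fprintf(stderr, "write blocked %lld ms\n",
			    (long long)(ns / 1000000));
		}
		total += n;
	}
	return total;
}
```

With the socket as in_fd, the LFS file as out_fd, and a threshold of, say, 100 ms (100000000 ns), stall reports that line up with the LFS write bursts in systat would point at the application rather than at interrupt latency.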
This scenario would be noticeable either by monitoring netstat to see what
the connection queue lengths (the Recv-Q column) look like, or by monitoring
the TCP stream to see if the LFS box is explicitly shrinking the window (i.e.
the stack noticed that the app's not reading for the moment).