tech-kern: Re: LFS writes and network receive (too much splhigh?)

Subject: Re: LFS writes and network receive (too much splhigh?)
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/22/2006 18:18:48

On Sun, Oct 22, 2006 at 06:28:32PM -0400, Thor Lancelot Simon wrote:
> On Sun, Oct 22, 2006 at 02:56:29PM -0700, Jason Thorpe wrote:
> >
> > On Oct 22, 2006, at 12:41 PM, Thor Lancelot Simon wrote:
> >
> > >What do you think is going on?
> >
> > I'm not sure yet.  Are you absolutely sure it's a problem with
> > servicing the interrupt on time?
>
> Well, if I run systat vmstat (with interval 1) while this is going on,
> when LFS starts to whack the disk, I see the usual 2000-3000 network
> interrupts per second fall off to near zero until the writes (and
> disk controller) drop back to zero.
>
> So, while LFS is writing, I will see 10, sometimes 30, very occasionally
> as many as 300 interrupts per second on the network controller's interrupt
> line, right as TCP backs off and throughput goes to hell.  When LFS isn't
> writing, I see, as I said, 2000-3000 network interrupts per second -- or
> as many as 7,000, if I turn interrupt moderation down.
>
> So the reasonable inference, to me, really seems to be that LFS, when it
> writes flat-out for 5 or 10 seconds at a time, is causing the network
> interrupts to not be serviced, which is what's causing TCP to back off.

How much buffering is in your application? I ask because another thing
that could be happening is that the file is locked while it's being
flushed, so that the program that's reading the network stalls during
this flush. That means no more packet reception.

This scenario would be noticeable either by monitoring netstat to see what
the connection queue lengths look like, or by monitoring the TCP stream to
see if the LFS box is explicitly shrinking the window (i.e. the stack
noticed that the app's not reading for the moment).
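
As a rough sketch of the netstat check (an illustration only, not something
anyone in this thread ran): the Python below scans text in the usual
"netstat -an" column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign
Address) and reports TCP sockets with bytes waiting in Recv-Q, i.e. data the
stack has received but the application has not yet read. The function name,
the threshold parameter and the sample addresses are all hypothetical.

```python
def stalled_receivers(netstat_text, min_recv_q=1):
    """Return (local, foreign, recv_q) for TCP sockets with Recv-Q >= min_recv_q.

    Assumes the common column order: Proto Recv-Q Send-Q Local Foreign [State].
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip banners, column headers and non-TCP sockets.
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q = int(fields[1])
        if recv_q >= min_recv_q:
            hits.append((fields[3], fields[4], recv_q))
    return hits


# Made-up sample output; real addresses and queue sizes will differ.
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        State
tcp        0      0  10.0.0.1.22            10.0.0.2.51234         ESTABLISHED
tcp    32768      0  10.0.0.1.5001          10.0.0.3.40000         ESTABLISHED
"""

if __name__ == "__main__":
    for local, foreign, queued in stalled_receivers(SAMPLE):
        print(local, foreign, queued)
```

If Recv-Q on the receiving connection climbs while LFS is writing and drains
once the flush ends, that would point at the application stalling rather
than at missed interrupts.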

Take care,

Bill
