tech-net: LFS writes and network receive (too much splhigh?)

Subject: LFS writes and network receive (too much splhigh?)
To: None <tech-kern@netbsd.org, tech-net@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-net
Date: 10/22/2006 15:07:02

If I restore a backup full of large files over the network onto LFS (files
large enough that the smooth syncer, which schedules writes when _a file's_
oldest data is 30 seconds old, has a lot of data to write at once), the
following thing happens:

1) Writes queue up for 30 seconds (this is a design flaw in the smooth syncer)

2) Every 30 seconds, LFS writes flat-out for a few seconds

3) While this is going on, the network interface interrupt rate falls off
   dramatically, almost to zero, while the disk interface interrupt rate,
   of course, rises.
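
To make steps 1 and 2 concrete, here is a toy model of the queue-then-burst
pattern.  It is only a sketch, not the kernel's syncer code: the function
name, the one-second tick, the 10 MB/s write rate, and the assumption that
all of a file's dirty data goes out in one go once its oldest data reaches
30 seconds are all made up for illustration.

```python
def simulate(duration_s, rate_mb_per_s, delay_s=30):
    """Toy model of one large file being written continuously.

    Assumption (not taken from the kernel source): all of the file's dirty
    data is flushed in one burst once its oldest dirty data is delay_s old.
    Returns a list of (time_s, burst_size_mb) tuples.
    """
    flushes = []
    dirty_mb = 0      # data queued but not yet written to disk
    oldest = None     # time at which the oldest queued data arrived

    for t in range(duration_s):
        # The syncer fires only when the oldest dirty data is old enough.
        if oldest is not None and t - oldest >= delay_s:
            flushes.append((t, dirty_mb))
            dirty_mb = 0
            oldest = None

        # The restore keeps producing data at a steady rate.
        if oldest is None:
            oldest = t
        dirty_mb += rate_mb_per_s

    return flushes


flushes = simulate(duration_s=100, rate_mb_per_s=10)
for t, mb in flushes:
    print(f"t={t:3d}s  burst of {mb} MB")
```

In this model nothing is written for 30 seconds, then 300 MB goes out at
once, and the cycle repeats.  That is the shape of the problem, whatever
the real numbers are.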

I assume the network interrupts are being masked during the segment writes,
either by LFS itself or by the disk driver.  The exasperating result is
dropped packets, which make the TCP window slam open and shut.

How can we fix this?  The smooth syncer issue is really separate: it just
makes it easier to demonstrate the problem.

Could LFS use more locks so that it spent less time at high IPL?  Or is
this really a problem in my disk device driver (amr)?  I have heard
similar reports from other LFS users with different disk hardware.

-- 
  Thor Lancelot Simon	                                     tls@rek.tjls.com

  "We cannot usually in social life pursue a single value or a single moral
   aim, untroubled by the need to compromise with others."      - H.L.A. Hart