Subject: Re: buffer starvation & the vnd driver
To: Bill Studenmund <wrstuden@nas.nasa.gov>
From: R. C. Dowdeswell <elric@mabelode.imrryr.org>
List: tech-kern
Date: 09/02/1999 18:02:37
On 936317294 seconds since the Beginning of the UNIX epoch
Bill Studenmund wrote:
>
>> Well, I'll chime in here with the basic symptoms.  (I don't have
>> the machine with me, but...)  It hangs.  ddb tells me that
>> msdosfs_bmap() calls getblk() calls bawrite() to write a block out,
>> which calls vndstrategy() which calls msdosfs_bmap() which calls
>> getblk() which hangs waiting for the previous getblk() to finish.
>> And then all disk activity stops.
>
>Ahhhhh... That's a different problem than what I understood was going on.
>
>[note: getblk() shouldn't be calling bawrite()]

Well, getblk() calls allocbuf() which calls getnewbuf() which starts
delayed writes with bawrite().  (I can say that now because it is no
longer late and I have just had a look at the code.)

>So what happens then is that we call getblk() on a buffer, and do
>SET(bp->b_flags, B_BUSY). In the process of doing the msdosfs_bmap(), we
>end up calling getblk on this buffer AGAIN, see B_BUSY set, and sleep on
>ourselves.

This is almost it.  Now that I am more awake, I'll try to describe
it a bit better.

We have large file A.  We initiate an action which requires us to
determine the disk location of lblkno 50, so we begin walking the
FAT chain (which is a linked list).  At some point, we get to a
FAT block (say FB1) that is not in the buffer cache, so we try to
bring it in.  This initiates a write of a dirty buffer of say lblkno
75.  To find out where lblkno 75 is, we need to walk the FAT chain
again.  Now we have lost, since in order to find out the disk addr
of lblkno 75, we must necessarily load in FB1, but we've already
marked the buffer busy, and so when we wait on it we hang.
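
The cycle above can be sketched as a toy model in userland C.  This is
only an illustration, not kernel code: every toy_* name is made up,
the busy flag merely stands in for B_BUSY, and where the kernel would
sleep forever the model returns an error so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the toy model. */
enum { TOY_OK = 0, TOY_SELF_WAIT = -1 };

/* Toy stand-in for a struct buf holding the FAT block FB1. */
struct toy_buf {
    bool busy;   /* stands in for B_BUSY */
    bool valid;  /* contents have been read in */
};

struct toy_buf toy_fb1;
bool toy_dirty_pending;  /* a delayed write (lblkno 75) awaits flushing */

int toy_bmap(long lblkno);

void toy_reset(bool dirty_pending)
{
    toy_fb1.busy = false;
    toy_fb1.valid = false;
    toy_dirty_pending = dirty_pending;
}

/* Models getblk() on FB1, including the flush done to get a free buffer. */
int toy_getblk(struct toy_buf *bp)
{
    int rc = TOY_OK;

    assert(bp != NULL);
    if (bp->busy)
        return TOY_SELF_WAIT;   /* the real kernel would sleep here */
    bp->busy = true;
    if (!bp->valid && toy_dirty_pending) {
        /* Flushing lblkno 75 goes through the vnd, which needs bmap. */
        toy_dirty_pending = false;
        rc = toy_bmap(75);
    }
    if (rc == TOY_OK)
        bp->valid = true;
    bp->busy = false;
    return rc;
}

/* Models msdosfs_bmap(): any lookup in file A has to walk through FB1. */
int toy_bmap(long lblkno)
{
    (void)lblkno;
    return toy_getblk(&toy_fb1);
}
```

After toy_reset(true), toy_bmap(50) returns TOY_SELF_WAIT, which is the
hang; after toy_reset(false) there is no dirty buffer to flush and the
same call returns TOY_OK.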

>Sounds like much more of an msdosfs bug than I understood it to be, though
>we should find an FS-I solution.
>
>So basically if we want to read in a FAT page, we must write out a page
>which won't need that page to write.

I think that bawrite() should be able to determine the disk location
of a buffer by its struct buf.b_blkno.  So, the real problem that
we run into here is in the vnd, where this information points to
a vnd, and so vndstrategy gets called.  vndstrategy calls VOP_BMAP(),
and hence runs into the problem.

>I see no easy solution.

Well, if it really is an interaction between the vnd layer and the
msdosfs, then if we con the vnd layer into remembering the physical
disk locations of buffer pages I think that we have sidestepped the
issue.
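
As a rough sketch of what "remembering" could mean, here is a small
direct-mapped cache from logical to physical block numbers.  This is
not how the vnd driver is written; all toy_* names, the table size and
the fake 1000 + lblkno mapping are assumptions for illustration, and a
real version would have to invalidate entries when the backing file
changes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NMAP  8        /* hypothetical cache size */
#define TOY_NOBLK (-1L)    /* "no mapping remembered" */

struct toy_map_entry {
    long lblkno;   /* logical block in the backing file */
    long pblkno;   /* physical block on the underlying disk */
    int  used;
};

struct toy_vnd {
    struct toy_map_entry map[TOY_NMAP];
    int bmap_calls;   /* counts trips into the file system */
};

/* Stand-in for VOP_BMAP(); the mapping itself is invented. */
long toy_fs_bmap(struct toy_vnd *v, long lblkno)
{
    v->bmap_calls++;
    return 1000 + lblkno;
}

void toy_vnd_init(struct toy_vnd *v)
{
    size_t i;

    for (i = 0; i < TOY_NMAP; i++)
        v->map[i].used = 0;
    v->bmap_calls = 0;
}

/* Return the remembered physical block, or TOY_NOBLK on a miss. */
long toy_vnd_lookup(const struct toy_vnd *v, long lblkno)
{
    const struct toy_map_entry *e;

    e = &v->map[(unsigned long)lblkno % TOY_NMAP];
    return (e->used && e->lblkno == lblkno) ? e->pblkno : TOY_NOBLK;
}

void toy_vnd_remember(struct toy_vnd *v, long lblkno, long pblkno)
{
    struct toy_map_entry *e;

    e = &v->map[(unsigned long)lblkno % TOY_NMAP];
    e->lblkno = lblkno;
    e->pblkno = pblkno;
    e->used = 1;
}

/* Strategy-time lookup: only a miss falls back to the file system. */
long toy_vnd_resolve(struct toy_vnd *v, long lblkno)
{
    long pblkno = toy_vnd_lookup(v, lblkno);

    if (pblkno == TOY_NOBLK) {
        pblkno = toy_fs_bmap(v, lblkno);
        toy_vnd_remember(v, lblkno, pblkno);
    }
    return pblkno;
}
```

The point is that a write of lblkno 75 whose location was remembered
when the block was first mapped never re-enters the bmap code, so it
cannot end up waiting on FB1.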

>Thoughts?
>
>Take care,
>
>Bill
>
>

 == Roland Dowdeswell                      http://www.Imrryr.ORG/~elric/  ==
 == The Unofficial NetBSD Web Pages        http://www.Imrryr.ORG/NetBSD/  ==
 == The NetBSD Project                            http://www.NetBSD.ORG/  ==