Subject: Re: buffer starvation & the vnd driver
To: R. C. Dowdeswell <elric@imrryr.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 09/02/1999 18:29:56
On Thu, 2 Sep 1999, R. C. Dowdeswell wrote:

> 
> On 936317294 seconds since the Beginning of the UNIX epoch
> Bill Studenmund wrote:
> >
> >Ahhhhh... That's a different problem than what I understood was going on.

Now we're back to the initial problem. Wheee!

> >[note: getblk() shouldn't be calling bawrite()]
> 
> Well, getblk() calls allocbuf() which calls getnewbuf() which starts
> delayed writes with bawrite().  (Just because it is no longer late
> and I just had a look at the code.)

This problem would be fixed with the low water mark stuff Jason mentioned.
We'd then be triggering these writes before we were out of buffers to use
to write them. :-)

> >So what happens then is that we call getblk() on a buffer, and do
> >SET(bp->b_flags, B_BUSY). In the process of doing the msdosfs_bmap(), we
> >end up calling getblk on this buffer AGAIN, see B_BUSY set, and sleep on
> >ourselves.
> 
> This is almost it.  Now that I am more awake, I'll try to describe
> it a bit better.
> 
> We have large file A.  We initiate an action which requires us to
> determine the disk location of lblkno 50, so we begin walking the
> FAT chain (which is a linked list).  At some point, we get to a
> FAT block (say FB1) that is not in the buffer cache, so we try to
> bring it in.  This initiates a write of a dirty buffer of say lblkno
> 75.  To find out where lblkno 75 is, we need to walk the FAT chain
> again.  Now we have lost, since in order to find out the disk addr
> of lblkno 75, we must necessarily load in FB1, but we've already
> marked the buffer busy, and so when we wait on it we hang.
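
To make the re-entry in the scenario above concrete, here is a toy
model of it. This is only a sketch and not NetBSD code: every name in
it (toy_cache, toy_bmap, TOY_EDEADLK, ...) is made up for illustration,
the cache holds a single FAT block, and where the real kernel would
sleep on the busy buffer the model returns an error instead.

```c
#include <stdbool.h>

#define TOY_EDEADLK (-1)   /* hypothetical: "we would sleep on ourselves" */

/* Toy state: one FAT block (FB1) and at most one dirty data buffer. */
struct toy_cache {
    bool fat_busy;      /* stands in for B_BUSY on FB1's buffer */
    bool fat_valid;     /* FB1's contents are in memory */
    int  dirty_lblkno;  /* lblkno of the delayed write, or -1 if none */
};

static int toy_get_fat_block(struct toy_cache *c);

/* Mapping any logical block means walking the FAT, which needs FB1. */
static int toy_bmap(struct toy_cache *c, int lblkno)
{
    (void)lblkno;
    return toy_get_fat_block(c);
}

/* Start the delayed write; like vndstrategy, it has to map the block first. */
static int toy_flush_dirty(struct toy_cache *c)
{
    int error = toy_bmap(c, c->dirty_lblkno);
    if (error == 0)
        c->dirty_lblkno = -1;
    return error;
}

static int toy_get_fat_block(struct toy_cache *c)
{
    int error = 0;

    if (c->fat_busy)
        return TOY_EDEADLK;  /* the real code would sleep here, forever */
    if (c->fat_valid)
        return 0;            /* already cached, nothing to do */

    c->fat_busy = true;      /* we now own FB1's buffer */
    if (c->dirty_lblkno >= 0)
        error = toy_flush_dirty(c);  /* need a buffer: push out the dirty one */
    if (error == 0)
        c->fat_valid = true;
    c->fat_busy = false;
    return error;
}
```

With FB1 uncached and lblkno 75 dirty, toy_bmap(&c, 50) comes back
with TOY_EDEADLK; with no dirty buffer pending, the same call succeeds.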

> >Sounds like much more of an msdosfs bug than I understood it to be, though
> >we should find an FS-I solution.
> >
> >So basically if we want to read in a FAT page, we must write out a page
> >which won't need that page to write.
> 
> I think that bawrite() should be able to determine the disk location
> of a buffer by its struct buf.b_blkno.  So, the real problem that
> we run into here is in the vnd, where this information points to
> a vnd, and so vndstrategy gets called.  vndstrategy calls VOP_BMAP(),
> and hence runs into the problem.

The problem is that the vnd device is written so that it can deal with its
blocks being a multiple of the blocks on the underlying device (if I'm
reading it right), so that this info is a one-to-many mapping, and isn't
that easy to cache. :-( There's also another problem...

> >I see no easy solution.
> 
> Well, if it really is an interaction between the vnd layer and the
> msdosfs, then if we con the vnd layer to remember the physical disk
> locations of buffer pages I think that we have sidestepped the
> issue.

which is: what if the underlying fs decides to move the file around? Say
someone blows a hole in the file (deallocates a block) and reuses the space?
When we go to write, we definitely don't want to write info into the space
where we read it from. :-)

The only solution I can think of to support caching the position would be
if we had some sort of coherency manager a la what Heidemann suggests for
stacked filesystems. With it, the vnd driver could mark that it was
caching the underlying file and thus be informed if the file was modified.
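
Roughly what I have in mind, as a sketch only. To keep it short I've
swapped the "be informed" notification for a generation counter that
the cache checks on every lookup, which is simpler than a real
coherency manager. All of the names here (toy_file, toy_map_cache,
toy_cached_bmap, ...) are hypothetical, and the cache remembers just
one mapping.

```c
#include <stdbool.h>

/* Toy underlying file: gen is bumped whenever a block is moved. */
struct toy_file {
    unsigned gen;
    long     phys[4];   /* physical block for lblkno 0..3 */
};

/* Toy one-entry cache kept by the (hypothetical) vnd side. */
struct toy_map_cache {
    bool     valid;
    unsigned gen;       /* file generation the entry was filled at */
    int      lblkno;
    long     pblkno;
};

/* The expensive path, standing in for VOP_BMAP() on the real file. */
static long toy_slow_bmap(const struct toy_file *f, int lblkno)
{
    return f->phys[lblkno];
}

/* The underlying fs moves a block; the generation bump is the "notice". */
static void toy_reallocate(struct toy_file *f, int lblkno, long new_pblkno)
{
    f->phys[lblkno] = new_pblkno;
    f->gen++;
}

/* Use the cached mapping only if the file hasn't changed since we filled it. */
static long toy_cached_bmap(struct toy_map_cache *c, const struct toy_file *f,
                            int lblkno, bool *hit)
{
    if (c->valid && c->gen == f->gen && c->lblkno == lblkno) {
        *hit = true;
        return c->pblkno;
    }
    *hit = false;
    c->valid = true;
    c->gen = f->gen;
    c->lblkno = lblkno;
    c->pblkno = toy_slow_bmap(f, lblkno);
    return c->pblkno;
}
```

Note that a miss still goes through the slow path, so on its own this
only avoids the re-entry when the mapping is already cached.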

I think the low water mark (technically a high water mark on delayed
writes) is the cleanest way to go for now.
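
For the sake of discussion, a minimal sketch of the idea, assuming a
single counter of delayed-write buffers and a tunable limit. None of
these names (toy_pool, dirty_hiwat, toy_bdwrite) exist in the tree,
and the "write" here completes instantly, which real I/O won't.

```c
#include <stddef.h>

/* Toy buffer pool; every field here is hypothetical. */
struct toy_pool {
    size_t nbuf;         /* total buffers in the pool */
    size_t ndirty;       /* buffers holding delayed writes */
    size_t dirty_hiwat;  /* most delayed writes we allow to pile up */
    size_t nflushed;     /* writes started so far */
};

/* Buffers that can be handed out without starting a write first. */
static size_t toy_clean(const struct toy_pool *p)
{
    return p->nbuf - p->ndirty;
}

/*
 * Mark one more buffer as a delayed write, then start writes until we
 * are back at or below the mark. The decrement stands in for an
 * asynchronous write that finishes immediately.
 */
static void toy_bdwrite(struct toy_pool *p)
{
    p->ndirty++;
    while (p->ndirty > p->dirty_hiwat) {
        p->ndirty--;
        p->nflushed++;
    }
}
```

The point is that with dirty_hiwat below nbuf, some clean buffers are
always left over for whatever the write path itself needs.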

Thoughts?

Take care,

Bill