Subject: Re: Buffer cache oddities
To: None <tech-kern@NetBSD.ORG, tls@rek.tjls.com>
From: Wolfgang Solfrank <ws@kurt.tools.de>
List: tech-kern
Date: 12/10/1996 15:24:13
> One of my suspicions was immediately confirmed: There are *NO* buffers larger
> than 8K on my system, ever.  The machine in question is a fairly small-memory
> machine, so I have nbuf=bufpages; however, within a minute or so of runtime,
> about 2/3 of the buffers are 8k, as I'd expect (all of my filesystems have 8K
> blocks), and about 1/3 remain at 4K.  This is the in-use buffers; the 8K
> buffers have been created, of course, by stripping pages from other buffers;
> at the default size of one page (4K, on the i386), this means that a lot of
> buffers went away onto the EMPTY list.

Yes, I have been suspicious of those EMPTY buffers myself for quite a while.
It doesn't make much sense to allocate buffer headers that will only ever
sit on the empty queue (although it's only(?) headers, so the loss isn't
too severe).

> Although diagrams in the 4.4 book seem to show much larger buffers than 8K,
> after reading through vfs_bio.c a number of times I don't exactly understand
> how such buffers would in fact be used, since buffers are indexed by
> filesystem and logical block number.  Were buffers larger than a filesystem 
> block to exist, how could one ensure that getblk() would in fact find the 
> buffer containing block X, if it's not the first block of the buffer?  And 
> since getblk() doesn't return an offset into the buffer, how would one 
> _access_ such blocks?

The filesystem code has to guarantee that buffers are accessed with the same
granularity (as long as they might be in the buffer cache, to be exact).
E.g. if the filesystem code ever requests a buffer at, say, device block number
1234 with a size of 3 device blocks, it must not request a buffer at device
block number 1236 unless it guarantees that the previous buffer has been
invalidated first.  However, different buffers may have different sizes.
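
To illustrate the rule, here is a minimal sketch in C of a cache that, like
the buffer cache, is looked up by starting block number only.  All names
(toy_buf, toy_incore, toy_enter, toy_overlap) are hypothetical and the
structure is far simpler than the real struct buf; it only shows why a
request for block 1236 cannot find the 3-block buffer that starts at 1234.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer header; NOT the real struct buf. */
struct toy_buf {
    long blkno;  /* starting device block number (the lookup key) */
    int  nblks;  /* length in device blocks */
    int  valid;  /* nonzero while the buffer is in the cache */
};

#define TOY_NBUF 8
static struct toy_buf toy_cache[TOY_NBUF];

/* Enter a buffer into the first free slot; returns NULL if the cache is full. */
static struct toy_buf *toy_enter(long blkno, int nblks)
{
    assert(nblks > 0);
    for (size_t i = 0; i < TOY_NBUF; i++) {
        if (!toy_cache[i].valid) {
            toy_cache[i].blkno = blkno;
            toy_cache[i].nblks = nblks;
            toy_cache[i].valid = 1;
            return &toy_cache[i];
        }
    }
    return NULL;
}

/* Look up by starting block number only, the way a keyed cache would. */
static struct toy_buf *toy_incore(long blkno)
{
    for (size_t i = 0; i < TOY_NBUF; i++)
        if (toy_cache[i].valid && toy_cache[i].blkno == blkno)
            return &toy_cache[i];
    return NULL;
}

/* Find a cached buffer that covers blkno without starting at it, if any. */
static struct toy_buf *toy_overlap(long blkno)
{
    for (size_t i = 0; i < TOY_NBUF; i++) {
        struct toy_buf *bp = &toy_cache[i];
        if (bp->valid && blkno > bp->blkno && blkno < bp->blkno + bp->nblks)
            return bp;
    }
    return NULL;
}
```

After toy_enter(1234, 3), toy_incore(1236) finds nothing even though
toy_overlap(1236) shows the data is already cached.  A filesystem that
mixed granularities would end up with two buffers for the same device
block, which is exactly what the rule above forbids.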

Indeed, the ffs code accesses buffers at a granularity of (ffs) blocks, with
the exception of the trailing fragments of a file, which are accessed through
one buffer that contains all the fragments beyond the last full (ffs) block
of that file.
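
As a sketch of that sizing rule (similar in spirit to the blksize() macro in
the ffs sources, but simplified: it ignores the distinction between direct
and indirect blocks), the following C function computes the size of the
buffer for a given logical block of a file.  The names are hypothetical, and
the 1K fragment size is an assumption chosen for illustration; the 8K block
size matches the system discussed above.

```c
#include <assert.h>

#define TOY_FFS_BSIZE 8192L  /* ffs block size, as on the quoted system */
#define TOY_FFS_FSIZE 1024L  /* fragment size: assumed for illustration */

/*
 * Size in bytes of the buffer holding logical block lbn of a file that is
 * filesize bytes long.  Full blocks get a full-block buffer; the tail of
 * the file gets one buffer covering all of its trailing fragments.
 */
static long toy_blksize(long filesize, long lbn)
{
    /* The block must actually contain file data. */
    assert(filesize > 0 && lbn >= 0 && lbn * TOY_FFS_BSIZE < filesize);

    if (filesize >= (lbn + 1) * TOY_FFS_BSIZE)
        return TOY_FFS_BSIZE;              /* a full (ffs) block */

    long tail = filesize % TOY_FFS_BSIZE;  /* bytes past the last full block */
    /* Round the tail up to a whole number of fragments. */
    return (tail + TOY_FFS_FSIZE - 1) / TOY_FFS_FSIZE * TOY_FFS_FSIZE;
}
```

For a 20000-byte file, blocks 0 and 1 get 8192-byte buffers, and block 2
(3616 bytes of data) gets a single 4096-byte buffer spanning four fragments.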

Note that while buffers physically contain memory in multiples of the page
size of the machine (4k on most machines), logically they contain
memory in multiples of the device block size (typically 512B).
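
The two roundings can be sketched in C as follows.  The helper names are
hypothetical, and the constants are simply the typical values mentioned
above (512-byte device blocks, 4k pages), not values read from any header.

```c
#include <assert.h>

#define TOY_DEV_BSIZE 512L   /* device block size (typical value) */
#define TOY_PAGE_SIZE 4096L  /* machine page size (4k, e.g. on the i386) */

/* Round n up to the next multiple of unit. */
static long toy_roundup(long n, long unit)
{
    assert(n >= 0 && unit > 0);
    return (n + unit - 1) / unit * unit;
}

/* Logical size: what the buffer is considered to contain. */
static long toy_logical_size(long nbytes)
{
    return toy_roundup(nbytes, TOY_DEV_BSIZE);
}

/* Physical size: how much memory has to back the buffer. */
static long toy_physical_size(long nbytes)
{
    return toy_roundup(toy_logical_size(nbytes), TOY_PAGE_SIZE);
}
```

So a 3072-byte buffer (six device blocks) is logically 3072 bytes but still
occupies one full 4096-byte page.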

> It doesn't seem to matter, since the read-ahead code allocates a buffer per
> block, so I can't see what would ever read in such a >8K buffer anyhow.
> 
> After running my system for about fifteen minutes doing a pretty heavy mix of
> I/O (several compiles, a large CVS checkout, deliberately induced heavy
> paging) I in fact still have no buffers larger than 8K.
> 
> I just did a bunch of dd's from raw (shouldn't do anything, right?) and block
> devices with bs=16k and bs=64k.  Still no buffers larger than 8K in the buffer
> cache.
> 
> So what did increasing MAXBSIZE change, exactly?  I must be misunderstanding
> something very fundamental.  Please humor me.

For one thing, if you look into the cluster code, it allocates buffers larger
than the (ffs) block size to do the I/O in one chunk.  Only after the I/O is
complete are the pages of that buffer redistributed into individual buffers,
one per block, so that each block can be retrieved separately.
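
This also explains why no large buffers are visible afterwards.  A rough
sketch of the idea in C, with hypothetical names throughout: the real
cluster code moves pages between buffer headers, whereas this toy version
merely hands out pointers into the one large region.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CL_BSIZE 8192  /* (ffs) block size assumed for the sketch */

/* Hypothetical per-block descriptor; NOT the real struct buf. */
struct toy_cbuf {
    long   lblkno;  /* logical block number this piece is filed under */
    char  *data;    /* start of this block's data */
    size_t size;    /* always one block here */
};

/*
 * Split one contiguous transfer of len bytes, starting at logical block
 * start_lbn, into per-block descriptors.  Returns the number of
 * descriptors written to out (at most max_out).
 */
static size_t toy_split_cluster(char *base, size_t len, long start_lbn,
                                struct toy_cbuf *out, size_t max_out)
{
    assert(len % TOY_CL_BSIZE == 0);  /* whole blocks only */

    size_t n = len / TOY_CL_BSIZE;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; i++) {
        out[i].lblkno = start_lbn + (long)i;
        out[i].data = base + i * TOY_CL_BSIZE;
        out[i].size = TOY_CL_BSIZE;
    }
    return n;
}
```

A 32k transfer starting at logical block 100 thus ends up as four 8k
pieces filed under blocks 100 through 103, and the large buffer itself
exists only for the duration of the I/O.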

Additionally, there are filesystems (currently only msdosfs, I think)
that require a buffer size of at least 32k in order to support all possible
incarnations of themselves.

> A more surprising revelation is that the AGE list is always empty.  Again, is
> this correct?  Why?

Hmm, I just found that one out myself.  It doesn't look correct, but I'll
have to look into this further before I can offer an analysis.

Hope this makes at least some things clear.

Bye,
Wolfgang
--
ws@TooLs.DE     (Wolfgang Solfrank, TooLs GmbH) 	+49-228-985800