tech-kern archive


Re: RB tree in the buffer cache



I thought about this some more and came to the conclusion that it's a half
measure; to do it properly, the buffer cache should catch up with the page
cache (per-vnode locking and a radix tree).  That's a fair bit of effort, so
I'm going to leave this change for now.
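
To make the direction concrete, here is a minimal sketch of what I mean -
not code from any patch, and the v_bufs field is invented - with each vnode
carrying its own index of buffers keyed by logical block number, protected
by its own lock rather than bufcache_lock, much as the page cache keys
pages by their offset:

	/*
	 * Hypothetical incore() with a per-vnode radix tree, using the
	 * radixtree(9) interface the page cache uses.  Only the vnode's
	 * own lock is taken; bufcache_lock and the global bufhash are
	 * gone from the lookup path.  Locking is spelled as a mutex here
	 * for simplicity, and negative lblkno values (indirect blocks)
	 * would need mapping before use as a radix tree key.
	 */
	struct buf *
	incore(struct vnode *vp, daddr_t blkno)
	{
		struct buf *bp;

		mutex_enter(vp->v_interlock);
		bp = radix_tree_lookup_node(&vp->v_bufs, (uint64_t)blkno);
		mutex_exit(vp->v_interlock);

		return bp;
	}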

I still plan to merge the change that makes sync-on-shutdown wait on vnode
output instead of buffer output, since that's needed with whatever scheme is
chosen to replace the global hash.

Andrew

On Sat, Apr 04, 2020 at 10:53:12PM +0000, Andrew Doran wrote:

> Despite my repeatedly messing around with the hash function over the last
> few months, and despite the table being ~64MB, bufhash often has long chains
> on my test system.
> 
> And whatever is done to the hash function, it will always throw away the
> valuable natural partitioning that's there to begin with: buffers are cached
> with specific vnodes:
> 
> # vmstat -H
>                     total     used     util      num  average  maximum
> hash table        buckets  buckets        %    items    chain    chain
> bufhash           8388608    24222     0.29    36958     1.53      370
> in_ifaddrhash         512        2     0.39        2     1.00        1
> uihash               1024        4     0.39        4     1.00        1
> vcache_hashmask   8388608   247582     2.95   252082     1.02        2
> 
> Changing this to use a per-vnode index makes use of that partitioning, and
> moves things closer to the point where bufcache_lock can be replaced by
> v_interlock in most places (I have not made that replacement yet - it's a
> decent chunk of work).
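> 
> To illustrate the shape of it (names invented here, not lifted from the
> diff): with rbtree(9), each vnode gets its own tree of buffers keyed on
> b_lblkno, so the per-vnode index boils down to a pair of comparators and
> an rb_tree embedded in the vnode:
> 
> 	/* Hypothetical rb_node embedded in struct buf, keyed on b_lblkno. */
> 	static signed int
> 	buf_compare_nodes(void *ctx, const void *n1, const void *n2)
> 	{
> 		const struct buf *b1 = n1, *b2 = n2;
> 
> 		if (b1->b_lblkno < b2->b_lblkno)
> 			return -1;
> 		return b1->b_lblkno > b2->b_lblkno;
> 	}
> 
> 	static signed int
> 	buf_compare_key(void *ctx, const void *n, const void *key)
> 	{
> 		const struct buf *bp = n;
> 		const daddr_t blkno = *(const daddr_t *)key;
> 
> 		if (bp->b_lblkno < blkno)
> 			return -1;
> 		return bp->b_lblkno > blkno;
> 	}
> 
> 	static const rb_tree_ops_t buf_rbtree_ops = {
> 		.rbto_compare_nodes = buf_compare_nodes,
> 		.rbto_compare_key = buf_compare_key,
> 		.rbto_node_offset = offsetof(struct buf, b_rbnode),	/* new field */
> 		.rbto_context = NULL,
> 	};
> 
> 	/* Per vnode, in place of hashing into the global bufhash: */
> 	rb_tree_init(&vp->v_buftree, &buf_rbtree_ops);		/* new field */
> 
> A lookup is then rb_tree_find_node(&vp->v_buftree, &blkno), which touches
> no global state at all.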
> 
> Despite the best efforts of all involved, the buffer cache code is a bit of
> a minefield because of the things its users do.  For example, LFS and WAPBL
> want some combination of I/O buffers for COW-type stuff, pre-allocated
> memory, and inclusion on the vnode buf lists (presumably for vflushbuf()),
> but those buffers shouldn't be returned via incore().  To handle that case I
> added a new flag, BC_IOBUF.
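> 
> Illustratively (again, not the exact code from the diff), the association
> step can then just keep such buffers out of the lookup index while still
> putting them on the vnode's buf list:
> 
> 	if ((bp->b_cflags & BC_IOBUF) == 0) {
> 		/* Ordinary cache buffer: visible to incore()/getblk(). */
> 		rb_tree_insert_node(&vp->v_buftree, bp);
> 	}
> 	/*
> 	 * Either way the buffer still goes onto vp->v_cleanblkhd or
> 	 * v_dirtyblkhd, so vflushbuf() and friends continue to see it.
> 	 */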
> 
> I don't have performance measurements, but from lockstat I see about a
> 5-10x reduction in contention on bufcache_lock in my tests, which suggests
> to me that less time is being spent in incore().
> 
> Changes here:
> 
> 	http://www.netbsd.org/~ad/2020/bufcache.diff
> 
> Comments welcome.
> 
> Thanks,
> Andrew

