tech-kern archive


WAPBL/cache flush and mfi(4)

While benchmarking mfi(4) on a new Thunderbolt-based controller,
I found that WAPBL badly affects performance, and in some cases
makes the system stall.

For these tests, I have 3 logical drives:
- sd0, 30GB boot volume
- sd1, 10TB
- sd2, 3.5TB

sd0 and sd1 are from the same disk array (raid-6), sd2 is from another
disk array, but all are on the same controller mfi0.

filesystems are:
/dev/sd0a on / type ffs (log, local)
/dev/sd0f on /var type ffs (log, local)
/dev/sd0e on /usr type ffs (log, local)
/dev/dk0 on /home type ffs (local)
/dev/dk1 on /data1 type ffs (local)
/dev/dk2 on /data2 type ffs (local)

dk0 and dk1 are wedges from sd1, dk2 is a wedge from sd2.

First, I was a bit disappointed by performance compared to an older
system with a perc/5 controller. Then I noticed that while running
bonnie++ on /data2, some processes would hang in tstile.
I tracked it down to this scenario:
- WAPBL wants to issue a cache flush for one of the fs. It's doing this
  with the filesystem's journal lock held.
- the cache flush hits mfi0. This is translated to a command which
  flushes the controller+disk caches. But this command flushes the
  whole controller's cache, not only data associated with sd0.
  With 1GB of dirty data to write back, it takes 2 to 3 seconds.
- to make things worse, bonnie++ keeps writing to /data2. The controller
  accepts these commands, adding new data to the cache as soon as
  the cache flush has freed some space. So the cache flush, in fact,
  doesn't complete as long as more writes are coming in.
- as the journal is locked, processes doing an operation requiring the
  journal hang in tstile.
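The controller-wide flush cost can be observed independently of WAPBL by
timing a manual cache sync on one logical drive while a write load runs on
another. A diagnostic sketch (device names taken from the setup above; the
exact timings are of course setup-dependent):

```shell
# While bonnie++ is writing to sd2, flush the cache of the boot volume.
# If the controller flushes its whole cache rather than just sd0's data,
# this takes seconds instead of milliseconds.
time dkctl sd0 synccache

# Meanwhile, processes blocked on the journal lock show up in tstile:
ps -axl | grep tstile
```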

An obvious workaround is setting vfs.wapbl.flush_disk_cache to 0.
This is safe, because the controller has a battery backup.
This also has a positive impact on performance, and I'm now happy
with this controller (the system with the perc/5 doesn't have any WAPBL
filesystems, which explains why it performs well).
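For reference, the workaround is a one-liner, made persistent via
sysctl.conf:

```shell
# Disable WAPBL's disk cache flushes. Only safe when the controller's
# write cache is battery-backed (or otherwise non-volatile).
sysctl -w vfs.wapbl.flush_disk_cache=0

# Keep the setting across reboots:
echo 'vfs.wapbl.flush_disk_cache=0' >> /etc/sysctl.conf
```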

But I feel that something more fine-grained is needed: I could have
an ahci(4) controller with the boot disk, and data disks on mfi(4).
Then I'd want WAPBL flushes to really flush on ahci(4) but not on mfi(4).

How do you think this should be handled? A sysctl to control
mfi's behavior per-controller (possibly with a default depending on BBU
status), or a vfs.wapbl.flush_disk_cache per mount point
(with a mount option, maybe)?
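From the admin's side, the two options might look like this (both the
sysctl node and the mount option name are hypothetical, just to
illustrate the interfaces being discussed):

```shell
# Option 1 (hypothetical): per-controller knob, defaulting from BBU state
sysctl -w hw.mfi0.flush_disk_cache=0

# Option 2 (hypothetical): per-mount-point control via a mount option
mount -o log,nodiskflush /dev/dk2 /data2
```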

Manuel Bouyer <>
     NetBSD: 26 years of experience will always make the difference
