tech-kern archive


What is the best layer/device for a write-back cache based on NVRAM?



This is a continuation of the thread "Is factible to implement full
writes of stripes to raid using NVRAM memory in LFS":
http://mail-index.netbsd.org/tech-kern/2016/08/18/msg020982.html

I want to discuss in which layer a write-back cache should be
located. It will usually be used for RAID configurations, as a
general-purpose device: with any type of filesystem, or raw.

Before discussing the different options, I want to present the
benefits that I think a write-back cache must provide, so that we can
check whether each option can support them.

1- There is no need to use a parity map for RAID 1/10/5/6. Usually
the impact is small, but it can be noticeable on busy servers.
  a) There is no parity to rebuild; the parity is always up to date.
Less downtime in case of an OS crash / power failure / hardware failure.
  b) Better performance for RAID 1/5/6: it isn't necessary to update
the parity map, because it doesn't exist.

2- For scattered writes contained within the same stripe, it allows
the number of writes to be reduced. With RAID 5/6 there is an
additional advantage: the parity is written only once for several
writes to the same stripe, instead of once for every write to that
stripe.
3- It allows several writes that together cover the full length of a
stripe to be consolidated into one write, without reading the old
data or parity. This can be the case for log-structured file systems
such as LFS, and allows a RAID 5/6 to be used with performance
similar to a RAID 0 (see the rough comparison after this list).
4- Faster synchronous writes.
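
To make benefit 3 concrete, here is a back-of-the-envelope comparison
(my own arithmetic, not something measured or taken from the previous
thread) of the disk I/Os needed to update one RAID 5 stripe with 4
data disks plus parity, with and without the cache:

#include <stdio.h>

/* Without the cache every small write is a read-modify-write:
 * read old data + read old parity + write data + write parity. */
static unsigned
small_write_ios(unsigned nwrites)
{
	return nwrites * 4;
}

/* With the cache the writes are consolidated into one full-stripe
 * write: write every data disk once + write parity once, no reads. */
static unsigned
full_stripe_ios(unsigned ndata)
{
	return ndata + 1;
}

int
main(void)
{
	unsigned ndata = 4;	/* 4 data disks + 1 parity disk */

	printf("read-modify-write: %u I/Os, full stripe: %u I/Os\n",
	    small_write_ios(ndata), full_stripe_ios(ndata));
	return 0;
}

For 4 small writes that happen to cover the whole stripe this prints
16 I/Os versus 5, and the 8 reads disappear entirely, which is why a
RAID 5/6 can get close to RAID 0 behaviour for this workload.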

The proposed layer must support the following (a rough data-structure
sketch follows the list):

A- It must be able to obtain the RAID configuration of the RAID
device backing the write-back cache. If it is a RAID 0/1, it will
cache portions of the size of the interleave (the stripe unit). If it
is a RAID 5/6, it will cache portions of the size of a full stripe.

B- It can use the buffer cache to avoid read/modify/write cycles, and
issue only writes when the data that would otherwise have to be read
is already in memory.

C- Several devices can share the same write-back cache device ->
optimal and easy to configure. There is no need to hard-partition an
NVRAM device into smaller devices, with one partition over-used and
another under-used.

D- In the case of file systems such as LFS, the following
optimization would be useful: when a stripe is complete in the cache,
write it out promptly, because it won't be written to again.

E- It can be useful to use elevator algorithms when flushing writes
from the cache to the RAID.
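
To make A and D a bit more concrete, this is the kind of per-stripe
bookkeeping I have in mind. Everything below is a hypothetical
sketch, not existing code; in particular I am assuming the layer can
ask the backing device for its geometry in some way (for RAIDframe,
something along the lines of what raidctl already obtains through its
ioctls), which is exactly requirement A:

#include <sys/types.h>	/* daddr_t */
#include <stdint.h>

/* One slot of the NVRAM write-back cache, covering one stripe of the
 * backing RAID.  A 32-bit bitmap limits a stripe to 32 stripe units,
 * which is enough for a sketch. */
struct wbc_stripe {
	daddr_t		ws_stripe_no;	/* stripe number on the backing RAID */
	uint32_t	ws_valid;	/* bitmap of stripe units present in NVRAM */
	uint32_t	ws_full_mask;	/* ws_valid value when the stripe is complete */
	daddr_t		ws_nvram_blk;	/* where the data lives on the NVRAM device */
};

struct wbc_softc {
	/* geometry obtained from the backing device (requirement A) */
	uint32_t	sc_stripe_unit;	/* interleave, in DEV_BSIZE blocks */
	uint32_t	sc_ndata;	/* data columns; 1 for RAID 0/1 */
	/* ... plus some lookup structure for the wbc_stripe slots ... */
};

/* Requirement D: once every stripe unit of a slot is present, the
 * stripe can be flushed as a single full-stripe write and will not
 * be touched again (the LFS case). */
static int
wbc_stripe_is_full(const struct wbc_stripe *ws)
{
	return ws->ws_valid == ws->ws_full_mask;
}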



These are the three options proposed by Thor. I would like to know
which option you think is best:

1- Implement the write-back cache as a generic pseudo-disk. This
pseudo-disk is attached on top of a raid/disk/etc. device. This is
also the option suggested by Greg (a rough strategy-routine sketch
follows the list of options).

It seems to be the option most recommended in the previous thread.

2- Add this to RAIDframe.

Would it be easier to implement and integrate within RAIDframe? The
RAID configuration is contained in the same driver.

It can be easier for a sysadmin to configure: fewer devices and
commands, and less prone to corruption errors, because there isn't
both a device with the write-back cache and the same device without
it.
For non-RAID devices it can be used as a RAID 0 of one disk.

3- LVM. I don't see any special advantage in this option.
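
For option 1, the shape I imagine is the usual disk-like
pseudo-device, in the spirit of cgd(4) or ccd(4): a strategy routine
that diverts writes into the NVRAM cache and lets reads through
unless they hit dirty cached data. This is purely illustrative; the
wbc_* helpers (and the wbc_softc sketched above) are hypothetical and
only mark where the real work would go:

#include <sys/param.h>
#include <sys/buf.h>

static void
wbcstrategy(struct buf *bp)
{
	/* hypothetical lookup of our per-device state */
	struct wbc_softc *sc = wbc_lookup(bp->b_dev);

	if ((bp->b_flags & B_READ) == 0) {
		/* Write: log it to NVRAM, complete it immediately,
		 * and flush it to the backing RAID later, as a full
		 * stripe whenever possible. */
		wbc_cache_write(sc, bp);
		biodone(bp);
		return;
	}

	if (wbc_cache_read(sc, bp)) {
		/* Read satisfied from dirty data held in the cache. */
		biodone(bp);
		return;
	}

	/* Otherwise pass bp straight through to the strategy routine
	 * of the backing raid/disk device. */
}

The same structure could also cover C: one such pseudo-disk instance
per cached device, with all instances allocating their stripe slots
from the same NVRAM device.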


I want to leave for another thread the question of which devices
should be supported: nvram/nvme/disk/etc.

Some notes:
- mdadm has the ability to use a disk (flash, for example) as a
journal for RAID devices. It is used instead of parity maps. It has
been integrated into mdadm: https://lwn.net/Articles/665299/

- I don't think it is possible to use the write-back cache for
booting the OS in an easy way.

