Subject: Re: unsynced vnd images
To: Florian Stoehr <netbsd@wolfnode.de>
From: Greg Troxel <gdt@ir.bbn.com>
List: netbsd-users
Date: 02/18/2005 11:48:31
  Of course. I meant the file is not there if I copy the vnd image file
  and revnd/recgd/remount that copied image. Sorry, my mail wasn't
  clear enough here.

I followed you.  I meant that you were relying on a cache coherency
property that isn't guaranteed for write-back caches.

  Yes. I don't mind whether it is ready on disk or not. The problem seems
  to be that it isn't even ready IN CACHE on the physical disk!

Sure, but this is pretty much analogous to dd'ing the raw device
on which you have an active ffs mount, and expecting the file bits to
be on the platter already.

  I assume this is because the cache knows that the file inside the cgd
  is "dirty buffered" - and thus copying the file from the mounted cgd
  will work (copying content from cache). But it misses that the image the
  cgd resides in is invalid - this itself will be marked a "dirty buffer"
  when the cache for the ***cgd*** is flushed -> the physical I/O for the
  cgd is actually the cached I/O for the next unit, the vnd being on a 
  (buffered) "real" partition.

When the ffs cache for /mnt is flushed, this will do a write to the
cgd, which will then (I think) do a write on whatever is underneath it.
If that is a block device, the write creates a dirty buffer there.  If
it is a vnd, it becomes a write to the image file, which creates a
dirty buffer on the device underlying that file.
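
To make the layering concrete, here is a toy model in Python.  It is
only a sketch of the behaviour described above, not how the kernel
implements any of it: the Layer class and its method names are made
up, and the cgd encryption step is left out entirely.

```python
class Layer:
    """Toy write-back cache layer (hypothetical, for illustration only)."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower   # next layer down, or None for the bottom layer
        self.dirty = {}      # blocks written here but not yet pushed down
        self.stable = {}     # "on the platter"; only used by the bottom layer

    def write(self, block, data):
        # Write-back: the data only lands in this layer's cache.
        self.dirty[block] = data

    def read(self, block):
        # A reader entering at this layer sees this layer's dirty data first.
        if block in self.dirty:
            return self.dirty[block]
        if self.lower is not None:
            return self.lower.read(block)
        return self.stable.get(block)

    def flush(self):
        # Push dirty blocks one level down; they may stay dirty there.
        for block, data in self.dirty.items():
            if self.lower is not None:
                self.lower.write(block, data)
            else:
                self.stable[block] = data
        self.dirty.clear()


# Bottom: cache of the filesystem holding the image file.
image_fs = Layer("fs holding the vnd image")
# Top: ffs cache for /mnt (the cgd and vnd in between are not modelled).
mnt_fs = Layer("ffs on /mnt", lower=image_fs)

mnt_fs.write(0, "new file")
via_mount = mnt_fs.read(0)       # reader going through /mnt sees the file
via_image = image_fs.read(0)     # reader copying the image does not, yet
mnt_fs.flush()
after_flush = image_fs.read(0)   # now visible in the image, still only in cache
```

Copying the image file corresponds to reading at the lower layer, which
is why the file only shows up there after the upper layer has been
flushed.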

  Hm. "pseudo-disk" is the problem, residing on a physical disk and thus
  being kinda double-cached. hm. <-- (I guess)

I don't think vnd is hurting you here.

  Guess unmounting the cgd and calling sync() afterwards, then doing a
  short wait is all I can do?

I don't think you have to unconfig cgd, but perhaps unmount /mnt or
downgrade it to RO.
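
As a rough sketch of that sequence (quiesce /mnt, sync, then copy the
image), something like the following would build the commands.  The
function name and the paths are hypothetical, and I am assuming the
usual umount(8), mount(8) -u -r, sync(8) and cp(1) invocations.

```python
def plan_image_copy(mount_point, image, dest, read_only=False):
    """Return the commands to run, in order, before and for the copy.

    Hypothetical helper: it only builds the command lists, it does not
    run anything.
    """
    if read_only:
        # Downgrade the mounted filesystem to read-only in place.
        quiesce = ["mount", "-u", "-r", mount_point]
    else:
        # Unmount the filesystem that lives on the cgd.
        quiesce = ["umount", mount_point]
    # Ask for dirty buffers to be written, then copy the image file.
    return [quiesce, ["sync"], ["cp", image, dest]]
```

Each entry can be handed to subprocess.run(cmd, check=True) as root.
Note that sync may return before the writes have completed, so this
does not remove the need for the short wait mentioned above.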

I think this is pretty hard to fix.  One way would be to mount the fs
with the 'sync' flag, but the fs performance would be awful.  The
right thing might be to somehow always write the data, but give
permission for lower-layer caching, and then invoke cache flush
operations when necessary.  This would be a departure from the current
way, which I think does have the 'force completion' notion, but
doesn't always start the writes.  The idea would be to have the
ciphertext always be consistent, but maintain metadata ordering
guarantees for the actual writes to disk.  Of course, these guarantees
may not exist now.
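
A userland analogy for "always start the write, allow caching below,
flush when necessary" is write followed by fsync.  This is only an
illustration at the file level, with a made-up function name; it says
nothing about how the kernel layers would actually be changed.

```python
import os


def write_then_flush(path, data):
    """Issue the write right away, then force completion with fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # The write is started immediately; lower layers may cache it.
            written = os.write(fd, view)
            view = view[written:]
        # Explicit flush point: ask for the cached data to be written out.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The write is always issued, and the fsync call plays the role of the
cache flush operation invoked only where completion matters.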