Subject: Re: DEV_B_SIZE
To: Steve Byan <stephen_byan@maxtor.com>
From: Konrad Schroder <perseant@hitl.washington.edu>
List: tech-kern
Date: 01/31/2003 11:52:40
My $0.02 regarding FFS: since the default block size (including indirect
blocks etc.) is 8k, the only common alignment issue would come from
(mis-)alignment of the partition as a whole.  If the drive were structured
so that it reported cylinders as multiples of 4k, (almost) no one would
ever have the type of problem you're describing with FFS.
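
To make the partition-alignment point concrete, here is a minimal sketch,
assuming the disklabel counts in 512-byte logical sectors and the drive's
atomic unit is 4k.  The constants and function names are illustrative only
and do not come from any NetBSD tool.

```python
# Sketch only: assumes 512-byte logical sectors on a 4k-physical-sector drive.
LOGICAL_SECTOR = 512      # bytes per sector as seen by the disklabel (assumed)
PHYSICAL_SECTOR = 4096    # bytes per atomic physical sector (assumed)


def misalignment(start_sector: int) -> int:
    """Bytes by which a partition start lies past a 4k boundary (0 = aligned)."""
    return (start_sector * LOGICAL_SECTOR) % PHYSICAL_SECTOR


def next_aligned_start(start_sector: int) -> int:
    """Smallest logical sector number >= start_sector on a 4k boundary."""
    per_phys = PHYSICAL_SECTOR // LOGICAL_SECTOR     # 8 logical per physical
    return -(-start_sector // per_phys) * per_phys   # integer round-up


if __name__ == "__main__":
    for start in (63, 64, 2048):
        print(start, misalignment(start), next_aligned_start(start))
```

With these assumptions a partition starting at sector 63 is off by 3584
bytes, so every 8k block in it straddles three physical sectors instead of
two; moving the start to sector 64 removes the problem.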

On Fri, 31 Jan 2003, Steve Byan wrote:

> I think journaling filesystems need to know the atomic block size in
> order to structure their log in a fault-tolerant way; I'm hoping
> someone on these lists can provide some details.

I think LFS is mostly okay here, though there is a corner case in which
some data could be lost (and possibly the filesystem corrupted) without
the user knowing about it.  Let me describe such a case.

Suppose that the cleaner is operating.  Every cleaner write is a
checkpoint, but following the cleaner write the previous checkpoint is
invalidated, so it is possible that there is only one valid checkpoint on
disk at all.  Now further suppose that the filesystem was created with a
fragment size less than 4k; that the cleaner has just cleaned segment n+1,
filling segment n with that data; that another write has since gone into
segment n+1, thereby invalidating the old contents of segment n+1; and
that the power fails while the first segment summary in segment n+1 is
being written.

Both the previous checkpoint state (including segment n+1) and the current
checkpoint state (including segment n) would be invalid in this case.
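
One way to see why a sub-4k fragment size matters in a scenario like this
is to list which neighbouring logical sectors share a physical sector with
a given write.  This is a sketch under a stated assumption: that the drive
emulates 512-byte sectors by read-modify-write of a whole 4k physical
sector, so an interrupted write can damage everything in that physical
sector.  The function name is hypothetical.

```python
# Sketch only: models the assumed 512-byte-on-4k emulation, not any real drive.
LOGICAL_SECTOR = 512      # bytes per logical sector (assumed)
PHYSICAL_SECTOR = 4096    # bytes per physical sector (assumed)


def collateral_sectors(first_sector: int, nsectors: int) -> list:
    """Logical sectors outside the write that share a physical sector with it."""
    per_phys = PHYSICAL_SECTOR // LOGICAL_SECTOR
    last_sector = first_sector + nsectors - 1
    lo = (first_sector // per_phys) * per_phys        # start of first 4k unit
    hi = (last_sector // per_phys + 1) * per_phys     # end of last 4k unit
    written = range(first_sector, last_sector + 1)
    return [s for s in range(lo, hi) if s not in written]


if __name__ == "__main__":
    print(collateral_sectors(16, 2))   # a 1k write on a 4k boundary
    print(collateral_sectors(16, 8))   # a 4k write on a 4k boundary
```

Under that assumption a 1k write at sector 16 puts sectors 18 through 23 at
risk, while an aligned 4k write puts nothing else at risk.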

The worst part about it is that even if fsck_lfs could fix this problem,
no one would know to run it; LFS uses roll-forward as its default repair
mechanism, and roll-forward always starts from the last known-valid
checkpoint.

The solution, of course, is to

1) Identify the disk as a 4k-sector disk;
2) Partition the disk so that LFS partitions begin on 4k boundaries;
3) Create the LFS filesystems with 4k or greater fragment size;
4) Play happily with your 8k/1k FFSes and 8k/4k LFSes.
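
Steps 2 and 3 above can be sketched as a simple check.  The Partition
record, its field names, and the sample partitions are all hypothetical;
the only rules encoded are the two from the list, applied to LFS
partitions, with 512-byte logical sectors assumed.

```python
# Sketch only: a hypothetical layout check, not part of newfs_lfs or disklabel.
from dataclasses import dataclass

LOGICAL_SECTOR = 512      # bytes per disklabel sector (assumed)
PHYSICAL_SECTOR = 4096    # bytes per physical sector (assumed)


@dataclass
class Partition:
    name: str          # illustrative label only
    fstype: str        # "FFS" or "LFS"
    start_sector: int  # partition start, in 512-byte logical sectors
    block_size: int    # bytes
    frag_size: int     # bytes


def lfs_problems(p: Partition) -> list:
    """Return the ways an LFS partition violates steps 2 and 3 (empty if none)."""
    problems = []
    if p.fstype != "LFS":
        return problems   # the checklist only constrains LFS partitions
    if (p.start_sector * LOGICAL_SECTOR) % PHYSICAL_SECTOR:
        problems.append("partition does not begin on a 4k boundary")
    if p.frag_size < PHYSICAL_SECTOR:
        problems.append("fragment size is smaller than 4k")
    return problems


if __name__ == "__main__":
    layout = [
        Partition("a", "FFS", 64, 8192, 1024),     # 8k/1k FFS
        Partition("e", "LFS", 63, 8192, 1024),     # violates both rules
        Partition("f", "LFS", 2048, 8192, 4096),   # 8k/4k LFS, aligned
    ]
    for p in layout:
        print(p.name, lfs_problems(p) or "ok")
```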

If you did that, the 4k sector size would be truly invisible to you.  In
particular, you would *not* need to recompile the kernel for any of that,
unless I'm misunderstanding what you're saying.

------------------------------------------------------------------------
Konrad Schroder          http://www.hitl.washington.edu/people/perseant/
Information Tech & Services   Box 352142 -or- 215 Fluke Hall, Mason Road
Human Interface Technology Lab                  University of Washington
Voice: +1.206.616.1478   Fax: +1.206.543.5380    Seattle, WA, 98195, USA