Subject: Re: DEV_B_SIZE
To: Steve Byan <>
From: Stephan Uphoff <>
List: tech-kern
Date: 01/31/2003 16:58:43
> Steve Byan writes:
> If you think there are no functional problems with this 
> backwards-compatibility scenario, including during recovery (fsck or 
> journal roll-forward), I'd be happy to hear a clear "no problem".

Unfortunately there are some problems:

Journaling filesystems and databases often group multiple 512-byte sectors into
logical blocks and, on recovery, check whether each block was updated atomically.
(Recommended reading: C. Mohan: "Disk Read-Write Optimization and Data
Integrity in Transaction Systems Using Write-Ahead Logging")
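One common way to do that atomicity check is to stamp every 512-byte sector of
a block with the same write sequence number and treat mixed stamps as a torn
write. The sketch below is only an illustration of that idea; the 8-byte
little-endian stamp and the helper names are hypothetical, not any particular
filesystem's on-disk format.

```python
import struct

SECTOR = 512
STAMP = struct.Struct("<Q")  # hypothetical 8-byte write-sequence stamp per sector

def stamp_block(payload_sectors, seq):
    """Build a logical block: each 512-byte sector = stamp + 504 payload bytes."""
    out = bytearray()
    for chunk in payload_sectors:
        if len(chunk) != SECTOR - STAMP.size:
            raise ValueError("each payload chunk must be 504 bytes")
        out += STAMP.pack(seq) + chunk
    return bytes(out)

def is_torn(block):
    """True if the block's sectors carry different stamps (partial write)."""
    stamps = {STAMP.unpack_from(block, off)[0]
              for off in range(0, len(block), SECTOR)}
    return len(stamps) != 1
```

A block built half from version 1 and half from version 2 is flagged as torn,
so recovery knows it must be restored from the log.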

If (as David Laight described earlier in this thread) the blocks are not
aligned on a 4K boundary, a write to one block will cause the rewrite of a
4K physical sector that also contains part of a neighboring block.
Losing this 4K sector to a power failure will destroy both blocks.
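To make the overlap concrete, here is a small model that lists which other
filesystem blocks share a 4K physical sector with a given block. It assumes
4 KiB filesystem blocks and a partition starting at LBA 63 (a legacy offset
picked only as an example); the function names are made up for illustration.

```python
LOGICAL = 512    # sector size the drive reports
PHYSICAL = 4096  # sector size the drive actually writes
FS_BLOCK = 4096  # assumed filesystem block size

def physical_sectors(block, start_lba):
    """Physical sector numbers touched by one filesystem block."""
    first_byte = start_lba * LOGICAL + block * FS_BLOCK
    last_byte = first_byte + FS_BLOCK - 1
    return set(range(first_byte // PHYSICAL, last_byte // PHYSICAL + 1))

def collateral_blocks(block, start_lba, nblocks):
    """Other filesystem blocks that share a physical sector with `block`."""
    mine = physical_sectors(block, start_lba)
    return [b for b in range(nblocks)
            if b != block and physical_sectors(b, start_lba) & mine]
```

With the partition at LBA 63, writing block 5 also puts blocks 4 and 6 at
risk; at LBA 64 (4K aligned) no other block is affected.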

This breaks, for example, ping-pong writes, probably one of the best-known
techniques for updating log files.
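With ping-pong writes, successive rewrites of the partially filled tail page
of the log alternate between two slots, so a torn write never destroys the
only good copy. The toy model below (hypothetical slot LBAs, whole-sector loss
on power failure) shows how that guarantee disappears once both slots fall
into the same 4K physical sector.

```python
LOGICAL = 512
PHYSICAL = 4096

def ping_pong_slot(nth_write, slot_a, slot_b):
    """Ping-pong: successive rewrites of the partial tail page alternate slots."""
    return slot_a if nth_write % 2 == 0 else slot_b

def surviving_copies(slot_a, slot_b, torn_lba):
    """Slots still holding a valid tail page after the drive loses the whole
    physical sector it was rewriting for `torn_lba`."""
    torn_phys = torn_lba * LOGICAL // PHYSICAL
    return [lba for lba in (slot_a, slot_b)
            if lba * LOGICAL // PHYSICAL != torn_phys]
```

If the second write (to slot B) is torn, slots at LBAs 100 and 101 both live in
physical sector 12 and both copies are lost, while slots at LBAs 100 and 108
keep the copy in slot A.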

Databases might also be unable to detect, during the recovery phase, which
blocks need "media recovery": the damaged blocks include ones the database
never touched, so they have no entry in the log.
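Put differently, recovery only inspects blocks that appear in the log, but the
set of blocks at risk is larger. A minimal sketch under the same assumptions as
before (4 KiB blocks, example partition offsets; helper names hypothetical):

```python
LOGICAL, PHYSICAL, FS_BLOCK = 512, 4096, 4096

def phys_of(block, start_lba):
    """Physical 4K sectors covered by one filesystem block."""
    first = start_lba * LOGICAL + block * FS_BLOCK
    return set(range(first // PHYSICAL, (first + FS_BLOCK - 1) // PHYSICAL + 1))

def unlogged_damage(logged_blocks, start_lba, nblocks):
    """Blocks a torn 4K write can destroy that have no log entry at all."""
    touched = set()
    for b in logged_blocks:
        touched |= phys_of(b, start_lba)
    return [b for b in range(nblocks)
            if b not in logged_blocks and phys_of(b, start_lba) & touched]
```

Logging writes to blocks 4 and 5 at LBA offset 63 leaves blocks 3 and 6
exposed with nothing in the log to tell recovery to look at them.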

Parity-based RAID systems need some kind of NVRAM or redundant log to
guarantee redundancy immediately after a power failure.
However, with disks that pretend to have 512-byte sectors but actually write
4K sectors, the RAID software will probably not log enough data for a
successful recovery.
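The gap can be sketched as the difference between what a RAID intent log
records and what the drive actually rewrites. This assumes a hypothetical log
that records exactly the 512-byte LBAs of each write and 8 logical sectors per
physical sector; it is not modeled on any specific RAID implementation.

```python
RATIO = 8  # assumed: 4096-byte physical / 512-byte logical sectors

def logged_sectors(lba, count):
    """512-byte LBAs the hypothetical intent log records for one write."""
    return set(range(lba, lba + count))

def at_risk_sectors(lba, count):
    """LBAs the drive actually rewrites: the write rounded out to 4K."""
    first = (lba // RATIO) * RATIO
    end = -(-(lba + count) // RATIO) * RATIO  # round up to a 4K boundary
    return set(range(first, end))

def unlogged_at_risk(lba, count):
    """Sectors a power failure can destroy that recovery does not know about."""
    return sorted(at_risk_sectors(lba, count) - logged_sectors(lba, count))
```

A single 512-byte write to LBA 10 leaves seven unlogged sectors (8, 9, 11 to
15) whose contents recovery would neither rebuild from parity nor check; a
4K-aligned 8-sector write leaves none.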

How do you recover a bad 4K sector?
What happens if the software tries to fix it by sequentially
writing 512-byte sectors?
I see two solutions:
	- make up (set to 0?) the rest of the 4K block
	- fail the operation
Neither solution looks safe.
Alignment problems as described in (1) might bring additional joy.
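The two options can be modeled with a toy drive that emulates 512-byte writes
by read-modify-write of a 4K physical sector. Everything here (class name,
policy enum, error type) is hypothetical and only illustrates the dilemma: when
the physical sector is unreadable, the drive either invents the other seven
sectors or refuses the write.

```python
from enum import Enum

class Policy(Enum):
    ZERO_FILL = "zero-fill"  # make up the rest of the 4K block
    FAIL = "fail"            # refuse the 512-byte write

class UnreadableSectorError(OSError):
    pass

class Emulated512Drive:
    """Toy model: 512-byte logical sectors stored in 4096-byte physical sectors."""
    LOGICAL = 512
    RATIO = 8  # logical sectors per physical sector

    def __init__(self, nphys, policy):
        self.phys = [bytes(self.LOGICAL * self.RATIO) for _ in range(nphys)]
        self.bad = set()  # physical sectors that can no longer be read
        self.policy = policy

    def read512(self, lba):
        p, off = divmod(lba, self.RATIO)
        if p in self.bad:
            raise UnreadableSectorError(f"physical sector {p} unreadable")
        return self.phys[p][off * self.LOGICAL:(off + 1) * self.LOGICAL]

    def write512(self, lba, data):
        if len(data) != self.LOGICAL:
            raise ValueError("write512 needs exactly 512 bytes")
        p, off = divmod(lba, self.RATIO)
        if p in self.bad:
            if self.policy is Policy.FAIL:
                raise UnreadableSectorError(f"physical sector {p} unreadable")
            # ZERO_FILL: the other 7 logical sectors are silently replaced by zeros
            buf = bytearray(self.LOGICAL * self.RATIO)
            self.bad.discard(p)
        else:
            buf = bytearray(self.phys[p])  # normal read-modify-write
        buf[off * self.LOGICAL:(off + 1) * self.LOGICAL] = data
        self.phys[p] = bytes(buf)
```

With ZERO_FILL, rewriting LBA 9 "repairs" the sector but LBA 8 now reads back
as zeros without any error; with FAIL, the 512-byte write simply cannot succeed.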

I am currently designing my second journaling filesystem (the first one was for
Network Storage Solutions) and have some neat tricks on the drawing board
that I would have to throw away, because they require knowledge of the exact
sector size. ;-)


Thanks for bringing this to my attention.

I would appreciate any update on IDEMA's proposal and hope that any
new standard would include a way to find the real sector size for special