NetBSD-Bugs archive


Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors



The following reply was made to PR bin/50108; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem
 with 4k sectors
Date: Sun, 2 Aug 2015 07:48:59 +0000

 On Fri, Jul 31, 2015 at 09:55:00PM +0000, dudinea%gmail.com@localhost wrote:
  > When trying to run fsck on an FFS filesystem that has been created
  > with 4k sectors, mounted with -o log and then not cleanly
  > unmounted, fsck_ffs fails while replaying the journal and tries
  > to write to the wrong blocks, past the end of the device:
  > 
  > fsck_ffs  -d -d /dev/rvnd0a 
  > ** /dev/rvnd0a
  > wapbl_replay_start: vp=0x0 off=91808 count=8192 blksize=4096
  > wapbl_read: 8192 bytes at block 91808 on fd 0x3
  > wapbl_replay: head=5332992 tail=5095424 off=8192 len=33546240 used=237568
  > wapbl_read: 4096 bytes at block 93052 on fd 0x3
  > wapbl_read: 4096 bytes at block 93108 on fd 0x3
  > wapbl_read: 4096 bytes at block 93109 on fd 0x3
  > ** File system is journaled; replaying journal
  > wapbl_read: 4096 bytes at block 93092 on fd 0x3
  > wapbl_write: 4096 bytes at block 367616 on fd 0x4
  > 
  > CANNOT WRITE: BLK 367616
  > CONTINUE? [yn] n
  > 
  > It looks like it's trying to replay the journal into blocks
  > that lie past the end of the device. I suppose it could also
  > corrupt the file system when the wrong blocks being written
  > happen to lie within the device boundaries.
 
 The cause of this is almost certainly wrong unit conversion
 somewhere.
 
 However, I notice that the dumpfs output says that there are 91808
 blocks in the fs. So reading from block 91808 (not to mention past
 that) and succeeding seems odd - does this volume have the journal
 past the end of the volume data?
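To make the question concrete, assume that dumpfs's 91808 and the block numbers in the fsck log are in the same units. That assumption is exactly what is unclear. Blocks are numbered from 0, so the last data block would be 91807, and every journal read in the log lands past the end of the file system data. The Python sketch below checks this. `classify_block` and its optional `device_blocks` argument are hypothetical. The device size is left unknown because the report doesn't give it.

```python
FS_DATA_BLOCKS = 91808  # block count from dumpfs (units ASSUMED to match the log)

# Journal block numbers read by fsck_ffs, taken from the quoted -d -d output.
JOURNAL_READS = [91808, 93052, 93108, 93109, 93092]


def classify_block(blkno, fs_blocks, device_blocks=None):
    """Hypothetical helper: say where a block number falls.

    Blocks are numbered from 0, so file system data occupies
    0 .. fs_blocks - 1. device_blocks=None means the device size is unknown.
    """
    if blkno < fs_blocks:
        return "fs data"
    if device_blocks is None or blkno < device_blocks:
        return "past fs data"
    return "past device end"


for blkno in JOURNAL_READS:
    print(blkno, classify_block(blkno, FS_DATA_BLOCKS))
```

Under that assumption the journal would start immediately after the last data block. That would fit a log placed after the end of the file system rather than inside it.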
 
 Anyway, 367616 is 91904 * 4, and 91904 is reasonably close to the
 other blocks it's accessing so it's a plausible correct block value.
 Which would imply that there's a stray factor of 4x somewhere. I'd
 have expected 8x (4096 / 512) but whatever...
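The arithmetic is quick to check. This Python snippet uses only numbers from the quoted output plus the guessed value 91904. That value is an inference, not something fsck printed.

```python
# Numbers from the fsck_ffs -d -d output quoted above.
BAD_WRITE_BLOCK = 367616  # block fsck_ffs tried to write
GUESSED_BLOCK = 91904     # plausible correct block (a guess, not printed by fsck)
SECTOR_SIZE = 4096        # device sector size
DEV_BSIZE = 512           # traditional DEV_BSIZE

# Is the bad value an exact multiple of the guess?
factor, remainder = divmod(BAD_WRITE_BLOCK, GUESSED_BLOCK)

# The factor one would expect from a sector-size vs DEV_BSIZE mixup.
expected_factor = SECTOR_SIZE // DEV_BSIZE

print(f"{BAD_WRITE_BLOCK} = {GUESSED_BLOCK} * {factor} + {remainder}")
print(f"{SECTOR_SIZE} / {DEV_BSIZE} = {expected_factor}")
print(f"{GUESSED_BLOCK} * {expected_factor} = {GUESSED_BLOCK * expected_factor}")
```

The remainder is zero, so 367616 is exactly 4 * 91904. A plain 4096/512 mixup would instead have produced 735232.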
 
 If you (or someone) can catch this in the debugger and then trace the
 provenance of that 367616, my guess is that it's been multiplied by the
 wrong thing.
 
 This is not trivial, though: for the moment I can't tell for sure
 whether wapbl_read and wapbl_write are supposed to get block numbers
 in frags, device-level blocks, or DEV_BSIZE blocks. In fsck it looks
 like they're supposed to be device-level blocks, but I'm not sure the
 same holds in the kernel, and from reading the code I'm not sure
 that's what they're actually getting either (which is what one would
 expect given the bug report, I suppose).
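The unit question matters because one on-disk location gets a different number in each unit. The hypothetical Python helpers below take the journal start at block 91808 from the log, read as a 4096-byte device-level block. They express that location in bytes, DEV_BSIZE blocks, and fragments. The 4096-byte fragment size is an assumption, since the report doesn't state it. None of these names are fsck_ffs or kernel functions.

```python
DEV_BSIZE = 512     # traditional DEV_BSIZE in bytes
SECTOR_SIZE = 4096  # device-level block size on this volume
FRAG_SIZE = 4096    # ASSUMED fragment size; not stated in the report


def to_bytes(blkno, unit):
    """Hypothetical: block number in `unit`-byte blocks -> byte offset."""
    return blkno * unit


def from_bytes(offset, unit):
    """Hypothetical: byte offset -> block number in `unit`-byte blocks."""
    if offset % unit:
        raise ValueError(f"offset {offset} is not aligned to {unit} bytes")
    return offset // unit


def convert(blkno, from_unit, to_unit):
    """Hypothetical: re-express a block number in a different block size."""
    return from_bytes(to_bytes(blkno, from_unit), to_unit)


start = 91808  # journal start from the log, read as a device-level block
print("bytes:    ", to_bytes(start, SECTOR_SIZE))
print("DEV_BSIZE:", convert(start, SECTOR_SIZE, DEV_BSIZE))
print("frags:    ", convert(start, SECTOR_SIZE, FRAG_SIZE))

# If a callee treats that DEV_BSIZE number as a device-level block number,
# it addresses a byte offset 8 times too far into the device.
wrong = to_bytes(convert(start, SECTOR_SIZE, DEV_BSIZE), SECTOR_SIZE)
print("misread:  ", wrong, "bytes")
```

Nothing in the arithmetic fails when caller and callee disagree about units. The number is simply off by a power of two, so a bug like this tends to show up only as an out-of-range write or as silent corruption.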
 
 Unfortunately the units are obscure, and most of the conversion macros
 have unhelpful names like 'sntod' or 'fsbtodb', where the letters don't
 necessarily correspond to the units being converted.
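For reference, the classic BSD fsbtodb macro converts a file system (fragment) block number to a DEV_BSIZE block number. It does this by shifting left by a per-file-system constant kept in the superblock (fs_fsbtodb), and dbtofsb shifts right by the same amount. The Python sketch below mimics that pattern with hypothetical names. It is not NetBSD's actual definition, and the 4096-byte fragment size is again an assumption. It shows that a shift off by a single bit yields exactly the 4x seen in the report. Whether that is the real cause is unknown.

```python
DEV_BSIZE = 512  # traditional DEV_BSIZE in bytes


def fsbtodb_shift(frag_size, unit=DEV_BSIZE):
    """Hypothetical: the shift count a fsbtodb-style macro would use,
    i.e. log2(frag_size / unit). frag_size must be a power-of-two
    multiple of unit."""
    ratio, rem = divmod(frag_size, unit)
    if rem or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("frag_size must be a power-of-two multiple of unit")
    return ratio.bit_length() - 1


def fsb_to_db(fsb, shift):
    """Fragment number -> DEV_BSIZE block number (left shift, like fsbtodb)."""
    return fsb << shift


def db_to_fsb(db, shift):
    """DEV_BSIZE block number -> fragment number (right shift, like dbtofsb)."""
    return db >> shift


shift = fsbtodb_shift(4096)         # ASSUMED 4096-byte frags: shift 3, factor 8
print(fsb_to_db(91904, shift))      # -> 735232
print(fsb_to_db(91904, shift - 1))  # shift off by one bit -> 367616 (factor 4)
print(db_to_fsb(735232, shift))     # round trip -> 91904
```

With 4096-byte fragments on a 4096-byte-sector device, a fragment number and a device-level block number coincide. That is why the guessed 91904 can be fed in directly here.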
 
 -- 
 David A. Holland
 dholland%netbsd.org@localhost
 

