NetBSD-Bugs archive
Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors
The following reply was made to PR bin/50108; it has been noted by GNATS.
From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: bin/50108: fsck_ffs fails replaying wapbl journal on filesystem with 4k sectors
Date: Sun, 2 Aug 2015 07:48:59 +0000
On Fri, Jul 31, 2015 at 09:55:00PM +0000, dudinea%gmail.com@localhost wrote:
> When trying to run fsck on a FFS filesystem that has been created
> with 4k sectors, mounted with -o log and then not cleanly
> unmounted, fsck_ffs fails when it's replaying the journal and tries
> to write wrong blocks off the device end:
>
> fsck_ffs -d -d /dev/rvnd0a
> ** /dev/rvnd0a
> wapbl_replay_start: vp=0x0 off=91808 count=8192 blksize=4096
> wapbl_read: 8192 bytes at block 91808 on fd 0x3
> wapbl_replay: head=5332992 tail=5095424 off=8192 len=33546240 used=237568
> wapbl_read: 4096 bytes at block 93052 on fd 0x3
> wapbl_read: 4096 bytes at block 93108 on fd 0x3
> wapbl_read: 4096 bytes at block 93109 on fd 0x3
> ** File system is journaled; replaying journal
> wapbl_read: 4096 bytes at block 93092 on fd 0x3
> wapbl_write: 4096 bytes at block 367616 on fd 0x4
>
> CANNOT WRITE: BLK 367616
> CONTINUE? [yn] n
>
> Looks like it's trying to replay journal into blocks
> that lie off the device ends. I suppose that it can
> corrupt file system as well when wrong blocks to be written
> lie within device boundaries.
The cause of this is almost certainly wrong unit conversion
somewhere.
However, I notice that the dumpfs output says that there are 91808
blocks in the fs. So reading from block 91808 (not to mention past
that) and succeeding seems odd - does this volume have the journal
past the end of the volume data?
Anyway, 367616 is 91904 * 4, and 91904 is reasonably close to the
other blocks it's accessing, so it's plausibly the correct block value.
That would imply that there's a stray factor of 4x somewhere. I'd
have expected 8x (4096 / 512) but whatever...
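
To make that arithmetic easy to recheck, here is a minimal sketch in C.
The helper names stray_factor and sector_ratio are made up for this
illustration and are not taken from fsck_ffs or the kernel; the numbers
are the ones from the debug output and the paragraph above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: if `observed` is an exact integer multiple of
 * `expected`, return that multiple; otherwise return 0.
 */
static uint64_t
stray_factor(uint64_t observed, uint64_t expected)
{
	if (expected == 0 || observed % expected != 0)
		return 0;
	return observed / expected;
}

/*
 * Hypothetical helper: how many DEV_BSIZE-sized blocks fit in one
 * device sector. Assumes secsize is a multiple of dev_bsize.
 */
static uint64_t
sector_ratio(uint64_t secsize, uint64_t dev_bsize)
{
	return secsize / dev_bsize;
}
```

With the values above, stray_factor(367616, 91904) gives 4, while
sector_ratio(4096, 512) gives 8, which is the mismatch being pointed out.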
If you (or someone) can catch this in the debugger and then trace the
provenance of that 367616, my guess is that you'll find it's been
multiplied by the wrong thing.
Tracing it is not trivial, though. For the moment I can't tell for sure
whether wapbl_read and wapbl_write are supposed to be getting block
numbers in frags, device-level blocks, or DEV_BSIZE blocks. In fsck it
looks like it's supposed to be device-level blocks, but I'm not sure
it's the same in the kernel. From reading the code I'm also not sure
that's what it's actually getting (which is what one would expect given
the bug report, I suppose).
Unfortunately the units are obscure, and most of the conversion macros
have unhelpful names like 'sntod' or 'fsbtodb', where the letters
aren't necessarily related much to the units they convert.
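
As an illustration only, one way to keep such units from being mixed up
is to give each unit its own wrapper type and name the conversions after
the units involved. Everything in this sketch (the type names, the
function names, and the assumption that fragsize is a multiple of
secsize and secsize a multiple of dev_bsize) is hypothetical; it is not
how the existing macros are defined.

```c
#include <stdint.h>

/* Hypothetical wrapper types, one per unit, so the compiler rejects mixing. */
typedef struct { uint64_t n; } frag_addr;	/* filesystem fragments */
typedef struct { uint64_t n; } sector_addr;	/* device sectors, e.g. 4096 bytes */
typedef struct { uint64_t n; } devb_addr;	/* DEV_BSIZE blocks, e.g. 512 bytes */

/* Fragments to device sectors; assumes fragsize is a multiple of secsize. */
static sector_addr
frags_to_sectors(frag_addr f, uint32_t fragsize, uint32_t secsize)
{
	sector_addr s = { f.n * (fragsize / secsize) };
	return s;
}

/* Device sectors to DEV_BSIZE blocks; assumes secsize is a multiple of dev_bsize. */
static devb_addr
sectors_to_devb(sector_addr s, uint32_t secsize, uint32_t dev_bsize)
{
	devb_addr d = { s.n * (secsize / dev_bsize) };
	return d;
}
```

With this shape, passing a devb_addr where a sector_addr is expected is
a compile error rather than a silent factor of 4 or 8.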
--
David A. Holland
dholland%netbsd.org@localhost