NetBSD-Bugs archive
Re: kern/55362: bad inodes created after dump|restore
The following reply was made to PR kern/55362; it has been noted by GNATS.
From: Patrick Welche <prlw1%cam.ac.uk@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Thu, 19 Aug 2021 16:16:00 +0100
A big thank you to chs@ and oster@ for working so hard on this bug.
Now that we know the solution, here is a summary. Cf. kre@'s comment above:
Further, since the two drives have different data, and shouldn't, this
cannot be a filesystem or above problem, it has to be either the drive
omitting to write sometimes, or the driver forgetting to perform the write,
or raidframe not requesting one of the writes.
To find out which of these it was, chs@ suggested this dtrace script:
fbt::raidstrategy:entry,fbt::bdev_strategy:entry
/ args[0]->b_dev == 0x0023 || args[0]->b_dev == 0x0043 ||
  args[0]->b_dev == 0xa807 || args[0]->b_dev == 0xa80c ||
  args[0]->b_dev == 0x1203 || args[0]->b_dev == 0x3e03 /
{
    /* rw is 1 when the 0x00100000 bit (B_READ) is set, so rw 0 is a write */
    printf("func %s rw %d dev 0x%x blkno %d len %d", probefunc,
        (args[0]->b_flags & 0x00100000) != 0, args[0]->b_dev,
        args[0]->b_blkno, args[0]->b_bcount);
}
Reminder of the setup:
wd2 -> dk7  (backup1) --+-> raid0 -> dk19 -> cgd. -> dk..
wd4 -> dk12 (backup2) --/
Another experiment, run as described earlier in this PR, shows that,
according to cmp, the two components differ between byte offsets
3412087771137 -> 3412087775232 (from cmp)
6664233928 b+1 -> 6664233936 b (in 512-byte blocks)
inclusive. The relevant dtrace lines are:
wd4d: func bdev_strategy rw 0 dev 0x43 blkno 6664233920 len 40960 = 80b
wd2d: func bdev_strategy rw 0 dev 0x23 blkno 6664233920 len 40960 = 80b
dk12: func bdev_strategy rw 0 dev 0xa80c blkno 6664233920 len 4096 = 8b
dk7: func bdev_strategy rw 0 dev 0xa807 blkno 6664233920 len 4096 = 8b
dk12: func bdev_strategy rw 0 dev 0xa80c blkno 6664233936 len 57344 = 112b
dk7: func bdev_strategy rw 0 dev 0xa807 blkno 6664233936 len 57344 = 112b
raid0: func bdev_strategy rw 0 dev 0x1203 blkno 6664233992 len 4096 = 8b
raid0: func raidstrategy rw 0 dev 0x1203 blkno 6664233992 len 4096 = 8b
dk12: func bdev_strategy rw 0 dev 0xa80c blkno 6664233928 len 4096 = 8b
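The offset-to-block conversion above can be double-checked with a few lines of Python. This is only a sketch: it assumes 512-byte blocks (consistent with 40960 bytes = 80b in the trace), and the helper name to_block is made up for illustration.

```python
SECTOR = 512  # assumed block size in bytes; 40960 bytes = 80b in the trace implies this


def to_block(offset):
    """Split a byte offset into (block number, bytes past the block start)."""
    return divmod(offset, SECTOR)


# Inclusive byte range reported by cmp (values taken from the message above).
first, last = 3412087771137, 3412087775232

print(to_block(first))   # (6664233928, 1)
print(to_block(last))    # (6664233936, 0)

mismatch_len = last - first + 1          # inclusive range, hence the +1
print(mismatch_len, mismatch_len // SECTOR)  # 4096 8
```

So the mismatch is 4096 bytes, i.e. 8 blocks starting at block 6664233928, which is the same start and length as the dk12 write at blkno 6664233928 in the trace.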
Just to check, "grep -C3 6664233928 dtrace2.log" has only one hit:
5 36066 raidstrategy:entry func raidstrategy rw 0 dev 0x1203 blkno 6628949584 len 32768
9 4107 bdev_strategy:entry func bdev_strategy rw 0 dev 0x1203 blkno 6628949712 len 32768
9 36066 raidstrategy:entry func raidstrategy rw 0 dev 0x1203 blkno 6628949712 len 32768
5 4107 bdev_strategy:entry func bdev_strategy rw 0 dev 0xa80c blkno 6664233928 len 4096
5 4107 bdev_strategy:entry func bdev_strategy rw 0 dev 0x43 blkno 6731606984 len 4096
12 4107 bdev_strategy:entry func bdev_strategy rw 0 dev 0x1203 blkno 6628888960 len 4096
12 36066 raidstrategy:entry func raidstrategy rw 0 dev 0x1203 blkno 6628888960 len 4096
No corresponding write to dk7 (0xa807) was logged, so the first component,
dk7, was never written at that block and still contains zeros.
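The same check can be automated over a whole log rather than by grepping for one block number. The following is a rough sketch, not what was actually run: it assumes the line format printed by the dtrace script above, that rw 0 means a write, and that a mirrored write reaches both components at the same blkno and len (as in the trace shown). The function name unmirrored_writes is hypothetical.

```python
import re
from collections import defaultdict

# Component device numbers as they appear in the trace above.
MIRROR = {0xa807: "dk7", 0xa80c: "dk12"}

# Matches the tail of each line printed by the dtrace script.
LINE = re.compile(r"rw (\d+) dev (0x[0-9a-f]+) blkno (\d+) len (\d+)")


def unmirrored_writes(lines):
    """Return {(blkno, len): [components]} for writes seen on only one component."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        rw, dev = int(m[1]), int(m[2], 16)
        blkno, length = int(m[3]), int(m[4])
        if rw == 0 and dev in MIRROR:  # writes to the two mirror components only
            seen[(blkno, length)].add(MIRROR[dev])
    return {k: sorted(v) for k, v in seen.items() if len(v) < len(MIRROR)}


sample = [
    "func bdev_strategy rw 0 dev 0xa80c blkno 6664233920 len 4096",
    "func bdev_strategy rw 0 dev 0xa807 blkno 6664233920 len 4096",
    "func bdev_strategy rw 0 dev 0xa80c blkno 6664233936 len 57344",
    "func bdev_strategy rw 0 dev 0xa807 blkno 6664233936 len 57344",
    "func bdev_strategy rw 0 dev 0xa80c blkno 6664233928 len 4096",
]
print(unmirrored_writes(sample))  # {(6664233928, 4096): ['dk12']}
```

Because it keys on (blkno, len) only, this sketch would miss a lost write if the same block were rewritten later on both components; counting occurrences per component would be stricter.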
chs@ and oster@ traced the issue to a failing PR_NOWAIT allocation from a
raidframe pool.
oster@ patched raidframe so that getiobuf() can't run out of memory,
which means this PR was fixed in:
https://mail-index.netbsd.org/source-changes/2021/07/23/msg131062.html