Re: kern/55362: bad inodes created after dump|restore

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,prlw1%cam.ac.uk@localhost
Subject: Re: kern/55362: bad inodes created after dump|restore
From: Patrick Welche <prlw1%cam.ac.uk@localhost>
Date: Fri, 12 Jun 2020 13:45:02 +0000 (UTC)

The following reply was made to PR kern/55362; it has been noted by GNATS.

From: Patrick Welche <prlw1%cam.ac.uk@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/55362: bad inodes created after dump|restore
Date: Fri, 12 Jun 2020 14:41:19 +0100

 As these are in-warranty Seagate disks, I ran
 
 SeaTools Bootable v2.1.2
 - SMART check                         PASS
 - short drive self test               PASS
 - short generic test                  PASS
 - long generic test (over 9 hours)    PASS
 
 
 > From: Robert Elz <kre%munnari.OZ.AU@localhost>
 > Date: Thu, 11 Jun 2020 00:02:46 +0700
 >  On the off chance  it is, you could simply compare the data
 >  portions (all except the raid header) of the two partitions that
 >  make up the raid1 on which the filesystem was created - assuming that
 >  raid1 init has finished, the two should be, and remain, identical
 >  (at least when not in active use, so do it with the filesystem(s)
 >  unmounted).   (You could just compare the entire partitions, and
 >  ignore differences in the first 64 blocks).
 >  
 >  If the two differ, then either there really is some hardware issue, or
 >  raidframe isn't working properly.
 
 I ran
 
     cmp -c -l /dev/rdk2 /dev/rdk6
 
  16401   0   1              <- ignore as within raid header
 1570687852545   0  73
 1570687852546   0 142
 1570687852547   0 260
 1570687852548   0 302
 ...
 
 There is always a zero in the second column, i.e., the mismatch is
 always reading a zero from the first disk of the mirror.
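
 The "always zero in the second column" observation can be checked
 mechanically rather than by eye. The following is only a sketch: the
 helper name parse_cmp_l is made up for illustration, and it assumes the
 usual cmp -l layout of a 1-based decimal byte offset followed by the two
 differing byte values in octal.

```python
def parse_cmp_l(lines):
    """Yield (offset, byte1, byte2) tuples from `cmp -l` output lines.

    Assumes each data line starts with a 1-based decimal offset and two
    octal byte values; anything else (blank lines, "...") is skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not all(f.isdigit() for f in fields[:3]):
            continue
        yield int(fields[0]), int(fields[1], 8), int(fields[2], 8)


# The lines quoted above, used as sample input.
sample = """\
  16401   0   1
1570687852545   0  73
1570687852546   0 142
1570687852547   0 260
1570687852548   0 302
...
""".splitlines()

diffs = list(parse_cmp_l(sample))
# True if every mismatch reads a zero byte from the first device.
first_disk_always_zero = all(b1 == 0 for _, b1, _ in diffs)
print(len(diffs), first_disk_always_zero)
```

 On the full cmp output the same check would be run over every line,
 not just this excerpt.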
 
 Looking for patterns, in the table below:
 -> marks the byte offset at which cmp first finds a difference, i.e., where
    the zeros begin;
 <- marks the last byte offset of that range of differences.
 So, e.g., the offsets 1570687852545 to 1570687885312 inclusive differ.
 
 Within those ranges, there are some byte locations which don't
 appear in the cmp list, but those are locations where rdk6 also holds
 a zero byte, so the two disks agree there.
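
 Grouping the individual offsets into ranges, while tolerating the
 occasional byte that is zero on both disks, could be done along these
 lines. This is a sketch only: coalesce and its max_gap parameter are
 hypothetical names, and the right gap tolerance is an assumption that
 would need tuning against the real data.

```python
def coalesce(offsets, max_gap=1):
    """Merge sorted byte offsets into inclusive (start, end) ranges.

    Consecutive offsets stay in the same range when they are at most
    max_gap apart; a larger max_gap bridges bytes that cmp did not list
    because both disks happened to hold the same value there.
    """
    ranges = []
    for off in offsets:
        if ranges and off - ranges[-1][1] <= max_gap:
            ranges[-1][1] = off          # extend the current range
        else:
            ranges.append([off, off])    # start a new range
    return [tuple(r) for r in ranges]


# Offsets 4 and 5 are missing (equal bytes on both disks), 600 is far away.
print(coalesce([1, 2, 3, 6, 7, 600], max_gap=1))
print(coalesce([1, 2, 3, 6, 7, 600], max_gap=512))
```

 With max_gap=1 the small hole splits the run in two; with a
 sector-sized tolerance it is reported as one range.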
 
   n             (n/512, n%512)  (n/4096,n%4096)
 ->1570687852545 (3067749712,1)  (383468714,1)
 <-1570687885312 (3067749776,0)  (383468722,0)
 ->1770406838273 (3457825856,1)  (432228232,1)
 <-1770406846464 (3457825872,0)  (432228234,0)
 ->1770406903809 (3457825984,1)  (432228248,1)
 <-1770406912000 (3457826000,0)  (432228250,0)
 ->1770406969345 (3457826112,1)  (432228264,1)
 <-1770406977536 (3457826128,0)  (432228266,0)
 ->1770407100417 (3457826368,1)  (432228296,1)
 <-1770407108608 (3457826384,0)  (432228298,0)
 ->1770407428097 (3457827008,1)  (432228376,1)
 <-1770407436288 (3457827024,0)  (432228378,0)
 ->1770407493633 (3457827136,1)  (432228392,1)
 <-1770407501824 (3457827152,0)  (432228394,0)
 ->1770407624705 (3457827392,1)  (432228424,1)
 <-1770407632896 (3457827408,0)  (432228426,0)
 ->1770407690241 (3457827520,1)  (432228440,1)
 <-1770407698432 (3457827536,0)  (432228442,0)
 ->1770407952385 (3457828032,1)  (432228504,1)
 <-1770407960576 (3457828048,0)  (432228506,0)
 ...
 
 and so on.
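
 To make the pattern in the table easier to see, the arithmetic can be
 redone per range. A minimal sketch, assuming 512-byte sectors and
 remembering that cmp offsets are 1-based; describe_range is a made-up
 helper name.

```python
SECTOR = 512  # assumed sector size in bytes


def describe_range(start, end, sector=SECTOR):
    """Describe a 1-based inclusive cmp offset range.

    Returns (first_sector, offset_in_sector, length_bytes, length_sectors),
    where first_sector and offset_in_sector are 0-based.
    """
    length = end - start + 1
    first_sector, skew = divmod(start - 1, sector)
    return first_sector, skew, length, length // sector


# First three ranges from the table above.
ranges = [
    (1570687852545, 1570687885312),
    (1770406838273, 1770406846464),
    (1770406903809, 1770406912000),
]
for start, end in ranges:
    print(describe_range(start, end))
```

 For these three ranges every start falls exactly on a sector boundary;
 the first range is 32768 bytes (64 sectors) long and the other two are
 8192 bytes (16 sectors) each.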
 
 
 Looking at the bad patch starting at 1570687852545:
                   3067749712 * 512 = 1570687852544
 
 # dd if=/dev/rdk2 skip=3067749712 count=1 | hexdump -C
 1+0 records in
 1+0 records out
 512 bytes transferred in 0.297 secs (1723 bytes/sec)
 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
 00000200
 
 so it seems there really are zeros on that disk, not just that cmp happened
 to read zero.
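
 The same spot check could be repeated for other sectors without
 reading hexdump output by eye. The following is a sketch under
 assumptions: sector_is_zero is a hypothetical helper, 512-byte sectors
 are assumed, and the demonstration runs on a scratch file rather than
 on /dev/rdk2.

```python
import os
import tempfile


def sector_is_zero(path, sector, sector_size=512):
    """Return True if the given 0-based sector of path is all zero bytes."""
    # Unbuffered, so exactly one sector-sized read is issued.
    with open(path, "rb", buffering=0) as dev:
        dev.seek(sector * sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError("short read at sector %d" % sector)
    return not any(data)


# Scratch file: two zeroed sectors followed by one sector of 0x01 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1024 + b"\x01" * 512)
    scratch = tmp.name

zero_first = sector_is_zero(scratch, 0)
zero_third = sector_is_zero(scratch, 2)
os.unlink(scratch)
print(zero_first, zero_third)
```

 Pointed at the raw device, the sector argument would be the same
 value passed to dd's skip= above.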
