Subject: Re: Data corruption issues possibly involving cgd(4)
To: Daniel Carosone <dan@geek.com.au>
From: Nino Dehne <ndehne@gmail.com>
List: current-users
Date: 01/17/2007 06:05:19
On Wed, Jan 17, 2007 at 07:44:33AM +1100, Daniel Carosone wrote:
> On Tue, Jan 16, 2007 at 09:28:21AM +0000, David Laight wrote:
> > The 'dd' will be doing sequential reads, whereas the fs version will be doing
> > considerable numbers of seeks.  It is the seeks that cause the disks to
> > draw current bursts from the psu - so don't discount that.
> 
> And this is a most excellent and important point.  Could you try
> repeating the test with one or more of these variations to force
> seeking:
> 
>  two concurrent dd's, one with a large skip= to land elsewhere on the
>  platters

OK, I have done some extensive tests now. It doesn't look good though:

1) Run memtest86+ again. There's a new version 1.70 with better support
   for K8 and DDR2 memory: 4 passes without errors.
2) Set machdep.powernow.frequency.target = 2000 to maximize power draw.
3) I'm on the SMP kernel again. Start 2 instances of

      gzip -9c </dev/zero | gzip -dc >/dev/null

   i.e. 4 gzip processes are running.
4) Additionally, run

      while true; do dd if=/dev/rcgd0d bs=65536 count=1024 2>/dev/null | md5; done

   and

      while true; do dd if=/dev/rcgd0d bs=65536 count=1024 skip=123456 2>/dev/null | md5; done

   concurrently. After 100 runs each, not a single mismatch occurred.
   cgd0a was not mounted, to rule out filesystem changes affecting the
   checksums. The disks were active the whole time and top showed no
   increase in buffer usage, so caching was definitely not involved.
   The first dd even slowed down as expected when the second one was
   started. All this was single-user BTW; see the consolidated sketch
   after 5) below.

   After that, I killed the second dd and tried different block sizes,
   but ran into serious trouble:

   While the first dd was running, I tried a

      dd if=/dev/rcgd0d bs=123405 count=1024 skip=56454

   This gave me:

   dd: /dev/rcgd0d: Invalid argument
   0+0 records in
   0+0 records out
   0 bytes transferred in 0.002 secs (0 bytes/sec)

   Then I tried

      dd if=/dev/rcgd0d bs=32000 count=1024 skip=56454 | md5

   This panicked the box!

   cgd0: error 22
   uvm_fault(0xc03c62c0, 0xca596000, 1) -> 0xe
   kernel: supervisor trap page fault, code=0
   Stopped in pid 31.1 (raidio1) at        netbsd:BF_cbc_encrypt+0xed:     movl    0
   (%esi),%eax
   db{0}> trace
   uvm_fault(0xc03dbee0, 0, 1) -> 0xe
   kernel: supervisor trap page fault, code=0
   Faulted in DDB; continuing...
   db{0}>

5) A watt meter showed 175W usage during 2)-4) for a whole bunch of
   hardware including the server. The hardware minus the server draws
   ~42W, i.e. the server was drawing around 133W during these tests.
   The power supply is only a few weeks old and is a bequiet
   BQT E5-350W.
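
For reference, the load plus checksum comparison in 2)-4) boils down to
something like the following. This is a sketch, not the literal commands
I ran; the automatic comparison against a reference checksum is just a
convenience, and the skip= reader from 4) would run the same loop in a
second shell:

   # step 2: pin the CPU at its highest frequency
   sysctl -w machdep.powernow.frequency.target=2000

   # step 3: two pipelines, i.e. 4 gzip processes
   gzip -9c </dev/zero | gzip -dc >/dev/null &
   gzip -9c </dev/zero | gzip -dc >/dev/null &

   # step 4: re-read the same 64MB region and flag any checksum change
   ref=$(dd if=/dev/rcgd0d bs=65536 count=1024 2>/dev/null | md5)
   i=1
   while [ $i -le 100 ]; do
           sum=$(dd if=/dev/rcgd0d bs=65536 count=1024 2>/dev/null | md5)
           [ "$sum" = "$ref" ] || echo "run $i: mismatch ($sum vs $ref)"
           i=$((i + 1))
   done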

After I was done with that, I mounted cgd0a and hashed the usual file
in a loop. Result: a mismatch on the 3rd try. This was on an idle box,
i.e. with no 3) or 4) running.
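
In case it matters, the hashing loop amounts to something like this
(/home/testfile is only a placeholder for the usual file):

   # /home/testfile is a placeholder path, not the actual file
   ref=$(md5 </home/testfile)
   n=1
   while true; do
           sum=$(md5 </home/testfile)
           [ "$sum" = "$ref" ] || echo "try $n: mismatch ($sum)"
           n=$((n + 1))
   done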


>  dd from raid and a concurrent fsck -n of the cgd filesystem
> 
>  multiple concurrent fsck -n's, to see if they ever report different
>  errors.  -n is especially important here, both because of the
>  concurrency and if they're going to find spurious errors

This was actually not possible:

# fsck -n /home
** /dev/rcgd0a (NO WRITE)
** File system is clean; not checking
# fsck -pn /home
NO WRITE ACCESS
/dev/rcgd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

I also don't want to mess with the filesystem further. I'm already trying
to minimize access to it for fear of permanent corruption.
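
For completeness: a forced read-only check of the marked-clean filesystem
should be possible with -f, assuming fsck passes it through to fsck_ffs
here, but I haven't tried it yet:

   fsck -fn /home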

Now I'm not sure what to make of this. The cgd/raid panic looks creepy but
I'm not sure how to interpret it.

Does this help you?

In any case, thanks a lot for your help and best regards,

ND