Subject: Re: odd memory corruption problems
To: Greg Troxel <gdt@ir.bbn.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 08/03/2006 21:20:07
On Wed, Aug 02, 2006 at 02:55:15PM -0400, Greg Troxel wrote:
> 
> I'm having a problem that I think is probably hardware, but I'm
> posting anyway because there's some chance it is related to the
> suspected pool corruption problems.
> 
> [...]
> 
> I noticed some corrupt pictures, and have traced this to some bytes
> being wrong in cases where I could trace it; I'm quite confident this
> is the problem in the other ones.
> 
> I immediately suspected memory, and ran memtest+ 1.65 overnight for 10
> hours, and it found zero errors.
> 
> I made a list of all .jpg and .nef files under ~/PICTURES, and ran
> xargs md5 on that list multiple times. I found that I got different
> output.  Two files were often different, and then there were larger
> differences.
> 
> I mounted (ro) the underlying RAID-1 components, and ran xargs md5 on
> those.  Two files were different on wd0 and wd1, and in both cases,
> wd0 was right (by inspecting the picture to find the undamaged one).

I've already run into bad drives silently corrupting data this way. I'd suggest
running without wd1, or replacing wd1.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--