Subject: Re: LFS and big files
To: Greg Troxel <gdt@ir.bbn.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 01/04/2007 22:27:24
On Wed, Jan 03, 2007 at 07:01:26PM -0500, Greg Troxel wrote:
> I won't dispute the 'test your memory' advice from others at all, but
> memory tests aren't fully adequate to find problems.  I've seen
> problems similar to yours, and after removing one DIMM I no longer
> have them.  But I'm not convinced I understand what's going on.
> 
> My machine is a 3.4 GHz P4, and it had two 1 GB DIMMs.  It has two
> Seagate 400 GB SATA drives in RAID-1 with raidframe.  It always ran
> fine, and it survived memtest (I forget which version) for an entire
> day.
> 
> But I found that some image files (usually JPG, from a digital
> camera) were corrupt.  On comparing them against a separate copy from
> the memory card and one from another machine, I found that some bits
> were different, usually confined to a single 4K page, but occasionally
> spread over more than one page.  I further found that the two RAID-1
> copies differed, sometimes with one of them being correct.  This can
> lead to md5/sha1 returning a different value every time the blocks
> leave the cache, since RAID-1 reads can be satisfied from either disk.
> (I have overlay filesystems that I mount read-only to be able to debug
> such things.)

I have also twice run into a disk drive corrupting data (probably a defect
in the drive's electronics). No errors were reported; the data would just
occasionally get corrupted (every few months). I replaced the drive and the
problem disappeared in both cases.
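For locating corruption of the kind Greg describes (a few flipped bits,
usually confined to one 4K page), comparing a suspect file page by page
against a known-good copy is enough. A minimal sketch in Python, assuming
4096-byte pages; `differing_pages` is a hypothetical helper written for
illustration, not an existing tool:

```python
import sys

PAGE_SIZE = 4096  # assumption: 4K pages, matching the corruption described above


def differing_pages(path_a, path_b, page_size=PAGE_SIZE):
    """Yield (page_index, differing_bits) for each page whose contents differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(page_size)
            b = fb.read(page_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Count flipped bits over the bytes both pages share...
                bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
                # ...and treat bytes missing from the shorter file as fully differing.
                bits += 8 * abs(len(a) - len(b))
                yield index, bits
            index += 1


if __name__ == "__main__" and len(sys.argv) == 3:
    for page, bits in differing_pages(sys.argv[1], sys.argv[2]):
        print(f"page {page} (offset {page * PAGE_SIZE}): {bits} bit(s) differ")
```

Run against the copy from the memory card, a single bad page with one or
two flipped bits looks like the RAM/disk trouble above, while a whole range
of differing pages points more towards a filesystem or software problem.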

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--