Subject: Re: duff pages in middle of file
To: Patrick Welche <prlw1@newn.cam.ac.uk>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 09/23/2003 22:12:49
On Tue, Sep 23, 2003 at 09:11:28AM +0100, Patrick Welche wrote:
> Can anyone think of a reason for exactly 0x400 contiguous bytes (0x1000-0x13ff)
> of a 1Gb file (1024^3 bytes) (postgresql database) to be replaced by a bit
> of my /var/mail/prlw1 (often mmapped by mutt)? It (i386) ran a 17 Sept kernel
> for a day, and 18 Sept ever since. I think the corruption happened on the
> 17th. The disk is
> 
> wd0 at pciide0 channel 0 drive 0: <Maxtor 7Y250P0>
> wd0: drive supports 16-sector PIO transfers, LBA48 addressing
> wd0: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
> wd0: 32-bit data port
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
> 
> I didn't see any real candidates in source-changes of 17/18th.. (Not SATA, no
> unionfs) It's ffs with softdeps. Is assuming "duff drive" an option?

I also saw this twice on an NFS server which runs mrtg locally. I found
complete mrtg PNG images in other files. The mrtg db and the NFS-exported
files are not on the same disk.

I also see, from time to time, complaints from mrtg about syntax errors
in perl library files (I run mrtg from cron). I have seen this on different
boxes. From the error message, which quotes lines from the file, it's clear
that the data comes from another file, again not on the same disk.

So I suspect there's a problem where file data can get mixed up between two
files, one being read and the other being written.
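
One way to confirm this kind of mix-up after the fact might be to look for
aligned blocks that two unrelated files have in common. The following is only
a rough sketch, not something I have run against the affected machines: the
function name find_shared_blocks is made up for illustration, the 0x400 block
size is taken from Patrick's report, and it assumes the stray data sits at a
block-aligned offset in both files (an mbox that has been rewritten since the
corruption would no longer match).

```python
import hashlib


def find_shared_blocks(target_path, source_path, block_size=0x400):
    """Return (target_offset, source_offset) pairs of identical aligned blocks.

    Hypothetical helper: block_size defaults to the 0x400 bytes from the
    report, and both files are assumed to hold the data block-aligned.
    """
    seen = {}
    with open(source_path, "rb") as src:
        offset = 0
        while True:
            block = src.read(block_size)
            if len(block) < block_size:
                break  # ignore a trailing partial block
            if block.strip(b"\x00"):  # skip all-zero blocks, they match everywhere
                seen.setdefault(hashlib.sha1(block).digest(), offset)
            offset += block_size

    matches = []
    with open(target_path, "rb") as tgt:
        offset = 0
        while True:
            block = tgt.read(block_size)
            if len(block) < block_size:
                break
            digest = hashlib.sha1(block).digest()
            if digest in seen:
                matches.append((offset, seen[digest]))
            offset += block_size
    return matches
```

Run with the damaged file as the target and the suspected origin (for
example the mailbox) as the source; any pair it returns is a block that
appears verbatim in both.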


-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 24 ans d'experience feront toujours la difference
--