Subject: Re: Soft error on disk write corrupted drive
To: Stuart Brooks <stuartb@cat.co.za>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-i386
Date: 08/30/2007 21:27:43
On Thu, Aug 30, 2007 at 10:36:35AM +0200, Stuart Brooks wrote:
> Hi,
>
> I have picked up a very concerning problem on NetBSD 3.1_RC2 involving a
> corrected soft error following an "error writing fsbn".
>
> The short version:
> A disk write which was directed to the rwd0g partition reported the
> "error writing fsbn" with "id not found" a few times before succeeding
> (we believed) with "soft error (corrected)". However the write actually
> ended up taking place to sector 0 on *wd0d*, trashing the disk. The data
> never made its way onto the wd0g partition.
>
> The longer version:
>
> The g partition is used as a raw file system and I write structures
> sequentially into it. Every structure contains a magic number, timestamp
> and offset which can be used to check the validity. The following error
> was seen in the logs at the time when the problem occurred:
>
> Aug 18 14:55:59 Connswater1 /netbsd: wd0g: error writing fsbn 216369084 of
> 216369084-216369211 (wd0 bn 268435451; cn 266305 tn 0 sn 11), retrying
> Aug 18 14:55:59 Connswater1 /netbsd: wd0: (id not found)
> Aug 18 14:55:59 Connswater1 /netbsd: wd0g: error writing fsbn 216369084 of
> 216369084-216369211 (wd0 bn 268435451; cn 266305 tn 0 sn 11), retrying
> Aug 18 14:55:59 Connswater1 /netbsd: wd0: (id not found)
> Aug 18 14:56:00 Connswater1 /netbsd: wd0g: error writing fsbn 216369084 of
> 216369084-216369211 (wd0 bn 268435451; cn 266305 tn 0 sn 11), retrying
> Aug 18 14:56:00 Connswater1 /netbsd: wd0: (id not found)
> Aug 18 14:56:00 Connswater1 /netbsd: wd0g: error writing fsbn 216369084 of
> 216369084-216369211 (wd0 bn 268435451; cn 266305 tn 0 sn 11), retrying
> Aug 18 14:56:00 Connswater1 /netbsd: wd0: (id not found)
> Aug 18 14:56:01 Connswater1 /netbsd: wd0: soft error (corrected)
Hum, 268435451 = 0xffffffb, just below the LBA28 limit of 2^28.
This looks like LBA48 lossage.
Maybe this drive doesn't properly handle LBA48 PIO transfers.
What kind of controller is it?
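
To spell out the arithmetic behind that guess, here is a small sketch using
only the numbers from the log above. The helper name crosses_lba28_limit is
made up for illustration, and the masking at the end is only a hypothesis
about what losing the upper address bits would look like, not a statement
about what the driver or the drive actually does.

```python
# LBA28 commands can only address sectors 0 .. 2**28 - 1.
LBA28_LIMIT = 1 << 28


def crosses_lba28_limit(start_bn, nsectors):
    """True if a transfer starts below the LBA28 limit and ends at or above it."""
    end_bn = start_bn + nsectors - 1
    return start_bn < LBA28_LIMIT <= end_bn


# Values taken from the kernel messages quoted above.
start_bn = 268435451                    # "wd0 bn"
nsectors = 216369211 - 216369084 + 1    # fsbn range of the failed write

print(hex(start_bn))                            # 0xffffffb
print(nsectors)                                 # 128
print(crosses_lba28_limit(start_bn, nsectors))  # True
print(LBA28_LIMIT - start_bn)                   # 5 sectors fit below the limit

# Hypothesis only: if the address were truncated to 28 bits, the first
# sector past the limit would alias to sector 0.
print(LBA28_LIMIT & (LBA28_LIMIT - 1))          # 0
```

So the 128-sector write starts 5 sectors below 2^28 and has to be issued as
an LBA48 transfer to reach the remaining 123 sectors. If the high bits got
lost somewhere, landing on sector 0 would be consistent with what you saw.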
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference