Subject: Re: Soft error on disk write corrupted drive (LBA28-1 - problem?)
To: Stuart Brooks <firstname.lastname@example.org>
From: Brian Buhrow <email@example.com>
Date: 08/31/2007 08:21:03
I thought that problem was unique to Seagate disks. Maybe not, but
either way, patches were made to the ATA subsystem to work around drives
that fail to handle the LBA28 boundary correctly. NetBSD-3.1 contains
these fixes, so if that's your problem, upgrading your kernel to 3.1
should do the trick.
On Aug 31, 1:07pm, Ignatios Souvatzis wrote:
} Subject: Re: Soft error on disk write corrupted drive (LBA28-1 - problem?)
} [Added tech-kern and enhanced subject line - i.s.]
} On Fri, Aug 31, 2007, Stuart Brooks <firstname.lastname@example.org> wrote:
} > I managed to obtain a clean disk (identical model, WDC WD5000AAJS-22TKA0,
} > Rev: 12.01C01) and could reproduce the problem by doing a dd of 1MB
} > blocks across the suspect sector (268435451).
} > Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1g: error writing fsbn 216369024
} > of 216369024-216369151 (wd1 bn 268435391; cn 266304 tn 15 sn 14), retrying
} > Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1: (id not found)
} > Aug 31 11:59:51 30_DEMO_697 /netbsd: wd1: soft error (corrected)
} > With a block size of 512 bytes it didn't manifest. Where to from here?
} > At least it's reproducible...
} I think you used the buffered disk interfaces (/dev/wd1g) - if I'm not
} wrong, you should try with dd to the raw disk - (/dev/rwd1g), for
} But anyway, I seem to recall former discussions about some disks with a
} broken LBA28 boundary and how to handle them - that is, they needed
} LBA48 addressing even for block LBA28-boundary - 1. In this case you
} should see the problem also for smaller transfers, as soon as block
} 0xfffffff is touched. Can you try that please?
} (Oh, and some real LBA28-1-problem-expert, please speak up!)
>-- End of excerpt from Ignatios Souvatzis
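
For reference, the boundary arithmetic discussed above can be checked with a
few lines of Python. This is only a sketch: the function names and the
`quirk` flag are hypothetical and are not taken from the NetBSD sources. It
assumes the usual ATA limits, where a 28-bit command can address sectors 0
through 0xfffffff and transfer at most 256 sectors, and it models the
suspected drive bug as "the last LBA28 sector itself needs LBA48".

```python
# Last sector addressable with a 28-bit LBA: 2**28 - 1 = 0x0fffffff.
LBA28_LAST = (1 << 28) - 1

# A 28-bit ATA read/write command can move at most 256 sectors.
LBA28_MAX_COUNT = 256


def touches_last_lba28_sector(start, count):
    """True if the transfer [start, start + count) includes sector 0xfffffff."""
    return start <= LBA28_LAST < start + count


def needs_lba48(start, count, quirk=False):
    """Decide whether a transfer has to be issued as an LBA48 command.

    quirk=True is a hypothetical switch modelling the suspected drive bug:
    the drive is assumed to mishandle sector 0xfffffff under LBA28, so any
    transfer that reaches it is sent as LBA48 as well.
    """
    last = start + count - 1
    limit = LBA28_LAST - 1 if quirk else LBA28_LAST
    return last > limit or count > LBA28_MAX_COUNT


# The failing write from the log: wd1 bn 268435391, 128 sectors.
start, count = 268435391, 128
print(touches_last_lba28_sector(start, count))  # True
```

The write in the log ends past 0xfffffff, so it needs LBA48 with or without
the quirk. The single-sector test Ignatios asks for (start 268435455, count 1)
is the case that separates the two behaviours: it fits in LBA28 on a
well-behaved drive and needs LBA48 only if the quirk applies.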