Subject: Re: Soft error on disk write corrupted drive
To: <>
From: Stuart Brooks <stuartb@cat.co.za>
List: port-i386
Date: 08/31/2007 12:08:09
Stuart Brooks wrote:
> Manuel Bouyer wrote:
>> On Thu, Aug 30, 2007 at 08:53:59PM +0100, David Laight wrote:
>>  
>>> On Thu, Aug 30, 2007 at 09:27:43PM +0200, Manuel Bouyer wrote:
>>>    
>>>>> Aug 18 14:56:00 Connswater1 /netbsd: wd0g: error writing fsbn 
>>>>> 216369084 of 216369084-216369211 (wd0 bn 268435451; cn 266305 tn 0 
>>>>> sn 11), retrying
>>>>> Aug 18 14:56:00 Connswater1 /netbsd: wd0: (id not found)
>>>>> Aug 18 14:56:01 Connswater1 /netbsd: wd0: soft error (corrected)
>>>>>         
>>>> Hum, 268435451 = 0xffffffb. This looks like LBA48 lossage.
>>>> Maybe this drive doesn't handle properly LBA48 PIO transfers.
>>>>       
>>> Is this a case where we are doing LBA28 transfers of multiple sectors
>>> that cross the boundary ?
>>>     
>>
>> I suspect it is, yes. But the controller may be at fault too here.
>>
>>   
> Thanks for all the posts. Some more information has come to light 
> which may be of interest. I have just experienced exactly the same 
> problem on another disk and the logs indicate an error within 12 
> sectors of the original error:
>
I managed to obtain a clean disk (identical model, WDC WD5000AAJS-22TKA0, 
Rev: 12.01C01) and could reproduce the problem by doing a dd with 1MB 
blocks across the suspect sector (268435451).

Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1g: error writing fsbn 216369024 
of 216369024-216369151 (wd1 bn 268435391; cn 266304 tn 15 sn 14), retrying
Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1: (id not found)
Aug 31 11:59:51 30_DEMO_697 /netbsd: wd1: soft error (corrected)

With a block size of 512 bytes the error did not manifest. Where to from 
here? At least it's reproducible...
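
The boundary question raised earlier in the thread can be checked against
both logged transfers. The following is a minimal Python sketch, assuming
the LBA28 limit is 2^28 sectors and that the "bn" value in each log line
is the absolute first sector of the transfer. The function names are
illustrative only and do not come from any NetBSD source.

```python
# Assumption: LBA28 can address 2**28 sectors, so the first sector that
# needs LBA48 addressing is sector 268435456 (0x10000000).
LBA28_SECTORS = 1 << 28


def sectors_in_range(first_fsbn, last_fsbn):
    """Number of sectors in an inclusive fsbn range as printed in the log."""
    return last_fsbn - first_fsbn + 1


def crosses_lba28(start_bn, nsectors):
    """True if a transfer starts below the LBA28 limit and ends at or above it."""
    end_bn = start_bn + nsectors - 1
    return start_bn < LBA28_SECTORS <= end_bn


# (label, absolute start sector, first fsbn, last fsbn), taken from the log lines
logged = [
    ("wd0, Aug 18", 268435451, 216369084, 216369211),
    ("wd1, Aug 31", 268435391, 216369024, 216369151),
]

for label, bn, first, last in logged:
    n = sectors_in_range(first, last)
    print(label, "sectors:", n, "crosses LBA28 limit:", crosses_lba28(bn, n))

# A single 512-byte sector at the suspect address, for comparison
print("single sector:", crosses_lba28(268435451, 1))
```

Under these assumptions both logged 128-sector transfers start below
sector 268435456 and end above it, whereas a one-sector transfer at
268435451 does not. That would be consistent with the 512-byte dd not
showing the error, though it does not say whether the drive, the
controller or the driver is at fault.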

Stuart