Subject: Re: Bad sectors vs RAIDframe
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Stephen Borrill <netbsd@precedence.co.uk>
List: netbsd-users
Date: 05/11/2005 08:48:28

On Tue, 10 May 2005, Manuel Bouyer wrote:
>>> [ ... ]
>>> Is there any way to mark bad sectors in the underlying components so
>>> that RAIDframe will ignore them? Is doing such a thing a sensible
>>> move? bad144/badsect don't seem appropriate.
>>
>> No, it's not a sensible move.  Modern ATA drives already use ECC and
>> migrate bad sectors to the spare sectors automatically.  You don't see
>> errors until the drive has had so many bad sectors appear that it has
>> used up all of the replacement spare sectors.
>
> It may also be that the drive couldn't correct the error. In such a case, the
> sector will be remapped on write.
> Just rebuilding the array (raidctl -R) would take care of this. I did
> this a few times with success on some raid1 set.

I ran smartmontools (thanks for the pointers, people!) and it looks
more serious than that. An extract from smartctl -a /dev/wd2d:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   246   246   063    Pre-fail  Always       -       74
196 Reallocated_Event_Count 0x0008   183   183   000    Old_age   Offline      -       70
197 Current_Pending_Sector  0x0008   246   246   000    Old_age   Offline      -       73
198 Offline_Uncorrectable   0x0008   180   180   000    Old_age   Offline      -       73
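
For anyone wanting to watch these counters over time, here is a rough
Python sketch that pulls the raw values out of attribute rows like the
ones above. It assumes the ten-column layout shown here and that the
raw value is a plain integer (not true for every attribute on every
drive); the function name parse_attributes is made up for illustration
and is not part of smartmontools.

```python
# Sketch only: assumes the ten-column attribute layout shown above and
# integer raw values. parse_attributes is a hypothetical helper name.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   246   246   063    Pre-fail  Always       -       74
196 Reallocated_Event_Count 0x0008   183   183   000    Old_age   Offline      -       70
197 Current_Pending_Sector  0x0008   246   246   000    Old_age   Offline      -       73
198 Offline_Uncorrectable   0x0008   180   180   000    Old_age   Offline      -       73
"""


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has ten whitespace-separated fields and starts with the ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


attrs = parse_attributes(SAMPLE)
for attr_id in (5, 197, 198):
    name, raw = attrs[attr_id]
    if raw > 0:
        print(f"{name}: raw value {raw}")
```

Nonzero raw values for attributes 5, 197 and 198 together are what
make me think this is more than a one-off unreadable sector.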


Error 21 occurred at disk power-on lifetime: 30 hours (1 days + 6 hours)
   When the command that caused the error occurred, the device was in an unknown state.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 10 30 ef 0c e0  Error: UNC at LBA = 0x000cef30 = 847664

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c4 00 10 30 ef 0c e0 00   1d+03:47:42.944  READ MULTIPLE
   c6 00 10 00 00 00 a0 00   1d+03:47:42.944  SET MULTIPLE MODE
   ef 03 0c 00 00 00 a0 00   1d+03:47:42.944  SET FEATURES [Set transfer mode]
   10 00 02 ae 82 09 a0 00   1d+03:47:42.928  RECALIBRATE [OBS-4]
   c4 00 10 30 ef 0c e0 04   1d+03:47:42.528  READ MULTIPLE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining   LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%        48         847664
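
As a sanity check, the LBA that smartctl reports for Error 21 can be
rebuilt from the register dump, and it matches the LBA_of_first_error
in the self-test log. A minimal Python sketch, assuming 28-bit LBA
addressing (low nibble of DH holds LBA bits 27:24, then CH, CL, SN);
the function name is mine and purely illustrative:

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes 28-bit LBA mode: only the low 4 bits of DH carry address
    bits; the high bits of DH are mode/drive-select flags.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values from the "After command completion" line above:
# SN=30 CL=ef CH=0c DH=e0
lba = lba28_from_registers(sn=0x30, cl=0xEF, ch=0x0C, dh=0xE0)
print(f"0x{lba:08x} = {lba}")
```

Both the error log and the extended self-test point at the same
sector, so the failure is at least reproducible.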

-- 
Stephen