Subject: Re: HDD - SMART status
To: Tomasz Luchowski <tomasz@luchowski.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 04/26/2003 15:42:18
On Sat, Apr 26, 2003 at 01:54:16AM +0200, Tomasz Luchowski wrote:
> Hi,
> 
> Does this mean the HDD is going to die soon? I had several problems with it
> already, but they would always go away after moving the disk physically
> by just some millimeters (some bizarre interaction.)
> 
> I am getting the usual several second "freeze" after each read error.
> This is i386 running -current as of 13th April (no, I don't think I suffered
> from UFS2 problems).
> 
> I knew it had to be finally replaced some day, but I'd like to make sure
> whether it's gotten really bad now.
> 
> atactl wd0 smart status says:
> 
> SMART supported, SMART enabled
> id	value	thresh	crit	collect	reliability description
>   1	 75	 34	yes	online	positive    Raw read error rate
>   3	 86	  0	yes	online	positive    Spin-up time
>   4	100	 20	no	online	positive    Start/stop count
>   5	100	 36	yes	online	positive    Reallocated sector count
>   7	 77	 30	yes	online	positive    Seek error rate
>   9	 98	  0	no	online	positive    Power-on hours count
>  10	100	 97	yes	online	positive    Spin retry count
>  12	 99	 20	no	online	positive    Device power cycle count
> 194	 42	  0	no	online	positive    Temperature
> 195	 75	  0	no	online	positive    
> 197	100	  0	no	online	positive    Current pending sector
> 198	100	  0	no	offline	positive    Offline uncorrectable
> 199	200	  0	no	online	positive    Ultra DMA CRC error count
> 200	100	  0	no	offline	positive    
> 202	100	  0	no	online	positive    

From this, it shouldn't be bad yet. It would be useful to watch how the
various values move (how fast the "Raw read error rate" value decreases, for
example).
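
A minimal sketch of how one could watch the values, assuming the output of
"atactl wd0 smart status" is saved to a file from time to time in the same
column layout as quoted above. The function names (parse_smart, margins,
drift) are made up for illustration and are not part of atactl:

```python
def parse_smart(text):
    """Parse saved `atactl wd0 smart status` output into {id: row}."""
    rows = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate mail quoting
        fields = line.split(None, 6)
        # Skip the status line, the column header and blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        attr_id, value, thresh = (int(f) for f in fields[:3])
        rows[attr_id] = {
            "value": value,
            "thresh": thresh,
            "critical": fields[3] == "yes",
            # Some attributes (e.g. 195) have no description.
            "description": fields[6] if len(fields) > 6 else "",
        }
    return rows


def margins(rows):
    """Distance between value and threshold for the critical attributes."""
    return {
        attr_id: row["value"] - row["thresh"]
        for attr_id, row in rows.items()
        if row["critical"]
    }


def drift(old, new):
    """Change in value per attribute between two snapshots."""
    return {
        attr_id: new[attr_id]["value"] - old[attr_id]["value"]
        for attr_id in old
        if attr_id in new and new[attr_id]["value"] != old[attr_id]["value"]
    }
```

With the table above, margins() gives 41 for "Raw read error rate" (75 against
a threshold of 34). Running drift() on two snapshots taken a few days apart
shows which values are moving and how fast.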

> 
> in syslog:
> 
> Apr 26 01:42:48 zunpc /netbsd: pciide0:0:0: lost interrupt
> Apr 26 01:43:00 zunpc /netbsd:  type: ata tc_bcount: 16384 tc_skip: 0
> Apr 26 01:43:00 zunpc /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x21
> Apr 26 01:43:00 zunpc /netbsd: wd0e: DMA error reading fsbn 13069678 of 13069678-13069709 (wd0 bn 13069741; cn 12966 tn 0 sn 13), retrying
> Apr 26 01:43:00 zunpc /netbsd: wd0: soft error (corrected)

This looks more like a problem on the bus than with the disk itself.
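
To see how often this happens, one could count the kernel messages by kind.
This is only a rough sketch: the patterns are taken from the log lines quoted
above, the names PATTERNS and classify are made up, and the counts alone
don't prove where the fault is.

```python
import re
from collections import Counter

# Substrings taken from the messages quoted in this thread.
PATTERNS = {
    "lost_interrupt": re.compile(r"lost interrupt"),
    "dma_error": re.compile(r"DMA error"),
    "soft_error": re.compile(r"soft error \(corrected\)"),
}


def classify(lines):
    """Count how many log lines match each pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines of the syslog file (for example with open() on wherever
syslogd writes kernel messages on your system) gives a per-kind total that
can be compared from day to day.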

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 24 years of experience will always make the difference