Subject: kern/36896: LBA28 - LBA48 problem with Western Digital AAJS model hard drives
List: netbsd-bugs
Date: 09/04/2007 13:20:00
>Number:         36896
>Category:       kern
>Synopsis:       LBA28 - LBA48 problem with Western Digital AAJS model hard drives
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 04 13:20:00 +0000 2007
>Originator:     Stuart Brooks
>Release:        3.1_STABLE
NetBSD test_77 3.1_STABLE NetBSD 3.1_STABLE (V5_GENERIC) #2: Mon Sep 3 12:15:38 SAST 2007 root@test_77:/usr/src/sys/arch/i386/compile/V5_GENERIC i386
A block write that crosses the LBA28 boundary (sector 268435456, i.e. 2^28) on large (200GB+) Western Digital hard drives with an AAJS model number corrupts the boot sectors of the drive. This has been seen on a WD2500AAJS and on a number of WD5000AAJS drives.

A sample of the error logs follows:

Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1g: error writing fsbn 216369024 of 216369024-216369151 (wd1 bn 268435391; cn 266304 tn 15 sn 14), retrying
Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1: (id not found)
Aug 31 11:59:51 30_DEMO_697 /netbsd: wd1: soft error (corrected)

At this point a portion of the disk, starting at sector 0, has been overwritten with part of the block of data that was being written across the LBA28 boundary.
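
The numbers in the log above can be checked against the boundary by hand. The following is a minimal sketch, not driver code: the helper names are hypothetical, and the transfer length of 128 sectors is taken from the fsbn range 216369024-216369151 in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* First sector number that no longer fits in a 28-bit LBA (2^28). */
#define LBA28_LIMIT ((uint64_t)1 << 28)

/*
 * Hypothetical helper: true when a transfer starts below the LBA28
 * limit but its last sector lies at or above it.
 */
static bool
straddles_lba28(uint64_t start, uint64_t nsectors)
{
	return nsectors > 0 && start < LBA28_LIMIT &&
	    start + nsectors > LBA28_LIMIT;
}

/* Hypothetical helper: how many sectors of the transfer lie below the limit. */
static uint64_t
sectors_below_lba28(uint64_t start, uint64_t nsectors)
{
	uint64_t room;

	if (start >= LBA28_LIMIT)
		return 0;
	room = LBA28_LIMIT - start;
	return nsectors < room ? nsectors : room;
}
```

Calling straddles_lba28(268435391, 128) with the "wd1 bn" value from the log returns true: 65 sectors of that transfer lie below sector 268435456 and the remaining 63 lie at or above it.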
To corrupt the drive:
dd if=/dev/zero of=/dev/rwd1d seek=134200 bs=1000k count=1000

To get a read failure:
dd if=/dev/rwd1d of=/dev/null skip=134200 bs=1000k count=1000
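
The seek/skip value in the dd commands above was chosen so that the run starts just below the LBA28 boundary and ends well past it. A minimal sketch of that arithmetic, assuming 512-byte sectors and that dd's "k" suffix means 1024 bytes (the helper names are hypothetical):

```c
#include <stdint.h>

#define SECTOR_SIZE 512u		/* assumed sector size in bytes */
#define DD_BS (1000u * 1024u)		/* bs=1000k, i.e. 2000 sectors */
#define LBA28_LIMIT ((uint64_t)1 << 28)	/* first sector beyond LBA28 */

/* First sector touched by the dd block with absolute index blk. */
static uint64_t
dd_block_first_sector(uint64_t blk)
{
	return blk * DD_BS / SECTOR_SIZE;
}

/* Absolute index of the dd block that contains the LBA28 limit. */
static uint64_t
dd_block_at_lba28(void)
{
	return LBA28_LIMIT * SECTOR_SIZE / DD_BS;
}
```

With seek=134200 the run starts at sector 268400000 and, with count=1000, ends just before sector 270400000. The boundary falls inside dd block 134217, which begins at sector 268434000, so the crossing happens in the 18th block of the run.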

I have only seen this on the AAJS models.
Adding the following entry to the wd_quirks array in src/sys/dev/ata/wd.c appeared to fix the problem:

        { "WDC WD[1-9][0-9][0-9][0-9]AAJS*",
          WD_QUIRK_FORCE_LBA48 },
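
To see which model strings the pattern in this entry would cover, it can be tried out in userland. This is only a sketch: it substitutes POSIX fnmatch(3) for whatever matcher the driver itself uses, which may differ in detail, and the helper name and the sample model strings in the note below are hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* The pattern from the proposed wd_quirks entry. */
static const char aajs_pattern[] = "WDC WD[1-9][0-9][0-9][0-9]AAJS*";

/*
 * Hypothetical helper: true when a drive model string matches the
 * pattern under fnmatch(3) glob rules (an assumption, see above).
 */
static bool
matches_aajs_quirk(const char *model)
{
	return fnmatch(aajs_pattern, model, 0) == 0;
}
```

Under these rules "WDC WD2500AAJS" and "WDC WD5000AAJS-00" match, while "WDC WD2500AAKS" (different suffix) and "WDC WD800AAJS" (only three digits) do not.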