current-users: wd0 lost interrupts after "atactl wd0 setstandby 30"

Subject: wd0 lost interrupts after "atactl wd0 setstandby 30"
To: None <current-users@NetBSD.org>
From: Alan Barrett <apb@cequrux.com>
List: current-users
Date: 10/22/2004 10:36:44

[Subject changed because this is no longer about Seebs' problem.]

On Thu, 21 Oct 2004, Daniel Carosone wrote:
> Can't explain why it might happen exactly once per boot, but could it
> simply be the drive trying to remap sectors?  Does turning off
> write-cache prevent it, or at least cause it to stop on the second and
> subsequent times through (i.e., once it's remapped successfully)?

I tracked it down further.  I can trigger the same behaviour as often as
I like using "atactl wd0 setstandby 30".  Here's what happens:

# ( ( jot 20 0 | while read n ; do printf '[%d]' $n ; sleep 1 ; \
	done ; echo ) & ) 2>/dev/null ; \
	atactl wd0 setstandby 30 ; sleep 20
[0]piixide0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0
ATA command timed out
[1][2][3][4][5][6][7][8][9][10][11]piixide0:0: lost interrupt
    type: ata tc_bcount: 16384 tc_skip: 0
piixide0:0:0: intr with DRQ (st=0x58)
wd0e: device timeout writing fsbn 747776 of 747776-747807 (wd0 bn 35685106; cn 17424 tn 23 sn 18), retrying
wd0: soft error (corrected)
[12][13][14][15][16][17][18][19]

The "ATA command timed out" message is from /sbin/atactl.  The numbers
in square brackets measure elapsed time in seconds.  All other messages
are from the kernel.  The hardware is an i386 laptop.  The kernel is
NetBSD-2.99.10.
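
For anyone wanting to repeat the measurement, the one-liner above can be
unrolled into a small script.  This is only a sketch of the same technique,
not what I actually ran: the function names tick and with_ticker are made up
for illustration, and it uses a plain sh counter loop instead of jot(1) so
that it does not depend on BSD userland.

```shell
#!/bin/sh
# tick N: print [0][1]...[N-1], one marker per second, then a newline.
tick() {
    n=0
    while [ "$n" -lt "$1" ]; do
        printf '[%d]' "$n"
        n=$((n + 1))
        sleep 1
    done
    echo
}

# with_ticker SECONDS COMMAND [ARGS...]
# Start the ticker in the background, run COMMAND in the foreground,
# then wait for the ticker to finish so late kernel messages still
# land between markers.  Returns COMMAND's exit status.
with_ticker() {
    secs=$1
    shift
    tick "$secs" &
    ticker=$!
    "$@"
    status=$?
    wait "$ticker"
    return "$status"
}

# Hypothetical usage, equivalent in spirit to the one-liner above:
#   with_ticker 20 atactl wd0 setstandby 30
```

Waiting on the ticker takes the place of the trailing "sleep 20" in the
original command.  Run from a script, the shell prints no job-control
notices, so the subshell and 2>/dev/null wrapper are not needed.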

> If so, you probably want to overwrite the whole disk this way, or at
> least run with write cache off through as much write activity as you
> can create.  Making sure the whole disk is presently readable (with
> dd) or at least all the data is (with dump) would also be a great
> idea.

The write cache setting makes no difference.  The entire disk is
readable with dd.
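
A read check of this kind can be sketched as follows.  The helper name
check_readable is hypothetical, and the device path in the usage comment is
an assumption: /dev/rwd0d is the usual raw whole-disk device for wd0 on
NetBSD/i386, but it differs on other ports.

```shell
#!/bin/sh
# check_readable PATH: read PATH from start to end, discarding the data.
# dd stops with a non-zero status at the first unrecoverable read error.
check_readable() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "$1: readable"
    else
        echo "$1: read error"
        return 1
    fi
}

# Hypothetical usage (needs read access to the raw device):
#   check_readable /dev/rwd0d
```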

--apb (Alan Barrett)