NetBSD-Bugs archive
Re: kern/40569: Failed RAIDframe parity rewrite prevents system shutdown
The following reply was made to PR kern/40569; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Matthias Scheler <tron%zhadum.org.uk@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/40569: Failed RAIDframe parity rewrite prevents system
shutdown
Date: Wed, 11 Feb 2009 13:09:30 +0100
On Wed, Feb 11, 2009 at 07:26:17AM +0000, Matthias Scheler wrote:
> On Tue, Feb 10, 2009 at 11:24:40PM +0100, Manuel Bouyer wrote:
> > OK, so it's probably an issue with the ahci controller: b_resid was set to
> > 0 even in case of failure, and it's used in the LBA48 workaround detection
> > to see if we crossed the boundary ... I think the attached patch fixes it,
> > but unfortunately my test box didn't reboot after the panic, so I can't
> > test before tomorrow.
>
> The fix doesn't work on my system:
>
> raid1: initiating in-place reconstruction on column 0
> wd3e: LBA48 bug reading fsbn 268435392 of 268435392-268435519 (wd3 bn 268435455; cn 266305 tn 0 sn 15), retrying
> wd3: soft error (corrected)
> wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15), retrying
> wd2: (id not found)
There was an inverted test condition in my patch; the attached one should work
(it does for me at least; with it, a write at the LBA48 address triggers
the workaround detection).
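
To illustrate why the residual count matters here, a minimal sketch in C
follows. It is not the driver code: the names sketch_remaining,
sketch_crosses_lba28 and SKETCH_LBA28_MAX are hypothetical, and the boundary
test is only an assumption about how an upper layer might use the remaining
byte count. It mirrors the condition in the attached diff: the controller's
byte count is subtracted for reads and for successful transfers, but ignored
for a failed write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's status codes. */
enum { SKETCH_NOERROR = 0, SKETCH_ERROR = 1 };

/* Highest sector number addressable with a 28-bit LBA (2^28 - 1). */
#define SKETCH_LBA28_MAX 0x0fffffffULL

/*
 * Bytes still outstanding after a command completes.
 * requested: bytes asked for; prd_bytes: byte count reported by the
 * controller (assumed to be <= requested).
 * For a failed write the reported count is ignored, so the whole
 * request is treated as not transferred.
 */
uint32_t
sketch_remaining(uint32_t requested, uint32_t prd_bytes, bool is_read,
    int error)
{
	if (is_read || error == SKETCH_NOERROR)
		return requested - prd_bytes;
	return requested;
}

/*
 * Assumed form of a boundary test: does the range of nsectors sectors
 * starting at first extend past the 28-bit LBA limit?
 */
bool
sketch_crosses_lba28(uint64_t first, uint64_t nsectors)
{
	return first <= SKETCH_LBA28_MAX &&
	    first + nsectors > SKETCH_LBA28_MAX;
}
```

With the numbers from the log above (128 sectors starting at 268435392), a
remaining count of zero sectors makes the boundary test fail, while leaving the
full 128 sectors outstanding makes it succeed.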
--
Manuel Bouyer, LIP6, Universite Paris VI.
Manuel.Bouyer%lip6.fr@localhost
NetBSD: 26 ans d'experience feront toujours la difference
--
Index: ahcisata_core.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/ahcisata_core.c,v
retrieving revision 1.18
diff -u -p -u -r1.18 ahcisata_core.c
--- ahcisata_core.c 3 Oct 2008 13:02:08 -0000 1.18
+++ ahcisata_core.c 11 Feb 2009 12:07:01 -0000
@@ -1065,7 +1065,7 @@ ahci_bio_complete(struct ata_channel *ch
 		ata_bio->error = TIMEOUT;
 	} else {
 		callout_stop(&chp->ch_callout);
-		ata_bio->error = 0;
+		ata_bio->error = NOERROR;
 	}
 
 	chp->ch_queue->active_xfer = NULL;
@@ -1095,7 +1095,14 @@ ahci_bio_complete(struct ata_channel *ch
 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 	AHCIDEBUG_PRINT(("ahci_bio_complete bcount %ld",
 	    ata_bio->bcount), DEBUG_XFERS);
-	ata_bio->bcount -= le32toh(achp->ahcic_cmdh[slot].cmdh_prdbc);
+	/*
+	 * If it was a write, the complete data buffer may have been
+	 * transferred before error detection; in this case don't use
+	 * cmdh_prdbc as it won't reflect what was written to media.
+	 * Assume nothing was transferred and leave bcount as-is.
+	 */
+	if ((ata_bio->flags & ATA_READ) || ata_bio->error == NOERROR)
+		ata_bio->bcount -= le32toh(achp->ahcic_cmdh[slot].cmdh_prdbc);
 	AHCIDEBUG_PRINT((" now %ld\n", ata_bio->bcount), DEBUG_XFERS);
 	(*chp->ch_drive[drive].drv_done)(chp->ch_drive[drive].drv_softc);
 	atastart(chp);