NetBSD-Bugs archive


Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown



The following reply was made to PR kern/40569; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Matthias Scheler <tron%zhadum.org.uk@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system
        shutdown
Date: Wed, 11 Feb 2009 13:09:30 +0100

 --NzB8fVQJ5HfG6fxh
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Wed, Feb 11, 2009 at 07:26:17AM +0000, Matthias Scheler wrote:
 > On Tue, Feb 10, 2009 at 11:24:40PM +0100, Manuel Bouyer wrote:
 > > OK, so it's probably an issue with the ahci controller: b_resid was set to
 > > 0 even in case of failure; and it's used in the LBA48 workaround detection
 > > to see if we crossed the boundary ... I think the attached patch fixes it,
 > > but unfortunately my test box didn't reboot after the panic, so I can't
 > > test before tomorrow.
 > 
 > The fix doesn't work on my system:
 > 
 > raid1: initiating in-place reconstruction on column 0
 > wd3e: LBA48 bug reading fsbn 268435392 of 268435392-268435519 (wd3 bn 268435455; cn 266305 tn 0 sn 15), retrying
 > wd3: soft error (corrected)
 > wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15), retrying
 > wd2: (id not found)
 
 There was an inverted test condition in my patch; the attached one should work
 (it does for me at least: with it, a write at the LBA48 address triggers
 the workaround detection).
 
 -- 
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer%lip6.fr@localhost
      NetBSD: 26 ans d'experience feront toujours la difference
 --
 
 --NzB8fVQJ5HfG6fxh
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff
 
 Index: ahcisata_core.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/ahcisata_core.c,v
 retrieving revision 1.18
 diff -u -p -u -r1.18 ahcisata_core.c
 --- ahcisata_core.c    3 Oct 2008 13:02:08 -0000       1.18
 +++ ahcisata_core.c    11 Feb 2009 12:07:01 -0000
 @@ -1065,7 +1065,7 @@ ahci_bio_complete(struct ata_channel *ch
                ata_bio->error = TIMEOUT;
        } else {
                callout_stop(&chp->ch_callout);
 -              ata_bio->error = 0;
 +              ata_bio->error = NOERROR;
        }
  
        chp->ch_queue->active_xfer = NULL;
 @@ -1095,7 +1095,14 @@ ahci_bio_complete(struct ata_channel *ch
            BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
        AHCIDEBUG_PRINT(("ahci_bio_complete bcount %ld",
            ata_bio->bcount), DEBUG_XFERS);
 -      ata_bio->bcount -= le32toh(achp->ahcic_cmdh[slot].cmdh_prdbc);
 +      /* 
 +       * if it was a write, complete data buffer may have been transfered
 +       * before error detection; in this case don't use cmdh_prdbc
 +       * as it won't reflect what was written to media. Assume nothing
 +       * was transfered and leave bcount as-is.
 +       */
 +      if ((ata_bio->flags & ATA_READ) || ata_bio->error == NOERROR)
 +              ata_bio->bcount -= le32toh(achp->ahcic_cmdh[slot].cmdh_prdbc);
        AHCIDEBUG_PRINT((" now %ld\n", ata_bio->bcount), DEBUG_XFERS);
        (*chp->ch_drive[drive].drv_done)(chp->ch_drive[drive].drv_softc);
        atastart(chp);
 
 --NzB8fVQJ5HfG6fxh--
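
The residual-count rule introduced by the second hunk of the patch can be tried outside the kernel. The following is only a minimal user-space sketch: `struct sketch_bio`, `enum sketch_error` and `sketch_update_residual()` are hypothetical stand-ins invented for illustration, not the real NetBSD `ata_bio` definitions, and `prdbc` stands for the already byte-swapped value of `cmdh_prdbc`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; names are assumptions. */
enum sketch_error { SK_NOERROR = 0, SK_ERROR = 1 };

struct sketch_bio {
	bool is_read;            /* models (ata_bio->flags & ATA_READ) */
	enum sketch_error error; /* models ata_bio->error */
	long bcount;             /* bytes still to be transferred */
};

/*
 * Apply the rule from the patch: trust the controller's byte count for
 * reads and for successful transfers. For a failed write, the whole
 * buffer may have reached the controller before the error was detected,
 * so assume nothing was written and leave bcount unchanged.
 */
static void
sketch_update_residual(struct sketch_bio *bio, uint32_t prdbc)
{
	assert(bio != NULL);
	if (bio->is_read || bio->error == SK_NOERROR)
		bio->bcount -= (long)prdbc;
}
```

With this rule a failed write keeps a non-zero residual even when the controller reports the full byte count, which is the condition the quoted message says the LBA48 workaround detection relies on.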
 

