Subject: kern/9857: wddone() omits block numbers from soft errors
To: None <gnats-bugs@gnats.netbsd.org>
From: None <jhawk@MIT.EDU>
List: netbsd-bugs
Date: 04/10/2000 16:02:15
>Number:         9857
>Category:       kern
>Synopsis:       wddone() omits block numbers from soft errors
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 10 16:03:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     John Hawkinson
>Release:        NetBSD 1.4.2
>Organization:

>Environment:
	

>Description:
	wddone() omits block numbers from soft errors. For hard errors,
diskerr() is called and a block number is reported. For soft errors,
only a plain printf() is issued, so the block number is not included.
I presume that blkdone is also available in the NOERROR case.

>How-To-Repeat:
Inspect the code:

        case ERROR:
                /* Don't care about media change bits */
                if (wd->sc_wdc_bio.r_error != 0 &&
                    (wd->sc_wdc_bio.r_error & ~(WDCE_MC | WDCE_MCR)) == 0)
                        goto noerror;
                ata_perror(wd->drvp, wd->sc_wdc_bio.r_error, errbuf);
retry:          /* Just reset and retry. Can we do more ? */
                wdc_reset_channel(wd->drvp);
                diskerr(bp, "wd", errbuf, LOG_PRINTF,
                    wd->sc_wdc_bio.blkdone, wd->sc_dk.dk_label);
                if (wd->retries++ < WDIORETRIES) {
                        printf(", retrying\n");
                        timeout(wdrestart, wd, RECOVERYTIME);
                        return;
                }
                printf("\n");
                bp->b_flags |= B_ERROR;
                bp->b_error = EIO;
                break;
        case NOERROR:
noerror:        if ((wd->sc_wdc_bio.flags & ATA_CORR) || wd->retries > 0)
                        printf("%s: soft error (corrected)\n",
                            wd->sc_dev.dv_xname);
        }
        disk_unbusy(&wd->sc_dk, (bp->b_bcount - bp->b_resid));

>Fix:
	Presumably diskerr() should be called for the NOERROR case as well,
especially if someone gets around to modifying diskerr() to centrally collect
statistics on disk errors (a la iostat -E under Solaris). I can't help but
wonder whether there is some reason it wasn't done this way.
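
	As a rough illustration only (a user-space sketch, not a tested kernel
patch), the soft-error message could carry the block number along these
lines. The names struct mock_bio, its blkno field, and soft_error_msg() are
hypothetical stand-ins invented for this sketch, the ATA_CORR value is a
placeholder, and the sketch assumes the block of interest is the starting
block plus blkdone, which is how I understand diskerr() to compute it.

```c
#include <stdio.h>

/* Placeholder value for this sketch; the real ATA_CORR lives in the kernel headers. */
#define ATA_CORR 0x01

/* Hypothetical stand-in for the parts of sc_wdc_bio that matter here. */
struct mock_bio {
	int  flags;	/* ATA_CORR set if the drive corrected the error */
	long blkno;	/* assumed: first block of the transfer */
	int  blkdone;	/* blocks already transferred */
};

/*
 * Format a soft-error message that includes a block number.
 * Returns the length written, or 0 when there is nothing to report
 * (no corrected-error flag and no retries), mirroring the condition
 * on the noerror: label in wddone().
 */
static int
soft_error_msg(char *buf, size_t len, const char *xname,
    const struct mock_bio *bio, int retries)
{
	if ((bio->flags & ATA_CORR) == 0 && retries <= 0)
		return 0;
	return snprintf(buf, len, "%s: soft error (corrected), block %ld\n",
	    xname, bio->blkno + bio->blkdone);
}
```

In the driver itself the equivalent change would presumably be a diskerr()
call in place of the printf(), so that any future central accounting in
diskerr() sees soft errors too.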
>Release-Note:
>Audit-Trail:
>Unformatted: