Subject: kern/13275: IDE retries should be disableable
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 06/21/2001 10:36:25
>Number:         13275
>Category:       kern
>Synopsis:       IDE retries should be disableable
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 21 07:40:02 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     John Hawkinson
>Release:        -current of 21 May 2001
>Organization:
MIT
>Environment:
	
System: NetBSD zorkmid.mit.edu 1.5U NetBSD 1.5U (ZORKMID-$Revision: 1.10 $) #103: Mon May 21 21:10:47 EDT 2001 jhawk@zorkmid.mit.edu:/usr/local/netbsd-current/src/sys/arch/i386/compile/ZORKMID i386


>Description:
    On a machine with a number of known uncorrectable bad blocks on
an IDE disk, it is rather irritating to have the kernel retry to read them
5 times, when each retry involves a lengthy delay, and you know that it is
never going to succeed.
>How-To-Repeat:
	Inspecting sys/dev/ata/wd.c, it appears to be hardcoded to retry
five times:

   106  #define WDIORETRIES     5       /* number of retries before giving up */
...
   592  retry:          /* Just reset and retry. Can we do more ? */
   593                  wdc_reset_channel(wd->drvp);
   594                  diskerr(bp, "wd", errbuf, LOG_PRINTF,
   595                      wd->sc_wdc_bio.blkdone, wd->sc_dk.dk_label);
   596                  if (wd->retries++ < WDIORETRIES) {
   597                          printf(", retrying\n");
   598                          callout_reset(&wd->sc_restart_ch, RECOVERYTIME,
   599                              wdrestart, wd);
   600                          return;
   601                  }
   602                  printf("\n");


It would be desirable to change this behavior, though I'm not certain how.
At the very least, it should be easily patchable, so perhaps someone could
just patch "wdioretries" to 0 in ddb.

It would also be reasonable to expose this to the user. Perhaps by a sysctl,
or some other mechanism?

Is there prior art from other OSs [or even Linux] in this area?

>Fix:
	
As a workaround, one can patch the code. Unfortunately, this is a bit
more of a pain than expected, because the compiler optimizes the test on line
596 (above) to a jnle, so turning off retries completely requires more
opcode synthesis than I'm generally willing to do on the fly. But to drop
it from 5 to 2, just patch as follows:

0xc031f0e2 <wddone+226>:        pushl  %esi
0xc031f0e3 <wddone+227>:        call   0xc01c2d64 <diskerr>
0xc031f0e8 <wddone+232>:        addl   $0x1c,%esp
0xc031f0eb <wddone+235>:        movl   0x40c(%ebx),%eax
0xc031f0f1 <wddone+241>:        incl   0x40c(%ebx)
0xc031f0f7 <wddone+247>:        cmpl   $0x4,%eax
0xc031f0fa <wddone+250>:        jg     0xc031f130 <wddone+304>
0xc031f0fc <wddone+252>:        pushl  $0xc0472702
0xc031f101 <wddone+257>:        call   0xc01c5d50 <printf>

wddone+247 (wddone+0xf7) is the test in question. 

0xc031f0f7 <wddone+247>:        0x7f04f883

patching it to 0x7f00f883 makes it a "cmpl $0, %eax", which
results in two reads, which is better than 6, but not quite
what I was hoping for.

(Yeah, I guess I could just replace the next opcode with an unconditional
jump...)
>Release-Note:
>Audit-Trail:
>Unformatted: