Subject: Re: Quadra SCSI performance
To: None <port-mac68k@netbsd.org>
From: Chas Williams <chas@cmf.nrl.navy.mil>
List: port-mac68k
Date: 12/26/2000 12:52:57
i too noticed a performance drop w/ scsi on my q700 when running 1.5.  i 
applied allen's patch and built a profile-enabled kernel.  after collecting 
statistics during a disk test, gprof gave the following:

...
Each sample counts as 0.0166667 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 37.33   1068.09  1068.09   413352  2583.98  2583.98  delay
 32.73   2004.61   936.52                             mcount
 15.78   2456.07   451.46    45896  9836.55 13001.13  esp_quick_dma_go
  5.07   2601.18   145.11 187445248     0.77     0.77  esp_dafb_have_dreq
  1.96   2657.37    56.19    37654  1492.32  1492.32  copyout
  1.81   2709.25    51.88                             Idle
  0.88   2734.29    25.04    49975   501.08   501.08  copyin
  0.56   2750.41    16.12                             esp_dualbus_intr
  0.41   2762.04    11.63   166045    70.06  5026.81  ncr53c9x_intr
...
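
(a quick sanity check on those numbers, as a sketch only: mcount is the profiling hook itself, so its share is overhead that a non-profiled kernel would not pay.  the helper names below are made up for illustration; the inputs are copied from the gprof rows above.)

```c
#include <assert.h>

/* share of the non-overhead time taken by one function, in percent */
double
share_excluding(double pct, double overhead_pct)
{
        assert(overhead_pct < 100.0);
        return 100.0 * pct / (100.0 - overhead_pct);
}

/* average self time per call, in microseconds */
double
us_per_call(double self_seconds, double calls)
{
        assert(calls > 0.0);
        return self_seconds * 1e6 / calls;
}
```

share_excluding(37.33, 32.73) comes out at roughly 55.5, i.e. delay() accounts for more than half of the time not spent in mcount, and us_per_call(1068.09, 413352) gives about 2584, in line with the us/call column.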

ok, so delay() seems to be consuming a fair amount of time.  i guessed
it was the delay() in esp_intr() and rewrote esp_intr() to the following:

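#define RETRY 10        /* max number of 1ms waits before giving up */

void
esp_intr(sc)
        void *sc;
{
        struct esp_softc *esc = (struct esp_softc *)sc;
        int     i = 0;

        do {
                /* 0x80 is the interrupt-pending bit in the status register */
                if (esc->sc_reg[NCR_STAT * 16] & 0x80) {
                        ncr53c9x_intr((struct ncr53c9x_softc *) esp0);
                        i = RETRY;      /* serviced; leave the loop */
                }

                /* not pending yet: wait 1ms and look again */
                if (i < RETRY) {
                        delay(1000);
                }
        } while (i++ < RETRY);
}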

instead of retrying once after delay(10000), it retries up to 10 times with
delay(1000).  the worst-case wait is still 10ms, but the handler gets out as
soon as the chip is ready.  this helped some:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
before     10   118 97.6   146 98.5    84 99.6   143 99.4   186 97.0   4.8 96.5
after      10   173 99.2   245 96.7   131 98.3   192 99.3   292 99.5   7.1 96.0
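
to see why the finer-grained retry can help, here is a rough user-space model (not driver code).  it assumes the chip's interrupt-pending bit becomes set ready_us microseconds after esp_intr() is entered.  ready_us, wait_single() and wait_polled() are hypothetical names, and wait_single() is only one reading of the old "retry once after delay(10000)" behaviour.  each function returns how long the handler would sit in delay().

```c
#include <assert.h>

/*
 * old scheme (assumed): check the status bit, and if it is not set,
 * delay(10000) and check exactly once more.
 */
unsigned
wait_single(unsigned ready_us, int *serviced)
{
        if (ready_us == 0) {
                *serviced = 1;          /* already pending, no delay */
                return 0;
        }
        *serviced = (ready_us <= 10000);
        return 10000;                   /* the full delay is always paid */
}

/*
 * new scheme: check, then up to `retries` waits of `step_us` each,
 * rechecking after every wait.  same loop shape as the esp_intr() above.
 */
unsigned
wait_polled(unsigned ready_us, unsigned step_us, unsigned retries,
    int *serviced)
{
        unsigned waited = 0;
        unsigned i = 0;

        assert(step_us > 0);
        do {
                if (waited >= ready_us) {       /* status bit is set */
                        *serviced = 1;
                        return waited;
                }
                if (i < retries)
                        waited += step_us;      /* stands in for delay() */
        } while (i++ < retries);

        *serviced = 0;                          /* gave up */
        return waited;
}
```

for a chip that becomes ready after 2500us, wait_single() burns the full 10000us while wait_polled(2500, 1000, 10, ...) returns after 3000us.  the worst case is 10000us for both, so the gain in this model comes only from interrupts where the chip is ready early.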

ok, i know this isn't the 'correct' fix.  it seems a bit strange to me that
you would get an esp_intr() when the h/w isn't ready yet.  is the real
problem that we get the interrupt before the h/w is ready, which forces us
to poll until it is?