Subject: weird scsi lag
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Michael L. VanLoon -- Iowa State University <michaelv@iastate.edu>
List: current-users
Date: 01/27/1994 07:51:47

I've heard other people on the list talk about their systems just
going away for a while and coming back a little later, but I had never
experienced it myself -- until tonight.

Until today, I had been running with dual IDE drives, and had never
experienced this "pause" before.  I figured maybe I was just doing
something right.

Today, I installed a BusLogic 747 EISA SCSI adapter and a SCSI drive.
I'm also running with one of my old IDE drives still installed, with
swap enabled on both.

I was compiling a kernel in one shell, doing a make cleandir then make
obj in another shell, and running top in a third.  Doing this in the
past with just IDE, things would get a bit bogged down, but everything
still responded in a timely, if lagged, fashion.

With the new SCSI controller/drive installed, I was doing these builds
in a SCSI filesystem.  While doing the "make cleandir ; make obj", the
kernel make and top just froze for a while (on the order of 5-10
minutes).  I could do a ^Z to put the kernel make in the background,
and the shell responded fine, but when I foregrounded the make again,
it was still frozen.  Top showed its last screen update as being from
several minutes prior.  Even though I had set top to update every 10
seconds, it was taking literally 5-10 minutes between updates (on the
old IDE-only system under a heavy build, the 10-second updates usually
came every 15-20 seconds).  It looked like some kind of CPU
starvation...

This is a *very* rough theory, not based in any way on code research.
But, the only change in my system is the addition of a bus-mastering
SCSI controller.  This controller has the "mailbox" feature, which, I
guess, lets it queue up to 255 commands in advance that it can take
care of asynchronously, whereas the old IDE system had to use the
actual CPU to do all the work.  Could it be the advanced queueing up
of these disk transfers that's hanging things up?  Maybe the
scheduling code expects to get disk interrupts on a regular basis to
keep things moving correctly, but the SCSI controller doesn't
interrupt the CPU until it finishes its long queue of stacked-up disk
transfers?  It's as if certain processes go into CPU starvation,
waiting for run time to be scheduled for them.

Like I said, these are just theories.  But one thing is certain: this
process hang is real.  It didn't happen on my IDE-only system, and it
now happens with a super-fast bus-mastering SCSI controller.  When
things aren't hung, the system runs beautifully --
things seem a bit more responsive, and the throughput appears to be
much better.  I'm running the SCSI adapter with all the performance
options enabled, including fast SCSI (10MB/sec.) and EISA burst mode.
No kernel panics; no weird reboots or anything fatal; just strange
process hangs and lagging that comes and goes with certain activity.

Any ideas on the process hanging?  I have appended a ps listing taken
while some processes were hung (specifically top, but also a couple of
others).
My shells worked fine the whole time.  The system was doing very
little swapping/paging.

[michaelv@stingray]~> ps axo user,pid,tt,state,flags,wchan,pri,ni,vsz,rss,command
USER       PID TT  STAT       F WCHAN  PRI NI   VSZ  RSS COMMAND
root         0 ??  DLs        3 sched  -18  0     0    0 (swapper)
root         1 ??  Is        25 wait    10  0   188   72 init --
root         2 ??  DL         3 thrd_s -18  0     0   12 (pagedaemon)
root        39 ??  INs        5 select   2  4    48  124 portmap
root        44 ??  INs        5 select   2 12   176   16 mountd
root        46 ??  INs        5 netcon   2 12    80   16 nfsd-listen
root        49 ??  Is         5 nfsidl  10  0    36   16 nfsiod 4
root        50 ??  I          5 nfsidl  10  0    36   16 nfsiod 4
root        51 ??  I          5 nfsidl  10  0    36   16 nfsiod 4
root        52 ??  I          5 nfsidl  10  0    36   16 nfsiod 4
root        54 ??  IN         5 netio    2 12    80   16 nfsd-udp
root        55 ??  IN         5 netio    2 12    80   16 nfsd-udp
root        65 ??  Is         5 select   2  0    64  236 syslogd
root        79 ??  Ss         5 pause   18  0    12  132 update
root        81 ??  INs        5 pause   18  5   140  264 /usr/libexec/cron
root        85 ??  INs        5 select   2  5   128   68 routed -q
root        89 ??  INs        5 select   2  8   172  156 named
root        92 ??  Is         5 select   2  0    84  180 inetd
root        95 ??  RNs        1 -      104 19   292  312 sendmail: accepting co
root       142 ??  I         25 select   2  0   112  284 telnetd
root       149 ??  I         25 select   2  0   112  284 telnetd
root       156 ??  S         25 select   2  0   112  284 telnetd
root      3298 ??  S         25 select   2  0   112  356 telnetd
michaelv   143 p0  Is        2d pause   18  0   532  216 -tcsh (tcsh)
root       147 p0  I         2d pause   18  0   580  432 -csh (tcsh)
root      1334 p0  I+      802d pause   18  0   476  220 /bin/tcsh doconfig STI
root      1799 p0  I+        2d wait    10  0  1964  824 make
root      6799 p0  I+        2d wait    10  0   112  108 /bin/sh -ec cc  -c -O
root      6800 p0  I+        2d wait    10  0    96  300 cc -c -O -I. -I../../.
root      6904 p0  R+        29 -       58  0  1660 1848 /usr/libexec//cc1 /var
michaelv   150 p1  Is        2d pause   18  0   532  216 -tcsh (tcsh)
root       154 p1  I         2d pause   18  0   532  244 -csh (tcsh)
root      2411 p1  RN+       29 -       88 15   200  436 top -s 10 -d inf inf
michaelv   157 p2  Is        2d pause   18  0   532  216 -tcsh (tcsh)
root       161 p2  I         2d pause   18  0   568  468 -csh (tcsh)
root      4757 p2  IN+       2d wait    10  5   200  408 make -f Makefile.basic
root      4758 p2  IN+       2d wait    10  5   112  100 /bin/sh -ec for entry
root      7651 p2  IN+       2d wait    10  5   220  428 make obj
root      7653 p2  SN+       2d wait    10  5   112  100 /bin/sh -ec for entry
root      8105 p2  SN+       2d wait    10  5   244  452 make obj
root      8116 p2  SN+       2d wait    10  5   112  100 /bin/sh -ec for entry
root      8128 p2  RN+       29 -       61  5   196  404 make obj
michaelv  3301 p3  Ss        2d pause   18  0   540  476 -tcsh (tcsh)
michaelv  8130 p3  R+        29 -       57  0   756  208 ps axo user
root       110 vg  Is+       2d ttyin    3  0    24  160 /usr/libexec/getty Pc
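
For anyone who wants to skim a listing like the one above, here is a
minimal Python sketch that pulls out only the runnable processes
(STAT beginning with "R") and sorts them by PRI.  The helper name
runnable() and the SAMPLE excerpt are illustrative assumptions, not
part of any system tool; the excerpt copies a few rows from the listing
above, and the parser assumes the same eleven-column layout.  On BSD
systems a numerically lower PRI is scheduled sooner, so the top of the
sorted output is what the scheduler favors.

```python
# Sketch: list the runnable (STAT starts with "R") processes from a
# ps listing, sorted by PRI.  Assumes the column layout produced by
# "ps axo user,pid,tt,state,flags,wchan,pri,ni,vsz,rss,command".

SAMPLE = """\
USER       PID TT  STAT       F WCHAN  PRI NI   VSZ  RSS COMMAND
root        95 ??  RNs        1 -      104 19   292  312 sendmail: accepting co
root      1799 p0  I+        2d wait    10  0  1964  824 make
root      6904 p0  R+        29 -       58  0  1660 1848 /usr/libexec//cc1 /var
root      2411 p1  RN+       29 -       88 15   200  436 top -s 10 -d inf inf
root      8128 p2  RN+       29 -       61  5   196  404 make obj
michaelv  8130 p3  R+        29 -       57  0   756  208 ps axo user
"""


def runnable(listing):
    """Return (PRI, NI, PID, COMMAND) tuples for runnable processes."""
    lines = listing.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so split
        # only as many times as there are preceding columns.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip malformed or truncated lines
        row = dict(zip(header, fields))
        if row["STAT"].startswith("R"):
            rows.append((int(row["PRI"]), int(row["NI"]),
                         int(row["PID"]), row["COMMAND"]))
    return sorted(rows)


if __name__ == "__main__":
    for pri, ni, pid, command in runnable(SAMPLE):
        print(f"{pri:4d} {ni:3d} {pid:6d} {command}")
```

In the excerpt, top (NI 15) and sendmail (NI 19) sort behind cc1, ps,
and "make obj".  That is only an observation about the listing, not a
diagnosis of the hang.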

Thanks!
				--Michael

------------------------------------------------------------------------------
    Michael L. VanLoon  --  michaelv@iastate.edu  --  gg.mlv@isumvs.bitnet
 Iowa State University of Science and Technology -- The way cool place to be!
   Project Vincent Systems Staff, Iowa State University Computation Center
------------------------------------------------------------------------------