Subject: weird scsi lag
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Michael L. VanLoon -- Iowa State University <michaelv@iastate.edu>
List: current-users
Date: 01/27/1994 07:51:47
I've heard other people on the list talk about their systems just
going away for a while and coming back a little later, but had never
really experienced it for myself -- until tonight.
Until today, I had been running with dual IDE drives, and had never
experienced this "pause" before. I figured maybe I was just doing
something right.
Today, I installed a BusLogic 747 EISA SCSI adapter and a SCSI drive.
I'm also running with one of my old IDE drives still installed, with
swap enabled on both.
I was compiling a kernel in one shell, doing a make cleandir then make
obj in another shell, and running top in a third. Doing this in the
past with just IDE, things would get a bit bogged down, but everything
still responded in a timely, if lagged, fashion.
With the new SCSI controller/drive installed, I was doing these builds
in a SCSI filesystem. While doing the "make cleandir ; make obj", the
kernel make and top just froze for a while (a while being on the order
of 5-10 minutes). I could do a ^Z to put the kernel make in the
background, and the shell responded fine, but when foregrounding the
make again, it was just frozen. Top showed its last screen update as
being from several minutes prior. Even though I set top to update
every 10 seconds, it was taking literally 5-10 minutes between
updates (on the old IDE-only system with a heavy build, 10-second
updates usually came every 15-20 seconds). It looked like some kind
of CPU starvation or something...
This is a *very* rough theory, not based in any way on code research.
But, the only change in my system is the addition of a bus-mastering
SCSI controller. This controller has the "mailbox" feature, which, I
guess, lets it queue up to 255 commands in advance that it can take
care of asynchronously, whereas the old IDE system had to use the
actual CPU to do all the work. Could it be the advance queueing of
these disk transfers that's hanging things up? Maybe the
scheduling code expects to get disk interrupts on a regular basis to
keep things moving correctly, but the SCSI controller doesn't
interrupt the CPU until it finishes its long queue of stacked-up disk
transfers? It's as if certain processes go into CPU starvation,
waiting for run time to be scheduled for them.
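The queueing guess above can be put into rough numbers. This is only a back-of-envelope sketch: the 255-command depth is the figure guessed at above, the per-command service times are made-up values rather than measurements, and the function name is hypothetical.

```python
def worst_case_wait_s(queue_depth, service_time_ms):
    """Seconds a new request waits if every already-queued command finishes first."""
    return queue_depth * service_time_ms / 1000.0


# Assumed service times per disk command, in milliseconds (not measured).
for ms in (10, 20, 50):
    print(ms, "ms/command ->", worst_case_wait_s(255, ms), "seconds")
```

Under these assumed numbers a full queue drains in seconds, not minutes, so queue depth by itself would only match a 5-10 minute stall if the queue kept being refilled ahead of the stalled process.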
Like I said, these are just theories. But, one thing is certain, this
process hang thing is for real, and it didn't happen on my IDE-only
system, and now happens with a super-fast bus-mastering SCSI
controller. When things aren't hung, the system runs beautifully --
things seem a bit more responsive, and the throughput appears to be
much better. I'm running the SCSI adapter with all the performance
options enabled, including fast SCSI (10MB/sec.) and EISA burst mode.
No kernel panics; no weird reboots or anything fatal; just strange
process hangs and lagging that comes and goes with certain activity.
Any ideas on the process hanging? I have appended a ps listing taken while
some processes were hung (specifically top, but also a couple of others).
My shells worked fine the whole time. The system was doing very
little swapping/paging.
[michaelv@stingray]~> ps axo user,pid,tt,state,flags,wchan,pri,ni,vsz,rss,command
USER       PID TT STAT    F WCHAN  PRI NI  VSZ  RSS COMMAND
root         0 ?? DLs     3 sched  -18  0    0    0 (swapper)
root         1 ?? Is     25 wait    10  0  188   72 init --
root         2 ?? DL      3 thrd_s -18  0    0   12 (pagedaemon)
root        39 ?? INs     5 select   2  4   48  124 portmap
root        44 ?? INs     5 select   2 12  176   16 mountd
root        46 ?? INs     5 netcon   2 12   80   16 nfsd-listen
root        49 ?? Is      5 nfsidl  10  0   36   16 nfsiod 4
root        50 ?? I       5 nfsidl  10  0   36   16 nfsiod 4
root        51 ?? I       5 nfsidl  10  0   36   16 nfsiod 4
root        52 ?? I       5 nfsidl  10  0   36   16 nfsiod 4
root        54 ?? IN      5 netio    2 12   80   16 nfsd-udp
root        55 ?? IN      5 netio    2 12   80   16 nfsd-udp
root        65 ?? Is      5 select   2  0   64  236 syslogd
root        79 ?? Ss      5 pause   18  0   12  132 update
root        81 ?? INs     5 pause   18  5  140  264 /usr/libexec/cron
root        85 ?? INs     5 select   2  5  128   68 routed -q
root        89 ?? INs     5 select   2  8  172  156 named
root        92 ?? Is      5 select   2  0   84  180 inetd
root        95 ?? RNs     1 -      104 19  292  312 sendmail: accepting co
root       142 ?? I      25 select   2  0  112  284 telnetd
root       149 ?? I      25 select   2  0  112  284 telnetd
root       156 ?? S      25 select   2  0  112  284 telnetd
root      3298 ?? S      25 select   2  0  112  356 telnetd
michaelv   143 p0 Is     2d pause   18  0  532  216 -tcsh (tcsh)
root       147 p0 I      2d pause   18  0  580  432 -csh (tcsh)
root      1334 p0 I+   802d pause   18  0  476  220 /bin/tcsh doconfig STI
root      1799 p0 I+     2d wait    10  0 1964  824 make
root      6799 p0 I+     2d wait    10  0  112  108 /bin/sh -ec cc -c -O
root      6800 p0 I+     2d wait    10  0   96  300 cc -c -O -I. -I../../.
root      6904 p0 R+     29 -       58  0 1660 1848 /usr/libexec//cc1 /var
michaelv   150 p1 Is     2d pause   18  0  532  216 -tcsh (tcsh)
root       154 p1 I      2d pause   18  0  532  244 -csh (tcsh)
root      2411 p1 RN+    29 -       88 15  200  436 top -s 10 -d inf inf
michaelv   157 p2 Is     2d pause   18  0  532  216 -tcsh (tcsh)
root       161 p2 I      2d pause   18  0  568  468 -csh (tcsh)
root      4757 p2 IN+    2d wait    10  5  200  408 make -f Makefile.basic
root      4758 p2 IN+    2d wait    10  5  112  100 /bin/sh -ec for entry
root      7651 p2 IN+    2d wait    10  5  220  428 make obj
root      7653 p2 SN+    2d wait    10  5  112  100 /bin/sh -ec for entry
root      8105 p2 SN+    2d wait    10  5  244  452 make obj
root      8116 p2 SN+    2d wait    10  5  112  100 /bin/sh -ec for entry
root      8128 p2 RN+    29 -       61  5  196  404 make obj
michaelv  3301 p3 Ss     2d pause   18  0  540  476 -tcsh (tcsh)
michaelv  8130 p3 R+     29 -       57  0  756  208 ps axo user
root       110 vg Is+    2d ttyin    3  0   24  160 /usr/libexec/getty Pc
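
For anyone wanting to sift a listing like the one above, here is a minimal sketch that pulls out the rows ps reports as runnable (STAT beginning with "R"). The function name and the short sample are illustrative only; the sample rows are copied from the listing, and the parser assumes the eleven columns requested in the ps command above.

```python
def runnable(ps_lines):
    """Return (pid, stat, pri, nice, command) for rows whose STAT starts with 'R'."""
    rows = []
    for line in ps_lines:
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and anything that is not a full row
        user, pid, tt, stat, flags, wchan, pri, ni, vsz, rss, command = fields
        if stat.startswith("R"):
            rows.append((int(pid), stat, int(pri), int(ni), command))
    return rows


# A few rows copied from the listing above.
sample = [
    "USER PID TT STAT F WCHAN PRI NI VSZ RSS COMMAND",
    "root 95 ?? RNs 1 - 104 19 292 312 sendmail: accepting co",
    "root 1799 p0 I+ 2d wait 10 0 1964 824 make",
    "root 6904 p0 R+ 29 - 58 0 1660 1848 /usr/libexec//cc1 /var",
    "root 2411 p1 RN+ 29 - 88 15 200 436 top -s 10 -d inf inf",
    "root 8128 p2 RN+ 29 - 61 5 196 404 make obj",
    "michaelv 8130 p3 R+ 29 - 57 0 756 208 ps axo user",
]

for pid, stat, pri, ni, command in runnable(sample):
    print(pid, stat, pri, ni, command)
```

Run against the full listing, this would pick out PIDs 95, 6904, 2411, 8128 and 8130: sendmail, cc1, top, the innermost "make obj", and the ps command itself.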
Thanks!
--Michael
------------------------------------------------------------------------------
Michael L. VanLoon -- michaelv@iastate.edu -- gg.mlv@isumvs.bitnet
Iowa State University of Science and Technology -- The way cool place to be!
Project Vincent Systems Staff, Iowa State University Computation Center
------------------------------------------------------------------------------