Subject: 1.6E stray irq for SCSI controller halts system.
To: None <port-alpha@netbsd.org>
From: Stephen M. Jones <smj@cirr.com>
List: port-alpha
Date: 08/03/2002 17:50:58
This morning I was working with a new disk array. You might think I'm
brutal, but I generally create file systems on disk arrays like this:

for i in sd1a sd2a sd3a sd4a sd5a sd6a sd7a
do
newfs -m 0 /dev/$i &
done

I do it this way because I find it a simple test that the controller
and disks work sanely when all of them are busy at once.
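One thing the loop above doesn't do is wait for the background jobs, so a newfs that fails on one disk is easy to miss. Here is a minimal sketch in POSIX sh that keeps the same parallel load but waits for each job and reports its exit status. The function name parallel_newfs and the /tmp/newfs.*.log paths are hypothetical, not part of my original procedure:

```sh
#!/bin/sh
# Hypothetical helper: run newfs on each named partition in parallel,
# then wait for every job and report which ones failed.
# Usage: parallel_newfs sd1a sd2a sd3a ...
parallel_newfs() {
    jobs_list=""
    for dev in "$@"; do
        # Same options as the loop above; output goes to a per-disk log.
        newfs -m 0 "/dev/$dev" > "/tmp/newfs.$dev.log" 2>&1 &
        jobs_list="$jobs_list $!:$dev"   # remember "pid:device"
    done

    failed=0
    for entry in $jobs_list; do
        pid=${entry%%:*}   # text before the colon
        dev=${entry#*:}    # text after the colon
        if wait "$pid"; then
            echo "$dev: ok"
        else
            echo "$dev: FAILED (see /tmp/newfs.$dev.log)"
            failed=$((failed + 1))
        fi
    done
    return "$failed"   # number of failed newfs runs
}
```

Called as `parallel_newfs sd1a sd2a sd3a sd4a sd5a sd6a sd7a`, it puts the same concurrent load on the controller as the original loop, but it only returns once every newfs has exited.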

So, while doing this I got a couple of stray interrupt complaints, and every
time, after three of them, the operating system halted. I've seen this before
on the 5305 (this is the API CS20, NetBSD 1.6E (SVERIGE) #0: Tue Jul 30 22:22:12 UTC 2002)
with a 3Com Ethernet controller. It was exactly the same scenario: after roughly
10 or 15 stray interrupts, the system would just lock up.

Info on the device:

ahc0 at pci0 dev 5 function 0
ahc0: interrupting at dec 6600 irq 24
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs

Debugger output:

cpu_Debugger() at cpu_Debugger+0x4
comintr() at comintr+0x168
alpha_shared_intr_dispatch() at alpha_shared_intr_dispatch+0x6c
sio_iointr() at sio_iointr+0x68
interrupt() at interrupt+0x304
XentInt() at XentInt+0x1c
--- interrupt (from ipl 0) ---
idle() at idle+0x70
idle() at idle+0x54
--- root of call graph ---
db{0}> buf
No such command
db{0}> show buf

CPU 0: fatal kernel trap:

CPU 0    trap entry = 0x4 (unaligned access fault)
CPU 0    a0         = 0xfffffc00004bacc4
CPU 0    a1         = 0x29
CPU 0    a2         = 0x11
CPU 0    pc         = 0xfffffc00004103b4
CPU 0    ra         = 0xfffffc00003a6784
CPU 0    pv         = 0xfffffc00003dfc00
CPU 0    curproc    = 0x0

Caught exception in ddb.
db{0}> show map
MAP 0xfffffc0000514550: [0xfffffe0000000000->0xffffffffffffe000]
        #ent=12, sz=382664704, ref=1, version=593, flags=0x1
        pmap=0xfffffc000055d5b8(resident=4604)

db{0}> show event
evcnt type 0: FP proc use = 227
evcnt type 0: FP proc re-use = 36354
evcnt type 1: soft serial = 3411
evcnt type 1: soft net = 45
evcnt type 1: soft clock = 1550
evcnt type 1: cpu0 clock = 981787
evcnt type 1: cpu0 device = 301316
evcnt type 1: cpu0 ipi = 424674
evcnt type 1: cpu0 shootdown ipi = 423958
evcnt type 1: cpu0 imb ipi = 343
evcnt type 1: cpu0 synch fpu ipi = 451
evcnt type 1: cpu0 discard fpu ipi = 31
evcnt type 1: cpu1 clock = 968933
evcnt type 1: cpu1 ipi = 9254
evcnt type 1: cpu1 microset ipi = 946
evcnt type 1: cpu1 shootdown ipi = 8096
evcnt type 1: cpu1 imb ipi = 160
evcnt type 1: cpu1 synch fpu ipi = 75
evcnt type 1: cpu1 discard fpu ipi = 3
evcnt type 1: cpu1 pause ipi = 2
evcnt type 1: isa irq 4 = 3413

db{0}> show registers
v0                0xf9  rn+0xd9
t0          0xfffffc000050e9ec  db_fromconsole
t1                 0x1
t2                   0
t3          0xfffffc0000513b58  cn_magic
t4                   0
t5             0xa42e3  rn+0xa42c3
t6          0x10556000
t7             0x10000  rn+0xffe0
s0          0xfffffe0000102e00
s1          0xfffffc000050f3c0  com_cnm_state
s2          0xfffffc0000559a68  tsp_configuration+0x18
s3                0xc6  rn+0xa6
s4                0xf9  rn+0xd9
s5          0xfffffd01fc0003f8
s6          0xfffffe0000117360
a0          0xfffffc0000559a50  tsp_configuration
a1          0xfffffd01fc0003f8
a2          0xfffffd01fc0003fd
a3          0xfffffc00028e9df0  end+0x238c270
a4                   0
a5               0x109  rn+0xe9
t8          0xfffffc00005591d8  vm_physmem
t9          0xfffffc00004ab990  microtime+0xb0
t10         0x1ff289a2b14c0
t11         0x31aa4752
ra          0xfffffc0000320708  comintr+0x168
t12         0xfffffc00004bac20  cpu_Debugger
at                 0x4
gp          0xfffffc0000508830  special_symbols+0x8160
sp          0xfffffc00028e9d10  end+0x238c190
pc          0xfffffc00004bac24  cpu_Debugger+0x4
ps                 0x4
ai          0x31aa4752
pv          0xfffffc00004bac20  cpu_Debugger
cpu_Debugger+0x4:       ret     zero,(ra)