Subject: SCSI problems, again.
To: None <port-macppc@netbsd.org>
From: Monroe Williams <monroe@criticalpath.com>
List: port-macppc
Date: 10/04/2001 00:27:46
My search to find a fast SCSI card that works with NetBSD continues.  See my
posts last month to port-macppc for details of failures with an Adaptec
29160.

I now have a 53c895-based card made by ATTO installed in a G4/466.  (This
machine is slated to replace a rather important server running on an old,
slow, reliable PowerMac 9600.  If I can ever make it work.)

I'm using a kernel built from the 1.5.1 source snapshot.  The GENERIC kernel
didn't work for me because:

- I need the siop driver (instead of ncr) to support this SCSI card
- I need raidframe
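
For reference, the two additions relative to GENERIC would look roughly
like the fragment below.  This is a sketch in the usual kernel config
syntax, not a copy of the config file actually used; the scsibus attachment
line and the RAIDframe unit count of 4 are assumptions.

```
# Symbios 53c8xx SCSI via siop instead of ncr (attachment lines assumed)
#ncr*		at pci? dev ? function ?
siop*		at pci? dev ? function ?
scsibus*	at siop?

# RAIDframe disk driver (the count is an assumed example value)
pseudo-device	raid		4
```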

The setup includes 3 IBM DDYS-series (U160-capable) drives: two 18G drives
as a mirrored RAID set for most of the filesystem and a 9G drive used for
/tmp, swap, and an HFS partition.

This card works much better than the Adaptec so far.  I was able to get
through system installation and set up the RAID set with no problems.  I've
seen a couple of random "/netbsd: sprious interrupt" messages (yes, it's
spelled wrong), but they seem harmless.

I'm currently working on stress-testing the machine.  At an apparently
random time (I think I had just invoked 'screen'), I got the following panic
(transcribed by hand, so it may contain mistakes):

panic: lockmgr: no context

db> trace
in panic+e8
in lockmgr+a4
in uvm_map+c4
in uvm_km_valloc+5c
in _bus_dmamem_map+68
in siop_morecbd+138
in siop_scsicmd+9c
in scsipi_execute_xs+58
in scsi_scsipi_cmd+190
in scsipi_command+bc
in sdstart+270
in scsipi_free_xs+d0
in scsipi_done+1b8
in siop_scsicmd_end+3bc
in siop_intr+13d4
in ext_intr_openpic+dc
in extint_call+0
in cpu_switch+30
in mi_switch+1a4
in ltsleep+294
in sched_sync+310
in fork_trampoline+10
db>
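
Since the trace was transcribed by hand, a small script can help when
comparing it against later panics.  The Python sketch below parses
"in func+offset" lines (innermost frame first, as ddb prints them) and
checks whether lockmgr was reached from a frame that looks like an
interrupt handler, which is the pattern in the trace above.  The helper
names parse_trace and called_from_interrupt are hypothetical, and matching
on the substring "intr" is a heuristic assumption, not a kernel naming
guarantee.

```python
import re

# One ddb frame as transcribed above: "in <function>+<hex offset>"
FRAME_RE = re.compile(r"^in\s+(\w+)\+([0-9a-fA-F]+)$")


def parse_trace(text):
    """Return [(function, offset), ...], innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # prompt lines such as "db> trace" are skipped
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


def called_from_interrupt(frames, callee="lockmgr", marker="intr"):
    """True if `callee` has a caller whose name contains `marker`.

    The substring test is only a heuristic for spotting interrupt
    handlers such as siop_intr or ext_intr_openpic.
    """
    names = [name for name, _ in frames]
    if callee not in names:
        return False
    callers = names[names.index(callee) + 1:]
    return any(marker in name for name in callers)


if __name__ == "__main__":
    excerpt = """\
    in panic+e8
    in lockmgr+a4
    in uvm_map+c4
    in siop_intr+13d4
    """
    print(called_from_interrupt(parse_trace(excerpt)))  # prints True
```

Running the full transcribed trace through the same two functions gives the
same answer as the short excerpt, since siop_intr sits below lockmgr in
both.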

After a reboot, after the fsck but while the RAID parity rewrite was still
in progress, I got another panic with a similar backtrace which I didn't
transcribe.  The machine is up again after a second reboot.

Does this problem ring a bell with anyone?  If I can verify that it's a
known problem that got fixed in later kernel sources, I'll feel much better.
Once this machine is in production, random panics will not be acceptable.

I've built another kernel from the 1.5.2 source snapshot, which I'll try
tomorrow. 

Thanks,
-- monroe
------------------------------------------------------------------------
Monroe Williams                                  monroe@criticalpath.com