Subject: NMI on Compaq 1850R (was Re: cac problems with 2.0E-20040517)
To: None <current-users@netbsd.org>
From: Chris Ross <cross+netbsd@distal.com>
List: current-users
Date: 06/12/2004 04:38:17
Chris Ross wrote:
>   I think the next thing for me to try, is to remove the RAID'd disks,
> and try installing onto disks on the motherboard's SYM876 part.  I
> could do an install from the same ftp'd 20040517 snapshot, but that
> takes a while.

   Okay.  I tried this, and had the same problem.  I even went so
far as to completely remove the Smart Array 3200 card, and it
still drops into the kernel debugger while untar'ing packages.
So, it's clearly *not* the cac part causing the problem.

   Now that that was ruled out, I tried the only other thing I
could think to try.  I removed the upper 512MB of memory.  (I
had four 256MB DIMMs; I now have only two.)  It now installs
reliably, and doesn't seem to have any problem running.

   However, I know this system was running just fine with BSD/OS
on it, with all 1 GB of memory.  And, I know the BSD/OS kernel
was configured to panic on parity-errors, so I wouldn't think
that hardware-detected parity-errors are the problem.

   Does anyone else have any thoughts as to what I could try to
figure out why the upper half of my memory was causing this
problem?  Thanks...

>>> On Tuesday 08 June 2004 22:37, Chris Ross wrote:
>>>
>>>>   Hello there.  I have a Compaq 1850R, with a Smart Array 3200
>>>> controller in it (controlling 4 disks, two RAID1 arrays).  I
>>>> have tried installing NetBSD 1.6.2, and multiple 2.0 snapshots,
>>>> and when I try to install all of the source, they will all
>>>> fail with an NMI (so noted by 1.6.2) or unannounced failure
>>>> (cause not noted by pre-2.0 kernels).  In all cases, it will
>>>> drop into the kernel debugger ("db>" prompt), but a trace
>>>> is not useful because the symbol names are not present, presumably
>>>> because I loaded off of a floppy.
>>>>
>>>>   (FYI: I've run BSD/OS 5.1 and FreeBSD-current on this same
>>>> hardware without seeing this sort of problem at all, so I'm
>>>> assuming for now it's not a hardware problem...)
>>>>
>>>>   I tried to load the 2.0E snapshot (20040517) from ftp.netbsd.org,
>>>> and only if I trim out unnecessary packages at install would it
>>>> complete.  However, now, when the system is running, it will
>>>> periodically receive an NMI and drop into the kernel debugger.
>>>> It appears to be in the mpidle() function (or something like that,
>>>> been a week since I looked at it), but I assume that isn't
>>>> where the problem actually is, since I don't think I was running
>>>> an MP kernel from the boot-floppies.
>>>>
>>>>   I've been doing a lot of disk I/O, of course, so I'm wondering
>>>> if maybe it's a problem with the cac (or pcicac) driver.  Is
>>>> there anyone else who has a system with a Smart 3200 controller
>>>> in it running NetBSD pre-2.0?
>>>>
>>>>   Any advice on how I could help you guys track down and fix
>>>> this problem would be much appreciated.  Thank you.
>>>>
>>>>                              - Chris