port-i386: EISA problems under 1.6

Subject: EISA problems under 1.6
To: None <port-i386@netbsd.org>
From: Dr R.S. Brooks <R.S.Brooks@liverpool.ac.uk>
List: port-i386
Date: 11/24/2002 02:14:44
I recently bought a Compaq Proliant 1500 (2 x Pentium 133, 160MB RAM, PCI/EISA
motherboard) on eBay, and started installing NetBSD 1.6 on it.

The 1.6 INSTALL kernel only sees 16MB of RAM, but I was able to install and
boot the GENERIC kernel and the system seemed to be stable.  So I decided to
build a kernel with REALEXTMEM=163452 (which is what I calculate the extended
memory size to be).  But with only 16MB, egcs doesn't run too well, and after
about 2.5 hrs of continuous paging (on init_main.c AFAIR), I gave up and
built the kernel on another machine.
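
For reference, the extended memory arithmetic can be sketched as below. This
is only a rough check, assuming REALEXTMEM is given in KB, that it counts the
memory above the first 1MB, and that all 160MB is usable; the firmware may
reserve some memory, so the real figure could differ. The function name is
made up for illustration.

```python
KB_PER_MB = 1024


def extended_memory_kb(total_mb, low_mb=1):
    """Return the KB of RAM above the first `low_mb` megabytes.

    Assumption: "extended memory" means everything above 1MB, and the
    whole of `total_mb` is usable RAM.
    """
    return (total_mb - low_mb) * KB_PER_MB


if __name__ == "__main__":
    # 160MB machine: (160 - 1) * 1024
    print(extended_memory_kb(160))  # prints 162816
```

Under these assumptions the result is 162816, a little lower than the 163452
used for the kernel build, so that value may be worth rechecking.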

With the new kernel, the Proliant booted OK, but /kern/msgbuf had disappeared.
On trying to recompile the kernel I got repeatable kernel page fault traps.
It would fail at slightly different points, but here is an example:

uvm_fault(0xcd481bc0, 0x0, 0, 2) -> e
kernel: page fault trap, code=0
stopped in pid 416 (cpp0) at pool_get+0x199: movl %eax, 0x4(%edx)

Both %eax and %edx contain 0xffffffff
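
As a rough check on the trap above, the faulting instruction stores to the
address 4 bytes past %edx. The sketch below assumes 32-bit address wraparound
and 4KB pages, and assumes the second argument printed by uvm_fault is the
page-truncated fault address. The helper names are made up for illustration.

```python
PAGE_SIZE = 4096          # assumed i386 page size
ADDR_MASK = 0xFFFFFFFF    # 32-bit address space


def effective_address(base, displacement):
    """Compute base + displacement with 32-bit wraparound."""
    return (base + displacement) & ADDR_MASK


def page_base(address):
    """Round an address down to the start of its page."""
    return address & ~(PAGE_SIZE - 1)


if __name__ == "__main__":
    # movl %eax, 0x4(%edx) with %edx = 0xffffffff
    addr = effective_address(0xFFFFFFFF, 0x4)
    print(hex(addr))             # prints 0x3
    print(hex(page_base(addr)))  # prints 0x0
```

With %edx = 0xffffffff the store wraps around to address 0x3, which lies in
page 0x0. That would be consistent with the 0x0 shown in the uvm_fault line,
and suggests the pointer loaded into %edx was already bad before the store.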

I'm currently running the Compaq diagnostics, but so far have no errors
(except that the diags are so old that they don't seem to believe that
a 36GB disk can really be that big!).

But the curious thing is that I bought the Proliant to replace a different
server, one with an Intel PCI/EISA motherboard, which gave similar kernel
page fault traps (but while the kernel was still probing the hardware).
This machine had seemed to run 1.5.2 OK (except for a curious problem with
illegal instructions when running cdrecord).  But it had a dubious history,
so I assumed it was a hardware fault.  Now, having seen the Proliant do exactly
the same thing, I'm beginning to wonder if something is borked in the EISA code
in 1.6.

So is anyone running 1.6 successfully on an EISA-based machine?  I realise
these machines are a bit ancient, but there are a lot of them available, and
they seem ideal for a home fileserver.

Any suggestions as to what I can do to pin this down?  Looking back
through the mailing list archives there seems to be a sprinkling of
similar problems (some going as far back as 8 years), but no solutions!


Roger