Subject: Re: NMI on Compaq 1850R
To: None <current-users@NetBSD.org, port-i386@NetBSD.org>
From: Chris Ross <cross+netbsd@distal.com>
List: port-i386
Date: 01/20/2007 12:12:13
On Jan 18, 2007, at 18:19, Chris Ross wrote:
> In June of 2004, I posted to current-users about a problem I was
> having getting NetBSD 2 (point something) installed on a Compaq
> 1850R. I have a recollection of discussing this with someone else
> from the list, off-list, and finding a really tiny kernel bug that
> only affected some small class of memory systems. PIIX3, perhaps?
>
> In any case, I was fairly certain that change, which allowed me
> to run with more than 512MB of memory without getting an NMI fairly
> easily during heavy disk activity, was committed to the trunk, and
> pulled up into 3. [...]
Hello again, all, and hello to the new lists, which I have added for
the more specific questions I now have. As it turns out, I still had
a 2.99.14 kernel tree sitting on the machine, and was able to find a
[crude] patch to sys/arch/i386/pci/pchb.c that appears, tested now
against 3.1-RELEASE, to solve my problem. This was clearly never
contributed back to the core, and that may well be my fault. The
relevant code in pchb.c hasn't changed in any significant way in a
very long time.
The aforementioned "patch" I am now running with simply removes
the PCI_PRODUCT_INTEL_82443BX_AGP and PCI_PRODUCT_INTEL_82443BX_NOAGP
case in pchbattach(), starting near line 193 of pchb.c. This is noted
to be a "BIOS BUG WORKAROUND". But, at least for my machine (pchb0:
Intel 82443BX Host Bridge/Controller (AGP disabled) (rev. 0x03)), this
"workaround" causes the machine to get an NMI fairly easily.
I have confirmed that with 4 DIMMs making 768MB of memory, the
above code will cause a crash within a few minutes when doing a cvs
checkout of the NetBSD src tree. Without the 20 lines of code (and
comment) in that 'case', it runs just fine for multiple full
checkouts/updates. If I have only 2 DIMMs (either 256MB or 512MB) in
the machine, though, it will work just fine with or without the above
code. As I mentioned in my first piece of email, which went only to
current-users, I discussed this off-list with someone in the summer
of 2004. Sadly, I don't have that email. But, I do remember now,
vaguely, him noting something about this being incorrect code, at
least with respect to some revisions of the 82443BX. I wish I could
remember which revisions he said this code did or didn't apply to,
but clearly for rev 0x03, it causes a problem.
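To make the "which revisions" question concrete, here is a rough
sketch of gating the workaround on the chip revision rather than
removing it outright. This is not the pchb.c code: the names
BX_REV_KNOWN_BAD, bx_revision() and bx_wants_workaround() are made up
for illustration, and rev 0x03 is the only data point I have, so the
set of affected revisions is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the pchb.c code.  The only data point
 * from this report is that rev 0x03 misbehaves with the workaround.
 */
#define BX_REV_KNOWN_BAD 0x03

/* The PCI revision ID is the low 8 bits of the config word at 0x08. */
static uint8_t
bx_revision(uint32_t class_reg)
{
	return (uint8_t)(class_reg & 0xff);
}

/* Return true if the BIOS workaround should still be applied. */
static bool
bx_wants_workaround(uint32_t class_reg)
{
	return bx_revision(class_reg) != BX_REV_KNOWN_BAD;
}
```

In the driver, the argument would be the class/revision word read
from the bridge's configuration space, and the existing workaround
would only run when the predicate returns true.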
Perhaps the person who "owns" that code in i386/pci/pchb.c, or the
person I worked with a couple of years ago, if he is on any of these
lists, could discuss this with me so that we can find the "correct"
solution and get it into the tree. I can certainly run a patched
kernel, but there must be other people with a ProLiant 1850R, or some
other machine with affected revisions of the 82443BX, that this would
also help. :-)
Thanks much. I hope to hear from you soon!
- Chris