Subject: Re: generic HBA error messages on 1.6beta2
To: Matthew Jacob <mjacob@feral.com>
From: Matthias Buelow <mkb@informatik.uni-wuerzburg.de>
List: port-alpha
Date: 07/10/2002 02:21:42
Matthew Jacob writes:
>Fetch ftp://ftp.feral.com/pub/outgoing/patches.gz and apply and try again.
I'll do that tomorrow. In the meantime, let's take an analytical look at
the problem (maybe it will ring a bell with someone on the list):
1) the problem appears to occur only on machines with >= 1GB of RAM
   installed (according to Mel Kravitz, who has seen the same problem),
2) the problem only occurs here after the machine has been running for
   at least 2-3 days; this might hint at a problem with higher
   address spaces, physical memory or mappings, if the kernel slowly
   migrates some mappings or buffers upwards over time, so that the
   problem only shows up after a couple of days,
3) the problem appears to lie in the DMA mapping of the host adapter,
   or in DMA mapping more generally; since Jason made some SGMAP DMA
   improvements a while ago (according to the /alpha webpage), something
   might be broken there (with the direct-mapped DMA window, although
   the webpage only mentions mbufs and things being made "a bit more
   efficient"),
4) it does not seem to result from a hardware bus collision or similar,
   because the system is completely unloaded, and the trigger seems to
   be related to elapsed uptime rather than to i/o traffic,
5) no actual data loss on the disks has been observed so far, at least
   not here; maybe it's just a bogus error (although I somewhat doubt
   that, and there hasn't been enough disk i/o to substantiate it),
6) it cannot be triggered from userland by consuming all available
   virtual memory (i.e. what's physically available, not swap) and
   doing disk i/o.
I haven't checked yet whether the problem also occurs on the Adaptec
controller that is also installed in the system (at least I have never
seen it there so far), which may or may not point to a problem specific
to the isp (QLogic) driver. I somewhat doubt that, though, but of
course I can't tell.
--mkb