Subject: Re: port-alpha/5546: port-alpha/lost a stack? exception_restore_regs bombs
To: Chris G. Demetriou <cgd@pa.dec.com>
From: Matthew Jacob <mjacob@feral.com>
List: port-alpha
Date: 06/05/1998 17:29:37
Chris G. Demetriou wrote:
> 
 
> In particular, note that PCI aborts, such as when doing configuration
> (or other) accesses to space which doesn't exist, can yield machine
> checks (typically vector 660, if i recall, but my memory is rather
> fuzzy in that area).  Note that these machine checks are expected,
> taken, and survived!

Yes. It is 660.

> 
> I would expect that similarly, devices trying to DMA to SG-mapped
> space where the translation is invalid (or bogus) could have a similar
> effect.
> 
> I would suggest:
> 
> (1) dumping all of the information you can from the platform-specific
> logout area.  That is, not just numbers, but interpretation of numbers
> too.  That and the rest of the mcheck information _should_ tell you
> exactly what caused death, _if_ you can interpret it.
> 
> (2) if you can't squeeze meaningful information out of the logout
> frames, i'd strongly suggest dumping all of the information you can
> about device DMA state, both from I/O controller registers and kernel
> data structures.  Sure, it's only a WAG, but it's a somewhat-educated
> WAG, and lacking better information seems like a good candidate for
> trouble, _especially_ if you have lots of DMA accesses going on.
> 
> Note that if you're not actually using SG-mapped DMA for the things
> you're beating on, on that system, then my WAG is more WA and less
> likely to be a valid G.  8-)


I'll try this. I was setting up to do this for 660- since this is the
case where you're supposed to go off and read Turbo Laser registers
and check for sparse address barfs- it's not supposed to cause a 670,
but heck, it won't be the first time.
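
For suggestion (1) above, "not just numbers, but interpretation of
numbers", a table-driven bit decoder is one way to go. What follows is
only a rough sketch: the bit positions and names in example_err_bits
are made up for illustration and are NOT taken from any real chipset
or logout frame layout, and decode_bits is a hypothetical helper, not
an existing kernel function.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named bit (or bit group) in an error/status register. */
struct bit_name {
    uint64_t mask;
    const char *name;
};

/*
 * HYPOTHETICAL layout, for illustration only. The real masks and
 * names have to come from the platform documentation.
 */
const struct bit_name example_err_bits[] = {
    { 1ULL << 0, "NXM" },
    { 1ULL << 1, "SG_INVALID_PTE" },
    { 1ULL << 2, "PARITY" },
};

/* Append s to the NUL-terminated string in buf, comma-separated. */
static void append(char *buf, size_t buflen, const char *s)
{
    size_t used = strlen(buf);

    if (used + 1 < buflen)
        snprintf(buf + used, buflen - used, "%s%s", used ? "," : "", s);
}

/*
 * Write the names of all bits set in value into buf. Bits that the
 * table does not describe are reported as "unknown:0x..." so that
 * nothing is silently dropped.
 */
void decode_bits(uint64_t value, const struct bit_name *tbl, size_t n,
                 char *buf, size_t buflen)
{
    char unk[32];
    size_t i;

    if (buflen == 0)
        return;
    buf[0] = '\0';

    for (i = 0; i < n; i++) {
        if (value & tbl[i].mask) {
            append(buf, buflen, tbl[i].name);
            value &= ~tbl[i].mask;
        }
    }
    if (value != 0) {
        snprintf(unk, sizeof(unk), "unknown:0x%llx",
                 (unsigned long long)value);
        append(buf, buflen, unk);
    }
}
```

With this made-up table, decode_bits(0x5, example_err_bits, 3, buf,
sizeof(buf)) leaves "NXM,PARITY" in buf. Printing that next to the raw
hex value keeps the number available for anyone who wants to check
the decode by hand.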

-matt

p.s.: "Red Herring"? I'd believe this except it's repeatable. If it
were an asynchronous fault, I'd expect random death. This has died
in the same place before, I believe. Well, "watch this space"- I'm
reluctant to do too much remotely- but if I do and wedge the 8200,
I guess it'll be time to come and work on it more.