Subject: Re: bge/ahd interrupt problems
To: Frank van der Linden <firstname.lastname@example.org>
From: Edgar Fuß <email@example.com>
Date: 03/25/2007 16:04:38
> Let me know what you see.
OK, here is what I see with DDB when the server is in the "locked up" state:
ahd1: Timedout SCB already complete. Interrupts may not be functioning.
ioapic RDRs (i.e. write 2*i+0x10 to REG, read DATA):
ioapic1 (the one ahd1 is on):
1: E064 (i.e., the receipt bit is on)
2, 3 disabled (10000)
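
For anyone who wants to decode these values by hand, here is a minimal sketch in C. It assumes the standard Intel 82093AA layout for the low 32 bits of a redirection entry, and it assumes the "10" in the register formula above is hexadecimal. The names rdr_lo, rdr_lo_index, rdr_lo_decode and rdr_lo_print are made up for illustration and are not NetBSD kernel symbols.

```c
#include <stdint.h>
#include <stdio.h>

/* Low 32 bits of an I/O APIC redirection table entry (82093AA layout).
 * Hypothetical helper type, not taken from the NetBSD sources. */
struct rdr_lo {
    unsigned vector;        /* bits 0-7:  interrupt vector */
    unsigned delivery_mode; /* bits 8-10: 0 = fixed */
    unsigned dest_logical;  /* bit 11:    0 = physical, 1 = logical */
    unsigned send_pending;  /* bit 12:    delivery status */
    unsigned active_low;    /* bit 13:    pin polarity */
    unsigned remote_irr;    /* bit 14:    level intr accepted, EOI not yet seen */
    unsigned level_trig;    /* bit 15:    0 = edge, 1 = level */
    unsigned masked;        /* bit 16:    1 = pin disabled */
};

/* Register index to write to REG for the low half of pin i
 * (assumes the redirection table starts at index 0x10). */
static unsigned rdr_lo_index(unsigned pin)
{
    return 0x10 + 2 * pin;
}

/* Split a raw low-dword value into its fields. */
static struct rdr_lo rdr_lo_decode(uint32_t v)
{
    struct rdr_lo r;
    r.vector        = v & 0xff;
    r.delivery_mode = (v >> 8) & 0x7;
    r.dest_logical  = (v >> 11) & 1;
    r.send_pending  = (v >> 12) & 1;
    r.active_low    = (v >> 13) & 1;
    r.remote_irr    = (v >> 14) & 1;
    r.level_trig    = (v >> 15) & 1;
    r.masked        = (v >> 16) & 1;
    return r;
}

/* Print one decoded entry on a single line. */
static void rdr_lo_print(unsigned pin, uint32_t v)
{
    struct rdr_lo r = rdr_lo_decode(v);
    printf("pin %u (reg 0x%02x): vec 0x%02x %s %s irr=%u %s\n",
        pin, rdr_lo_index(pin), r.vector,
        r.level_trig ? "level" : "edge",
        r.active_low ? "lo" : "hi",
        r.remote_irr,
        r.masked ? "masked" : "enabled");
}
```

With this layout, rdr_lo_print(1, 0xE064) would report vector 0x64, level-triggered, active-low, with Remote IRR set, which is presumably the bit called "receipt" above: the IOAPIC has sent the interrupt and is still waiting for an EOI. A value of 0x10000 has only the mask bit set.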
Setting a breakpoint on ahd_intr:
It looks like I am getting interrupts only for ahd0 and none for ahd1.
I can also inspect ci_isources, but that doesn't make sense as long as
either I misunderstand which one should be handling the interrupt or
there is indeed confusion wrt. multiple IOAPICs.
I still have the machine more or less untouched (i.e. it still complains
about ahd1 timeouts). But I will now leave the server cellar in favour
of a bicycle ride. I can return later today if someone wants me to inspect
further hardware registers. Otherwise, I'll try to save the RAID parity and
try to get a dump. Then, I'll probably run the torture test with a non-IOAPIC
Thanks for any hints on what's going on.
I would really like this solved next week.