Port-xen archive


Panic very late in shutdown sequence in amd64 Dom0



My amd64 XEN Dom0 (used to) panic very late in the shutdown sequence.
So late that (given I have ddb.onpanic=0) when I was doing shutdown -r
to reboot, I didn't even notice it.  It was only when shutdown -p
failed to power off the system (or at least halt it) and instead acted
just like shutdown -r that I noticed - the first few times I even
thought I must have mistyped the shutdown command...

The problem turns out to be that wm_intr() (from dev/pci/if_wm.c) is
being called way after the driver has supposedly been shut down and has
released all of its bus mappings - when wm_intr() attempts the CSR_READ() to
access the WMREG_ICR to find out the cause of the interrupt (and if it
is really even a wm interrupt, or belongs to someone else) - BOOM.
There is no bus space left mapped any more, and (as it should) the
access faults.

I "fixed" this by simply having wm_intr() return 0 if it has no register
mappings: if (sc->sc_ss == 0) return 0; right at the start of the
function.   (I actually have it issuing a printf when this happens, and
I tend to see quite a few of those (10 or a dozen typically) in the late
stages of the shutdown sequence.)   I have no idea whether the interrupt
is something that really did belong (in some sense) to the wm interface,
or whether it is really something shared and the "return 0" is actually
the correct action to allow some other handler to finish its work.
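
For anyone who wants to see the shape of the workaround without opening
if_wm.c, here is a minimal stand-alone sketch.  It is not the real driver
code: struct fake_softc, fake_intr(), FAKE_ICR and dispatch_shared() are
all made-up names for illustration, and only the sc_ss == 0 test mirrors
what I actually added.  It assumes the usual convention that a handler on
a shared interrupt returns 0 for "not mine" and non-zero for "handled".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver softc; only sc_ss mirrors the post. */
struct fake_softc {
	size_t    sc_ss;    /* size of the register mapping, 0 once unmapped */
	uint32_t *sc_regs;  /* stand-in for the mapped register window */
	unsigned  sc_late;  /* interrupts that arrived after the unmap */
};

#define FAKE_ICR 0          /* hypothetical index of the interrupt cause reg */

/* Returns non-zero if the interrupt was ours, 0 if it was not. */
static int
fake_intr(void *arg)
{
	struct fake_softc *sc = arg;

	/* The guard: with no mapping left, touching a register would fault. */
	if (sc->sc_ss == 0) {
		sc->sc_late++;      /* count it instead of printing */
		return 0;           /* "not mine", let other handlers run */
	}

	/* Normal path: read the cause register to see if there is work. */
	return sc->sc_regs[FAKE_ICR] != 0;
}

/* One slot on a shared interrupt line. */
struct intr_slot {
	int  (*fn)(void *);
	void  *arg;
};

/* Call every handler on the line; return how many claimed the interrupt. */
static int
dispatch_shared(const struct intr_slot *slots, size_t n)
{
	int claimed = 0;

	assert(slots != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		claimed += slots[i].fn(slots[i].arg) != 0;
	return claimed;
}
```

The point of returning 0 rather than 1 in the guard is that, if the line
really is shared, the dispatcher still goes on to call whoever else is
registered on it; the counter just records how often the late case is hit.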

With this "fix" in place, a reboot now reboots as it should.  shutdown -h
gets to the "press any key to reboot" stage, and hangs (and reboots if
a key is pressed) just like it should as well.   shutdown -p just hangs
at about that stage, but I don't have enough experience with XEN to
know whether or not the hypervisor is supposed to support power off.

I have no idea how many other drivers might suffer the same problem,
if they happened to have a shared interrupt with something else that
continues to generate interrupts very very late in the shutdown sequence.
There are none that affect my system.   That is, if that is what the
problem is - if the wm hardware is still issuing interrupts after being
"turned off" then I guess the "turned off" was not very effective.   It is
also possible, I guess, that there was simply a big queue of pending
interrupts from before the interface was disabled, that are being delivered
later.  If so, some kind of "pending interrupt queue flush" mechanism is
likely to be needed.

I also have no idea why the Dom0 interrupt handling mechanism is continuing
to direct interrupts at a driver that has been shut down (supposedly,
it certainly thinks it has) - and not just once, but many times.  Fixing
that would be a real solution to the problem.

This doesn't really bother me, as the system in question should never stop,
(any more - it used to want to while it was being installed).  I am
also not going to be able to test any changes (or assist with any debugging)
for at least 3 weeks from now - I just thought I should let people know in
case this is another of those "we knew about it but hadn't found the cause"
issues...

Lastly, I should note, that if the system runs a normal generic kernel,
this problem doesn't happen, everything reboots, halts, or powers off,
as would be expected (no idea about suspend, don't need it for this,
haven't even tried...) with no stray late (post unmap) interrupts.

kre


