On Tue, Sep 02, 2008 at 03:52:20PM -0700, Jason Thorpe wrote:
>
>> A lot of the unkillable processes I've seen are stuck deep inside some
>> device driver, waiting for an event that either will never happen or
>> which could happen but which is difficult to arrange for.
>
> Yes, they're waiting for an event... using some facility provided by the
> kernel... condvars or tsleep... meaning the kernel could awaken the
> thread and cause it to commit suicide.

There are other cases of interest for a forced removal/invalidation of
swap pages too, that may favour the page-invalidation approach rather
than the process-killing approach. Those cases aren't only at shutdown.

Pages owned by something other than a process (the tmpfs example) are
one case that's come up already.

Another would be a failing/failed/removed/etc swap device. Depending on
details, this currently would lead to a panic or to processes blocked
forever on a failing pagein (I expect). This could lead to exactly the
kind of shutdown scenario discussed above, as well as problems in
general operation. It Might Be Nice to let the system try to proceed
instead, invalidating pages that can't be recovered and killing
processes if need be as a result.

There's another case in the other direction, but it hits some of the
same kinds of error paths when paging. Ideally, when suspending a
machine with cgd(4), we should flush the keys from memory, and the
device should block new requests until the key is reloaded after
resume. On such a machine, swap is clearly one of the things likely to
be inside the cgd. We need to arrange for cgdconfig(8), and whatever
else we need to reload the key, to be locked in RAM before suspend,
sure, and there are ways to do that now. Having support for marking a
swap device as suspended (so the system can do something smarter than
just pile up paging requests in the disk queue) seems like it might be
helpful too.
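To make the failed and suspended swap-device cases a little more
concrete, here is a minimal sketch in C of the decision the pager might
make per request. Every name in it (swdev_state, pagein_action,
pagein_policy) is hypothetical and invented for this mail, not taken
from the existing kernel; it models only the policy, not the locking or
the actual I/O path.

```c
/*
 * Hypothetical swap device states. These names are assumptions made
 * for illustration only; they are not existing kernel identifiers.
 */
enum swdev_state {
	SWDEV_ACTIVE,		/* device is usable as normal */
	SWDEV_SUSPENDED,	/* e.g. cgd key flushed, expected to come back */
	SWDEV_FAILED		/* device failing, failed or removed */
};

/* What to do with a pagein request aimed at such a device. */
enum pagein_action {
	PAGEIN_ISSUE,		/* queue the I/O to the disk as usual */
	PAGEIN_DEFER,		/* hold the request until the device resumes */
	PAGEIN_INVALIDATE	/* give up on the page; the owner sees an error */
};

/*
 * Map a device state to an action. A suspended device defers rather
 * than piling requests into the disk queue; a failed device
 * invalidates the page instead of blocking the requester forever.
 */
static enum pagein_action
pagein_policy(enum swdev_state state)
{
	switch (state) {
	case SWDEV_ACTIVE:
		return PAGEIN_ISSUE;
	case SWDEV_SUSPENDED:
		return PAGEIN_DEFER;
	case SWDEV_FAILED:
		return PAGEIN_INVALIDATE;
	}
	/* Unknown state: treat it like a failed device. */
	return PAGEIN_INVALIDATE;
}
```

What happens after PAGEIN_INVALIDATE is the part that depends on the
page's owner: a process could be killed, while something like tmpfs
would need some other way to report the loss. The sketch deliberately
leaves that open.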
Doing "hibernate" support via process swapout and a small kernel state
blob will probably raise some other cases.

Is it worth catering in detail for these cases? I'm not sure, but as
long as we're hypothesising about "smarter swapctl -d" they're worth
raising for consideration. If it's not worth it, it is enough to have a
knob that can be turned to avoid a hang trying to detach swap when
shutting down in such circumstances. Remembering to turn that knob is
another matter; maybe some of the cases above should automatically set
it if we know they're going to lead to trouble.

--
Dan.