Subject: Re: CVS commit: src
To: None <current-users@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: current-users
Date: 11/03/1998 13:19:29
[ On Mon, November 2, 1998 at 20:52:50 (-0800), John Nemeth wrote: ]
> Subject: Re: CVS commit: src
>
>      The way shutdown is handled by SysV is one of my biggest peeves
> with it.  It is all too easy for a shutdown script to hang,
> necessitating a hard reset of the system.

That's a bug in the implementation, not a design flaw.

The only time I've ever had a problem where I actually had to pull the
power plug on a hung machine was when I screwed up and didn't have a
console getty running in *all* run levels.

Scripts *should* be written robustly so that they can't hang, and there
should always be a way for root to login on a secure terminal and
manually complete the shutdown in the event of a scripting bug.  Since
most traditional SysV shutdown scripts simply send signals to processes
there's usually not much chance they'll hang unless they're buggy.  A
catch-all "kill -15 -1 && sleep 5 && kill -9 -1" catches processes not
explicitly shut down.  The only part that can sometimes hang is the
unmounting of filesystems, and that's done at the very end, so pulling
the plug then will generally only result in extra fsck'ing.  There's
really not much that can be done if the hardware is hung and a given
filesystem cannot be unmounted anyway....

Of course a design that's robust even in the face of implementation bugs
would be better, but I don't think it's very likely one will be found.

>    If we're going to have this
> feature, it should be setup so that there is a time limit on how long
> rc.shutdown can run.  If the time limit is exceeded, then shutdown
> should continue anyways (do kill -15, sleep 30, do kill -9, ...).

Such time limits are horribly system dependent.  Unless you make the
limit excessively long (i.e. so that it's annoying if you're beside the
machine, but short enough that you really don't have to go down to the
machine location on a cold winter night) it'll often be too short.

If you're only going to have one rc.shutdown then I don't know that the
timer should be around the whole script either.  There should be a
mechanism script writers can use that'll run a command in the background
and wait only a specified amount of time for it to complete.  If it
fails to complete then that failure should be noted and the remaining
shutdown steps should still be taken.  Perhaps there should be a
separate warning time and a timeout time too so that if something's
taking longer than expected a warning can be printed and logged and the
administrator won't be left wondering what the heck's going on.
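
As a rough illustration only, a mechanism like that could be sketched in
plain sh along the following lines.  The name run_limited, its argument
order, the one-second polling loop, and the 124 exit status for "killed
on timeout" are all assumptions made up for this sketch; nothing by that
name exists in any rc.shutdown.

```sh
#!/bin/sh
# run_limited WARN TIMEOUT COMMAND [ARGS...]
# Hypothetical helper (not an existing rc.shutdown facility).
# Runs COMMAND in the background and polls it once per second.
#   - after WARN seconds it prints a warning and keeps waiting
#   - after TIMEOUT seconds it sends TERM, then KILL, and returns 124
# Otherwise it returns COMMAND's own exit status.
# Note: plain sh has no local variables, so these names are global.
run_limited() {
    rl_warn=$1
    rl_limit=$2
    shift 2

    "$@" &                      # start the shutdown step in the background
    rl_pid=$!
    rl_elapsed=0

    # kill -0 only tests whether the process still exists
    while kill -0 "$rl_pid" 2>/dev/null; do
        if [ "$rl_elapsed" -ge "$rl_limit" ]; then
            echo "shutdown: '$*' exceeded ${rl_limit}s, killing it" >&2
            kill -TERM "$rl_pid" 2>/dev/null
            sleep 1             # short grace period (arbitrary choice)
            kill -KILL "$rl_pid" 2>/dev/null
            wait "$rl_pid" 2>/dev/null
            return 124          # assumed convention for "timed out"
        fi
        if [ "$rl_elapsed" -eq "$rl_warn" ]; then
            echo "shutdown: '$*' still running after ${rl_warn}s" >&2
        fi
        sleep 1
        rl_elapsed=$((rl_elapsed + 1))
    done

    wait "$rl_pid"              # collect the command's real exit status
}
```

A script could then call something like "run_limited 10 60 stop_database
|| echo 'stop_database failed or timed out'" (stop_database being a
placeholder for whatever needs stopping) and carry on with the remaining
steps either way.  The messages go to stderr here; sending them to
syslog as well would be a separate choice.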

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>