Subject: Re: shutdown(8) Heisenbug?
To: John Darrow <John.P.Darrow@wheaton.edu>
From: Mason Loring Bliss <mason@acheron.middleboro.ma.us>
List: current-users
Date: 09/29/1999 09:43:12
On Tue, Sep 28, 1999 at 11:15:17PM -0500, John Darrow wrote:

> However, I've been running into a different problem.  It will take forever
> to get to the 'syncing disks' stage, and then be unable to flush the last
> few buffers, thus 'giving up' and forcing a fsck of all partitions upon
> boot.

Are you seeing rc.shutdown run?

> I eventually tracked it down to having an mfs mounted

Hm. I don't believe I have an mfs mounted, unless kernfs counts as one.

> If I unmounted my mfs /tmp as part of rc.shutdown (after xdm is killed,
> which uses a socket in /tmp), then the sync completes, and the reboot
> proceeds quickly, leaving the disks marked clean.

Without rc.shutdown being run, I suppose that somehow xdm is persisting too
long and interfering with clean unmounting.

> As another data point, I seem to remember that if I manually ran rc.shutdown,
> then unmounted all but ffs and mfs drives, then did 'reboot' instead of
> 'shutdown -r now', the syncing worked fine, though I can't figure out why
> this would work and shutdown wouldn't...

If it's the same thing I'm seeing, shutdown simply isn't running rc.shutdown
itself as a consequence of some obscure bug.

This might not help us find the right solution, but this minuscule patch makes
shutdown do the right thing on my box:

*** shutdown.c  Wed Sep 29 09:37:07 1999
--- shutdown.c.foo      Sun Sep 26 20:19:54 1999
***************
*** 118,123 ****
--- 118,125 ----
        struct passwd *pw;
        int arglen, ch, len;
  
+       dofast = 0;
+ 
  #ifndef DEBUG
        if (geteuid())
                errx(1, "NOT super-user");


I'd be gratified if you could try it and tell me how it does on your box. The
messed up thing is that this should not be necessary - "dofast" is supposed to
be automatically initialized to zero. That's something we're supposed to be
able to depend on. Since this patch explicitly initializes "dofast" at the
start of the program, and since nothing else has been touched, I'm at a loss
to point at anything other than a code generation bug, but I'm not quite sure
how to proceed... I think I probably just need to learn to use a debugger to
step through the thing and see what's happening. I suppose it wouldn't hurt
to take a whack at staring at assembly output as well - I once, ages ago,
was at least read-only with regard to x86 assembly code. :) That's something
I can probably do today. (I've been picking through the O'Reilly "Programming
with GNU Software" lately, or whatever it's called - the black swan book.)
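
Before reaching for the debugger, a cheap sanity check might narrow things
down. The sketch below is not from shutdown.c; it assumes "dofast" is declared
at file scope with no initializer (which is what the automatic zeroing relies
on), and the names dofast_is_dirty() and check_startup_state() are made up for
illustration. The idea is to test the flag before anything assigns to it:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-in for the flag in shutdown.c. Assumption: the real one is
 * declared at file scope with no initializer, like this.
 */
static int dofast;

/*
 * Returns 1 if dofast is already nonzero, 0 if it holds the zero that
 * C guarantees for objects with static storage duration. Only
 * meaningful when called before anything assigns to dofast.
 */
int
dofast_is_dirty(void)
{
	if (dofast != 0) {
		fprintf(stderr, "dofast = %d before any assignment\n", dofast);
		return 1;
	}
	return 0;
}

/* Hypothetical helper: call as the first statement of main(). */
void
check_startup_state(void)
{
	assert(!dofast_is_dirty());
}
```

If the assertion fires at the very top of main(), the flag was never zero to
begin with, which would point at startup code or code generation. If it only
turns nonzero later, something in the program is writing to it.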

-- 
    Mason Loring Bliss  mason@acheron.middleboro.ma.us  They also surf who
awake ? sleep : dream;  http://acheron.ne.mediaone.net  only stand on waves.