Subject: bin/6065: rebooting is broken in many ways.
To: None <gnats-bugs@gnats.netbsd.org>
From: Lennart Augustsson <augustss@cs.chalmers.se>
List: netbsd-bugs
Date: 08/29/1998 13:39:24
>Number:         6065
>Category:       bin
>Synopsis:       rebooting is broken in many ways.
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 29 04:50:00 1998
>Last-Modified:
>Originator:     Lennart Augustsson
>Organization:
>Release:        NetBSD-current 980829
>Environment:
System: NetBSD dogbert.cs.chalmers.se 1.3G NetBSD 1.3G (DOGBERT) #0: Fri Aug 28 16:03:01 CEST 1998 sparud@dogbert.cs.chalmers.se:/usr/src/sys/arch/i386/compile/DOGBERT i386


>Description:
	Shutting down NetBSD is deficient in several ways.  Here is my list:
	1) Reboot dies on all kinds of signals that it might get
	   during rebooting.  I committed a fix for this problem,
	   so I hope we can forget about this one.
	2) It takes far too long before the system actually shuts down.
	   The reason is that reboot(1) sends TSTP to init and then
	   TERM to all other processes and waits for them to die.
	   If they have not all died within 30 seconds, reboot(1) plunges
	   ahead and calls reboot(2) anyway.  Well, on my machines it
	   always takes 30 seconds because not all processes die.
	   What is typically left after a few seconds is init, mount_mfs
	   and inetd.  inetd should shut down on TERM, but maybe something
	   goes wrong?  init should shut down on TERM after TSTP, but
	   it doesn't.  mount_mfs should unmount, but maybe it fails?
	3) When it actually comes to shutting down, this fails on my
	   machines (3 i386 and 1 arm32) about one time in four.  The
	   machine goes completely catatonic: no 'syncing disks', no
	   getting into the debugger, only reset helps (with dirty file
	   systems, of course).
	4) If the disk syncing fails then it seems that all file systems
	   are considered dirty instead of just the affected ones.
	   This can be very annoying if the failed sync is for an NFS
	   file system, and you have large local disks.
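
	To illustrate the wait described in 2), here is a minimal sketch of
	"send TERM, then poll against a deadline".  It is not the reboot
	source: it targets a single child process instead of every process,
	so it is safe to run, and the names term_and_wait and spawn_child
	are hypothetical helpers invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: send SIGTERM to one child and wait up to
 * timeout_ms for it to exit.
 * Returns 1 if the child exited in time, 0 if the deadline passed
 * with the child still running, and -1 on error.
 */
static int term_and_wait(pid_t pid, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms = 0;

    assert(timeout_ms >= 0);
    if (pid <= 0)               /* never signal a group or "everyone" */
        return -1;
    if (kill(pid, SIGTERM) == -1)
        return -1;

    for (;;) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 1;           /* child is gone */
        if (r == -1 && errno != EINTR)
            return -1;
        if (waited_ms >= timeout_ms)
            return 0;           /* deadline passed, child survives */
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}

/*
 * Hypothetical helper: fork a child that sleeps forever and either
 * honors SIGTERM (default action) or ignores it.  The disposition is
 * set in the parent before fork() so the child inherits it without a
 * race, then restored in the parent.
 */
static pid_t spawn_child(int ignore_term)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    if (ignore_term)
        old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    if (ignore_term)
        signal(SIGTERM, old);
    return pid;
}
```

	A child that ignores TERM makes term_and_wait return 0 only after
	the whole timeout has elapsed, which is the same shape as the
	30 second delay above: one stubborn process costs the full wait.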
>How-To-Repeat:
	Just do reboot.  :-)
>Fix:
	1) Fixed.
	2) I guess you just have to insert lots of tracing and figure
	   out what keeps those last processes from dying.
	3) This is the hard one.  I have no idea what goes wrong, but
	   it happens very often to me.
	4) Should be a matter of bookkeeping.
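
	As one possible form of the tracing suggested in 2), the sketch
	below logs which of a known set of processes are still alive after
	TERM has been sent.  It is an assumption-laden illustration, not a
	proposed patch: is_alive and report_survivors are hypothetical
	names, and the caller is assumed to have collected the pid list by
	some other means.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for callers that reap children */
#include <unistd.h>

/*
 * Hypothetical helper: probe a process with signal 0, which performs
 * the existence and permission checks without delivering a signal.
 * EPERM means the process exists but we may not signal it.
 */
static int is_alive(pid_t pid)
{
    if (pid <= 0)               /* skip process groups and "everyone" */
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/*
 * Hypothetical helper: write one trace line per surviving process
 * and return how many survivors were found.
 */
static size_t report_survivors(const pid_t *pids, size_t n, FILE *log)
{
    size_t alive = 0;

    assert(log != NULL);
    for (size_t i = 0; i < n; i++) {
        if (is_alive(pids[i])) {
            fprintf(log, "still alive after TERM: pid %ld\n",
                (long)pids[i]);
            alive++;
        }
    }
    return alive;
}
```

	Note that a child which has exited but has not yet been reaped by
	its parent still counts as alive for this probe.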
>Audit-Trail:
>Unformatted: