Subject: misc/2936: make install of /bin/sh failure is bad
To: None <gnats-bugs@gnats.netbsd.org>
From: Brian C. Grayson <bgrayson@marvin.ece.utexas.edu>
List: netbsd-bugs
Date: 11/13/1996 22:48:52
>Number:         2936
>Category:       misc
>Synopsis:       If make install of /bin/sh fails, /bin/sh gets truncated.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    misc-bug-people (Misc Bug People)
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 13 20:50:00 1996
>Last-Modified:
>Originator:     Brian Grayson
>Organization:
	Parallel and Distributed Systems
	Electrical and Computer Engineering
	The University of Texas at Austin
>Release:        current from early November 1996
>Environment:
	
System: NetBSD marvin 1.2 NetBSD 1.2 (MARVIN) #1: Tue Oct 1 21:46:42 CDT 1996 bgrayson@marvin:/a/orac/home/orac/src/sys/arch/i386/compile/MARVIN i386


>Description:
	The /bin/sh executable has grown since my previous
	install, and my root partition happened to be full
	enough that there wasn't room for the extra kilobytes.
	When 'make install' got to this point, it truncated the
	existing /bin/sh and then failed while installing the
	new one, leaving a zero-byte /bin/sh.  Obviously, this
	makes the system partially brain-dead -- 'make' (and
	anything else that requires /bin/sh, such as system())
	will no longer work.  Usually, recovery is relatively
	simple -- remove an extra kernel version or two, clean
	a file or two out of /tmp, or temporarily move some file
	from the root partition to another one, and you can then
	manually install /bin/sh from /usr/src/bin/sh/sh.
	However, enough programs depend (non-obviously) on
	/bin/sh that it is conceivable for someone (a newbie
	sysadmin like me, for example) to get fairly hosed by a
	misstep early on, and booting single-user and trying to
	use /bin/sh obviously won't work.
	
>How-To-Repeat:
	Fill your root partition as much as possible, then build
	a slightly larger /bin/sh executable.  Do an install -c
	and look at the resulting files.  Of course, you can
	test this by installing any file; it doesn't have to be
	/bin/sh.  By trying it with a bunch of files in my wd0g
	partition, I verified that install truncates the target
	before it checks whether there is enough room.
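
	The failure mode can be illustrated with a minimal sketch in
	Python.  It is not the code of install(1); naive_install(),
	demo(), and the fail_after parameter are hypothetical names
	used only to simulate a disk that fills up mid-copy.

```python
import errno
import os
import tempfile


def naive_install(src_bytes, dest_path, fail_after=None):
    """Copy in the order the report describes: truncate, then write.

    fail_after is a hypothetical knob: it simulates the disk filling
    up once that many bytes of the new file have been written.
    """
    # Opening with "wb" truncates dest_path to zero length right away,
    # before any of the new contents have been written.
    with open(dest_path, "wb") as dest:
        if fail_after is not None and len(src_bytes) > fail_after:
            dest.write(src_bytes[:fail_after])
            raise OSError(errno.ENOSPC, "No space left on device (simulated)")
        dest.write(src_bytes)


def demo():
    """Return the size of the target after a simulated failed install."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "sh")
        with open(target, "wb") as f:
            f.write(b"old shell" * 100)
        try:
            # Simulate a disk with no room at all for the new contents.
            naive_install(b"new, larger shell" * 100, target, fail_after=0)
        except OSError:
            pass
        return os.path.getsize(target)
```

	In this sketch the old contents are already gone by the time
	the write fails, so demo() reports a zero-length target.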
	
	** Digression **
	On a related note, when I was trying this out, I ended up
	with a situation like this:
	9:48pm 9# df .
	Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
	/dev/wd0g       34991    34869    -1628   105%    /wd0g.mnt
	9:48pm 10# dd if=/dev/zero of=rem bs=1024 count=2

	/wd0g.mnt: write failed, file system is full
	dd: rem: No space left on device
	2+0 records in
	1+0 records out
	1024 bytes transferred in 1 secs (1024 bytes/sec)
	9:48pm 11# df .
	Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
	/dev/wd0g       34991    34869    -1628   105%    /wd0g.mnt

	How come df shows 122K still unused (34991 - 34869),
	but a dd of a 2K file fails?  There are 5000 inodes free
	on this partition, so that's not the problem.  Is this
	due to fragmentation or something?  Or am I just missing
	something REALLY obvious?
	** End digression **

	
>Fix:
	I can see two different levels of fixes:

	1.  Change the makefile that installs sh so that it first
	checks for sufficient disk space, perhaps by creating a
	file of the appropriate size in the target directory.
	If that succeeds, delete the file and really do the
	install.
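
	The check in 1 could look roughly like the following Python
	sketch.  The function name has_room_for() and the ".probe."
	prefix are hypothetical.  It assumes that writing real zero
	bytes (rather than seeking, which may create a sparse file)
	forces the filesystem to allocate the space.

```python
import os
import tempfile


def has_room_for(target_dir, nbytes, chunk=64 * 1024):
    """Return True if a placeholder of nbytes can be written in target_dir.

    The placeholder is always removed again before returning.
    """
    fd, probe = tempfile.mkstemp(dir=target_dir, prefix=".probe.")
    try:
        block = b"\0" * chunk
        remaining = nbytes
        while remaining > 0:
            # Write actual data so that blocks really get allocated.
            written = os.write(fd, block[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)
        return True
    except OSError:
        # Any write error (ENOSPC, for example) is treated as "no room".
        return False
    finally:
        os.close(fd)
        os.unlink(probe)
```

	A caller would run has_room_for("/bin", size_of_new_sh) and
	only install if it returns True.  The check and the install
	are still two separate steps, so another writer can use up
	the space in between.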

	2a.  Add a paranoid/failsafe option to install that
	says 'Make sure this will work,' perhaps using the same
	trick as in 1.  I frequently use make -j2, so it's not
	inconceivable that something could happen between the
	check and the install if we aren't careful.  The safest
	way (check me on this one) would be to create a file of
	the right size (or perhaps the right size minus the
	size of the previous executable, plus one block to
	cover "internal fragmentation", if I'm remembering my
	terms), use mmap to fill in the new contents, and then
	unlink() the old file and rename() the new one into
	place.  That keeps something like a few lines appended
	to /var/log/messages from sneaking in between the test
	for space and the creation of the new file and making
	the install fail.  There are probably problems with
	this setup, since I don't know what I'm talking
	about :), but you get my point: if we are going to be
	paranoid, we might as well be super-paranoid and cover
	every base we can.
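
	A minimal sketch of the write-then-rename idea in 2a, in
	Python rather than as a patch to install(1).  safe_install()
	and the ".inst." prefix are hypothetical names.  The sketch
	uses plain writes instead of mmap, and relies on rename()
	replacing an existing target atomically, so it does not
	unlink() the old file first.

```python
import os
import tempfile


def safe_install(src_path, dest_path, mode=0o555):
    """Install src_path as dest_path without ever truncating dest_path.

    The new contents go into a temporary file in the same directory
    (and therefore on the same filesystem) and are renamed into place
    only after every byte has been written successfully.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".inst.")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            while True:
                chunk = src.read(64 * 1024)
                if not chunk:
                    break
                tmp.write(chunk)  # raises OSError if the disk is full
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        # Atomic replacement: dest_path is the old or the new file, never empty.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On any failure, drop the partial copy; dest_path is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

	The trade-off is that the old and new files must both fit
	on the partition for a moment, so this approach needs more
	free space than the "new size minus old size" estimate
	above.  In exchange, a failed install leaves the previous
	executable in place.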
	
	2b.  Then modify the /bin/sh makefile to add this option to
	its invocation of install.

	Other programs that are comparably critical to the
	system should probably get the same treatment in their
	Makefiles -- probably getty, login, and su, and maybe
	test/[, ls, cp, mv, cd, pwd, csh (basically, any
	program you would need in order to manually reinstall
	it if its executable got zapped).  I'm sure other
	people would have much better ideas of what is really
	critical.

	(Sorry for the length of this thing!)
	
>Audit-Trail:
>Unformatted: