Subject: install/15400: sysinst sometimes dumps core in curses routine
To: None <gnats-bugs@gnats.netbsd.org>
From: Duncan McEwan <duncan@mcs.vuw.ac.nz>
List: netbsd-bugs
Date: 01/28/2002 17:25:31
>Number:         15400
>Category:       install
>Synopsis:       sysinst sometimes dumps core in curses routine
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    install-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 27 20:26:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Duncan McEwan
>Release:        NetBSD 1.5ZA build with sources from early January.
>Organization:
Victoria University of Wellington, New Zealand
>Environment:
System: NetBSD turakirae.mcs.vuw.ac.nz 1.5ZA NetBSD 1.5ZA (GEN_X) #0: Fri Jan 4 12:56:58 NZDT 2002 mark@turakirae.mcs.vuw.ac.nz:/mnt/SAVE/build.obj/sys/arch/i386/compile/GEN_X i386
Architecture: i386
Machine: i386
>Description:
	Our (slightly modified!) sysinst can be made to dump core repeatably
	while extracting distribution sets with pax.

	We are reasonably sure that our modifications have not caused the
	problem.  They do a couple of things: (a) set a few default answers
	to values appropriate for our local environment; and (b) add a couple
	of extra distribution sets of local software that we want installed
	on all our NetBSD machines.

>How-To-Repeat:
	The problem occurs most frequently when you (a) update the MBR on
	the disk; and (b) answer "yes" when sysinst asks whether you want to
	see files listed as they are extracted by pax.

	Our previous workaround for this problem was to do the installation
	in two stages.  First update the MBR, then reboot and rerun sysinst.
	However, today we discovered that saying "no" when asked whether we
	want to see the extracted files listed seems to prevent the coredump
	from occurring.

	As stated above, given the nature of our modifications we don't
	believe they are *directly* to blame.  However, as we are not aware of
	anyone else reporting this problem, it is possible that a side
	effect of them triggers an existing bug.  For example, perhaps
	the fact that we extract more and larger distribution sets causes
	sysinst to exhaust memory?
	
>Fix:
	We don't have a fix.  However, we did compile a non-crunchgen'd
	sysinst binary with '-g' and used a floppy disk to get it onto a
	machine we were about to install.  We also used a floppy to get the
	resulting core file off the machine.

	Running gdb on the core file showed that the coredump occurred
	at the following line in the __waddbytes curses routine.

	Core was generated by `sysinst'.
	Program terminated with signal 11, Segmentation fault.
	#0  0x805ff13 in __waddbytes (win=0x8112880, 
    	bytes=0xbfbfd3cb "<25 bytes of binary junk deleted>", count=0, 
    	attr=0) at /src/work/src/lib/libcurses/addbytes.c:165
	165                             if (lp->line[x].ch != c ||

	I used the gdb print command to look at a few variables that
	__waddbytes uses and found that x has the reasonable-looking value
	of 0, as does y, but printing win->lines[0].line[0] (which is what
	the above line is equivalent to) causes gdb to say "Cannot access
	memory at address 0x732f7972".
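
	One observation that may help narrow this down: the bytes of the
	faulting address are all printable ASCII.  The following is a minimal
	Python sketch, not part of sysinst or curses, that decodes the
	address the way it would be laid out in memory on a little-endian
	i386 machine.  The helper name pointer_as_text is hypothetical and
	exists only for this illustration.

```python
# Decode the address gdb could not access as the bytes stored in memory.
# i386 is little-endian, so the lowest-order byte of the value comes first.
def pointer_as_text(addr: int, size: int = 4) -> str:
    raw = addr.to_bytes(size, "little")
    # Show printable ASCII as-is and anything else as '.'
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(pointer_as_text(0x732F7972))  # prints "ry/s"
```

	The result, "ry/s", looks like a fragment of a pathname.  That would
	be consistent with string data (possibly a filename from the pax
	listing) having been written over the window's line pointer, although
	the core file alone does not prove this.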

	Further debugging analysis is hard because I can't see any way of
	running gdb on a live sysinst while it is installing a system.
	So I'm hoping that someone who knows the code better might be able
	to suggest what might be going wrong here, even if they are not able
	to reproduce the problem we are seeing themselves.  To help with this
	I've made the sysinst binary we used (compiled with debugging symbols)
	and the sysinst.core file available at

		http://www.mcs.vuw.ac.nz/~duncan/{sysinst,sysinst.core}

	for further postmortem analysis.  I'll be more than happy to try some
	additional experiments to gather more information if it will help.
>Release-Note:
>Audit-Trail:
>Unformatted: