Subject: bin/13301: ksh will dump core sometimes if it gets a spurious SIGWINCH
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 06/24/2001 23:34:20
>Number:         13301
>Category:       bin
>Synopsis:       ksh will dump core sometimes if it gets a spurious SIGWINCH
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 24 20:32:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Greg A. Woods
>Release:        2001/06/19
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:

all NetBSD IIRC, though most recently noticeable on sparc

Architecture: sparc
Machine: sparc

>Description:

	This problem has been "bugging" me for nearly forever (well, ever
	since ksh became a standard part of NetBSD, and maybe even from
	before that).  I don't know if I've reported this before or not,
	but at least now I've got a half-useful traceback.

	Normally I don't trip over it on fast machines, but when
	opening an xterm on a slower or loaded machine I'll sometimes
	go to resize the window before the shell has finished sourcing
	~/.profile et al and some sub-shell being started by .profile or
	.kshrc or whatever (never the parent) will dump core.

	When not compiled with '-g', at least on sparc, the error is
	SIGBUS or SIGILL, and the stack frames are always pretty much
	totally corrupt and useless.  This one at least starts out in an
	apparently valid place, but as we'll see it's just as broken:

		Core was generated by `ksh'.
		Program terminated with signal 4, Illegal instruction.
		#0  0x11000 in c_pwd ()
		(gdb) where
		#0  0x11000 in c_pwd ()
		#1  0xa78ec in ?? ()
		Cannot access memory at address 0x2d703a58.

	I finally tonight got bored enough while watching "make build"s
	run to try debugging this.

	Now the tricky part is you can't just run your login shell under
	the debugger, particularly if it's a shell being started by the
	likes of:  rsh -n host "xterm -ls"

	Luckily that's not necessary since the binary compiled with '-g'
	seems to generate a valid core dump.

>How-To-Repeat:

	1. make your ~/.profile fairly complex so that it takes a bit of
           time and so that it needs to run several subshells.

	2. start an xterm with '-ls' (i.e. so that it runs the shell as
           a login shell).

	3. resize the xterm window constantly while your .profile is
           doing its thing

	4. watch for error messages and look for the resulting core
           dump after you get your shell prompt....
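
	Resizing the window by hand in step 3 is hit-and-miss.  As a
	rough stand-in, the following sketch delivers SIGWINCH to a
	process in a tight loop.  It is an assumption that signals sent
	with kill(2) exercise the same path as real window resizes, and
	the helper name spam_sigwinch() and its parameters are made up
	for illustration.  A negative pid targets a whole process group,
	as documented for kill(2), which may help in reaching the
	sub-shells started by .profile.

```c
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `count` SIGWINCH signals to `pid`,
 * sleeping `pause_ns` nanoseconds (must be < 1 second) between them.
 * Returns the number of signals actually delivered to kill(2)
 * without error; stops early on the first failure.
 */
int
spam_sigwinch(pid_t pid, int count, long pause_ns)
{
	struct timespec ts;
	int sent = 0;
	int n;

	ts.tv_sec = 0;
	ts.tv_nsec = pause_ns;

	for (n = 0; n < count; n++) {
		if (kill(pid, SIGWINCH) == -1)
			break;		/* e.g. the target has exited */
		sent++;
		(void)nanosleep(&ts, NULL);
	}
	return sent;
}
```

	Calling something like spam_sigwinch(shell_pid, 1000, 1000000L)
	from a small driver while the login shell is still reading
	.profile should approximate the constant resizing described
	above.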

	$ gdb /bin/ksh ksh.core
	GNU gdb 4.17
	Copyright 1998 Free Software Foundation, Inc.
	GDB is free software, covered by the GNU General Public License, and you are
	welcome to change it and/or distribute copies of it under certain conditions.
	Type "show copying" to see the conditions.
	There is absolutely no warranty for GDB.  Type "show warranty" for details.
	This GDB was configured as "sparc--netbsd"...
	Core was generated by `ksh'.
	Program terminated with signal 11, Segmentation fault.
	#0  0x2fce0 in trapsig (i=87996)
	    at /proven/work/woods/NetBSD-src/bin/ksh/trap.c:117
	117             trap = p->set = 1;
	(gdb) where
	#0  0x2fce0 in trapsig (i=87996)
	    at /proven/work/woods/NetBSD-src/bin/ksh/trap.c:117
	#1  0xefffff74 in ?? ()
	#2  0x2c154 in shf_flush (shf=0xacb68)
	    at /proven/work/woods/NetBSD-src/bin/ksh/shf.c:316
	#3  0x1d3b4 in execute (t=0xacb68, flags=50)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:99
	#4  0x1c570 in comsub (xp=0xefffedf0, cp=0xacb68 "")
	    at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:877
	#5  0x1b1c0 in expand (cp=0xa72b1 "expr \":$varvalue:\" : \".*:$1:.*\"", 
	    wp=0xefffee78, f=11) at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:243
	#6  0x1adac in eval (ap=0xa7224, f=11)
	    at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:95
	#7  0x1d42c in execute (t=0xa71c8, flags=256)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:116
	#8  0x1de14 in execute (t=0xa7198, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:376
	#9  0x1d8b4 in execute (t=0xa7168, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:194
	#10 0x1ddb0 in execute (t=0xa7050, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:369
	#11 0x1d8b4 in execute (t=0xa7020, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:194
	---Type <return> to continue, or q <return> to quit---
	#12 0x1df28 in execute (t=0xa6858, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:394
	#13 0x1e794 in comexec (t=0xa7928, tp=0xa6820, ap=0xa6040, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:664
	#14 0x1d724 in execute (t=0xa7928, flags=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:157
	#15 0x28f88 in shell (s=0xa1820, toplevel=0)
	    at /proven/work/woods/NetBSD-src/bin/ksh/main.c:623
	#16 0x28bd8 in include (name=0x9fe18 "/home/most/woods/.profile", argc=0, 
	    argv=0x0, intr_ok=1) at /proven/work/woods/NetBSD-src/bin/ksh/main.c:504
	#17 0x288a8 in main (argc=1, argv=0xeffff7c4)
	    at /proven/work/woods/NetBSD-src/bin/ksh/main.c:379
	#18 0x10238 in ___start ()
	(gdb) list
	112     trapsig(i)
	113             int i;
	114     {
	115             Trap *p = &sigtraps[i];
	116
	117             trap = p->set = 1;
	118             if (p->flags & TF_DFL_INTR)
	119                     intrsig = 1;
	120             if ((p->flags & TF_FATAL) && !p->trap) {
	121                     fatal_trap = 1;
	(gdb) print p
	$1 = (Trap *) 0x340888
	(gdb) print *p
	Cannot access memory at address 0x340888.
	(gdb) print i
	$2 = 87996
	(gdb) print sizeof(sigtraps)
	$3 = 1088
	(gdb) 


	OK, well the value of 'i' is clearly wonky.

	Can NetBSD really be calling signal handlers differently than
	other systems?
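
	For illustration only, here is a minimal sketch of a handler
	that refuses an out-of-range signal number instead of indexing
	past the end of its table, which is what line 115 of trap.c
	does with i=87996.  This is not ksh's code: struct trap_slot,
	SLOT_COUNT, record_signal() and checked_handler() are
	hypothetical names, and the table size is an assumed value.
	Such a guard would only hide the symptom; it says nothing about
	why the handler is entered with a bogus argument.

```c
#include <signal.h>

/* Hypothetical stand-in for ksh's Trap; only the field used here. */
struct trap_slot {
	volatile sig_atomic_t set;
};

#define SLOT_COUNT 64	/* assumed table size, not ksh's actual value */

static struct trap_slot slots[SLOT_COUNT];
static volatile sig_atomic_t bad_signo_seen;

/*
 * Mark the slot for `signo` as pending.
 * Returns 0 on success, -1 if `signo` is outside the table.
 */
int
record_signal(int signo)
{
	if (signo < 0 || signo >= SLOT_COUNT) {
		bad_signo_seen = 1;	/* note it; do not touch memory */
		return -1;
	}
	slots[signo].set = 1;
	return 0;
}

/* Signal handler wrapper; only sets flags, so it stays signal-safe. */
void
checked_handler(int signo)
{
	(void)record_signal(signo);
}
```

	With this guard, a call such as record_signal(87996) returns -1
	and leaves the table alone rather than faulting on an unmapped
	address.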

>Fix:

	unknown
>Release-Note:
>Audit-Trail:
>Unformatted: