Subject: xsrc/15357: stack trashing bug crashing the sparc Xservers
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 01/24/2002 21:36:21
>Number:         15357
>Category:       xsrc
>Synopsis:       stack trashing bug crashing the sparc Xservers
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    xsrc-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 24 18:37:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Greg A. Woods
>Release:        xsrc-2001/07/03
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 1.5W
Architecture: sparc
Machine: sparc
>Description:

	I've been suffering occasional crashes of the Xserver on my
	primary workstation, a SPARCstation-1, now a 1+, ever since I
	first began to use it.

	Originally (for me) it ran NetBSD-1.3.2/sparc.

	Now it runs 1.5W from sources last updated 2001/06/24, and xsrc
	built from sources last updated 2001/07/03.

	It runs diskless, and has 16MB of RAM and a bwtwo frame buffer.

	Since upgrading last week the crashes seem to have become even
	more frequent: every other day instead of every other week
	(though since I don't know exactly what causes them, I'm not
	sure how to rate their frequency).

	It doesn't seem to make any difference whether I run Xsun or
	XsunMono, but since I find the latter to perform slightly
	better, and since it is sufficient for this hardware, that's
	what I prefer to run.

	It doesn't matter whether I start it from xdm or xinit.

	I generally run it with xfs (xset fp= tcp/server:7100).

	Yesterday I decided to suffer the overhead of gdb and I attached
	gdb to the running XsunMono process shortly after I had started
	it with xinit.  Here are the results:

14:04 [19] $ gdb /usr/X11R6/bin/XsunMono 6720  
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc--netbsd"...

/proven/work/woods/NetBSD-xsrc/xc/programs/Xserver/6720: No such file or directory.
Attaching to program `/usr/X11R6/bin/XsunMono', process 6720
0x18e978 in select ()
(gdb) cont
Continuing.

Program received signal SIGPIPE, Broken pipe.
0x1a3758 in writev ()
(gdb) cont
Continuing.

Program received signal SIGBUS, Bus error.
0x19960 in DeliverEventsToWindow (pWin=0x1f02e9, pEvents=0x417808, count=8, 
    filter=2147483648, grab=0x0, mskidx=37159673) at events.c:1199
1199            if (filter != CantBeFiltered &&
(gdb) where
#0  0x19960 in DeliverEventsToWindow (pWin=0x1f02e9, pEvents=0x417808, 
    count=8, filter=2147483648, grab=0x0, mskidx=37159673) at events.c:1199
#1  0x1ac4c in DeliverFocusedEvent (keybd=0xb0680, xE=0x417808, 
    window=0x46e940, count=8) at events.c:1921
#2  0x382734 in ?? ()
Error accessing memory address 0x3d: Invalid argument.
(gdb) list
1194    
1195        /* CantBeFiltered means only window owner gets the event */
1196        if ((filter == CantBeFiltered) || !(type & EXTENSION_EVENT_BASE))
1197        {
1198            /* if nobody ever wants to see this event, skip some work */
1199            if (filter != CantBeFiltered &&
1200                !((wOtherEventMasks(pWin)|pWin->eventMask) & filter))
1201                return 0;
1202            if ( (attempt = TryClientEvents(wClient(pWin), pEvents, count,
1203                                          pWin->eventMask, filter, grab)) )
(gdb) 


	Here's all gdb can tell me from an earlier dump of Xsun:

21:19 [28] $ gdb Xsun ~/Xsun.core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc--netbsd"...
Core was generated by `Xsun'.
Program terminated with signal 11, Segmentation fault.
#0  mieqProcessInputEvents () at mieq.c:191
191         }
(gdb) where
#0  mieqProcessInputEvents () at mieq.c:191
Cannot access memory at address 0x168.
(gdb) list
186                     (*miEventQueue.pPtr->processInputProc)
187                                     (&xe, (DeviceIntPtr)miEventQueue.pPtr, 1);
188                     break;
189                 }
190             }
191         }
192     }
(gdb)


	As I recall this was as much information as I was able to get
	from the cores from the 1.3.2 release too.

	It seems the stack is always so thoroughly trashed that finding
	the real backtrace is impossible.

	I've no idea how to debug this further without getting much more
	familiar with the Xserver code (I know almost nothing about it
	now).  I've thought of various compiler hacks that might detect
	the stack trashing earlier: saving the return address just after
	every call and comparing it to the value still on the stack
	before executing the return instruction, or even saving a copy
	of the entire stack just after every function call (before
	executing the first instruction of the function), etc.  However
	there doesn't seem to be any quick hack that would be both
	efficient enough to run with and effective enough to catch the
	stack trashing.
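
	The return-address comparison described above could be sketched
	by hand roughly as follows.  This is only an illustration, not
	code from the Xserver: it assumes GCC's
	__builtin_return_address() extension, and every name in it
	(shadow_push, shadow_check_pop, GUARD_ENTER, GUARD_EXIT,
	guarded_add, SHADOW_DEPTH) is hypothetical.  Whether the builtin
	really re-reads the saved slot, rather than a register copy,
	depends on the target and the optimization level, so a match
	does not prove the stack is intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SHADOW_DEPTH 1024   /* hypothetical limit on guarded call depth */

static void *shadow_stack[SHADOW_DEPTH];  /* saved return addresses */
static size_t shadow_top;                 /* number of entries in use */
static unsigned long shadow_mismatches;   /* mismatching returns seen so far */

/* Save a return address on function entry.
 * Returns 0 on success, -1 if the shadow stack is full. */
static int shadow_push(void *ret_addr)
{
    if (shadow_top >= SHADOW_DEPTH)
        return -1;
    shadow_stack[shadow_top++] = ret_addr;
    return 0;
}

/* Compare a return address, read just before returning, with the saved one.
 * Returns 1 if they match, 0 if they differ, -1 if nothing was saved. */
static int shadow_check_pop(void *ret_addr)
{
    assert(shadow_top <= SHADOW_DEPTH);   /* internal invariant */
    if (shadow_top == 0)
        return -1;
    if (shadow_stack[--shadow_top] == ret_addr)
        return 1;
    shadow_mismatches++;
    return 0;
}

/* GCC extension: the address the current function will return to. */
#define GUARD_ENTER() shadow_push(__builtin_return_address(0))
#define GUARD_EXIT()  shadow_check_pop(__builtin_return_address(0))

/* A hypothetical guarded function: stop at once if the return address
 * seen at exit is not the one recorded at entry. */
static int guarded_add(int a, int b)
{
    int sum;

    GUARD_ENTER();
    sum = a + b;
    if (GUARD_EXIT() != 1)
        abort();            /* leaves a core at the first detected mismatch */
    return sum;
}
```

	Each guarded function pays for two extra calls and a store, and
	a function that is trashed after its own check has run is still
	missed, which is the efficiency versus effectiveness trade-off
	noted above.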

	Maybe a compiler option could be used in conjunction with
	debugger watchpoints so that a watch would be automatically set
	on every return address on the stack.  Even then I suspect the
	overhead would make X unusable on any SPARCstation-1 or -2 class
	machine and thus make it impossible to run long enough to
	trigger the bug.


>How-To-Repeat:

	unknown

>Fix:

	unknown

>Release-Note:
>Audit-Trail:
>Unformatted: