Subject: I think I found it! XsunMono lives to tell the tale!
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 04/20/2002 15:07:07

It's probably way too early to say for certain, but I'm so excited I
can't bear to hold back the good news any longer!  (though a failure at
this point would be ultra-devastating....)

The other day while doing a little more than the usual amount of playing
around with Japanese characters in emacs I suffered through a series of
Xserver crashes that all exhibited identical symptoms in GDB.

The crash was always at the same place in ProcQueryFont(), but the stack
was broken, most of the local variables were all zeros, and the core
file itself seemed a bit more damaged than usual (the local registers
are unreadable):

GNU gdb 5.1.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-unknown-netbsdelf1.5W"...

warning: little endian file does not match big endian target.
Core was generated by `XsunMono'.
Program terminated with signal 11, Segmentation fault.
Couldn't read input and local registers from core file
Couldn't read input and local registers from core file
#0  0x00034aa0 in ProcQueryFont (client=0x0) at dispatch.c:1227
1227            reply->type = X_Reply;
(gdb) where
#0  0x00034aa0 in ProcQueryFont (client=0x0) at dispatch.c:1227
Cannot access memory at address 0xeff4deac
(gdb) list
1222            if(!reply)
1223            {
1224                return(BadAlloc);
1225            }
1226
1227            reply->type = X_Reply;
1228            reply->length = (rlength - sizeof(xGenericReply)) >> 2;
1229            reply->sequenceNumber = client->sequence;
1230            QueryFont( pFont, reply, nprotoxcistructs);
1231
(gdb) info locals
pmax = (xCharInfo *) 0xec00
pmin = (xCharInfo *) 0x26dd38
nprotoxcistructs = 60416
rlength = 0
reply = (xQueryFontReply *) 0x0
pFont = 0x0
pGC = (GC *) 0x1
stuff = (xResourceReq *) 0x0

Now how the heck could a local pointer be OK at one point where it's
tested explicitly to avoid causing a NULL dereference, and then in the
very next statement end up being zero and causing a NULL dereference?

I already knew that emacs could not display the HELLO document without
triggering a crash so I tested it again and sure enough the very same
symptoms appeared in the core file.

Anyway this all pointed even more to font-handling problems than before
(though there had been many hints about this previously).

That night (well early the next morning :-) while going to sleep I
realised there might be a simple way to reduce the problem to the
simplest possible case and thus perhaps make a very simple recipe for
reproducing it.  First I ran just an xterm directly from xinit (no
window manager) and started emacs, tried to open the HELLO document and
got the same crash again.  I switched from using xfs to NFS-mounted
fonts, but the crash was exactly the same.

So then I decided to try running xfontsel directly from xinit.  Sure
enough I could trigger the same crash immediately when opening the very
first font family on the list:  admas.

It turns out this font family contained the only two unicode-encoded
fonts I had installed.

$ fgrep -i unicode /usr/X11R6/lib/X11/fonts/*/font*
intlfonts/fonts.dir:ethio16f-uni.pcf.gz -admas-ethiomx16f-medium-r-normal--16-150-100-100-m-160-ethiopic-unicode
intlfonts/fonts.dir:ethio24f-uni.pcf.gz -admas-ethiomx16f-medium-r-normal--24-225-100-100-m-240-ethiopic-unicode

I removed them from the directory and from the fonts.dir file and now
things are running smoothly with no crashes.  I can open the HELLO
document multiple times in the same session and I can view every font
family with xfontsel.
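
For anyone wanting to repeat that workaround, here is a rough sketch of
the fonts.dir half of it.  It assumes the usual fonts.dir layout (first
line is the number of entries, then one "file name" entry per line); the
function name drop_font_entries is made up for illustration and is not
part of any X11 tool.  It matches a fixed string case-insensitively,
like the fgrep -i above, and recomputes the count line:

```shell
#!/bin/sh
# Hypothetical helper (not part of X11): drop_font_entries STRING FILE
# Prints a filtered copy of a fonts.dir file on stdout, leaving out
# every entry that contains STRING (case-insensitive, fixed string)
# and recomputing the entry count on the first line.
# Assumes line 1 of FILE is the entry count.
drop_font_entries() {
    awk -v pat="$1" '
        NR == 1 { next }    # skip the old count line
        index(tolower($0), tolower(pat)) == 0 { keep[++n] = $0 }
        END {
            print n + 0     # new count (0 if nothing was kept)
            for (i = 1; i <= n; i++) print keep[i]
        }
    ' "$2"
}

# Possible usage, run from inside the intlfonts directory:
#   drop_font_entries unicode fonts.dir > fonts.dir.new &&
#       mv fonts.dir.new fonts.dir
```

The matching *-uni.pcf.gz files still have to be moved out of the
directory by hand.  Running mkfontdir in the directory after moving the
files should regenerate fonts.dir as well, which may be simpler than
editing it.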

This XsunMono has been running for more than 12 hours and has done more
work, especially font-related work (e.g. the emacs HELLO document), than
most previous instances ever did, and it's still ticking happily along:

14:31 [9] $ ps -auxww | head -3
USER   PID %CPU %MEM   VSZ RSS TT STAT STARTED     TIME COMMAND
woods 5035 15.8  0.5 10044 188 ?? RN   11:37PM 57:03.80 XsunMono :0 -fp tcp/sometimes.weird.com:7100 -v -logo 
woods 5274  2.0  0.1  1172  44 ?? SN   11:37PM 17:58.34 swisswatch -name swissclock -geometry 190x190-0+0 


What I don't understand is why this wasn't a problem in the 1.3.x
Xserver.  I had the same version of intlfonts installed on the old font
server....

I also don't understand why the mere presence of a couple of
unicode-encoded fonts in amongst 1040 other fonts would eventually cause
a crash, even when there's never been any explicit use of those fonts.

I'll note too that I didn't restart xfs after removing those two, and
yet even now while again using the font server all's well -- it seems to
only happen if you're able to actually try to open and use one of the
unicode-encoded font files.

In any case even if I haven't found all the bugs I've certainly found a
nasty but easily reproducible one.  Just install fonts/intlfonts-1.2 and
then fire up XsunMono as (adjusting as necessary for the location of
your intlfonts directory):

	xinit /usr/X11R6/bin/xfontsel -- XsunMono :0 -fp "/usr/X11R6/lib/X11/fonts/misc/,/usr/X11R6/lib/X11/fonts/Speedo/,/usr/X11R6/lib/X11/fonts/Type1/,/usr/X11R6/lib/X11/fonts/intlfonts/,/usr/X11R6/lib/X11/fonts/100dpi/,/usr/X11R6/lib/X11/fonts/75dpi/"

and then try to look at the "admas" font family.....
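
Since the font path in that command has to be adjusted per machine, a
small sketch like the following may make it easier to edit.  The
function name make_font_path is hypothetical, and the script only prints
the xinit command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated X font path from a base
# directory and a list of subdirectory names, each with a trailing slash.
make_font_path() {
    base=$1
    shift
    path=
    for sub in "$@"; do
        # Add a comma only when something is already in the path.
        path="${path:+$path,}$base/$sub/"
    done
    printf '%s\n' "$path"
}

# Same directories, in the same order, as in the command above;
# change the base directory to wherever your fonts live.
FP=$(make_font_path /usr/X11R6/lib/X11/fonts \
        misc Speedo Type1 intlfonts 100dpi 75dpi)

# Print the command for inspection instead of starting the server.
echo "xinit /usr/X11R6/bin/xfontsel -- XsunMono :0 -fp \"$FP\""
```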

BTW, there's no problem on my NCDs, though of course no character glyphs
are displayed in the bottom xfontsel pane with the default sample text
used by xfontsel (not even the adjusted sample I use which contains all
printable 8-bit characters).

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>