Subject: sh core dumps
To: None <port-sparc@netbsd.org>
From: Valeriy E. Ushakov <uwe@ptc.spbu.ru>
List: port-sparc
Date: 10/20/2005 00:47:00
[Starting a new thread to disentangle this from the -mcpu discussion]

It seems that I can reliably reproduce the problem with devel/gmake:
after running make there I can cd to work/make-3.80 and trigger the
bug by running ./config.status.  That gives me one or sometimes two sh
core files (I run with kern.defcorename=%n.%p.core).  Both are from
backticked invocations of sed.  Since we get sh.core, not sed.core,
the crash should be happening in the vforked child before exec.

[I don't have older cores around, but IIRC they were similar in
that the sh.core was from a child vforked to run a backticked command
or a parenthesized subshell]

<root@krups:/usr/pkgsrc/devel/gmake/work/make-3.80> (1042) ./config.status
config.status: creating Makefile
config.status: creating glob/Makefile
config.status: creating po/Makefile.in
config.status: creating config/Makefile
config.status: creating doc/Makefile
config.status: creating build.sh
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
[1]   Segmentation fault (core dumped) sed -n -e "/^DEP...
[1]   Segmentation fault (core dumped) sed -n -e "/^DEP...
config.status: executing default-1 commands
config.status: creating po/POTFILES
config.status: creating po/Makefile


I've instrumented the memfault handler to print some additional info,
and for those cores I always get

sh[1116]: SEGV(map): addr=0xe804068c type=9
> sfsr=38e<PERR=0,LVL=3,AT=4,FT=3,FAV,OW>

AT=4 - store to user data
FT=3 - privilege violation
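
For anyone who wants to check the decoding by hand, here is a minimal
sketch of how those fields fall out of the raw sfsr value.  It assumes
the SPARC V8 reference MMU fault status register layout (FAV in bit 1,
FT in bits 4:2, AT in bits 7:5, LVL in bits 9:8).  The function name is
made up for illustration; this is not the kernel's code.

```python
# Sketch only: decode a few fields of an SRMMU fault status register.
# Bit positions are an assumption based on the SPARC V8 reference MMU
# layout; decode_sfsr is a hypothetical helper, not a kernel function.

def decode_sfsr(sfsr):
    """Return the LVL, AT, FT and FAV fields of a raw SFSR value."""
    return {
        "LVL": (sfsr >> 8) & 0x3,  # page table level of the fault
        "AT": (sfsr >> 5) & 0x7,   # access type
        "FT": (sfsr >> 2) & 0x7,   # fault type
        "FAV": (sfsr >> 1) & 0x1,  # fault address valid
    }

fields = decode_sfsr(0x38e)
print(fields)
```

With sfsr=0x38e this gives LVL=3, AT=4, FT=3 and FAV set, which agrees
with the fields printed by the instrumented handler.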

The fault address 0xe804068c is in the kernel (this is krups, so the
kernel is at e8000000).  That address is in the middle of nd6_ioctl().
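
The address check itself is trivial, but as a sketch (the 0xe8000000
base is the one stated above for krups; in_kernel is a hypothetical
helper name):

```python
# Sketch only: check whether a 32-bit address lies at or above the
# kernel base, and compute its offset from that base.
KERNBASE = 0xe8000000      # kernel base on krups, as stated in the text
fault_addr = 0xe804068c    # addr= value printed by the memfault handler

def in_kernel(addr, kernbase=KERNBASE):
    """True if addr falls in the range from kernbase to the top of 4GB."""
    return kernbase <= addr <= 0xffffffff

offset = fault_addr - KERNBASE
print(in_kernel(fault_addr), hex(offset))
```

So the fault address is 0x4068c bytes past the start of the kernel,
while the pc of the faulting sh (0x17414) is an ordinary user address.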

The backtrace and register contents are always the same.

The instruction at pc looks totally innocent.

(gdb) x/7i $pc-20
0x17400 <argstr+324>:   cmp  %l6, 0
0x17404 <argstr+328>:   sethi  %hi(0x31000), %l3
0x17408 <argstr+332>:   be  0x17434 <argstr+376>
0x1740c <argstr+336>:   sethi  %hi(0x30c00), %l7
0x17410 <argstr+340>:   ld  [ %l3 + 0x310 ], %g1
0x17414 <argstr+344>:   add  %g1, -1, %g1		# <-- pc
0x17418 <argstr+348>:   cmp  %g1, 0			# <-- npc

(gdb) bt
#0  0x00017414 in argstr ()
#1  0x000171c4 in expandarg ()
#2  0x0001461c in evalcommand ()
#3  0x0001396c in evaltree ()
#4  0x000142ec in evalbackcmd ()
#5  0x00017c70 in expbackq ()
#6  0x000174ac in argstr ()
#7  0x000171c4 in expandarg ()
#8  0x00014738 in evalcommand ()
#9  0x0001396c in evaltree ()
#10 0x00013954 in evaltree ()
#11 0x00013904 in evaltree ()
#12 0x00013904 in evaltree ()
#13 0x00013904 in evaltree ()
#14 0x00013904 in evaltree ()
#15 0x00013cc0 in evalfor ()
#16 0x00013a90 in evaltree ()
#17 0x00013954 in evaltree ()
#18 0x00013e20 in evalcase ()
#19 0x00013aa4 in evaltree ()
#20 0x00013954 in evaltree ()
#21 0x00013cc0 in evalfor ()
#22 0x00013a90 in evaltree ()
#23 0x0001edb8 in cmdloop ()
#24 0x0001eaa4 in main ()
#25 0x00011954 in ___start ()

(gdb) i r
g0             0x0      0
g1             0xe804068c       -402389364
g2             0xe7ffde88       -402661752
g3             0x140    320
g4             0xe7ffde90       -402661744
g5             0xff6c606a       -9674646
g6             0x0      0
g7             0x0      0
o0             0x426af  272047
o1             0x344    836
o2             0x192a8  103080
o3             0xf2d2bfb0       -221069392
o4             0x44     68
o5             0x0      0
sp             0xe7ffde28       3892305448
o7             0x196e0  104160
l0             0x81000000       -2130706432
l1             0x81     129
l2             0x81     129
l3             0x31000  200704
l4             0x0      0
l5             0x0      0
l6             0x1      1
l7             0x30c00  199680
i0             0x3d726  251686
i1             0x3      3
i2             0xfffffffc       -4
i3             0xf24443fc       -230407172
i4             0x0      0
i5             0x1      1
fp             0xe7ffde90       3892305552
i7             0x171bc  94652
y              0xb773   46963
psr            0x4900087        76546183        icc:N--C, pil:0, s:1, ps:0, et:0, cwp:7
wim            0x0      0
tbr            0x0      0
pc             0x17414  95252
npc            0x17418  95256
fpsr           0x0      0       rd:N, tem:0, ns:0, ver:0, ftt:0, qne:0, fcc:=, aexc:0, cexc:0
cpsr           0x0      0


g1 looks suspicious (== fault address).  The preceding instruction
that loads g1 picks up the data from

(gdb) p/x $l3+0x310
$1 = 0x31310
(gdb) x/x $l3+0x310
0x31310 <sstrnleft>:    0x0000013f
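
To spell out the comparison, here is a small sketch that redoes the
arithmetic using only values copied from the gdb output above.  The
variable names and the to_signed32 helper are mine, for illustration.

```python
# Sketch only: all constants are copied from the gdb session above.

def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer, as gdb does."""
    value &= 0xffffffff
    return value - (1 << 32) if value & 0x80000000 else value

l3 = 0x31000               # %l3, set by sethi %hi(0x31000), %l3
load_addr = l3 + 0x310     # address read by ld [ %l3 + 0x310 ], %g1
memory_value = 0x13f       # contents of sstrnleft as found in the core
g1 = 0xe804068c            # %g1 as found in the core
fault_addr = 0xe804068c    # addr= value from the memfault handler

print(hex(load_addr))        # where the load reads from
print(g1 == fault_addr)      # does g1 hold the fault address?
print(g1 == memory_value)    # does g1 hold what the core has in memory?
print(to_signed32(g1))       # signed view of g1
```

The load address comes out as 0x31310, g1 equals the fault address, and
g1 does not match what the core has at that location.  This only
restates the observation; it doesn't explain it.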


Any attempt to ktrace config.status or run it under gdb makes the bug
hide.

Any ideas?  I have no theory as to what might possibly cause this.  On
the one hand, it seems like a timing issue, as I sometimes get a second
core and sometimes I don't.  On the other hand, the backtrace and
registers are always the same (from invocation to invocation, and in
both cores from the same invocation if there are two).

SY, Uwe
-- 
uwe@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen