Subject: port-sparc64/27730: sh crashing on sparc64
To: None <>
From: None <>
List: netbsd-bugs
Date: 10/31/2004 20:29:29
>Number:         27730
>Category:       port-sparc64
>Synopsis:       sh crashing on sparc64
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 31 18:30:00 UTC 2004
>Originator:     Arto Huusko
>Release:        NetBSD 2.99.10
System: NetBSD 2.99.10 (GENERIC) #0: Sun Oct 31 11:56:29 EET 2004
Architecture: sparc64
Machine: sparc64

NetBSD 2.99.10 (GENERIC) #0: Sun Oct 31 11:56:29 EET 2004
total memory = 320 MB
avail memory = 302 MB
bootpath: /pci@1f,0/pci@1,1/ide@3,0/disk@0,0
mainbus0 (root): SUNW,Ultra-5_10: hostid 80c17025
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 360 MHz, version 0 FPU
cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc4000
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 2; PCI bus 0
DVMA map: c0000000 to e0000000
IOTSB: 53c000 to 5bc000
pci0 at psycho0
pci0: i/o space, memory space enabled
ppb0 at pci0 dev 1 function 1: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ebus0 at pci1 dev 1 function 0
ebus0: Sun Microsystems, Inc. PCIO Ebus2, revision 0x01
auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003, 72c000-72c003, 72f000-72f003
power at ebus0 addr 724000-724003 ipl 37 not configured
SUNW,pll at ebus0 addr 504000-504002 not configured
sab0 at ebus0 addr 400000-40007f ipl 43: rev 3.2
sabtty0 at sab0 port 0: console i/o
sabtty1 at sab0 port 1
com0 at ebus0 addr 3083f8-3083ff ipl 41: ns16550a, working fifo
kbd0 at com0
com1 at ebus0 addr 3062f8-3062ff ipl 42: ns16550a, working fifo
ms0 at com1
lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 700000-70000f ipl 34
fdthree at ebus0 addr 3023f0-3023f7, 706000-70600f, 720000-720003 ipl 39 not configured
clock0 at ebus0 addr 0-1fff: mk48t59
flashprom at ebus0 addr 0-fffff not configured
audiocs0 at ebus0 addr 200000-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ipl 35 ipl 36: CS4231A
audio0 at audiocs0: full duplex
hme0 at pci1 dev 1 function 1: Sun Happy Meal Ethernet, rev. 1
hme0: interrupting at ivec 3021
hme0: Ethernet address 08:00:20:c1:70:25
nsphy0 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ATI Technologies 3D Rage Pro (VGA display, revision 0x5c) at pci1 dev 2 function 0 not configured
cmdide0 at pci1 dev 3 function 0
cmdide0: CMD Technology PCI0646 (rev. 0x03)
cmdide0: bus-master DMA support present
cmdide0: primary channel configured to native-PCI mode
cmdide0: using ivec 1820 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
atabus1 at cmdide0 channel 1
ppb1 at pci0 dev 1 function 0: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
cmdide1 at pci2 dev 3 function 0
cmdide1: Silicon Image 0680 (rev. 0x02)
cmdide1: bus-master DMA support present
cmdide1: primary channel configured to native-PCI mode
cmdide1: using ivec 18 for native-PCI interrupt
atabus2 at cmdide1 channel 0
cmdide1: secondary channel configured to native-PCI mode
atabus3 at cmdide1 channel 1
pcons at mainbus0 not configured
No counter-timer -- using %tick at 360MHz as system clock.
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <Maxtor 52049H3>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(cmdide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <CRD-8322B, 1998/09/24, 1.05> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2
cd0(cmdide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)

     $NetBSD: crt0.c,v 1.23 2004/08/26 21:21:33 thorpej Exp $
     $NetBSD: alias.c,v 1.12 2003/08/07 09:05:29 agc Exp $
     $NetBSD: cd.c,v 1.34 2003/11/14 20:00:28 dsl Exp $
     $NetBSD: error.c,v 1.31 2003/08/07 09:05:30 agc Exp $
     $NetBSD: eval.c,v 1.80 2004/10/30 19:29:27 christos Exp $
     $NetBSD: exec.c,v 1.37 2003/08/07 09:05:31 agc Exp $
     $NetBSD: expand.c,v 1.67 2004/07/13 15:05:59 seb Exp $
     $NetBSD: histedit.c,v 1.34 2003/10/27 06:19:29 lukem Exp $
     $NetBSD: input.c,v 1.39 2003/08/07 09:05:32 agc Exp $
     $NetBSD: jobs.c,v 1.62 2003/12/18 00:56:05 christos Exp $
     $NetBSD: mail.c,v 1.16 2003/08/07 09:05:33 agc Exp $
     $NetBSD: main.c,v 1.48 2003/09/14 12:09:29 jmmv Exp $
     $NetBSD: memalloc.c,v 1.28 2003/08/07 09:05:34 agc Exp $
     $NetBSD: miscbltin.c,v 1.34 2004/04/19 01:36:32 lukem Exp $
     $NetBSD: mystring.c,v 1.16 2003/08/07 09:05:35 agc Exp $
     $NetBSD: options.c,v 1.37 2004/10/30 19:29:27 christos Exp $
     $NetBSD: parser.c,v 1.57 2004/06/27 10:27:57 dsl Exp $
     $NetBSD: redir.c,v 1.29 2004/07/08 03:57:33 christos Exp $
     $NetBSD: show.c,v 1.26 2003/11/14 10:46:13 dsl Exp $
     $NetBSD: trap.c,v 1.30 2003/08/26 18:13:25 jmmv Exp $
     $NetBSD: output.c,v 1.28 2003/08/07 09:05:36 agc Exp $
     $NetBSD: var.c,v 1.36 2004/10/06 10:23:43 enami Exp $
     $NetBSD: test.c,v 1.25 2002/05/25 23:12:16 wiz Exp $
     $NetBSD: kill.c,v 1.23 2003/08/07 09:05:13 agc Exp $
     $NetBSD: skeleton.c,v 1.25 2003/08/07 11:17:54 agc Exp $
     $NetBSD: arith.y,v 1.17 2003/09/17 17:33:36 jmmv Exp $
     $NetBSD: arith_lex.l,v 1.12 2003/08/07 09:05:30 agc Exp $
     $NetBSD: printf.c,v 1.30 2004/10/30 19:28:10 christos Exp $
	sh crashes frequently on sparc64, but unfortunately the crashes
	are not reliably reproducible.

	The crash usually seems to occur in the function environment(),
	which is called when sh is preparing to fork a new process. The usual
	cause of the crash is a segmentation fault or bus error due to an
	invalid pointer. For me, it is unclear where the invalid pointer
	comes from -- it seems that it just magically appears to a

	Other than crashing, there are sometimes other peculiar effects.
	For example, my test script at one point printed out:

./ cut: error 14

	When I ran the script again, the error did not reappear. I also saw
	something similar from perl's Configure:

./Configure: grep: error ??

	I also saw (from the same test script) a bunch of

sh in free(): warning: junk pointer, too high to make sense

	warnings. Once I ran with MALLOC_OPTIONS=AJ, they did not
	occur again.

	The puzzling thing is that on the one hand the crashes seem somewhat
	random and arbitrary -- sometimes they occur, sometimes they don't --
	but on the other hand, when sh crashes, it is always in the same spot.
	Try running the following script. It may crash, or it may not. It
	may help to run several copies in parallel, or to be doing other
	things at the same time (or it may not...)

# Generate a bunch of variable names, extract the names, set the vars
# and export.
# A lot of things here are done just for the sake of doing something,
# so that this would stress things a bit more.

VARS=$(jot -w abcd -s : 256)
export VARS
while [ "${VARS}" ]
do
	VAR=$(echo ${VARS} | cut -d : -f 1)
	printf "%3d " $(echo ${VAR} | sed 's/[abcd]//g')
	VARS=$(echo ${VARS} | (IFS=:; read A B; echo "${B}"))
	export VAR
	VAL=$(echo ${VAR} | tr a-d A-D)
	eval export ${VAR}=${VAL}
done
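
	Since running several copies in parallel may help trigger the
	crash, a small wrapper along these lines could be used. This is
	only a sketch and not something I used for the runs described
	here: the function name run_parallel, the file name crashtest.sh
	and the copy count are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical harness, not part of the original test script.
# run_parallel SCRIPT COUNT: start COUNT copies of SCRIPT in the background,
# wait for each one, and report every copy that exited with a nonzero status.
run_parallel() {
	script=$1
	count=$2
	pids=""
	i=0
	while [ "$i" -lt "$count" ]
	do
		sh "$script" >/dev/null 2>&1 &
		pids="$pids $!"
		i=$((i + 1))
	done

	failed=0
	for pid in $pids
	do
		status=0
		wait "$pid" || status=$?
		if [ "$status" -ne 0 ]
		then
			# A status above 128 means the copy was killed by a signal
			# (status minus 128), for example SIGSEGV or SIGBUS.
			echo "pid $pid exited with status $status"
			failed=$((failed + 1))
		fi
	done
	echo "$failed of $count runs failed"
}
```

	With the script above saved as crashtest.sh (an assumed name),
	"run_parallel ./crashtest.sh 8" would start eight copies and
	print a summary line at the end. Any setting exported beforehand,
	such as MALLOC_OPTIONS, is inherited by all copies.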

	A sparc64 sh image with debugging symbols, and one core file
	can be downloaded from:

	I have had sh crash several times, and the crash has always
	occurred in the same place as in the above core.

	Note that the contents of register %g2 (which holds the
	junk pointer that causes the crash) are not constant;
	they change with each crash.
	The fix is left as an exercise for the reader.