Subject: Re: -current very unstable on Ultra10 (fwd)
To: Martin Husemann <martin@duskware.de>
From: Arto Huusko <arto.huusko@utu.fi>
List: port-sparc64
Date: 10/31/2004 18:33:17
On Sun, 31 Oct 2004, Martin Husemann wrote:

> On Sun, Oct 31, 2004 at 04:35:24PM +0200, Arto Huusko wrote:
> > The crash occurred in function "environment()", called from
> > "evalcommand()". Note that all the sh cores I've looked at seem
> > to always happen when sh is trying to execute some command
> > via `cmd` (or $(cmd)) mechanism. The traces all look more or less
> > similar, and the errors occur in environment().
>
> Could you try downgrading src/bin/sh
>   var.c to revision 1.34 and
>   var.h to revision 1.22
> and then build/reinstall sh?
>
> Please let us know if this fixes things for you. I've occasionally seen
> crashes in /etc/daily runs now - maybe this is related.

I tested with a small sh script that stresses both environment
variables and the $(cmd) construct. Both the sh built with the var.[ch]
revisions above and the sh built from up-to-date sources crash.

They both crash in the same place, though: line 412 in var.c,
as far as I can see. The faulting instruction is ld [ %g2 + 8 ], %g1.
The value of %g2 varies wildly, but each time it is quite
clearly invalid.

I looked at the core with gdb. If I'm reading the sparc
assembly and the C code right, it should be impossible
for %g2 to get such a wildly incorrect value.
I examined the relevant memory addresses, and
it seems to me that the environment structures are all OK and
that there is no bug in sh there. The assembly around the
faulting instruction is:

environment+40	ldx [%g4], %g2
		brz,a,pn %g2, environment+76
		add %g4, 8, %g4
environment+52	ld [ %g2 + 8 ], %g1
		and %g1, 1, %g1
		ldx [ %g2 ], %g2
		brnz %g2, environment+52
		add %o0, %g1, %o0
		add %g4, 8, %g4
		cmp %g4, %g3
		bcs,a %xcc, environment+44
		ldx [ %g4 ], %g2
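
For reference, here is roughly how I read that loop in C. This is only
a sketch reconstructed from the disassembly, not the code from var.c:
the names struct var, VEXPORT and count_exported(), the field layout
(next pointer at offset 0, flags at offset 8) and the flag value 1 are
all assumptions taken from the instructions above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a shell variable entry. Only the two fields
 * the loop touches are modeled. On an LP64 target the pointer sits at
 * offset 0 and the int at offset 8, matching [%g2] and [%g2 + 8].
 */
struct var {
	struct var *next;	/* ldx [ %g2 ], %g2 */
	int flags;		/* ld [ %g2 + 8 ], %g1 */
};

#define VEXPORT 0x01		/* assumed from "and %g1, 1, %g1" */

/* Count the entries flagged as exported in a table of chained buckets. */
int
count_exported(struct var **tab, size_t nbuckets)
{
	struct var **vpp;	/* %g4: current table slot */
	struct var *vp;		/* %g2: current chain entry */
	int nenv = 0;		/* %o0: running count */

	for (vpp = tab; vpp < tab + nbuckets; vpp++) {	/* cmp %g4, %g3 */
		for (vp = *vpp; vp != NULL; vp = vp->next) {
			if (vp->flags & VEXPORT)
				nenv++;
		}
	}
	return nenv;
}
```

In this reading, %g2 only ever receives a value loaded from a table
slot or from the next field of the previous entry in the chain.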

In the latest core, %g2 is 0x2000. However, the slot that %g4
points to, and the slots before it, all contain NULL.
That is, [%g4] = NULL, [%g4 - 8] = NULL, etc.
So %g2 should not be able to hold that value.
It almost seems like something is corrupting the contents of
the %g2 register... however, it is very odd that the crash
always occurs in the same spot.