Subject: Observations on the SVR4/ELF problem
To: None <port-sparc@netbsd.org>
From: Charles M. Hannum <root@ihack.net>
List: port-sparc
Date: 11/29/1999 17:14:30

As Christos pointed out, a -current ELF kernel has intermittent
lossage when running Solaris executables. Typically, I see one of two
patterns (paraphrased, since the machine is running Solaris right now):

# ./foo
./foo: ld.so.1: libelf.so.1: corrupt or truncated file
Killed
# ./foo
[works fine]
[...]
# cp -p foo .foo && mv .foo foo
# ./foo
./foo: ld.so.1: libelf.so.1: corrupt or truncated file
Killed
# ./foo
[works fine]
[...]

or:

# ./foo
./foo: ld.so.1: libc.so.1: unknown file type or format
Killed
# ./foo
./foo: ld.so.1: libelf.so.1: unknown file type or format
Killed
# [...]

None of the Solaris programs I tried worked the first time.

I observe a few things about this lossage:
1) A -current kernel built in a.out format, with tools from around
05/19, does not experience this problem.
2) A kernel built from 05/20 sources with ELF tools does experience
the problem (but see below).
3) Adding `-mno-fpu' to the kernel compile flags changes the nature of
   the lossage somewhat. In particular, after copying the file, running
   the copy does *not* fail the first time. And once one executable
   has died with `corrupt or truncated file', all the others that would
   have failed that way appear to work.
4) I tried aggressively flushing the TLB, and also aggressively saving
FPU state and disabling the FPU inside the kernel. Neither of
these things appeared to solve the problem or even elicit any
useful information. This suggests, among other things, that the
kernel is not actually using floating point. (Thus one wonders why
`-mno-fpu' seems to change the behavior!)