Subject: Re: poor error reporting when ld.so is missing
To: None <tech-kern@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 04/16/2001 14:58:09
[ On Monday, April 16, 2001 at 12:10:34 (+0100), Ben Harris wrote: ]
> Subject: Re: poor error reporting when ld.so is missing

It's not just a "poor error", it's totally bogus, i.e. completely wrong
and misleading!

> In article <1erxkx8.1wljwfn1ty2u10M%p99dreyf@criens.u-psud.fr> you write:
> > We have exactly the same problem for native NetBSD dynamic binaries. Try
> > moving /usr/libexec/ld.elf_so, and run a dynamically linked program (gcc
> > for instance). The error you get does not tell you what is wrong.

Indeed!  I discovered this just the other day, much to my surprise, when
trying to make use of dynamically linked programs on the hard disk while
using the rescue floppy....

I've not looked in detail at how ELF dynamic programs bootstrap
ld.elf_so, but I was somewhat surprised to get errors indicating that my
shell was trying to execute either the shared libraries or the binary itself!

> > There is nothing in errno.h such as "can't find ld.so". Should we use an
> > existing error code (and which one is relevant here), or should we add a
> > new one? I believe adding an errno is not a good idea for portability
> > reasons. Is there any other way of doing it?

Indeed a new value for errno would definitely NOT be appropriate.
ENOENT would be about the best errno value to expect in this case
though....

The correct message would hopefully look something like:

	<basename of argv[0]>: /usr/libexec/ld.elf_so: No such file or directory
 
> I'm inclined to feel that following the behaviour of shell scripts whose
> interpreter is missing, and returning ENOENT (or whatever else we get when
> we try to access ld.so) would be a sensible compromise.
 
Well, at least for some shells, even that's still not very meaningful:

	$ sh
	$ tmp/fooit
	tmp/fooit: not found
	$ ksh
	$ tmp/fooit
	ksh: tmp/fooit: tmp/fooit: No such file or directory

	$ cat tmp/fooit
	#! /bin/barit
	blah
	$ 

Note that the error messages above should have read more like:

	SHELL: tmp/fooit: /bin/barit: No such file or directory

I thought I'd fixed these in my source tree, but it seems not....

It's a little tricky to get this right though since the magic "#! /"
support is deep in the kernel and the shell is simply trying to handle
the error from execve() in a sensible way.  There's still some confusion
in the implementation of some shells dating back to how they worked on
systems that did not have magic "#! /" support -- they try to interpret
the target themselves as if it's a script.  That's sort of what's
happening when "ld.[elf_]so" goes missing.  On systems with magic "#! /"
support the shell should never try to interpret an executable file
that execve() fails to run -- such things can NEVER be valid scripts
(unless the programmer broke them by removing the "#! /" magic, but then
they're broken now, aren't they?).

(Note of course that in 4.3BSD and newer the magic is now just "#!", and
it's not really "magic" any more.)
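
To make the shell side of this concrete, here is a minimal sketch (not
taken from any real shell) of how the failure from execve() could be
turned into the three-part message shown above.  The names
parse_shebang() and explain_exec_failure() are hypothetical, invented
for this illustration.  It assumes that if execve() failed with ENOENT
but the file itself can still be opened, then the thing that's missing
is the interpreter named on its "#!" line.  It does not try to handle
the ELF case, where the interpreter name lives in the program headers
rather than in a "#!" line.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the interpreter path from a "#!" line.
 * Returns 0 and fills interp on success, -1 if the line does not
 * start with "#!" or names no interpreter.
 */
static int
parse_shebang(const char *line, char *interp, size_t len)
{
	size_t n = 0;

	if (len == 0 || line[0] != '#' || line[1] != '!')
		return -1;
	line += 2;
	/* Blanks between "#!" and the path are optional. */
	while (*line == ' ' || *line == '\t')
		line++;
	/* The path ends at the first whitespace (arguments may follow). */
	while (*line != '\0' && !isspace((unsigned char)*line) && n + 1 < len)
		interp[n++] = *line++;
	interp[n] = '\0';
	return n > 0 ? 0 : -1;
}

/*
 * Hypothetical helper: build the diagnostic a shell could print after
 * execve(path, ...) has failed with errno == err.
 */
static void
explain_exec_failure(const char *shell, const char *path, int err,
    char *msg, size_t msglen)
{
	char line[256], interp[256];
	FILE *fp;

	if (err == ENOENT && (fp = fopen(path, "r")) != NULL) {
		/* The file exists, so something it depends on is missing. */
		if (fgets(line, sizeof(line), fp) != NULL &&
		    parse_shebang(line, interp, sizeof(interp)) == 0) {
			fclose(fp);
			snprintf(msg, msglen, "%s: %s: %s: %s",
			    shell, path, interp, strerror(err));
			return;
		}
		fclose(fp);
	}
	/* Otherwise fall back to the plain two-part message. */
	snprintf(msg, msglen, "%s: %s: %s", shell, path, strerror(err));
}
```

A shell would call explain_exec_failure() with the saved errno right
after execve() returns, and print the resulting message on stderr
instead of trying to read the file as a script.  For the tmp/fooit
example above that should give a message of the form
"SHELL: tmp/fooit: /bin/barit: ..." followed by the system's text for
ENOENT.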

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>