Subject: Re: Bug in i386 execl
To: None <current-users@NetBSD.ORG, port-i386@portal.ca, cjs@portal.ca>
From: Wolfgang Solfrank <ws@kurt.tools.de>
List: current-users
Date: 11/19/1996 15:54:28
> #0  0x100434b8 in fnmatch ()
> #1  0x10043500 in execl ()
> #2  0x942c in __rmt_open (path=0xf7bfd7b4 "tape", oflag=0, mode=438, bias=128)
>     at rtapelib.c:377
> #3  0x7fb3 in open_archive (file=0xf7bfda7b "tape@kefron:/dev/rst0")
>     at util.c:721
> #4  0x68c2 in process_args (argc=8, argv=0xf7bfda00) at main.c:404
> #5  0x69da in main (argc=8, argv=0xf7bfda00) at main.c:469
> 
> Unfortunately, here I'm a bit stuck, since execl() in
> /usr/src/lib/libc/gen/exec.c. doesn't call fnmatch. I must be
> missing something somewhere. Can you clue me in so I can try to
> track this down further? I'd like to get a bit more information on
> this if I can before submitting a bug report.

While I can't currently help you with the actual bug, here is an explanation
of the backtrace (including the way I found it):

If you look into lib/libc/gen/execl.c, you'll find that on the i386 execl calls
directly into execve.

Looking into lib/libc/sys/Makefile.inc, you'll find execve.o has a default
implementation on all architectures.  What this means is that it is generated
by the sequence

	printf '#include "SYS.h"\nRSYSCALL(${.PREFIX})\n' | \
		${CPP} -DPROF ${CFLAGS:M-[ID]*} ${AINC} | ${AS} -o ${.TARGET}.o
	${LD} -X -r ${.TARGET}.o -o ${.TARGET}

This essentially means that the source code for execve.S is

#include "SYS.h"
RSYSCALL(execve)

Now looking up the RSYSCALL macro in lib/libc/arch/i386/SYS.h, you'll see
that the above code results in the following assembly code (after some
formatting and for the PIC case, which is relevant for cpio):

	.text
	.align 2
2:	jmp	cerror@PLT
_execve:
	movl	$(SYS_execve),%eax
	int	$0x80
	jc	2b
	ret

If you are not familiar with i386 assembly, this means that the code for
execve places the system call number into the appropriate register, calls into
the kernel, and, if the system call returned an error (signalled by the carry
flag), jumps to the label cerror through the dynamic link mechanism.

Now you can see that the jump to the cerror label is before the label naming
the function itself.  When the debugger generates a backtrace, it has to
decide what function was called to get to a particular program counter.
It does this by assuming that the processor reached the particular place in
the code by jumping to the immediately preceding label it can find.  Since the
2: is a temporary label that is stripped during the assembly and cannot be
found in the resulting executable, it just finds the label of the function
that accidentally got linked before the execve.o module.

This is the reason that you get this mysterious fnmatch label in the backtrace.
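
To make the lookup concrete, here is a small sketch of the "nearest preceding
label" rule in Python.  It is only an illustration, not how any particular
debugger is implemented: the addresses and the symbol_for helper are made up,
and a real symbol table would of course come from the shared library itself.

```python
import bisect

# Made-up addresses for illustration only; real values would come from
# the symbol table of the shared library.
SYMBOLS = sorted([
    (0x1000, "_fnmatch"),  # last global label in fnmatch.o
    (0x1200, "_execve"),   # entry point of the generated stub
    (0x1220, "_execl"),
])
ADDRS = [addr for addr, _ in SYMBOLS]


def symbol_for(pc):
    """Return the name of the nearest symbol at or below pc, or None."""
    i = bisect.bisect_right(ADDRS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else None


# The "2: jmp cerror@PLT" instruction sits a few bytes before _execve,
# and the temporary label 2 is absent from the symbol table.
print(symbol_for(0x11fb))  # a pc inside the jmp, just before _execve
print(symbol_for(0x1205))  # a pc inside the stub proper
```

With this made-up table, the first lookup reports _fnmatch even though the
instruction belongs to the execve stub, while the second reports _execve.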

> Any idea whether or not this fails on systems other than the i386?

No real idea, but judging from all of the above it looks like there is a bug
in the handling of dynamic linking.  So it's probably safe to say that the
bug doesn't occur on at least some of the other ports :-).

> Also, if you've got some general clues for a newbie on debugging
> library routines and system calls, I'd like to hear them. My current
> thought for debugging something like this would be to compile a
> copy of fnmatch with debugging information and link it in with the
> program, overriding the copy in the shared library. Will this work?
> Is it sensible?

As you can probably guess from the above, this wouldn't buy you anything,
as fnmatch doesn't play any part in the bug.  It just happened that
fnmatch.o was linked immediately before execve.o.

The first thing to consider in tracing the bug is looking at the instruction
the processor was about to execute (most likely the jmp to cerror immediately
preceding the _execve label).  Assuming that this location does indeed hold
the correct instruction (which it at least did at the return from the first
call to execl), I don't have a good idea offhand why this would result in a
segmentation
violation :-(.  And I'm not sure whether this is related to the new bug you
are seeing where there are some arguments missing in the new process.

Hope this at least explains a bit of what you're seeing, while unfortunately
not being of real help :-(.

PS: One thing however that comes to mind with regard to i386 machines: Are
you sure your memory is OK?
--
ws@TooLs.DE     (Wolfgang Solfrank, TooLs GmbH) 	+49-228-985800