Subject: COMPAT_LINUX and errno: one more bug?
To: None <tech-kern@netbsd.org>
From: Emmanuel Dreyfus <p99dreyf@criens.u-psud.fr>
List: tech-kern
Date: 02/08/2001 21:01:47
There is something I don't understand about how errors are returned
under Linux emulation.

Linux error codes are defined in compat/linux/common/linux_errno.h. In
this file they are positive numbers.

They are then used in the native_to_linux_errno array, which maps NetBSD
errors to Linux errors. This happens in
compat/linux/common/linux_errno.c. There, each entry carries a minus
sign, so the error becomes negative.
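
To make the sign issue concrete, here is a minimal sketch of the shape of
such a table. It is not the contents of linux_errno.c: the sketch_* and
SKETCH_* names are made up for illustration, and only three entries are
filled in (EPERM and ENOENT are 1 and 2 on both systems; EAGAIN is 35 on
NetBSD and 11 on Linux).

```c
/* Linux error numbers, positive as in linux_errno.h (hypothetical names). */
#define SKETCH_LINUX_EPERM    1
#define SKETCH_LINUX_ENOENT   2
#define SKETCH_LINUX_EAGAIN   11

/* NetBSD error numbers, used as indices into the table. */
#define SKETCH_NATIVE_EPERM   1
#define SKETCH_NATIVE_ENOENT  2
#define SKETCH_NATIVE_EAGAIN  35

#define SKETCH_NERR           36

/* Indexed by native errno; each filled entry is a negated Linux errno.
 * Entries not listed here are left at 0 in this sketch. */
static const int sketch_native_to_linux_errno[SKETCH_NERR] = {
    [SKETCH_NATIVE_EPERM]  = -SKETCH_LINUX_EPERM,
    [SKETCH_NATIVE_ENOENT] = -SKETCH_LINUX_ENOENT,
    [SKETCH_NATIVE_EAGAIN] = -SKETCH_LINUX_EAGAIN,
};

/* Same lookup shape as the e_errno[error] access in trap.c. */
static int sketch_translate(int native_error)
{
    if (native_error < 0 || native_error >= SKETCH_NERR)
        return native_error;    /* out of range: pass through unchanged */
    return sketch_native_to_linux_errno[native_error];
}
```

With this table, sketch_translate(35) gives -11, which is the value that
shows up in the ktrace output for EAGAIN.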

This array is stored in the e_errno field of the emul switch, in
compat/linux/common/linux_exec.c.

This field is used in arch/<arch>/<arch>/trap.c, line 248 on the
powerpc:
            if (p->p_emul->e_errno)
               error = p->p_emul->e_errno[error];
            frame->fixreg[FIRSTARG] = error;

Hence, with all this, aren't we sending a negative value to userland?
Kernel traces show this clearly:

  4489 simplex  NAMI  "/usr/lib/libX11.so.6"
  4489 simplex  RET   open -1 errno -2 No such file or directory

But here is my problem: shouldn't errno values be positive numbers in
userland? This is specified by POSIX, isn't it? Linux uses negative
errno values inside the kernel, but it makes them positive when
returning them to userland. I thought that when ktrace told me errno -2,
it meant errno was -2 before leaving kernel space, and that userland
then saw 2, not -2.
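
For comparison, this is roughly the convention I have in mind for Linux
on i386: the kernel hands back a small negative value, and the C library's
syscall stub negates it into errno and returns -1. The following is only
a sketch under that assumption; sketch_syscall_return is a made-up name,
not a real libc function, and other architectures may signal errors to
userland in a different way.

```c
#include <errno.h>

/* Assumed convention: raw results in the range [-4095, -1] encode -errno. */
static long sketch_syscall_return(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* userland sees a positive errno */
        return -1;           /* and the call itself returns -1 */
    }
    return raw;              /* success: pass the result through */
}
```

If the value placed in the return register is consumed by a stub like
this one, a negative number from the kernel is exactly what is wanted. If
userland stores the register into errno without negating it, the same
negative number leaks out as errno.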

I've built a simple test program:
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main (int argc, char** argv) {
  int dontcare;

  /* Expected to fail with EPERM when not run as root. */
  dontcare=setuid(0);
  printf("errno=%d\n",errno);

  return 0;
}

It reports errno 1 on Linux, and -1 under Linux emulation on NetBSD on
PowerPC. There is a big problem here, isn't there?

This could explain the bug I have with X binaries:
  4489 simplex  CALL  read(0x3,0x7fffe408,0x8)
  4489 simplex  RET   read -1 errno -11 Resource temporarily unavailable
  4489 simplex  CALL  write(0x2,0x7fffbba8,0x53)
  4489 simplex  GIO   fd 2 wrote 83 bytes
       "XIO:  fatal IO error -11 (Unknown error 4294967285) on X server
"10.0.\
        12.137:0.0"\r

As Richard Rauch pointed out, 4294967285 == (2^32) - 11, which is -11
printed as an unsigned 32-bit value. To me, that explains the bug.
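
The arithmetic can be checked with a one-line conversion. This is just an
illustration of the wraparound, with a made-up helper name; it says
nothing about how the X library actually formats the message.

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit error value as unsigned: the result is
 * err + 2^32 for negative err, so -11 becomes 4294967296 - 11. */
static uint32_t sketch_as_unsigned32(int32_t err)
{
    return (uint32_t)err;
}
```

Calling sketch_as_unsigned32(-11) yields 4294967285, the number in the
"Unknown error" message.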

And now, how to fix it... What surprises me is that, except for the
final step in trap.c, the code is architecture independent. How can
this work on i386, alpha and m68k?

-- 
Emmanuel Dreyfus.  
No Intel processor, no Microsoft software:
Sound programs in a sound computer.
p99dreyf@criens.u-psud.fr