Subject: NetBSD-1.0 (Nov07) and old binaries
To: None <current-users@netbsd.org>
From: Simon J. Gerraty <sjg@zen.void.oz.au>
List: current-users
Date: 11/09/1994 09:10:06
Further to my bleatings yesterday.

I built a kernel from the tar_files of Nov 7, with KTRACE enabled, and
ran the offending BSDi binary.  The ktrace supports my theory that
the problem is a file locking issue.  I believe (with little evidence)
that it's related to 32-bit off_t's.  There may be another issue
lurking here too...

First the tail of the ktrace with some comments.  At the end is a
brief discussion of supporting old binaries like this.

(ok here is the start...)

   184 ktrace   RET   ktrace 0
   184 ktrace   CALL  execve(0xf7bfdb91,0xf7bfdaf0,0xf7bfdafc)
   184 ktrace   NAMI  "/usr/MHSnet/_lib/netstate"
   184 netstate RET   execve 0
	...
	...
   184 netstate CALL  open(0x36000,0,0x33a50)
   184 netstate NAMI  "/var/spool/MHSnet/_lib/privsfile"
   184 netstate RET   open -1 errno 2 No such file or directory

Up to this point everything is cool.

	...
	
The following looks a bit sus - maybe there is some doco somewhere I
need to read about changes from 0.9a to 1.0...
Pointers appreciated.

   184 netstate CALL  setgid(0x1)
   184 netstate RET   setgid -1 errno 1 Operation not permitted
   184 netstate CALL  geteuid
   184 netstate RET   geteuid 1
   184 netstate CALL  setuid(0x1)
   184 netstate RET   setuid -1 errno 1 Operation not permitted
   184 netstate CALL  geteuid
   184 netstate RET   geteuid 1
   184 netstate CALL  geteuid
   184 netstate RET   geteuid 1
   184 netstate CALL  break(0x42ffc)
   184 netstate RET   break 0
   184 netstate CALL  sigprocmask(0x1,0)
   184 netstate RET   sigprocmask 0
   184 netstate CALL  sigaction(0xe,0xf7bfd9fc,0xf7bfd9f0)
   184 netstate RET   sigaction 0
   184 netstate CALL  setitimer(0,0xf7bfd9f4,0xf7bfd9e4)
   184 netstate RET   setitimer 0
   184 netstate CALL  old.stat(0x360c0,0xf7bfda20)
   184 netstate NAMI  "/var/spool/MHSnet/_state/lock"
   184 netstate RET   old.stat 0
   184 netstate CALL  open(0x360c0,0x2,0xf7bfdb0c)
   184 netstate NAMI  "/var/spool/MHSnet/_state/lock"
   184 netstate RET   open 3

Ok here is the killer.  According to fcntl.h, command 0x9 is:
#define	F_SETLKW	9		/* F_SETLK; wait if blocked */

   184 netstate CALL  fcntl(0x3,0x9,0xf7bfd9e4)
   184 netstate RET   fcntl -1 errno 22 Invalid argument
   184 netstate CALL  ioctl(0x2,0x402c7413 ,0xf7bfd964)
   184 netstate RET   ioctl 0
   184 netstate CALL  write(0x2,0xf7bfd2f4,0xa)
   184 netstate GIO   fd 2 wrote 10 bytes
       "netstate: "
   184 netstate RET   write 10/0xa
   184 netstate CALL  write(0x2,0x2e3d4,0xc)
   184 netstate GIO   fd 2 wrote 12 bytes
       "system error"
   184 netstate RET   write 12/0xc
   184 netstate CALL  write(0x2,0x2e479,0x4)
   184 netstate GIO   fd 2 wrote 4 bytes
       " -- "
   184 netstate RET   write 4
   184 netstate CALL  write(0x2,0xf7bfd300,0x2e)
   184 netstate GIO   fd 2 wrote 46 bytes
       "Could not lock "/var/spool/MHSnet/_state/lock""
   184 netstate RET   write 46/0x2e
   184 netstate CALL  write(0x2,0xf7bfd314,0x12)
   184 netstate GIO   fd 2 wrote 18 bytes
       ": Invalid argument"
   184 netstate RET   write 18/0x12
   184 netstate CALL  write(0x2,0x2f73f,0x1)
   184 netstate GIO   fd 2 wrote 1 bytes
       "
       "
   184 netstate RET   write 1
   184 netstate CALL  exit(0x47)

Now my _guess_ is that the kernel does not like the 32-bit off_t's in the
flock struct.

My question is:  

Since we have old.stat(), old.lseek() etc. to support
old bins, do we need an old.fcntl()?  Or would adding some checks to
fcntl() do the trick?

It may be considered too late to introduce an old.fcntl()...
Perhaps some extra logic in fcntl() would suffice.
In the case above, the kernel is (I'm guessing) rejecting the flock
struct because the values don't look right.  It could try again using
off32_t and see if that looks better.  Not as reliable as an
old.fcntl() but...

Is this something that is being addressed?