Subject: Re: Solaris emulation on NetBSD 1.3.2?
To: Todd Vierling <tv@pobox.com>
From: Mark Newton <newton@atdot.dotat.org>
List: port-sparc
Date: 08/25/1998 00:23:48
Todd Vierling wrote:

 > : Out of curiosity, I'm wondering what version(s) of the Solaris libraries
 > : are being used by those who can and can't get this to work?
 > 
 > I can get it to work with all of Solaris 2.3, 2.4, 2.5, and 2.5.1
 > libraries.  I haven't tested 2.6 libraries, but they purportedly work.

Ok, I have some info on this.  I suspect bitrot in the SVR4 code is to
blame.

I've spent the last week (quietly) porting the svr4 emulation code from
NetBSD to FreeBSD.  There are a couple of weird things which seem to be
stopping X clients and some other programs from working.  I've fixed some
of them, but the latest Netscape Communicator will need more than a small
fix to get working: it requires a functional setcontext()/getcontext()
pair, and NetBSD currently treats them as no-ops.  I'm downloading
Navigator 3.04 at the moment to see how far back into history that
requirement runs, although I'll have to fix some other problems before
I can test it properly (see the EAGAIN discussion below).
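To check whether an emulation layer really implements the
setcontext()/getcontext() pair rather than treating it as a no-op, a
probe along the following lines could be compiled natively and run under
emulation.  It is a minimal sketch, assuming a libc that still provides
<ucontext.h> (these calls were dropped from POSIX.1-2008 but remain in
glibc and Solaris libc); the name context_roundtrip_count() is
hypothetical.  With a working pair, setcontext() resumes the saved
context and the function returns 2; if setcontext() falls through, it
returns 1.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical probe: counts how many times getcontext() "returns".
 * 2 => setcontext() really resumed the saved context.
 * 1 => setcontext() returned to its caller (failed, or is a no-op).
 * -1 => getcontext() itself reported an error. */
static int context_roundtrip_count(void)
{
    ucontext_t ctx;
    volatile int returns = 0;   /* volatile: must survive the context switch */

    if (getcontext(&ctx) != 0)
        return -1;

    returns++;                  /* runs once per return from getcontext() */
    if (returns == 1) {
        setcontext(&ctx);       /* on success this does not return */
        return 1;               /* reached only if setcontext() fell through */
    }
    return returns;             /* resumed via setcontext(): returns == 2 */
}
```

Calling context_roundtrip_count() from main() and printing the result is
enough: 2 natively but 1 under emulation would point at the no-op
setcontext().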

Primarily, I *think* the state machines implemented in svr4_sys_getmsg()
and svr4_sys_putmsg() are a bit broken (not certain, 'cos I don't have
SVR4 sources here to help me :-).  It seems to me that Solaris telnet,
for example, issues putmsg() syscalls with sc.cmd = 3 (something
that isn't defined in svr4_timod.h) to send data over its TCP-based
stream.  Test programs I wrote using normal socket semantics didn't
seem to be affected, which leads me to believe that only programs written
by people warped enough to explicitly use streams care about that 
particular variant of putmsg().
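The AF_INET branch of the patched svr4_sys_putmsg() can be modelled in
userland to make the dispatch explicit.  This is a sketch, not the kernel
code: struct model_strbuf, model_putmsg_inet() and MODEL_SEND_EXP are
hypothetical names, the value 3 is simply the command observed from
Solaris telnet (it is not taken from svr4_timod.h), and the
address-carrying path is stubbed out.  A control part whose length
doesn't match a socket address and whose command is 3 is passed straight
to write(); any other mismatched command gets EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MODEL_SEND_EXP 3   /* observed value; NOT from svr4_timod.h */

/* Hypothetical userland stand-in for the SVR4 struct strbuf. */
struct model_strbuf {
    int   maxlen;
    int   len;
    char *buf;
};

/* Models the AF_INET case of the patched putmsg():
 * returns bytes written for the "expedited" case, -EINVAL for any
 * other length mismatch, and 0 for the (unmodelled) address case. */
static ssize_t model_putmsg_inet(int fd, int cmd, long ctl_len,
                                 size_t addr_len,
                                 const struct model_strbuf *dat)
{
    assert(dat != NULL);
    if ((size_t)ctl_len != addr_len) {
        if (cmd == MODEL_SEND_EXP) {
            /* Treat the data part as ordinary stream data. */
            ssize_t n = write(fd, dat->buf, (size_t)dat->len);
            return n < 0 ? -errno : n;
        }
        return -EINVAL;            /* same outcome as the patch */
    }
    return 0;                      /* address-carrying path: not modelled */
}
```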

getmsg() is also used in a way not presently handled by NetBSD.  In
particular, sc.cmd = 0 seems to indicate that a simple read() from the
stream is required (at least, that's how I've handled it).
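The sc.cmd == 0 case of the patched getmsg() amounts to a read() of at
most dat.maxlen bytes, with the byte count reported back in dat.len and
getmsg() itself returning 0.  A minimal userland sketch, assuming
hypothetical names (struct model_strbuf, model_getmsg_plain()) and
omitting the kernel's state update to SVR4_TI_SENDTO_REQUEST:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userland stand-in for the SVR4 struct strbuf. */
struct model_strbuf {
    int   maxlen;
    int   len;
    char *buf;
};

/* Models getmsg() with a zero command: read at most dat->maxlen bytes,
 * store the count in dat->len, and return 0 (getmsg's own result),
 * or -errno if the underlying read() fails. */
static int model_getmsg_plain(int fd, struct model_strbuf *dat)
{
    ssize_t n;

    assert(dat != NULL && dat->maxlen >= 0);
    n = read(fd, dat->buf, (size_t)dat->maxlen);
    if (n < 0)
        return -errno;
    dat->len = (int)n;   /* like the patch: dat.len = *retval */
    return 0;            /* like the patch: *retval = 0 */
}
```

As with read(), a short count is normal: if fewer than maxlen bytes are
queued, dat.len reports only what was available.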

By fixing these two anomalies, I can get most TCP-based services
working under emulation.  I'll include diffs at the end of this message
with (what I think constitute) fixes.

UDP is another story.  I don't have that working at all, and I'll need
to look further into it to see whether it's an anomaly from my porting or
a relic of the NetBSD implementation (it's kinda low on my to-do list
at the moment :-).  If it's a NetBSD issue, that would explain why people
on this list can't seem to get nameserver resolution to work.
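One way to separate a UDP problem in the emulator from something the
resolver does is a plain sockets test program, like the TCP ones
mentioned above, that bounces one datagram over the loopback interface.
A sketch, assuming IPv4 and BSD socket headers (on Solaris, link with
-lsocket -lnsl); udp_loopback_roundtrip() is a hypothetical name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends one datagram to a socket bound on 127.0.0.1 and reads it back.
 * Returns 0 if the payload round-trips intact, -1 otherwise. */
static int udp_loopback_roundtrip(void)
{
    const char payload[] = "ping";
    char reply[sizeof payload];
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int rx, tx, rc = -1;

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                        /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto out;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) != 0)
        goto out;                             /* addr now holds the real port */

    if (sendto(tx, payload, sizeof payload, 0,
               (struct sockaddr *)&addr, sizeof addr) != (ssize_t)sizeof payload)
        goto out;
    if (recvfrom(rx, reply, sizeof reply, 0, NULL, NULL) != (ssize_t)sizeof payload)
        goto out;
    rc = memcmp(payload, reply, sizeof payload) == 0 ? 0 : -1;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

A Solaris binary of this test that fails under emulation while the native
build passes would point at the emulator's datagram path rather than the
resolver.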

Another thing that strikes me as odd happens during the startup of
X clients.  Shortly after looking at .Xauthority, they all seem to 
issue a read() call on the descriptor they're using for network
connectivity, and that call fails with EAGAIN (Resource temporarily
unavailable).  This yields the following message:

XIO:  fatal IO error 35 (No message of desired type) on X server ":0.0"
      after 7 requests (6 known processed) with 0 events remaining.

Is this the message NetBSD users are seeing when they start X clients?

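If you believe the read(2) manpage, EAGAIN (errno 35 in BSD's numbering)
is only generated if you attempt to read a non-blocking descriptor when
no data is available.  (Solaris numbers ENOMSG as 35, which is why the
message above reads "No message of desired type".)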
However, debugging stuff I've inserted into my SVR4 emulation code
suggests that the descriptors involved are *not* non-blocking, so I'm
searching elsewhere in the kernel code at the moment (the streams 
pseudo-device in the kernel directs read() calls to soo_read(), so
it should be in there somewhere).
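The non-blocking hypothesis can be checked from userland with
fcntl(F_GETFL), and the expected EAGAIN behaviour reproduced on an empty
pipe.  A minimal sketch; is_nonblocking() and read_errno() are
hypothetical helpers.  A descriptor without O_NONBLOCK should block
rather than return EAGAIN, so a read() that fails with EAGAIN on a
descriptor where is_nonblocking() reports 0 is the anomaly described
above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd has O_NONBLOCK set, 0 if not, -1 on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Attempts a one-byte read and returns the errno it failed with,
 * or 0 if the read did not fail (data read, or end of file). */
static int read_errno(int fd)
{
    char c;
    errno = 0;
    return read(fd, &c, 1) < 0 ? errno : 0;
}
```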

In any case, that's *probably* a FreeBSD issue, rather than a NetBSD
problem.  Great. :-/  I think I'll have X clients working once I work
out what's going on there...

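Diffs for putmsg()/getmsg() follow after the .sig.  Don't expect line
numbers to line up, so you might need a large fuzz factor on "patch".
Note that the diffs were generated with the modified file first, so they
need to be applied with "patch -R".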
I've also edited lots of crap out of the diffs because you probably
don't want to get the complete FreeBSD version of this file :-).  If
you apply the patch, you'll also have to change read_args and write_args
to sys_read_args and sys_write_args respectively; the same goes for read()
and write(), which become sys_read() and sys_write().

As you can see from the #define before the sendmsg() stuff, this is 
a preliminary fix.  If someone can suggest something better, or has
evidence to show my fixes are completely wrong, I'm certain to listen.
Bear in mind that if I take 'em out, programs like "telnet" and "ftp"
from a Solaris box cease to work; that's pretty hard to argue with :-)

Anyway, I hope this stuff is useful (even if only for debugging).  

   - mark

--------------------------------------------------------------------
I tried an internal modem,                    newton@atdot.dotat.org
     but it hurt when I walked.                          Mark Newton
----- Voice: +61-4-1958-3414 ------------- Fax: +61-8-83034403 -----


*** svr4_stream.c	Mon Aug 24 23:07:33 1998
--- /mnt/sys/compat/svr4/svr4_stream.c	Mon Aug 10 20:41:16 1998
***************
*** 1471,1502 ****
  
  	switch (st->s_family) {
  	case AF_INET:
! 	        if (sc.len != sizeof(sain)) {
! #define SEND_EXP 3
! 		        if (sc.cmd == SEND_EXP) {
! 			        struct write_args wa;
! 
! 				/* Solaris seems to use sc.cmd = 3 to
! 				 * send "expedited" data.  telnet uses
! 				 * this for options processing, sending EOF,
! 				 * etc.  I'm sure other things use it too.
! 				 * I don't have any documentation
! 				 * on it, so I'm making a guess that this
! 				 * is how it works. newton@atdot.dotat.org XXX
! 				 */
! 				DPRINTF(("sending expedited data (???)\n"));
! 				SCARG(&wa, fd) = SCARG(uap, fd);
! 				SCARG(&wa, buf) = dat.buf;
! 				SCARG(&wa, nbyte) = dat.len;
! 				return write(p, &wa);
! 			}
! 	                DPRINTF(("putmsg: Invalid inet length %ld\n", sc.len));
! 	                return EINVAL;
! 	        }
! 	        netaddr_to_sockaddr_in(&sain, &sc);
! 	        skp = &sain;
! 	        sasize = sizeof(sain);
! 	        error = sain.sin_family != st->s_family;
  		break;
  
  	case AF_LOCAL:
--- 1454,1467 ----
  
  	switch (st->s_family) {
  	case AF_INET:
! 		if (sc.len != sizeof(sain)) {
! 			DPRINTF(("putmsg: Invalid inet length %ld\n", sc.len));
! 			return ENOSYS;
! 		}
! 		netaddr_to_sockaddr_in(&sain, &sc);
! 		skp = &sain;
! 		sasize = sizeof(sain);
! 		error = sain.sin_family != st->s_family;
  		break;
  
  	case AF_LOCAL:
***************
*** 1856,1888 ****
  
  	default:
  		st->s_cmd = sc.cmd;
- 
- 		if (st->s_cmd == 0) {
- 		        struct read_args ra;
- 
- 			/* More wierdness:  Again, I can't find documentation
- 			 * to back this up, but when a process does a generic
- 			 * "getmsg()" call it seems that the command field is
- 			 * zero and the length of the data area is zero.  I
- 			 * think processes expect getmsg() to fill in dat.len
- 			 * after reading at most dat.maxlen octets from the
- 			 * stream.  Since we're using sockets I can let 
- 			 * read() look after it and frob return values
- 			 * appropriately (or inappropriately :-)
- 			 *   -- newton@atdot.dotat.org        XXX
- 			 */
- 			SCARG(&ra, fd) = SCARG(uap, fd);
- 			SCARG(&ra, buf) = dat.buf;
- 			SCARG(&ra, nbyte) = dat.maxlen;
- 			if ((error = read(p, &ra)) != 0) {
- 			        return error;
- 			}
- 			dat.len = *retval;
- 			*retval = 0;
- 			st->s_cmd = SVR4_TI_SENDTO_REQUEST;
- 			break;
- 			
- 		}
  		DPRINTF(("getmsg: Unknown state %x\n", st->s_cmd));
  		return EINVAL;
  	}
--- 1822,1827 ----