Subject: None
To: Ty Sarna <tsarna@endicor.com>
From: Alistair G. Crooks <azcb0@uts.amdahl.com>
List: current-users
Date: 12/18/1995 04:08:17
Sorry in advance for the length of this message - I thought some
others might be interested...

> In article <199512172035.OAA00797@sierra.zyzzyva.com>,
> Randy Terbush  <randy@zyzzyva.com> wrote:
> > Would it be possible for one of those with access to sync up
> > the libpthreads library to the latest release, and include
> > this in the src/lib/Makefile for compile?  I would like to
> > seriously look at using this for the Java port, and would like
> > to know that there is some group support for libpthreads.
> 
> Has anyone looked at whoever-it-was's rfork() and lightweight locking
> primitives support (for FreeBSD, I think, but I think he wrote a message
> to one of the NetBSD lists about it, indicating a willingness to port
> it), and doing threads on top of that (may have already been done by
> him). Kernel-based thread support seems like it would be much better than
> the simulation NetBSD's libpthreads provides, clever though it may be.

It's plan9's rfork, and I mentioned it to current-users in October:

> 4. There's a whole lot of discussion going on on the FreeBSD hackers
> list with respect to kernel threads, and implementations of Plan9's
> rfork mechanism (upon which the smart money seems to be betting).
> 
> > From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> > Date: Fri, 20 Oct 1995 09:10:43 -0400 (EDT)
> > Subject: Re: NetBSD/FreeBSD (pthreads)
> > 
> > I implemented a simple version of plan9 rfork() a little while ago (well, 
> > a year ago). You could rfork and end up with shared data space and file 
> > table. I also implemented a very simple lock/unlock primitive that was 
> > far more efficient than system v semaphores, since in the common case 
> > (no contention for a lock) there's no jump to the kernel to wake up 
> > other procs when you acquire or free a lock. Between these two things you 
> > can do a lot: share data, share locked structures, share open files, etc. 
> > I can do all of what i commonly do with kernel threads on, e.g., Irix. In 
> > fact I implemented a simple user-space distributed shared memory with 
> > these basic parts: on sgi's i used their kernel threads/kernel mutex 
> > code, on freebsd i used rfork/lock code i built. 
> > 
> > For my money this is about as good as kernel threads. There's not the 
> > additional complexity in the kernel (have you ever seen what LWP did to 
> > sunos? No? good.). 
> > 
> > This code has been available gratis for a year. I can't convince anyone 
> > to pull it into core for netbsd or freebsd, but I'll make the offer 
> > again: you want it, let me know. The code, btw, is less than 100 lines 
> > for each change. In fact the fastlock code is something like 25 lines. 
> > I've implemented them as LKMs and directly as part of the kernel.
> > 
> > ron
> > 
> > Ron Minnich                |Like a knife through Daddy's heart: 
> > rminnich@earth.sarnoff.com |"Don't make fun of Windows, daddy! It takes care
> > (609)-734-3120             | of all my files and it's reliable and I like it".
> 
> [The Plan 9 folks report that rfork is a win - there are very rarely
> two occurrences of rfork in their code with the same resource flags.
> For more information on the Plan9 stuff, see
> http://plan9.att.com/plan9/doc/9.html
> And if you're interested in Plan9, there's an interesting effort
> called VSTa, that does a lot of Plan9y things. GPLed, though. If
> you're interested, mail me for more info. -agc ]

[And, as another aside, I have seen what LWP did to SunOS, and I was
not impressed.]
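
For anyone wondering how a lock can stay out of the kernel in the
uncontended case, here is a rough sketch in C11. It is not Ron's fastlock
code, which isn't shown in these messages: the names fastlock_t,
fastlock_try and friends are made up for illustration, and sched_yield()
is only a stand-in for whatever sleep/wakeup call the real primitive makes
when a lock is contended.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the fastlock code itself. */
typedef struct {
	atomic_flag  held;        /* set while somebody owns the lock */
	atomic_ulong slow_paths;  /* how often we had to give up the CPU */
} fastlock_t;

void
fastlock_init(fastlock_t *l)
{
	atomic_flag_clear(&l->held);
	atomic_init(&l->slow_paths, 0);
}

/* One attempt: a single atomic test-and-set, no system call. */
bool
fastlock_try(fastlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire);
}

void
fastlock_acquire(fastlock_t *l)
{
	/* Common case: the first try succeeds and we never enter the loop. */
	while (!fastlock_try(l)) {
		atomic_fetch_add(&l->slow_paths, 1);
		sched_yield();	/* assumption: stands in for a kernel sleep */
	}
}

void
fastlock_release(fastlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

To be useful between rfork'd processes the fastlock_t would have to live
in memory they share; the slow_paths counter is only there so you can see
how rarely the contended path is taken.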

To someone else who wondered when symlinks arrived, I'm fairly sure it
was 4.2BSD: the 4.1c manual pages certainly didn't have anything about
them, as I was chastened to find out after coming out worse in a news
confrontation with Guy Harris (those were the days...)

More information on the actual implementation of rfork came from the
author in 3 separate messages:

> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 14:43:02 -0400 (EDT)
> Subject: anatomy of rfork, part 1: minherit
> 
> i've had enough q's on this, and time is tight, so i thought i'd just put 
> out a few messages on how to do rfork. The code is small, so bear with 
> me. 
> 
> To do rfork as i needed it, you really need two parts to start with: a 
> way to share data after fork and a way to share file tables after fork. 
> AIX/370 implemented DCE threads with these two things. I thought i'd show 
> minherit first. I don't know the plan9 environment erasing stuff, 
> although that is pretty easy to add -- could be useful. 
> 
> minherit is shown below. Calls are much like mprotect: 
> minherit(caddr, len, new inherit values)
> 
> Look in vm/vm_inherit.h
> 
> All you need to do is take the mprotect call code and redo it just a bit 
> so it calls vm_map_inherit. Here we go: 
> 
> /* same shape as mprotect's args, with the inherit value in place of prot */
> struct minherit_args {
> 	caddr_t	addr;
> 	int	len;
> 	int	inherit;
> };
> int
> minherit(p, uap, retval)
> 	struct proc *p;
> 	struct minherit_args *uap;
> 	int *retval;
> {
> 	vm_offset_t addr;
> 	vm_size_t size;
> 	register vm_inherit_t inherit;
> 
> #ifdef DEBUG
> 	printf("minherit(%d): addr %x len %x inherit %d\n",
> 		       p->p_pid, uap->addr, uap->len, uap->inherit);
> #endif
> 
> 	addr = (vm_offset_t)uap->addr;
> 	if ((addr & PAGE_MASK) || uap->len < 0)
> 		return(EINVAL);
> 	size = (vm_size_t)uap->len;
> 	inherit = uap->inherit;
> 
> 	switch (vm_map_inherit(&p->p_vmspace->vm_map, addr, addr+size, 
> 			inherit)) {
> 	case KERN_SUCCESS:
> #ifdef DEBUG
> 		printf("works\n");
> #endif
> 		return (0);
> 	case KERN_PROTECTION_FAILURE:
> #ifdef DEBUG
> 		printf("fails\n");
> #endif
> 		return (EACCES);
> 	}
> #ifdef DEBUG
> 	printf("return einval\n");
> #endif
> 	return (EINVAL);
> }
> 
> 
> ------------------------------
> 
> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 15:02:53 -0400 (EDT)
> Subject: Re: anatomy of rfork, part 2: fork code
> 
> This one is really easy. Basically you have to mod the fork code to take
> an option that indicates whether you dup the open file table for the
> process or simply bump the use count and use it for the child. The segment
> inheritance management has been done at this point: it gets done in user
> mode via the minherit() i showed in the previous note. I delete the middle
> parts that don't change ... it's about 10 lines of difference from a
> regular fork. 
> 
> Points to note: the parameter comes from user mode and, if it has bit 0x80
> set, means 'dup the file table'. So i set the dupfd variable at the beginning.
> At the end, the code decides to either copy the table (fdcopy()) or just bump
> counters. Note i include sys/vnode.h, and to make it work correctly, i have
> to undef KERNEL before and redefine it after the include. Ah well ... i think
> this oughtta get fixed somehow.
> 
> note the implication of the option: fork is a special case of rfork. 
> 
> /*
>  * Copyright (c) 1982, 1986, 1989, 1991, 1993
>  *	The Regents of the University of California.  All rights reserved.
>  * (c) UNIX System Laboratories, Inc.
>  * All or some portions of this file are derived from material licensed
>  * to the University of California by American Telephone and Telegraph
>  * Co. or Unix System Laboratories, Inc. and are reproduced herein with
>  * the permission of UNIX System Laboratories, Inc.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  * 1. Redistributions of source code must retain the above copyright
>  *    notice, this list of conditions and the following disclaimer.
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in the
>  *    documentation and/or other materials provided with the distribution.
>  * 3. All advertising materials mentioning features or use of this software
>  *    must display the following acknowledgement:
>  *	This product includes software developed by the University of
>  *	California, Berkeley and its contributors.
>  * 4. Neither the name of the University nor the names of its contributors
>  *    may be used to endorse or promote products derived from this software
>  *    without specific prior written permission.
>  *
>  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
>  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
>  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>  * SUCH DAMAGE.
>  *
>  *	@(#)kern_fork.c	8.6 (Berkeley) 4/8/94
>  */
> 
> #include <sys/param.h>
> #include <sys/systm.h>
> #include <sys/filedesc.h>
> #include <sys/kernel.h>
> #include <sys/malloc.h>
> #include <sys/proc.h>
> #include <sys/resourcevar.h>
> #include <sys/file.h>
> #include <sys/acct.h>
> #include <sys/ktrace.h>
> 
> /* oh, yuck */
> /* this is due to include of vnode_if.h, which is automatically
>  * generated. ouch.
>  */
> #undef KERNEL 
> #include <sys/vnode.h>
> #define KERNEL
> #define VREF(vp)        (vp)->v_usecount++      /* increase reference */
> 
> struct rfa { int opts; };
> /* ARGSUSED */
> int
> rfork(p1, uap, retval)
>       struct proc *p1;
>       struct rfa  *uap;
>       int retval[];
> {     
> 	int dupfd = 0; /* added for rfork() */
> 
> 	register struct proc *p2;
> 	register uid_t uid;
> 	struct proc *newproc;
> 	struct proc **hash;
> 	int count;
> 	static int nextpid, pidchecked = 0;
> 
> 	if (uap->opts&0x80)
> 		dupfd = 1;
> 
> 
> 	/* DUPLICATE FORK CODE DELETED HERE ... */
> 	.
> 	.
> 	.
> 	/* END DELETED FORK CODE */
> 	/* bump references to the text vnode (for procfs) */
> 	p2->p_textvp = p1->p_textvp;
> 	if (p2->p_textvp)
> 		VREF(p2->p_textvp);
> 
> 	/* BEGIN CHANGED CODE FOR RFORK() */
> 	if (dupfd)
> 		p2->p_fd = fdcopy(p1);	/* private copy, as plain fork does */
> 	else {
> 		/* make this a function at some point */
> 		/* danger!!! no locks!!! */
> 		p2->p_fd = p1->p_fd;	/* share the parent's table */
> 		p2->p_fd->fd_refcnt++;
> 	}
> 	/* END CHANGED CODE FOR RFORK() */
> 	/* MORE DELETED UNCHANGED CODE */
> 	/* END DELETED CODE */
> 	/*
> 	 * Return child pid to parent process,
> 	 * marking us as parent via retval[1].
> 	 */
> 	retval[0] = p2->p_pid;
> 	retval[1] = 0;
> 	return (0);
> }
> 
> ------------------------------
> 
> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 15:27:15 -0400 (EDT)
> Subject: rfork part 3: library code
> 
> All this function does is:
> 1) minherit the data space
> 2) call rfork with zero as the options value
> 3) return values. The only funniness is that you have to fake the 'return 0
>    to kid' behavior of fork(), so there's fooling around with getpid() before
>    the call and testing of return values after the call.
> 
> Also, there's a call to something called 'syscallfind' in here for the
> modload case. If anyone wants that code let me know. It uses modstat code
> to find the named syscall number. 
> 
> There you are. rfork in 3 parts. Questions to me.
> 
> ron
> 
> #include <stdio.h>
> #include <sys/param.h>
> #include <vm/vm.h>
> #include <vm/vm_inherit.h>
> 
> int minherit(caddr_t, unsigned int, int);
> 
> int
> rfork(int i)
> {
> 	extern int end, sbrk();
> 	int pid, newpid;
> 	/* until it's a real syscall, we have to fake the zero-return */
> 	unsigned long start, last;
> 	static int rfsyscallnum = -1;
> 
> 	if (rfsyscallnum < 0)
> 		rfsyscallnum = syscallfind("rfork");
> 	if (rfsyscallnum < 0) {
> 		perror("rfork syscallfind");
> 		return -1;
> 	}
> 
> 	/* for the modload version, we don't get two return values, 
> 	 * so we have to fake the fork 'return 0 to kid' behavior
> 	 */
> 	pid = getpid();
> 
> 	start = (unsigned long) ctob(btoc(&end));
> 	last  = sbrk(0);
> 	/* the man page lies: 
> 	 * it won't return page-aligned values from sbrk
> 	 * the seg is actually several pages larger!
> 	 */
> 	last = ctob(btoc(last)+4);
> 	/* may be nothing to share, so report errors but carry on */
> 
> 	if (minherit((caddr_t)start, last-start, VM_INHERIT_SHARE) < 0)
> 		perror("minherit failed");
> 
> 	newpid = syscall(rfsyscallnum, i);
> 
> 	/* only the kid can see the parent's pid come back; turn that into 0 */
> 	if (newpid == pid)
> 		newpid = 0;
> 	return newpid;
> }
> 
> 
> Ron Minnich                |Like a knife through Daddy's heart: 
> rminnich@earth.sarnoff.com |"Don't make fun of Windows, daddy! It takes care
> (609)-734-3120             | of all my files and it's reliable and I like it".
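
To make the share-versus-copy distinction concrete without a patched
kernel, here is a small user-level sketch. It does not use minherit() at
all: it substitutes an anonymous mmap() region, with MAP_SHARED playing
the part of VM_INHERIT_SHARE and MAP_PRIVATE the part of the usual
copy-on-fork behaviour. The function name value_seen_by_parent is made up
for illustration.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANON under strict -std on glibc */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <assert.h>
#include <unistd.h>

/*
 * Map one int, fork, let the child store `value` in it, and return what
 * the parent reads back once the child has exited (-1 on any failure).
 * share != 0 gives a region both processes see; share == 0 gives the
 * child its own copy, as an ordinary fork would.
 */
int
value_seen_by_parent(int share, int value)
{
	int *cell, seen, status;
	pid_t pid;

	cell = mmap(NULL, sizeof(*cell), PROT_READ | PROT_WRITE,
	    (share ? MAP_SHARED : MAP_PRIVATE) | MAP_ANON, -1, 0);
	if (cell == MAP_FAILED)
		return -1;
	*cell = 0;

	pid = fork();
	if (pid < 0) {
		munmap(cell, sizeof(*cell));
		return -1;
	}
	if (pid == 0) {
		*cell = value;	/* child writes, then leaves at once */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0) {
		munmap(cell, sizeof(*cell));
		return -1;
	}
	seen = *cell;
	munmap(cell, sizeof(*cell));
	return seen;
}
```

With share set the parent should read back the child's value; with it
clear the parent should still see 0, since the child only changed its own
copy. Ron's library routine gets the first behaviour for the whole data
segment by calling minherit() before the fork.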

I suppose this just muddies the waters somewhat, but it would be nice
to have, even if just in an OPTIONS_RFORK kernel config option.
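
As a footnote to part 2, the file table decision can be modelled in a few
lines of ordinary C. This is a toy, not kernel code: struct toy_fdtable
and the toy_* functions are invented for illustration, and RF_DUPFD is
just a hypothetical name for the 0x80 bit in Ron's note.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RF_DUPFD	0x80	/* hypothetical name for the 0x80 option bit */
#define TOY_NFILES	8

struct toy_fdtable {
	int refcnt;		/* how many processes use this table */
	int files[TOY_NFILES];	/* stand-in for the open file slots */
};

struct toy_fdtable *
toy_fdtable_new(void)
{
	struct toy_fdtable *t = calloc(1, sizeof(*t));

	if (t != NULL)
		t->refcnt = 1;
	return t;
}

/* Pick the child's table the way the changed fork code does. */
struct toy_fdtable *
toy_child_table(struct toy_fdtable *parent, int opts)
{
	struct toy_fdtable *child;

	if (opts & RF_DUPFD) {
		/* plain fork: the child gets a private copy */
		child = malloc(sizeof(*child));
		if (child == NULL)
			return NULL;
		memcpy(child, parent, sizeof(*child));
		child->refcnt = 1;
		return child;
	}
	/* rfork sharing: same table, one more user (unlocked, as noted) */
	parent->refcnt++;
	return parent;
}

void
toy_fdtable_release(struct toy_fdtable *t)
{
	if (--t->refcnt == 0)
		free(t);
}
```

With opts of 0 the child and parent hold the same table, so a slot changed
by one is seen by the other; with RF_DUPFD the tables diverge after the
call, which is why fork falls out as a special case of rfork.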

Cheers,
Alistair
--
Alistair G. Crooks (agc@uts.amdahl.com)                    +44 125 234 6377
Amdahl European HQ, Dogmersfield Park, Hartley Wintney, Hants RG27 8TE, UK.
[These are only my opinions, and certainly not those of Amdahl Corporation]