tech-kern: Re: Real vfork() (was: third results)

Subject: Re: Real vfork() (was: third results)
To: None <jiho@postal.c-zone.net>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 04/09/1998 13:38:06
[ Actually, heck with it... I'll just chime in now before I go off to
  install NetBSD on a machine we're going to do NFS benchmarks on :-]

On Thu, 09 Apr 1998 11:25:47 -0000 (GMT) 
 jiho@postal.c-zone.net wrote:

 > Actually, I've just seen one reason.  Whether it's "good" or not depends on
 > your philosophy.
 > 
 > CSRG stated POSIX compliance as a goal for 4.4BSD, and vfork() is not in the
 > POSIX specification.  It's regarded as a BSD quirk.
 > 
 > In practice, though, it's everywhere.*  Even Linux has it (and oddly, the Linux
 > implementation is identical to 4.4BSD!).
 > 
 > *I suppose you'll say, "Like cockroaches!"

So, vfork(2) was introduced in 3.0BSD because at that time, there was no
COW.  At fork time, the address space had to be copied.  This was determined
to be a waste in the case where a program, such as the shell, would fork,
and immediately exec a new program.
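
For concreteness, the fork-then-exec pattern in question looks roughly like
the sketch below.  This is only an illustration assuming a POSIX system;
run_with_fork() is a hypothetical helper name, not code from any real shell.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (looked up via PATH) and return its
 * exit status, or -1 if the fork/wait failed or the child died abnormally.
 */
int run_with_fork(char *const argv[])
{
    pid_t pid = fork();         /* without COW, the whole address space is copied here */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);  /* ...and the copy is thrown away here */
        _exit(127);             /* reached only if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child does nothing between fork() and exec, so every page copied on its
behalf is wasted work.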

Basically, vfork(2) was used in lieu of a Better Solution, because that
Better Solution was a bit off in the distance.

When the Mach VM, and thus COW, came along, the Better Solution was realized,
and fork(2) was changed to:

	(1) Make the parent's pages COW.

	(2) Copy the parent's VM map entries to the child's VM map.

	(3) Optionally copy physical mappings to eliminate needless page
	    faults in the child.

Since fork(2) now used COW, vfork(2) was changed to simply preserve the
synchronization semantics it always had (parent blocks until child exits
or execs).

Now, when I was doing some other work wrt. VM space sharing between
processes, I realized that it would be trivial to implement the old
vfork(2) semantics.  I did, and did some simple performance measurements.

Lo and behold, it was still MUCH faster.  There are a couple of reasons
for this:

	(1) Copying the VM map and pmap entries is overhead.

	(2) Most (all?!) ports don't actually implement pmap_copy(), so
	    they still have the page fault overhead when the child runs
	    again.

Basically, it's STILL wasteful if you're going to just fork/exec (the
exec simply unmaps the address space so it can map in the new program).
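
A userland measurement along these lines might look like the sketch below.
To be clear, this is not the test I actually ran; time_spawns() is a
hypothetical name, and the choice of /bin/sh as the program to exec is an
assumption made only so the example is self-contained.

```c
/* glibc hides vfork() in strict ISO C modes unless this is defined. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: spawn `/bin/sh -c :` n times and return the mean
 * wall-clock seconds per spawn, or -1.0 on error.  A non-zero use_vfork
 * selects vfork(); otherwise fork() is used.
 */
double time_spawns(int n, int use_vfork)
{
    char *const argv[] = { "sh", "-c", ":", NULL };
    char *const envp[] = { NULL };
    struct timespec t0, t1;

    if (n <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        pid_t pid = use_vfork ? vfork() : fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            execve("/bin/sh", argv, envp);
            _exit(127);         /* exec failed */
        }
        if (waitpid(pid, NULL, 0) < 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9) / n;
}
```

Comparing time_spawns(n, 0) against time_spawns(n, 1) from a tiny test
program will likely understate the difference; the gap should widen as the
parent's address space grows, since there is more map state to copy.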

So, we figured that re-enabling the old vfork(2) semantics would be
a win.  Sure, it's a speed hack, but it's a speed hack that's been around
for a fairly long time, and, if one knows the constraints of the interface,
and how to use it correctly, it can be quite effective.
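
As a rough illustration of "using it correctly", here is a minimal sketch
assuming the traditional constraints: the child borrows the parent's address
space, so it should do nothing except exec or _exit.  run_with_vfork() is a
hypothetical name used only for this example.

```c
/* glibc hides vfork() in strict ISO C modes unless this is defined. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: exec `path` with the given argv/envp via vfork()
 * and return the child's exit status, or -1 on error.
 */
int run_with_vfork(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: running on the parent's memory and stack.  Do not
         * modify variables, do not return from this function, and do
         * not call exit(3), which would run the parent's atexit
         * handlers and flush its stdio buffers.
         */
        execve(path, argv, envp);
        _exit(127);             /* exec failed */
    }

    /* Parent: resumes only once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything more elaborate in the child (setting up redirections with complex
logic, calling into stdio, touching globals) is where people get bitten.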

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939