Subject: Re: Real vfork() (was: third results)
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: tech-kern
Date: 04/15/1998 13:45:52
In message <199804141815.LAA14816@lestat.nas.nasa.gov>  Jason Thorpe wrote:
> On Tue, 14 Apr 1998 19:52:24 +0200 
>  Stefan Grefen <grefen@hprc.tandem.com> wrote:
> 
>  > I think all effort should be directed to making fork()'s COW cheaper so that
>  > the remaing benefit of vfork is that it blocks the parent until the 
> 
> A good amount of effort was directed at making COW better in UVM.  And
> an address space-sharing vfork() _still_ turned out to be a win.  It shaves
> several seconds off a build of libc on my 200MHz PPro.

That's how many percent?
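
One way to put a number on that question is a small user-level timing
harness. The following is only a rough sketch: time_spawns() and
percent_saved() are hypothetical helper names, the child merely calls
_exit() instead of exec'ing anything, and the _DEFAULT_SOURCE define is an
assumption that only matters on glibc with strict -std= modes. A tiny test
program has a tiny address space, so the figure will understate what a large
process such as make would see.

```c
#define _DEFAULT_SOURCE	/* assumption: glibc needs this under strict -std= */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time n create/_exit/wait cycles using either
 * fork() or vfork().  Returns elapsed seconds, or -1.0 on error.
 */
double
time_spawns(int use_vfork, int n)
{
	struct timespec t0, t1;
	pid_t pid;
	int i;

	assert(n > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &t0) == -1)
		return -1.0;
	for (i = 0; i < n; i++) {
		pid = use_vfork ? vfork() : fork();
		if (pid == -1)
			return -1.0;
		if (pid == 0)
			_exit(0);	/* child: the only safe thing besides exec */
		if (waitpid(pid, NULL, 0) == -1)
			return -1.0;
	}
	if (clock_gettime(CLOCK_MONOTONIC, &t1) == -1)
		return -1.0;
	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Saving of the vfork() run relative to the fork() run, in percent. */
double
percent_saved(double fork_secs, double vfork_secs)
{
	return 100.0 * (fork_secs - vfork_secs) / fork_secs;
}
```

Calling percent_saved(time_spawns(0, 1000), time_spawns(1, 1000)) gives the
relative saving for process creation alone, not for a whole libc build.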

> 
> I really don't understand why we're arguing about this.  It seems obvious to
> me that, in the cases where it was originally meant to be used, it is a
> performance win, and really nothing else is going to be faster.

I agree, but it is still a kludge, and I think the bad practice it
encourages (not all people abstain from exploiting the unwanted
side-effects) creates portability hazards. You have to know whether the
vm-space is shared or not, unless you restrict yourself to changing local
variables only.
My personal opinion is that even a 10% gain doesn't justify this kludge.
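
To make the hazard concrete, here is a minimal sketch of the only pattern
that behaves the same whether or not the vm-space is shared: the child
touches nothing and goes straight to exec or _exit(). The helper name
run_with_vfork() is hypothetical, and the _DEFAULT_SOURCE define is an
assumption that only matters on glibc with strict -std= modes.

```c
#define _DEFAULT_SOURCE	/* assumption: glibc needs this under strict -std= */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] via vfork() and return its exit
 * status, 127 if the exec failed, or -1 on any other failure.
 */
int
run_with_vfork(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	pid = vfork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		/*
		 * Child: the address space may or may not be shared with
		 * the parent, so modify nothing; only exec or _exit().
		 */
		execv(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	/* Parent: resumes only after the child has exec'd or exited. */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything more in the child (setting a global, calling exit() instead of
_exit(), returning from the function) gives different results depending on
whether the implementation shares the address space.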

But I think we should direct the effort away from creating a spawn()
system call and towards some major changes on the vm side to make vfork()
and COW cheaper. If we were able to manipulate higher levels than a
page, we would greatly reduce the number of entries to change and the
number of page faults.
I know it's a major undertaking, and I don't want us to do what SYSV R4
does (where you can still cheat at page level if you want to), because
the overhead for normal operations is significant.
I don't have an answer for how to do it in my pocket either ...

> 
> Let's look at what happens when you vfork/exec using the 4.4BSD vfork
> and COW:
> 
> 	- Traverse parent's vm_map, marking the writable portions of the
> 	  address space COW.  This means invoking the pmap, modifying PTEs,
> 	  and flushing the TLB.

That's because we can't set a whole object/segment to COW. Otherwise you
would traverse only 4 objects + mmap objects.
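
As a back-of-the-envelope illustration of the difference, here is a toy
cost model. It is not kernel code: struct toy_mapping and both functions
are hypothetical, and the model simply counts how many entries have to be
changed under each scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of one mapping; not a real kernel structure. */
struct toy_mapping {
	size_t	npages;		/* size of the mapping in pages */
	int	writable;	/* nonzero if mapped read/write */
};

/* Per-page COW: every page of every writable mapping is touched. */
size_t
cow_ops_per_page(const struct toy_mapping *m, size_t n)
{
	size_t i, ops = 0;

	assert(m != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (m[i].writable)
			ops += m[i].npages;
	return ops;
}

/* Per-object COW: one operation per writable mapping, whatever its size. */
size_t
cow_ops_per_object(const struct toy_mapping *m, size_t n)
{
	size_t i, ops = 0;

	assert(m != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (m[i].writable)
			ops++;
	return ops;
}
```

For an assumed layout of a 64-page read-only text mapping plus writable
mappings of 16, 32 and 8 pages, the per-page count is 56 and the per-object
count is 3; the gap grows with the size of the process.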

> 
> 	- Create a vm_map for the child, copy the parent's vm_map entries
> 	  into the child's vm_map.  Optionally, invoke the pmap to copy
> 	  PTEs from the parent's page tables into the child's page tables.

Could be a COW clone. 

> 
> 	- Block parent.
> 
> 	- Child runs.  If PTEs were _not_ copied, take page fault to get
> 	  a physical mapping for the text page at the current program counter.
> 
> 	- Child execs, and unmaps the entire address space that was just
> 	  created, and creates a new one.  This implies that the parent's
> 	  vm_map has to be traversed to mark the COW portions not-COW.

Again, only the top-level objects are affected.

> 
> 	- Unblock parent.
> 
> 	- Parent runs, takes page fault when modifying previously R/W
> 	  data that was marked R/O for COW (no data is copied at this
> 	  time

Takes one per object.

[... vfork stays the same ]

> 
> So, in the case where you're going to fork and then exec, which is going
> to be faster?  Clearly the one that has to do less work.  Even if your
> COW algorithms are good, you still have to do a lot more work compared
> to the vmspace-sharing case!

I think this 'a lot' can be reduced to 'some'. Not in 1.3.X, not in 1.4,
but maybe in 2.0?

Stefan

> 
> Jason R. Thorpe                                       thorpej@nas.nasa.gov
> NASA Ames Research Center                            Home: +1 408 866 1912
> NAS: M/S 258-5                                       Work: +1 650 604 0935
> Moffett Field, CA 94035                             Pager: +1 415 428 6939

--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
 --- Hacking's just another word for nothing left to kludge. ---