Subject: Re: Real vfork() (was: third results)
To: None <tech-kern@NetBSD.ORG>
From: None <jiho@postal.c-zone.net>
List: tech-kern
Date: 04/09/1998 09:10:57

On 09-Apr-98 Greg A. Woods wrote:

> The unix process model normally ensures that processes cannot muck about
> in each other's address space (shared memory wasn't commonly implemented
> in unix at the time vfork() was invented).  vfork() allows the child to
> muck about in the parent's storage, and even to mess with file pointers
> and such.  Even with the most intimate parent/child relationship, where
> it's effectively the "same" program doing these things to itself, I'd
> say that at best it's a poor excuse, and at worst it is indeed "bad
> programming practice".

This issue is probably an irresolvable conflict of interests.

But I'm not quibbling about whether it actually is bad programming practice.

I'm saying that making holy war on perceived "bad programming practice" is
insufficient reason to foul up a call specification.  Especially if doing so
causes other problems (as it always will).

There will _always_ be bad programming practices.  You can't run around trying
to eradicate them by sabotaging specifications.

Who is to issue the edict on what is and is not "bad programming practice"?

Is BSD itself entirely free of same?

> Are you saying there was no reason given for the half-baked vfork() that
> was in 4.4BSD?

I've not been clear on the reason, save the aforementioned "bad programming
practice".
 
>> But they didn't offer anything better.
>
> I beg to differ.  They did offer something nearly as good in terms of
> performance, and infinitely better in terms of elegance.
>
> (i.e. copy-on-write to preserve performance, and a "unified" fork()
> interface to relieve the application programmer from the burden of
> having to figure out when the performance gain could be made use of.)

Well, I guess I can see this point, at least from the theoretical design
perspective.  I think in practice, however, there have been unintended
consequences for the vm system.

>> If you mean that the original vfork() was a "stupid trick", I think I'd say
>> that do an exec, fork() would be even more so.
>
> I'm not sure I understand what you're getting at here....

First, I should have typed "that to do an exec".

I see fork() as an extremely cumbersome holdover from a much more primitive era
in the history of UNIX.  It has the user process doing a lot of the kernel's
work in launching a child, and most of that is just digging a hole to fill it
in.  And although the original vfork() was an improvement, it really only met
the complaint halfway, and in so doing opened the window of opportunity for all
the abuses you (and CSRG) complain about.

The real solution would have been a single system call, to atomically do the
combined equivalent of vfork and execve.  That seems unlikely now, however.
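
As a rough illustration of what such a combined call might look like from the
caller's side, here is a minimal user-level sketch.  The name spawn_and_wait()
is hypothetical and is not an existing interface.  It only wraps vfork() and
execve(), so it is not atomic in the kernel sense; a real single call would do
this work inside the kernel.  It also waits for the child, which a real call
need not do.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a real system call): launch a program with
 * vfork() + execve() so the caller never runs its own code in the child.
 * Returns the child's exit status, or -1 on error or abnormal termination.
 */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid;

    pid = vfork();
    if (pid == -1)
        return -1;              /* could not create the child */

    if (pid == 0) {
        /* Child: do nothing except exec. */
        execve(path, argv, envp);
        /* Only reached if execve() failed; leave via _exit(2), not exit(3). */
        _exit(127);
    }

    /* Parent: collect the child's status. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child does nothing between vfork() and execve(), there is no
window in which it can scribble on the parent's storage.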

>> I also think there's more than just a minor performance gain at issue here. 
>> The copy-on-write behavior of the 4.4BSD vfork() causes all kinds of havoc in
>> the vm system.  That havoc doesn't go away with an execve(), and duplicate
>> pages pile up.  I haven't thoroughly evaluated just how big a problem yet,
>> but there's no question a shared vmspace vfork() would/should alleviate a
>> problem for the vm (whichever vm you use).
>
> I wouldn't call it "all kinds of havoc".  Because the parent can keep on
> running in a normal fork(), various VM data structures do indeed have to
> be copied.
>
> Remember that the original purpose of vfork() was to avoid having to
> copy whole processes, sometimes on the swap device too.  The cost of
> copying a few page table entries and such is minor compared to copying
> an entire process's address space, and extremely minor compared to
> doing swap I/O.

Yes, it's certainly better than that, but it's also more than "a few page table
entries and such."

Any vm object not explicitly given another inheritance type is, by default,
given copy-on-write inheritance.  So any object not already copy-on-write
becomes copy-on-write at fork time.  This applies in particular to anonymous
(zero-fill) memory, which at allocation has no particular mapping
characteristics.

So once having forked, any page written to gets duplicated, including anonymous
memory.  For character mode commands this can be a few dozen kilobytes at a
time, which sounds harmless but adds up quickly on a heavily loaded server. 
With X, it could be multiple megabytes at a time.
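
The visible side of this behavior can be sketched in a few lines of C.  The
function name below is made up for illustration, and malloc() is only assumed
to hand back ordinary anonymous memory.  The child writes over a buffer
inherited from the parent, and the parent then checks that its own copy is
untouched.  The sketch shows only the semantics; it does not measure how many
pages the vm system actually duplicated.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustrative only: returns 1 if the parent's buffer is unchanged after
 * the child overwrote its inherited copy, 0 if it changed, -1 on error.
 */
int parent_unaffected_by_child_write(size_t len)
{
    char *buf;
    pid_t pid;
    size_t i;
    int unchanged = 1;

    buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);      /* touch the pages before forking */

    pid = fork();
    if (pid == -1) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: every page written here must be given a private copy. */
        memset(buf, 'C', len);
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) == -1) {
        free(buf);
        return -1;
    }
    /* Parent: its view of the buffer must still hold the original bytes. */
    for (i = 0; i < len; i++) {
        if (buf[i] != 'P') {
            unchanged = 0;
            break;
        }
    }
    free(buf);
    return unchanged;
}
```

With a shared-vmspace vfork() the child would be writing into the parent's own
pages instead, so no duplicates would be made.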

The CSRG book acknowledges that 4.4BSD was known to be "unstable at normal
workstation loads".

But I'm having trouble getting in a position to test all of this thoroughly.  I
have patched files ready to go to build a 1.2 kernel with __vfork14(), which I
could boot as an alternative for testing purposes.  But now I have to look at
all the programs that would run during the test!

The first problem is, naturally, init.  It uses fork(), not vfork(), and it
does so fairly inefficiently at that.  Simply rearranging the code in certain
places would avoid most of its page duplications, even with fork(), and would
make it suitable for converting to vfork().  (Ironically, it already uses
_exit(2) instead of exit(3), and that apparently is essential to successful
use of __vfork14().)

Lord knows what's in other programs.

Then there's the C library, which needs to be rebuilt with a patched popen(). 
(You convert exit(3) to _exit(2).)  The good news is that, as an experiment I
converted /bin and /sbin to shared linking some time ago, so I wouldn't have to
recompile those directories due to libc.  The bad news is, as seen from init,
that isn't the only reason you might need to recompile a program.
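
To show why the choice between exit(3) and _exit(2) matters in a child, here
is a small sketch using plain fork() and a scratch file.  The function name is
hypothetical.  It leaves five bytes sitting in a stdio buffer across the fork
and then reports how many bytes reach the file.  With exit(3) the child
flushes its copy of the buffer and the parent later flushes its own, so the
data would be expected to land twice; with _exit(2) only the parent's flush
happens.  With a shared-vmspace vfork() the child would be running those
cleanup routines on the parent's own data, which is presumably why the change
is needed there.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustrative only: returns the size of a scratch file after "hello" was
 * left unflushed in a stdio buffer across fork(), or -1 on error.
 * child_uses_exit3 selects exit(3) (nonzero) or _exit(2) (zero) in the child.
 */
long bytes_written(int child_uses_exit3)
{
    FILE *fp;
    pid_t pid;
    long size;

    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs("hello", fp);         /* stays in the user-space buffer for now */

    /* Flush the standard streams so only the scratch file is affected. */
    fflush(stdout);
    fflush(stderr);

    pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        if (child_uses_exit3)
            exit(0);            /* runs stdio cleanup: flushes the child's copy */
        _exit(0);               /* skips stdio cleanup entirely */
    }

    if (waitpid(pid, NULL, 0) == -1) {
        fclose(fp);
        return -1;
    }
    fflush(fp);                 /* parent flushes its own copy of the buffer */
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fclose(fp);
    return size;
}
```

Calling bytes_written(0) and bytes_written(1) in turn shows the difference in
file size between the two ways of leaving the child.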

I've raised a question, and now it's very slow slogging to answer it.


--Jim Howard  <jiho@mail.c-zone.net>

