Subject: Re: port-arm32/5178: __vfork14 hangs system
To: None <gnats-bugs@NetBSD.ORG>
From: Chris G. Demetriou <cgd@pa.dec.com>
List: netbsd-bugs
Date: 04/16/1998 11:38:26
More details on the arm32 vfork problems (which I have replicated),
based on my (admittedly shaky) understanding of the ARM architecture.

It would appear that there are several bugs here.

A sample program which triggers the bug is:


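#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int
main(void)
{
        char *args[] = { "/bin/echo", "foo", 0 };
        char *env[] = { "PATH=/bin", 0 };       /* unused; execve gets a null envp */
        if (!vfork()) {
                /* child: still sharing the parent's address space */
#ifndef EXIT_LOSE
                execve(args[0], args, 0);
#endif
                _exit(1);
        }
        wait(0);
        exit(0);
}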


Compiled with or without "EXIT_LOSE" defined, it will kill the system.

My current hypothesis is that the system page isn't mapped in the new
process.

Tracing through the code:

	as of vfork, the child and parent will be sharing the same
	address space.

	as of the execve, vmspace_exec() will be invoked by the child,
	which will cause a new vmspace to be allocated for the
	child, which in turn will cause a new pmap to be created
	for the child.

	That new pmap will _not_ have the system page mapped, and when
	the process is switched to, and the CPU needs to use the
	system page, the system will hang.


Once things are fixed up so that the test case above works, the code
will still fail with the "EXIT_LOSE"-defined test case.  In that case,
the child will exit, and will go through switch_exit() in cpuswitch.S.
That in turn pmap_remove()'s the system page mapping from the pmap.
Unfortunately, the pmap is shared by multiple processes (the child and
the parent), so when the parent is run, the system will lose.
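
The EXIT_LOSE failure described above can be sketched as a toy user-space
model. This is not kernel code: toy_pmap, toy_proc, toy_vfork(),
toy_switch_exit() and toy_can_run() are invented names that only mimic the
sharing behaviour described, assuming the exit path unmaps the system page
without checking whether the pmap is still in use by another process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pmap; NOT the real arm32 structure. */
struct toy_pmap {
	int  refs;              /* number of processes sharing this pmap */
	bool syspage_mapped;    /* is the system page mapped here? */
};

struct toy_proc {
	struct toy_pmap *pmap;
};

/* Models vfork with a shared VM: the child borrows the parent's pmap. */
void
toy_vfork(struct toy_proc *parent, struct toy_proc *child)
{
	child->pmap = parent->pmap;
	child->pmap->refs++;
}

/*
 * Models the exit path as described: the system page mapping is
 * removed unconditionally, even if other processes share the pmap.
 */
void
toy_switch_exit(struct toy_proc *p)
{
	assert(p->pmap != NULL);
	p->pmap->syspage_mapped = false;
	p->pmap->refs--;
	p->pmap = NULL;
}

/* A process can only run if its pmap still maps the system page. */
bool
toy_can_run(const struct toy_proc *p)
{
	return p->pmap != NULL && p->pmap->syspage_mapped;
}
```

In this model, calling toy_vfork() and then toy_switch_exit() on the child
leaves the parent with a pmap whose system page is gone, so toy_can_run()
on the parent returns false.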


It looks to me like the correct solution is to treat the system page
mapping the same as other 'kernel' mappings, that is, set up the
mapping when the pmap is initialized and destroy the mapping in
pmap_release() (just like the page dir is set up, and kernel mappings
are copied into the page directory).

That's not particularly efficient, but it should work.  (A trick like
the Alpha pmap uses, making processes which haven't yet allocated
their own l1 page table share a master system page table, would
probably help.)
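
The lifecycle proposed above (map the system page when the pmap is
initialized, unmap it only on release) can be sketched with a toy
user-space model. Again, this is not the real pmap code: toy_pmap,
toy_pmap_init(), toy_pmap_reference() and toy_pmap_destroy() are
hypothetical names, and the reference count is an assumption about how
"the last user going away" would be detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pmap; NOT the real arm32 structure. */
struct toy_pmap {
	int  refs;              /* number of users of this pmap */
	bool syspage_mapped;    /* is the system page mapped here? */
};

/*
 * Models pmap initialization: the system page is mapped up front,
 * in the same place the other kernel mappings would be set up.
 */
void
toy_pmap_init(struct toy_pmap *pm)
{
	pm->refs = 1;
	pm->syspage_mapped = true;
}

/* Another process (e.g. a vfork child) starts sharing the pmap. */
void
toy_pmap_reference(struct toy_pmap *pm)
{
	assert(pm->refs > 0);
	pm->refs++;
}

/*
 * Drop one reference.  The system page mapping is torn down only
 * when the last reference goes away (the release step).
 * Returns true if the pmap was released.
 */
bool
toy_pmap_destroy(struct toy_pmap *pm)
{
	assert(pm->refs > 0);
	if (--pm->refs > 0)
		return false;
	pm->syspage_mapped = false;
	return true;
}
```

With this arrangement, a child exiting only drops a reference; the parent's
mapping survives until the parent itself lets go of the pmap.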


The workaround that I'm using to make functional kernels is:

Index: kern_fork.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_fork.c,v
retrieving revision 1.41
diff -c -r1.41 kern_fork.c
*** kern_fork.c	1998/04/09 00:23:38	1.41
--- kern_fork.c	1998/04/16 18:41:51
***************
*** 106,112 ****
--- 106,116 ----
  	register_t *retval;
  {
  
+ #ifndef __arm32__
  	return (fork1(p, FORK_PPWAIT|FORK_SHAREVM, retval, NULL));
+ #else
+ 	return (fork1(p, FORK_PPWAIT, retval, NULL));
+ #endif
  }
  
  int


I've changed the severity & priority of the bug to be 'critical' and
'high'.  This is a fatal bug, which can't be worked around except by
adding a relatively obscure bug fix into the system and recompiling
binaries (either the kernel, or a bunch of user-land binaries).
Getting it fixed is essential to the operation of the system built
'out of the box'.



cgd