On 08.09.2017 04:32, Mateusz Guzik wrote: > The fork1 routine can wait for the child to exit (if vforked) and/or > return the pointer to the child. > > Neither case guarantees the safety of said operation. The key is that > the parent can be ignoring SIGCHLD, which results in autoreaping the > child and the child itself is made runnable. Thus in particular it can > exit and get its struct proc freed. No prevention measures are taken > AFAICS. > > I did not test it on the NetBSD kernel, but it's a standard problem and > the code definitely looks buggy. > > 1. returning the pointer > > The last parameter to fork1 is struct proc **rnewprocp. I found only 2 > uses: > - linux_sys_clone - the parameter is populated from a local variable > which is not used afterwards, i.e. the func can as well start passing > NULL instead > - creation of initproc - since this becomes the only remaining use of > the parameter I think it makes sense to get rid of it altogether. the > simplest hack which comes to mind will simply proc_find_raw the > process after fork return > > 2. vfork > > By the end there is: > while (p2->p_lflag & PL_PPWAIT) > cv_wait(&p1->p_waitcv, proc_lock); > > Retesting after wake up triggers the bug. > > The cleanest solution I'm aware of introduces a dedicated condvar to use > by vfork. The parent uses a local variable for this purpose. Since the > condvar is only used by vfork to signal exec/exit, there is no need > to access the child after waking up. The downside is extention of struct > proc with a pointer. Interestingly this is what they do in Linux. > > FreeBSD does the obvious of grabbing a reference preventing the > destruction of struct proc from going through. > > Solaris suprisingly has a total hack: they look up the pid and see if > found process (if any) has to be waited on. Apart from general ugliness > it's also a minor scalability problem due to extra lock/unlock of the > global process list lock. > > -- > Mateusz Guzik <mjguzik gmail.com <http://gmail.com>> Thank you for the analysis. There is also a correctness problem with ptrace(2). I had trouble to generate the PTRACE_VFORK event. The first one for the parent that calls vfork(2) and reports child's PID; the second one that is generated by vforkee and returns forker's PID. When the child calls exec(2)/exit(2) and parent is resuming, there shall be reported PTRACE_VFORK_DONE once with PID of vforkee. Another topic is that we always want to get PTRACE_FORK and PTRACE_VFORK generated by the parent first and followed by a signal from the child. This deterministic order simplifies a debugger. Once I will sort out bugs with threads (LWPs) under a tracer, I will be back to this topic.
Attachment:
signature.asc
Description: OpenPGP digital signature