On 11.03.2018 13:00, Robert Elz wrote: > Date: Sun, 11 Mar 2018 11:06:33 +0100 > From: Martin Husemann <martin%duskware.de@localhost> > Message-ID: <20180311100633.GE23416%mail.duskware.de@localhost> > > | I don't get this part - how would we end up with the new process using > | the same pid? > > From what I can see from glancing at the code, the issue is an attempt to > monitor an unrelated process - one that is neither a child, nor being ptrace'd. > > That process can exit, its zombie be cleaned up, and then a new process > created which happens to have the same pid as the previous one had > (most of this is intended to happen quite quickly, but there's no guarantee > of that - the process doing the monitoring could be suspended and wake up > days after the process it was looking for vanished). > > All this is inherantly unreliable, and nothing that is done, beyond adding a > whole new mechanism to hold a process that some other process has an > interest it, will ever fix it. > POSIX people told me that polling of a process entity is reliable only for parent (whether a real one or a tracer in non-POSIX extensions). > Kamil: What I don't understand is how you were ever getting the process > returned twice? You're using the sysctl to look for a specific pid right.? > Yes. > When that pid is found, the sysctl code should simply copy out the relevant > datea, and return. There's no point searching further in the lists, one pid > can only exist once at a time - once found it is found.... > We can use pid lookup in this particular case calling proc_find_raw(). I treat this as just an optimization (something nice to have, but not important now). This will not solve the bug for other sysctl_doeproc() cases and a fix is still needed. > If that is not the way the sysctl lookup code is working then we should > probably fix it. There cannot be 2 processes with pid N at any one > instant, so looking for a specific pid should only ever be able to return 1 > (or "not found" of course). > Correct. > It is possible to not find it, depending on what kind of locking the finding > code is doing (for this, just being an "observation" interface, I'd assume the > minimum possible) even though it exists, if the lists are changing underneath > the search - but given the nature of what happens to a process, a search > of zombproc, allproc, zombproc (stopping when found) will either find the > process or the process does not exist - and possibly just allproc followed > by zombproc searches would work as well. > We already use markers so I prefer to stick to this solution and it makes this code reliable. I was testing the reproducer for the patch of mine and after a long time of concurrent execution of the tests I have not observed a single crash. There are also no ATF regressions observed. For the completeness I'm going to test the approach proposed by Christos whether it is stable, and I plan to commit the patch of mine as a personal preference. > kre >
Attachment:
signature.asc
Description: OpenPGP digital signature