Re: sysctl_doeproc() race

To: Robert Elz <kre%munnari.OZ.AU@localhost>, Martin Husemann <martin%duskware.de@localhost>
Subject: Re: sysctl_doeproc() race
From: Kamil Rytarowski <n54%gmx.com@localhost>
Date: Sun, 11 Mar 2018 16:02:57 +0100

On 11.03.2018 13:00, Robert Elz wrote:
>     Date:        Sun, 11 Mar 2018 11:06:33 +0100
>     From:        Martin Husemann <martin%duskware.de@localhost>
>     Message-ID:  <20180311100633.GE23416%mail.duskware.de@localhost>
> 
>   | I don't get this part - how would we end up with the new process using
>   | the same pid?
> 
> From what I can see from glancing at the code, the issue is an attempt to
> monitor an unrelated process - one that is neither a child, nor being ptrace'd.
> 
> That process can exit, its zombie be cleaned up, and then a new process
> created which happens to have the same pid as the previous one had
> (most of this is intended to happen quite quickly, but there's no guarantee
> of that - the process doing the monitoring could be suspended and wake up
> days after the process it was looking for vanished).
> 
> All this is inherently unreliable, and nothing that is done, beyond adding a
> whole new mechanism to hold a process that some other process has an
> interest in, will ever fix it.
> 

POSIX people told me that polling a process is reliable only from its
parent (either the real parent or, in non-POSIX extensions, a tracer).

> Kamil:   What I don't understand is how you were ever getting the process
> returned twice?   You're using the sysctl to look for a specific pid, right?
> 

Yes.

> When that pid is found, the sysctl code should simply copy out the relevant
> data, and return.   There's no point searching further in the lists, one pid
> can only exist once at a time - once found it is found....
> 

In this particular case we can look the pid up directly by calling
proc_find_raw().

I treat this as just an optimization (something nice to have, but not
important now).

This will not solve the bug for other sysctl_doeproc() cases and a fix
is still needed.
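
The "stop at the first match" behaviour discussed above can be sketched in
userland roughly as follows. This is not the kernel code: struct entry,
find_in() and lookup_pid() are hypothetical names, and the two singly linked
lists merely stand in for allproc and zombproc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a process list entry. */
struct entry {
	int pid;
	struct entry *next;
};

/* Walk one list and return on the first entry with a matching pid. */
static const struct entry *
find_in(const struct entry *head, int pid)
{
	for (const struct entry *e = head; e != NULL; e = e->next) {
		if (e->pid == pid)
			return e;
	}
	return NULL;
}

/*
 * Look for a pid among the live entries first, then among the zombies.
 * At most one entry is ever returned, since a pid exists once at a time.
 */
const struct entry *
lookup_pid(const struct entry *live, const struct entry *zombies, int pid)
{
	assert(pid > 0);

	const struct entry *e = find_in(live, pid);
	return e != NULL ? e : find_in(zombies, pid);
}
```

This only covers the single-pid query; requests that walk the whole list do
not benefit from it.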

> If that is not the way the sysctl lookup code is working then we should
> probably fix it.   There cannot be 2 processes with pid N at any one
> instant, so looking for a specific pid should only ever be able to return 1
> (or "not found" of course).
> 

Correct.

> It is possible to not find it, depending on what kind of locking the finding
> code is doing (for this, just being an "observation" interface, I'd assume the
> minimum possible) even though it exists, if the lists are changing underneath
> the search - but given the nature of what happens to a process, a search
> of zombproc, allproc, zombproc (stopping when found) will either find the 
> process or the process does not exist - and possibly just allproc followed
> by zombproc searches would work as well.
> 

We already use markers, so I prefer to stick with that approach; it
makes this code reliable.
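
For readers unfamiliar with the marker technique, here is a simplified
userland sketch of the idea. It is not the patch itself: struct node and all
function names are invented for the example, and the places where the kernel
would drop and retake the list lock are only marked with comments. The
walker parks a marker node right after the current entry, so it can resume
from the marker even if the current entry is unlinked in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry on a circular doubly linked list with a head node. */
struct node {
	struct node *prev, *next;
	bool is_marker;		/* true for markers, never reported */
	int pid;
};

static void
list_init(struct node *head)
{
	head->prev = head->next = head;
	head->is_marker = true;
	head->pid = -1;
}

static void
insert_after(struct node *pos, struct node *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void
insert_tail(struct node *head, struct node *n)
{
	insert_after(head->prev, n);
}

static void
list_remove(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

typedef void (*visit_fn)(struct node *cur, void *arg);

/* Visit every non-marker entry once; the callback may unlink entries. */
size_t
walk_with_marker(struct node *head, visit_fn visit, void *arg)
{
	struct node marker = { .is_marker = true, .pid = -1 };
	size_t visited = 0;
	struct node *p = head->next;

	while (p != head) {
		if (p->is_marker) {	/* skip other walkers' markers */
			p = p->next;
			continue;
		}
		insert_after(p, &marker);
		/* The list lock would be dropped here. */
		visit(p, arg);
		visited++;
		/* The list lock would be retaken here. */
		p = marker.next;	/* resume from the marker, not from p */
		list_remove(&marker);
	}
	return visited;
}

/* Example callback: record the pid, then unlink the entry ("exit"). */
static void
record_and_remove(struct node *cur, void *arg)
{
	int *sum = arg;

	*sum += cur->pid;
	list_remove(cur);
}
```

Because the walker never dereferences the current entry after the callback
returns, an entry that disappears during the unlocked window neither crashes
the walk nor gets reported twice.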

I ran the reproducer against my patch, and after a long period of
concurrent test execution I have not observed a single crash.
No ATF regressions were observed either.

For completeness I am going to test whether the approach proposed by
Christos is stable, but as a personal preference I plan to commit my
own patch.

> kre
>
