tech-userlevel archive


Re: pwait(1) added



On Thu, Mar 05, 2015 at 10:09:02PM -0500, James K. Lowden wrote:
> On Tue, 3 Mar 2015 19:47:00 +0100
> Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:
> 
> > Problems we have for pkgsrc bulk builds that could be
> > solved by a more useful process monitor are:
> > 
> > (1) Finding and reporting orphans. Something leaves a
> > bonobo-activation-server around, no idea what.
> > 
> > (2) Reporting hanging builds. A typical example here is lang/onyx,
> > which often just stops making progress. Another example is lang/mono.
> 
> For the record I have no opinion on pwait.  I am curious about your
> assertion that we need a "process monitor" though.  
> 
> Taking the second problem first, ISTM that doesn't require anything
> fancy but requires information of what's "expected".  If you build a
> database of successful build-times, then cancelling stalled builds
> could surely be accomplished by enregistering the start of each
> package's build process, and periodically patrolling the tree for cases
> when ".done" or whatever wasn't produced in the expected time.  We
> could call it the OOT (out of time) killer after the dread Linux
> OOM-killer.  

The problem with such databases is that they need maintenance. Explosions
in build time are not uncommon, even more so on transitions from failure
to success. That's what makes the "doesn't make progress for a while"
metric so interesting -- it can work reliably without knowing anything
about the build in advance.
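
To illustrate the idea, here is a minimal sketch of such a check. It is
not what the bulk build does. It assumes that "progress" means some file
under the package's work directory was modified recently; a real monitor
might look at CPU time or log output instead. The names latest_mtime,
is_stalled and quiet_seconds are made up for this example.

```python
import os
import time


def latest_mtime(root):
    """Return the newest modification time of root or anything below it."""
    newest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished while we were walking the tree
            newest = max(newest, st.st_mtime)
    return newest


def is_stalled(root, quiet_seconds, now=None):
    """True if nothing under root was modified in the last quiet_seconds.

    Assumption: file modification is a usable proxy for build progress.
    """
    if now is None:
        now = time.time()
    return now - latest_mtime(root) > quiet_seconds
```

A patrol loop would call is_stalled() on each active work directory every
few minutes and report (or kill) the builds for which it returns true. No
per-package timing data is required, only a single quiet period.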

> Finding and reporting orphans is interesting.  But I wonder if a
> "process monitor" is really what's needed.  Couldn't acct(2) be
> enhanced to capture what you need?  It would be an excuse to use the
> facility instead of vaguely remembering it.  ;-)

Possibly, but do you really want to log all processes to disk during a
bulk build with hundreds of forks per second?

Joerg

