tech-userlevel archive


Re: pwait(1) added



On Sat, Mar 07, 2015 at 07:08:53PM -0500, James K. Lowden wrote:
> On Fri, 6 Mar 2015 13:16:18 +0100
> Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:
> 
> > > Taking the second problem first, ISTM that doesn't require anything
> > > fancy but requires information of what's "expected".  If you build a
> > > database of successful build-times, then cancelling stalled builds
> > > could surely be accomplished by enregistering the start of each
> > > package's build process, and periodically patrolling the tree for
> > > cases when ".done" or whatever wasn't produced in the expected
> > > time.  
> > 
> > Problem with such databases is that they need maintenance; explosions
> > in build time are not uncommon, even more on transitions from failure
> > to success. That's what makes the "doesn't make progress for a while"
> > metric so interesting -- it can work reliably without knowing anything
> > about the build in advance.
> 
> So, IIUC, what you're saying is that you'd like to monitor the build
> process and take note of ... what?  "Doesn't make progress" isn't
> interesting; it's impossible because it's too vague.  The process is doing
> something.  Are you going to assume that because there's no I/O after N
> minutes that the process is stalled?  

We already have a measure to terminate processes that "do something",
ulimit -t. So if the process is actually using CPU time, it can be
killed without manual intervention. I'm also not really concerned about
fork bombs; I haven't seen such a problem yet. What I have seen is
processes stuck waiting for something to happen. That can be a kind of
zombie with the wrong PID or a deadlock in a multi-threaded program
(I'm looking at you, mono!). Forking is a signal of life from a process,
so monitoring it seems to be a pretty reliable way of dealing with the
issues I have seen.
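
As a rough illustration of the "no sign of life for a while" metric, here
is a minimal sketch in Python. The class name ProgressWatchdog, its
methods, and the timeout value are hypothetical and not part of pwait(1)
or any existing tool. How fork events reach note_activity() is left out
on purpose; on BSD systems, kqueue's EVFILT_PROC filter with NOTE_FORK is
one possible source.

```python
import time


class ProgressWatchdog:
    """Report a stall when no sign of life was seen for `timeout` seconds.

    Hypothetical helper for illustration only. A "sign of life" is
    whatever the caller decides to count, for example a fork event
    observed in the monitored process tree.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout          # idle seconds allowed before a stall
        self.clock = clock              # injectable so the logic can be tested
        self.last_seen = clock()        # starting the build counts as activity

    def note_activity(self):
        """Record a sign of life, resetting the idle timer."""
        self.last_seen = self.clock()

    def stalled(self):
        """Return True once the idle time has reached the timeout."""
        return self.clock() - self.last_seen >= self.timeout
```

A supervisor would call note_activity() for every observed fork and poll
stalled() periodically, killing the build once it returns True. Nothing
about the expected build time has to be known in advance; only the idle
timeout is a tunable. Processes that spin on the CPU are a separate case
and remain covered by ulimit -t.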

> I recognize you have a lot of experience in this area.  At the same
> time, I doubt the assertion that history is no guide to the present.
> I'm skeptical that *successful* build-times vary much on a given
> machine, surely not by 1 standard deviation.  Since we have no database
> of build-time history, I'm not sure on what basis you disagree.  

Yes, that's exactly the problem. Most of the problematic builds are not
successful :)

Joerg

