Re: pwait(1) added

To: tech-userlevel%netbsd.org@localhost
Subject: Re: pwait(1) added
From: Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
Date: Sun, 8 Mar 2015 20:48:28 +0100

On Sat, Mar 07, 2015 at 07:08:53PM -0500, James K. Lowden wrote:
> On Fri, 6 Mar 2015 13:16:18 +0100
> Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:
> 
> > > Taking the second problem first, ISTM that doesn't require anything
> > > fancy but requires information of what's "expected".  If you build a
> > > database of successful build-times, then cancelling stalled builds
> > > could surely be accomplished by enregistering the start of each
> > > package's build process, and periodically patrolling the tree for
> > > cases when ".done" or whatever wasn't produced in the expected
> > > time.  
> > 
> > Problem with such databases is that they need maintenance; explosions
> > in build time are not uncommon, even more so on transitions from
> > failure to success. That's what makes the "doesn't make progress for a while"
> > metric so interesting -- it can work reliably without knowing anything
> > about the build in advance.
> 
> So, IIUC, what you're saying is that you'd like to monitor the build
> process and take note of ... what?  "Doesn't make progress" isn't
> interesting; it's impossible because it's too vague.  The process is doing
> something.  Are you going to assume that because there's no I/O after N
> minutes that the process is stalled?  

We already have a measure to terminate processes that "do something",
ulimit -t. So if the process is actually using CPU time, it can be
killed without manual intervention. I'm also not really concerned about
fork bombs; I haven't seen such a problem yet. What I have seen is
processes stuck waiting for something to happen. That can be a kind of
zombie with the wrong PID or a deadlock in a multi-threaded program
(I'm looking at you, mono!). Forking is a signal of life from a process,
so monitoring it seems to be a pretty reliable way of dealing with the
issues I have seen.
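
A minimal sketch of the "no fork for a while" check, in Python for
illustration only. The class name, the timeout and the way the caller
collects the PIDs of the build's process tree are all assumptions, not
an existing tool; PID reuse is ignored for simplicity.

```python
import time


class ProgressWatchdog:
    """Treat the appearance of a new PID as a sign of life (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout          # seconds without a fork before we call it stalled
        self.clock = clock              # injectable so the logic can be tested
        self.seen = set()               # every PID observed so far
        self.last_progress = clock()    # time of the most recent new PID

    def observe(self, pids):
        """Feed the current PIDs of the build's process tree.

        Returns True if at least one PID was not seen before,
        i.e. something forked since the last observation.
        """
        new = set(pids) - self.seen
        if new:
            self.seen |= new
            self.last_progress = self.clock()
        return bool(new)

    def stalled(self):
        """True once no new PID has appeared for `timeout` seconds."""
        return self.clock() - self.last_progress >= self.timeout
```

A caller would poll the process tree periodically, pass the PIDs to
observe(), and kill the build once stalled() returns true. The
CPU-bound case is left to ulimit -t, as described above.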

> I recognize you have a lot of experience in this area.  At the same
> time, I doubt the assertion that history is no guide to the present.
> I'm skeptical that *successful* build-times vary much on a given
> machine, surely not by 1 standard deviation.  Since we have no database
> of build-time history, I'm not sure on what basis you disagree.  

Yes, that's exactly the problem. Most of the problematic builds are not
successful :)

Joerg

Follow-Ups:
- Re: pwait(1) added
  - From: Aleksej Saushev

References:
- pwait(1) added
  - From: Christos Zoulas
- Re: pwait(1) added
  - From: Joerg Sonnenberger
- Re: pwait(1) added
  - From: Joerg Sonnenberger
