Re: current build failure automated messages

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: current build failure automated messages
From: Andreas Gustafsson <gson%gson.org@localhost>
Date: Tue, 19 Jul 2016 12:40:34 +0300

Robert Elz wrote:
>   | If the page says "Build: OK" at the end, the issue has
>   | been fixed.  At least for me, this is less work overall than it would
>   | be to handle twice the number of emails.
> 
> I actually cannot imagine that being possible for me, one more e-mail to
> delete every few days is nothing, just switching to a browser and waiting
> for it to page in takes an order of magnitude longer - let alone the
> startup time if I don't have one running (which is not unusual) - plus
> that I can read e-mail on a text terminal trivially, and while that web
> page would not be hard for a text browser to process, it just seems wrong
> to me...

I can appreciate that - different people have different preferences
and workflows.  I'd like to hear the opinions of other developers -
if there is a consensus that "build has been fixed" email notifications
would be useful, I can certainly add them.  And even if the consensus
is that they are not useful, I just might add them anyway, but
hardcode the recipient address as "kre" :)

>   | If we're going to start sending more emails, I think adding notifications
>   | saying "build is still failing after 24 hours" would be more useful
>   | than "build now succeeding again".
> 
> I'm not sure about "more" useful, but that would certainly be useful,
> though I think 12 hours would be a better timeout - that's long enough
> for whoever broke it to have had time to fix things before causing others
> to be provoked into getting involved.

Noted.

> I am appending the script I am now using below.   One caveat - as is the way
> with these things - I made a couple of minor adjustments to the script after
> it worked earlier - and there has not been another failure since to validate
> it still works (it should, but ...)

You don't have to screen scrape the HTML reports - you can get the
underlying data by anonymous rsync, as described in

  https://mail-index.netbsd.org/current-users/2015/10/18/msg028217.html

This will probably yield more data than you want, but rsync has plenty
of options and can hopefully be coaxed into mirroring only the
"bracket.db" files, for example.  With those, checking whether the
latest build succeeded is a one-liner in sh:

  find i386 -name bracket.db | sort | tail -n 1 | xargs grep build_status | grep build_status=0
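
For anyone who would rather do the same check from Python, here is a
minimal sketch.  It assumes only what the pipeline above implies: that
sorting the bracket.db paths puts the newest build last, and that each
file contains a text line of the form "build_status=N" where 0 means
success.  The function name latest_build_ok is made up for this
example and is not part of the existing report code.

```python
import os


def latest_build_ok(root):
    """Return True/False for the newest bracket.db under root, or None if unknown."""
    # Collect every bracket.db path.  Like the find | sort pipeline, this
    # assumes that sorting the paths puts the newest build last.
    paths = sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
        if name == "bracket.db"
    )
    if not paths:
        return None
    # Assumed file format: text lines of the form key=value.
    with open(paths[-1], errors="replace") as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep and key == "build_status":
                return value == "0"
    # No build_status line found in the newest file.
    return None
```

Calling latest_build_ok("i386") from the top of the mirrored tree
inspects only the single newest file; a None result means the status
could not be determined rather than that the build failed.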

The Python code that generates the existing HTML reports and email
notifications is also available if you want it.

> Also, I have no idea of the timezone in which the log files are created, so
> I am currently running the script using just local time (for whoever runs it.)
> That only affects the name of the file that is fetched (and, if right near the
> beginning of the month, the previous one - just in case the commit list that
> causes a failure, or corrects a failure, spans the month boundary).  As it
> is now, I am probably going to start attempting to fetch August's log before
> it first gets created (as August will come earlier for me than many of you).
> Of course, if the timezone for those files is from Japan or Australia then
> all would be fine (for me).   It should probably be, and probably is, UTC,
> but before I make the script work that way, I'd appreciate confirmation from
> someone who knows (that is: what timezone is used when deciding it is time
> to create a new log file -- i.e.: that a new month has started?)

A new monthly report page is created when there is a build result to
report from building sources with a CVS source date in that month.

Since the internal date storage format of CVS has a "month" field, in
principle the above definition is complete without introducing the
concept of a time zone.  In practice, the NetBSD CVS repository uses
UTC dates, so a commit made after 0:00 UTC on the 1st will trigger the
creation of a new report page once it has been built.
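
A script that picks the monthly page by date could therefore work in
UTC and fall back to the previous month, since the new page only
appears once a commit from the new month has actually been built.  A
minimal sketch follows; the function name report_months is made up
for this example, and how a (year, month) pair maps to a file name is
deliberately left to the caller.

```python
from datetime import datetime, timezone


def report_months(now=None):
    """Return [(year, month), ...]: the current UTC month, then the previous one."""
    if now is None:
        now = datetime.now(timezone.utc)
    else:
        # Expects a timezone-aware datetime; convert it to UTC.
        now = now.astimezone(timezone.utc)
    year, month = now.year, now.month
    # Step back one month, wrapping from January to the previous December.
    prev = (year - 1, 12) if month == 1 else (year, month - 1)
    return [(year, month), prev]
```

The caller would try the first entry and, if that page does not exist
yet, use the second.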

> It would also be easier if the html markup actually marked the content
> rather than just for appearance (class="build" means a different background
> colour, class="ok" just means "text is green" and class="fail" "text is red",
> and they're used that way... ideally there should be different classes for
> different purposes, and if several of them all just happen to result in the
> same appearance, that would be fine...)

Again, I think it would be better to use the underlying data than
to screen scrape HTML reports that were never intended for machine
parsing.

> Also, the script, attached below, attempts to make a directory
> 	/var/db/build-status
> to keep track of what the current status is, for each architecture monitored,
> (and some other stuff) but unless it is run as root (not recommended), it is
> probably going to fail...   So just make the directory by hand before running
> the script, and give it a suitable owner and permissions.   The first time
> the script is run for an architecture it will send a more or less useless
> e-mail which tells the current build status for that architecture (that it
> does that is/was intentional...)

If I end up adding the "build fixed" notifications to the TNF test
server, it will be a reimplementation in Python anyway, sharing code
with the existing build failure notifications.
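
To make the notification types discussed in this thread concrete,
here is a rough sketch of the decision logic.  It is not the code
running on the test server; the function name, the returned strings,
and the default 24-hour threshold (12 hours was also suggested) are
all assumptions made for illustration.

```python
from datetime import timedelta


def decide_notification(prev_ok, now_ok, failing_for,
                        reminded=False, remind_after=timedelta(hours=24)):
    """Return "broken", "fixed", "still-failing", or None (send nothing).

    prev_ok / now_ok: build status before and after the latest build.
    failing_for: how long the build has been failing (a timedelta).
    reminded: whether a "still failing" reminder was already sent.
    """
    if prev_ok and not now_ok:
        return "broken"          # the existing failure notification
    if not prev_ok and now_ok:
        return "fixed"           # the proposed "build fixed" notification
    if not now_ok and not reminded and failing_for >= remind_after:
        return "still-failing"   # the proposed reminder, sent at most once
    return None
```

Keeping the decision in one pure function like this would make it
easy to change the threshold or drop one of the message types later.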
-- 
Andreas Gustafsson, gson%gson.org@localhost
