Re: Distributed bulk building for slow machines

To: cheusov%tut.by@localhost
Subject: Re: Distributed bulk building for slow machines
From: Havard Eidnes <he%NetBSD.org@localhost>
Date: Fri, 07 Jan 2011 10:12:41 +0100 (CET)

> I don't know about pbulk and the old bulk build framework, but in distbb
> the dependency graph is built in two steps:
> 1) All packages are queried for their summary (PKGNAME, PKGPATH, DEPENDS
>    and so on). These per-package summaries
>    are collected in parallel on slave hosts,
>    in our case running NetBSD on slow architectures on real iron or
>    inside hardware emulators. These summaries are sent to a master host.
> 2) Building the dependency graph. It is built on the master host, which can
>    run NetBSD on a fast amd64/x86 machine. This step is very fast even on a
>    not so modern CPU. It takes 20 seconds(!) real time and 12 seconds user
>    time on a 3 GHz Pentium 4 CPU (GenuineIntel).
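
A minimal sketch of step 2, not distbb's actual code: given summaries
already collected from the slaves, the master can derive a build order
with a topological sort.  The package names are made up, and DEPENDS is
simplified to a plain list of PKGPATHs, which is an assumption made for
illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical per-package summaries as a slave might report them.
# DEPENDS is simplified here to a list of PKGPATHs (an assumption).
summaries = [
    {"PKGNAME": "libfoo-1.0", "PKGPATH": "devel/libfoo", "DEPENDS": []},
    {"PKGNAME": "libbar-2.1", "PKGPATH": "devel/libbar",
     "DEPENDS": ["devel/libfoo"]},
    {"PKGNAME": "app-0.3", "PKGPATH": "misc/app",
     "DEPENDS": ["devel/libfoo", "devel/libbar"]},
]


def build_order(summaries):
    """Return PKGPATHs ordered so that dependencies come before dependents."""
    # Map each package to the set of packages it depends on.
    graph = {s["PKGPATH"]: set(s["DEPENDS"]) for s in summaries}
    return list(TopologicalSorter(graph).static_order())


order = build_order(summaries)
print(order)
```

The expensive part (evaluating each package with make) stays on the
slaves; the master only does cheap graph work on the collected data.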

Hm, ok.  In the old bulk build, using make to evaluate PKGNAME,
DEPENDS and so on takes a really significant amount of time.  A
comment I saw indicated > 1 month on a VAXstation 4000/60.  So if
that's carried forward and you only manage to cut the time by a
factor of 3 or 4 via parallelism, it's still taking far too much time.
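
As a rough back-of-the-envelope check, assuming "> 1 month" is taken as
a lower bound of 30 days and that the speedup from parallelism is
perfectly linear (both are assumptions, and the second is optimistic):

```python
# Assumption: "> 1 month" is treated as a lower bound of 30 days.
BASELINE_DAYS = 30


def remaining_days(baseline_days, speedup):
    """Wall-clock days left if the work is split evenly 'speedup' ways."""
    return baseline_days / speedup


for speedup in (3, 4):
    days = remaining_days(BASELINE_DAYS, speedup)
    print(f"{speedup}x parallel: at least {days:.1f} days")
```

That is still 7.5 to 10 days spent only on evaluating the package
metadata, before a single package has been built.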

>> This one I'm not so sure of, as this would prevent users sitting
>> behind NAT boxes from participating as clients or slaves.
>
> For potential participants who are behind a firewall,
> NAT can be organized over any open port, even 80 or 8080.
> Anyway, I'd start with ssh as a default transport.
> BTW transport can easily be changed at any time.
> In case of distbb (and IIRC pbulk) this is just one variable in config
> file.

My point is that a solution can be engineered where the client
always takes the initiative to communicate, and if that's always
the case, NAT will not pose a problem.
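
A minimal sketch of that idea, simulated in a single process.  The
Master and Client classes and their method names are hypothetical and
not taken from distbb or pbulk.  The method calls stand in for
connections the client opens outward, so the master never has to reach
a host behind NAT.

```python
from collections import deque


class Master:
    """Hypothetical master: holds the job queue and only answers requests."""

    def __init__(self, pkgpaths):
        self.pending = deque(pkgpaths)
        self.results = {}

    def request_job(self, client_id):
        # Called by the client; returns None when no work is left.
        return self.pending.popleft() if self.pending else None

    def report_result(self, client_id, pkgpath, status):
        # Called by the client once a build has finished.
        self.results[pkgpath] = (client_id, status)


class Client:
    """Hypothetical client: always the side that initiates communication."""

    def __init__(self, client_id, master):
        self.client_id = client_id
        self.master = master

    def run(self, build):
        # The client drives the loop: ask for work, build, report back.
        while (job := self.master.request_job(self.client_id)) is not None:
            self.master.report_result(self.client_id, job, build(job))


master = Master(["devel/libfoo", "misc/app"])
Client("hp300-1", master).run(lambda pkgpath: "ok")
print(master.results)
```

In a real setup each call would be an outbound request from the client,
for instance over ssh or HTTP.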

>> One thing to keep in mind, though, is this: all the machines
>> participating in such a distributed effort need to run the same
>> base OS release, and also run only the official release for the
>> duration of its contribution.
> Ideally only one person should be the sysadmin of all slave hosts per
> architecture.  That person should be responsible for the "equality" of
> all slave hosts.

However, as you allude to further down in the post, this will,
perhaps even significantly, limit the number of machines which
can be used for a bulk build.

>> Thought also needs to be given to how the (presumed local) pkgsrc
>> repository is to be kept in sync,
> It makes sense to sync the pkgsrc tree between master and slave hosts
> automatically before starting a bulk build. This will avoid a lot of
> potential problems.

However...  This will make it difficult to have machines joining
the fun after the initial start, or at least it will make the
initial phase after joining more expensive because the source has
to be synced.

> As for me, this is the most serious (the only one?) problem in this
> task.  Limiting contributors to a list of trusted NetBSD developers or
> those who signed "papers" may significantly reduce the amount of
> (potentially) available resources. I have no idea how to solve this
> problem easily.

Right.  This should be solved properly, and this may need to
involve some sort of agreement.  Perhaps a new status of "pkgsrc
bulk build provider" can be made, which is different from being a
full-blown developer?  Not sure; this is probably for someone
else in NetBSD to decide.

>> Also... note that trying to build some packages is prone to
>> cause panics on at least m68k hosts, the message seen on my hp300
>> systems is (from memory) "out of address space", and I believe
>> this has to do with the pmap implementation.
> Do you mean NetBSD itself crashes or gcc/clisp/whatever?

NetBSD itself crashes.

>> So when doing distributed bulk builds for m68k, some packages need to
>> be marked "no way"
> In distbb there is a way to do this: all such packages can easily be
> marked as "no way" in the config file.

OK.

Regards,

- Håvard

Follow-Ups:
- RE: Distributed bulk building for slow machines
  - From: Larson, Timothy E.
- Re: Distributed bulk building for slow machines
  - From: Aleksey Cheusov

References:
- Distributed bulk building for slow machines
  - From: John Klos
- Re: Distributed bulk building for slow machines
  - From: Havard Eidnes
- Re: Distributed bulk building for slow machines
  - From: Aleksey Cheusov
