tech-pkg archive


Re: Distributed bulk building for slow machines



 >> I don't know about pbulk and old bulk build framework, but in distbb
 >> dependency graph is built in two steps:
 >> 1) All packages are queried for their summary (PKGNAME, PKGPATH, DEPENDS
 >>    and so on). These per-package summaries
 >>    are collected in parallel on slave hosts,
 >>    in our case running NetBSD on slow architectures on real iron or
 >>    inside hardware emulators. These summaries are sent to a master host.
 >> 2) Building the dependency graph. It is built on master host that can
 >>    run NetBSD on fast amd64/x86 machine. This step is very fast even on not
 >>    so modern CPU. It takes 20 seconds(!) real time and 12 seconds user
 >>    time on 3Ghz Pentium 4 CPU (GenuineIntel).
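
To illustrate step 2, here is a minimal sketch in Python of turning collected summaries into a build order. It is not distbb's code. It assumes a simplified record format (blank-line-separated KEY=VALUE lines) in which each DEPENDS line already holds the PKGPATH of the dependency; real DEPENDS values also carry a version pattern. The names parse_summaries and build_order are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parse_summaries(text):
    """Parse blank-line-separated records of KEY=VALUE lines.

    Assumption: DEPENDS may repeat, and each value is the PKGPATH
    of a dependency (a simplification of real summaries).
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition("=")
        if key == "DEPENDS":
            current.setdefault("DEPENDS", []).append(value)
        else:
            current[key] = value
    if current:
        records.append(current)
    return records


def build_order(records):
    """Return PKGPATHs ordered so that dependencies come first."""
    graph = {r["PKGPATH"]: set(r.get("DEPENDS", [])) for r in records}
    return list(TopologicalSorter(graph).static_order())
```

Only the parsing input comes from the slow slave hosts; the ordering itself is cheap graph work, which is why it can run on the master.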

> Hm, ok.  In the old bulk build, using make to evaluate PKGNAME,
> DEPENDS and so on takes really significant amounts of time.  A
> comment I saw indicated > 1 month on a VAXstation 4000/60.  So if
> that's carried forward and you only manage to slice the time in 3
> or 4 via parallelism, it's still taking far too much time.

Yes, step 1 is really slow even on fast machines.  But this work is
spread across multiple slave hosts, i.e. it is easily parallelized. A
potential bottleneck is select(2) in distbb, which runs on the master
host (up to 100 slave hosts?), but I don't think this is a real problem.
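
As a rough illustration of why a single select(2) loop copes with this many connections, here is a minimal Python sketch, not taken from distbb, of one process draining output from many slave connections. The function name collect_output and the slave names are hypothetical; selectors.SelectSelector is the standard-library wrapper around select(2).

```python
import selectors
import socket


def collect_output(socks):
    """Read from every socket until each peer closes its end.

    socks maps a (hypothetical) slave name to a connected socket.
    Returns a dict mapping each name to the bytes it sent.
    """
    sel = selectors.SelectSelector()  # backed by select(2)
    received = {}
    for name, sock in socks.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
        received[name] = b""
    # Loop until every socket has been unregistered.
    while sel.get_map():
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:
                # An empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    return received
```

The master only wakes up when some slave has data ready, so its cost grows with the amount of traffic rather than with time spent waiting on slow hosts.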

> My point is that a solution can be engineered where the client
> always takes the initiative to communicate, and if that's always
> the case, NAT will not pose a problem.

Some work is needed to write a transport program (built on top of ssh)
that manages a dynamically changing list of slave hosts. I'd say this
task and the NAT problem are independent: having the slave hosts
initiate communication is reasonable with or without NAT. For those who
are behind NAT, I'd consider using a VPN.

> However...  This will make it difficult to have machines joining
> the fun after the initial start, or at least it will make the
> initial phase after joining more expensive because the source has
> to be synced.

In distbb I'd do the following. By default the TARGETS config variable
contains the list of targets required for building a package: Init,
Clean, Available, Excluded, Vars, Depends, fetch, checksum, extract,
patch, configure, build and Package. We can add a new target,
SyncPkgsrcTree, before Init, so that syncing with the master's pkgsrc
tree is run for each package. To implement this step efficiently, the
master host can regenerate the /usr/pkgsrc/uuid file after every update
of its pkgsrc tree; a slave then starts the real sync only if that file
differs from its local /usr/pkgsrc/uuid.
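
A minimal sketch of that idea in Python, under stated assumptions: distbb's real configuration is not Python, and the helper names with_sync_target and needs_sync are hypothetical. How the slave obtains the master's uuid (e.g. over the ssh transport) is left out; it is passed in as a string.

```python
from pathlib import Path

# Default target list as described above.
DEFAULT_TARGETS = [
    "Init", "Clean", "Available", "Excluded", "Vars", "Depends",
    "fetch", "checksum", "extract", "patch", "configure", "build",
    "Package",
]


def with_sync_target(targets, new_target="SyncPkgsrcTree", before="Init"):
    """Return a copy of targets with new_target inserted before `before`."""
    result = list(targets)
    result.insert(result.index(before), new_target)
    return result


def needs_sync(master_uuid, local_uuid_file):
    """Decide whether the slave must do a real sync.

    A sync is needed when the local uuid file is missing or its
    content differs from the uuid published by the master.
    """
    path = Path(local_uuid_file)
    if not path.exists():
        return True
    return path.read_text().strip() != master_uuid.strip()
```

Because the check is a single small file comparison, running it before every package costs little; the expensive sync happens only after the master has actually updated its tree.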

-- 
Best regards, Aleksey Cheusov.

