Subject: Re: SoC Part I: pbulk
To: Joerg Sonnenberger <joerg@britannica.bec.de>
From: Hubert Feyrer <hubert@feyrer.de>
List: tech-pkg
Date: 05/19/2007 19:01:06
On Fri, 18 May 2007, Joerg Sonnenberger wrote:

> On Wed, May 16, 2007 at 08:44:32PM +0200, Hubert Feyrer wrote:
>>> The pbulk system is modular and allows customisation of each phase. This
>>> is used to handle full vs. limited bulk builds, but also to handle
>>> environmental differences for parallel builds.
>>
>> Will/can any of the existing code be reused?
>
> It is hard to do so. I want to get rid of the Perl parts for obvious
> reasons.

What is that "obvious" reason? Because it requires perl, or anything else?


> The post-phase works quite a bit differently, as e.g. the check
> for the restricted packages uses the output of the scan phase.

Are you saying that "restricted" packages won't be built in bulk builds
any more?


>>> _ALL_DEPENDS are used as hints, but can be overridden.
>>>
>>> For partial builds, two different mechanisms could be used. I'm not sure
>>> which is better. For both, a list of directories to scan is given.
>>> For the first idea, pbulk-index is called, which gives all possible
>>> packages to create. Those are filtered by a pattern. The second approach
>>> is to list the options directly and call pbulk-index-item instead.
>>
>> (pbulk-index?)
>>
>> What is that filtering pattern - an addition to the list of directories in
>> pkgsrc for pkgs to build, defined by the pkg builder?
>
> Variant I: www/ap-php PKG_APACHE=ap13
> Variant II: www/ap-php ap13-*
>
> Both say "don't build all variants in www/ap-php, but only those
> specified".

Ah - can we also say "please build all combinations of apache{1,2} and
php{4,5}" with that?
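
For illustration, something like this is what I'd hope to be able to
write in the to-build list (syntax extrapolated from your two variants;
the second variable name and its values are pure guesses on my side):

    # Variant I style, one line per combination (hypothetical):
    www/ap-php PKG_APACHE=ap13 PKG_PHP_VERSION=4
    www/ap-php PKG_APACHE=ap13 PKG_PHP_VERSION=5
    www/ap-php PKG_APACHE=ap2  PKG_PHP_VERSION=4
    www/ap-php PKG_APACHE=ap2  PKG_PHP_VERSION=5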


>>> Dependencies are resolved for partial builds as well, but missing
>>> dependencies are searched by calling pbulk-index in the given
>>> directories. Those that fulfill the patterns are added to the list and
>>> the process is repeated.
>>
>> I'm not 100% sure which depends you mean here - if it's in pkgsrc, it
>> was either already built and is available as a binary pkg that can be
>> pkg_add'ed, or it can be built. What is that pattern for, and is it
>> different from the one mentioned above?
>
> I have listed www/ap-php in the to-build list. It now needs to figure
> out how to get Apache, right? It gets a directory (later possibly a list
> of directories, see pkgsrcCon) and calls pbulk-index to find what can be
> built. The first package that can fulfill the dependency of ap-php gets
> added to the to-build list.

... using the above constraints ("but i want apache2!"), I guess - sounds 
good!
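
Just to check my understanding, the resolution loop would then look
roughly like this (pseudo-shell sketch; pbulk-index is yours, the two
helpers and the file names are invented by me):

    # repeat until no unresolved dependencies are left:
    #  1. list everything buildable in the scan directories
    #  2. keep the first entry matching each open dependency pattern
    #  3. add it to the to-build list, which may open new dependencies
    while [ -s unresolved.txt ]; do
        for dir in $SCAN_DIRS; do
            pbulk-index "$dir"
        done | first-match-for unresolved.txt >> to-build.txt
        list-open-depends to-build.txt > unresolved.txt
    done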

What's that thing about multiple directories from pkgsrcCon? You should
know that I (and probably others on this list) were not there, and
silently assuming that knowledge seems wrong. Please stop what comes
across here as an "it's your problem that you were not there" attitude.


>> Nuking $PREFIX is fine & fast, but please consider update builds, e.g.
>> from a pkgsrc stable branch. There is no need to rebuild everything there
>> (which was about THE design criterion for the current bulk build code :).
>
> Incremental builds are not affected. The Python version currently has
> the following rules:
> - check if the list of dependencies changed -> rebuild
> - check if any of the depending packages changed -> rebuild
> - check if any of the recorded RCS IDs changed -> rebuild

That's about what the current one has, too, AFAIK. Where's the difference?
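
(For reference, my reading of those rules as a per-package decision -
pseudo-shell again, all helper names invented by me:

    # rebuild if the dependency list, any package depended upon, or any
    # recorded RCS ID differs from the previous run; otherwise reuse
    if deps-list-changed "$pkg" || any-depend-rebuilt "$pkg" \
       || rcsids-changed "$pkg"; then
        rebuild "$pkg"
    else
        reuse-binary-pkg "$pkg"
    fi

That matches what I remember the current code doing.)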

And: Python version?
(You're not telling me you're rewriting the new bulk build framework in 
python, to escape perl, right? :-)


>> BTW, do you take any precaution to make sure the build is done on a
>> "clear" system, e.g. one with a freshly installed $OS release in a chroot?
>
> No, that's outside the scope.

OK.


>> Also: will the bootstrap kit be mandatory on NetBSD systems, too?
>> NetBSD should have all this in base, and while it's probably fast to build
>> compared to the bulk build's time, for a selective build it seems like
>> overkill to me.
>
> Given the issues with requiring newer pkg_install versions, my answer is
> "it most likely will be". If not, it is easy to prepare a tarball without
> it. I haven't worried about that part too much yet.

I see... I guess the "install tarball" step would add work for the
build admin, and thus automating that step would indeed be better.
(Where's your list of goals for the new build framework again?)


>> How does this FTP/rsyncing before (yumm) play with the distributed
>> environment (NFS) mentioned in answers to the proposal? Or is this for a
>> setup with no common file system? (guessing)
>
> The latter. The way I'm currently running concurrent builds is with an
> NFS share for each of PACKAGES, DISTDIR and the log area.

Um, "the latter" would be "no common file system", thus "does not need 
NFS". What now? :)


>> BTW, did you take SMP machines into account in any way?
>
> This project started to solve two major issues with the current code:
> - build more than one package from one location in the tree.
> - build more than one job in parallel.
>
> Both are very hard to do in the current structure. E.g. you have to use
> a PKGNAME-centered view for the first to work, which would have meant
> quite a bit of reorganisation.
>
> Doing the scheduling within make would be possible, but only when
> knowing the number of parallel jobs in advance etc.

Was that a "yes" or a "no" on SMP now?
Or asked the other way round: if someone has an SMP machine, will he be
able to use more than one CPU?


  - Hubert