Subject: Re: Removing pkgviews?
To: Alistair Crooks <agc@pkgsrc.org>
From: Johnny Lam <jlam@pkgsrc.org>
List: tech-pkg
Date: 03/28/2006 19:40:26
Alistair Crooks wrote:
> On Fri, Mar 24, 2006 at 11:26:35AM -0500, Johnny C. Lam wrote:
>> I hope this is not *too* controversial, but I am thinking of removing 
>> the existing pkgviews implementation in pkgsrc after the 2006Q1 branch. 
>>  The existing code is complicating development of some new 
>> infrastructure components that I'm pursuing.  I'd like to get a sense of 
>> the number of users that this might impact.
> 
> I'm interested in your new infrastructure, but would like to know a
> bit more about why it's so invasive that pkgviews needs to go. I'd
> also like to know what needs to be done to complete pkgviews.
> 
> My understanding is that you still need to "reference count"
> directories (which can be manoeuvered around by adding a switch to
> pkg_delete to ignore the return results from an rmdir(2), and which I
> think would be a good addition anyway), and also the problems arising
> from the mini-packages debate, which I think are greatly reduced by the
> former change. Are there any more showstoppers?

For clarification, I will refer to "package views" as the concept, and 
"pkgviews" as the current implementation in pkgsrc today.

I outlined what I think needs to happen for pkgviews to be "finished" in 
my talk at the last pkgsrcCon[1].  The problem with the existing 
implementation boils down to needing a globally-unique way to refer to 
files in the latest version of a package from another package.  I 
outlined three possible solutions to this problem, but I'm not really 
satisfied with any of them.  The solutions that don't involve 
source-level modification of packages can be summarized as "just use 
more symlinks", and here I'm talking about a *lot* more symlinks.  Those 
solutions all lack elegance, which is irksome from both a developer's 
and a system administrator's standpoint.

Looking back, I think that pkgviews is a flawed implementation of the 
package views concept.  The original package views paper[2] noted that 
there were problems with wildcard dependencies if we had pkgviews 
packages link directly against files in other depot directories, but 
didn't really cover what those problems might be.  I think we now have a 
clear understanding of what those problems are, especially as they 
relate to pkgviews in pkgsrc today.

If I were to implement package views again, or at least something quite 
like it, I think I would do it completely differently, and the design 
would be driven by what it means for a package to depend on another 
package.  I think the relationship that is important is which package 
"uses" the other package, regardless of how the dependency relationship 
is expressed between those two packages.  For example, links-gui depends 
on png and links-gui does indeed use libpng.so, whereas p5-CGI depends 
on perl, but perl is the one that actually uses CGI.pm (i.e. 'use CGI').
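
To make the distinction concrete, here is a minimal sketch of deriving 
"uses" pairs from declared dependencies.  The Dependency record and its 
"extends" flag are hypothetical names invented for this illustration; 
nothing like them exists in pkgsrc today.

```python
from collections import namedtuple

# Hypothetical record: `pkg` declares a dependency on `dep`.  `extends` is
# True when `pkg` is an add-on that `dep` itself loads (e.g. a Perl module),
# so the "uses" direction is the reverse of the declared dependency.
Dependency = namedtuple("Dependency", ["pkg", "dep", "extends"])


def uses_edges(dependencies):
    """Return a set of (user, used) pairs derived from declared dependencies."""
    edges = set()
    for d in dependencies:
        if d.extends:
            edges.add((d.dep, d.pkg))   # e.g. perl uses p5-CGI
        else:
            edges.add((d.pkg, d.dep))   # e.g. links-gui uses png
    return edges
```

With the two examples above, links-gui/png would be recorded with 
extends=False and p5-CGI/perl with extends=True.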

If package A uses package B, then we should just symlink the contents of 
package B's depot directory into package A's depot directory.  This 
gives each package a well-known way to refer to files that belong to 
another package -- it simply pretends they're installed into the same 
place as its own files, directly under ${PREFIX}.  Thus, if a package 
installs a Perl script, then it "uses" the perl package so the perl 
binary is symlinked to ${PREFIX}/bin/perl, and the Perl script can 
simply start with "#!${PREFIX}/bin/perl".  If a package needs shared 
libraries provided by a library package, then those shared libraries are 
symlinked into the package depot directory, and we just need 
${PREFIX}/lib in the run-time library search path.
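
As a rough sketch of the idea (not existing pkgsrc code), the linking 
step could look like the following.  The function name 
link_used_package and its arguments are assumptions made for 
illustration; it creates one symlink per file and never overwrites 
anything already present in the using package's directory.

```python
import os


def link_used_package(user_depot, used_depot):
    """Symlink each file of used_depot to the same relative path in user_depot.

    Returns the sorted list of symlinks that were created.  Symlinked
    directories inside used_depot are not descended into (os.walk default).
    """
    used_depot = os.path.abspath(used_depot)  # keep link targets absolute
    created = []
    for dirpath, _dirnames, filenames in os.walk(used_depot):
        rel = os.path.relpath(dirpath, used_depot)
        target_dir = os.path.normpath(os.path.join(user_depot, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link = os.path.join(target_dir, name)
            if os.path.lexists(link):
                continue  # never clobber the using package's own files
            os.symlink(os.path.join(dirpath, name), link)
            created.append(link)
    return sorted(created)
```

Calling this with a script package's directory as user_depot and perl's 
directory as used_depot would make bin/perl appear next to the script, 
which is what lets the "#!${PREFIX}/bin/perl" line work.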

In this design, the depot directory of a package is not a sacrosanct 
location that only contains files belonging to that package -- instead, 
it becomes a collection of files and symlinks needed to make that 
package work.  We can generalize this by not requiring depot directories 
at all, but rather have directories where whole interrelated families of 
packages could co-exist, i.e. multiple LOCALBASEs.  We also don't need 
the package's meta-data directory to be the parent directory for the 
depot directory in this design, so we can keep using either a single 
PKG_DBDIR, or multiple PKG_DBDIRs specified in a "PKG_DBPATH".  In this 
generalized design, you could implement the depot directory idea 
contained in the pkgviews paper, but this is also flexible enough for 
most folks to do what I think they want, which is to install families of 
packages into a few separate LOCALBASEs.
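
For illustration only, a lookup across several PKG_DBDIRs might be 
sketched as below.  "PKG_DBPATH" is just the name proposed above; the 
separator (the platform's path-list separator, ":" on Unix), the 
first-match-wins rule, and the function name find_pkg_metadata are all 
assumptions of this sketch.

```python
import os


def find_pkg_metadata(pkgname, pkg_dbpath):
    """Return the first metadata directory for pkgname found along pkg_dbpath.

    pkg_dbpath is a list of PKG_DBDIR-style directories joined by
    os.pathsep; each is assumed to hold one subdirectory per installed
    package.  Returns None when no directory on the path has the package.
    """
    for dbdir in pkg_dbpath.split(os.pathsep):
        if not dbdir:
            continue  # tolerate empty components such as a trailing separator
        candidate = os.path.join(dbdir, pkgname)
        if os.path.isdir(candidate):
            return candidate
    return None
```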

There's no beating around the bush -- we're still using a lot of 
symlinks to make this happen.  I think we could improve this by doing 
"tree-folding"[3] like GNU Stow does.  There are also ramifications 
regarding PKG_SYSCONFDIR and VARBASE when they are shared between 
packages that need to be addressed, but I haven't thought through them 
yet (I note that this is still a problem for the existing pkgviews 
implementation).  Lastly, we still need better tools for managing these 
symlink farms.
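
To show what tree folding buys us, here is a simplified sketch in the 
spirit of what the Stow manual describes.  It is not Stow's actual 
algorithm: in particular it does not check which package owns an 
existing symlink before unfolding it, and the function name fold_into 
is made up for this example.

```python
import os


def fold_into(package_dir, target_dir):
    """Link package_dir's entries into target_dir, folding where possible."""
    package_dir = os.path.abspath(package_dir)
    os.makedirs(target_dir, exist_ok=True)
    for name in sorted(os.listdir(package_dir)):
        src = os.path.join(package_dir, name)
        dst = os.path.join(target_dir, name)
        if not os.path.lexists(dst):
            # Folded case: a single symlink stands in for the file or for
            # the entire subtree rooted at src.
            os.symlink(src, dst)
            continue
        if os.path.islink(dst) and os.path.realpath(dst) == os.path.realpath(src):
            continue  # already linked by an earlier run
        src_is_real_dir = os.path.isdir(src) and not os.path.islink(src)
        if not (src_is_real_dir and os.path.isdir(dst)):
            raise FileExistsError(dst)  # two packages claim the same path
        if os.path.islink(dst):
            # Unfold: replace the directory symlink with a real directory
            # holding links to the previous owner's entries.
            previous = os.path.realpath(dst)
            os.unlink(dst)
            os.mkdir(dst)
            fold_into(previous, dst)
        fold_into(src, dst)
```

The first package to provide lib/ costs one symlink for the whole 
directory; only when a second package also provides lib/ is it unfolded 
into a real directory with one link per entry.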

I'd be happy to keep discussing this further, as we still don't have a 
decent way to manage multiple installations of the same package in 
pkgsrc, and package views is really the only proposal put forth to 
address this in a general way.

	Cheers,

	-- Johnny Lam <jlam@pkgsrc.org>

[1] http://www.pkgsrccon.org/2005/slides/jlam/pkgviews.html
[2] http://www.netbsd.org/Documentation/software/pkgviews.pdf
[3] http://www.gnu.org/software/stow/manual.html#IDX11