Re: R packages

To: Greg Troxel <gdt%ir.bbn.com@localhost>
Subject: Re: R packages
From: brook%biology.nmsu.edu@localhost (Brook Milligan)
Date: Sun, 9 Oct 2011 22:00:31 -0600

Greg Troxel writes:
 > There are a lot of python packages, and a lot of perl packages, so that
 > seems ok.

Yes, however, I don't believe we have any tools intended to
generate and/or maintain them automatically.  See below.

 > Do you think it makes sense to have R2pkg, or to have a mk/r.mk that one
 > can set a few variables and include?  In other words, do the packages
 > created with R2pkg have a lot of lines ripe for common subexpression
 > elimination?

Perhaps I wasn't clear on this point.  As you have noticed, all the R
packages already make extensive use of

 >   .include "../../math/R/Makefile.extension"

They are also almost entirely identical in structure.  Most of the
remaining bits can be derived from the DESCRIPTION file provided by
upstream.  That is why, in fact, this tool is even reasonable.  It is
creating highly idiomatic packages based on the already highly
effective factoring that has been done.  Indeed, in the case of
existing R packages the tool generates essentially identical Makefiles
(minus manual hand tuning like USE_LANGUAGES or buildlink3 inclusion,
but things like those are maintained when existing packages are
updated).
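
As a rough illustration of how little needs to be derived, here is a
minimal Python sketch (not the actual R2pkg code) that reads the
"Field: value" lines of an upstream DESCRIPTION file and emits a
skeleton Makefile.  The function names and the R_PKGNAME/R_PKGVER
variable names are assumptions made for this example; only the
.include line comes from the discussion above.

```python
def parse_description(text):
    """Parse the 'Field: value' format of an R DESCRIPTION file.

    Lines starting with whitespace continue the previous field.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation of the previous field's value.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def makefile_sketch(fields):
    """Emit a skeleton Makefile; variable names are assumed, not verified."""
    lines = [
        "R_PKGNAME=\t" + fields["Package"],   # assumed variable name
        "R_PKGVER=\t" + fields["Version"],    # assumed variable name
        "COMMENT=\t" + fields.get("Title", ""),
        "",
        '.include "../../math/R/Makefile.extension"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Package: foo\nVersion: 1.2-3\nTitle: A long\n  title\n"
    print(makefile_sketch(parse_description(sample)))
```

Hand-tuned bits such as USE_LANGUAGES are deliberately left out of the
sketch, since those cannot be derived from DESCRIPTION.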

 > I am generally opposed to renaming.
 > 
 > In this case, it would be like having a python category for all the
 > py-foo scripts, and then perl, ruby, etc..  I'd say each R-foo package
 > should go where it should go, in the existing categories.

Yes, there is definitely an analogy with the python and perl packages.
However, I think there is a compelling argument that makes that
analogy less useful in this case.  First, to my knowledge there are no
tools creating python and perl packages from upstream information;
perhaps there should be, but that is another issue.  Second, it is not
clear how to discover the appropriate category to use, as there is
generally no corresponding information in an individual package's
DESCRIPTION file, and even if there were, there is no guarantee that it
would make sense within the context of the pkgsrc categories.  Thus,
the tool cannot easily divine what category to use.  Third, it is
important to include dependency information in the generated
Makefiles.  If R packages are scattered about in various directories,
then it will be needlessly difficult to find them and generate
appropriate DEPENDS clauses.  For these reasons I feel it is
appropriate to keep the R packages together in a single category as
they are now; given how many there may be, however, it seems that a
distinct category may be appropriate.  However, if people are happy
having a math category dominated by R packages I suppose that is fine;
to me it seems to be a miscategorization rather than a help, though.
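
To make the third point concrete, here is a small Python sketch
(again, not the tool's real code) of generating DEPENDS lines from a
DESCRIPTION "Depends:" field when every R package lives in one
category.  The function name, the default category, and the exact
version-pattern conventions are assumptions for illustration.

```python
import re


def depends_lines(depends_field, category="math"):
    """Turn an upstream 'Depends:' value into pkgsrc-style DEPENDS lines.

    Assumes every R package lives at ../../<category>/R-<name>, which is
    what makes the path computable without searching the tree.
    """
    pattern = re.compile(
        r"\s*([A-Za-z0-9.]+)\s*(?:\(\s*>=\s*([^)\s]+)\s*\))?")
    out = []
    for item in depends_field.split(","):
        m = pattern.match(item)
        if m is None:
            continue  # empty or unparseable entry
        name, version = m.group(1), m.group(2)
        if name == "R":
            continue  # the interpreter itself is not an R-foo package
        if version:
            spec = "R-%s>=%s" % (name, version)
        else:
            spec = "R-%s-[0-9]*" % name
        out.append("DEPENDS+=\t%s:../../%s/R-%s" % (spec, category, name))
    return out


if __name__ == "__main__":
    for line in depends_lines("R (>= 2.10), MASS, lattice (>= 0.17)"):
        print(line)
```

With packages scattered over several categories, the path component
could not be computed from the package name alone; the tool would have
to search the tree for each dependency.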

 > If it is not entirely clear what the actual text of the license is from
 > data in the upstream distribution, upstream is broken and you should
 > file a bug report.   Someone who says "Permission granted to copy under
 > the BSD license." is being unclear.

I am aware of that uncertainty.  I am only trying to provide a means
of identifying automatically when the upstream terminology is
unclear.  Whoever is creating or updating a particular package should
be looking at the generated Makefile and should see the commented
LICENSE clause and can then notify upstream.

 > --fetch seems like it should just be on; url2pkg works that way.

Sounds fine.

 > --recurse: I can see the point, but it seems like the right thing is to
 >   default to off, and to fail with a list of the prereqs that are not
 >   installed.

The current behavior is to create/update only the explicitly requested
packages unless --recurse is used, in which case the dependencies are
also created/updated.  Do you think it is important to list all the
dependencies that were skipped because they were not requested?  Can
you see an important use case for _not_ using the --recurse option?
That is, when would you not want the dependencies created/updated?
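
For reference, the two behaviors under discussion can be sketched in
Python as follows.  This is only a model of the semantics described
above, not the tool's implementation; the function name, the
dependency map, and the package names are hypothetical.

```python
def plan(requested, deps, recurse=False):
    """Return (selected, skipped) package lists.

    requested: packages named explicitly by the user.
    deps:      mapping from package name to its direct dependencies.
    recurse:   if True, dependencies are created/updated as well;
               if False, they are only reported as skipped.
    """
    selected, skipped = [], []
    queue = list(requested)
    seen = set(requested)
    while queue:
        pkg = queue.pop(0)
        selected.append(pkg)
        for dep in deps.get(pkg, []):
            if dep in seen:
                continue
            seen.add(dep)
            if recurse:
                queue.append(dep)   # process the dependency too
            else:
                skipped.append(dep)  # candidate for a "prereqs" report
    return selected, skipped


if __name__ == "__main__":
    deps = {"A": ["B", "C"], "B": ["D"]}  # hypothetical packages
    print(plan(["A"], deps))
    print(plan(["A"], deps, recurse=True))
```

Without recursion only the direct dependencies of the requested
packages show up in the skipped list, since the skipped packages are
never examined themselves.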

 > Confused; there is no such thing as an "MIT" license.  MIT has used a
 > number of licenses, and usually when people say "MIT license" they mean
 > "X11 license".

This is one I thought I had a correct match for.  Pkgsrc does define
licenses/mit which I take to be the same as what one describes as the
MIT license.  If this is a problem it is unclear how any mapping is
possible.  Are you suggesting that the tool do some sort of textual
comparison between some distributed license file and the pkgsrc
licenses?  Would it be better to comment out all the LICENSE clauses
regardless of whether there is a plausible match?  Would it be better
to produce two commented out LICENSE clauses, one with the upstream
descriptor and one with the best guess from pkgsrc, to aid the
developer?  Presumably, people have to look at the output of this
stuff and make some judgements as to whether the tool did the right
thing in any particular instance.  I hope nobody would create a
zillion R packages and commit them without some appropriate scrutiny.

 > So there perhaps one needs to maybe add a license, or (manually) wdiff
 > to find one that's textually equivalent.

The point is to discover cases that can be handled automatically and
to indicate via a #LICENSE clause the cases that cannot.  Those will
have to be investigated manually by whoever is working with a package.
Yes, the upstream terminology leaves much to be desired.  I see no way
around this, but the tool can at least flag the set of cases that
need manual intervention.  The point of being able to print out the
mapping table is that the manual intervention might actually be to add
something to the table for a case that is well-defined but that I have
not yet discovered.  Perhaps your point is that all cases need enough
manual intervention that every package should have #LICENSE.
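
A minimal Python sketch of the idea, assuming a simple lookup table:
the table entries and function names below are illustrative only, not
the tool's actual mapping, and whether any given entry is a
trustworthy match is exactly the question raised above.

```python
# Illustrative mapping from upstream License strings to pkgsrc license
# names.  These entries are assumptions for the example.
LICENSE_MAP = {
    "GPL-2": "gnu-gpl-v2",
    "GPL-3": "gnu-gpl-v3",
    "MIT": "mit",
}


def license_line(upstream):
    """Return a LICENSE line, commented out when no mapping is known."""
    key = upstream.strip()
    if key in LICENSE_MAP:
        return "LICENSE=\t" + LICENSE_MAP[key]
    # Unknown or ambiguous: keep the upstream wording for manual review.
    return "#LICENSE=\t" + key


def print_table():
    """Print the mapping so that gaps can be spotted and filled in."""
    for upstream, pkgsrc in sorted(LICENSE_MAP.items()):
        print("%s\t%s" % (upstream, pkgsrc))


if __name__ == "__main__":
    print(license_line("GPL-2"))
    print(license_line("BSD"))
    print_table()
```

Commenting out every LICENSE line regardless of match would amount to
emptying the table.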

I hope this clears up some of the points you raised.  As always more
comments are welcome.

Cheers,
Brook
