brook%biology.nmsu.edu@localhost (Brook Milligan) writes:

> They are also almost entirely identical in structure. Most of the
> remaining bits can be derived from the DESCRIPTION file provided by
> upstream. That is why, in fact, this tool is even reasonable. It is
> creating highly idiomatic packages based on the already highly
> effective factoring that has been done. Indeed, in the case of
> existing R packages the tool generates essentially identical Makefiles
> (minus manual hand tuning like USE_LANGUAGES or buildlink3 inclusion,
> but things like those are maintained when existing packages are
> updated).

That sounds fine then. To me the main point is that as much common
subexpression elimination as is reasonable has already been done.

> > In this case, it would be like having a python category for all the
> > py-foo scripts, and then perl, ruby, etc. I'd say each R-foo package
> > should go where it should go, in the existing categories.
>
> Yes, there is definitely an analogy with the python and perl packages.
> However, I think there is a compelling argument that makes that
> analogy less useful in this case. First, to my knowledge there are no
> tools creating python and perl packages from upstream information;

That's an artifact of the current situation. Regardless, there are 706
py- and 1939 p5- packages. I expect that there will be fewer R packages
than that.

> perhaps there should be, but that is another issue. Second, it is not
> clear how to discover the appropriate category to use, as there is
> generally no corresponding information in an individual package's
> DESCRIPTION file and even if there was there is no guarantee that it
> would make sense within the context of the pkgsrc categories. Thus,
> the tool cannot easily divine what category to use. Third, it is

I think it is a fundamental error to adjust the pkgsrc hierarchy to
accommodate a pkg-generating tool so that it can be used without
thought.

One of the core pkgsrc ideas, not often discussed, is that packages are
curated. pkgsrc maintainers choose what to package, choose the category
(this part is weak), choose appropriate dependencies and options,
regularize the behavior into the pkgsrc layout and startup scripts,
choose when to upgrade, and thus present to users a "now this works as
it should, and it's what you should run if you haven't understood the
details" version. I find this aspect of pkgsrc very valuable.

The person making an R package should read the description and choose a
category. (This is perhaps a reason not to use the --recurse option.)

> important to include dependency information in the generated
> Makefiles. If R packages are scattered about in various directories,
> then it will be needlessly difficult to find them and generate
> appropriate DEPENDS clauses. For these reasons I feel it is

That's only a few lines of code. I can't believe that this is really a
big problem encountered only because R is so special -- that hasn't come
up in the first ~10K packages.

> appropriate to keep the R packages together in a single category as
> they are now; given how many there may be, however, it seems that a
> distinct category may be appropriate. However, if people are happy
> having a math category dominated by R packages I suppose that is fine;
> to me it seems to be a miscategorization rather than a help, though.

Trying to save a minute of the packager's time and imposing a category
that otherwise shouldn't exist seems like a very bad tradeoff. Packager
cycles are arguably more valuable than user cycles, but clearly not
infinitely so. (We've probably spent as long discussing this as it
takes to choose categories for the first 100 packages, if not 500.)

> > If it is not entirely clear what the actual text of the license is from
> > data in the upstream distribution, upstream is broken and you should
> > file a bug report.
> > Someone who says "Permission granted to copy under
> > the BSD license." is being unclear.
>
> I am aware of that uncertainty. I am only trying to provide a means
> of identifying automatically when the upstream terminology is
> unclear. Whoever is creating or updating a particular package should
> be looking at the generated Makefile and should see the commented
> LICENSE clause and can then notify upstream.

OK. My point is that if upstream is confused/unclear, that's how it is,
and pkgsrc can't fix it. Maybe pkgsrc should cope with this better, but
it's not about R specifically.

> > --recurse: I can see the point, but it seems like the right thing is to
> > default to off, and to fail with a list of the prereqs that are not
> > installed.
>
> The current behavior is to only create/update the explicitly requested
> packages unless --recurse is used in which case the dependencies are
> also created/updated. Do you think it is important to list all the
> dependencies that were skipped because they were not requested? Can
> you see an important use case for _not_ using the --recurse option?
> That is, when would you not want the dependencies created/updated?

I've done something similar, by hand, for python packages, when I
packaged tahoe-lafs. I ended up making about 10 packages. For each, I
had to examine it with pkglint, test the install, etc. So being told to
deal with the next level by running the script again wouldn't be
annoying.

Creating packages for dependencies seems ok. But you need to say where
they go, which requires showing the description to a human and having
them choose a category.

Updating is another matter. Presumably you mean that creating package
foo turns out to need bar>=D, and bar is only at C (<D). That's an
entirely different matter, because you have to go read the bar NEWS and
decide if updating from C to D (or E>D) will break other existing
packages.
Perhaps R has a culture of API compatibility and this isn't such a
problem, but in general it's an issue.

So in general I favor making the packager think about everything that
needs thought. If I could do one package every 5 minutes (from nothing
to committed) that would be blindingly fast, and I'm not sure I'd want
to impose packages on the community with less than that much thought
anyway.

>> Confused; there is no such thing as an "MIT" license. MIT has used a
>> number of licenses, and usually when people say "MIT license" they mean
>> "X11 license".
>
> This is one I thought I had a correct match for. Pkgsrc does define
> licenses/mit which I take to be the same as what one describes as the
> MIT license. If this is a problem it is unclear how any mapping is

I see this notion (that "MIT license" refers to the text of the X11
license) pretty widely, so I'm probably off in insisting that "MIT
license" is a bad term (or rather that it's ambiguous). But reading:

  http://en.wikipedia.org/wiki/MIT_License

I end up not being sure which license is which, and the FSF makes a good
case that the term is ambiguous:

  http://www.gnu.org/licenses/license-list.html

"MIT" refers to both the Expat license and the X11 license:

  http://www.gnu.org/licenses/license-list.html
and
  http://www.xfree86.org/3.3.6/COPYRIGHT2.html#3

The one in /usr/pkgsrc/licenses/mit is the Expat version. Fortunately,
they only differ in that the X11 license adds a no-use-of-name clause,
and this is below the level at which the pkgsrc licensing framework is
intended to work.

Again - if you can go to upstream and say "Your tag says 'MIT'. Please
show me the text." and then compare it to what's in pkgsrc, all is well.
If you can't, then you don't know the terms of the license. This is the
real issue, not what the name is in either scheme.

> possible. Are you suggesting that the tool do some sort of textual
> comparison between some distributed license file and the pkgsrc
> licenses?
> Would it be better to comment out all the LICENSE clauses

No, I am saying that the point of the tool is to do repetitive work
which does not require human judgement, and that work that needs a human
should be left for the human. I am also saying that the notion that
everything can and should be automated is an assertion, not a
supportable conclusion.

> regardless of whether there is a plausible match? Would it be better
> to produce two commented out LICENSE clauses, one with the upstream
> descriptor and one with the best guess from pkgsrc, to aid the
> developer? Presumably, people have to look at the output of this

If you can't figure out the text of the license from upstream, it isn't
possible to get this right.

> stuff and make some judgements as to whether the tool did the right
> thing in any particular instance. I hope nobody would create a
> zillion R packages and commit them without some appropriate scrutiny.

Agreed.

> > So there perhaps one needs to maybe add a license, or (manually) wdiff
> > to find one that's textually equivalent.
>
> The point is to discover cases that can be handled automatically and
> to indicate via a #LICENSE clause the cases that cannot. Those will
> have to be investigated manually by whoever is working with a package.
> Yes, the upstream terminology leaves much to be desired. I see no way
> around this but at least can more or less flag a set of cases that
> need manual intervention. The point of being able to print out the
> mapping table is that the manual intervention might actually be to add
> something to the table for a case that is well-defined but that I have
> not yet discovered. Perhaps your point is that all cases need enough
> manual intervention that every package should have #LICENSE.

No, I think if upstream has a tagging scheme and it can be understood,
then automatic translation is possible and entirely reasonable.
But if a human can't undertake to understand and say "upstream X means
pkgsrc Y, and if that isn't the case then either upstream or pkgsrc will
view it as a definite bug", then you can't translate automatically.

Is the R license tag a clue for their packaging system, or how licenses
are defined? If I download a source package, is there actual license
text? If so, that's what counts, not some metadata put on by someone
else.
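
To make the "upstream X means pkgsrc Y" rule concrete, here is a minimal
sketch in Python. It is not the tool's actual code: the function name,
the table name, and the two table entries are assumptions chosen for
illustration, and a real table would hold only tags that someone has
checked against the actual license text. Anything not in the table comes
out as a commented #LICENSE line for the packager to resolve.

```python
# Hypothetical table: upstream license tag -> pkgsrc license name.
# The entries are illustrative assumptions; only tags a human has
# verified against the real license text belong here.
VERIFIED_LICENSES = {
    "GPL-2": "gnu-gpl-v2",
    "GPL-3": "gnu-gpl-v3",
}


def license_line(upstream_tag):
    """Return a Makefile LICENSE line for an upstream license tag.

    A verified tag yields an active LICENSE line; anything else yields
    a commented-out line that records what upstream said.
    """
    tag = upstream_tag.strip()
    pkgsrc_name = VERIFIED_LICENSES.get(tag)
    if pkgsrc_name is not None:
        return "LICENSE=\t" + pkgsrc_name
    # Unknown or ambiguous tag: do not guess, leave it to the packager.
    return "#LICENSE=\t# upstream says: " + tag


if __name__ == "__main__":
    for tag in ("GPL-2", "BSD", "MIT"):
        print(license_line(tag))
```

The sketch never falls back to a "best guess"; whether a tag such as
"MIT" belongs in the table depends on someone having compared the
upstream text with the file in pkgsrc.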