tech-pkg archive


Re: pkglint and R package MASTER_SITES

> On Aug 9, 2020, at 1:05 PM, Roland Illig wrote:
> On 09.08.2020 19:20, Brook Milligan wrote:
>> R packages are supposed to include math/R/Makefile.extension, which defines MASTER_SITES.  It turns out that a number of R packages overrode MASTER_SITES when they should not.  (Other than wip, I believe that two do override it correctly, because their distfiles are in unusual places; the rest are now fixed.)
>> It would be really helpful if pkglint could at least warn if a package includes math/R/Makefile.extension and also overrides MASTER_SITES.
> In general, this exact pattern is the recommended one: The
> Makefile.extension (or as I prefer: provides useful
> default values that most packages are happy with, and the remaining few
> packages override these defaults.

Clearly, that is the right way to go.  I agree.  (By the way, would it make sense to change Makefile.extension to

> How do you know that these packages overrode MASTER_SITES "when they
> should not", and what exactly is this "when they should not"?  Right
> now, if you showed me one of these packages, I could probably not tell
> whether this overriding is redundant or harmful or intended.
> What exactly are the "unusual places" of the few packages that are
> supposed to override MASTER_SITES, and how can pkglint distinguish
> "unusual places" from "redundant places"?

I'm not sure if these are broader hypothetical questions or apply to the R packages that triggered my comment.

In general, there is no way to know one way or the other about this.  That is one reason I did not suggest implementing this in general, but only for R packages, i.e., those including math/R/Makefile.extension.  

However, I think the case of R packages is somewhat different than the general case.  So far, almost all R packages get distfiles from a CRAN mirror.  I know of only two exceptions to this where packages legitimately override MASTER_SITES (there were more before today; see below); one gets a distfile from, which should probably be added to MASTER_SITES_R_CRAN anyway, and one gets a distfile from  CRAN is the canonical repository for all R packages, so it is no surprise that this can be handled in a very general way through Makefile.extension.  

One problem with individual R packages overriding MASTER_SITES is that it is often done incompletely.  For example, I fixed a bunch today that overrode MASTER_SITES with only a portion of the definition in MASTER_SITES_R_CRAN.  Thus, they did not add anything new, but rather effectively removed options from the default.  They left out the CRAN archives in their override, so their distfiles could no longer be fetched once the files had been moved there.  Distfiles regularly get relegated to archives on CRAN as they are replaced by newer versions, so tracking this on an individual-package level makes no sense, and it is too easy for a package author to forget.

Given the regular relocations, the pitfall of incomplete overrides, and the idiomatic solution that applies to almost all packages, I feel that R packages should not override MASTER_SITES unless there is a very good reason to do so.  An example of a good reason would be a source other than CRAN (or probably rstudio), which applies to exactly one package as far as I know.
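
To make the incomplete-override pitfall concrete, here is a rough Python sketch that compares an override against the default list.  The function name, the category labels, and the example.org URLs are all made up for illustration; they are not the real contents of MASTER_SITES_R_CRAN.

```python
def classify_override(default_sites, override_sites):
    """Compare a package's MASTER_SITES override with the default list.

    Returns (kind, missing, added):
      "adds-source": the override names a site the default lacks
      "incomplete":  the override is a strict subset of the default
      "redundant":   the override lists exactly the default sites
    """
    default = set(default_sites)
    override = set(override_sites)
    missing = sorted(default - override)  # default sites the override dropped
    added = sorted(override - default)    # sites only the override provides
    if added:
        return "adds-source", missing, added
    if missing:
        return "incomplete", missing, added
    return "redundant", missing, added


# Hypothetical placeholder URLs, not the real CRAN mirror list.
DEFAULT_SITES = [
    "https://cran.example.org/src/contrib/",
    "https://cran.example.org/src/contrib/Archive/",
]

# An override that repeats only the first default site drops the archive.
kind, missing, added = classify_override(DEFAULT_SITES, DEFAULT_SITES[:1])
print(kind, missing)
```

An override classified as "incomplete" here is the harmful kind: it adds no source and loses the archive location.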

I think the algorithm is:

- Does a package include math/R/Makefile.extension and does it override MASTER_SITES?  If yes, then warn (or warn if PKG_DEVELOPER=yes).
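
The check above could be sketched roughly as follows.  This is a standalone Python illustration, not pkglint code (pkglint has its own Makefile parser); the function name, the regular expressions, and the warning text are assumptions, and it treats any assignment to MASTER_SITES as an override.  The PKG_DEVELOPER variant is not modeled.

```python
import re

# Assumed shape of the include line in an R package Makefile.
INCLUDE_RE = re.compile(r'^\.\s*include\s+"(?:\.\./)*math/R/Makefile\.extension"')
# Any assignment to MASTER_SITES (=, +=, ?=, :=, !=) counts as an override.
# MASTER_SITES_R_CRAN does not match, because "_" cannot follow the name here.
ASSIGN_RE = re.compile(r'^MASTER_SITES\s*[+?:!]?=')


def check_r_master_sites(makefile_text):
    """Return one warning per MASTER_SITES assignment in an R package Makefile."""
    lines = makefile_text.splitlines()
    if not any(INCLUDE_RE.match(line) for line in lines):
        return []  # not an R package; the general case is left alone
    return [
        f"line {number}: MASTER_SITES is overridden although "
        "math/R/Makefile.extension already defines it"
        for number, line in enumerate(lines, start=1)
        if ASSIGN_RE.match(line)
    ]


example = (
    "MASTER_SITES=\thttps://cran.example.org/src/contrib/\n"
    '.include "../../math/R/Makefile.extension"\n'
)
for warning in check_r_master_sites(example):
    print(warning)
```

Packages that do not include math/R/Makefile.extension produce no warnings, which keeps the check limited to the R case.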

Had this been in place earlier, pkglint would have caught all the cases I have had to fix, at the cost of ignorable warnings for 2 of the 300+ R packages.

This seems to be quite a different situation from designing general answers to the questions you posed above, which I agree is impossible.

> I need to know the answers to these questions before I can implement
> this check in pkglint.  Maybe there is a more general pattern that not
> only applies to R packages.  Or maybe answering these questions requires
> so much detailed knowledge and thoughts that there is no point feeding
> all this into pkglint.  I'm not sure which one it is, therefore I'm asking.

I hope I have distinguished between the R case and the general case.  Solutions for the former are tractable, whereas solutions for the latter are not and lead directly to all the unanswerable questions you are raising.

