tech-pkg archive


Re: What to do about github (dynamic) downloads



On 08/10/17 06:35, Greg Troxel wrote:
John Klos <john%ziaspace.com@localhost> writes:

It seems that some pkgsrc packages use github for some distfiles (via
codeload.github.com).

It appears that github generates these on the fly and has decided to
change their method, seemingly arbitrarily, which makes checksums
fail.
One of the core principles of pkgsrc and distfiles is that checksums
should not change.  This dates from the old days when software was
always released in some form of distfile, usually foo-x.y.tar.gz.  When
upstream changes a published distfile, that's considered bad behavior.
pkgsrc uses DIST_SUBDIR to work around this; see the pkgsrc guide for a
detailed explanation.

So, if github is returning a different bytestream for a given URL that
is supposed to be a release, that's broken, according to the pkgsrc
expectation of what a release is.
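
As an illustration of how an on-the-fly archive can change while its contents stay the same, here is a minimal Python sketch. It assumes the difference comes from gzip header metadata (the embedded timestamp). What github actually changed is not established in this thread, so treat this as one plausible mechanism and not a diagnosis. The payload is made up.

```python
import gzip
import hashlib

# Hypothetical payload standing in for an unchanged source tree.
payload = b"identical source tree contents\n"

# Compress the same bytes twice; only the gzip header timestamp differs.
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=1)


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a hex string."""
    return hashlib.sha512(data).hexdigest()


# The compressed byte streams hash differently...
checksums_differ = sha512_hex(first) != sha512_hex(second)
# ...even though the decompressed contents are identical.
contents_match = gzip.decompress(first) == gzip.decompress(second)

print(checksums_differ, contents_match)
```

A checksum over the compressed file cannot tell this harmless case apart from a real content change, which is why a changed bytestream has to be treated as a changed distfile.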

In these days of discussion of reproducible builds, changing what
amounts to distfiles seems like a serious problem.  I wonder if you are
able to communicate with upstream and have them complain to github to
fix this.

Should it be decided, whether by consensus or a decision by
pkgsrc-pmc, that NetBSD should avoid services such as github which do
this kind of dynamic packaging?
We tend to go light on policy unless really necessary.  I would say:

  - people should use DIST_SUBDIR if upstream changes a release, whether
    by replacing the file or changing their process

  - when packaging, if there is a distfile available in a reliable way
    (like a file on an http/ftp server), I think it should be preferred
    over files that are generated on-the-fly, at least as long as the
    on-the-fly generation appears unreliable

  - Note that normally, distfiles are fetched and mirrored on
    ftp.netbsd.org.  However, this doesn't really address the issue
    because the DIST_SUBDIR approach is still needed when they change,
    whether because of changes in the generation process or because an
    upstream decided to replace the file with different contents.

  - Remember that changed distfiles can be an attack.  Diffing them like
    you did is good practice.
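
The diffing mentioned in the last point can be sketched in Python roughly as follows. This is only one way to do it, not the method used in the thread: the function names are hypothetical, and it compares only the contents of regular files, ignoring permissions, timestamps, and symlinks.

```python
import hashlib
import tarfile


def member_digests(path):
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, or none).
    with tarfile.open(path, "r:*") as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def changed_members(old_path, new_path):
    """Return sorted names of files that were added, removed, or modified."""
    old = member_digests(old_path)
    new = member_digests(new_path)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

An empty result for the old and new distfile suggests that only the packaging changed. Any listed name deserves a closer look before the new checksum is accepted.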

Overall, I'm not quite sure what you're asking for.  If you want to fix
a pkgsrc package to use a more reliable (and authorized by upstream)
distfile location, that seems fine, modulo the usual MAINTAINER/OWNER
issues.  If the upstream has no reliable distfile location, that's a bug
to be fixed in upstream, not a pkgsrc bug, but then pkgsrc has to work
around it.

If you're suggesting that everyone be aware of this issue and try to
make choices to have more reliable distfiles, balancing all the other
concerns, that sounds good (but also not very prescriptive :-).

If you're asking for a sweeping policy statement that github-generated
distfiles are banned, I don't see that happening.

In some cases like this, if I cannot convince the developers to provide consistent, tagged distfiles, I simply mirror snapshots myself as distname-yyyy.mm.dd.tgz.

I'd always talk to the developers about the issue first, but in my experience, if they don't see it as a problem immediately, there's little point in arguing and it's better to just work around them and spend your valuable time creating something.
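
For what it's worth, the distname-yyyy.mm.dd.tgz naming I mentioned can be generated with something like the Python sketch below. The function name and the example distname "foo" are placeholders for illustration, not anything pkgsrc provides.

```python
from datetime import date


def snapshot_name(distname: str, day: date) -> str:
    """Build a dated snapshot filename such as foo-2017.08.10.tgz."""
    return f"{distname}-{day:%Y.%m.%d}.tgz"


# Hypothetical package name and snapshot date.
print(snapshot_name("foo", date(2017, 8, 10)))
```

Putting the date in the filename means a later snapshot gets a new name instead of silently replacing a file whose checksum is already recorded.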

--
Earth is a beta site.


