Subject: pkg/36603: pkgsrc "make checksum" even more broken than I suspected
To: None <pkg-manager@netbsd.org, gnats-admin@netbsd.org,>
From: None <kre@munnari.OZ.AU>
List: pkgsrc-bugs
Date: 07/04/2007 15:25:01
>Number: 36603
>Category: pkg
>Synopsis: pkgsrc "make checksum" even more broken than I suspected
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: pkg-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 04 15:25:00 +0000 2007
>Originator: Robert Elz
>Release: NetBSD 3.99.15 (pkgsrc current, anything within last 3 months)
>Organization:
Prince of Songkla University
>Environment:
System: NetBSD jade.coe.psu.ac.th 3.99.15 NetBSD 3.99.15 (GENERIC-1.696-20060125) #8: Wed Jan 25 04:59:39 ICT 2006 kre@jade.coe.psu.ac.th:/usr/obj/current/kernels/JADE_ASUS i386
Architecture: i386
Machine: i386
>Description:
In PR pkg/36262 I described an annoying problem with pkgsrc's
"make checksum" where (for me) it was constantly complaining
about not being able to create .xxx cookie files (because I
run the make fetch / make checksum commands with a read-only
WORKDIR, with just the distfiles directory writable; even
that should not be needed for "make checksum", which is only
a validation pass).
Since then, pkgsrc has been so corrupted with these absurd
cookie files, and lock files, created in WORKDIR, that it
is impossible to do anything at all with a read-only /usr/obj
(distfiles don't fetch any more with read only /usr/obj - WORKDIR
is /usr/obj/pkgsrc - some lock file cannot be created, and the
fetch fails). Because of that, I switched to keeping /usr/obj
read write, and simply paying the increased fsck time on occasion.
This allowed another bug to reveal itself. This one is even
worse...
I do daily (twice daily, actually) cvs updates of pkgsrc. After
each, I have a script that takes the cvs log (cvs command output),
works out which distfiles might have changed (ie: the cvs
update touched the Makefile, or the distinfo file) and does a
"make checksum" in the pkgsrc directory for each such package.
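The selection step described above could look roughly like the
following Python sketch. The actual script is not shown in this
report, so everything here is an assumption: the function name
packages_to_check is made up, the input is assumed to be "cvs update"
output with the usual one-letter status followed by a path relative
to the pkgsrc root, and only U, P and A lines are treated as changes.

```python
# Status letters that "cvs update" prints for files it brought up to date.
# Assumption: updated (U), patched (P) and added (A) files are the
# interesting ones; locally modified or unknown files are ignored.
CHANGED = {"U", "P", "A"}

# Files whose change may mean a new or different distfile.
WATCHED = {"Makefile", "distinfo"}


def packages_to_check(cvs_output):
    """Return the sorted category/package directories whose Makefile
    or distinfo was touched, given the text printed by "cvs update"."""
    pkgs = set()
    for line in cvs_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in CHANGED:
            continue
        path = parts[1].strip().split("/")
        # Only category/package/file paths, e.g. devel/meld/distinfo.
        if len(path) == 3 and path[2] in WATCHED:
            pkgs.add("/".join(path[:2]))
    return sorted(pkgs)
```

Each directory returned would then get its own "make checksum" run.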
Back in the afternoon on June 16 (my local time) devel/meld
was upgraded (my cvs update discovered that it had been
updated) and a new distfile was required. This is the
log of that session ...
================ devel/meld == Sat Jun 16 16:08:55 ICT 2007
=> Required installed package digest>=20010302: digest-20050731 found
=> Fetching meld-1.1.5.1.tar.bz2
=> Total size: 602859 bytes
ftp: Connect to address `2001:6b0:e:2018::137': No route to host
checksum: Checksum SHA1 mismatch for meld-1.1.5.1.tar.bz2
checksum: Checksum RMD160 mismatch for meld-1.1.5.1.tar.bz2
fetch: Unable to verify fetched file meld-1.1.5.1.tar.bz2
ftp: connect to address 2001:4f8:1:c:230:48ff:fe31:43f2: No route to host
meld-1.1.5.1.tar.bz2: No such file or directory.
fetch: Unable to fetch expected file meld-1.1.5.1.tar.bz2
I neither know, nor care, why the fetch failed; that kind
of thing happens (the IPv6 "no route" I do understand, that
was a local problem, since fixed). (The file fetched really
was bogus, only approx 2/3 of the file arrived - stupid HTTP!)
Aside from that, once a week, early Sunday morning (local)
I run make checksum on everything (this was the subject of
PR 36262). That's supposed to verify that all the packages'
distfiles are correct - and is where I generally look to see
if I have any problems. The normal twice a day logs mostly
just sit unexamined unless I really need to know...
June 16 was a Saturday; the very next morning, that full checksum
check ran (with read-write /usr/obj), which produced this
about devel/meld ...
Started at Sun Jun 17 06:11:33 ICT 2007
++++++++++++++++++++++++++++++++ devel/meld ...
=> Required installed package digest>=20010302: digest-20050731 found
That's it. Nothing else.
Obviously what happened was that the "make checksum" run
from the day before left a "checksum is done" cookie file
around, and this "make checksum" then simply decided that
it had nothing to do.
That's even though the "make checksum" failed the first time!
That's totally broken (using cookie files at all is broken,
but using them to say something is done, when it isn't, is
truly absurd).
Then there's the side issue of why the digest package is
required to be installed if digest isn't actually going to be
used! (If it had been it would have found the problem).
>How-To-Repeat:
Take any random package, fetch its distfile, then
make checksum
Verify that the checksum is OK. Then mangle the distfile
(doesn't matter how, just truncate it should be enough).
Perhaps even (re-)moving the distfile would work too, but
I haven't verified that. Then:
make checksum
Expect to see a checksum error. Well, that's what you should
expect to see.
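For the "mangle the distfile" step, here is a minimal Python sketch
that truncates a file to two thirds of its size (about what the
failed fetch above left behind). The helper name truncate_distfile
is hypothetical, and the "make checksum" runs before and after are
still done by hand.

```python
import os


def truncate_distfile(path):
    """Cut the file at `path` down to two thirds of its current size
    and return the new size in bytes."""
    new_size = os.path.getsize(path) * 2 // 3
    os.truncate(path, new_size)
    return new_size
```

Any other way of damaging the file should do equally well; truncation
is just the case that was actually observed.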
>Fix:
A workaround is to continually run "make clean" everywhere, all
the time, to discard all the stupid cookie files. That's what
I'm doing now - but it isn't any real kind of fix. make clean
should never be required in any correct makefile system, its
purpose should be just to remove junk files to save space. The
makefile should recover from it (by recreating the junk) when
needed (and pkgsrc handles that OK), but should never require
it, ever, for anything - requiring it simply means that the
dependency information is incorrect, and should be fixed.
At a very minimum "make fetch" and "make checksum" must never
leave their "I am done" cookie files if the targets fail!
Better would be to remove those cookie files completely;
the cost of an occasional "make checksum" is pretty small,
and actually running digest has the benefit that it checks
the distfile is still correct now, and that no hardware, filesystem,
or other error has caused it to become corrupted, even though
it was once OK.
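To make the two points above concrete, here is a rough Python sketch.
It is not how pkgsrc implements the target (that is make and shell);
the names sha1_of and checksum, and the optional cookie argument,
are invented for illustration. The digest is recomputed on every
call, and if a cookie is used at all it is written only after a
successful check and removed after a failed one.

```python
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def checksum(distfile, expected_sha1, cookie=None):
    """Verify `distfile` against `expected_sha1` every time it is called.

    The (hypothetical) cookie file never short-circuits the check; it
    only records the outcome of the most recent run.
    """
    ok = os.path.isfile(distfile) and sha1_of(distfile) == expected_sha1
    if cookie is not None:
        if ok:
            with open(cookie, "w"):
                pass  # create or refresh the empty cookie file
        elif os.path.exists(cookie):
            os.remove(cookie)  # a stale cookie must not outlive a failure
    return ok
```

With this shape, a truncated or missing distfile is reported on the
next run regardless of what any earlier run left behind.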
I also haven't tested to see what happens in those odd cases
where a distfile changes, and the distinfo file is updated,
without any change to the Makefile (no revbump, no new version).
But I suspect that if one had the old distfile, and did not do
a "make clean" one would never discover that the distfile in use
was not the one the distinfo file was expecting.