Subject: pkg/36603: pkgsrc "make checksum" even more broken than I suspected
To: None <pkg-manager@netbsd.org, gnats-admin@netbsd.org,>
From: None <kre@munnari.OZ.AU>
List: pkgsrc-bugs
Date: 07/04/2007 15:25:01
>Number:         36603
>Category:       pkg
>Synopsis:       pkgsrc "make checksum" even more broken than I suspected
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 04 15:25:00 +0000 2007
>Originator:     Robert Elz
>Release:        NetBSD 3.99.15   (pkgsrc current, anything within last 3 months)
>Organization:
	Prince of Songkla University
>Environment:
System: NetBSD jade.coe.psu.ac.th 3.99.15 NetBSD 3.99.15 (GENERIC-1.696-20060125) #8: Wed Jan 25 04:59:39 ICT 2006 kre@jade.coe.psu.ac.th:/usr/obj/current/kernels/JADE_ASUS i386
Architecture: i386
Machine: i386
>Description:
	In PR pkg/36262 I described an annoying problem with pkgsrc's
	"make checksum" where (for me) it was constantly complaining
	about not being able to create .xxx cookie files (because I
	run the make fetch / make checksum commands with a read-only
	WORKDIR - with just the distfiles directory writable, even
	that should not be needed for "make checksum" as a validation
	pass).

	Since then, pkgsrc has been so corrupted with these absurd
	cookie files, and lock files, created in WORKDIR, that it
	is impossible to do anything at all with a read-only /usr/obj
	(distfiles don't fetch any more with read only /usr/obj - WORKDIR
	is /usr/obj/pkgsrc -  some lock file cannot be created, and the
	fetch fails).   Because of that, I switched to keeping /usr/obj
	read write, and simply paying the increased fsck time on occasion.

	This allowed another bug to reveal itself.   This one is even
	worse...

	I do daily (twice daily, actually) cvs updates of pkgsrc.   After
	each, I have a script that takes the cvs log (cvs command output)
	and works out what distfiles might have changed (ie: the cvs
	update touched the Makefile, or the distinfo file) and does a
	"make checksum" in the pkgsrc directory for each such package.
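
	For illustration only, a script of that kind could look roughly
	like the sketch below.   It is not the script actually in use.
	The function name changed_pkgs is made up, and it assumes the
	usual "U path" / "P path" lines that cvs update prints for
	files it has updated or patched, with paths relative to the
	top of pkgsrc.

```shell
#!/bin/sh
# Hypothetical helper: read "cvs update" output on stdin and print each
# category/package directory whose Makefile or distinfo was updated.
changed_pkgs() {
    awk '($1 == "U" || $1 == "P") &&
         $2 ~ /^[^\/]+\/[^\/]+\/(Makefile|distinfo)$/ {
             sub(/\/[^\/]*$/, "", $2)   # strip the trailing file name
             print $2
         }' | sort -u
}
```

	Each directory printed would then get its own "make checksum",
	for example: changed_pkgs < cvs.log | while read -r p; do
	(cd "$p" && make checksum); done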

	Back in the afternoon on June 16 (my local time) devel/meld
	was upgraded (my cvs update discovered that it had been
	updated) and a new distfile was required.   This is the
	log of that session ...

================ devel/meld ==          Sat Jun 16 16:08:55 ICT 2007
=> Required installed package digest>=20010302: digest-20050731 found
=> Fetching meld-1.1.5.1.tar.bz2
=> Total size: 602859 bytes
ftp: Connect to address `2001:6b0:e:2018::137': No route to host
checksum: Checksum SHA1 mismatch for meld-1.1.5.1.tar.bz2
checksum: Checksum RMD160 mismatch for meld-1.1.5.1.tar.bz2
fetch: Unable to verify fetched file meld-1.1.5.1.tar.bz2
ftp: connect to address 2001:4f8:1:c:230:48ff:fe31:43f2: No route to host
meld-1.1.5.1.tar.bz2: No such file or directory.
fetch: Unable to fetch expected file meld-1.1.5.1.tar.bz2

	I neither know, nor care, why the fetch failed, that kind
	of thing happens (the IPv6 "no route" I do understand, that
	was a local problem, since fixed).   (The file fetched really
	was bogus, only approx 2/3 of the file arrived - stupid HTTP!)

	Aside from that, once a week, early Sunday morning (local)
	I run make checksum on everything (this was the subject of
	PR 36262).   That's supposed to verify that all the packages'
	distfiles are correct - and is where I generally look to see
	if I have any problems.   The normal twice a day logs mostly
	just sit unexamined unless I really need to know...

	June 16 was a Saturday; the very next morning that full checksum
	check ran (with read-write /usr/obj), which produced this
	about devel/meld ...

Started at Sun Jun 17 06:11:33 ICT 2007
++++++++++++++++++++++++++++++++ devel/meld ... 
=> Required installed package digest>=20010302: digest-20050731 found

	That's it.   Nothing else.

	Obviously what happened was that the "make checksum" run
	from the day before left a "checksum is done" cookie file
	around, and this "make checksum" then simply decided that
	it had nothing to do.

	That's even though the "make checksum" failed the first time!

	That's totally broken (using cookie files at all is broken,
	but using them to say something is done, when it isn't, is
	truly absurd).

	Then there's the side issue of why the digest package is
	required to be installed if digest isn't actually going to be
	used!   (If it had been it would have found the problem).

>How-To-Repeat:
	Take any random package, fetch its distfile, then

		make checksum

	Verify that the checksum is OK.   Then mangle the distfile
	(doesn't matter how, just truncating it should be enough).
	Perhaps even (re-)moving the distfile would work too, but
	I haven't verified that.   Then:

		make checksum

	Expect to see a checksum error.   Well, that's what you should
	expect to see.
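
	The steps above can be scripted roughly as follows.   This is
	a sketch under assumptions, not something taken from pkgsrc:
	the function name repro_checksum_bug is made up, MAKE is an
	optional override for the make program, and the distfile is
	assumed to be larger than 512 bytes so that the dd really
	does truncate it.

```shell
#!/bin/sh
# Hypothetical reproduction helper.
# usage: repro_checksum_bug PKGDIR DISTFILE
# returns 0 if the damaged distfile is rejected (correct behaviour),
#         1 if it is accepted (the bug), 2 if the initial check fails.
repro_checksum_bug() {
    pkgdir=$1
    distfile=$2

    # First pass: the intact distfile should verify.
    ( cd "$pkgdir" && ${MAKE:-make} checksum ) || return 2

    # Mangle the distfile: keep only its first 512 bytes.
    cp -p "$distfile" "$distfile.orig"
    dd if="$distfile.orig" of="$distfile" bs=512 count=1 2>/dev/null

    # Second pass: this should now fail.
    if ( cd "$pkgdir" && ${MAKE:-make} checksum ); then
        echo "BUG: damaged distfile passed make checksum"
        status=1
    else
        echo "OK: damaged distfile was rejected"
        status=0
    fi

    # Put the good distfile back.
    mv "$distfile.orig" "$distfile"
    return $status
}
```

	It would be called with the package directory and the path of
	its distfile, wherever those live on the system in question.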

>Fix:
	A workaround is to continually run "make clean" everywhere, all
	the time, to discard all the stupid cookie files.   That's what
	I'm doing now - but it isn't any real kind of fix.   make clean
	should never be required in any correct makefile system, its
	purpose should be just to remove junk files to save space.  The
	makefile should recover from it (by recreating the junk) when
	needed (and pkgsrc handles that OK), but should never require
	it, ever, for anything - requiring it simply means that the
	dependency information is incorrect, and should be fixed.

	At a very minimum "make fetch" and "make checksum" must never
	leave their "I am done" cookie files if the targets fail!
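
	As a minimal sketch of that rule (generic shell, not pkgsrc's
	actual implementation; the name run_with_cookie is invented
	for the example), the cookie is written only after the
	command has succeeded, and any stale one is removed on
	failure:

```shell
#!/bin/sh
# Hypothetical helper: run_with_cookie COOKIE COMMAND [ARGS...]
# Skips COMMAND if COOKIE already exists; otherwise runs it and
# creates COOKIE only when COMMAND exits successfully.
run_with_cookie() {
    cookie=$1
    shift

    # An earlier successful run left its marker: nothing to do.
    [ -f "$cookie" ] && return 0

    "$@"
    status=$?

    if [ "$status" -eq 0 ]; then
        : > "$cookie"        # record success only now
    else
        rm -f "$cookie"      # never leave a "done" marker after a failure
    fi
    return "$status"
}
```

	Note that even this still skips the check once the cookie
	exists, so a distfile that is damaged later goes unnoticed.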

	Better would be to remove those cookie files completely;
	the cost of an occasional "make checksum" is pretty small,
	and actually running digest has the benefit that it checks
	the distfile is still correct now, and that no hardware,
	filesystem, or other error has caused it to become corrupted,
	even though it was once OK.

	I also haven't tested to see what happens in those odd cases
	where a distfile changes, and the distinfo file is updated,
	without any change to the Makefile (no revbump, no new version).
	But I suspect that if one had the old distfile, and did not do
	a "make clean" one would never discover that the distfile in use
	was not the one the distinfo file was expecting.